Use this code as a base for real-time transcription of a phone call using the AWS Transcribe Streaming API. Read more about this here.
An audio stream is sent over a WebSocket connection to this server and then relayed to the AWS Transcribe Streaming API. The speech is transcribed and the resulting text is printed to the console.
Authorize an IAM user by attaching the following inline policy to it. In IAM, select the user, click "Add inline policy" on the right, and select the JSON tab. An inline policy is required because the AWS interface does not yet appear to have been updated with the new action.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "transcribestreaming",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
NOTE: The AWS panel will indicate that transcribe:StartStreamTranscriptionWebSocket is unknown. Ignore that.
In order to run this on Heroku, gather the following information and edit app.json:
API_KEY - The API key from a Nexmo account.
API_SECRET - The API secret from a Nexmo account.
LANG_CODE - The language code. One of en-US, en-GB, fr-FR, fr-CA, es-US.
SAMPLE_RATE - The sample rate of the audio, in Hz. Max of 16000 for en-US and es-US, and 8000 for the other languages. Phone conversations are recommended at 8000.
AWS_REGION - The AWS region. See AWS Availability Regions.
AWS_ACCESS_KEY_ID - The AWS access key ID from the IAM user.
AWS_SECRET_ACCESS_KEY - The AWS secret access key from the IAM user.
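The LANG_CODE and SAMPLE_RATE limits described above can be checked before the app starts. The following is a minimal sketch, not part of the original app: the validateAudioConfig name and the MAX_SAMPLE_RATE table are hypothetical, and the limits are taken only from the descriptions in this README.

```javascript
// Hypothetical helper (not part of the original app): checks a
// LANG_CODE / SAMPLE_RATE pair against the limits listed in this README.
const MAX_SAMPLE_RATE = new Map([
  ['en-US', 16000],
  ['es-US', 16000],
  ['en-GB', 8000],
  ['fr-FR', 8000],
  ['fr-CA', 8000],
]);

function validateAudioConfig(langCode, sampleRate) {
  if (!MAX_SAMPLE_RATE.has(langCode)) {
    throw new Error(`Unsupported LANG_CODE: ${langCode}`);
  }
  // Environment variables arrive as strings, so convert before comparing.
  const rate = Number(sampleRate);
  if (!Number.isInteger(rate) || rate <= 0) {
    throw new Error(`SAMPLE_RATE must be a positive integer, got: ${sampleRate}`);
  }
  const max = MAX_SAMPLE_RATE.get(langCode);
  if (rate > max) {
    throw new Error(`SAMPLE_RATE ${rate} exceeds the ${max} Hz limit for ${langCode}`);
  }
  return { langCode, sampleRate: rate };
}
```

For example, validateAudioConfig('fr-FR', '16000') throws, while validateAudioConfig('en-US', '8000') returns the parsed pair.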
This will create a new Nexmo application and phone number to begin testing with. View the logs to see the transcription response from the service. This can be done through the Heroku dashboard, or with the Heroku CLI using heroku logs -t.
To run this locally, install an up-to-date version of Node.
Start by installing the dependencies with:
npm install
Then copy the .env.example file to a new file called .env:
cp .env.example .env
Edit the .env file to add in an application ID and the credentials from the authorized IAM user.
APP_ID=
LANG_CODE= (One of en-US, en-GB, fr-FR, fr-CA, es-US)
SAMPLE_RATE= (Max of 16000 for en-US and es-US, and 8000 for the other languages. Phone conversations are recommended at 8000.)
AWS_REGION=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
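A blank entry in .env is easy to miss until the first call fails. As a rough sketch, a startup check along these lines could report which values are still empty. The findMissingVars helper is hypothetical, and it assumes the .env values have already been loaded into an object such as process.env.

```javascript
// Hypothetical startup check; the variable names come from the .env listing above.
const REQUIRED_VARS = [
  'APP_ID',
  'LANG_CODE',
  'SAMPLE_RATE',
  'AWS_REGION',
  'AWS_ACCESS_KEY_ID',
  'AWS_SECRET_ACCESS_KEY',
];

// Returns the names of required variables that are unset or blank in `env`.
function findMissingVars(env) {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || String(value).trim() === '';
  });
}
```

Calling findMissingVars(process.env) early in startup and logging the returned names is one way to surface an incomplete .env file before a call comes in.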
Now start the server, and then use something like ngrok to expose it to the internet:
node server.js
This will start the server at http://localhost:8000.
Tools like ngrok are great for exposing ports from a local machine to the internet. For instructions, check out this guide.
To run the app using Docker, run the following command in a terminal:
docker-compose up
This will create a new image with all the dependencies and run it at http://localhost:8000.
Create a new Nexmo Voice application for this app, and associate it with a Nexmo number.
Install the CLI by following these instructions. Then create a new Nexmo Voice application that also sets up an answer_url and event_url for the app running locally on the machine.
Be sure to append /ncco or /event to the end of the URL to match the routes in the script.
nexmo app:create aws-transcribe https://<your_hostname>/ncco https://<your_hostname>/event
nexmo app:create aws-transcribe {answer_url} {event_url}
IMPORTANT: This will return an application ID and a private key. The application ID will be needed for the nexmo link:app command as well as the .env file later. Create a file named private.key containing the private key; by default it should sit in the same location/level as server.js.
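When a call arrives, Nexmo requests the answer_url and expects a call control object (NCCO) in response. The sketch below shows the general shape of an NCCO that connects the call audio to a WebSocket. It is an illustration only: the buildNcco name and the /socket path are hypothetical, and the route in server.js may build its response differently.

```javascript
// Sketch of an NCCO that streams call audio to a WebSocket endpoint.
// buildNcco and the "/socket" path are hypothetical, for illustration only.
function buildNcco(hostname, sampleRate) {
  // Assumption: the WebSocket audio is 16-bit linear PCM at 8000 or 16000 Hz.
  if (sampleRate !== 8000 && sampleRate !== 16000) {
    throw new Error(`Unsupported sample rate: ${sampleRate}`);
  }
  return [
    {
      action: 'connect',
      endpoint: [
        {
          type: 'websocket',
          uri: `wss://${hostname}/socket`,
          'content-type': `audio/l16;rate=${sampleRate}`,
        },
      ],
    },
  ];
}
```

The sample rate passed here would normally match the SAMPLE_RATE value in the environment, so that the audio Nexmo sends is at the rate the transcription request declares.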
If you don't have a number already in place, obtain one from Nexmo. This can also be achieved using the CLI by running this command:
nexmo number:buy
Finally, link the new number to the created application by running:
nexmo link:app YOUR_NUMBER YOUR_APPLICATION_ID