__ __ ___ ___ __ __ ___ __ __ __ ___ __
/__` |__) |__ |__ / ` |__| |__) |__ / ` / \ / _` |\ | | | | / \ |\ |
.__/ | |___ |___ \__, | | | \ |___ \__, \__/ \__> | \| | | | \__/ | \|
Browsers are all latest as of 2018-06-28, except:
- macOS was 10.13.1 (2017-10-31), instead of 10.13.5
  - Since Safari does not support Web Speech API, the test matrix remains the same
- Xbox was tested on Insider build (1806) with Kinect sensor connected
  - The latest Insider build supports neither WebRTC nor Web Speech API, so we suspect the production build also supports neither
Quick grab:
- Web Speech API
  - Works on most popular platforms, except iOS. Some require a non-default browser.
    - iOS: None of the popular browsers support Web Speech API
    - Windows: requires Chrome
- Cognitive Services Speech-to-Text
  - Works on default browsers on all popular platforms
    - iOS: Chrome and Edge do not support Cognitive Services (WebRTC)
Platform | OS | Browser | Cognitive Services (WebRTC) | Web Speech API |
---|---|---|---|---|
PC | Windows 10 (1803) | Chrome 67.0.3396.99 | Yes | Yes |
PC | Windows 10 (1803) | Edge 42.17134.1.0 | Yes | No, SpeechRecognition not implemented |
PC | Windows 10 (1803) | Firefox 61.0 | Yes | No, SpeechRecognition not implemented |
MacBook Pro | macOS High Sierra 10.13.1 | Chrome 67.0.3396.99 | Yes | Yes |
MacBook Pro | macOS High Sierra 10.13.1 | Safari 11.0.1 | Yes | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPhone X | iOS 11.4 | Safari | Yes | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Chrome 67.0.3396.87 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Edge 42.2.2.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Apple iPod (6th gen) | iOS 11.4 | Safari | No, AudioSourceError | No, SpeechRecognition not implemented |
Google Pixel 2 | Android 8.1.0 | Chrome 67.0.3396.87 | Yes | Yes |
Google Pixel 2 | Android 8.1.0 | Edge 42.0.0.2057 | Yes | Yes |
Google Pixel 2 | Android 8.1.0 | Firefox 60.1.0 | Yes | Yes |
Microsoft Lumia 950 | Windows 10 (1709) | Edge 40.15254.489.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
Microsoft Xbox One | Windows 10 (1806) 17134.4054 | Edge 42.17134.4054.0 | No, AudioSourceError | No, SpeechRecognition not implemented |
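The two failure modes in the matrix can be probed at runtime before choosing a back end. This is a minimal feature-detection sketch, assuming Web Speech API is exposed as `SpeechRecognition` or the prefixed `webkitSpeechRecognition`, and that microphone capture for Cognitive Services goes through `navigator.mediaDevices.getUserMedia`. `detectSpeechSupport` is a hypothetical helper name, and a passing check does not guarantee that recognition works on a given device.

```javascript
// Hypothetical helper (not part of any library): reports which speech
// back ends a browser-like global scope appears to expose.
function detectSpeechSupport(scope) {
  const mediaDevices = scope.navigator && scope.navigator.mediaDevices;

  return {
    // Web Speech API: unprefixed or WebKit-prefixed constructor.
    webSpeech:
      typeof (scope.SpeechRecognition || scope.webkitSpeechRecognition) === 'function',
    // Cognitive Services needs microphone capture through WebRTC.
    webRTC: !!mediaDevices && typeof mediaDevices.getUserMedia === 'function'
  };
}

// In a browser: detectSpeechSupport(window)
```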
Interactive mode means `continuous` is set to `false`. In the Cognitive Services Speech SDK, this translates to `recognizeOnceAsync`.

Continuous mode means `continuous` is set to `true`, which is `startContinuousRecognitionAsync` in the Cognitive Services Speech SDK.
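The mapping between the two modes can be sketched as a small adapter. `startRecognition` is a hypothetical name, and `recognizer` is assumed to be a Speech SDK `SpeechRecognizer` or any object exposing the same two methods with success and error callbacks.

```javascript
// Hypothetical adapter: picks the SDK call from a Web Speech API-style
// `continuous` flag.
function startRecognition(recognizer, { continuous = false } = {}, onSuccess, onError) {
  if (continuous) {
    // Continuous mode: runs until stopContinuousRecognitionAsync() is called.
    recognizer.startContinuousRecognitionAsync(onSuccess, onError);
  } else {
    // Interactive mode: a single utterance, result delivered to onSuccess.
    recognizer.recognizeOnceAsync(onSuccess, onError);
  }
}
```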
- Interactive mode (with interim results)
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `soundstart`
    - `speechstart`
    - One or more `result` events, if `interimResults` is set to `true`
    - `speechend`
    - `soundend`
    - `audioend`
    - `result`
      - `results === [{ isFinal: true }]`
    - `end`
  - Cognitive Services Speech Services
    - Call `recognizeOnceAsync()`
    - Receive zero or more `recognizing` events
      - With notable text in `result.text`
      - `result.json` is similar to `{"Text":"text","Offset":200000,"Duration":32400000}`
    - Receive a final `recognized` event
      - `result.json` is similar to `{"RecognitionStatus":"Success","Offset":1800000,"Duration":48100000,"NBest":[{"Confidence":0.2331869,"Lexical":"no","ITN":"no","MaskedITN":"no","Display":"No."}]}`
    - `onSuccess(result)` callback from `recognizeOnceAsync()`
      - `result` is similar to or the same as the `event.result` object received from `recognized(event)`
- Continuous mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `soundstart`
    - `speechstart`
    - One or more `results`, if `interimResults` is set to `true`
      - `results === [{ isFinal: true }, { isFinal: true }]`
      - All with `isFinal === true`
    - (When `stop()` is called) `speechend`
    - `soundend`
    - `audioend`
    - `end`
  - Cognitive Services Speech Services
    - TBD
    - Call `startContinuousRecognitionAsync()`
    - Receive `start` event
    - Receive multiple `recognizing` events
      - ❗ When speaking slowly with significant delay between sentences, the SDK is only able to recognize the first sentence
    - Call `stopContinuousRecognitionAsync()`
    - Observed microphone stop recording
    - Receive `stop` event
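To line up the two flows above, the `result.json` of a final `recognized` event can be reshaped into a Web Speech API-like `results` list. This is a sketch under stated assumptions: `toFinalResults` is a hypothetical helper, and using `Display` as the transcript is a choice made here for illustration (`Lexical` or `ITN` would be equally plausible).

```javascript
// Hypothetical converter from the `result.json` of a `recognized` event
// to a Web Speech API-like `results` list.
function toFinalResults(json) {
  const { NBest = [], RecognitionStatus } =
    typeof json === 'string' ? JSON.parse(json) : json;

  // Anything other than a successful recognition yields no results.
  if (RecognitionStatus !== 'Success') {
    return [];
  }

  // Assumption: `Display` is the text to surface as the transcript.
  const alternatives = NBest.map(({ Confidence, Display }) => ({
    confidence: Confidence,
    transcript: Display
  }));

  // A SpeechRecognitionResult is array-like and carries an `isFinal` flag.
  return [Object.assign(alternatives, { isFinal: true })];
}
```

With the sample JSON shown above, the single result is final and its first alternative has the transcript `"No."`.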
`stop()` is a supported feature in Web Speech API for push-to-talk operation.

❗ Cognitive Services does not support push-to-talk natively, so we are trying to mimic the behavior by hiding the output after `stop()` is called.
- We are taking the latest interim results as the final results
  - Lexical ("one two three") does not get converted into ITN ("123") for interim results
  - Cognitive Services does not return confidence for interims, so we will assume it is `0.5`
- Microphone will not stop recording immediately
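Promoting the latest interim to a final result can be sketched as follows. `interimToFinalResults` and `INTERIM_CONFIDENCE` are hypothetical names; the input is assumed to be the `result.json` of the most recent `recognizing` event, whose `Text` field holds the lexical form.

```javascript
// Assumed confidence for interims, since the service does not report one.
const INTERIM_CONFIDENCE = 0.5;

// Hypothetical helper: wraps the latest interim text as a final,
// Web Speech API-like `results` list.
function interimToFinalResults(json) {
  const { Text = '' } = typeof json === 'string' ? JSON.parse(json) : json;

  // The text stays lexical ("one two three"); no ITN conversion happens here.
  const alternatives = [{ confidence: INTERIM_CONFIDENCE, transcript: Text }];

  return [Object.assign(alternatives, { isFinal: true })];
}
```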
- Interactive mode (with interim results)
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - Optional, `soundstart`
    - Optional, `speechstart`
    - Optional, `speechend`
    - Optional, `soundend`
    - `audioend`
    - `end`
  - Cognitive Services
    - `recognizeOnceAsync` does not support stop or cancellation, thus, we need to mimic the behavior by ignoring some `recognizing` and the final `recognized` event
    - Call `recognizeOnceAsync()`
    - (`stop()` is called)
    - Receive a final `recognized` event
    - `onSuccess(result)` callback from `recognizeOnceAsync()`
- Continuous mode
  - W3C Web Speech API
    - TBD
  - Cognitive Services
    - TBD
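One way to mimic `stop()` on top of `recognizeOnceAsync()` is a small gate that drops late events. This is only a sketch: `createStopGate` and the `emit` callback are hypothetical, the handlers receive plain text rather than SDK event objects, and promoting the latest interim to final on `stop()` is an assumption about the intended behavior.

```javascript
// Hypothetical gate between SDK events and Web Speech API-like events.
// `emit(type, text)` is a caller-supplied callback.
function createStopGate(emit) {
  let stopped = false;
  let lastInterim = null;

  return {
    // Wire to the SDK `recognizing` event.
    recognizing(text) {
      if (stopped) return; // ignore interims that arrive after stop()
      lastInterim = text;
      emit('interim', text);
    },
    // Wire to the SDK `recognized` event.
    recognized(text) {
      if (stopped) return; // ignore the late final result
      stopped = true;
      emit('final', text);
      emit('end');
    },
    // Called by the application for push-to-talk release.
    stop() {
      if (stopped) return;
      stopped = true;
      // Promote the latest interim to final, if there was one.
      if (lastInterim !== null) emit('final', lastInterim);
      emit('end');
    }
  };
}
```

The microphone keeps recording until the SDK finishes on its own; the gate only hides what arrives afterwards.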
- Interactive mode (with interim results)
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `soundstart`
    - `speechstart`
    - One or more `result` events, if `interimResults` is set to `true`
    - `speechend`
    - `soundend`
    - `audioend`
    - ❓ One or more `result` with `results === [{ isFinal: false }]`
    - `result`
      - `results === [{ isFinal: true }]`
    - `end`
  - Cognitive Services
    - `recognizeOnceAsync` does not support stop or cancellation, thus, we need to mimic the behavior by ignoring some `recognizing` and the final `recognized` event
    - Call `recognizeOnceAsync()`
    - Receive zero or more `recognizing` events
      - With notable text in `result.text`
      - `result.json` is similar to `{"Text":"text","Offset":200000,"Duration":32400000}`
    - Receive a final `recognized` event
      - `result.json` is similar to `{"RecognitionStatus":"Success","Offset":1800000,"Duration":48100000,"NBest":[{"Confidence":0.2331869,"Lexical":"no","ITN":"no","MaskedITN":"no","Display":"No."}]}`
    - `onSuccess(result)` callback from `recognizeOnceAsync()`
- Continuous mode
  - W3C Web Speech API
    - TBD
  - Cognitive Services
    - TBD
- Interactive mode (with interim results)
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `audioend`
    - `error`
      - `error === 'aborted'`
    - `end`
  - Cognitive Services
    - There is no `abort()` equivalent for `recognizeOnceAsync()`, thus, microphone will not stop recording immediately
- Continuous mode
  - W3C Web Speech API
    - TBD
  - Cognitive Services
    - TBD
- Interactive mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `soundstart`
    - `speechstart`
    - One or more `result` events, if `interimResults` is set to `true`
    - `speechend`
    - `soundend`
    - `audioend`
    - `error`
      - `error === 'aborted'`
    - `end`
  - Cognitive Services
    - TBD
- Continuous mode
  - W3C Web Speech API
    - TBD
  - Cognitive Services
    - TBD
Turn on airplane mode.
- Interactive mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `audioend`
    - `error`
      - `error === 'network'`
    - `end`
  - Cognitive Services Speech Services
    - Received `canceled` event
      - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
    - `error` callback is received
      - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
    - (Microphone was not turned on, or was on too briefly to tell whether it had turned on)
- Continuous mode
  - W3C Web Speech API
    - TBD
  - Cognitive Services Speech Services
    - TBD
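The `canceled` details above can be translated into the Web Speech API `network` error code. This is a sketch, not SDK behavior: `errorCodeFromCanceled` is a hypothetical helper, and treating status code 1006 (the WebSocket "abnormal closure" code) as a network failure is an assumption based on the observations in this section.

```javascript
// Hypothetical mapping from Speech SDK `canceled` error details to a
// Web Speech API error code. Returns null when no mapping is known.
function errorCodeFromCanceled(errorDetails) {
  // Assumption: status code 1006 means the service could not be reached.
  return /StatusCode: 1006\b/.test(errorDetails || '') ? 'network' : null;
}
```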
Since browser speech does not require a subscription key, we assume this flow is the same as airplane mode.
- Interactive mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `audioend`
    - `error`
      - `error === 'network'`
    - `end`
  - Cognitive Services Speech Services
    - Console (on Chrome) logged `WebSocket connection to 'wss://westus.stt.speech.microsoft.com/speech/recognition/interactive/cognitiveservices/v1?language=en-US&format=detailed&Ocp-Apim-Subscription-Key=...&X-ConnectionId=...' failed: HTTP Authentication failed; no valid credentials available`.
    - Received `canceled` event
      - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
      - `reason === 0`
    - `error` callback is received
      - `errorDetails === 'Unable to contact server. StatusCode: 1006, Reason: '`
    - (Microphone was not turned on, or was on too briefly to tell whether it had turned on)
- Continuous mode
  - W3C Web Speech API
    - TBD
  - Cognitive Services Speech Services
    - TBD
The microphone is muted and the record level is at zero. This should be distinguishable by the absence of the `soundstart` event on Web Speech API.
- Interactive mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `audioend`
    - `error`
      - `error === 'no-speech'`
    - `end`
  - Cognitive Services Speech Services
    - After 5 seconds of silence, `recognized`
      - `result.json.RecognitionStatus === 'InitialSilenceTimeout'`
      - `result.offset === 50000000`
    - Microphone is off after this event
- Continuous mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `audioend`
    - `error`
      - `error === 'no-speech'`
      - Even in continuous mode, the browser will time out with `no-speech` after 5 seconds
    - `end`
  - Cognitive Services Speech Services
    - `start`
    - After 15 seconds of silence, `recognized`
      - `json.RecognitionStatus === 'InitialSilenceTimeout'`
      - `offset === 150000000`
    - (When `stop()` is called) `stop`
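The offsets above are in 100-nanosecond units, so `50000000` corresponds to 5 seconds and `150000000` to 15 seconds. A short sketch of that conversion, plus a mapping from the silence timeout to the Web Speech API `no-speech` error code; both function names are hypothetical, and the mapping itself is an assumption drawn from the paired observations in this section.

```javascript
// Service offsets and durations are expressed in 100-nanosecond ticks.
const TICKS_PER_SECOND = 10000000;

function ticksToSeconds(ticks) {
  return ticks / TICKS_PER_SECOND;
}

// Hypothetical mapping from the `result.json` of a `recognized` event to
// a Web Speech API error code. Returns null when there is no error to report.
function errorCodeFromRecognized(json) {
  const { RecognitionStatus } = typeof json === 'string' ? JSON.parse(json) : json;

  // Assumption: an initial silence timeout is reported as `no-speech`.
  return RecognitionStatus === 'InitialSilenceTimeout' ? 'no-speech' : null;
}
```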
Some sounds are heard, but they cannot be recognized as text. There could be some interim results with recognized text, but the confidence is so low that they are dropped from the final result.
- Interactive mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `soundstart`
    - `speechstart`
    - `speechend`
    - `soundend`
    - `audioend`
    - `end`
  - Cognitive Services Speech Services
    - TBD
    - After 5 seconds of unrecognizable sound, `recognized`
      - `json.RecognitionStatus === 'InitialSilenceTimeout'`
      - `offset === 50000000`
    - Microphone is off after this event
- Continuous mode
  - W3C Web Speech API
    - `start`
    - `audiostart`
    - `soundstart`
    - `speechstart`
    - (When `stop()` is called) `speechend`
    - `soundend`
    - `audioend`
    - `end`
  - Cognitive Services Speech Services
    - `start`
    - After 15 seconds of unrecognizable sound, `recognized`
      - `json.RecognitionStatus === 'InitialSilenceTimeout'`
      - `offset === 150000000`
    - (When `stop()` is called) `stop`
- Interactive mode
  - W3C Web Speech API
    - (No `start` event was received) `error`
      - `error === 'not-allowed'`
    - `end`
  - Cognitive Services Speech Services
    - `recognizeOnceAsync(success, error)` returned with `error` callback
      - `"Runtime error: 'Error handler for error Error occurred during microphone initialization: NotAllowedError: Permission denied threw error Error: Error occurred during microphone initialization: NotAllowedError: Permission denied'"`
- Continuous mode
  - W3C Web Speech API
    - `error`
      - `error === 'not-allowed'`
    - `end`
  - Cognitive Services Speech Services
    - TBD
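The permission failure can be mapped to the Web Speech API `not-allowed` error code by inspecting the message passed to the `error` callback. This is a rough sketch: `errorCodeFromMicrophoneFailure` is a hypothetical helper, and matching on the `NotAllowedError` substring is an assumption based on the single message observed above.

```javascript
// Hypothetical mapping from the message given to the `error` callback of
// recognizeOnceAsync() to a Web Speech API error code.
function errorCodeFromMicrophoneFailure(message) {
  // Assumption: the browser's getUserMedia error name appears in the message.
  return /NotAllowedError/.test(String(message)) ? 'not-allowed' : null;
}
```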