Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32 S3 BOX 3 - voice assistant - wake word detected and hangs forever #166

Open
gusarov opened this issue Feb 22, 2024 · 6 comments
Open

Comments

@gusarov
Copy link

gusarov commented Feb 22, 2024

Based on a logs it is clear that it detects "ok nabu" and kind of trying to start some next steps, but the image on the screen is not changing, it is still black background, nothing happens, and it stuck right there, never have any other movement. If I reset, it can repeat same logs and I do see reaction in a logs after "ok nabu", but that's it.

Could you give some hints, where is the next problem? HA is configured with whisper+piper+openWakeWord, ESPHome can compile and install OTA, what should I do next?

[00:39:40][W][micro_wake_word:150]: Wake word is already running
[00:39:47][D][micro_wake_word:121]: Wake Word Detected
[00:39:47][D][micro_wake_word:170]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[00:39:47][D][micro_wake_word:127]: Stopping Microphone
[00:39:47][D][esp_adf.microphone:234]: Stopping microphone
[00:39:47][D][micro_wake_word:170]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[00:39:47][D][esp-idf:000]: W (24293) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT

[00:39:47][D][esp-idf:000]: E (24296) AUDIO_ELEMENT: [filter] Element already stopped

[00:39:47][D][esp-idf:000]: W (24324) AUDIO_PIPELINE: There are no listener registered

[00:39:47][D][esp-idf:000]: I (24327) AUDIO_PIPELINE: audio_pipeline_unlinked

[00:39:47][D][esp-idf:000]: W (24332) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE

[00:39:47][D][esp-idf:000]: I (24337) I2S: DMA queue destroyed

[00:39:47][D][esp-idf:000]: W (24343) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE

[00:39:47][D][esp-idf:000]: W (24346) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE

[00:39:47][D][esp_adf.microphone:285]: Microphone stopped
[00:39:47][D][micro_wake_word:170]: State changed from STOPPING_MICROPHONE to IDLE
[00:39:47][D][voice_assistant:414]: State changed from IDLE to START_PIPELINE
[00:39:47][D][voice_assistant:420]: Desired state set to START_MICROPHONE
[00:39:47][D][voice_assistant:118]: microphone not running
[00:39:47][D][voice_assistant:202]: Requesting start...
[00:39:47][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[00:39:47][D][voice_assistant:435]: Client started, streaming microphone
[00:39:47][D][voice_assistant:414]: State changed from STARTING_PIPELINE to START_MICROPHONE
[00:39:47][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[00:39:47][D][voice_assistant:155]: Starting Microphone
[00:39:47][D][voice_assistant:414]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[00:39:47][D][esp-idf:000]: I (24409) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[00:39:47][D][esp-idf:000]: I (24417) I2S: I2S0, MCLK output by GPIO2

[00:39:47][D][esp-idf:000]: I (24426) AUDIO_PIPELINE: link el->rb, el:0x3d05a7f8, tag:i2s, rb:0x3d05ac0c

[00:39:47][D][esp-idf:000]: I (24430) AUDIO_PIPELINE: link el->rb, el:0x3d05a96c, tag:filter, rb:0x3d05cc4c

[00:39:47][D][esp-idf:000]: I (24441) AUDIO_ELEMENT: [i2s-0x3d05a7f8] Element task created

@johnrosshunt
Copy link

I'm having the same issue with an S3-3 box and an S3-lite box. I think this bug and also this bug are related. In my case the command executes once and then the S3 box locks up. There isn't any audio response -- just a noticeable "pop", and then nothing.

esp32-s3-box-3_logs.txt
esp32-s3-box-lite_logs.txt

@gusarov
Copy link
Author

gusarov commented Feb 28, 2024

I'm having the same issue with an S3-3 box and an S3-lite box. I think this bug and also this bug are related. In my case the command executes once and then the S3 box locks up. There isn't any audio response -- just a noticeable "pop", and then nothing.

esp32-s3-box-3_logs.txt esp32-s3-box-lite_logs.txt

btw, yes, I also can hear "pop". But command is not executed (likely because something else is missing down the way).
And this "pop" is only once, then it stuck forever until I press reboot

@kyutov
Copy link

kyutov commented Mar 1, 2024

I'm observing exactly the same comportment with ESP32-S3-BOX-3(B)

ESPHome version: 2024.2.1
Home Assistant version: 2024.2.5

Here are my logs:

INFO ESPHome 2024.2.1
INFO Reading configuration config/esp32-s3-box-3-5a7504.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
WARNING GPIO0 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from /dev/cu.usbmodem2101 with baud rate 115200
[09:53:23][D][api:102]: Accepted xxx.xxx.xxx.xxx
[09:53:23][D][api.connection:1121]: Home Assistant 2024.2.5 (xxx.xxx.xxx.xxx): Connected successfully
[09:53:23][W][component:214]: Component api took a long time for an operation (0.21 s).
[09:53:23][W][component:215]: Components should block for at most 20-30ms.
[09:53:23][D][micro_wake_word:170]: State changed from IDLE to START_MICROPHONE
[09:53:23][W][component:214]: Component api took a long time for an operation (0.21 s).
[09:53:23][W][component:215]: Components should block for at most 20-30ms.
[09:53:23][D][micro_wake_word:108]: Starting Microphone
[09:53:23][D][micro_wake_word:170]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[09:53:24][D][esp-idf:000]: I (36899) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[09:53:24][D][esp-idf:000]: I (36905) I2S: I2S0, MCLK output by GPIO2
[09:53:24][D][esp-idf:000]: I (36912) AUDIO_PIPELINE: link el->rb, el:0x3d05a7f8, tag:i2s, rb:0x3d05ac0c
[09:53:24][D][esp-idf:000]: I (36915) AUDIO_PIPELINE: link el->rb, el:0x3d05a96c, tag:filter, rb:0x3d05cc4c
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36920) AUDIO_ELEMENT: [i2s-0x3d05a7f8] Element task created
[09:53:24][D][esp-idf:000]: I (36922) AUDIO_THREAD: The filter task allocate stack on external memory
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36925) AUDIO_ELEMENT: [filter-0x3d05a96c] Element task created
[09:53:24][D][esp-idf:000]: I (36927) AUDIO_ELEMENT: [raw-0x3d05aa9c] Element task created
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36930) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16487695 Bytes, Inter:98700 Bytes, Dram:98700 Bytes
[09:53:24]
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36932) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[09:53:24][D][esp-idf:000]: I (36934) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[09:53:24]
[09:53:24][D][esp-idf:000]: I (36937) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1[D][esp-idf:000]: I (36940) AUDIO_PIPELINE: Pipeline started
[09:53:24]
[09:53:24][D][esp_adf.microphone:273]: Microphone started
[09:53:24][D][micro_wake_word:170]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[09:53:24][W][micro_wake_word:150]: Wake word is already running
[09:53:45][D][micro_wake_word:121]: Wake Word Detected
[09:53:45][D][micro_wake_word:170]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[09:53:45][D][micro_wake_word:127]: Stopping Microphone
[09:53:45][D][esp_adf.microphone:234]: Stopping microphone
[09:53:45][D][micro_wake_word:170]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[09:53:45][D][esp-idf:000]: W (58054) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT
[09:53:45]
[09:53:45][D][esp-idf:000]: E (58056) AUDIO_ELEMENT: [filter] Element already stopped
[09:53:45][D][esp-idf:000]: W (58088) AUDIO_PIPELINE: There are no listener registered
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58090) AUDIO_PIPELINE: audio_pipeline_unlinked
[09:53:45][D][esp-idf:000]: W (58090) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58092) I2S: DMA queue destroyed
[09:53:45]
[09:53:45][D][esp-idf:000]: W (58094) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[09:53:45]
[09:53:45][D][esp-idf:000]: W (58096) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[09:53:45]
[09:53:45][D][esp_adf.microphone:285]: Microphone stopped
[09:53:45][D][micro_wake_word:170]: State changed from STOPPING_MICROPHONE to IDLE
[09:53:45][D][voice_assistant:414]: State changed from IDLE to START_PIPELINE
[09:53:45][D][voice_assistant:420]: Desired state set to START_MICROPHONE
[09:53:45][D][voice_assistant:118]: microphone not running
[09:53:45][D][voice_assistant:202]: Requesting start...
[09:53:45][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[09:53:45][D][voice_assistant:118]: microphone not running
[09:53:45][D][voice_assistant:435]: Client started, streaming microphone
[09:53:45][D][voice_assistant:414]: State changed from STARTING_PIPELINE to START_MICROPHONE
[09:53:45][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[09:53:45][D][voice_assistant:155]: Starting Microphone
[09:53:45][D][voice_assistant:414]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[09:53:45][D][esp-idf:000]: I (58138) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58142) I2S: I2S0, MCLK output by GPIO2
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58146) AUDIO_PIPELINE: link el->rb, el:0x3d05a7f8, tag:i2s, rb:0x3d05ac0c
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58149) AUDIO_PIPELINE: link el->rb, el:0x3d05a96c, tag:filter, rb:0x3d05cc4c
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58152) AUDIO_ELEMENT: [i2s-0x3d05a7f8] Element task created
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58154) AUDIO_THREAD: The filter task allocate stack on external memory
[09:53:45][D][esp-idf:000]: I (58157) AUDIO_ELEMENT: [filter-0x3d05a96c] Element task created
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58159) AUDIO_ELEMENT: [raw-0x3d05aa9c] Element task created
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58161) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16489247 Bytes, Inter:100252 Bytes, Dram:100252 Bytes
[09:53:45]
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58163) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58165) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58169) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[09:53:45]
[09:53:45][D][esp-idf:000]: I (58173) AUDIO_PIPELINE: Pipeline started
[09:53:45]
[09:53:45][D][esp_adf.microphone:273]: Microphone started
[09:53:45][D][voice_assistant:414]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[09:57:49][I][ota:117]: Boot seems successful, resetting boot loop counter.
[09:57:49][D][esp32.preferences:114]: Saving 1 preferences to flash...
[09:57:49][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

@kyutov
Copy link

kyutov commented Mar 1, 2024

I have found the reason which was causing this comportment in my setup. When I have configured the Assist pipeline, I haven't added any text-to-speech engine (thinking that in that way I will just not get any voice feedback from the box). I have added a text-to-speech engine and now the wake word seems to be working as expected.

@johnrosshunt
Copy link

I resolved the hang problem with my setup. I was using a self-signed SSL certificate, which apparently doesn't work. I went the Duck DNS/NGINX SSL proxy route (instead of HA Cloud) and now everything works as advertised. However, I still get an annoying "pop" every time the speaker plays audio.

I'm a bit disappointed that an SSL certificate from a valid CA authority is required just for a speaker. I had to enable port forwarding on my router and expose port 443 to the Internet. My goal was allow only local access, but I guess that's not possible if you want to use a voice assistant.

The Willow Project uses tokens to authenticate with the HA server and can run 100% local. I wonder if something like that would be possible to implement as alternative to SSL certificates?

@jaymunro
Copy link

jaymunro commented Mar 6, 2024

I believe I have resolved this in #173 this morning.

Try saying the wake word a second time it should continue as normal, but this will only be the case if you have Open Wake Word fully set up and configured in the pipeline in HA, otherwise it will hang forever. The issue I found was that VA is started with wake_word: true so it doesn't start in listening mode, but in wake word streaming mode.

To resolve, try my forked branch in #173.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants