Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting speaker states explicitly #6718

Draft
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

gnumpi
Copy link
Contributor

@gnumpi gnumpi commented May 10, 2024

What does this implement/fix?

Solves timing issues introduced with sending TTS responses via API to speaker.

  • set the speaker state to running only after receiving the started event.
  • doesn't stop speaker task on no data timeout, creates warning instead
  • finish() or stop() needs to be called explicitly to stop the speaker task.
  • finish() is introduced, it stops the task as soon as the audio queue is empty
  • VA writes only data to speaker if it is in running state
  • VA: set maximum chunk size of writing to speaker to 4 * 1024 (prevents blocking the loop for too long)

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Other

Related issue or feature (if applicable): fixes

Pull request in esphome-docs with documentation (if applicable): esphome/esphome-docs#

Test Environment

  • ESP32
  • ESP32 IDF
  • ESP8266
  • RP2040
  • BK72xx
  • RTL87xx

Example entry for config.yaml:

# Example config.yaml

Checklist:

  • The code change is tested and works locally.
  • Tests have been added to verify that the new code works (under tests/ folder).

If user exposed functionality or configuration variables are added/changed:

@codecov-commenter
Copy link

codecov-commenter commented May 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.25%. Comparing base (4d8b5ed) to head (a37e64f).
Report is 586 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #6718      +/-   ##
==========================================
+ Coverage   53.70%   54.25%   +0.54%     
==========================================
  Files          50       50              
  Lines        9408     9587     +179     
  Branches     1654     1689      +35     
==========================================
+ Hits         5053     5201     +148     
- Misses       4056     4062       +6     
- Partials      299      324      +25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gnumpi gnumpi force-pushed the speaker_status_adjustments branch from 6527312 to 1f5cfd9 Compare May 10, 2024 17:32
@gnumpi gnumpi marked this pull request as ready for review May 10, 2024 20:40
@gnumpi gnumpi requested a review from jesserockz as a code owner May 10, 2024 20:40
@probot-esphome
Copy link

Hey there @jesserockz, mind taking a look at this pull request as it has been labeled with an integration (i2s_audio) you are listed as a code owner for? Thanks!
Hey there @jesserockz, mind taking a look at this pull request as it has been labeled with an integration (speaker) you are listed as a code owner for? Thanks!
Hey there @jesserockz, mind taking a look at this pull request as it has been labeled with an integration (voice_assistant) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)

@nielsnl68
Copy link
Contributor

I have been testing it, sinds last night (MLT) and it is much much stabler then before.

esphome/components/i2s_audio/speaker/i2s_audio_speaker.cpp Outdated Show resolved Hide resolved
@@ -5,6 +5,7 @@ namespace speaker {

enum State : uint8_t {
STATE_STOPPED = 0,
STATE_WAITING_FOR_LOCK,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did not add this here in the first place is that not all speakers will have a lock to wait for. The lock is specific to the i2s speaker platform only and really the i2s speaker should just stay in STARTING state until it is actually ready then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that the original solution did not finished correctly and that i found that the VA speaker buffer did got overflowed after some commands every time i tried. the audio was starting to get stuttered and breaking off.

The solution that @gnumpi suggested here, fixed this totally. The audio is now very stable and i do not see any buffer overflows anymore.

I understand what you are saying. An other solution would be that HA tells when there are no audio packages to receive anymore and that after the buffer is empty the speaker can be closed.
This was an suggestion i already wanted to make before @gnumpi created this solution for me.
See: https://discord.com/channels/429907082951524364/1229209995158225117

Copy link
Contributor Author

@gnumpi gnumpi May 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, I introduced this new state in order to not call try_lock when the lock is already owned while waiting for the task to setup the i2s driver. Sure, I could add an if clause to check if it is already owned, do you prefer this? On the other hand, platforms which don't need to wait for a resource could just skip this state and it could be called WAITING_FOR_RESOURCE, or something like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have no problems with the solution itself in terms of functionality. Just that the Lock is specific to the i2s platforms.

The i2s media player has this for internal tracking:

enum I2SState : uint8_t {
  I2S_STATE_STOPPED = 0,
  I2S_STATE_STARTING,
  I2S_STATE_RUNNING,
  I2S_STATE_STOPPING,
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my suggestions are either:

  • rename state to WAITING_FOR_RESOURCE (more general, can be skipped by platforms which don't need it)
  • add a state between STARTING and RUNNING e.g. PREPARING
  • add boolean to keep track if lock is already owned (while waiting for the i2s driver to be loaded inside the task)

which do you prefer? or do I miss a better solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer option 3 (or similar) because it keeps the starting implementation private to the i2s speaker. To the outside, its just starting until it is running.

esphome/components/speaker/speaker.h Outdated Show resolved Hide resolved
@esphome esphome bot marked this pull request as draft May 12, 2024 22:29
@esphome
Copy link

esphome bot commented May 12, 2024

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@iperformance
Copy link

iperformance commented May 12, 2024 via email

@gnumpi gnumpi marked this pull request as ready for review May 13, 2024 11:01
@esphome esphome bot requested a review from jesserockz May 13, 2024 11:01
esphome/components/i2s_audio/speaker/i2s_audio_speaker.cpp Outdated Show resolved Hide resolved
Comment on lines -204 to +231
this->start();
ESP_LOGD(TAG, "Called play while speaker not running.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RTTTL component using the speaker does expect that the speaker start when play() is called.
This breaks that component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just adjusted the RTTTL component to also call start and finish explicitly. Could you give it a try @nielsnl68 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speaker/automation.h (yaml action speaker.play) will also need to be updated to call start before play

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of having two methods for the speaker? The first 'play' is non-blocking and state less, and hence, it can be used as an action. It copies the data to the speaker buffer, starts the task and then stops the task when all data has been played. Consequently, this method can only play audio that fits into the speaker buffer at once. And the second method e.g. 'stream' can be called the same way as the current implementation, when the speaker (task) is running.

@esphome esphome bot marked this pull request as draft May 13, 2024 12:39
Co-authored-by: NP v/d Spek <github_mail@lumensoft.nl>
Comment on lines +129 to +131
if (!this->speaker_->is_running()) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!this->speaker_->is_running()) {
return;
}

This should not be here. The below code will add more wave sounds to the speaker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you suggest to put it? Doesn't this prevent to start playing until the speaker is ready? And at the end of the audio the speaker running state should be unset only after the last samples have been send. Can you point me to the lines where you see the problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you want to be sure the RTTTL starts only when the speaker is has stopped.
Then we rearrange the code and a Starting and started mode like it is done in VA etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants