Delayed Microphone Audio Capture #7681

majweldon · 2024-03-12T20:00:04Z

Describe the bug

Version 3.48.0 (desired behaviour)
- As soon as I press stop recording on a microphone input, I can press submit (for transcription) with no errors. That is, the file seems usable from the moment stop is pressed.

Version 4.21.0
- Once I stop a recording, I have to wait for the audio to 'capture' before I can press submit. This delay is about 1 second for every 10 seconds of recording, so it can be substantial for 5+ minutes of audio (roughly 30 seconds for a 5-minute recording). I don't mind if there is additional latency, but ideally I could press the submit button as soon as I am done recording and come back once everything is done.

Thanks for building and supporting Gradio - it has changed my professional life for the better in a big way.

Mike :)

Have you searched existing issues? 🔎

I have searched and found no existing issues

Reproduction

[Weldon_Full_Visit_Format.txt](https://github.com/gradio-app/gradio/files/14577976/Weldon_Full_Visit_Format.txt)
```python
import os
import time

import openai
import gradio as gr
import soundfile as sf
from pydub import AudioSegment
from openai import OpenAI

# Load API key from an environment variable
OPENAI_SECRET_KEY = os.environ.get("OPENAI_SECRET_KEY")
client = OpenAI(api_key=OPENAI_SECRET_KEY)

note_transcript = ""


def transcribe(audio, history_type):
    global note_transcript
    print(f"Received audio file path: {audio}")

    history_type_map = {
        "History": "Weldon_History_Format.txt",
        "Physical": "Weldon_PE_Note_Format.txt",
        "H+P": "Weldon_History_Physical_Format.txt",
        "Impression/Plan": "Weldon_Impression_Note_Format.txt",
        "Handover": "Weldon_Handover_Note_Format.txt",
        "Meds Only": "Medications.txt",
        "EMS": "EMS_Handover_Note_Format.txt",
        "Triage": "Triage_Note_Format.txt",
        "Full Visit": "Weldon_Full_Visit_Format.txt",
        "Psych": "Weldon_Psych_Format.txt",
        "SBAR": "SBAR.txt",
    }
    file_name = history_type_map.get(history_type, "Weldon_Full_Visit_Format.txt")
    with open(f"Format_Library/{file_name}", "r") as f:
        role = f.read()
    messages = [{"role": "system", "content": role}]

    ######################## Read audio file, wait as necessary if not written
    max_attempts = 1
    attempt = 0
    audio_data = None
    samplerate = None
    while attempt < max_attempts:
        try:
            if audio is None:
                raise TypeError("Invalid file: None")
            audio_data, samplerate = sf.read(audio)
            break
        except (OSError, TypeError) as e:
            print(f"Attempt {attempt + 1} of {max_attempts} failed with error: {e}")
            attempt += 1
            time.sleep(3)
    else:
        print(f"###############Failed to open audio file after {max_attempts} attempts.##############")
        # Returning None (one value for three outputs) is what triggers the
        # "didn't receive enough output values" ValueError in the logs below
        return

    ########## Cast as float 32, normalize
    #audio_data = audio_data.astype("float32")
    #audio_data = (audio_data * 32767).astype("int16")
    #audio_data = audio_data.mean(axis=1)

    ################### Convert .wav to .mp3 (if necessary)
    sf.write("Audio_Files/test.wav", audio_data, samplerate, subtype="PCM_16")
    sound = AudioSegment.from_wav("Audio_Files/test.wav")
    sound.export("Audio_Files/test.mp3", format="mp3")

    # Note: this overwrites the pydub export above with soundfile's own MP3 output
    sf.write("Audio_Files/test.mp3", audio_data, samplerate)

    ################ Send file to Whisper for transcription
    max_attempts = 3
    attempt = 0
    with open("Audio_Files/test.mp3", "rb") as audio_file:
        while attempt < max_attempts:
            try:
                audio_transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
                break
            # openai>=1.0 (which provides the OpenAI client) exposes APIConnectionError
            # at the top level; the old openai.error module no longer exists
            except openai.APIConnectionError as e:
                print(f"Attempt {attempt + 1} failed with error: {e}")
                attempt += 1
                audio_file.seek(0)  # rewind the file before re-uploading it
                time.sleep(3)  # wait for 3 seconds before retrying
        else:
            print("Failed to transcribe audio after multiple attempts")
            # Without this, audio_transcript would be undefined below (NameError)
            raise gr.Error("Failed to transcribe audio after multiple attempts")

    print(audio_transcript.text)
    messages.append({"role": "user", "content": audio_transcript.text})

    #Create Sample Dialogue Transcript from File (for debugging)
    #with open('Audio_Files/Test_Elbow.txt', 'r') as file:
    #  audio_transcript = file.read()
    #messages.append({"role": "user", "content": audio_transcript})

    ### Word and MB count
    file_size = os.path.getsize("Audio_Files/test.mp3")
    mp3_megabytes = round(file_size / (1024 * 1024), 2)

    audio_transcript_words = audio_transcript.text.split()  # Use when using mic input
    #audio_transcript_words = audio_transcript.split()  # Use when using file
    num_words = len(audio_transcript_words)

    # Ask OpenAI to create note transcript
    response = client.chat.completions.create(model="gpt-4-1106-preview", temperature=0, messages=messages)
    note_transcript = response.choices[0].message.content
    print(note_transcript)
    return [note_transcript, num_words, mp3_megabytes]


# Define Gradio interface
my_inputs = [
    gr.Audio(sources=["microphone"], type="filepath", format="mp3"),
    gr.Radio(["History", "H+P", "Impression/Plan", "Full Visit", "Handover", "Psych", "EMS", "SBAR", "Meds Only"], show_label=False),
]

ui = gr.Interface(
    fn=transcribe,
    inputs=my_inputs,
    outputs=[
        gr.Textbox(label="Your Note", show_copy_button=True),
        gr.Number(label="Audio Word Count"),
        gr.Number(label=".mp3 MB"),
    ],
)

ui.launch(share=False, debug=True)
```

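Both retry loops in the script above follow the same while/else pattern. As a hypothetical refactor (the name `retry_call` and its parameters are illustrative, not part of Gradio or the OpenAI SDK), the pattern can be written once and made to re-raise the last error instead of silently falling through. One caveat: retrying `sf.read(audio)` inside the handler cannot help when Gradio passes `audio=None`, because the argument does not change between attempts within the same call.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 3.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn() up to `attempts` times, sleeping `delay_s` between failures.

    Exceptions not listed in `retry_on` propagate immediately. If every
    attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            last_exc = exc
            print(f"Attempt {attempt} of {attempts} failed with error: {exc}")
            if attempt < attempts:
                time.sleep(delay_s)
    assert last_exc is not None  # loop ran at least once and every attempt failed
    raise last_exc
```

In the script this might read, for example, `audio_data, samplerate = retry_call(lambda: sf.read(audio), attempts=1, retry_on=(OSError, TypeError))`, with the Whisper call wrapped the same way using `retry_on=(openai.APIConnectionError,)`.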
Screenshot

No response

Logs

Attempt 1 of 1 failed with error: Invalid file: None
###############Failed to open audio file after 1 attempts.##############
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/user/.local/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1570, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1397, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1371, in validate_outputs
    raise ValueError(
ValueError: An event handler (transcribe) didn't receive enough output values (needed: 3, received: 1).
Wanted outputs:
    [textbox, number, number]
Received outputs:
    [None]
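The `ValueError` at the end of the traceback above comes from the handler's early `return`, which hands Gradio a single `None` for an interface with three outputs. A minimal sketch, assuming the same three outputs (Textbox, Number, Number), returns one value per output while the recording is still missing; `transcribe_guarded` and the placeholder message are hypothetical. Raising `gr.Error("...")` instead is another option, since it shows the message in the UI rather than filling the outputs. Either way this only avoids the crash; it does not remove the capture delay reported in this issue.

```python
from typing import List, Optional, Union

# One value per output component: [Textbox, Number, Number]
Outputs = List[Union[str, int, float, None]]

PENDING_MESSAGE = "Audio not captured yet. Wait for the waveform to refresh, then press Submit again."


def transcribe_guarded(audio: Optional[str], history_type: str) -> Outputs:
    if audio is None:
        # Gradio passed no file path yet: return a placeholder for every
        # output instead of a bare None, so output validation sees 3 values.
        return [PENDING_MESSAGE, None, None]
    # Hypothetical stand-in for the real pipeline (read file, Whisper, GPT-4)
    note, num_words, mp3_megabytes = f"Note for {audio} ({history_type})", 0, 0.0
    return [note, num_words, mp3_megabytes]
```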

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.14.0
gradio_client version: 0.8.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.110.0
ffmpy: 0.3.2
gradio-client==0.8.0 is not installed.
httpx: 0.27.0
huggingface-hub: 0.19.4
importlib-resources: 6.1.3
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.3
numpy: 1.26.2
orjson: 3.9.15
packaging: 23.2
pandas: 2.1.3
pillow: 10.2.0
pydantic: 2.6.4
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.8.0
uvicorn: 0.28.0
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.


gradio_client dependencies in your environment:

fsspec: 2023.10.0
httpx: 0.27.0
huggingface-hub: 0.19.4
packaging: 23.2
typing-extensions: 4.8.0
websockets: 11.0.3



Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Severity

I can work around it

abidlabs · 2024-03-13T19:20:37Z

Thanks @majweldon for the kind words! cc @hannahblair and @dawoodkhan82 as well, as this relates to frontend validation.

aliabid94 · 2024-04-02T22:08:49Z

taking a look

aliabid94 · 2024-04-02T22:59:01Z

Okay, so I haven't been able to reproduce the extent of lag that you describe. If I record a very long audio clip (~5 min), I do encounter two sources of lag:

1. Generating the waveform in the browser (this is new to gradio 4.x). However, on my MacBook Pro, this only takes ~2s.
2. Processing the file for saving in the backend. This can take 5-6 seconds. However, this is identical in gradio 4.x and 3.42, so I'm not sure why you wouldn't see this in 3.42.

I made a PR (#7917) that improves the performance of (2); it only works, though, if the recorded audio format is "wav". Can you try installing gradio from this PR? Do the following:

1. pip install https://gradio-builds.s3.amazonaws.com/b35e3ae839d208520180299077f4ce57bb96fca4/gradio-4.25.0-py3-none-any.whl
2. Change your audio component to gr.Audio(sources=["microphone"], type="filepath", format="wav")

See if you notice a difference in performance and lmk.
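For a sense of scale with multi-minute recordings: the audio payload of an uncompressed 16-bit PCM WAV grows linearly with duration (sample rate × channels × bytes per sample, per second). The sketch below assumes 48 kHz mono, a common browser capture rate but not confirmed for this demo, and counts only the data chunk, not the WAV header; `pcm_wav_payload_bytes` is a hypothetical helper.

```python
def pcm_wav_payload_bytes(seconds: float, sample_rate: int = 48_000,
                          channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size of the raw PCM data chunk of a WAV file (header excluded)."""
    frames = int(seconds * sample_rate)
    return frames * channels * bytes_per_sample


five_minutes = pcm_wav_payload_bytes(5 * 60)
print(f"{five_minutes:,} bytes = {five_minutes / (1024 * 1024):.1f} MiB")
# prints "28,800,000 bytes = 27.5 MiB"
```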

majweldon · 2024-04-05T16:22:39Z

Thank you so much for your time and effort @abidlabs. I have done as you said (with the URL pasted into my requirements.txt) and am using the .wav format. It builds and runs with the PR build, but I still have a significant lag (about 12 seconds per minute of recorded audio) before I can process any audio data. I have attached the error log for reference. Here, I press the submit button once before the audio captures and once afterwards. I can tell when the audio has captured because the waveform in the Gradio interface refreshes, although it is visibly the same waveform. After capture, a valid audio path is passed to my transcribe function; before capture, the path is None. Mike :)


```
===== Application Startup at 2024-04-05 15:49:34 =====
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Received audio file path: None
Attempt 1 of 1 failed with error: Invalid file: None
###############Failed to open audio file after 1 attempts.##############
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/gradio/queueing.py", line 522, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.9/site-packages/gradio/route_utils.py", line 260, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1750, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1521, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1495, in validate_outputs
    raise ValueError(
ValueError: An event handler (transcribe) didn't receive enough output values (needed: 3, received: 1).
Wanted outputs:
    [<gradio.components.textbox.Textbox object at 0x7f73f7c80220>, <gradio.components.number.Number object at 0x7f73f7c80340>, <gradio.components.number.Number object at 0x7f73f7c80490>]
Received outputs:
    [None]
```

After the lag for audio capture, I press submit again and the error is gone:

```
Received audio file path: /tmp/gradio/be60e81248568cc78a52a8bd6c9accaa3fdc6193/audio.wav
Dear fellow scholars, the medications are Tylenol, Metoprolol, and Aspirin. What a time to be alive!
**Medications:**
- Acetaminophen
- Metoprolol
- Aspirin
```

aliabid94 · 2024-04-05T20:35:02Z

Did the PR make any difference at all? If you're still seeing that much lag when the processing time should have been cut, then perhaps it's a network issue? Are you running your demo locally or on a server?

majweldon · 2024-04-08T18:01:29Z

I didn't see any difference with the PR, unfortunately. My demo is running on the Hugging Face servers, and I see similar behaviour at work, at home, and on my mobile device. Would network issues affect latency differently between the two Gradio versions? I can see the waveform and play back the audio within 5-6 seconds in both versions, similar to what you report. I just can't pass the audio to my function (transcribe) until much later with the 4.x versions; it seems to have to wait until audio.wav is written to disk and can be passed in as a filepath. Thanks again, Mike :)


majweldon added the bug Something isn't working label Mar 12, 2024

abidlabs added the Regression Bugs did not exist in previous versions of Gradio label Mar 13, 2024

abidlabs added the Priority High priority issues label Mar 29, 2024

aliabid94 mentioned this issue Apr 2, 2024

Audio upload performance improvement for filepath output #7917

Closed


majweldon commented Mar 12, 2024 (edited by aliabid94)
abidlabs commented Mar 13, 2024 (edited)
aliabid94 commented Apr 2, 2024
aliabid94 commented Apr 2, 2024
majweldon commented Apr 5, 2024 via email (edited by aliabid94)
aliabid94 commented Apr 5, 2024
majweldon commented Apr 8, 2024 via email
