
Delayed Microphone Audio Capture #7681

Open
majweldon opened this issue Mar 12, 2024 · 6 comments
Labels
bug Something isn't working Priority High priority issues Regression Bugs did not exist in previous versions of Gradio

Comments

@majweldon

majweldon commented Mar 12, 2024

Describe the bug

Ver 3.48.0 (desired behaviour)
- As soon as I press stop recording on a microphone input, I can press submit (for transcription) with no errors. That is, the file seems usable from the moment stop is pressed.

Ver 4.21.0
- Once I stop a recording, I have to wait for the audio to 'capture' before I can press submit. The delay is about 1 second for every 10 seconds of recording, so it can be substantial for 5+ minutes of audio. I don't mind additional latency, but ideally I could press submit as soon as I finish recording and come back once everything is done.

Thanks for building and supporting Gradio - it has changed my professional life for the better in a big way.

Mike :)

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

[Weldon_Full_Visit_Format.txt](https://github.com/gradio-app/gradio/files/14577976/Weldon_Full_Visit_Format.txt)
import os
import openai
import time
import gradio as gr
import soundfile as sf
from pydub import AudioSegment

from openai import OpenAI

# Load API key from an environment variable
OPENAI_SECRET_KEY = os.environ.get("OPENAI_SECRET_KEY")
client = OpenAI(api_key = OPENAI_SECRET_KEY)

note_transcript = ""

def transcribe(audio, history_type):
  global note_transcript
  print(f"Received audio file path: {audio}")
     
  history_type_map = {
      "History": "Weldon_History_Format.txt",
      "Physical": "Weldon_PE_Note_Format.txt",
      "H+P": "Weldon_History_Physical_Format.txt",
      "Impression/Plan": "Weldon_Impression_Note_Format.txt",
      "Handover": "Weldon_Handover_Note_Format.txt",
      "Meds Only": "Medications.txt",
      "EMS": "EMS_Handover_Note_Format.txt",
      "Triage": "Triage_Note_Format.txt",
      "Full Visit": "Weldon_Full_Visit_Format.txt",
      "Psych": "Weldon_Psych_Format.txt",
      "SBAR": "SBAR.txt",
  }
  file_name = history_type_map.get(history_type, "Weldon_Full_Visit_Format.txt")
  with open(f"Format_Library/{file_name}", "r") as f:
    role = f.read()
  messages = [{"role": "system", "content": role}]

  ######################## Read audio file, wait as necessary if not written
  max_attempts = 1
  attempt = 0
  audio_data = None
  samplerate = None
  while attempt < max_attempts:
      try:
          if audio is None:
              raise TypeError("Invalid file: None")
          audio_data, samplerate = sf.read(audio)
          break
      except (OSError, TypeError) as e:
          print(f"Attempt {attempt + 1} of {max_attempts} failed with error: {e}")
          attempt += 1
          time.sleep(3)
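  else:
      print(f"###############Failed to open audio file after {max_attempts} attempts.##############")
      # A bare `return` hands Gradio a single None for three outputs (see the ValueError in the logs);
      # raise a user-visible error instead
      raise gr.Error("No audio received. Wait for the recording to finish, then submit again.")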


  ########## Cast as float 32, normalize
  #audio_data = audio_data.astype("float32")
  #audio_data = (audio_data * 32767).astype("int16")
  #audio_data = audio_data.mean(axis=1)

  ################### Convert .wav to .mp3 (if necessary)
  sf.write("Audio_Files/test.wav", audio_data, samplerate, subtype='PCM_16')
  sound = AudioSegment.from_wav("Audio_Files/test.wav")
  # pydub writes test.mp3 here; re-writing it afterwards with soundfile was redundant
  sound.export("Audio_Files/test.mp3", format="mp3")
  
    
  ################  Send file to Whisper for Transcription
  max_attempts = 3
  attempt = 0
  while attempt < max_attempts:
      try:
          # Open (and close) the file on every attempt
          with open("Audio_Files/test.mp3", "rb") as audio_file:
              audio_transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
          break
      except openai.APIConnectionError as e:  # openai>=1.0 has no `openai.error` module
          print(f"Attempt {attempt + 1} failed with error: {e}")
          attempt += 1
          time.sleep(3)  # wait for 3 seconds before retrying
  else:
      # Without this, `audio_transcript` is undefined below and the handler crashes with NameError
      raise gr.Error("Failed to transcribe audio after multiple attempts")

  print(audio_transcript.text)
  messages.append({"role": "user", "content": audio_transcript.text})
  
  #Create Sample Dialogue Transcript from File (for debugging)
  #with open('Audio_Files/Test_Elbow.txt', 'r') as file:
  #  audio_transcript = file.read()
  #messages.append({"role": "user", "content": audio_transcript})
  

  ### Word and MB Count
  file_size = os.path.getsize("Audio_Files/test.mp3")
  mp3_megabytes = file_size / (1024 * 1024)
  mp3_megabytes = round(mp3_megabytes, 2)

  audio_transcript_words = audio_transcript.text.split() # Use when using mic input
  #audio_transcript_words = audio_transcript.split() #Use when using file

  num_words = len(audio_transcript_words)


  #Ask OpenAI to create note transcript
  response = client.chat.completions.create(model="gpt-4-1106-preview", temperature=0, messages=messages)
  note_transcript = response.choices[0].message.content
  print(note_transcript) 
  return [note_transcript, num_words, mp3_megabytes]

#Define Gradio Interface
my_inputs = [
    gr.Audio(sources=["microphone"], type="filepath", format="mp3"),
    gr.Radio(["History","H+P","Impression/Plan","Full Visit","Handover","Psych","EMS","SBAR","Meds Only"], show_label=False),
]

ui = gr.Interface(fn=transcribe, 
                  inputs=my_inputs, 
                  outputs=[gr.Textbox(label="Your Note", show_copy_button=True),
                           gr.Number(label="Audio Word Count"),
                           gr.Number(label=".mp3 MB")]
                 )


ui.launch(share=False, debug=True)
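The read loop in the reproduction retries `sf.read(audio)`, but when Gradio passes `None` there is no file to wait for, so retrying cannot fix the delay described here. If a path does arrive and you want to guard against a file that is still being written (an assumption; the thread does not show this happening), a minimal stdlib sketch could poll until the file exists, is non-empty, and its size stops changing. `wait_for_stable_file` and its parameters are hypothetical, not part of Gradio:

```python
import os
import time


def wait_for_stable_file(path, timeout=10.0, poll=0.25, stable_checks=2):
    """Hypothetical helper: block until `path` exists, is non-empty, and its size stops changing.

    Returns True if the file looked stable before `timeout` seconds elapsed,
    False otherwise. It cannot help when the handler receives None instead of a path.
    """
    if path is None:
        return False
    deadline = time.monotonic() + timeout
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            # Count consecutive polls where the size is non-zero and unchanged
            if size > 0 and size == last_size:
                unchanged += 1
                if unchanged >= stable_checks:
                    return True
            else:
                unchanged = 0
            last_size = size
        time.sleep(poll)
    return False
```

In `transcribe`, something like `if not wait_for_stable_file(audio): raise gr.Error(...)` could replace the fixed `time.sleep(3)` retry; the timeout and poll interval are arbitrary defaults.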

Screenshot

No response

Logs

Attempt 1 of 1 failed with error: Invalid file: None
###############Failed to open audio file after 1 attempts.##############
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/user/.local/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1570, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1397, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1371, in validate_outputs
    raise ValueError(
ValueError: An event handler (transcribe) didn't receive enough output values (needed: 3, received: 1).
Wanted outputs:
    [textbox, number, number]
Received outputs:
    [None]
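The traceback shows Gradio rejecting a single `None` where the interface declares three outputs (`[textbox, number, number]`). Raising `gr.Error` reports the failure to the user; if you would rather show placeholder values, a minimal stdlib sketch is a decorator that substitutes a fixed fallback list whenever the handler returns `None`. `with_fallback_outputs` and `transcribe_stub` are hypothetical names for illustration:

```python
import functools


def with_fallback_outputs(fallback):
    """Hypothetical decorator: if the wrapped handler returns None,
    substitute `fallback` so the number of outputs always matches."""
    def decorator(fn):
        @functools.wraps(fn)  # keep the original name and signature visible
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if result is None:
                return list(fallback)
            return result
        return wrapper
    return decorator


@with_fallback_outputs(["No audio received; try submitting again.", 0, 0.0])
def transcribe_stub(audio, history_type):
    # Stand-in for the real handler: bail out early when no file arrived
    if audio is None:
        return None
    return ["note text", 2, 0.01]
```

Applied to the real `transcribe`, the fallback would need one entry per output component: a string for the `Textbox` and numbers for the two `Number` fields.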

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.14.0
gradio_client version: 0.8.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.110.0
ffmpy: 0.3.2
gradio-client==0.8.0 is not installed.
httpx: 0.27.0
huggingface-hub: 0.19.4
importlib-resources: 6.1.3
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.3
numpy: 1.26.2
orjson: 3.9.15
packaging: 23.2
pandas: 2.1.3
pillow: 10.2.0
pydantic: 2.6.4
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.8.0
uvicorn: 0.28.0
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.


gradio_client dependencies in your environment:

fsspec: 2023.10.0
httpx: 0.27.0
huggingface-hub: 0.19.4
packaging: 23.2
typing-extensions: 4.8.0
websockets: 11.0.3



Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Severity

I can work around it

@majweldon majweldon added the bug Something isn't working label Mar 12, 2024
@abidlabs abidlabs added the Regression Bugs did not exist in previous versions of Gradio label Mar 13, 2024
@abidlabs
Member

abidlabs commented Mar 13, 2024

Thanks @majweldon for the kind words! cc @hannahblair and @dawoodkhan82 as well, as this relates to frontend validation.

@abidlabs abidlabs added the Priority High priority issues label Mar 29, 2024
@aliabid94
Collaborator

Taking a look.

@aliabid94
Collaborator

Okay, so I haven't been able to reproduce the extent of lag you describe. If I record a very long audio clip (~5 min), I do encounter two sources of lag:

  1. Generating the waveform in the browser (this is new in Gradio 4.x). However, on my MacBook Pro this only takes ~2 s.
  2. Processing the file for saving in the backend. This can take 5-6 seconds. However, this is identical in Gradio 4.x and 3.42, so I'm not sure why you wouldn't see it in 3.42.
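For a sense of scale: a WAV file is a short header followed by raw PCM samples, so saving one is essentially a byte copy, whereas MP3 output needs an encoding pass. The stdlib sketch below writes silent audio with the `wave` module and times the write. The 48 kHz, 16-bit mono parameters are assumptions (browser recording rates vary), and this does not reproduce Gradio's own save path or the PR's change:

```python
import time
import wave


def write_silent_wav(path, seconds, sample_rate=48_000, sample_width=2, channels=1):
    """Write `seconds` of silent PCM audio to `path`; return (sample bytes written, elapsed seconds)."""
    n_frames = int(seconds * sample_rate)
    frame_bytes = sample_width * channels
    silence = bytes(n_frames * frame_bytes)  # zero-valued samples
    start = time.perf_counter()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(silence)  # raw copy: no compression or encoding
    elapsed = time.perf_counter() - start
    return len(silence), elapsed
```

Under these assumptions, `write_silent_wav("five_min.wav", 300)` writes 28,800,000 bytes of samples (300 s × 48,000 frames/s × 2 bytes), about 27.5 MiB for a five-minute recording.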

I made a PR that improves the performance of (2), though it only works if the recorded audio format is "wav". Can you try installing Gradio from this PR? To do so:

  1. pip install https://gradio-builds.s3.amazonaws.com/b35e3ae839d208520180299077f4ce57bb96fca4/gradio-4.25.0-py3-none-any.whl
  2. Change your audio component to gr.Audio(sources=["microphone"], type="filepath",format="wav")

See if you notice a difference in performance and lmk.

@majweldon
Author

majweldon commented Apr 5, 2024 via email

@aliabid94
Collaborator

Did the PR make any difference at all? If you're still seeing that much lag when the processing time should have been cut, then perhaps it's a network issue. Are you running your demo locally or on a server?
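A network-bound upload would also grow linearly with recording length, like the roughly 1 s of delay per 10 s of audio in the original report. A back-of-envelope sketch for checking that, assuming the browser uploads uncompressed 16-bit mono PCM at 48 kHz and using hypothetical uplink speeds (neither is stated in the thread):

```python
def upload_seconds(recording_s, uplink_mbps, sample_rate=48_000, sample_width=2, channels=1):
    """Estimate seconds to upload an uncompressed PCM recording at `uplink_mbps` megabits/s."""
    size_bytes = recording_s * sample_rate * sample_width * channels
    return size_bytes * 8 / (uplink_mbps * 1_000_000)


# Upload time per 10 s of audio at a few hypothetical uplink speeds
for mbps in (5, 10, 50):
    print(f"{mbps} Mbit/s uplink: {upload_seconds(10, mbps):.2f} s per 10 s of audio")
```

At a hypothetical 10 Mbit/s uplink, a five-minute recording works out to about 23 s under these assumptions; comparing such estimates against the observed wait is one way to tell network time from processing time.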

@majweldon
Author

majweldon commented Apr 8, 2024 via email
