SoundSage LLM Integration (text-to-audio-processing)

WORK IN PROGRESS

AutoGain, the automatic gain-staging tool, is currently fully functional and ready to use. It is just one component of the audio suite, with many more to come in the near future. The rest of the SoundSage Audio Suite, however, is not yet operational. Thank you for your patience while we work out the logistics and further plan the scope of the project. Please see Contributing if you would like to help.

Update

The "ChatBot" UI is currently operational. However, we are still developing the functions that will let it parse specific task-related keywords into commands that move files around on the computer, based on the information extracted from a user's prompt.

Welcome to SoundSage!🦉

If you are reading this then you are about to be one of the very few to know of our journey! 🚀

We are on a mission to create the world's first AI-based audio processing suite. The tools use AI to analyze and process audio files, improving the quality and efficiency of the audio processing workflow.

User-friendly interface: The system will feature a simple and intuitive chat interface, making it easy for users to navigate and use the tool by simply prompting SoundSage.🦉
Efficient workflow: The tools will be designed to streamline the audio processing workflow, allowing users to process audio files quickly and easily.🛠
Operating Environment: The product is designed to operate in a standard computing environment. It requires a computer with sufficient processing power to run the AI algorithms and enough storage space to store the audio files.

SoundSage is an open-source Python program for automating audio processing using natural language processing. It is part 1 of a series for automating audio processing tasks. The end goal is to create a full set of tools that an AI can use to automate audio processing for music, film, games, and any other possible applications.

See the AutoGain branch for the working system that does not involve an LLM. This branch is specific to the LLM integration.

SoundSage🦉

SoundSage is an advanced audio processing system that integrates automated audio tools with a large language model (LLM), such as those available through the OpenAI API. The system allows users to prompt a list of commands for audio processing, such as gain staging, balancing, subtractive EQ, noise reduction, and compression.

Project Overview

In the LLM integration, the LLM interprets a user prompt, extracts key information, and translates it into a series of actions that the system can execute. A number of scripts work together to manage files, interact with the OpenAI API, and control the AutoGain tool. The actions are: navigating to the specified folder, copying the audio files, pasting them into the "PreProcess" folder, initiating the AutoGain tool, and finally copying the processed files into a new folder. The completion script then signals that the process is complete and tells the user where the processed files are located. Please see the SoundSage FlowChart for a full visualization of the intended process.
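As a rough illustration of the file-handling part of this flow, here is a minimal sketch using only the standard library. The function names, the "Processed" output folder name, and the list of audio extensions are assumptions made for the example, not the project's actual code, and the `process` callback is only a stand-in for the call into AutoGain.

```python
import shutil
from pathlib import Path
from typing import Callable, List

# Assumed set of audio extensions; the project may support others.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".aiff"}


def copy_audio_files(source_dir: Path, target_dir: Path) -> List[Path]:
    """Copy every audio file in source_dir into target_dir and return the copies."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            copied.append(Path(shutil.copy2(path, target_dir / path.name)))
    return copied


def run_workflow(source_dir: Path, work_dir: Path,
                 process: Callable[[Path], None] = lambda path: None) -> Path:
    """Stage files in PreProcess, process them in place, then copy them out."""
    preprocess_dir = work_dir / "PreProcess"
    output_dir = work_dir / "Processed"  # hypothetical name for the new folder
    for staged in copy_audio_files(source_dir, preprocess_dir):
        process(staged)  # stand-in for the call into AutoGain
    copy_audio_files(preprocess_dir, output_dir)
    return output_dir
```

Calling `run_workflow(Path("my_session"), Path("work"))` would leave the original files untouched and return the folder that holds the processed copies.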

How it Works:

Interpreting User Prompts: When a user provides a prompt to SoundSage, the LLM analyzes the prompt to understand the user's intent. This involves parsing the prompt, identifying keywords and phrases related to audio processing tasks, and extracting any additional information that might be relevant, such as the location of the audio files to be processed.
Generating Commands: Once the LLM has interpreted the user's prompt, it generates a list of commands for the audio processing tools. These commands are specific to the tasks that the user has requested, such as gain staging, EQ adjustment, or noise reduction. The LLM generates these commands in a format that the audio processing tools can understand and execute.
Executing Commands: The commands generated by the LLM are passed to the audio processing tools, which execute them in the order they were generated. This involves loading the specified audio files, applying the necessary audio processing tasks, and saving the processed files to the specified location.
Providing Feedback to the User: After the audio processing tasks have been completed, the LLM generates a response to the user. This response includes information about the tasks that were performed, any changes that were made to the audio files, and the location of the processed files. The response is designed to be easily understood by the user, providing them with a clear and concise summary of the audio processing workflow.
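The four stages above can be sketched in a few lines of Python. This example deliberately swaps the LLM for a simple keyword lookup so that it runs without an API key. The keyword table, the `ParsedPrompt` class, and all function names are hypothetical and only illustrate the shape of the data passed between stages.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical keyword table; a real LLM call would replace this lookup.
TASK_KEYWORDS = {
    "gain_staging": ("gain stage", "gain staging", "autogain"),
    "noise_reduction": ("noise reduction", "denoise"),
    "compression": ("compress",),
}


@dataclass
class ParsedPrompt:
    tasks: List[str]       # recognized tasks, in keyword-table order
    folder: Optional[str]  # quoted folder path from the prompt, if any


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Interpret the prompt: find task keywords and a quoted folder path."""
    text = prompt.lower()
    tasks = [task for task, words in TASK_KEYWORDS.items()
             if any(word in text for word in words)]
    match = re.search(r'"([^"]+)"', prompt)
    return ParsedPrompt(tasks, match.group(1) if match else None)


def execute(parsed: ParsedPrompt,
            tools: Dict[str, Callable[[Optional[str]], str]]) -> List[str]:
    """Run each requested task through its registered tool, in order."""
    results = []
    for task in parsed.tasks:
        tool = tools.get(task)
        outcome = tool(parsed.folder) if tool else "skipped (no tool registered)"
        results.append(f"{task}: {outcome}")
    return results


def summarize(parsed: ParsedPrompt, results: List[str]) -> str:
    """Build the feedback message shown to the user."""
    if not results:
        return "No audio processing tasks were recognized in the prompt."
    return f"Processed files in {parsed.folder}: " + "; ".join(results)


parsed = parse_prompt('Please gain stage and compress the files in "C:/Sessions/Mix1"')
results = execute(parsed, {"gain_staging": lambda folder: "done"})
print(summarize(parsed, results))
```

Keeping parsing, execution, and feedback as separate functions means the keyword lookup can later be replaced by an LLM call without touching the other two stages.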

Code Documentation:

Please see the SoundSage to-do list (TODO.md) for outstanding work.

openai_interaction.py: This script uses the OpenAI API to parse keywords from the user's prompt and compile a command list for the system to execute.

file_management.py: This script handles file management tasks. It contains functions for navigating to a specified folder, copying files, and pasting files.

error_handling.py: This script handles errors that might occur while the application is running. It also contains functions for providing progress updates to the user.
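One way to combine error handling with progress updates is to run the workflow as a list of named steps. The sketch below is an assumption about how such a helper could look. `run_steps` and its `report` callback are hypothetical names, not functions taken from error_handling.py.

```python
from typing import Callable, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Sequence[Step],
              report: Callable[[str], None] = print) -> bool:
    """Run each step in order, reporting progress and stopping at the first error."""
    total = len(steps)
    for index, (name, action) in enumerate(steps, start=1):
        report(f"[{index}/{total}] {name}")
        try:
            action()
        except Exception as exc:  # report any failure instead of crashing the UI
            report(f"Error during '{name}': {exc}")
            return False
    report("All steps completed.")
    return True
```

Passing the chat window's display function as `report` would route the same messages to the user instead of the console.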

completion_handling.py: This script handles the completion of the process. It contains a function for sending a signal to the LLM to respond that the process is complete and a function for checking the new folder to ensure that the process completed successfully.
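The folder check described here could look something like the following sketch. It assumes that "completed successfully" means every expected file exists in the new folder and is not empty. The function names are illustrative only.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(expected_names: Iterable[str], output_dir: Path) -> List[str]:
    """Return the expected file names that are absent or empty in output_dir."""
    if not output_dir.is_dir():
        return sorted(expected_names)
    present = {p.name for p in output_dir.iterdir()
               if p.is_file() and p.stat().st_size > 0}
    return sorted(name for name in expected_names if name not in present)


def completion_message(expected_names: Iterable[str], output_dir: Path) -> str:
    """Build the message the LLM can relay once processing has finished."""
    names = list(expected_names)
    missing = missing_outputs(names, output_dir)
    if missing:
        return f"Processing incomplete. Missing from {output_dir}: {', '.join(missing)}"
    return f"Processing complete. {len(names)} file(s) saved to {output_dir}"
```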

main.py (SoundSage main): This is the main script that runs the application. It imports and uses the functions from the other scripts.

menu_functions.py: This script contains functions that get executed when the menu options are selected. For example, it might contain a function for creating a new file when "New.." is selected, a function for saving the current state of the application when "Save As.." is selected, and so on.

send_button_functionality.py: This script contains the function that gets executed when the "Send" button is clicked. This function takes the user's input, sends it to the OpenAI API, and displays the response in the chat window.

autogain_interaction.py: This script contains functions for interacting with the AutoGain software. It contains functions for sending commands to the software and handling its responses.

AutoGain

AutoGain is the first of many audio tools that the SoundSage system uses to process audio based on a user's prompt. The AutoGain tool automates gain staging by analyzing the file and either boosting, turning down, or simply doing nothing to the specified file.

main.py (AutoGain main), audio_analysis.py, and audio_processor.py: These scripts are part of the AutoGain tool. They handle the analysis and processing of audio files.
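To make the boost, turn down, or do nothing decision concrete, here is a minimal sketch based on peak level. The -6 dBFS target, the 0.5 dB tolerance, and the use of peak level rather than some other loudness measure are all assumptions for the example. AutoGain's actual analysis may differ.

```python
import math
from typing import Sequence, Tuple


def peak_dbfs(samples: Sequence[float]) -> float:
    """Peak level in dBFS for samples normalized to the range -1.0 to 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def gain_decision(samples: Sequence[float],
                  target_dbfs: float = -6.0,    # assumed target level
                  tolerance_db: float = 0.5     # assumed dead band
                  ) -> Tuple[str, float]:
    """Return ("boost" | "cut" | "none", gain in dB) for the given samples."""
    level = peak_dbfs(samples)
    if math.isinf(level):
        return ("none", 0.0)  # silent file: nothing sensible to adjust
    delta = target_dbfs - level
    if abs(delta) <= tolerance_db:
        return ("none", 0.0)  # already close enough to the target
    return ("boost" if delta > 0 else "cut", delta)
```

A file peaking at 0.25 (about -12 dBFS) would be boosted by roughly 6 dB, while a file peaking at full scale would be cut by 6 dB.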

Getting Started

To get started with SoundSage, follow these steps:

Clone the repository to your local machine.
Install the required dependencies.
Run the main script to start the application.

Prerequisites

Before you can use SoundSage, you need to have the following software installed on your computer:

Python 3.8 or later
pip (Python package installer)

Installation

1. Clone the repository:

git clone https://github.com/Gabeiscool420/SoundSage---LLM-Audio-Processing.git

2. Navigate to the cloned repository:

cd SoundSage---LLM-Audio-Processing

3. Install the required dependencies:

pip install -r requirements.txt

4. Navigate to the SoundSage directory:

cd "SoundSage-LLM Integration"
cd SoundSage

5. Download the spaCy English Language Model:

After installing the Python package dependencies, you will need to download the spaCy English language model. This model is necessary for many of spaCy's features, including part-of-speech tagging, named entity recognition, and dependency parsing.

You can download the model by running the following command in your terminal:

python -m spacy download en_core_web_sm

6. Run the main script to start the application:

python main.py

Contributing Guidelines

Thank you for considering contributing to SoundSage! Anyone can contribute; we just ask that you adhere to our guidelines, which you can find in CONTRIBUTING.md. If everything looks good to you, feel free to take a stab at the SoundSage to-do list. Please leave a comment documenting any changes you have made, and cite any code you may have borrowed.

Fork the repository.
Create a new branch for your changes.
Make your changes in your branch.
Submit a pull request with your changes.

Please make sure to update tests as appropriate.

Cheers! :)

The SoundSage Team

Future Tools

SoundSage plans to integrate more tools for automated audio processing, including:

AutoEQ
AutoDelay
AutoReverb
AutoLimiter
AutoEnhancer
AutoCompressor
AutoStereo Widener
AutoTime Stretcher
AutoPitch Corrector
AutoNoise Reduction

Known Limitations

Precision: Depending on the quality of the user's input, the accuracy of the LLM's interpretation and the subsequent commands it generates could vary. There is a potential risk of inaccurate command generation or misinterpretation of the user's intent.

Efficiency: As the system involves multiple scripts and moving parts, the process flow might not be as efficient as it could be with a more streamlined setup. Any lag in communication between scripts or modules could slow down the overall operation.

Resource Consumption: The application may require significant processing power and memory to run efficiently, especially for large audio files or complex audio processing tasks.

Optimizing Script Interaction:

a. Identify Bottlenecks: Analyze the current script interaction process to identify any areas where inefficiencies may occur. This could involve running performance tests or reviewing the code to understand how data and commands are passed between scripts.

b. Design Central Controller: Design a central controller script that will manage the flow of data and commands between scripts. This script should be designed to handle any issues identified in the bottleneck analysis.

c. Implementation: Rewrite the existing scripts as needed to interact with the central controller, rather than interacting directly with each other. Ensure that the central controller has the necessary functions to manage all required interactions.

d. Testing: Run tests to confirm that the central controller is correctly managing script interactions and that the system is running more efficiently.
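A central controller of this kind can be quite small. The sketch below is one possible design, not an existing SoundSage module: each script registers a handler, and the controller passes a single shared context dictionary from stage to stage so that scripts never call each other directly. The class and key names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Handler = Callable[[Context], Context]


class Controller:
    """Runs registered stages in order, passing one shared context between them."""

    def __init__(self) -> None:
        self._stages: List[Tuple[str, Handler]] = []

    def register(self, name: str, handler: Handler) -> None:
        """Add a stage; stages run in the order they were registered."""
        self._stages.append((name, handler))

    def run(self, context: Context) -> Context:
        """Feed the context through every stage and record which ones ran."""
        context = dict(context)  # avoid mutating the caller's dictionary
        completed = []
        for name, handler in self._stages:
            context = handler(context)
            completed.append(name)
        context["completed"] = completed
        return context


controller = Controller()
controller.register("parse", lambda ctx: {**ctx, "tasks": ["gain_staging"]})
controller.register("process", lambda ctx: {**ctx, "processed": len(ctx["tasks"])})
print(controller.run({"prompt": "gain stage my drums"}))
```

Because every stage has the same signature, a stage can be tested on its own by handing it a hand-built context.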

Leveraging Efficient Computing Resources:

a. Identify Resource Intensive Tasks: Identify which parts of the application are most resource-intensive. This could be done by monitoring resource usage during typical tasks.

b. Optimize Libraries and Platforms: Review the libraries and platforms used by the application to identify any that could be replaced with more efficient alternatives. For instance, if the application is currently running on a local machine, consider migrating it to a cloud-based solution that can provide more computing power and scalability.

c. Implement Changes: Replace the identified libraries or platforms with the more efficient alternatives. This may involve rewriting parts of the application to work with the new resources.

d. Testing: Run tests to confirm that the application is running more efficiently and that the changes have not introduced any new issues.
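For step (a), the standard library is enough to get a first measurement of time and memory per task. This is a minimal sketch; `profile_call` is a hypothetical helper name, and `tracemalloc` only tracks memory allocated by Python itself, so memory used inside external tools such as FFmpeg would not show up.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile_call(func: Callable[..., Any], *args: Any,
                 **kwargs: Any) -> Tuple[Any, float, int]:
    """Run func and return (result, elapsed seconds, peak Python memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


result, seconds, peak_bytes = profile_call(sorted, range(100_000, 0, -1))
print(f"took {seconds:.4f} s, peak {peak_bytes / 1024:.0f} KiB")
```

Wrapping each processing stage in `profile_call` and comparing the numbers before and after a change gives a simple way to confirm whether an optimization helped.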

Modules Currently Used in SoundSage and AutoGain

tkinter: Used for creating the GUI.
PIL: Used for image processing.
audio_analysis: Custom module for analyzing audio files.
audio_processor: Custom module for processing audio files.
ffprobe: Used for extracting audio file information.
ffmpy: Used for applying gain adjustment to audio files.
shutil: Used for copying files.
openai_interaction: Custom module for interacting with the OpenAI API.
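For readers unfamiliar with the FFmpeg tools behind ffprobe and ffmpy, the sketch below builds the kind of command lines involved. It only constructs argument lists and does not run anything. The helper names are hypothetical, and the project's own scripts may pass different options.

```python
from typing import List


def probe_command(path: str) -> List[str]:
    """Build an ffprobe call that prints the file's duration in seconds."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1", path]


def gain_command(src: str, dst: str, gain_db: float) -> List[str]:
    """Build an ffmpeg call that applies gain_db decibels with the volume filter."""
    return ["ffmpeg", "-y", "-i", src,
            "-filter:a", f"volume={gain_db:.2f}dB", dst]


print(" ".join(gain_command("in.wav", "out.wav", -6.0)))
```

With FFmpeg installed, either list can be handed to `subprocess.run(command, check=True)` to execute it.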

Functions

open_files(): Opens the file dialog to select audio files.
choose_output_directory(): Opens the directory dialog to select the output directory.
begin_process(): Starts the audio processing.

APIs

OpenAI API: Used for generating Python code based on user input.

Sources

AutoGain scripts: main.py, audio_analysis.py, audio_processor.py

Potential Modules

Here are some potential modules that could be used in the SoundSage LLM integration:

transformers: A Python library for state-of-the-art natural language processing, developed by Hugging Face. It provides pre-trained models for several major language model families, including OpenAI's GPT-2 and Google's BERT. The library is released under the Apache 2.0 License.
scikit-learn: This is a machine learning library in Python. It features various classification, regression and clustering algorithms, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. This module is distributed under the 3-Clause BSD license.
SoundFile: This Python package can read and write sound files. File reading/writing is supported for many formats including WAV, FLAC, OGG, and more. This module is licensed under the BSD 3-Clause License.
spaCy: A Python library for advanced natural language processing. It provides functionalities for part-of-speech tagging, named entity recognition, and dependency parsing, among others. This could be used to enhance the LLM's ability to interpret user prompts. The library is released under the MIT License.
nltk (Natural Language Toolkit): A Python library for working with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources. This could be used to enhance the LLM's understanding of natural language. The library is released under the Apache 2.0 License.
torchaudio: An audio library for PyTorch. It provides a variety of audio transforms, supports audio I/O, and has dataloaders for common audio datasets. This could be used for loading and saving audio files, as well as performing common audio transformations. The library is released under the BSD 2-Clause License.

Licenses

The licensing information for the modules and functions used in the SoundSage Audio Suite and AutoGain Python scripts is as follows:

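tkinter, shutil: These are part of the Python Standard Library and are covered by the Python Software Foundation License.
PIL: This is provided by the third-party Pillow package, which is not part of the Standard Library and is distributed under its own permissive open-source license.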
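ffprobe, ffmpy: ffprobe is part of FFmpeg, which is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later by default, and under the GNU General Public License (GPL) version 2 or later when built with its optional GPL components. ffmpy is a separate Python wrapper with its own license; check the ffmpy project for its terms.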
AutoGain scripts (main.py, audio_analysis.py, audio_processor.py): These are custom modules made by SoundSage and adhere to the SoundSage open-source license.
OpenAI API: This is a hosted service governed by OpenAI's terms of use rather than an open-source license.
