
Digital Lincoln LLM - 2024 Hackathon

The Digital Lincoln PDF Chat is a Python-based chatbot that answers questions about the content of a given PDF document. It extracts the text from the PDF and passes it as context to OpenAI's GPT-3.5-turbo language model, which then generates informative, contextually relevant answers to user queries.

Features

  • Extracts text from PDF documents, either from a local file or a URL (see the sketch after this list)
  • Utilises OpenAI's GPT-3.5-turbo model to generate responses based on the retrieved information
  • Provides a user-friendly web interface built with Gradio for easy interaction
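
A minimal sketch of the extraction step, assuming PyPDF2 3.x (named in the Acknowledgements) and the standard library for URL downloads; the function name extract_pdf_text is illustrative, not the repository's actual code.

    import io
    import urllib.request

    from PyPDF2 import PdfReader  # PyPDF2 >= 3.0 exposes PdfReader


    def extract_pdf_text(source: str) -> str:
        """Return the concatenated text of a PDF given a local path or a URL."""
        if source.startswith(("http://", "https://")):
            # Download the PDF into memory and wrap it in a file-like object.
            data = urllib.request.urlopen(source).read()
            reader = PdfReader(io.BytesIO(data))
        else:
            reader = PdfReader(source)
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in reader.pages)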

Requirements

To run the Digital Lincoln LLM, you need the following:

  • Python 3
  • An OpenAI API key
  • The Python packages listed in requirements.txt (PyPDF2, Gradio, and the OpenAI client library)

Installation

  1. Clone the repository:

    git clone https://github.com/0xnu/digital-lincoln-llm.git
    
  2. Navigate to the project directory:

    cd digital-lincoln-llm
    
  3. Install the required Python packages:

     python3 -m venv .venv
     source .venv/bin/activate
     python3 -m pip install --upgrade pip
     pip3 install -r requirements.txt
     # run `deactivate` when you are finished with the virtual environment
    

Usage

  1. Make sure you have an OpenAI API key. If you don't have one, sign up at OpenAI and create an API key.

  2. Run the digital_lincoln_llm.py script:

    python3 digital_lincoln_llm.py
    
  3. Open a web browser and go to http://localhost:7860 to access the Digital Lincoln LLM interface.

  4. Enter your OpenAI API key in the designated input field.

  5. Provide the path to a local PDF file or a URL of a PDF document that you want to use as the knowledge base for the chatbot.

  6. Start asking questions in the user input field. The chatbot will retrieve the relevant information from the PDF and use GPT-3.5-turbo to generate contextually appropriate answers (see the interface sketch below).

🚨 Note: the application writes a log.csv file containing sensitive information to the flagged folder.
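
For reference, a stripped-down version of how such a Gradio interface might be wired up; the field labels and the answer_question helper are assumptions for illustration, not the exact layout of digital_lincoln_llm.py.

    import gradio as gr


    def answer_question(api_key: str, pdf_source: str, question: str) -> str:
        """Placeholder for the PDF-grounded answer pipeline described in How It Works."""
        # Extract the PDF text, retrieve relevant passages, and query GPT-3.5-turbo.
        ...


    demo = gr.Interface(
        fn=answer_question,
        inputs=[
            gr.Textbox(label="OpenAI API Key", type="password"),
            gr.Textbox(label="PDF path or URL"),
            gr.Textbox(label="Your question"),
        ],
        outputs=gr.Textbox(label="Answer"),
        title="Digital Lincoln LLM",
    )

    # Gradio serves on http://localhost:7860 by default.
    demo.launch()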

How It Works

When a user asks a question, the chatbot performs the following steps:

  1. The question is used as a query to retrieve relevant passages from the extracted text of the PDF document.
  2. The retrieved passages are then fed into the GPT-3.5-turbo model along with the user's question.
  3. The language model generates a response based on the retrieved information, considering the context of the passages.
  4. The generated response is returned to the user through the Gradio interface.

By combining retrieval over the PDF text with a Large Language Model (LLM), the chatbot focuses on the most relevant information in the document, leading to more accurate and contextually appropriate responses, as sketched below.
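
A condensed sketch of steps 1–3 above, assuming a simple word-overlap retriever and the openai Python client (v1+); the actual repository may chunk, score, and prompt differently.

    from openai import OpenAI


    def retrieve_passages(question: str, pdf_text: str, top_k: int = 3) -> list[str]:
        """Naive retrieval: rank paragraphs by word overlap with the question."""
        paragraphs = [p.strip() for p in pdf_text.split("\n\n") if p.strip()]
        query_words = set(question.lower().split())
        scored = sorted(
            paragraphs,
            key=lambda p: len(query_words & set(p.lower().split())),
            reverse=True,
        )
        return scored[:top_k]


    def answer_with_context(api_key: str, question: str, pdf_text: str) -> str:
        """Feed the retrieved passages and the question to GPT-3.5-turbo."""
        context = "\n\n".join(retrieve_passages(question, pdf_text))
        client = OpenAI(api_key=api_key)
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": f"Answer using only this PDF excerpt:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content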

Limitations

  • The accuracy and relevance of the chatbot's responses depend on the quality and content of the PDF document provided.
  • The chatbot may be unable to answer questions that are not directly related to the information in the PDF.
  • The chatbot's performance may be affected by the size and complexity of the PDF document.

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request on the GitHub repository. Feel free to customise and expand the code based on your specific project details and requirements.

License

This project is licensed under the MIT License.

Acknowledgements

  • OpenAI for providing the powerful GPT-3.5-turbo language model
  • PyPDF2 for simplifying the extraction of text from PDF documents
  • Gradio for enabling the creation of interactive web interfaces

Copyright

(c) 2024 Finbarrs Oketunji.

Developed at LincolnHack 2024 in collaboration with Digital Lincoln, Lincolnshire 🇬🇧
