Hugging Face Local LLM + LangChain

Description

!WORK IN PROGRESS!

This project explores and showcases running LLMs from Hugging Face locally with LangChain. The example uses Meta's Llama 2 7B chat model. To use this model you will need to request access from Meta and accept their terms and conditions. To use another model, edit the .env file and change the model name; just make sure it is a chat model in Hugging Face format based on Llama 2.

e.g. meta-llama/Llama-2-7b-chat-hf, not meta-llama/Llama-2-7b-chat

A few examples of models that should work:

  • meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-13b-chat-hf etc.
  • NousResearch/Llama-2-7b-chat-hf, NousResearch/Llama-2-13b-chat-hf etc.
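
For illustration, the model name could be read from .env with python-dotenv. The MODEL_NAME variable below is a hypothetical example, not necessarily what this repository's scripts use; check the scripts for the actual variable name:

    import os
    from dotenv import load_dotenv

    load_dotenv()  # read variables from the local .env file

    # MODEL_NAME is a hypothetical variable name used here for illustration
    model_name = os.getenv("MODEL_NAME", "meta-llama/Llama-2-7b-chat-hf")
    hf_token = os.getenv("HUGGING_FACE_API_KEY")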

Prerequisites

  1. API key from Hugging Face in the environment file .env

    HUGGING_FACE_API_KEY=...
  2. CUDA-enabled GPU and the CUDA toolkit installed (for GPU acceleration)

    • make sure you're using a strong GPU (e.g. RTX 4090, A100) for fast inference
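
    To verify GPU acceleration is actually available before loading a ~13 GB model, a quick check with PyTorch (assuming torch is installed via requirements.txt) looks like this:

    import torch

    # Should print True and the GPU name if CUDA and the drivers are set up correctly
    print(torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))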

Usage

  1. Init venv and install dependencies

    python -m venv .venv
    source ./.venv/bin/activate
    pip install -r requirements.txt
  2. Run the download script

    python ./download_model.py

    This downloads the model and saves it to ./models. Saving a local copy is not strictly necessary - without it, the model would go to the Hugging Face cache and could be used directly from there - but I am saving it to ./models to make things more transparent and avoid "magic".

    Downloading takes a while; the model is ~13 GB.
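
    As a rough sketch (the actual download_model.py may differ), the download can be done with huggingface_hub's snapshot_download, authenticated with the token from .env:

    import os
    from dotenv import load_dotenv
    from huggingface_hub import snapshot_download

    load_dotenv()

    # Model id and target directory are assumptions - adjust to match the repo layout
    model_name = "meta-llama/Llama-2-7b-chat-hf"
    snapshot_download(
        repo_id=model_name,
        local_dir=f"./models/{model_name}",
        token=os.getenv("HUGGING_FACE_API_KEY"),
    )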

  3. Run the pipeline directly or via LangChain

    I have two examples - pipeline.py and langchain_example.py. The first is a simple pipeline that generates a response to a given input; the second shows how to use the same model in LangChain. Both run fully offline using only local resources and are stateless, so you can use them as a starting point for your own project.

    Loading the model takes a while (~2 minutes on my setup) - I am not entirely sure why it is so much slower than running the model directly from the Llama 2 repo instead of through Hugging Face; this is still a work in progress. Once the model is loaded, inference is very fast.

    python ./pipeline.py
    python ./langchain_example.py
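
    For illustration only, here is a minimal sketch of the general pattern - loading the local model with a transformers text-generation pipeline and wrapping it in LangChain's HuggingFacePipeline. The model path, prompt, and generation parameters are assumptions, not the exact contents of pipeline.py or langchain_example.py:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    from langchain.llms import HuggingFacePipeline
    from langchain.prompts import PromptTemplate
    from langchain.chains import LLMChain

    # Assumed local path - adjust to wherever download_model.py saved the weights
    model_path = "./models/meta-llama/Llama-2-7b-chat-hf"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto",  # place the model on the GPU when one is available
    )

    # A plain transformers pipeline for text generation
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)

    # Wrap the pipeline so LangChain chains can call it like any other LLM
    llm = HuggingFacePipeline(pipeline=pipe)
    prompt = PromptTemplate.from_template("Answer briefly: {question}")
    chain = LLMChain(llm=llm, prompt=prompt)
    print(chain.run(question="What is LangChain?"))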
