
Code Generation with GPT-Neo Models

This project demonstrates the use of GPT-Neo models for code generation. GPT-Neo is a family of large-scale autoregressive language models based on the GPT (Generative Pre-trained Transformer) architecture. In this project, we use GPT-Neo to generate code in the Python programming language.
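As a quick illustration, generating code from a GPT-Neo checkpoint with the transformers library looks roughly like this. This is only a minimal sketch, not the project's fine-tuned pipeline; the prompt and generation settings are illustrative:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

prompt = "# Return the factorial of n\ndef factorial(n):"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding; pad_token_id is set to eos to silence the padding warning
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))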

Requirements

  • Python 3.8 or higher
  • PyTorch 1.13 or higher
  • Hugging Face's transformers library
  • An NVIDIA GPU (CUDA)

Getting Started

Clone the repository

git clone https://github.com/0xsuid/code-generation-gpt-models.git

Install the NVIDIA driver, CUDA Toolkit, and Python dependencies. For more info, check INSTALLATION.md.

chmod +x install.sh
./install.sh

Fine-tune the model on the APPS dataset

Single GPU

Note: the "-u" argument is required to disable Python output buffering so that logs appear in output.log immediately.

nohup python3 -u tune_gpt.py --limit 10 --local-rank 0 --model "EleutherAI/gpt-neo-125M" --tokenizer "EleutherAI/gpt-neo-125M" > output.log 2>&1 &

Single GPU / Multi-GPU with DeepSpeed

nohup deepspeed tune_gpt.py --deepspeed deepspeed.json --model "EleutherAI/gpt-neo-125M" --tokenizer "EleutherAI/gpt-neo-125M" > output.log 2>&1 &
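The repository ships its own deepspeed.json; purely as an illustrative sketch, a minimal ZeRO stage 2 + fp16 config compatible with the Hugging Face Trainer could look like this ("auto" lets the Trainer fill in matching values):

{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}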

Supported Arguments

  1. Limit
    • "--limit" - Limit the total number of problems
  2. Model
    • "--model" - Model ID from Hugging Face, e.g. "EleutherAI/gpt-neo-125M"
  3. Tokenizer
    • "--tokenizer" - Tokenizer ID from Hugging Face, e.g. "EleutherAI/gpt-neo-125M"
  4. Upload Model
    • "--upload-model" - Upload the fine-tuned model to Hugging Face
  5. Stop
    • "--stop-instance" - Stop the TensorDock instance after training
  6. Local Rank
    • "--local-rank" - Local rank for DeepSpeed; it should be 0 when not using DeepSpeed so the model is saved
  7. Upload Experiment
    • "--upload-experiment" - Upload the experiment directory to a Hugging Face repo
  8. Verbosity
    • "--verbosity"

Log Visualization with TensorBoard

tensorboard --logdir experiments/2022-10-15-9e416bbdeafeaea88e8747a0edd284f93d7551ea3cc387377269ceed52957730/logs

Labels

We pass the full input sequence as the labels instead of just the answer tokens. Because we are training a language model, we want it to learn the structure of the whole prompt, not just an answer class. In effect, the model learns to predict the tokens of the input question plus the answer as structured in the prompt, and in the process learns the code generation task.
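A minimal sketch of this, assuming a Hugging Face tokenizer (the example text below is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
# Illustrative training example: question text followed by the answer code
text = "QUESTION:\nPrint the numbers 1 to 10.\nANSWER:\nfor i in range(1, 11):\n    print(i)"
batch = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
# Labels are a copy of the input ids, so the model is trained to predict
# every token of question + answer; the Trainer shifts labels internally.
batch["labels"] = batch["input_ids"].clone()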

CUDA Out of Memory

When using a multi-GPU environment, if the first GPU runs out of memory while more memory is still available on the other GPUs, setting "max_split_size_mb" via the PYTORCH_CUDA_ALLOC_CONF environment variable might be useful to reduce fragmentation.
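For example, before launching training (512 is an illustrative value in MB):

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512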

Decoding Strategies
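The project's exact generation settings are not documented here; as a sketch only, the common decoding strategies offered by transformers' generate() look like this, reusing the model and inputs from the quick example above:

# Greedy: always pick the most likely next token
greedy = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Beam search: keep the 5 best partial sequences
beam = model.generate(**inputs, max_new_tokens=64, num_beams=5, early_stopping=True)
# Sampling: draw from the temperature/top-p filtered distribution
sampled = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8, top_p=0.95)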

Limitations

  • The generated code may not always be syntactically correct or runnable.
  • The model is only as good as the dataset it is trained on, so the quality of the generated code will depend on the diversity and quality of the training data.
  • GPT-Neo models are large, so they require a powerful GPU and a lot of memory to train.

Conclusion

Code generation with GPT-Neo models is a promising approach for automating repetitive coding tasks. With the right dataset and fine-tuning, it can be used to generate high-quality code in a variety of programming languages. However, it still has some limitations, and it is not a substitute for human programmers.
