ReliableLM4Code

This repository builds on our recent work, "Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey" and "Large language models for software engineering: A systematic literature review". It provides supporting material for that research, along with a curated collection of LM4Code papers and other resources (datasets, tutorials, etc.). The focus is primarily on papers that use pre-trained models, especially large language models, to improve the reliability of language models in Software Engineering research.

For more details, please visit this site.

Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research on learning-based code intelligence, such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to pitfalls that hinder realistic performance and, in turn, undermine their reliability and applicability in real-world deployment. These challenges call for a comprehensive understanding: not just identifying the issues, but also examining their possible implications and the existing solutions, in order to build more reliable language models tailored to code intelligence. Following a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code and identified 67 primary studies from top-tier venues. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. Our classification scheme dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and utilization of LM4Code in reliable and trustworthy ways.
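As a rough illustration, the four aspects and the pitfall categories that organize the paper list in this README can be written down as a small lookup table. This is only a sketch: `TAXONOMY` and `aspect_of` are hypothetical names introduced here, not artifacts of the survey, and the category names are simply copied from this README's section headings.

```python
# Sketch of the taxonomy as a lookup table (hypothetical names, not an official artifact).
# Aspect and pitfall names are copied from this README's section headings.
TAXONOMY = {
    "Data Collection and Labeling": [
        "Unbalanced Distribution",
        "Label Errors",
        "Data Noise",
    ],
    "System Design and Learning": [
        "Data Snooping",
        "Spurious Correlations",
        "Inappropriate Model Design",
    ],
    "Performance Evaluation": [
        "Inappropriate Baseline",
        "Inappropriate Evaluation Dataset",
        "Low Reproducibility",
        "Inappropriate Performance Measures",
    ],
    "Deployment and Maintenance": [
        "Real-World Constraints",
        "Attack Threats",
        "Security Concerns in Generated Code",
    ],
}


def aspect_of(pitfall: str) -> str:
    """Return the aspect that a pitfall category belongs to (case-insensitive)."""
    wanted = pitfall.strip().lower()
    for aspect, pitfalls in TAXONOMY.items():
        if wanted in (p.lower() for p in pitfalls):
            return aspect
    raise KeyError(f"unknown pitfall category: {pitfall!r}")
```

For example, `aspect_of("Data Snooping")` returns `"System Design and Learning"`, and an unknown category raises `KeyError`.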

Please feel free to send a pull request to add papers and relevant content that are not listed here. We have uploaded our complete paper list, with detailed review information, to Google Drive.
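Entries in the paper lists follow roughly the pattern `Title (Year), Venue, Authors. [pdf]`. Before opening a pull request, a contributor could check a new line against that pattern with something like the sketch below. The regular expression, the `Entry` type, and the `parse_entry` helper are assumptions made for illustration; the repository does not ship or require such a check.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for lines such as:
#   "Some Title (2023), ICSE, A Author, B Author, et al. [pdf]"
ENTRY_RE = re.compile(
    r"^(?P<title>.+) \((?P<year>\d{4})\), "
    r"(?P<venue>[^,]+), (?P<authors>.+?)\.? ?\[pdf\]$"
)


class Entry(NamedTuple):
    title: str
    year: int
    venue: str
    authors: str  # trailing period (if any) is stripped


def parse_entry(line: str, min_year: int = 2000, max_year: int = 2100) -> Optional[Entry]:
    """Parse one list entry; return None if the line does not match
    the pattern or the year falls outside [min_year, max_year]."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    year = int(match.group("year"))
    if not (min_year <= year <= max_year):
        return None
    return Entry(
        title=match.group("title"),
        year=year,
        venue=match.group("venue").strip(),
        authors=match.group("authors").strip(),
    )
```

A line that parses to `None` is worth a second look before submitting, for instance when the year field holds something other than a plausible four-digit year.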

Content

Papers

Data Collection and Labeling

Unbalanced Distribution

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Does data sampling improve deep learning-based vulnerability detection? Yeas! and Nays! (2023), ICSE, X Yang, et al. [pdf]
On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]

Label Errors

Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets (2022), ASE, Z Li, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, et al. [pdf]
Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper) (2023), ISSTA, X Nie, et al. [pdf]

Data Noise

Slice-Based Code Change Representation Learning (2023), SANER, F Zhang, et al. [pdf]
Are we building on the rock? on the importance of data preprocessing for code summarization (2022), FSE, L Shi, et al. [pdf]
Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? (2018), ASE, Z Liu, et al. [pdf]

System Design and Learning

Data Snooping

AutoTransform: automated code transformation to support modern code review process (2022), ICSE, P Thongtanunam, C Pornprasit, C Tantithamthavorn. [pdf]
Can Neural Clone Detection Generalize to Unseen Functionalities? (2021), ASE, C Liu, et al. [pdf]
CD-VulD: Cross-Domain Vulnerability Discovery Based on Deep Domain Adaptation (2020), TDSC, S Liu, et al. [pdf]
Deep just-in-time defect prediction: how far are we? (2021), ISSTA, Z Zeng, et al. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, et al. [pdf]
Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models (2023), ICSE, S Gao, et al. [pdf]
Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
Syntax and Domain Aware Model for Unsupervised Program Translation (2023), ICSE, F Liu, J Li, L Zhang. [pdf]
How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
On the Evaluation of Neural Code Summarization (2022), ICSE, E Shi, Y Wang, L Du, et al. [pdf]

Spurious Correlations

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
Explaining mispredictions of machine learning models using rule induction (2021), FSE, J Cito, I Dillig, S Kim, et al. [pdf]
Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching (2021), TOSEM, D Zou, Y Zhu, S Xu, et al. [pdf]
Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code (2021), ASE, M Paltenghi, M Pradel. [pdf]
Vulnerability detection with fine-grained interpretations (2021), FSE, Y Li, S Wang, TN Nguyen. [pdf]
What do they capture? a structural analysis of pre-trained language models for source code (2022), ICSE, Y Wan, W Zhao, H Zhang, et al. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond (2023), ISSTA, E Shi, Y Wang, H Zhang, et al. [pdf]

Inappropriate Model Design

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking (2022), TSE, H Wang, P Ma, Y Yuan, et al. [pdf]
Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training (2022), TOSEM, Z Lin, G Li, J Zhang, et al. [pdf]
RepresentThemAll: A Universal Learning Representation of Bug Reports (2023), ICSE, S Fang, T Zhang, Y Tan, et al. [pdf]
Template-based Neural Program Repair (2023), ICSE, X Meng, X Wang, H Zhang, et al. [pdf]

Performance Evaluation

Inappropriate Baseline

Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), arxiv, Z Liu, K Liu, X Xia, et al. [pdf]

Inappropriate Evaluation Dataset

Deep Learning Based Program Generation From Requirements Text: Are We There Yet? (2020), TSE, H Liu, M Shen, J Zhu, et al. [pdf]
Generating realistic vulnerabilities via neural code editing: an empirical study (2022), FSE, Y Nong, Y Ou, M Pradel, et al. [pdf]

Low Reproducibility

An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]

Inappropriate Performance Measures

Deep Learning Based Vulnerability Detection: Are We There Yet? (2021), TSE, S Chakraborty, R Krishna, Y Ding, et al. [pdf]
Improving automatic source code summarization via deep reinforcement learning (2018), ASE, Y Wan, Z Zhao, M Yang, et al. [pdf]
Multi-task learning based pre-trained language model for code completion (2020), ASE, F Liu, G Li, Y Zhao, et al. [pdf]
On the Value of Oversampling for Deep Learning in Software Defect Prediction (2021), TSE, R Yedida, T Menzies. [pdf]
Patching as translation: the data and the metaphor (2020), ASE, Y Ding, B Ray, P Devanbu, et al. [pdf]
Reinforcement-Learning-Guided Source Code Summarization Using Hierarchical Attention (2020), TSE, W Wang, Y Zhang, Y Sui, et al. [pdf]
SynShine: Improved Fixing of Syntax Errors (2022), TSE, T Ahmed, NR Ledesma, P Devanbu. [pdf]
An empirical study of deep learning models for vulnerability detection (2023), ICSE, B Steenhoek, MM Rahman, R Jiles, et al. [pdf]
Revisiting Learning-based Commit Message Generation (2023), ICSE, J Dong, Y Lou, D Hao, et al. [pdf]
Tare: Type-Aware Neural Program Repair (2023), ICSE, Q Zhu, Z Sun, W Zhang, et al. [pdf]
How Effective Are Neural Networks for Fixing Security Vulnerabilities (2023), ISSTA, Y Wu, N Jiang, HV Pham, et al. [pdf]
Towards More Realistic Evaluation for Neural Test Oracle Generation (2023), ISSTA, Z Liu, K Liu, X Xia, et al. [pdf]
GitHub Copilot AI pair programmer: Asset or Liability? (2023), JSS, AM Dakhel, V Majdinasab, A Nikanjam, et al. [pdf]

Deployment and Maintenance

Real-World Constraints

Examining Zero-Shot Vulnerability Repair with Large Language Models (2023), S&P, H Pearce, B Tan, B Ahmad, et al. [pdf]
A Performance-Sensitive Malware Detection System Using Deep Learning on Mobile Devices (2020), TIFS, R Feng, S Chen, X Xie, et al. [pdf]
Diet code is healthy: simplifying programs for pre-trained models of code (2022), FSE, Z Zhang, H Zhang, B Shen, et al. [pdf]
When Code Completion Fails: A Case Study on Real-World Completions (2019), ICSE, VJ Hellendoorn, S Proksch, HC Gall, et al. [pdf]
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants (2023), arxiv, G Sandoval, H Pearce, T Nys, et al. [pdf]
Grounded Copilot: How Programmers Interact with Code-Generating Models (2023), OOPSLA1, S Barke, MB James, N Polikarpova. [pdf]
LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning (2023), arxiv, J Lu, L Yu, X Li, et al. [pdf]
Compressing Pre-trained Models of Code into 3 MB (2022), ASE, J Shi, Z Yang, B Xu, et al. [pdf]

Attack Threats

You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion (2021), USENIX Security, R Schuster, C Song, E Tromer, et al. [pdf]
Adversarial Robustness of Deep Code Comment Generation (2022), TOSEM, Y Zhou, X Zhang, J Shen, et al. [pdf]
An extensive study on pre-trained models for program understanding and generation (2022), ISSTA, Z Zeng, H Tan, H Zhang, et al. [pdf]
Generating Adversarial Examples for Holding Robustness of Source Code Processing Models (2020), AAAI, H Zhang, Z Li, G Li, et al. [pdf]
Semantic Robustness of Models of Source Code (2020), SANER, G Ramakrishnan, J Henkel, Z Wang, et al. [pdf]
You see what I want you to see: poisoning vulnerabilities in neural code search (2022), FSE, Y Wan, S Zhang, H Zhang, et al. [pdf]
Contrabert: Enhancing code pre-trained models via contrastive learning (2023), ICSE, S Liu, B Wu, X Xie, et al. [pdf]
On the robustness of code generation techniques: An empirical study on github copilot (2023), ICSE, A Mastropaolo, L Pascarella, E Guglielmi, et al. [pdf]
Two sides of the same coin: Exploiting the impact of identifiers in neural code comprehension (2023), ICSE, S Gao, C Gao, C Wang, et al. [pdf]
Multi-target Backdoor Attacks for Code Pre-trained Models (2023), ACL, Y Li, S Liu, K Chen, et al. [pdf]
Backdooring Neural Code Search (2023), ACL, W Sun, Y Chen, G Tao, et al. [pdf]
ReCode: Robustness Evaluation of Code Generation Models (2022), ACL, S Wang, Z Li, H Qian, et al. [pdf]
Natural Attack for Pre-trained Models of Code (2022), ICSE, Z Yang, J Shi, J He, et al. [pdf]
Coprotector: Protect open-source code against unauthorized training usage with data poisoning (2022), WWW, Z Sun, X Du, F Song, et al. [pdf]
On the Security Vulnerabilities of Text-to-SQL Models (2023), ISSRE, X Peng, Y Zhang, J Yang, et al. [pdf]

Security Concerns in Generated Code

Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions (2022), S&P, H Pearce, B Ahmad, B Tan, et al. [pdf]
Automated repair of programs from large language models (2023), ICSE, Z Fan, X Gao, M Mirchev, et al. [pdf]
Cctest: Testing and repairing code completion systems (2023), ICSE, Z Li, C Wang, Z Liu, et al. [pdf]
Analyzing Leakage of Personally Identifiable Information in Language Models (2023), S&P, N Lukas, A Salem, R Sim, et al. [pdf]
CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot (2023), USENIX Security, L Niu, S Mirza, Z Maradni, et al. [pdf]

Language Models for Code Intelligence

Decoder-only Models

GPT-1

Release Date: 2018-06
Institute: OpenAI
Paper: Improving Language Understanding by Generative Pre-Training

GPT-2

Release Date: 2019-02
Institute: OpenAI
Paper: Language Models are Unsupervised Multitask Learners

GPT-3

Release Date: 2020-05
Institute: OpenAI
Paper: Language models are few-shot learners

Codex

Release Date: 2021-08
Institute: OpenAI
Paper: Evaluating Large Language Models Trained on Code

GPT-NeoX

Release Date: 2022-04
Access: ckpt
Paper: GPT-NeoX-20B: An Open-Source Autoregressive Language Model

GPT-Neo

Release Date: 2021-03
Source: Github

CodeGen

Release Date: 2022-03
Paper: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis

InstructGPT

Release Date: 2022-01
Paper: Training language models to follow instructions with human feedback

CodeGeeX

Title: CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
Year: 2023
Paper: Link

GPT-J

Release Date: 2021-06
Access: GPT-J-6B, GPT4All-J
Paper: GPT-J-6B: 6B JAX-Based Transformer

LLaMA

Release Date: 2023-02
Institute: Meta
Paper: LLaMA: Open and Efficient Foundation Language Models

ChatGPT

Release Date: 2022-11
Access: demo, api
Origin: Blog

StableLM-Alpha

Release Date: 2023-04
Access: StableLM-Alpha
Paper: Stability AI Launches the First of its StableLM Suite of Language Models

InCoder

Title: InCoder: A Generative Model for Code Infilling and Synthesis
Authors: Daniel Fried et al.
Release Date: 2023
Paper: Link

GPT-4

Release Date: 2023-03
Institute: OpenAI
Paper: GPT-4 Technical Report

PaLM

Release Date: 2022-04
Institute: Google
Paper: PaLM: Scaling Language Modeling with Pathways

Vicuna

Release Date: 2023-03
Blog: Link

Flan-UL2

Release Date: 2023-03
Institute: Google
Blog: Flan-UL2 Blog

CPM-Bee

Release Date: 2022-10
Institute: OpenBMB
Paper: CPM: A Large-scale Generative Chinese Pre-trained Language Model

MT-NLG

Release Date: 2022-01
Institute: Microsoft and NVIDIA
Paper: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

GLM

Release Date: 2022-10
Institute: Tsinghua University
Paper: GLM-130B: AN OPEN BILINGUAL PRE-TRAINED MODEL

YaLM

Release Date: 2022-06
Institute: Yandex
Blog: YaLM Blog

Alpaca

Release Date: 2023-03
Institute: Stanford University
Access: Alpaca GitHub

RWKV-4

Release Date: 2022-09
Institute: Independent (BlinkDL)
Access: RWKV-4 GitHub

Sparrow

Release Date: 2022-09
Institute: DeepMind
Paper: Improving alignment of dialogue agents via targeted human judgements

Falcon

Release Date: 2023-05
Institute: Technology Innovation Institute (TII)
Access: Falcon Homepage

Code Llama

Release Date: 2023
Institute: Meta (Facebook)
Paper: Code Llama: Open Foundation Models for Code

RedPajama-INCITE

Release Date: Not specified
Blog: RedPajama-INCITE Blog

DeciCoder-1B

Release Date: 2023-08
Institute: Deci AI
Blog: DeciCoder Blog

OpenLLaMA

Release Date: 2023-05
Institute: Not specified
Access: OpenLLaMA Access

CodeGPT

Release Date: 2021
Paper: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Encoder-only Models

BERT

Release Date: 2018-10
Institute: Google
Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

ALBERT

Release Date: 2019
Paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

RoBERTa

Release Date: 2019
Paper: RoBERTa: A Robustly Optimized BERT Pretraining Approach

CodeBERT

Release Date: 2020-04
Institute: Microsoft
Paper: CodeBERT: A Pre-Trained Model for Programming and Natural Languages

GraphCodeBERT

Release Date: 2020-09
Access: GraphCodeBERT
Paper: GraphCodeBERT: Pre-training Code Representations with Data Flow

Encoder-decoder Models

AlphaCode

Release Date: 2022-02
Access: AlphaCode
Institute: DeepMind

T5

Release Date: 2019
Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Checkpoint: Link

CodeT5

Release Date: 2021
Access: CodeT5
Paper: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

CodeT5+

Release Date: 2023-05
Access: CodeT5+
Paper: CodeT5+: Open Code Large Language Models for Code Understanding and Generation

UniXcoder

Release Date: 2022
Access: UniXcoder on Hugging Face
Paper: UniXcoder: Unified Cross-Modal Pre-training for Code Representation

PLBART

Release Date: 2021
Paper: Unified Pre-training for Program Understanding and Generation

CodeReviewer

Release Date: 2022
Access: CodeReviewer
Paper: Automating Code Review Activities by Large-Scale Pre-training

Relevant Surveys on LM4Code

Large Language Models for Software Engineering: Survey and Open Problems, 2023, paper
Large Language Models for Software Engineering: A Systematic Literature Review, 2023, paper
A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends, 2023, paper
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code, 2023, paper
Software testing with large language model: Survey, landscape, and vision, 2023, paper
Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey, 2023, paper
Generative Artificial Intelligence for Software Engineering--A Research Agenda, 2023, paper
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly, 2023, paper
Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps, 2023, paper
Large language models meet NL2Code: A survey, 2023, paper
A Survey on Pretrained Language Models for Neural Code Intelligence, 2022, paper

General Surveys on AI4SE

A systematic literature review on the use of deep learning in software engineering research, TOSEM 2022, paper
A survey on deep learning for software engineering, CSUR 2022, paper
Software engineering for AI-based systems: a survey, TOSEM 2021, paper
Machine/deep learning for software engineering: A systematic literature review, TSE 2022, paper
Machine Learning Applied to Software Testing: A Systematic Mapping Study, 2019, paper
A survey of machine learning for big code and naturalness, CSUR 2018, paper

General Surveys on LLM

Large Language Models: A Comprehensive Survey of Applications, Challenges, Limitations, and Future Prospects, 2023, paper
A survey of large language models, 2023, paper
A Survey on Evaluation of Large Language Models, 2023, paper
Recent advances in natural language processing via large pre-trained language models: A survey, CSUR 2023, paper
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4, 2023, paper
Challenges and Applications of Large Language Models: A Survey, 2023, paper
Harnessing the power of llms in practice: A survey on chatgpt and beyond, 2023, paper
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT, 2023, paper

Repositories and Resources for LM4Code

LLM4SE: Large Language Models for Software Engineering
- Repository
- This repository is associated with prominent software engineering conferences like ICSE, FSE, and ASE.
Awesome-Code-LLM
- Repository
- The repository for a survey that comprehensively reviews LLM research for code. It is a curated list of language modeling research for code and related datasets; works in each category are ordered chronologically.
awesome-ai4code-papers
- Repository
- A collection of recent papers, benchmarks and datasets of AI4Code domain.
ml4code
- Repository
- Research on machine learning for source code.
awesome-machine-learning-on-source-code
- Repository
- Cool links & research papers related to Machine Learning applied to source code (MLonCode)
saltudelft/ml4se
- Repository
- A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
CUHK-ARISE/ml4code-dataset
- Repository
- A collection of datasets for machine learning for big code

Repositories and Resources for LLM

Awesome-LLM4Tool: A Curated List of Resources for LLM Tools
- Repository
- Offers a curated list of papers, repositories, tutorials, and resources related to large language models for tools.
LLMsPracticalGuide: A Curated List of Practical Resources
- Repository
- It includes an evolutionary tree of modern Large Language Models to trace the development over the years
Hannibal046/Awesome-LLM
- Repository
- Awesome-LLM: a curated list of Large Language Model
awesome-decentralized-llm
- Repository
- Collection of LLM resources that can be used to build products you can "own" or to perform reproducible research.
RUCAIBox/LLMSurvey
- Repository
- The official GitHub page for the survey paper "A Survey of Large Language Models".
tensorchord/Awesome-LLMOps
- Repository
- An awesome & curated list of best LLMOps tools for developers
luban-agi/Awesome-Domain-LLM
- Repository
- A curated list of domain-specific large language models in Chinese
underlines/awesome-ml
- Repository
- Curated list of useful LLM / Analytics / Datascience resources

Benchmarks

Bug Repair

Defects4J

Release year: 2014
Paper: "Defects4J: A Database of Existing Faults to Enable Controlled Testing Studies for Java Programs"

ManyBugs/IntroClass

Release year: 2015
Paper: "The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs"

BugAID

Release year: 2016
Paper: "Discovering Bug Patterns in JavaScript"

CoCoNuT

Release year: 2020
Paper: "CoCoNuT: combining context-aware neural translation models using ensemble for program repair"

QuixBugs

Release year: 2017
Paper: "QuixBugs: a multi-lingual program repair benchmark set based on the quixey challenge"

Bugs.jar

Release year: 2018
Paper: "Bugs.jar: a large-scale, diverse dataset of real-world Java bugs"

BugsInPy

Release year: 2020
Paper: "BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies"

DeepFix

Release year: 2017
Paper: "DeepFix: Fixing Common C Language Errors by Deep Learning"

Code Generation/Synthesis

CONCODE

Release year: 2018
Paper: "Mapping Language to Code in Programmatic Context"

HumanEval

Release year: 2021
Paper: "Evaluating Large Language Models Trained on Code"

MBPP/MathQA-Python

Release year: 2021
Paper: "Program Synthesis with Large Language Models"

Code Summarization

CODE-NN

Release year: 2016
Paper: "Summarizing Source Code using a Neural Attention Model"

TL-CodeSum

Release year: 2018
Paper: "Summarizing Source Code with Transferred API Knowledge"

CodeSearchNet

Release year: 2019
Paper: "CodeSearchNet Challenge: Evaluating the State of Semantic Code Search"

Citation

If you find this repository useful, please cite our papers:

@article{she2023pitfalls,
  title={Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey},
  author={She, Xinyu and Liu, Yue and Zhao, Yanjie and He, Yiling and Li, Li and Tantithamthavorn, Chakkrit and Qin, Zhan and Wang, Haoyu},
  journal={arXiv preprint arXiv:2310.17903},
  year={2023}
}

@article{hou2023large,
  title={Large language models for software engineering: A systematic literature review},
  author={Hou, Xinyi and Zhao, Yanjie and Liu, Yue and Yang, Zhou and Wang, Kailong and Li, Li and Luo, Xiapu and Lo, David and Grundy, John and Wang, Haoyu},
  journal={arXiv preprint arXiv:2308.10620},
  year={2023}
}
