
Awesome Instruction Learning


🔥🔥🔥 An awesome reading list of Instruction Tuning and Following, including papers and datasets.

👉 Explore our latest survey update! Feel free to dive in and discover the improvements we've made 👀 🤗 : Latest Survey


❤️ Contribution

This repository is currently maintained by Renze Lou @ PennState and Kai Zhang @ OhioState. We appreciate any contributions ❤️.

If you have any suggestions or find any missed papers, feel free to reach out or submit a pull request:

  1. Use the following markdown format (see the filled-in example after this list):
**Paper Title.** *Author 1, Author 2, and Author 3.* <ins>Conference/Journal/Preprint</ins> Year. [[pdf](link)]; [[other resources](link)].
  2. If a preprint paper has multiple versions, please use the earliest submission year.

  3. Display the papers in descending order of year (the latest first).
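
For instance, here is our own survey entry written in the template above (the `link` placeholders are kept as-is):

**A Comprehensive Survey on Instruction Following.** *Renze Lou, Kai Zhang, and Wenpeng Yin.* <ins>Preprint</ins> 2023. [[pdf](link)]; [[paper list](link)].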

🥳 Citation

Find this repository helpful? 😊😊😊

Please consider citing our paper. 👇👇👇

@article{lou2023instruction,
  title={A Comprehensive Survey on Instruction Following},
  author={Lou, Renze and Zhang, Kai and Yin, Wenpeng},
  journal={arXiv preprint arXiv:2303.10475},
  year={2023}
}

🔍 Table of Contents

  1. 💁🏽‍♀️ Introduction
  2. 🎓 Surveys and Tutorials
  3. 📚 Corpora
  4. 🗂️ Taxonomy
  5. 📊 Analyses
  6. 🤖 Applications
  7. 📖 Extended Reading

1. 💁🏽‍♀️ Introduction

Why instruction-driven learning instead of example-driven learning?

  • 👉 Affordable. Conventional example-driven supervised learning usually requires extensive labeled examples for each downstream task 💰, whereas instruction learning may need only one instruction and a few examples per task 🤩.
  • 👉 One model, all tasks. An ideal AI system should be able to quickly understand and handle various new tasks 💫.
  • 👉 A promising research direction. Traditional example-driven supervised learning uses labeled instances to represent the task semantics, i.e., it trains models to recover the original task meaning by observing numerous examples. So why not directly use the task instruction, which already captures the essential task semantics?

2. 🎓 Surveys and Tutorials

We use the label comprehensive to denote papers with a more comprehensive perspective, while other papers focus on a specific kind of in-context instruction, such as prompts, few-shot in-context demonstrations, or CoT reasoning.

  1. A Comprehensive Survey on Instruction Following. Renze Lou, Kai Zhang, and Wenpeng Yin. Preprint 2023. [pdf]; [paper list]. comprehensive

  2. Learning from Task Instructions. Wenpeng Yin, Qinyuan Ye, Pengfei Liu, Xiang Ren, and Hinrich Schütze. EMNLP Tutorial 2023. [pdf]. comprehensive

  3. Natural Language Reasoning, A Survey. Fei Yu, Hongbo Zhang, and Benyou Wang. Preprint 2023. [pdf]; [paper list]. reasoning

  4. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. ACM Computing Surveys 2023. [pdf]; [website]. prompt

  5. A Survey on In-context Learning. Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. Preprint 2022. [pdf]. in-context demonstrations

  6. Towards Reasoning in Large Language Models: A Survey. Jie Huang, and Kevin Chen-Chuan Chang. Preprint 2022. [pdf]; [paper list]. reasoning

  7. Reasoning with Language Model Prompting: A Survey. Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, and Huajun Chen. Preprint 2022. [pdf]; [paper list]. reasoning

3. 📚 Corpora

High-quality datasets are a key factor in successful instruction tuning, so we place the "Corpora" section up front to emphasize their importance.

We carefully designed the following table to be easy to reference, and we keep it up to date. We hope it can contribute to future research on instruction tuning. 🤗

(Some rows come from Longpre et al., thanks for their great work ❤️.)

| Name | Release | Data/Code | #Tasks | #Ins. (K) | Annotator |
| --- | --- | --- | --- | --- | --- |
| UnifiedQA | 05/2020 | Link | 46 | 750 | ✍ Human |
| CrossFit | 04/2021 | Link | 159 | 71,000 | ✍ Human |
| Natural Inst. v1 | 04/2021 | Link | 61 | 620 | ✍ Human |
| Flan 2021 | 09/2021 | Link | 62 | 4,400 | ✍ Human |
| P3 | 10/2021 | Link | 62 | 12,000 | ✍ Human |
| MetaICL | 10/2021 | Link | 142 | 3,500 | ✍ Human |
| ExMix | 11/2021 | Link | 107 | 500 | ✍ Human |
| SuperNI (Natural Inst. v2) | 04/2022 | Link | 1,613 | 5,000 | ✍ Human |
| GLM | 10/2022 | Link | 77 | 12,000 | ✍ Human |
| Flan 2022 | 10/2022 | Link | 1,836 | 15,000 | ✍ Human |
| xP3 | 11/2022 | Link | 71 | 81,000 | ✍ Human |
| Unnatural Inst. | 12/2022 | Link | 117 | 64 | 🤖 InstructGPT002 (text-davinci-002) |
| Self-Instruct | 12/2022 | Link | / | 82 | 🤖 GPT-3 (davinci) |
| OPT-IML | 12/2022 | / | 2,207 | 18,000 | ✍ Human |
| Alpaca | 03/2023 | Link | / | 52 | 🤖 InstructGPT003 (text-davinci-003) |
| Baize | 04/2023 | Link | / | 100 | 🤖 ChatGPT |
| Koala | 04/2023 | / | / | / | ✍ Human + 🤖 ChatGPT |
| GPT4All | 04/2023 | Link | / | 808 | ✍ Human + 🤖 ChatGPT |
| Alpaca-gpt4 | 04/2023 | Link | / | 113 | 🤖 GPT-4 (gpt-4) |
| Vicuna | 04/2023 | / | / | 76 | ✍ Human + 🤖 ChatGPT |
| Dolly | 04/2023 | Link | / | 15 | ✍ Human |
| Oasst | 04/2023 | Link | / | 84 | ✍ Human |
| LongForm | 04/2023 | Link | / | 27 | ✍ Human + 🤖 InstructGPT003 (text-davinci-003) |
| Symbolic-Instruct | 04/2023 | Link | / | 796 | ✍ Human + synthetic examples |
| LaMini | 04/2023 | Link | / | 2,580 | 🤖 ChatGPT |
| WizardLM | 04/2023 | Link | / | 196 | 🤖 ChatGPT |
| COEDIT | 05/2023 | Link | / | 82 | ✍ Human |
| UltraChat | 05/2023 | Link | / | 1,500 | 🤖 ChatGPT |
| CoT Collection | 05/2023 | Link | 1,060 | 1,880 | 🤖 Codex |
| Dynosaur | 05/2023 | Link | 5,740 | 801 | 🤖 ChatGPT |
| MUFFIN | 10/2023 | Link | / | 68 | 🤖 ChatGPT + 🤖 GPT-4 + ✍ Human |
| Dynamics-of-Instruction | 10/2023 | Link | / | 40 | ✍ Human |
| CoachLM | 11/2023 | Link | / | 2 | ✍ Human |
| DEITA | 12/2023 | Link | / | 10 | 🤖 ChatGPT |
| WaveCoder | 12/2023 | Link | 4 (code-related tasks) | 20 | 🤖 ChatGPT + 🤖 GPT-4 |
| Conifer | 04/2024 | Link | / | 13 | 🤖 GPT-4 |

4. 🗂️ Taxonomy

In our paper, we divide textual instructions into three categories.

4.1 Entailment-oriented Instruction


Entailment-oriented instruction regards the task input as the premise and recasts each candidate task output as a hypothesis. It unifies conventional classification problems into a textual entailment paradigm.
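
To make the reformulation concrete, below is a minimal sketch; the label verbalizations and the toy overlap scorer are illustrative stand-ins for a real NLI model, not the method of any paper listed here.

```python
# Minimal sketch of entailment-oriented instruction: classification as NLI.
# LABEL_HYPOTHESES and toy_entailment_score are illustrative; a real system
# scores each (premise, hypothesis) pair with an NLI model (e.g., MNLI-tuned).

LABEL_HYPOTHESES = {
    "positive": "This example expresses a positive sentiment.",
    "negative": "This example expresses a negative sentiment.",
}

def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for P(entailment | premise, hypothesis) from an NLI model."""
    overlap = set(premise.lower().split()) & set(hypothesis.lower().split())
    return len(overlap) / len(hypothesis.split())

def classify_via_entailment(task_input: str) -> str:
    # The task input becomes the premise; each label's verbalization becomes
    # a hypothesis. Predict the label whose hypothesis is most entailed.
    return max(
        LABEL_HYPOTHESES,
        key=lambda label: toy_entailment_score(task_input, LABEL_HYPOTHESES[label]),
    )

print(classify_via_entailment("The movie was a positive surprise."))
```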

  1. A Universal Discriminator for Zero-Shot Generalization. Haike Xu, Zongyu Lin, Jing Zhou, Yanan Zheng, and Zhilin Yang. ACL 2023. [pdf]; [code].

  2. ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining. Ranran Haoran Zhang, Aysa Xuemo Fan, and Rui Zhang. EACL 2023. [pdf]; [code].

  3. OpenStance: Real-world Zero-shot Stance Detection. Hanzi Xu, Slobodan Vucetic, and Wenpeng Yin. CoNLL 2022. [pdf]; [code].

  4. Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference. Bangzheng Li, Wenpeng Yin, and Muhao Chen. TACL 2022. [pdf]; [code].

  5. Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning. Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, and Eneko Agirre. Findings of NAACL 2022. [pdf]; [code].

  6. Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction. Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, and Eneko Agirre. EMNLP 2021. [pdf]; [code].

  7. Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections. Ruiqi Zhong, Kristy Lee, Zheng Zhang, and Dan Klein. Findings of EMNLP 2021. [pdf]; [code].

  8. Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System. Congying Xia, Wenpeng Yin, Yihao Feng, and Philip Yu. NAACL 2021. [pdf]; [code].

  9. ExpBERT: Representation Engineering with Natural Language Explanations. Shikhar Murty, Pang Wei Koh, and Percy Liang. ACL 2020. [pdf]; [code].

  10. Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach. Wenpeng Yin, Jamaal Hay, and Dan Roth. EMNLP 2019. [pdf]; [website].

4.2 PLM-oriented Instruction


PLM-oriented instruction (i.e., prompt) aims to construct a cloze-style input that steers pre-trained language models (PLMs) to produce responses. Here, we display several representative works on PLM-oriented instruction learning. For more works, please refer to this repository and this survey.
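
For intuition, here is a minimal sketch of cloze-style prompting with a verbalizer; the template, label words, and toy scorer are illustrative assumptions rather than any specific paper's method.

```python
# Minimal sketch of PLM-oriented instruction: a cloze template plus a
# verbalizer. A real system fills [MASK] with a masked LM and compares the
# logits of the candidate label words; toy_mask_scores is a stand-in.

TEMPLATE = "Review: {text} Overall, it was [MASK]."
LABEL_WORDS = {"positive": "great", "negative": "terrible"}

def build_cloze_prompt(text: str) -> str:
    # Wrap the raw task input into the cloze-style instruction.
    return TEMPLATE.format(text=text)

def toy_mask_scores(prompt: str) -> dict:
    """Toy stand-in for a masked LM's scores over candidate label words."""
    return {w: float(w in prompt.lower()) + 0.5 for w in LABEL_WORDS.values()}

def classify_via_cloze(text: str) -> str:
    scores = toy_mask_scores(build_cloze_prompt(text))
    # Map the best-scoring label word back to its class label.
    return max(LABEL_WORDS, key=lambda label: scores[LABEL_WORDS[label]])

print(build_cloze_prompt("Honestly, it was great fun."))
print(classify_via_cloze("Honestly, it was great fun."))
```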

  1. How Does In-Context Learning Help Prompt Tuning? Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, and Mohit Iyyer. Preprint 2023. [pdf].

  2. Demystifying Prompts in Language Models via Perplexity Estimation. Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, and Luke Zettlemoyer. Preprint 2022. [pdf].

  3. RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning. Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, and et al. EMNLP 2022. [pdf]; [code].

  4. PPT: Pre-trained Prompt Tuning for Few-shot Learning. Yuxian Gu, Xu Han, Zhiyuan Liu, and Minlie Huang. ACL 2022. [pdf]; [code].

  5. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. ACL 2022. [pdf]; [code].

  6. KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction. Xiang Chen, Ningyu Zhang, Xin Xie, and et al. WWW 2022. [pdf]; [code].

  7. GPT Understands, Too. Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. Preprint 2021. [pdf]; [code].

  8. Few-Shot Text Generation with Natural Language Instructions. Timo Schick and Hinrich Schütze. EMNLP 2021. [pdf]; [code].

  9. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. Timo Schick and Hinrich Schütze. NAACL 2021. [pdf]; [code].

  10. Learning How to Ask: Querying LMs with Mixtures of Soft Prompts. Guanghui Qin and Jason Eisner. NAACL 2021. [pdf]; [code].

  11. Prefix-Tuning: Optimizing Continuous Prompts for Generation. Xiang Lisa Li and Percy Liang. ACL 2021. [pdf]; [code].

  12. Making Pre-trained Language Models Better Few-shot Learners. Tianyu Gao, Adam Fisch, and Danqi Chen. ACL 2021. [pdf]; [code].

  13. Template-Based Named Entity Recognition Using BART. Leyang Cui, Yu Wu, Jian Liu, Sen Yang, and Yue Zhang. Findings of ACL 2021. [pdf]; [code].

  14. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. Timo Schick and Hinrich Schütze. EACL 2021. [pdf]; [code].

  15. Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Preprint 2019. [pdf].

4.3 Human-oriented Instruction


Human-oriented instruction is originally designed for humans to understand a task and annotate data, such as the Amazon MTurk instructions, which provide sufficient information about the task (e.g., a detailed definition).
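
For intuition, a human-oriented instruction typically bundles a task definition with worked examples before the actual input. Below is a minimal sketch; the field names and task content are illustrative (schemas like SuperNI's definition/positive-examples format are similar in spirit).

```python
# Minimal sketch of a human-oriented instruction: a crowdsourcing-style task
# description with a definition and worked examples, serialized into a single
# prompt. Field names and task content are illustrative.

instruction = {
    "definition": "Given a product review, decide whether the reviewer "
                  "recommends the product. Answer 'yes' or 'no'.",
    "positive_examples": [
        {"input": "Best headphones I have owned.", "output": "yes"},
        {"input": "Broke after two days. Avoid.", "output": "no"},
    ],
}

def serialize(instruction: dict, task_input: str) -> str:
    # Flatten the structured instruction into the prompt a model (or a human
    # annotator) would read before handling the new input.
    lines = ["Definition: " + instruction["definition"]]
    for i, ex in enumerate(instruction["positive_examples"], 1):
        lines.append(f"Example {i} Input: {ex['input']}")
        lines.append(f"Example {i} Output: {ex['output']}")
    lines.append("Now complete the task.")
    lines.append("Input: " + task_input)
    lines.append("Output:")
    return "\n".join(lines)

print(serialize(instruction, "Solid build quality and fair price."))
```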

  1. Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors. Kai Zhang, Bernal Jiménez Gutiérrez, and Yu Su. Findings of ACL 2023. [pdf]; [code].

  2. Symbol tuning improves in-context learning in language models. Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, and et al. Preprint 2023. [pdf].

  3. Small Models are Valuable Plug-ins for Large Language Models. Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, and Julian McAuley. Preprint 2023. [pdf]; [code].

  4. How Many Data Samples is an Additional Instruction Worth? Ravsehaj Singh Puri, Swaroop Mishra, Mihir Parmar, and Chitta Baral. Findings of EACL 2023. [pdf]; [code].

  5. In-Context Instruction Learning. Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, and Minjoon Seo. Preprint 2023. [pdf]; [code].

  6. InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis. Kevin Scaria, Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, and Chitta Baral. Preprint 2023. [pdf]; [code].

  7. HINT: Hypernetwork Instruction Tuning for Efficient Zero-Shot Generalisation. Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, and Matthew Peters. Preprint 2022. [pdf].

  8. Boosting Natural Language Generation from Instructions with Meta-Learning. Budhaditya Deb, Guoqing Zheng, and Ahmed Hassan Awadallah. Preprint 2022. [pdf].

  9. GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models. Archiki Prasad, Peter Hase, Xiang Zhou, and Mohit Bansal. Preprint 2022. [pdf]; [code].

  10. ConTinTin: Continual Learning from Task Instructions. Wenpeng Yin, Jia Li, and Caiming Xiong. ACL 2022. [pdf].

  11. InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning. Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, Maxine Eskenazi, and Jeffrey P. Bigham. EMNLP 2022. [pdf]; [code].

  12. Learning to Generate Task-Specific Adapters from Task Description. Qinyuan Ye and Xiang Ren. ACL 2021. [pdf]; [code].

  13. The Turking Test: Can Language Models Understand Instructions? Avia Efrat and Omer Levy. Preprint 2020. [pdf].

5. 📊 Analyses

5.1 Scale

The model scale and task scale are both found to be important for instruction-based fine-tuning: in general, a larger model brings greater generalization benefits, and so does a larger number of training tasks. However, some works have raised objections (e.g., Jang et al. and Wang et al.).

  1. Exploring the Benefits of Training Expert Language Models over Instruction Tuning. Joel Jang, Seungone Kim, Seonghyeon Ye, and et al. Preprint 2023. [pdf]; [code].

  2. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. Shayne Longpre, Le Hou, Tu Vu, and et al. Preprint 2023. [pdf]; [code]; [corpus].

  3. UL2: Unifying Language Learning Paradigms. Yi Tay, Mostafa Dehghani, Vinh Q. Tran, and et al. Preprint 2022. [pdf]; [checkpoint].

  4. OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization. Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, and et al. Preprint 2022. [pdf].

  5. Scaling Instruction-Finetuned Language Models. Hyung Won Chung, Le Hou, Shayne Longpre, and et al. Preprint 2022. [pdf]; [checkpoint].

  6. Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization. Yuxian Gu, Pei Ke, Xiaoyan Zhu, and Minlie Huang. EMNLP 2022. [pdf]; [code].

  7. Emergent Abilities of Large Language Models. Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, and et al. TMLR 2022. [pdf].

  8. Multitask Prompted Training Enables Zero-Shot Task Generalization. Victor Sanh, Albert Webson, Colin Raffel, and et al. ICLR 2022. [pdf]; [checkpoint]; [corpus].

  9. Finetuned Language Models are Zero-Shot Learners. Jason Wei, Maarten Bosma, Vincent Zhao, and et al. ICLR 2022. [pdf]; [code].

  10. Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks. Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, and Heng Ji. Preprint 2022. [pdf]; [code].

  11. ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization. Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, and Zhilin Yang. Preprint 2022. [pdf].

  12. The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, and Noah Constant. EMNLP 2021. [pdf]; [code].

5.2 Explainability

We list works that focus on the interpretability and reliability of instruction learning, i.e., explaining when and why instructions take effect.

  1. What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. Jane Pan, Tianyu Gao, Howard Chen, and Danqi Chen. Findings of ACL 2023. [pdf]; [code].

  2. REV: Information-Theoretic Evaluation of Free-Text Rationales. Hanjie Chen, Faeze Brahman, Xiang Ren, and et al. ACL 2023. [pdf]; [code].

  3. Interpretability at Scale: Identifying Causal Mechanisms in Alpaca. Zhengxuan Wu, Atticus Geiger, Christopher Potts, and Noah D. Goodman. Preprint 2023. [pdf]; [code].

  4. Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning. Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, and William Yang Wang. Preprint 2023. [pdf]; [code].

  5. The Learnability of In-Context Learning. Noam Wies, Yoav Levine, and Amnon Shashua. Preprint 2023. [pdf].

  6. Why think step-by-step? Reasoning emerges from the locality of experience. Ben Prystawski, and Noah D. Goodman. Preprint 2023. [pdf].

  7. Larger language models do in-context learning differently. Jerry Wei, Jason Wei, Yi Tay, and et al. Preprint 2023. [pdf].

  8. What learning algorithm is in-context learning? Investigations with linear models. Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. ICLR 2023. [pdf]; [code].

  9. Can language models learn from explanations in context? Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, and et al. Findings of EMNLP 2022. [pdf].

  10. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. EMNLP 2022. [pdf]; [code].

  11. Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts. Daniel Khashabi, Xinxi Lyu, Sewon Min, and et al. NAACL 2022. [pdf]; [code].

  12. Do Prompt-Based Models Really Understand the Meaning of Their Prompts? Albert Webson and Ellie Pavlick. NAACL 2022. [pdf]; [code].

  13. Reframing Instructional Prompts to GPTk’s Language. Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, and Hannaneh Hajishirzi. Findings of ACL 2022. [pdf]; [code].

  14. What Makes Good In-Context Examples for GPT-3? Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. ACL Workshop 2022. [pdf]; [code].

  15. Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. ACL 2022. [pdf].

  16. Calibrate Before Use: Improving Few-shot Performance of Language Models. Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. ICML 2021. [pdf]; [code].

5.3 Robustness and Safety

  1. Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection. Jun Yan, Vikas Yadav, Shiyang Li, and et al. Workshop @ NeurIPS 2023. [pdf].

  2. Evaluating the Zero-shot Robustness of Instruction-tuned Language Models. Jiuding Sun, Chantal Shaib, and Byron C. Wallace. Preprint 2023. [pdf].

  3. Poisoning Language Models During Instruction Tuning. Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. ICML 2023. [pdf]; [code].

  4. Multi-step Jailbreaking Privacy Attacks on ChatGPT. Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song. Preprint 2023. [pdf].

  5. More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models. Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Preprint 2023. [pdf]; [code].

  6. Robustness of Learning from Task Instructions. Jiasheng Gu, Hanzi Xu, Liangyu Nie, and Wenpeng Yin. Preprint 2022. [pdf].

  7. Learning from Task Descriptions. Orion Weller, Nicholas Lourie, Matt Gardner, and Matthew E. Peters. EMNLP 2020. [pdf]; [code]; [corpus].

5.4 Evaluation

Stop using old-school automatic metrics to evaluate your instruction-tuned system; try more advanced methods to evaluate it comprehensively!

  1. Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2. Hamish Ivison, Yizhong Wang, Valentina Pyatkin, and et al. Preprint 2023. [pdf]; [model&data].

  2. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. Yizhong Wang, Hamish Ivison, Pradeep Dasigi, and et al. NeurIPS Datasets and Benchmarks 2023. [pdf]; [code].

  3. Instruction-following Evaluation through Verbalizer Manipulation. Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, and Hongxia Jin. Preprint 2023. [pdf].

  4. INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models. Yew Ken Chia, Pengfei Hong, Lidong Bing, and Soujanya Poria. Preprint 2023. [pdf]; [code]; [leaderboard].

5.5 Negation

Negation expressions, such as "do not" and "avoid doing", are difficult for models to correctly understand and follow.

  1. Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts. Joel Jang, Seonghyeon Ye, and Minjoon Seo. ICML Workshop 2023. [pdf].

  2. Understanding by Understanding Not: Modeling Negation in Language Models. Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, and et al. NAACL 2021. [pdf]; [code].

5.6 Complexity

These papers focus on increasing the complexity of instructions to enhance model competence: the more complex the instruction data in the training mix, the more capable the resulting model.
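
As a rough illustration of this idea, the sketch below rewrites a seed instruction into progressively more complex variants, in the spirit of WizardLM's Evol-Instruct (listed below); the rewriting templates are illustrative assumptions.

```python
# Minimal sketch of complexity-evolving instruction data: a seed instruction
# is turned into progressively more demanding rewriting prompts. In a real
# pipeline an LLM answers each prompt, and its output becomes the next, more
# complex instruction. The templates here are illustrative.

EVOLVE_TEMPLATES = [
    "Add one more constraint to this instruction: {inst}",
    "Rewrite this instruction to require multi-step reasoning: {inst}",
    "Make this instruction more specific by adding a concrete scenario: {inst}",
]

def evolve(seed_instruction: str, rounds: int = 2) -> list:
    """Produce one rewriting prompt per evolution round."""
    prompts, current = [], seed_instruction
    for r in range(rounds):
        template = EVOLVE_TEMPLATES[r % len(EVOLVE_TEMPLATES)]
        prompt = template.format(inst=current)
        prompts.append(prompt)
        # Placeholder: feed `prompt` to an LLM and use its response here.
        current = prompt
    return prompts

for p in evolve("Summarize the following article."):
    print(p)
```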

  1. WizardLM: Empowering Large Language Models to Follow Complex Instructions. Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, and Daxin Jiang. Preprint 2023. [pdf]; [code].

  2. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, and Ahmed Awadallah. Preprint 2023. [pdf].

  3. A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment. Yingxiu Zhao, Bowen Yu, Binyuan Hui, Haiyang Yu, Fei Huang, Yongbin Li, and Nevin L. Zhang. Preprint 2023. [pdf]; [code].

5.7 Other Papers

  1. Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions. Mihir Parmar, Swaroop Mishra, Mor Geva, and Chitta Baral. EACL 2023. [pdf]; [code].
  2. Instruction Tuned Models are Quick Learners. Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, and et al. Preprint 2023. [pdf]; [code].
  3. Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel. NeurIPS 2022. [pdf]; [code].
  4. A Survey of NLP-Related Crowdsourcing HITs: what works and what does not. Jessica Huynh, Jeffrey Bigham, and Maxine Eskenazi. Preprint 2021. [pdf].

6. 🤖 Applications

6.1 Human-Computer Interaction

Instructions are used in various human-computer interaction (HCI) tasks, such as virtual assistants, chatbots, etc.

  1. Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing. Tuhin Chakrabarty, Vishakh Padmakumar, and He He. EMNLP 2022. [pdf]; [code].

  2. HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models. Swaroop Mishra, and Elnaz Nouri. Preprint 2022. [pdf].

  3. EditEval: An Instruction-Based Benchmark for Text Improvements. Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, and et al. Preprint 2022. [pdf]; [code]; [website].

  4. Communicating Natural Programs to Humans and Machines. Sam Acquaviva, Yewen Pu, Marta Kryven, and et al. NeurIPS Workshop 2022. [pdf]; [code].

  5. Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations. Toby Jia-Jun Li, Tom Mitchell, and Brad Myers. ACL Demo 2020. [pdf]; [code]; [video].

  6. Multi-Modal Interactive Task Learning from Demonstrations and Natural Language Instructions. Toby Jia-Jun Li. UIST 2020. [pdf]; [code].

  7. Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following. David Gaddy, and Dan Klein. ACL 2019. [pdf].

  8. VirtualHome: Simulating Household Activities via Programs. Xavier Puig, Kevin Ra, Marko Boben, and et al. CVPR 2018. [pdf]; [website].

  9. Natural Language Communication with Robots. Yonatan Bisk, Deniz Yuret, and Daniel Marcu. NAACL 2016. [pdf]; [website].

  10. Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World. Jayant Krishnamurthy, and Thomas Kollar. TACL 2013. [pdf]; [code].

  11. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. Yoav Artzi, and Luke Zettlemoyer. TACL 2013. [pdf].

  12. Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision. Joohyun Kim, and Raymond Mooney. EMNLP 2012. [pdf].

  13. A joint model of language and perception for grounded attribute learning. Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox. ICML 2012. [pdf].

  14. Learning to Interpret Natural Language Instructions. Monica Babeş-Vroman, James MacGlashan, Ruoyuan Gao, and et al. ACL Workshop 2012. [pdf].

  15. Fast Online Lexicon Learning for Grounded Language Acquisition. David Chen. ACL 2012. [pdf].

  16. Learning to Win by Reading Manuals in a Monte-Carlo Framework. S.R.K. Branavan, David Silver, and Regina Barzilay. ACL 2011. [pdf]; [website].

  17. Learning from natural instructions. Dan Goldwasser, and Dan Roth. IJCAI 2011. [pdf].

  18. Learning to Interpret Natural Language Navigation Instructions from Observations. David L. Chen and Raymond J. Mooney. AAAI 2011. [pdf].

  19. Approaching the Symbol Grounding Problem with Probabilistic Graphical Models. Stefanie Tellex, Thomas Kollar, Steven Dickerson, and et al. AAAI 2011. [pdf].

  20. Driving Semantic Parsing from the World’s Response. James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. CoNLL 2010. [pdf].

  21. Learning to Follow Navigational Directions. Adam Vogel, and Daniel Jurafsky. ACL 2010. [pdf].

  22. Reading between the Lines: Learning to Map High-Level Instructions to Commands. S.R.K. Branavan, Luke Zettlemoyer, and Regina Barzilay. ACL 2010. [pdf]; [website].

  23. Reading to Learn: Constructing Features from Semantic Abstracts. Jacob Eisenstein, James Clarke, Dan Goldwasser, and Dan Roth. EMNLP 2009. [pdf]; [website].

  24. Learning Semantic Correspondences with Less Supervision. Percy Liang, Michael Jordan, and Dan Klein. ACL 2009. [pdf].

  25. Reinforcement Learning for Mapping Instructions to Actions. S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay. ACL 2009. [pdf]; [website].

  26. Learning to sportscast: a test of grounded language acquisition. David L. Chen and Raymond J. Mooney. ICML 2008. [pdf].

  27. Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer. Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik. AAAI Workshop 2004. [pdf]; [website].

6.2 Data and Feature Augmentation

Some instructions (e.g., label explanations) are also used for automatic annotation (i.e., data augmentation) or for enriching features.
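
As a minimal illustration of annotation-by-explanation (in the spirit of training classifiers with natural language explanations, though the rule and data here are toy assumptions): an explanation is compiled into a simple labeling function and applied to unlabeled text to produce weak labels.

```python
# Minimal sketch: a natural-language label explanation compiled into a toy
# labeling rule for automatic annotation. Real systems parse the explanation
# into an executable labeling function; this cue-word rule is illustrative.

# Hypothetical explanation: "Label the pair as 'spouse' if the word
# 'husband' or 'wife' appears in the sentence."
CUE_WORDS = {"husband", "wife"}

def labeling_rule(sentence):
    # Weakly label a sentence; return None to abstain.
    tokens = set(sentence.lower().split())
    return "spouse" if tokens & CUE_WORDS else None

unlabeled = [
    "Ann and her husband Bob moved to Oslo.",
    "Ann met Bob at a conference.",
]
weak_labels = [(s, labeling_rule(s)) for s in unlabeled]
print(weak_labels)  # only the first sentence gets a weak 'spouse' label
```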

  1. One Embedder, Any Task: Instruction-Finetuned Text Embeddings. Hongjin Su, Weijia Shi, Jungo Kasai, and et al. Preprint 2022. [pdf]; [website].

  2. Prompt Consistency for Zero-Shot Task Generalization. Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. Findings of EMNLP 2022. [pdf]; [code].

  3. Teaching Machine Comprehension with Compositional Explanations. Qinyuan Ye, Xiao Huang, Elizabeth Boschee, and Xiang Ren. Findings of EMNLP 2020. [pdf]; [code].

  4. Learning from Explanations with Neural Execution Tree. Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, and Xiang Ren. ICLR 2020. [pdf]; [website].

  5. Training Classifiers with Natural Language Explanations. Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, and Christopher Ré. ACL 2018. [pdf]; [code].

  6. Zero-shot Learning of Classifiers from Natural Language Quantification. Shashank Srivastava, Igor Labutov, and Tom Mitchell. ACL 2018. [pdf].

  7. Joint Concept Learning and Semantic Parsing from Natural Language Explanations. Shashank Srivastava, Igor Labutov, and Tom Mitchell. EMNLP 2017. [pdf].

6.3 General-purpose Language Models

General-purpose language models, e.g., ChatGPT, which can align nicely with human values, are among the most attractive applications of instruction learning.

  1. Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, and et al. Preprint 2023. [pdf].

  2. GPT-4 Technical Report. OpenAI. Preprint 2023. [pdf]; [blog].

  3. The Wisdom of Hindsight Makes Language Models Better Instruction Followers. Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, and Joseph E. Gonzalez. Preprint 2023. [pdf]; [code].

  4. Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models. Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, and Bryan Catanzaro. Preprint 2023. [pdf].

  5. Training language models to follow instructions with human feedback. Long Ouyang, Jeffrey Wu, Xu Jiang, and et al. NeurIPS 2022. [pdf].

6.4 Other Papers

  1. GPTScore: Evaluate as You Desire. Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, and Pengfei Liu. Preprint 2023. [pdf]; [code].

  2. MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning. Zhiyang Xu, Ying Shen, and Lifu Huang. Preprint 2022. [pdf].

  3. Task-aware Retrieval with Instructions. Akari Asai, Timo Schick, Patrick Lewis, and et al. Preprint 2022. [pdf]; [code].

  4. UnifiedABSA: A Unified ABSA Framework Based on Multi-task Instruction Tuning. Zengzhi Wang, Rui Xia, and Jianfei Yu. Preprint 2022. [pdf].

  5. In-Context Learning for Few-Shot Dialogue State Tracking. Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, and Mari Ostendorf. Findings of EMNLP 2022. [pdf]; [code].

  6. Few-shot Learning with Multilingual Language Models. Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, and et al. EMNLP 2022. [pdf]; [code].

  7. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. Tianbao Xie, Chen Henry Wu, Peng Shi, and et al. EMNLP 2022. [pdf]; [code]; [website].

  8. In-BoXBART: Get Instructions into Biomedical Multi-Task Learning. Mihir Parmar, Swaroop Mishra, Mirali Purohit, Man Luo, M. Hassan Murad, and Chitta Baral. Findings of NAACL 2022. [pdf]; [code].

7. 📖 Extended Reading

We also share some other awesome papers that might inspire future work.

7.1 Instruction Induction

  1. Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners. Seonghyeon Ye, Doyoung Kim, Joel Jang, Joongbo Shin, and Minjoon Seo. Preprint 2022. [pdf]; [code].

  2. Instruction Induction: From Few Examples to Natural Language Task Descriptions. Or Honovich, Uri Shaham, Samuel R. Bowman, and Omer Levy. Preprint 2022. [pdf]; [code].

  3. Learning to Decompose and Organize Complex Tasks. Yi Zhang, Sujay Kumar Jauhar, Julia Kiseleva, Ryen White, and Dan Roth. NAACL 2021. [pdf]; [corpus].

  4. Analogous Process Structure Induction for Sub-event Sequence Prediction. Hongming Zhang, Muhao Chen, Haoyu Wang, Yangqiu Song, and Dan Roth. EMNLP 2020. [pdf]; [code].

7.2 ChatGPT-related Papers

Nowadays, ChatGPT is a superstar 🌟 in the NLP community. Since there is no official paper for ChatGPT, we share some frontier works that can provide deep insights into it.

  1. When do you need Chain-of-Thought Prompting for ChatGPT? Jiuhai Chen, Lichang Chen, Heng Huang, and Tianyi Zhou. Preprint 2023. [pdf].

  2. Toxicity in ChatGPT: Analyzing Persona-assigned Language Models. Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. Preprint 2023. [pdf].

  3. Is ChatGPT a General-Purpose Natural Language Processing Task Solver? Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, and Diyi Yang. Preprint 2023. [pdf].

  4. How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection. Biyang Guo, Xin Zhang, Ziyuan Wang, and et al. Preprint 2023. [pdf]; [corpus].

  5. ChatGPT: Jack of all trades, master of none. Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, and et al. Preprint 2023. [pdf].

  6. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective. Jindong Wang, Xixu Hu, Wenxin Hou, and et al. Preprint 2023. [pdf]; [code].

7.3 Human Feedback vs. Model Feedback

  1. Aligning Large Language Models through Synthetic Feedback. Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, and Minjoon Seo. Preprint 2023. [pdf].

  2. LIMA: Less Is More for Alignment. Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, and et al. Preprint 2023. [pdf].

  3. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. Zhiqing Sun, Yikang Shen, Qinhong Zhou, and et al. Preprint 2023. [pdf]; [code].

  4. Chain of Hindsight Aligns Language Models with Feedback. Hao Liu, Carmelo Sferrazza, and Pieter Abbeel. Preprint 2023. [pdf]; [code].

  5. Pretraining Language Models with Human Preferences. Tomasz Korbak, Kejian Shi, Angelica Chen, and et al. Preprint 2023. [pdf].

  6. Constitutional AI: Harmlessness from AI Feedback. Yuntao Bai, Saurav Kadavath, Sandipan Kundu, and et al. Preprint 2022. [pdf]; [corpus].

  7. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. Yuntao Bai, Andy Jones, Kamal Ndousse, and et al. Preprint 2022. [pdf]; [corpus].

7.4 Scalable Oversight and Alignment

  1. Measuring Progress on Scalable Oversight for Large Language Models. Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, and et al. Preprint 2022. [pdf].

  2. Aligning AI With Shared Human Values. Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. ICLR 2021. [pdf].

7.5 Other Papers

  1. Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models. Kaitlyn Zhou, Dan Jurafsky, and Tatsunori Hashimoto. Preprint 2023. [pdf].

  2. The Capacity for Moral Self-Correction in Large Language Models. Deep Ganguli, Amanda Askell, Nicholas Schiefer, and et al. Preprint 2023. [pdf].

  3. Large Language Models Can Be Easily Distracted by Irrelevant Context. Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, and Denny Zhou. Preprint 2023. [pdf]; [corpus].

  4. Language Models (Mostly) Know What They Know. Saurav Kadavath, Tom Conerly, Amanda Askell, and et al. Preprint 2022. [pdf].


⭐ Star History

Star History Chart