CSS-LM

CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models

  • Accepted at the WWW 2021 Workshop.

  • Accepted by IEEE/ACM TASLP 2021.

Overview

CSS-LM improves the fine-tuning phase of pre-trained language models (PLMs) via contrastive semi-supervised learning. Specifically, given a target task, we retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to the task. By performing contrastive semi-supervised learning on both the retrieved unlabeled instances and the original labeled instances, CSS-LM helps PLMs capture crucial task-related semantic features and achieve better performance in low-resource scenarios.
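
For intuition only, contrastive fine-tuning objectives of this kind typically take an InfoNCE-style form; the sketch below uses generic notation and is not necessarily the exact loss defined in the paper:

\mathcal{L}_{\mathrm{con}} = -\sum_{i} \log \frac{\exp(\mathrm{sim}(h_i, h_{i^{+}}) / \tau)}{\exp(\mathrm{sim}(h_i, h_{i^{+}}) / \tau) + \sum_{j \in \mathcal{N}_i} \exp(\mathrm{sim}(h_i, h_j) / \tau)}

Here h_i denotes the encoder representation of an instance, h_{i^{+}} a retrieved positive, \mathcal{N}_i the set of retrieved negatives, sim a similarity function (e.g., dot product or cosine), and \tau a temperature. Minimizing a loss of this shape pulls task-related instances together and pushes unrelated ones apart in the representation space.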

Setups

  • python>=3.6
  • torch>=2.0.0+cu118

Requirements

pip install -r requirement.sh

Prepare the data

Download the open domain corpus (openwebtext) and backbone models (roberta-base, bert-base-uncased) and move them to the corresponding directories.

# Download and unpack the archive (openwebtext corpus and backbone checkpoints)
wget https://cloud.tsinghua.edu.cn/f/690e78d324ee44068857/?dl=1
mv 'index.html?dl=1' download.zip
unzip download.zip

# Remove macOS archive metadata and copy the corpus and backbones into place
rm -rf __MACOSX
scp -r download/openwebtext data
scp -r download/roberta-base script/roberta-base-768
scp -r download/bert-base-uncased script/bert-base-768

Semi-supervised Contrastive Fine-tuning (CSS-LM)

CSS-LM (run_${DATASET}_sscl_dt_k.sh and run_bert_${DATASET}_sscl_dt_k.sh) is our main method. Users can run the example in script/semeval_example.sh:

for i_th in {1..5};
do
    #RoBERTa-base Model
    bash run_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

    #BERT-base Model
    bash run_bert_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

done
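
The snippet above references variables such as $gpu_0, $N_1, $N_times_1, $batch_size, and $max_length; in the released script they are presumably set near the top. If you adapt the snippet yourself, define them first, for example with purely illustrative values (not the tuned settings from the paper):

# Illustrative values only; adjust to your hardware and data split.
gpu_0=0; gpu_1=1; gpu_2=2; gpu_3=3   # GPU ids
N_1=16; N_2=16; N_3=16               # numbers of annotated instances
N_times_1=5; N_times_2=5             # numbers of training epochs
batch_size=8
max_length=128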

We introduce the whole training pipeline and provide details of the arguments in the following sections.

Run All the Experiments

Execute 'script/run1.sh':

cd script
bash run1.sh

The run1.sh script:

for i_th in {1..5};
do
    #RoBERTa-based Model
    bash run_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

    #BERT-based Model
    bash run_bert_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
done

In run1.sh, there are two kinds of backbone models (RoBERTa and BERT), each trained under four settings.

RoBERTa-based

  • run_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
  • run_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
  • run_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
  • run_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)

BERT-based

  • run_bert_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
  • run_bert_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
  • run_bert_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
  • run_bert_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)

Arguments

  • ${DATASET}: Can be semeval, sst5, scicite, aclintent, sciie, or chemprot.
  • $gpu_0 $gpu_1 $gpu_2 $gpu_3: The GPU ids to use; assign as many GPUs as you need.
  • $N_1 $N_2 $N_3: The numbers of annotated instances.
  • $N_times_1 $N_times_2: The numbers of training epochs.
  • $batch_size: The training batch size.
  • $max_length: The maximum length of the input sentence.
  • $i_th: The run index (1-5); each value corresponds to a different random seed. An example invocation with all arguments filled in is shown below.
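
For example, a single run of one training script with all positional arguments filled in might look like this (the values are illustrative placeholders, not recommended hyperparameters):

# gpu ids = 0 1 2 3, annotated instances = 16 16 16, epochs = 5 5,
# batch size = 8, max length = 128, seed index = 1
bash run_semeval_sscl_dt_k.sh 0 1 2 3 16 16 16 5 5 8 128 1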

Citation

Please cite our paper if you use CSS-LM in your work:

@article{su2021csslm,
   title={CSS-LM: A Contrastive Framework for Semi-Supervised Fine-Tuning of Pre-Trained Language Models},
   volume={29},
   ISSN={2329-9304},
   url={http://dx.doi.org/10.1109/TASLP.2021.3105013},
   DOI={10.1109/taslp.2021.3105013},
   journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
   publisher={Institute of Electrical and Electronics Engineers (IEEE)},
   author={Su, Yusheng and Han, Xu and Lin, Yankai and Zhang, Zhengyan and Liu, Zhiyuan and Li, Peng and Zhou, Jie and Sun, Maosong},
   year={2021},
   pages={2930–2941}
}

Contact

Yusheng Su

Mail: yushengsu.thu@gmail.com; suys19@mails.tsinghua.edu.cn
