opendatalab/CHARM


CHARM✨ Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

Construction of CHARM

Comparison of commonsense reasoning benchmarks

Benchmarks CN-Lang CSR CN-specifics Dual-Domain Rea-Mem
Most benchmarks in davis2023benchmarks
XNLI, XCOPA, XStoryCloze
LogiQA, CLUE, CMMLU
CORECODE
CHARM (ours)

"CN-Lang" indicates that the benchmark is presented in Chinese. "CSR" means the benchmark is designed to focus on commonsense reasoning. "CN-specifics" indicates that the benchmark includes elements unique to Chinese culture, language, regional characteristics, history, etc. "Dual-Domain" indicates that the benchmark covers both Chinese-specific and global domain tasks, with questions presented in a similar style and format. "Rea-Mem" indicates that the benchmark includes closely interconnected reasoning and memorization tasks.

🚀 What's New

  • [2024.5.24] CHARM has been open-sourced !!! 🔥🔥🔥
  • [2024.5.15] CHARM has been accepted to the main conference of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) !!! 🔥🔥🔥
  • [2024.3.21] Paper available on arXiv.

🧾 TODO

  • Support inference and evaluation on OpenCompass.

🛠️ Inference and Evaluation on OpenCompass

Below are the steps for quickly downloading CHARM and using OpenCompass for evaluation.

1. OpenCompass Environment Setup

Refer to the installation steps for OpenCompass.

2. Download CHARM

git clone https://github.com/opendatalab/CHARM CHARM

3. Run Inference and Evaluation

cd opencompass
mkdir -p data
# Replace path_to_CHARM_repo with the path to your local CHARM checkout
ln -snf path_to_CHARM_repo/data/CHARM ./data/CHARM

# Run inference and evaluation on CHARM with the hf_llama3_8b_instruct model
python run.py --models hf_llama3_8b_instruct --datasets charm_gen
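
The linking step fails silently if the source path is wrong: `ln -snf` will create a dangling symlink, and the problem only shows up later when OpenCompass cannot find the data. The sketch below is one way to guard against that. It is not part of CHARM or OpenCompass; the function name `link_charm_data` and both of its arguments are hypothetical, and it assumes the dataset lives under `data/CHARM` in the CHARM checkout, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CHARM or OpenCompass):
# link a local CHARM checkout into an OpenCompass checkout.
# $1 = path to the CHARM repository, $2 = path to the OpenCompass repository.
link_charm_data() {
    charm_repo=$1
    opencompass_dir=$2
    src="$charm_repo/data/CHARM"

    # Refuse to create a dangling link if the dataset directory is missing.
    if [ ! -d "$src" ]; then
        echo "CHARM data not found at $src" >&2
        return 1
    fi

    mkdir -p "$opencompass_dir/data"

    # Resolve to an absolute path so the link does not depend on
    # the directory the function was called from.
    abs_src=$(cd "$src" && pwd)
    ln -snf "$abs_src" "$opencompass_dir/data/CHARM"
}
```

With both repositories cloned, a call such as `link_charm_data path_to_CHARM_repo path_to_opencompass` would replace the `mkdir` and `ln` lines, after which `python run.py` can be run from the OpenCompass directory as shown above.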

🖊️ Citation

@misc{sun2024benchmarking,
      title={Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations}, 
      author={Jiaxing Sun and Weiquan Huang and Jiang Wu and Chenya Gu and Wei Li and Songyang Zhang and Hang Yan and Conghui He},
      year={2024},
      eprint={2403.14112},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

💳 License

This project is released under the Apache 2.0 license.

About

[ACL 2024 Main Conference] Chinese commonsense benchmark for LLMs
