CHARM✨ Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
| Benchmarks | CN-Lang | CSR | CN-specifics | Dual-Domain | Rea-Mem |
|---|---|---|---|---|---|
| Most benchmarks in davis2023benchmarks | ✘ | ✔ | ✘ | ✘ | ✘ |
| XNLI, XCOPA, XStoryCloze | ✔ | ✔ | ✘ | ✘ | ✘ |
| LogiQA, CLUE, CMMLU | ✔ | ✘ | ✔ | ✘ | ✘ |
| CORECODE | ✔ | ✔ | ✘ | ✘ | ✘ |
| **CHARM (ours)** | ✔ | ✔ | ✔ | ✔ | ✔ |
"CN-Lang" indicates the benchmark is presented in Chinese language. "CSR" means the benchmark is designed to focus on CommonSense Reasoning. "CN-specific" indicates the benchmark includes elements that are unique to Chinese culture, language, regional characteristics, history, etc. "Dual-Domain" indicates the benchmark encompasses both Chinese-specific and global domain tasks, with questions presented in the similar style and format. "Rea-Mem" indicates the benchmark includes closely-interconnected reasoning and memorization tasks.
- [2024.5.24] CHARM has been open-sourced! 🔥🔥🔥
- [2024.5.15] CHARM has been accepted to the main conference of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)! 🔥🔥🔥
- [2024.3.21] Paper available on ArXiv.
- Support inference and evaluation on OpenCompass.
Below are the steps to quickly download CHARM and evaluate it with OpenCompass.

First, install OpenCompass by following its installation steps. Then clone CHARM and link its data into the `opencompass` repo:

```shell
git clone https://github.com/opendatalab/CHARM CHARM
cd opencompass
mkdir -p data
ln -snf path_to_CHARM_repo/data/CHARM ./data/CHARM
```

```shell
# Inferring and evaluating CHARM with the hf_llama3_8b_instruct model
python run.py --models hf_llama3_8b_instruct --datasets charm_gen
```
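To evaluate a different model, you can first look up which model and dataset configs are registered. The sketch below assumes OpenCompass's `tools/list_configs.py` helper (verify the script name against your OpenCompass checkout) and is guarded so it only runs from the repo root:

```shell
# Run from the opencompass repo root. tools/list_configs.py is an
# OpenCompass helper script, not part of the CHARM repo.
if [ -f tools/list_configs.py ]; then
    python tools/list_configs.py charm      # dataset configs matching "charm"
    python tools/list_configs.py llama3     # model configs matching "llama3"
else
    echo "run this from the opencompass repo root"
fi
```

Any model name printed by the helper can be passed to `run.py --models` in place of `hf_llama3_8b_instruct`.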
```bibtex
@misc{sun2024benchmarking,
      title={Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations},
      author={Jiaxing Sun and Weiquan Huang and Jiang Wu and Chenya Gu and Wei Li and Songyang Zhang and Hang Yan and Conghui He},
      year={2024},
      eprint={2403.14112},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
This project is released under the Apache 2.0 license.