Add human evaluations #395

Open
1 of 57 tasks
saattrupdan opened this issue Apr 16, 2024 · 0 comments
Labels: model evaluation request (Request to evaluate a model and add it to the leaderboard(s))

saattrupdan commented Apr 16, 2024

To get a better idea of how well the models are doing, we could add human benchmarks, evaluated on the validation splits. All human evaluations should ideally be released openly.

To enable a proper comparison between the models and humans, we should create an evaluation platform (this could just be a simple Gradio app) that supplies the humans with only the same information as the models receive. The NER task should be handled separately, e.g. by having fields where annotators can write the entities for a given category, rather than having them write valid JSON, since we cannot use structured generation with humans. A rough sketch of such an app is given below.
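As a rough illustration, here is a minimal Gradio sketch of what such a platform could look like for the NER task. The documents, entity categories and output file name here are all placeholders; a real version would draw documents from the validation split of the relevant dataset and show annotators exactly the same prompt context as the models get.

```python
import json

import gradio as gr

# Placeholder documents: a real app would load these from the validation
# split of the relevant NER dataset.
DOCUMENTS = [
    "Anna Hansen flew from Copenhagen to Oslo on Monday.",
    "The University of Southern Denmark published a new report.",
]

# Placeholder entity categories.
ENTITY_CATEGORIES = ["Person", "Location", "Organisation", "Miscellaneous"]


def record_annotation(doc_idx: int, *entity_fields: str):
    """Store the annotator's entities and move on to the next document."""
    annotation = {
        category: [ent.strip() for ent in field.split(",") if ent.strip()]
        for category, field in zip(ENTITY_CATEGORIES, entity_fields)
    }
    # Hypothetical output file; each annotation is stored as one JSON line.
    with open("human_annotations.jsonl", "a") as f:
        record = {"document": DOCUMENTS[doc_idx], "entities": annotation}
        f.write(json.dumps(record) + "\n")
    next_idx = min(doc_idx + 1, len(DOCUMENTS) - 1)
    # Return the new state, the next document, and one empty string per
    # entity field to clear the form.
    return (next_idx, DOCUMENTS[next_idx]) + ("",) * len(ENTITY_CATEGORIES)


with gr.Blocks() as demo:
    gr.Markdown("## Human NER evaluation")
    doc_idx = gr.State(0)
    document = gr.Textbox(value=DOCUMENTS[0], label="Document", interactive=False)
    # One free-text field per entity category, so annotators never have to
    # write valid JSON themselves.
    fields = [
        gr.Textbox(label=f"{category} entities (comma-separated)")
        for category in ENTITY_CATEGORIES
    ]
    submit = gr.Button("Submit")
    submit.click(
        fn=record_annotation,
        inputs=[doc_idx, *fields],
        outputs=[doc_idx, document, *fields],
    )

demo.launch()
```

Restricting the interface to the raw document plus per-category text fields keeps the information shown to humans identical to what the models see, while sidestepping the structured-output problem mentioned above.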

  • Evaluation platform built

Danish

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

Swedish

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

Norwegian

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

Icelandic

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

Faroese

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

German

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

Dutch

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning

English

  • Named entity recognition
  • Sentiment classification
  • Linguistic acceptability
  • Question answering
  • Summarisation
  • Knowledge
  • Common-sense reasoning