A framework for few-shot evaluation of language models.
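This description matches EleutherAI's lm-evaluation-harness. For orientation, here is a minimal sketch of its Python entry point, assuming the package is installed as lm-eval; the model and task names are illustrative, not prescribed by the listing:

```python
# Hedged sketch: assumes EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Model and task choices below are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF checkpoint
    tasks=["hellaswag"],                             # any registered task name
    num_fewshot=5,                                   # few-shot examples per prompt
)
print(results["results"])  # per-task metric dictionary
```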
Repository for our RecSys 2019 paper "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and several follow-up studies.
Test your prompts, models, and RAG pipelines. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models, with CI/CD integration.
The LLM Evaluation Framework
Evaluation Framework for Dependency Analysis (EFDA)
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside its recently released LLM data-processing library datatrove and LLM training library nanotron.
Python-based tools for pre-processing, post-processing, validating, and curating spike-sorting datasets.
BIRL: Benchmark on Image Registration methods with landmark validation
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
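This description matches the ragas library. If so, a minimal sketch of its evaluate API looks like the following, assuming ragas is installed and an OPENAI_API_KEY is set for the default judge model; all data values are illustrative:

```python
# Hedged sketch: assumes the ragas library (pip install ragas) and an
# OPENAI_API_KEY in the environment; all sample data below is illustrative.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
})
scores = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(scores)  # averaged score per metric
```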
Expressive is a cross-platform expression parsing and evaluation framework. It achieves its cross-platform reach by compiling for .NET Standard, so it runs on practically any platform.
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
Evaluate your biometric verification models literally in seconds.
Open-Source Evaluation for GenAI Application Pipelines
LiDAR SLAM comparison and evaluation framework
Multilingual Large Language Models Evaluation Benchmark
Evaluation suite for large-scale language models.
A research library for automating experiments on Deep Graph Networks
OD-test: A Less Biased Evaluation of Out-of-Distribution (Outlier) Detectors (PyTorch)
Official repository of RankEval: An Evaluation and Analysis Framework for Learning-to-Rank Solutions.