Python SDK for running evaluations on LLM generated responses
Open-source Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, such as CrewAI, LangChain, and AutoGen.
Harmonizing clinical and genetic data to enhance the precision and efficiency of glioma diagnosis.
Python client for Kolena's machine learning testing platform
A collection of color and style transfer algorithms and objective evaluation metrics.
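One of the simplest objective metrics used in this setting is PSNR (peak signal-to-noise ratio), which scores how closely a stylized or color-transferred image matches a reference. A minimal NumPy sketch (the function name and array inputs are illustrative, not any listed library's API):

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two same-shaped images, in dB."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better; identical images give infinite PSNR, and a maximally wrong 8-bit image gives 0 dB.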
This repository contains a Jupyter Notebook exploring the adult income dataset. The notebook performs Exploratory Data Analysis (EDA), including visualizations with charts and graphs. Additionally, it implements various classification models to predict income and analyzes their accuracy.
The LLM Evaluation Framework
[CVPR 2024] On the Content Bias in Fréchet Video Distance
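For context, Fréchet Video Distance (like FID) compares Gaussian fits to two sets of feature vectors using the closed-form Fréchet distance between Gaussians. A minimal NumPy/SciPy sketch of that distance (function name and inputs are illustrative; a real FVD pipeline would first extract features with a pretrained video network):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fit to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

The distance is zero for identical distributions and grows as the means and covariances of the two feature sets diverge.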
This is a collection of all the machine learning techniques required in any machine learning project. It contains detailed descriptions, videos, book recommendations, and additional material to properly grasp all the concepts.
Project page for our paper "REALY: Rethinking the Evaluation of 3D Face Reconstruction".
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
Counting-Stars (★)
This repository houses my solutions to a diverse range of machine learning assignments and projects completed during ML Zoomcamp and MLOps Zoomcamp, two comprehensive machine learning boot camps.
Open-Source Evaluation for GenAI Application Pipelines
A library for evaluating Retrieval-Augmented Generation (RAG) systems
Official repository for “PATE: Proximity-Aware Time series anomaly Evaluation”.
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
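A common metric in this family is context precision: the fraction of retrieved passages that are actually relevant to the query. A minimal sketch of the idea in plain Python (this is a generic illustration, not the API of any library listed here; real RAG evaluators typically judge relevance with an LLM rather than an exact-match set):

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved passages that appear in the relevant set (precision@k)."""
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    return sum(1 for passage in retrieved if passage in relevant) / len(retrieved)
```

A retriever that returns four passages of which two are relevant scores 0.5; perfect retrieval scores 1.0.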
Detailed exploration of random forest regressors, including data cleaning, model building, and performance evaluation on various datasets.
Evaluate the quality of SRT files using the multilingual multimodal SONAR model.
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.