A simple template module for safely evaluating user-supplied or runtime-unknown value expressions using Python's 'eval'.
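The restricted-eval approach this kind of module takes can be sketched as follows; the function name and the whitelisted helpers are illustrative assumptions, not the module's actual API:

```python
import math

def safe_eval(expression, variables=None):
    """Evaluate an expression with builtins disabled and only a
    whitelisted namespace exposed (illustrative sketch)."""
    allowed = {"__builtins__": {}}  # block access to Python builtins
    # Expose only a small set of safe helpers.
    allowed.update({"abs": abs, "min": min, "max": max, "sqrt": math.sqrt})
    allowed.update(variables or {})  # runtime-supplied values
    return eval(expression, allowed, {})

# Whitelisted names work; anything else raises NameError.
print(safe_eval("max(a, b) + sqrt(9)", {"a": 2, "b": 5}))  # 8.0
```

Note that blanking `__builtins__` blocks casual misuse (e.g. `__import__`), but `eval` is still not a hardened sandbox against a determined attacker.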
N-Compariw: End-to-End Workflow for Neural Networks Comparison
A hybrid search engine based on the BM25 and VSM retrieval models.
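The BM25 half of such a hybrid retriever scores a document against a query with the classic Okapi BM25 formula; the sketch below is a minimal self-contained version (function name and token handling are assumptions, not this project's API):

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query using Okapi BM25.
    `corpus` is a list of tokenized documents, used for IDF and the
    average document length."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc_terms.count(term)                       # term frequency
        score += idf * tf * (k1 + 1) / (
            tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [["flight", "delay"], ["weather", "report"], ["delay", "delay", "cause"]]
print(bm25_score(["delay"], corpus[0], corpus))  # > 0: term present
print(bm25_score(["delay"], corpus[1], corpus))  # 0.0: term absent
```

A hybrid engine would typically combine this lexical score with a VSM (e.g. TF-IDF cosine) score, for instance by a weighted sum per document.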
Benchmark for assessing contextual-semantic sentence models in Brazilian legal domain.
LLM evaluation framework
ETUDE (Evaluation Tool for Unstructured Data and Extractions) is a Python-based tool that provides consistent evaluation options across a range of annotation schemata and corpus formats
Official Implementation of ACL2024 paper "Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs"(https://arxiv.org/abs/2402.11199).
An experimental information retrieval framework and a workbench for innovation in entity-oriented search.
Web-Interface for the evaluation of the different GDSC entries.
Evaluate open-source language models on agent use, formatted output, instruction following, long-text, multilingual, coding, and custom-task capabilities.
A tool to perform functional testing and performance testing of the Dhruva Platform
MODELAR: MODular and EvaLuative framework to improve surgical Augmented Reality visualization
A Visual Dashboard for Fundamental Benchmarking of LLMs
"Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning" by Chongyu Fan*, Jiancheng Liu*, Alfred Hero, Sijia Liu
CHECKLIST-style test cases and the testing of three Hungarian Named Entity Recognition tools.
Flight delay prediction using machine learning.
Framework to evaluate Trajectory Classification Algorithms