Empirical

Empirical is the fastest way to test different LLMs and model configurations, across all the scenarios that matter for your application.

With Empirical, you can:

Run your test datasets locally against off-the-shelf or custom models
Compare model outputs on a web UI, and test changes quickly
Score your outputs with scoring functions
Run tests on CI/CD


Usage

See all docs →

Empirical bundles a test runner and a web app. You use both through the CLI in your terminal.

Empirical relies on a configuration file, typically named empiricalrc.js, which describes the tests to run.

Start with a basic example

In this example, we will ask an LLM to extract entities from user messages and return structured JSON output. For example, "I'm Alice from Maryland" will become {"name": "Alice", "location": "Maryland"}.

Our test will succeed if the model outputs valid JSON.
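To make the pass condition concrete, here is a minimal sketch of a "valid JSON" check. The function name `scoreValidJson` and its return shape are hypothetical and chosen for illustration only; this is not Empirical's built-in scorer or its scoring API.

```javascript
// Hypothetical helper for illustration; not part of Empirical's API.
// Returns score 1 when `output` parses as JSON, otherwise score 0.
function scoreValidJson(output) {
  try {
    JSON.parse(output);
    return { score: 1, message: "valid JSON" };
  } catch (err) {
    // JSON.parse throws a SyntaxError on malformed input.
    return { score: 0, message: `invalid JSON: ${err.message}` };
  }
}

// A well-formed model output passes.
console.log(scoreValidJson('{"name": "Alice", "location": "Maryland"}').score); // 1

// Unquoted keys and single-quoted strings are not valid JSON, so this fails.
console.log(scoreValidJson("{name: 'Alice', location: 'Maryland'}").score); // 0
```

Strict JSON requires double-quoted keys and strings, so a JavaScript-style object literal fails this check even though it looks similar.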

Use the CLI to create a sample configuration file called empiricalrc.js.

```
npm init empiricalrun

# For TypeScript
npm init empiricalrun -- --using-ts
```

Run the example dataset against the selected models.
```
npx empiricalrun
```
This step requires the OPENAI_API_KEY environment variable to authenticate with OpenAI. This execution will cost $0.0026, based on the selected models.
Use the ui command to open the reporter web app and see side-by-side results.
```
npx empiricalrun ui
```

Make it yours

Edit the empiricalrc.js file to make Empirical work for your use case.

Configure which models to use
Configure your test dataset
Configure scoring functions to grade output quality

Contribution guide

See development docs.
