Bench All In One: An open-source hub for artificial intelligence benchmarks

Community-Driven AI Benchmark Ecosystem. Track SOTA models, visualize capabilities via Radar Charts, and explore the AI landscape with automated daily updates.

Live Demo | Report Bug | Request Feature

Introduction

BenchAI1 is an open-source hub that collects artificial intelligence benchmarks. It aims to gather as many benchmarks as possible so that everyone can find the model that suits them best.

I have tried several AI benchmark websites, and each of them disappointed me in some respect, so I decided to build BenchAI1 with the following features:

Open-source data collection: All benchmark results are collected from the official websites of the benchmarks or from professional evaluation platforms (such as Artificial Analysis).
Automated Pipelines: Most benchmark results are maintained by an automated pipeline built on GitHub Actions and Python scripts. A daily synchronization keeps the data up to date.
Interactive Leaderboards: Filter, search, and sort models with client-side interactivity.
Community Driven: BenchAI1 aims to build a community where people explore reasonable, professional, or interesting benchmarks to fully examine the capabilities of artificial intelligence.

Tech Stack

The stack of this project aims to be simple but comprehensive:

Framework: Astro 5.0 (Static Site Generation)
UI Components: React + Tailwind CSS
Data Visualization: Recharts
Automation & Scripting: Python (managed by uv)
CI/CD: GitHub Actions (Cron Jobs for data sync)
Data Source: JSON (for most benchmark, model, and publisher data) + MDX/MD (for benchmark documentation)

Getting Started

Prerequisites

Node.js & npm/bun
Python 3.10+ (with uv recommended)

Installation

Clone the repository

git clone https://github.com/daniel5u/BenchAI1.git
cd BenchAI1

Install dependencies

bun install

Set up Python environment (for data sync scripts)

cd pipeline
uv sync

Start Development Server

bun dev

Visit http://localhost:4321 to see the app.

Data Synchronization

The data sync pipeline keeps the benchmark results up to date.

Location: /pipeline/ (there are different scripts for different data sources)

Run the sync manually:

cd pipeline
uv run [specific_script].py
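
As a rough illustration of what one such sync script might do, here is a minimal sketch. It is not taken from the repository: the function names, the record shape (`{"model": ..., "score": ...}`), and the idea of merging by model name are all assumptions made for the example.

```python
import json
import urllib.request
from pathlib import Path


def fetch_results(url: str) -> list[dict]:
    """Download a JSON array of records (assumed shape: {"model": ..., "score": ...})."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def merge_results(existing: list[dict], fetched: list[dict]) -> list[dict]:
    """Update existing records by model name and append models not seen before."""
    merged = {row["model"]: row for row in existing}
    for row in fetched:
        # Fetched fields overwrite stored ones; other stored fields are kept.
        merged[row["model"]] = {**merged.get(row["model"], {}), **row}
    # Highest score first, ties broken by model name so the file diff stays stable.
    return sorted(merged.values(), key=lambda r: (-r["score"], r["model"]))


def write_results(path: Path, rows: list[dict]) -> None:
    """Write the merged rows as pretty-printed JSON, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(rows, indent=2, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```

A script built this way would call `fetch_results` with the data source's URL, merge the result into the rows already stored on disk, and write the file back, so a scheduled run only changes the entries that actually moved.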

Contributing

Contributions are welcome! You can upload your own benchmark, add your own model with scores, or improve the structure of the project, whether in the frontend or in the data structure.

Upload your own benchmark: A benchmark must contain at least 5 models to be valid. Write the corresponding /src/content/benchamrks/your_benchmark.json with the benchmark's name, matrix, data, and so on, and add /src/content/benchmark-docs/your_doc.mdx (or .md) to describe your benchmark in detail.
Make sure the model reference is valid: All models are stored in /src/content/models, grouped by publisher (for example, gpt-4 is stored as /src/content/models/openai/gpt-4.json). If there is any problem with a model (incorrect name, parameters, and so on), open an issue to correct the information.
Project improvement: I am a total rookie at development, so if you have ideas for improving the UI and UX of the pages, adding more functionality to the project, or any other suggestions, feel free to open an issue, pull request, or discussion.
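
Before opening a pull request, the two rules above (at least 5 models, and every model reference pointing at an existing file) can be checked locally. The sketch below is only an illustration: the benchmark JSON schema is not documented here, so the `data` list and its `publisher` and `model` keys are assumed field names, and `validate_benchmark` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

MIN_MODELS = 5  # contribution rule: a benchmark needs at least 5 models


def model_path(content_root: Path, publisher: str, model: str) -> Path:
    """Build the path of a model file, e.g. <content_root>/models/openai/gpt-4.json."""
    return content_root / "models" / publisher / f"{model}.json"


def validate_benchmark(benchmark: dict, content_root: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem found."""
    problems = []
    # Assumed schema: "data" is a list of entries with "publisher" and "model" keys.
    entries = benchmark.get("data", [])
    if len(entries) < MIN_MODELS:
        problems.append(f"only {len(entries)} models, need at least {MIN_MODELS}")
    for entry in entries:
        path = model_path(content_root, entry["publisher"], entry["model"])
        if not path.is_file():
            problems.append(f"missing model file: {path.as_posix()}")
    return problems
```

Calling `validate_benchmark(json.loads(text), Path("src/content"))` on the contents of a benchmark file would then list anything that needs fixing, provided the file follows the assumed shape.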

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

TODOS:

I am constantly improving BenchAI1 to make it the best place for AI evaluation insights.

Comparison Function: Add a comparison function for different models and publishers.
Enhance Model Information: Add parameter size and openness information, plus filtering based on them.
View Tracking: The popularity (views) of different benchmarks should be tracked somehow. I am still looking for the best way, possibly buying a NAS and deploying a Redis server.
Community Setup: Since the project aims to build a complete community for AI evaluation, I am planning to:
- [x] Set up a Discord channel where people can share their own benchmarks, receive the latest updates, discuss model capabilities, and so on.
- Set up a subscription service that emails users about the latest updates, the latest SOTA, or the latest models (customized by the user).
Internationalization (i18n): Support multiple languages for the global AI community.
More Data Sources: Integrate as many data sources as possible with auto sync.

Have an idea? Open an issue to discuss!
