Community-Driven AI Benchmark Ecosystem. Track SOTA models, visualize capabilities via Radar Charts, and explore the AI landscape with automated daily updates.
BenchAI1 is an open-source hub which collects benchmarks in artificial intelligence. It tries to collect as many benchmarks as possible for everyone to find their best model.
I have tried different websites for AI benchamrk, all of them disappoints me at some aspect, so I decided to build BenchAI1 with following features:
- Open-source data collection: All the benchmark results are collected from official websites of benchmark / professional evaluation platform (like Artificial Analysis).
- Automated Pipelines: Most of the benchmark results are maintained by automated pipeline through GitHub Actions and Python scripts. The daily synchronization will make the data always up to date.
- Interactive Leaderboards: Filter, search and sort models with client-side interactivity.
- Community Driven: BenchAI1 tries to build up a community where people explore reasonable, professional or interesting benchmark to fully examine the capability of artificial intelligence.
The stack of this project aims to be simple but comprehensive:
- Framework: Astro 5.0 (Static Site Generation)
- UI Components: React + Tailwind CSS
- Data Visualization: Recharts
- Automation & Scripting: Python (managed by
uv) - CI/CD: GitHub Actions (Cron Jobs for data sync)
- Data Source: JSON (For most data about benchmark/model/publisher) + MDX/MD (For benchmark document)
- Node.js & npm/bun
- Python 3.10+ (with
uvrecommended)
- Clone the repository
git clone https://github.com/daniel5u/BenchAI1.git
cd BenchAI1- Install dependencies
bun install- Set up Python environment (for data sync scripts)
cd pipeline
uv sync- Start Development Server
bun devVisit http://localhost:4321 to see the app.
The data sync pipeline ensures the benchmark result is always up-to-date
- Location:
/pipeline/(There are different scripts data for different data source)
Run the sync manually:
cd pipeline
uv run [specific_script].pyContributions are welcomed! You can upload your own benchmark / your own model with scores / your improvement on the structure of the project, whether frontend or data structure
- Upload your own benchmark: Any benchmark should contain no less than 5 models to make it valid. Simply write the corresponding
/src/content/benchamrks/your_benchmark.jsonto include the benchamrk's name, matrix, data and so on, and the/src/content/benchmark-docs/your_doc.mdx(or .md)to describe any detailed information about your benchamrk. - Make sure the model reference is valid: All models are stored in
/src/content/modelswith respect to their publisher (for example, gpt-4 will be stored as/src/content/models/openai/gpt-4.json), if there are any problem with model (incorrect name, params and so on), open an issue to correct the information. - Project improvement: I am a total rookie to dev, so if you have any idea about improving UI&UX about the pages, adding more function to the project, any other suggestions, feel free to open an issue / pull request / discussion.
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
I am constantly improving BenchAI1 to make it the best place for AI evaluation insights.
- Comparison Function: Add the
comparisonfunction between different models / publishers. - Enhance Model Information: Add the information of parameter size & openness plus filtering function based on these.
- View Tracking: The popularity(
view) of different benchmarks are supposed to be tracked somehow, I am finding the best way, probably buying a NAS and deploy a redis server... - Community Setup: Since the project aims to build up a complete community for AI evaluation, I am planning to :
- [x]set up a discord channel, where people can share their own benchmark, receive latest update, communicate model capabilities and so on.
- set up subscription service, which sends email to users about the latest update / latest SOTA / latest model (customized by user).
- Internationalization (i18n): Support various language for global AI community.
- More Data Source: Integrate as many data source as possible with auto sync.
Have an idea? Open an issue to discuss!