Refactor benchmark packaging/runtime: uv workspace, import cleanup, and docker unification #139
base: main
Changes from all commits
d448f7a
1c17b67
156aed7
87adc18
b918e49
3127aba
0f128d0
a13a41c
9d10d6b
64e2ef1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| .git | ||
| .gitignore | ||
|
|
||
| .venv | ||
| **/.venv | ||
| .uv-cache | ||
| **/.uv-cache | ||
|
|
||
| __pycache__ | ||
| **/__pycache__ | ||
| *.pyc | ||
|
|
||
| dist | ||
| build | ||
| *.egg-info | ||
|
|
||
| logs | ||
| outputs |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| name: SDK Package | ||
|
|
||
| on: | ||
| push: | ||
| branches: [main] | ||
| paths: | ||
| - 'sdk/**' | ||
| - 'pyproject.toml' | ||
| - '.github/workflows/sdk-package.yml' | ||
| pull_request: | ||
| paths: | ||
| - 'sdk/**' | ||
| - 'pyproject.toml' | ||
| - '.github/workflows/sdk-package.yml' | ||
| workflow_dispatch: | ||
|
|
||
| jobs: | ||
| build-sdk: | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Checkout code | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.11' | ||
|
|
||
| - name: Install uv | ||
| uses: astral-sh/setup-uv@v6 | ||
|
|
||
| - name: Build SDK package | ||
| run: uv build --package system-intelligence-sdk --wheel --sdist | ||
|
|
||
| - name: Verify package metadata | ||
| run: uvx twine check dist/system_intelligence_sdk-* | ||
|
|
||
| - name: Upload SDK dist artifacts | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: sdk-dist | ||
| path: dist/* | ||
| retention-days: 14 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -3,6 +3,9 @@ __pycache__/ | ||
| *.pyc | ||
| .venv/ | ||
| venv/ | ||
| build/ | ||
| dist/ | ||
| *.egg-info/ | ||
|
|
||
| # IDE | ||
| .vscode/ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,34 +1,43 @@ | ||
| FROM ubuntu:24.04 | ||
| FROM ghcr.io/astral-sh/uv:python3.11-bookworm-slim AS builder | ||
|
|
||
| ARG DEBIAN_FRONTEND=noninteractive | ||
|
|
||
| USER root | ||
| WORKDIR /workspace | ||
| COPY . /workspace | ||
| RUN mkdir -p /workspace/dist \ | ||
| && (uv build --package system-intelligence-sdk --wheel -o /workspace/dist || true) \ | ||
| && uv build --all-packages --wheel -o /workspace/dist | ||
|
|
||
| WORKDIR / | ||
| COPY . . | ||
| FROM ghcr.io/astral-sh/uv:python3.11-bookworm-slim | ||
|
|
||
| RUN rm -rf /var/lib/apt/lists/* \ | ||
| && apt-get update -o Acquire::Retries=5 \ | ||
| && apt-get install -y --no-install-recommends \ | ||
| build-essential \ | ||
| git \ | ||
| wget \ | ||
| python3-pip \ | ||
| python3-venv \ | ||
| pipx \ | ||
| ARG DEBIAN_FRONTEND=noninteractive | ||
| USER root | ||
| RUN apt-get update && apt-get install -y --no-install-recommends git \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # SWE-ReX will always attempt to install its server into your docker container | ||
| # however, this takes a couple of seconds. If we already provide it in the image, | ||
| # this is much faster. | ||
| RUN pipx install swe-rex | ||
| RUN pipx ensurepath | ||
|
|
||
| ENV PATH="/root/.local/bin:${PATH}" | ||
| ENV PATH="/usr/local/go/bin:${PATH}" | ||
|
|
||
| SHELL ["/bin/bash", "-c"] | ||
|
|
||
| RUN chmod +x install.sh test.sh && ./install.sh | ||
|
|
||
| CMD ["bash"] | ||
| # Build with repository root as context: | ||
| # docker build -f benchmarks/arteval_bench/Dockerfile . | ||
| WORKDIR /workspace | ||
| COPY . /workspace | ||
| COPY --from=builder /workspace/dist/*.whl /tmp/dist/ | ||
|
|
||
| WORKDIR /workspace/benchmarks/arteval_bench | ||
| RUN set -eux; \ | ||
| SDK_WHEEL="$(ls /tmp/dist/system_intelligence_sdk-*.whl | head -n1 || true)"; \ | ||
| BENCH_WHEEL="$(ls /tmp/dist/arteval_bench-*.whl | head -n1 || true)"; \ | ||
| if [ -z "$SDK_WHEEL" ]; then \ | ||
| echo "Missing SDK wheel in /tmp/dist. Build with repo root context:"; \ | ||
| echo "docker build -t arteval_bench -f benchmarks/arteval_bench/Dockerfile ."; \ | ||
| ls -1 /tmp/dist || true; \ | ||
| exit 1; \ | ||
| fi; \ | ||
| if [ -z "$BENCH_WHEEL" ]; then \ | ||
| echo "Missing arteval_bench wheel in /tmp/dist."; \ | ||
| ls -1 /tmp/dist || true; \ | ||
| exit 1; \ | ||
| fi; \ | ||
| rm -rf .venv; \ | ||
| uv venv .venv; \ | ||
| uv pip install --python .venv/bin/python "$SDK_WHEEL" "$BENCH_WHEEL"; \ | ||
| .venv/bin/python src/core/sweagent_compat.py >/dev/null; \ | ||
| .venv/bin/sweagent --help >/dev/null | ||
|
|
||
| CMD ["bash"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| [project] | ||
| name = "arteval-bench" | ||
| version = "0.1.0" | ||
| description = "ArtEval benchmark package" | ||
| requires-python = ">=3.11" | ||
| dependencies = [ | ||
| "system-intelligence-sdk>=0.1.0", | ||
| "requests", | ||
| "azure-identity", | ||
| "sweagent @ git+https://github.com/SWE-agent/SWE-agent.git@v1.1.0", | ||
|
||
| ] | ||
|
|
||
| [project.optional-dependencies] | ||
| dev = [ | ||
| "pytest>=8.0.0", | ||
| "ruff>=0.6.0", | ||
| ] | ||
|
|
||
| [build-system] | ||
| requires = ["uv_build>=0.10.4,<0.11.0"] | ||
| build-backend = "uv_build" | ||
|
|
||
| [tool.uv.build-backend] | ||
| module-name = "src" | ||
| module-root = "" | ||
|
|
||
| [tool.uv.sources] | ||
| system-intelligence-sdk = { workspace = true } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| """ArtEval benchmark package.""" |
The install script auto-installs `uv` by piping a remote shell script from the network into `sh`. That pattern is a supply-chain risk and also makes installs non-reproducible in locked-down environments. Prefer documenting a manual `uv` installation step (or at least prompting for confirmation / verifying a pinned installer checksum) instead of executing a remote script automatically.
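
To illustrate the "pinned installer checksum" option, here is a minimal sketch of a check that an install step could run on an already-downloaded installer before executing it. The function names `sha256_of` and `verify_pinned` are hypothetical and not part of this PR, and the pinned digest would have to come from a trusted source (for example, a value committed to the repository); treat this as one possible shape rather than the project's actual approach.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's digest equals the pinned value.

    `expected_sha256` is an assumed, hypothetical pinned digest; the caller
    should refuse to run the installer when this returns False.
    """
    actual = sha256_of(path)
    # Normalize whitespace and case of the pinned value before comparing.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

An install script would download the installer to a file, call `verify_pinned` on it, and abort when the result is `False`, so that nothing fetched from the network is executed unverified.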