feat: add a rust API frontend #1667
Code Review
This pull request introduces a Rust-based frontend for the Aphrodite engine, incorporating a new Rust workspace with specialized tool parsers for DeepSeek and Qwen models, gRPC service definitions, and Jinja-based chat template rendering. The Python infrastructure is updated with environment variables and process management logic to launch and monitor the Rust binary as a subprocess. Review feedback identified several critical issues, including missing imports for the dataclasses, weakref, and multiprocessing.connection modules in utility files, as well as a formatting error in a logger warning call where a placeholder was missing its corresponding argument.
```python
from argparse import Namespace
from http import HTTPStatus
from logging import Logger
from typing import Any
```
```python
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import argparse
import contextlib
import json
```
Several required modules and types are missing from the imports in this file, which will lead to runtime errors:
- `weakref` is used by `RustFrontendProcessManager` (line 283).
- `connection` (from `multiprocessing`) is used by `_SubprocessWrapper` (lines 298, 303).
- `Any` is used as a type hint in multiple places (lines 242, 401).
```diff
 import json
+import weakref
+from multiprocessing import connection
+from typing import Any
```
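For context on why those two imports matter, here is a minimal, POSIX-only sketch of the pattern they typically support: `weakref.finalize` to guarantee a child process is cleaned up when its owner is garbage-collected, and `multiprocessing.connection.wait` to block on a process "sentinel". `SubprocessSentinel` and `_cleanup` are hypothetical names, not the PR's `RustFrontendProcessManager` or `_SubprocessWrapper`, and the pipe-based sentinel is an assumption made so that a plain `subprocess.Popen` child can be waited on like a `multiprocessing.Process`.

```python
import os
import subprocess
import sys
import weakref
from multiprocessing import connection
from typing import Any


def _cleanup(proc: subprocess.Popen, fd: int) -> None:
    """Terminate the child if it is still running, then close the sentinel fd."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    os.close(fd)


class SubprocessSentinel:
    """Hypothetical wrapper exposing a `sentinel` fd that becomes readable
    when the child exits, similar to multiprocessing.Process.sentinel (POSIX only)."""

    def __init__(self, cmd: list[str], **popen_kwargs: Any) -> None:
        read_fd, write_fd = os.pipe()
        try:
            # The child inherits only the write end; the OS closes it when the child exits.
            self.proc = subprocess.Popen(cmd, pass_fds=(write_fd,), **popen_kwargs)
        except BaseException:
            os.close(read_fd)
            raise
        finally:
            os.close(write_fd)  # the parent keeps only the read end
        self.sentinel = read_fd
        # The callback must not reference `self`, or the wrapper would never be collected.
        self._finalizer = weakref.finalize(self, _cleanup, self.proc, read_fd)

    def close(self) -> None:
        """Run the cleanup now; weakref.finalize guarantees it runs at most once."""
        self._finalizer()


if __name__ == "__main__":
    wrapper = SubprocessSentinel([sys.executable, "-c", "import sys; sys.exit(3)"])
    # Blocks until the sentinel is readable, i.e. the child has exited.
    ready = connection.wait([wrapper.sentinel], timeout=30)
    print(ready == [wrapper.sentinel], wrapper.proc.wait())  # True 3
    wrapper.close()
```

Because the parent closes its copy of the write end, the read end reaches EOF (and `connection.wait` reports it ready) exactly when the child's copy is closed, normally at exit. One caveat of this design: any grandchild that inherits the fd keeps the sentinel open.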
```python
        args.api_server_count,
    )
elif envs.APHRODITE_RUST_FRONTEND_PATH and args.api_server_count > 1:
    logger.warning("Ignoring --api-server-count=%d when using rust front-end process")
```
The logger.warning call contains a %d placeholder but is missing the corresponding argument to format. This will result in a literal %d being logged instead of the actual server count.
```diff
-logger.warning("Ignoring --api-server-count=%d when using rust front-end process")
+logger.warning("Ignoring --api-server-count=%d when using rust front-end process", args.api_server_count)
```
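A quick way to confirm the reviewer's point is to build `logging.LogRecord` objects directly: `getMessage()` only applies `%`-formatting when arguments were passed, so the unfixed call does not raise, it silently logs the placeholder. The `render` helper and the server count of 4 are illustrative only.

```python
import logging


def render(msg: str, *args: object) -> str:
    """Build a LogRecord directly and return the message a Formatter would emit."""
    record = logging.LogRecord("demo", logging.WARNING, "demo.py", 0, msg, args, None)
    return record.getMessage()


MSG = "Ignoring --api-server-count=%d when using rust front-end process"

# No args: getMessage() skips %-formatting entirely, so "%d" is emitted verbatim.
print(render(MSG))  # Ignoring --api-server-count=%d when using rust front-end process
# With the argument supplied, logging substitutes it lazily at emit time.
print(render(MSG, 4))  # Ignoring --api-server-count=4 when using rust front-end process
```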
Based on https://github.com/Inferact/vllm-frontend-rs, with changes to accommodate Aphrodite's API and function signatures, plus request-level metrics.
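The benchmark lines that follow report per-request E2E time, TTFT, and prefill/decode throughput. As a minimal sketch of how such request-level metrics can be derived from timestamps: the `RequestTiming` class is hypothetical, and the throughput definitions used here (prompt tokens / TTFT, output tokens / time after the first token) are assumptions; the PR's own accounting may differ.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request timestamps in seconds (monotonic clock) plus token counts."""
    arrival: float
    first_token: float
    finished: float
    prompt_tokens: int
    output_tokens: int

    @property
    def e2e(self) -> float:
        return self.finished - self.arrival

    @property
    def ttft(self) -> float:
        return self.first_token - self.arrival

    @property
    def prefill_tps(self) -> float:
        # Assumed definition: prompt tokens processed before the first output token.
        return self.prompt_tokens / self.ttft

    @property
    def decode_tps(self) -> float:
        # Assumed definition: output tokens divided by the time after the first token.
        return self.output_tokens / (self.finished - self.first_token)

    def summary(self) -> str:
        return (
            f"E2E time: {self.e2e:.2f}s, TTFT: {self.ttft:.2f}s, "
            f"Prefill: {self.prompt_tokens} tokens ({self.prefill_tps:.1f} tokens/s), "
            f"Decode: {self.output_tokens} tokens ({self.decode_tps:.1f} tokens/s)"
        )


if __name__ == "__main__":
    # Synthetic timings, not measurements from this PR.
    t = RequestTiming(arrival=0.0, first_token=0.5, finished=2.5,
                      prompt_tokens=1000, output_tokens=400)
    print(t.summary())
    # E2E time: 2.50s, TTFT: 0.50s, Prefill: 1000 tokens (2000.0 tokens/s), Decode: 400 tokens (200.0 tokens/s)
```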
| Frontend | E2E time | TTFT | Prefill tokens | Prefill throughput | Decode tokens | Decode throughput |
|---|---|---|---|---|---|---|
| Rust API | 1.73 s | 0.04 s | 1925 | 48891.2 tokens/s | 512 | 303.1 tokens/s |
| Python API | 1.69 s | 0.16 s | 1925 | 11845.3 tokens/s | 448 | 294.5 tokens/s |

Testing
Install Aphrodite as normal, then run: