feat: structured logging#37
Conversation
There was a problem hiding this comment.
I think the new logging flags won’t actually take effect for judgearena-elo yet, because the package entrypoint still points to estimate_elo_ratings:main while configure_logging() only runs in cli(). Could we make both paths go through the same logging setup?
Maybe we can point pyproject.toml at judgearena.estimate_elo_ratings:cli.
There was a problem hiding this comment.
Looks like the MT-Bench path resolves a timestamped res_folder and attaches a file handler before dispatch in generate_and_evaluate.py, but _save_mt_bench_results() later creates a different res_folder for the actual artifacts. Could we resolve the output directory once and reuse it for the whole MT-Bench run so logs and results stay together?
|
|
||
| Returns the handler so the caller can remove it if needed. | ||
| """ | ||
| root = logging.getLogger(_ROOT_LOGGER_NAME) |
There was a problem hiding this comment.
attach_file_handler() always adds a new FileHandler with no deduping. In multi-step flows like MT-Bench, this can accumulate multiple file handlers on the same logger. Should we make this idempotent, or centralize file-handler setup in one place?
…g existing handler instead of creating new one
|
Hi @ErlisLushtaku , thanks for the review. I have fixed the remaining problems |
Summary
Replace ad-hoc
print()calls across the codebase with Python'sloggingmodule under a unifiedjudgearenalogger namespace. This gives users control over verbosity and enables persistent debug logs without code changes.I totally think this is required and its better to switch to using
loggingpackage than late as its common practice (also development friendly). The only concern is the log files. It is usually hard to read log directly from console so its better to save log in a rotation for debugging and results.Changes
New:
judgearena/log.pyget_logger(name)— returns a child logger under thejudgearenanamespace; auto-prefixes bare module names.configure_logging(verbosity, log_file)— sets up console + optional file handlers. SupportsJUDGEARENA_LOG_LEVELenv-var override.attach_file_handler(path)— adds a DEBUG-level file handler (always captures full trace regardless of console verbosity).make_run_log_path(folder)— generates a timestampedrun-YYYYMMDD_HHMMSS.logpath.New CLI flags (
judgearena/cli_common.py)-v/--verbose-q/--quiet--log-file PATH--no-log-filerun-*.login the result folderAdded
resolve_verbosity(args)helper —-qtakes precedence over-v.Codebase migration
Replaced
print()withlogger.info()/logger.debug()in:evaluate.py,generate_and_evaluate.py,estimate_elo_ratings.pyarenas_utils.py,eval_utils.py,utils.pyinstruction_dataset/__init__.py,mt_bench/mt_bench_utils.pyTests (
tests/test_logging.py)Behaviour
Default behaviour (
-v 0) matches existing output —INFOmessages print to stderr just like the oldprint()calls. No visible change for users who don't pass new flags.Example Log