[PDF conversion] Apply installation of 68 additional fonts at build time #186
- Add a font_artifacts stage that downloads fonts from the HuggingFace public repo (HeechanKim-Genon/additional_fonts)
- In the runtime stage, copy them to /usr/share/fonts/additional and register them with fc-cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 Walkthrough: A new Docker build stage (…
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
Code Review
This pull request updates the Dockerfile to include a new stage for downloading additional fonts from Hugging Face and installing them into the system. The feedback suggests optimizing the image size by selectively downloading font files and improving build performance by avoiding a full font cache rescan.
huggingface-cli download HeechanKim-Genon/additional_fonts \
    --repo-type dataset \
    --local-dir /app/extra_fonts \
    --local-dir-use-symlinks False
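When running huggingface-cli download, consider using the --include option to fetch only the font file extensions that are needed (*.ttf, *.otf, *.ttc, and so on). This keeps any unneeded metadata or documentation files in the dataset repository out of the image layer, which helps keep the image small.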
huggingface-cli download HeechanKim-Genon/additional_fonts \
--repo-type dataset \
--local-dir /app/extra_fonts \
--local-dir-use-symlinks False \
--include "*.ttf" "*.otf" "*.ttc"
# Install additional fonts
COPY --from=font_artifacts /app/extra_fonts /usr/share/fonts/additional
RUN fc-cache -f -v
Actionable comments posted: 1
🧹 Nitpick comments (1)
genon/preprocessor/docker/Dockerfile (1)
263-265: Add a build-time assertion for the “68 extra fonts” requirement. Lines 263-265 register the fonts, but there is no fail-fast validation that the required font set is actually present after the copy and cache refresh.
Suggested assertion step
 # Install additional fonts
 COPY --from=font_artifacts /app/extra_fonts /usr/share/fonts/additional
-RUN fc-cache -f -v
+RUN set -eux; \
+    fc-cache -f -v; \
+    FONT_FILES="$(find /usr/share/fonts/additional -type f \( -iname '*.ttf' -o -iname '*.otf' -o -iname '*.ttc' \) | wc -l)"; \
+    test "${FONT_FILES}" -ge 68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@genon/preprocessor/docker/Dockerfile` around lines 263 - 265, The Dockerfile copies fonts with COPY --from=font_artifacts /app/extra_fonts /usr/share/fonts/additional and refreshes caches with RUN fc-cache -f -v but lacks a fail-fast check that exactly 68 font files were installed; add a build-time assertion step immediately after the fc-cache RUN to count files under /usr/share/fonts/additional (for example using find or a small shell/python one-liner) and exit non-zero with a clear error message if the count is not 68 so the image build fails fast when the required font set is missing or incomplete.
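A minimal sketch of the fail-fast check described above, written as shell helpers that a `RUN` step could call. The function names `count_font_files` and `assert_font_count` are hypothetical, and the extension list (`.ttf`, `.otf`, `.ttc`) is an assumption carried over from the suggested diff. Unlike the `-ge 68` test in that diff, this version requires an exact match, as the prompt asks.

```shell
#!/bin/sh
# Hypothetical helpers: names and the extension list are assumptions, not part of the PR.

# Print the number of font files (.ttf/.otf/.ttc, case-insensitive) under directory $1.
count_font_files() {
    find "$1" -type f \( -iname '*.ttf' -o -iname '*.otf' -o -iname '*.ttc' \) | wc -l | tr -d ' '
}

# Return non-zero with a clear message unless exactly $2 font files exist under $1.
assert_font_count() {
    dir="$1"
    expected="$2"
    actual="$(count_font_files "$dir")"
    if [ "$actual" -ne "$expected" ]; then
        echo "[ERROR] expected $expected font files in $dir, found $actual" >&2
        return 1
    fi
    echo "[INFO] found $actual font files in $dir"
}
```

In the Dockerfile this could run right after `fc-cache`, for example as `assert_font_count /usr/share/fonts/additional 68`, so that a missing or incomplete font set stops the build.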
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 429341d6-eca6-468d-9f9e-8441ff106ede
📒 Files selected for processing (1)
genon/preprocessor/docker/Dockerfile
RUN --mount=type=cache,target=${HF_HOME} \
    echo "[INFO] Downloading extra fonts from HF (public)..." && \
    mkdir -p /app/extra_fonts && \
    huggingface-cli download HeechanKim-Genon/additional_fonts \
        --repo-type dataset \
        --local-dir /app/extra_fonts \
        --local-dir-use-symlinks False
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify whether revision pinning and file filters are currently absent/present.
rg -n -C2 'additional_fonts|--revision|--include|font_artifacts' genon/preprocessor/docker/Dockerfile
Repository: genonai/doc_parser
Length of output: 978
Pin HF dataset revision and restrict downloaded artifact types.
The huggingface-cli download at line 166 pulls additional_fonts from "latest" without --revision, making image contents non-deterministic across rebuilds. Additionally, it downloads all dataset files without filtering, potentially copying non-font artifacts into /usr/share/fonts/additional.
Note: The same Dockerfile already applies --include filters elsewhere (line 204), establishing the pattern for this repository.
Suggested fix
+ARG ADDITIONAL_FONTS_REPO=HeechanKim-Genon/additional_fonts
+ARG ADDITIONAL_FONTS_REV=<pin_dataset_commit_sha>
 FROM builder AS font_artifacts
 RUN --mount=type=cache,target=${HF_HOME} \
     echo "[INFO] Downloading extra fonts from HF (public)..." && \
     mkdir -p /app/extra_fonts && \
-    huggingface-cli download HeechanKim-Genon/additional_fonts \
+    huggingface-cli download "${ADDITIONAL_FONTS_REPO}" \
         --repo-type dataset \
+        --revision "${ADDITIONAL_FONTS_REV}" \
+        --include "*.ttf" \
+        --include "*.otf" \
+        --include "*.ttc" \
         --local-dir /app/extra_fonts \
         --local-dir-use-symlinks False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@genon/preprocessor/docker/Dockerfile` around lines 163 - 169, The RUN block
invoking `huggingface-cli download` should pin the dataset to a specific
revision and restrict downloaded files to font artifacts: update the
`huggingface-cli download` invocation in the RUN that creates /app/extra_fonts
to include `--revision <immutable-ref>` (replace with the chosen commit SHA or
tag) and add `--include` filters for font extensions (e.g., *.ttf, *.otf,
*.woff, *.woff2) so only font files are fetched into /app/extra_fonts; keep
`--repo-type dataset` and the existing `--local-dir` flags intact to preserve
behavior.
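One way to enforce the "immutable ref" part of this prompt is to refuse to download unless the revision looks like a full commit SHA. The sketch below assumes that policy. The function names `is_commit_sha` and `download_pinned_fonts` are hypothetical, and the `--include` patterns follow the `.ttf`/`.otf`/`.ttc` set from the suggested diff, not the `.woff` variants mentioned in the prompt.

```shell
#!/bin/sh
# Hypothetical sketch: function names and the full-SHA policy are assumptions.

# Succeed only if $1 looks like a full 40-character hexadecimal commit SHA.
is_commit_sha() {
    printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Download only font files from a dataset repo at a pinned revision.
# $1 = repo id, $2 = commit SHA, $3 = target directory
download_pinned_fonts() {
    repo="$1"
    rev="$2"
    dest="$3"
    if ! is_commit_sha "$rev"; then
        echo "[ERROR] revision '$rev' is not a full commit SHA; refusing to download" >&2
        return 1
    fi
    mkdir -p "$dest"
    huggingface-cli download "$repo" \
        --repo-type dataset \
        --revision "$rev" \
        --include "*.ttf" "*.otf" "*.ttc" \
        --local-dir "$dest" \
        --local-dir-use-symlinks False
}
```

A call such as `download_pinned_fonts HeechanKim-Genon/additional_fonts "$ADDITIONAL_FONTS_REV" /app/extra_fonts` would then fail early when the revision is left as a branch name like `main`, which is the non-deterministic case the review describes.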
Summary
- Download from `HeechanKim-Genon/additional_fonts` and register the fonts in the system font path
- Add a `font_artifacts` stage, then register with `fc-cache` in the runtime stage

Changes
- `genon/preprocessor/docker/Dockerfile`: add the `font_artifacts` stage; add COPY + `fc-cache` to runtime

Test plan
- Confirm the 68 additional fonts with `fc-list`

Closes #181
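The `fc-list` check in the test plan could be scripted roughly as follows. This is a sketch under assumptions: the helper name `count_registered_files` is hypothetical, and it expects the output of `fc-list : file` (one `path: ` line per font face) on stdin. It counts distinct files, so a `.ttc` collection with several faces counts once, whereas a plain `fc-list | wc -l` would count faces.

```shell
#!/bin/sh
# Hypothetical helper: the name and the stdin-based design are assumptions.

# Read `fc-list : file` output on stdin and print how many distinct font
# files live under the directory prefix given as $1.
count_registered_files() {
    prefix="${1%/}/"
    # Strip the trailing ": " that fc-list appends, keep paths under the
    # prefix, then de-duplicate so multi-face files are counted once.
    sed 's/: *$//' | awk -v p="$prefix" 'index($0, p) == 1' | sort -u | wc -l | tr -d ' '
}
```

Inside the built image, `fc-list : file | count_registered_files /usr/share/fonts/additional` should then print 68 if every downloaded file was registered.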
🤖 Generated with Claude Code