[PDF Conversion] Apply installation of 68 additional fonts at build time #186

Merged
inoray merged 1 commit into develop from
task/181-pdf변환-pdf-converter-빌드시-추가-폰트-다운로드-과정-적용
Apr 29, 2026

@HeechanKim-Genon HeechanKim-Genon commented Apr 29, 2026

Summary

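  • At build time, install 68 additional fonts on top of the fonts that LibreOffice installs automatically
  • Download them from the public HuggingFace repo (HeechanKim-Genon/additional_fonts) and register them in the system font path
  • Add a font_artifacts stage → register the fonts with fc-cache in the runtime stage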

Changes

  • genon/preprocessor/docker/Dockerfile: add the font_artifacts stage; add COPY + fc-cache to the runtime stage

Test plan

  • After the build, confirm the 68 additional fonts with fc-list inside the container
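
The check in the test plan could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper names `count_font_files` and `count_registered_fonts` are hypothetical, and the directory `/usr/share/fonts/additional` and the expected count of 68 come from this PR's description.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the PR) for checking the installed fonts.

# Count font files (.ttf/.otf/.ttc, case-insensitive) under a directory.
count_font_files() {
    find "$1" -type f \( -iname '*.ttf' -o -iname '*.otf' -o -iname '*.ttc' \) \
        | wc -l | tr -d '[:space:]'
}

# Count distinct font files under a directory that fontconfig reports.
# Requires fc-list. A .ttc file can hold several faces, so the file paths
# are de-duplicated before counting.
count_registered_fonts() {
    fc-list --format '%{file}\n' | sort -u | grep -c "^$1/"
}
```

Inside the built container, `count_font_files /usr/share/fonts/additional` and `count_registered_fonts /usr/share/fonts/additional` would both be expected to print 68, assuming every font ships as a single file with one of those three extensions.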

Closes #181

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Enhanced container runtime with additional system fonts to support expanded text rendering capabilities.

- Add a font_artifacts stage that downloads fonts from the public HuggingFace repo (HeechanKim-Genon/additional_fonts)
- In the runtime stage, copy them to /usr/share/fonts/additional and register them with fc-cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai Bot commented Apr 29, 2026

Walkthrough

A new Docker build stage (font_artifacts) downloads additional fonts from a Hugging Face dataset and integrates them into the PDF converter runtime environment by copying them to /usr/share/fonts/additional and refreshing the font cache.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Docker Font Enhancement: genon/preprocessor/docker/Dockerfile | Added new build stage to download 75 extra fonts from Hugging Face, copy them into the runtime image's system font directory, and refresh the font cache for availability at runtime. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Fonts, fonts, everywhere!
Synapsoft treasures through the air,
From Hugging Face we fetch them all,
Seventy-five to heed the call,
fc-cache refreshes with delight,
PDF rendering, oh what a sight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title in Korean describes installing 68 additional fonts during build, which directly matches the core change of adding font artifacts to the Docker build process. |
| Linked Issues check | ✅ Passed | The PR implements the requirement to download and install additional fonts (68 fonts from HuggingFace) during Docker build and register them in system font path, matching issue #181 objectives. |
| Out of Scope Changes check | ✅ Passed | The Dockerfile changes are focused solely on adding fonts via a new build stage and copying them to the runtime image, with no unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Dockerfile to include a new stage for downloading additional fonts from Hugging Face and installing them into the system. The feedback suggests optimizing the image size by selectively downloading font files and improving build performance by avoiding a full font cache rescan.

Comment on lines +166 to +169
huggingface-cli download HeechanKim-Genon/additional_fonts \
  --repo-type dataset \
  --local-dir /app/extra_fonts \
  --local-dir-use-symlinks False

medium

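When running huggingface-cli download, we recommend using the --include option to download only the font file extensions you need (*.ttf, *.otf, *.ttc, etc.). This keeps unnecessary metadata or documentation files that may exist in the dataset repository out of the image layer, which helps optimize the image size.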

    huggingface-cli download HeechanKim-Genon/additional_fonts \
      --repo-type dataset \
      --local-dir /app/extra_fonts \
      --local-dir-use-symlinks False \
      --include "*.ttf" "*.otf" "*.ttc"


# 추가 폰트 설치
COPY --from=font_artifacts /app/extra_fonts /usr/share/fonts/additional
RUN fc-cache -f -v

medium

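The -f (force) option makes fc-cache ignore every existing font cache and rescan everything, which can waste time during the build. Because the system font cache was already generated in the earlier stages (base, loext), it is more efficient to remove -f so that only an incremental update for the added fonts is performed.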

RUN fc-cache -v

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
genon/preprocessor/docker/Dockerfile (1)

263-265: Add a build-time assertion for the “68 extra fonts” requirement.

Lines 263-265 register the fonts, but there is no fail-fast validation that the required font set is actually present after the copy and cache refresh.

Suggested assertion step
 # 추가 폰트 설치
 COPY --from=font_artifacts /app/extra_fonts /usr/share/fonts/additional
-RUN fc-cache -f -v
+RUN set -eux; \
+    fc-cache -f -v; \
+    FONT_FILES="$(find /usr/share/fonts/additional -type f \( -iname '*.ttf' -o -iname '*.otf' -o -iname '*.ttc' \) | wc -l)"; \
+    test "${FONT_FILES}" -ge 68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@genon/preprocessor/docker/Dockerfile` around lines 263 - 265, The Dockerfile
copies fonts with COPY --from=font_artifacts /app/extra_fonts
/usr/share/fonts/additional and refreshes caches with RUN fc-cache -f -v but
lacks a fail-fast check that exactly 68 font files were installed; add a
build-time assertion step immediately after the fc-cache RUN to count files
under /usr/share/fonts/additional (for example using find or a small
shell/python one-liner) and exit non-zero with a clear error message if the
count is not 68 so the image build fails fast when the required font set is
missing or incomplete.
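
The exact-count check with a clear error message that this prompt asks for might look like the sketch below. It is an illustration only: `assert_font_count` is a hypothetical helper, not code from the PR, and it assumes that font files are identified by the `.ttf`, `.otf`, and `.ttc` extensions, as in the suggested assertion step.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): succeed only if DIR holds
# exactly EXPECTED font files, otherwise print an error and return non-zero.
assert_font_count() {
    dir="$1"
    expected="$2"
    # Count .ttf/.otf/.ttc files, matching extensions case-insensitively.
    actual="$(find "$dir" -type f \
        \( -iname '*.ttf' -o -iname '*.otf' -o -iname '*.ttc' \) \
        | wc -l | tr -d '[:space:]')"
    if [ "$actual" -ne "$expected" ]; then
        echo "[ERROR] expected $expected font files in $dir, found $actual" >&2
        return 1
    fi
    echo "[INFO] found $actual font files in $dir"
}
```

Called as `assert_font_count /usr/share/fonts/additional 68` as the last command of the RUN step, a mismatch would fail the build. This is stricter than the `-ge 68` test in the suggested diff, which also accepts more than 68 files.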

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 429341d6-eca6-468d-9f9e-8441ff106ede

📥 Commits

Reviewing files that changed from the base of the PR and between c990bc3 and 0dfbf5a.

📒 Files selected for processing (1)
  • genon/preprocessor/docker/Dockerfile

Comment on lines +163 to +169
RUN --mount=type=cache,target=${HF_HOME} \
    echo "[INFO] Downloading extra fonts from HF (public)..." && \
    mkdir -p /app/extra_fonts && \
    huggingface-cli download HeechanKim-Genon/additional_fonts \
      --repo-type dataset \
      --local-dir /app/extra_fonts \
      --local-dir-use-symlinks False

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Verify whether revision pinning and file filters are currently absent/present.
rg -n -C2 'additional_fonts|--revision|--include|font_artifacts' genon/preprocessor/docker/Dockerfile

Repository: genonai/doc_parser

Length of output: 978


Pin HF dataset revision and restrict downloaded artifact types.

The huggingface-cli download at line 166 pulls additional_fonts from "latest" without --revision, making image contents non-deterministic across rebuilds. Additionally, it downloads all dataset files without filtering, potentially copying non-font artifacts into /usr/share/fonts/additional.

Note: The same Dockerfile already applies --include filters elsewhere (line 204), establishing the pattern for this repository.

Suggested fix
+ARG ADDITIONAL_FONTS_REPO=HeechanKim-Genon/additional_fonts
+ARG ADDITIONAL_FONTS_REV=<pin_dataset_commit_sha>

 FROM builder AS font_artifacts

 RUN --mount=type=cache,target=${HF_HOME} \
     echo "[INFO] Downloading extra fonts from HF (public)..." && \
     mkdir -p /app/extra_fonts && \
-    huggingface-cli download HeechanKim-Genon/additional_fonts \
+    huggingface-cli download "${ADDITIONAL_FONTS_REPO}" \
       --repo-type dataset \
+      --revision "${ADDITIONAL_FONTS_REV}" \
+      --include "*.ttf" \
+      --include "*.otf" \
+      --include "*.ttc" \
       --local-dir /app/extra_fonts \
       --local-dir-use-symlinks False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@genon/preprocessor/docker/Dockerfile` around lines 163 - 169, The RUN block
invoking `huggingface-cli download` should pin the dataset to a specific
revision and restrict downloaded files to font artifacts: update the
`huggingface-cli download` invocation in the RUN that creates /app/extra_fonts
to include `--revision <immutable-ref>` (replace with the chosen commit SHA or
tag) and add `--include` filters for font extensions (e.g., *.ttf, *.otf,
*.woff, *.woff2) so only font files are fetched into /app/extra_fonts; keep
`--repo-type dataset` and the existing `--local-dir` flags intact to preserve
behavior.
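
One way to keep the pin from quietly drifting back to a branch name is to validate the revision before downloading. The sketch below is an assumption-laden illustration, not code from the PR: `is_commit_sha` and `download_pinned_fonts` are hypothetical helpers, the check accepts only a full 40-character commit SHA (stricter than the "commit SHA or tag" wording above), and the `huggingface-cli` flags mirror those already shown in this review.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the PR).

# Succeed only if the argument looks like a full 40-character commit SHA.
is_commit_sha() {
    printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Download only font files from a dataset repo at a pinned revision.
# Refuses to run when the revision is a branch or tag name.
download_pinned_fonts() {
    repo="$1"
    rev="$2"
    dest="$3"
    if ! is_commit_sha "$rev"; then
        echo "[ERROR] revision '$rev' is not a full commit SHA" >&2
        return 1
    fi
    huggingface-cli download "$repo" \
        --repo-type dataset \
        --revision "$rev" \
        --include "*.ttf" "*.otf" "*.ttc" \
        --local-dir "$dest" \
        --local-dir-use-symlinks False
}
```

In the font_artifacts stage this would be called with the repository name, the chosen commit SHA, and `/app/extra_fonts`. The SHA itself still has to be picked by the maintainers.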

@HeechanKim-Genon HeechanKim-Genon requested a review from inoray April 29, 2026 08:54
@inoray inoray merged commit cc782c9 into develop Apr 29, 2026
4 checks passed

Development

Successfully merging this pull request may close these issues.

[PDF Conversion] Apply additional font download step when building the pdf converter

2 participants