feat(markdown): add picture index to image placeholder by nuri-yoo · Pull Request #555 · docling-project/docling-core

nuri-yoo · 2026-03-18T09:26:08Z

Summary

Add sequential picture indexing to the markdown image placeholder by introducing a {index} format token in image_placeholder.

Default: "<!-- image -->" → "<!-- image_{index} -->" → renders as <!-- image_0 -->, <!-- image_1 -->, ...
Index is extracted from item.self_ref (e.g. "#/pictures/6" → 6), matching JSON export references
Backward compatible: custom placeholders without {index} are unaffected (.replace() is a no-op)
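
The token resolution described in this summary can be sketched in plain Python. This is a minimal illustration rather than the PR's actual diff: `resolve_placeholder` is a hypothetical helper name, and the sketch assumes `self_ref` always ends in a numeric segment such as `#/pictures/6`.

```python
def resolve_placeholder(image_placeholder: str, self_ref: str) -> str:
    """Substitute the {index} token with the trailing segment of self_ref.

    Hypothetical helper for illustration only; assumes self_ref looks
    like "#/pictures/<n>".
    """
    # Take everything after the last "/" ("#/pictures/6" -> "6").
    pic_idx = self_ref.rsplit("/", 1)[-1]
    # str.replace returns the string unchanged when the token is absent,
    # which is why custom placeholders without {index} are unaffected.
    return image_placeholder.replace("{index}", pic_idx)


print(resolve_placeholder("<!-- image_{index} -->", "#/pictures/6"))  # <!-- image_6 -->
print(resolve_placeholder("[figure]", "#/pictures/6"))  # [figure]
```

Note that the index comes from the item's reference, not from its position in the output, so the numbers in the rendered Markdown need not be contiguous if some pictures are skipped during export.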

Changes

MarkdownParams.image_placeholder default updated
MarkdownPictureSerializer._serialize_image_part(): resolve {index} token before emitting placeholder
Ground truth test data regenerated

Testing

All existing tests pass (369 passed, 0 failed)
Verified backward compatibility with explicit image_placeholder="<!-- image -->" (no {index} token → no change)

Resolves docling-project/docling#3078

github-actions · 2026-03-18T09:26:19Z

✅ DCO Check Passed

Thanks @nuri-yoo, all your commits are properly signed off. 🎉

mergify · 2026-03-18T09:27:33Z

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviewer for test updates

Waiting for:

#approved-reviews-by >= 2

This rule is failing.

When test data is updated, we require two reviewers

#approved-reviews-by >= 2

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
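
To see why this PR's title satisfies the rule, the pattern can be tried locally. The sketch below assumes the `~=` condition behaves like a Python `re` match anchored by the leading `^`; `is_conventional` is a hypothetical helper name, not part of Mergify.

```python
import re

# Pattern copied verbatim from the merge protection rule above.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True when the title starts with a conventional-commit prefix."""
    return CONVENTIONAL_TITLE.match(title) is not None


print(is_conventional("feat(markdown): add picture index to image placeholder"))  # True
print(is_conventional("Add picture index to image placeholder"))  # False
```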

dosubot · 2026-03-18T09:28:52Z

Related Documentation

1 document(s) may need updating based on files changed in this PR:

Docling

What are the detailed pipeline options and processing behaviors for PDF, DOCX, PPTX, and XLSX files in the Python SDK?

View Suggested Changes

@@ -7,7 +7,7 @@
     - `do_ocr` (default True): Use OCR
     - `force_ocr`: Replace existing text with OCR-generated text
     - `ocr_engine`, `ocr_lang`: OCR engine and language options
-    - `image_export_mode`: `placeholder`, `embedded`, `referenced`
+    - `image_export_mode`: `placeholder`, `embedded`, `referenced`. When using `placeholder` mode with Markdown export, the default placeholder format is `"<!-- image_{index} -->"`, which renders as sequential placeholders like `<!-- image_0 -->`, `<!-- image_1 -->`, etc. The index corresponds to the picture reference in the JSON export (e.g., `item.self_ref` like `"#/pictures/6"` → `6`). This is backward compatible—custom placeholders without the `{index}` token are unaffected.
     - `do_table_structure`, `table_mode`, `table_cell_matching`: Table extraction options (see Table Structure Models section below for details on TableFormer V1 and V2)
     - `do_code_enrichment`, `do_formula_enrichment`: Code/formula recognition
     - `vlm_pipeline_preset`, `vlm_pipeline_custom_config`, `picture_description_preset`, `picture_description_custom_config`, `code_formula_preset`, `code_formula_custom_config`: New model inference engine and preset options for VLM, picture description, and code/formula extraction

codecov · 2026-03-19T13:17:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Use `{index}` token in `image_placeholder` to include the picture index from `item.self_ref`. Default placeholder changes from `<!-- image -->` to `<!-- image_{index} -->`, producing `<!-- image_0 -->`, etc. Backward compatible: custom placeholders without `{index}` are unaffected. Related: docling-project/docling#3078

I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: 5de57a5 Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: c157073 Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

ceberam · 2026-03-20T11:54:48Z

@nuri-yoo I've seen you've been adding new commits lately. Just please let us know when it's ready for review.

nuri-yoo · 2026-03-20T12:01:00Z

Ready for review. All CI checks are passing now.

nuri-yoo · 2026-04-06T02:42:10Z

@ceberam Gentle ping on this PR. Appreciate a review when you get a chance.

ceberam

Thanks a lot @nuri-yoo for your contribution. However, I don’t think indexed image placeholders should become the library default. Markdown export here is a lossy text representation of DoclingDocument. Embedding self_ref-derived picture identity into the default output makes it application-specific and sets a precedent for exposing internal node indices for other item types as well. The DoclingDocument should be the reference object for document hierarchy.

The same workflow can be achieved with a custom serializer extension (e.g., custom MarkdownPictureSerializer / BasePictureSerializer), as shown in Creating a custom serializer.

For instance:

from pathlib import Path

from docling_core.transforms.serializer.base import SerializationResult
from docling_core.transforms.serializer.common import create_ser_result
from docling_core.transforms.serializer.markdown import (
    MarkdownDocSerializer,
    MarkdownParams,
    MarkdownPictureSerializer,
)
from docling_core.types.doc.base import ImageRefMode
from docling_core.types.doc.document import DoclingDocument, PictureItem


class IndexedMarkdownPictureSerializer(MarkdownPictureSerializer):
    """Custom picture serializer that supports {index} in the placeholder."""

    def _serialize_image_part(
        self,
        item: PictureItem,
        doc: DoclingDocument,
        image_mode: ImageRefMode,
        image_placeholder: str,
        **kwargs,
    ) -> SerializationResult:
        pic_idx = item.self_ref.rsplit("/", 1)[-1]
        resolved_placeholder = image_placeholder.replace("{index}", pic_idx)

        # Reuse the parent implementation for non-placeholder modes if desired.
        if image_mode != ImageRefMode.PLACEHOLDER:
            return super()._serialize_image_part(
                item=item,
                doc=doc,
                image_mode=image_mode,
                image_placeholder=resolved_placeholder,
                **kwargs,
            )

        return create_ser_result(text=resolved_placeholder, span_source=item)


# Example usage with an existing document
src = Path("test/data/doc/2408.09869v3_enriched.json")
doc: DoclingDocument = DoclingDocument.load_from_json(src)

serializer = MarkdownDocSerializer(
    doc=doc,
    picture_serializer=IndexedMarkdownPictureSerializer(),
    params=MarkdownParams(
        image_mode=ImageRefMode.PLACEHOLDER,
        image_placeholder="<!-- image_{index} -->",
    ),
)

markdown = serializer.serialize().text
print(markdown)
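
Once the placeholders carry an index, a consumer can map them back to the picture items in the DoclingDocument, which remains the reference object. The following is a minimal sketch of that lookup step in plain Python: `picture_refs` is a hypothetical helper, and it assumes the placeholder format `<!-- image_{index} -->` configured in the example above.

```python
import re

# Matches placeholders produced with image_placeholder="<!-- image_{index} -->".
PLACEHOLDER_RE = re.compile(r"<!-- image_(\d+) -->")


def picture_refs(markdown: str) -> list[str]:
    """Return the "#/pictures/<n>" reference for each placeholder, in order."""
    return [f"#/pictures/{m.group(1)}" for m in PLACEHOLDER_RE.finditer(markdown)]


sample = "Intro\n\n<!-- image_0 -->\n\nMore text\n\n<!-- image_3 -->\n"
print(picture_refs(sample))  # ['#/pictures/0', '#/pictures/3']
```

The returned strings have the same shape as `item.self_ref`, so they can be compared against the `self_ref` of each picture in the loaded document.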

My suggestion would be:

close this PR
show this use case in Docling documentation (section Serialization) by extending the notebook serialization.ipynb

nuri-yoo · 2026-04-14T08:01:47Z

Thanks for the thorough review and the alternative approach. I'll close this and open a docs PR extending serialization.ipynb with the custom serializer example instead.

ceberam · 2026-04-14T08:07:15Z

Sounds good! You can link the new PR to the same issue docling-project/docling#3078 when it's ready.

…ok (#3293)

* docs: add indexed picture placeholder example to serialization notebook

Show how to subclass MarkdownPictureSerializer to resolve {index} tokens in image placeholders using self_ref, as an alternative to modifying the library default. Ref: docling-project/docling-core#555

* DCO Remediation Commit for nuri-yoo <nuri-yoo@users.noreply.github.com>

I, nuri-yoo <nuri-yoo@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 82cb733

Signed-off-by: nuri-yoo <nuri-yoo@users.noreply.github.com>
Co-authored-by: nuri-yoo <nuri-yoo@users.noreply.github.com>

ceberam self-requested a review March 19, 2026 13:31

nryoo added 4 commits March 20, 2026 20:42

DCO Remediation Commit for nryoo <nryoo@nryooui-MacBookPro.local>

1f459dd

I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: 5de57a5 Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

test: update chunker ground truth for image index placeholder

1162ee6

Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

fix: remove unreachable else branch in _serialize_image_part

8ee7731

Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

nuri-yoo force-pushed the feat/image-placeholder-index branch from 4d418c3 to 8ee7731 Compare March 20, 2026 11:43

DCO Remediation Commit for nryoo <nryoo@nryooui-MacBookPro.local>

b2db98a

I, nryoo <nryoo@nryooui-MacBookPro.local>, hereby add my Signed-off-by to this commit: c157073 Signed-off-by: nryoo <nryoo@nryooui-MacBookPro.local>

ceberam requested changes Apr 14, 2026

View reviewed changes

nuri-yoo closed this Apr 14, 2026

nuri-yoo mentioned this pull request Apr 14, 2026

docs: add indexed picture placeholder example to serialization notebook docling-project/docling#3293

Merged

1 task

ceberam mentioned this pull request Apr 15, 2026

image/figure count in md an txt docling-project/docling#3078

Closed

nuri-yoo wants to merge 5 commits into docling-project:main from nuri-yoo:feat/image-placeholder-index
