tool: add image reader tool for local vision inputs #1306

Wangmerlyn · 2025-12-03T14:57:05Z

To whom it may concern: the built-in file_editor tool can already provide the agent with image input. A big shoutout to @xingyaoww for pointing it out.
file editor tool
image input test
So the functionality of this tool is completely covered by file_editor.

blacksmith-sh · 2025-12-07T12:54:02Z

[Automatic Post]: I have assigned @jpshackelford as a reviewer based on git blame information. Thanks in advance for the help!

enyst

Thank you for this. I think this raises an interesting question.

If the file_editor tool supports images already, do we need a separate image reader tool? WDYT?

I'm not sure. A quick thought: maybe? One detail to note is that we are looking at trying other tools, potentially replacing the file editor, for GPT-5 and Gemini 3, and I'm not sure whether those work with images.

On the other hand, to my knowledge, there's data showing that agents don't work well with too many tools, so adding duplicates may not be ideal.

Wangmerlyn · 2025-12-08T14:15:00Z

Oh yes, thank you for looking into this!

The background is that I wanted my agent to look at an image (e.g., a repo diagram), but it kept scanning the whole repo or large files instead. For debugging, I temporarily disabled the file_editor tool and found that the agent was not able to "see" the image (I initially thought agents could load images visually by some means other than tools). Later I learned that an agent needs a tool to "see" an image. Because of that, I created this separate image-reading tool.

Later, thanks to @xingyaoww, I realized that file_editor already supports loading images as visual input for the agent, so this standalone tool ends up being redundant for the current setup.

I’m happy to close this PR or adjust it depending on what direction you think makes the most sense.
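For context, here is a minimal sketch of what "a tool that lets the agent see an image" usually amounts to: read the local file, base64-encode it, and hand it to the model as an image content part. This is not the code from this PR or from file_editor. The function names `image_to_data_url` and `image_content_block` are hypothetical, and the returned dictionary assumes an OpenAI-style `image_url` content part; other providers may expect a different shape.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    file_path = Path(path)
    # Guess the MIME type from the file extension only (no content sniffing).
    mime_type, _ = mimetypes.guess_type(file_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_content_block(path: str) -> dict:
    """Wrap the data URL in an image content part (assumed OpenAI-style shape)."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

A tool executor would return a block like this as part of its observation, so the image reaches the model as visual input rather than as text.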

Wangmerlyn added 4 commits December 3, 2025 22:53

Add image reader tool for local vision inputs

01ffc97

Fix image reader doc line length and executor signature

34d9210

Silence unused conversation param in image reader

8d48125

Move image reader tests into dedicated folder

15469d1

Wangmerlyn marked this pull request as ready for review December 3, 2025 16:16

blacksmith-sh bot requested a review from jpshackelford December 7, 2025 12:54

enyst reviewed Dec 7, 2025

jpshackelford removed their request for review December 8, 2025 15:15

2 participants
