@Wangmerlyn (Contributor) commented Dec 3, 2025

To whom it may concern: the built-in file_editor tool can already provide the agent with image input. A big shoutout to @xingyaoww for pointing this out.
[Image: file editor tool]
[Image: image input test]
So the functionality of this tool is completely covered by file_editor.

@Wangmerlyn Wangmerlyn marked this pull request as ready for review December 3, 2025 16:16
@blacksmith-sh blacksmith-sh bot requested a review from jpshackelford December 7, 2025 12:54
blacksmith-sh bot (Contributor) commented Dec 7, 2025

[Automatic Post]: I have assigned @jpshackelford as a reviewer based on git blame information. Thanks in advance for the help!

@enyst (Collaborator) left a comment

Thank you for this. I think this raises an interesting question.

If the file_editor tool supports images already, do we need a separate image reader tool? WDYT?

I'm not sure. A quick thought is just: maybe? To note, one detail here is that we are looking to maybe try other tools, potentially replacing file editor, for GPT-5 and Gemini 3, and I'm not sure if they work for images.

On the other hand, to my knowledge, there's data that agents don't work well with too many tools, so adding duplicates maybe is not ideal.

@Wangmerlyn (Contributor, Author)

> Thank you for this. I think this raises an interesting question.
>
> If the file_editor tool supports images already, do we need a separate image reader tool? WDYT?
>
> I'm not sure. A quick thought is just: maybe? To note, one detail here is that we are looking to maybe try other tools, potentially replacing file editor, for GPT-5 and Gemini 3, and I'm not sure if they work for images.
>
> On the other hand, to my knowledge, there's data that agents don't work well with too many tools, so adding duplicates maybe is not ideal.

Oh yes, thank you for looking into this!

The background is that I wanted my agent to look at an image (e.g., a repo diagram), but it kept scanning the whole repo or large files instead. For debugging, I temporarily disabled the file_editor tool and found that the agent was not able to "see" the image (I initially thought agents could load images visually by means other than tools). Later I learned that an agent needs a tool to "see" an image. Because of that, I created this separate image-reading tool.

Later, thanks to @xingyaoww, I realized that file_editor already supports loading images as visual input for the agent, so this standalone tool ends up being redundant for the current setup.

I’m happy to close this PR or adjust it depending on what direction you think makes the most sense.

@jpshackelford jpshackelford removed their request for review December 8, 2025 15:15