feat(telegram): inbound image / document / video by williamwa · Pull Request #177 · HCF-STUDIOS/openhermit

williamwa · 2026-05-29T10:15:18Z

Fourth and final channel in the attachment-support rollout (after #174 WhatsApp, #175 Discord, #176 Slack).

Telegram was lopsided: it could send every media kind outbound and transcribe inbound voice, but inbound photos, documents, and video were silently dropped (not even modeled on TelegramMessage). This makes inbound symmetric with the existing outbound support. No core/protocol/gateway changes.

Inbound (Telegram → agent)

Photos (largest size picked), documents, and video are downloaded via the existing getFile + downloadFile helpers and uploaded as session attachments — images become vision input automatically.
The message caption is kept as text.
Media-only messages now trigger the agent instead of being dropped.
The existing voice→STT and TTS-voice-reply paths are unchanged.
Files over Telegram's ~20 MB Bot API download limit are skipped (logged).
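The largest-photo selection and document/video mapping described above could look roughly like the sketch below. It is not the PR's `pickMediaFile`; the name `pickMediaFileSketch`, the `PickedMedia` shape, the photo > document > video precedence, and the `image/jpeg` default for photos are all assumptions made for illustration. Only the Telegram field names (`file_id`, `width`, `height`, `file_size`, `file_name`, `mime_type`) come from the Bot API.

```typescript
// Hypothetical, trimmed-down message shapes (Bot API field names).
interface PhotoSize { file_id: string; width: number; height: number; file_size?: number }
interface FileLike { file_id: string; file_name?: string; mime_type?: string; file_size?: number }
interface InboundMessage { text?: string; caption?: string; photo?: PhotoSize[]; document?: FileLike; video?: FileLike }

// Hypothetical result shape; the PR's TelegramMediaFile may differ.
interface PickedMedia {
  kind: 'photo' | 'document' | 'video';
  fileId: string;
  fileName?: string;
  mimeType?: string;
  size?: number;
}

function pickMediaFileSketch(message: InboundMessage): PickedMedia | undefined {
  if (message.photo && message.photo.length > 0) {
    // Telegram delivers several renditions of one photo; keep the one
    // with the largest pixel area.
    const largest = message.photo.reduce((best, cur) =>
      cur.width * cur.height > best.width * best.height ? cur : best);
    // Assumption: photos are treated as JPEG since PhotoSize has no MIME field.
    return { kind: 'photo', fileId: largest.file_id, mimeType: 'image/jpeg', size: largest.file_size };
  }
  if (message.document) {
    const d = message.document;
    return { kind: 'document', fileId: d.file_id, fileName: d.file_name, mimeType: d.mime_type, size: d.file_size };
  }
  if (message.video) {
    const v = message.video;
    return { kind: 'video', fileId: v.file_id, fileName: v.file_name, mimeType: v.mime_type, size: v.file_size };
  }
  // Text-only message: nothing to attach.
  return undefined;
}
```

A text-only message yields `undefined`, which lets the caller fall through to the existing text path.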

Notes

Telegram is a private package bundled into the openhermit CLI, so this ships with the next CLI release. Version bumped 0.2.0 → 0.3.0.
Adds the first unit tests for this package (pickMediaFile: largest-photo selection, document/video mapping).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Telegram channel supports inbound photos, documents, videos as attachments (captions preserved), voice/audio transcribed, and outbound attachments sent with matching media types; inbound files > ~20 MB are skipped
Documentation
- Telegram adapter docs updated to describe new media handling and platform limits
Tests
- Added tests covering media selection and handling
Chores
- Package version bumped to 0.3.0

WhatsApp was text-only: inbound media was dropped (only captions kept) and the agent couldn't send files. Wire the channel to the existing attachment infrastructure in both directions. Inbound: images/video/documents are downloaded via Baileys downloadMediaMessage and uploaded as session attachments (images become vision input); captions ride along as text. Voice/audio notes are transcribed via the agent's STT, and replies to a voice note are spoken back as a WhatsApp voice note when TTS is configured. Media-only messages now trigger the agent instead of being dropped. Media over the 25 MiB cap is skipped. Outbound: the agent's attachment_send deliveries (SSE 'attachment' event) are routed to the matching Baileys send — image/video/document, and ogg/opus audio as a native push-to-talk voice note. No core/protocol/gateway changes — uses SDK uploadAttachment, postMessage({attachments}), downloadAttachmentBytes, transcribeAudio, synthesizeAudio. Bumps 0.2.0 -> 0.3.0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Discord was fully text-only: inbound attachments were ignored and the agent couldn't send files. Wire the channel to the existing attachment infrastructure both directions. Inbound: message attachments (guild messages + DM gateway dispatch) are fetched from the Discord CDN and uploaded as session attachments (images become vision input); audio attachments are transcribed via the agent's STT and appended as text. Media-only messages now trigger the agent. Attachments over the 25 MiB cap are skipped. Outbound: the agent's attachment_send deliveries (SSE 'attachment' event) are sent back as Discord file uploads via AttachmentBuilder, with any caption as the message content. Discord is bundled into the CLI (private package), so this ships with the next openhermit release rather than a standalone publish. Bumps 0.2.0 -> 0.3.0 for changelog clarity. Adds the first unit tests for this package. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Slack was fully text-only and even dropped file_share messages (the subtype guard returned early). Wire the channel to the existing attachment infrastructure both directions. Inbound: file_share uploads are now accepted; each file is fetched from url_private with bot-token auth and uploaded as a session attachment (images become vision input). Audio files are transcribed via STT and appended as text. Media-only messages now trigger the agent. Files over the 25 MiB cap are skipped. Outbound: the agent's attachment_send deliveries (SSE 'attachment' event) are uploaded back via files.uploadV2, into the originating thread when applicable. Needs files:read (download) and files:write (upload) bot scopes. Slack is bundled into the CLI (private package), so this ships with the next openhermit release. Bumps 0.2.0 -> 0.3.0 and adds the first unit tests (isProcessableMessage gating). Also corrects the manual, which previously claimed file support that did not exist. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Telegram could send every media kind outbound and transcribe inbound voice, but inbound photos, documents, and video were silently dropped (not even modeled on TelegramMessage). Wire them up so inbound is symmetric with the existing outbound support. Inbound photos (largest size), documents, and video are now downloaded via the existing getFile + downloadFile helpers and uploaded as session attachments (images become vision input); the message caption is kept as text. Media-only messages now trigger the agent. The existing voice→STT and TTS-voice-reply paths are unchanged. Files over Telegram's ~20 MB Bot API download limit are skipped. Telegram is bundled into the CLI (private package), so this ships with the next openhermit release. Bumps 0.2.0 -> 0.3.0 and adds the first unit tests for this package (pickMediaFile). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-05-29T10:15:32Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9a0303e2-42cc-4c06-ac90-0993e30cd758

📥 Commits

Reviewing files that changed from the base of the PR and between 62af007 and 901ef68.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (3)

apps/gateway/package.json
docs/channel-adapter.md
docs/manual/17-channels.md

📝 Walkthrough

Walkthrough

Telegram adapter now uploads inbound photos/documents/videos (with captions as fallback text and caption mention detection), transcribes voice/audio when applicable, enforces a ~20 MB download cap (skips oversized files), and maps agent attachment_send responses back to Telegram media sends.

Changes

Telegram media upload and forwarding

| Layer / File(s) | Summary |
| --- | --- |
| Media type contracts: `apps/channels/telegram/src/telegram-api.ts` | New `TelegramPhotoSize`, `TelegramDocument`, `TelegramVideo` interfaces and expanded `TelegramMessage` with `caption`, `caption_entities`, `photo`, `document`, and `video`. |
| Media file selection with tests: `apps/channels/telegram/src/bridge.ts`, `apps/channels/telegram/test/bridge.test.ts` | `TelegramMediaFile` and `pickMediaFile()` choose the largest photo or document/video; tests cover photo selection, document/video mapping, and text-only behavior. |
| Bridge integration and caption/mention handling: `apps/channels/telegram/src/bridge.ts` | Bridge falls back to `caption` for text, detects mentions in `caption_entities`, and allows media-only messages to proceed when media resolves. |
| Download, size-check, and upload attachments: `apps/channels/telegram/src/bridge.ts` | `resolveMediaAttachment()` enforces ~20 MB cap, downloads file bytes, uploads to agent session as attachments; `sendToAgent()` includes attachments when present and handles media-only early returns on empty uploads. |
| Documentation and package metadata: `apps/channels/telegram/package.json`, `apps/channels/telegram/README.md`, `docs/channel-adapter.md`, `docs/manual/17-channels.md` | Version bump to 0.3.0, added Node test script, and docs updated to describe inbound media handling (vision input for images, captions as text, voice STT), outbound `attachment_send` mapping, and ~20 MB skip behavior. |
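The download, size-check, and upload row can be sketched as a single async function. This is an illustration under stated assumptions, not the PR's `resolveMediaAttachment()`: the `MediaDeps` interface, the `resolveMediaSketch` name, the parameter order of `uploadAttachment`, and the reading of "~20 MB" as `20 * 1024 * 1024` bytes are all hypothetical. The `file_path` and `file_size` fields are the ones Telegram's `getFile` returns.

```typescript
// Assumption: "~20 MB" is taken as 20 MiB for this sketch.
const MAX_DOWNLOAD_BYTES = 20 * 1024 * 1024;

interface MediaRef { fileId: string; fileName: string; mimeType: string; size?: number }

// Hypothetical dependency surface, injected so the flow is testable
// without a real bot token or agent session.
interface MediaDeps {
  getFile(fileId: string): Promise<{ file_path?: string; file_size?: number }>;
  downloadFile(filePath: string): Promise<Uint8Array>;
  uploadAttachment(bytes: Uint8Array, fileName: string, mimeType: string): Promise<string>;
  log(message: string): void;
}

async function resolveMediaSketch(
  media: MediaRef,
  deps: MediaDeps,
  maxBytes: number = MAX_DOWNLOAD_BYTES,
): Promise<string | undefined> {
  // Cheap early skip based on the size reported in the message itself.
  if (media.size !== undefined && media.size > maxBytes) {
    deps.log(`skipping ${media.fileName}: ${media.size} bytes is over the cap`);
    return undefined;
  }
  const file = await deps.getFile(media.fileId);
  if (!file.file_path || (file.file_size ?? 0) > maxBytes) {
    deps.log(`skipping ${media.fileName}: no download path or over the cap`);
    return undefined;
  }
  const bytes = await deps.downloadFile(file.file_path);
  // Reported sizes can be absent, so re-check what was actually received.
  if (bytes.byteLength > maxBytes) {
    deps.log(`skipping ${media.fileName}: downloaded body is over the cap`);
    return undefined;
  }
  // Returns whatever reference the upload call yields (a string here).
  return deps.uploadAttachment(bytes, media.fileName, media.mimeType);
}
```

Returning `undefined` instead of throwing keeps a skipped file from aborting the rest of the message, which matches the "skipped (logged)" behaviour described in the PR.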

Sequence Diagram(s)

sequenceDiagram
  participant Telegram
  participant TelegramBridge
  participant TelegramFileAPI
  participant AgentSession
  Telegram->>TelegramBridge: Incoming message (text/caption, media file_id, entities/caption_entities)
  TelegramBridge->>TelegramBridge: resolve text (text || trimmed caption), detect mentions
  TelegramBridge->>TelegramFileAPI: getFile/download(file_id) with ~20MB size check
  TelegramFileAPI-->>TelegramBridge: file bytes (or skip if >20MB)
  TelegramBridge->>AgentSession: uploadAttachment(bytes, filename, mimeType)
  AgentSession-->>TelegramBridge: attachment reference
  TelegramBridge->>AgentSession: postMessage(text, senderPayload, attachments?)
  AgentSession-->>TelegramBridge: attachment_send results
  TelegramBridge->>Telegram: send corresponding media (photo/document/video/voice) per attachment_send
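The "resolve text (text || trimmed caption)" step and the media-only gate from the diagram can be written as two small pure functions. The names `resolveText` and `shouldForward` are hypothetical and only mirror what the diagram and walkthrough describe.

```typescript
interface TextCarrier { text?: string; caption?: string }

// Prefer message.text; fall back to the trimmed caption that
// accompanies a photo, document, or video.
function resolveText(message: TextCarrier): string {
  return message.text || (message.caption ?? '').trim();
}

// A message is worth forwarding if it carries text or resolved media.
// Before this PR, a media-only message would have had nothing to forward.
function shouldForward(message: TextCarrier, hasMedia: boolean): boolean {
  return resolveText(message).length > 0 || hasMedia;
}
```

For example, `shouldForward({ caption: '   ' }, true)` is `true` because the media alone is enough to trigger the agent, while the same message without media is not forwarded.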

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

HCF-STUDIOS/openhermit#150: Modifies the same TelegramBridge.handleMessageInner voice transcription flow for message-to-agent forwarding.
HCF-STUDIOS/openhermit#164: Updates TelegramBridge agent-response handling in the same file for streamed/final reply text suppression.

Poem

🐰 I hop through bytes and tiny chats,
Photos, vids, and voice—that's that!
Captions whispered as plain text,
Twenty megs we kindly text,
Attachments travel, back they come—hooray! 📸

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main feature: adding support for inbound Telegram images, documents, and video. It is specific, directly related to the primary changeset goal, and matches the implemented functionality across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

apps/channels/telegram/src/bridge.ts (1)

126-154: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix missed @mentions in media captions for group messages.

isMentioned() only checks message.entities against message.text, but Telegram puts mention entities from photo/document/video captions into caption_entities (with offsets into message.caption). For groups, this can keep mentioned false and suppress triggering even when the caption contains @<bot>.
Add caption_entities to the TelegramMessage type and extend isMentioned() to check both entities/text and caption_entities/caption.

💡 Suggested fix

diff --git a/apps/channels/telegram/src/telegram-api.ts b/apps/channels/telegram/src/telegram-api.ts
@@
 export interface TelegramMessage {
   message_id: number;
   from?: TelegramUser;
   chat: TelegramChat;
   date: number;
   text?: string;
   caption?: string;
+  caption_entities?: TelegramMessageEntity[];
   entities?: TelegramMessageEntity[];
   reply_to_message?: TelegramMessage;
   voice?: TelegramVoice;
   audio?: TelegramAudio;
   photo?: TelegramPhotoSize[];
   document?: TelegramDocument;
   video?: TelegramVideo;
 }

diff --git a/apps/channels/telegram/src/bridge.ts b/apps/channels/telegram/src/bridge.ts
@@
   private async isMentioned(message: TelegramMessage): Promise<boolean> {
     const bot = await this.getBotInfo();
 
     // Reply to the bot's message
     if (message.reply_to_message?.from?.id === bot.id) {
       return true;
     }
 
-    // `@mention` in text entities
-    if (message.entities && bot.username) {
+    // `@mention` in text/caption entities
+    const entitySets: Array<{ entities?: typeof message.entities; source?: string }> = [
+      { entities: message.entities, source: message.text },
+      { entities: message.caption_entities, source: message.caption },
+    ];
+    if (bot.username) {
       const botUsername = bot.username.toLowerCase();
-      for (const entity of message.entities) {
-        if (
-          entity.type === 'mention' &&
-          message.text
-        ) {
-          const mentionText = message.text
-            .slice(entity.offset, entity.offset + entity.length)
-            .toLowerCase();
-          if (mentionText === `@${botUsername}`) {
-            return true;
-          }
-        }
-        // text_mention: when user has no username, Telegram uses this with a user object
-        if (entity.type === 'text_mention' && entity.user?.id === bot.id) {
-          return true;
+      for (const { entities, source } of entitySets) {
+        if (!entities || !source) continue;
+        for (const entity of entities) {
+          if (entity.type === 'mention') {
+            const mentionText = source
+              .slice(entity.offset, entity.offset + entity.length)
+              .toLowerCase();
+            if (mentionText === `@${botUsername}`) return true;
+          }
+          // text_mention: when user has no username, Telegram uses this with a user object
+          if (entity.type === 'text_mention' && entity.user?.id === bot.id) {
+            return true;
+          }
         }
       }
     }
 
     return false;
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/channels/telegram/src/bridge.ts` around lines 126 - 154, The isMentioned
function currently only inspects message.entities and message.text, missing
mentions in media captions; update the TelegramMessage type to include
caption_entities?: MessageEntity[] and caption?: string, then extend isMentioned
(still handling reply_to_message?.from?.id === bot.id) to iterate both
entities/text and caption_entities/caption: for each entity list, use the
corresponding text source (message.text for entities, message.caption for
caption_entities), extract the slice by entity.offset/length and compare to
`@${bot.username.toLowerCase()}`; also check text_mention entities in both lists
by verifying entity.user?.id === bot.id. Ensure null/undefined guards for
caption and caption_entities.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1cc0112f-f152-4764-b32f-1cfde4deb7cf

📥 Commits

Reviewing files that changed from the base of the PR and between 38db5d9 and 627de91.

📒 Files selected for processing (7)

apps/channels/telegram/README.md
apps/channels/telegram/package.json
apps/channels/telegram/src/bridge.ts
apps/channels/telegram/src/telegram-api.ts
apps/channels/telegram/test/bridge.test.ts
docs/channel-adapter.md
docs/manual/17-channels.md

@bot

isMentioned() only inspected message.entities against message.text, so a group photo/document/video captioned "@bot ..." was never recognized as a mention — Telegram puts caption mentions in caption_entities with offsets into caption. With inbound media now flowing, such messages would be silently ignored in groups. Check both entity sources. Addresses CodeRabbit review on #177. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

williamwa · 2026-05-29T10:25:44Z

Good catch on the caption-mention gap — fixed in 62af007.

isMentioned() now checks both entities/text and caption_entities/caption, so a group photo/document/video captioned @bot … triggers the agent. Added caption_entities to TelegramMessage. Build + typecheck + tests green.
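As a standalone illustration of the two-source check, here is a pure-function sketch. It is not the code in 62af007: `mentionsBot`, `MentionEntity`, and `BotInfo` are hypothetical names, and unlike the suggested diff it evaluates `text_mention` even when the bot has no username. Telegram entity offsets are counted in UTF-16 code units, which is also how `String.prototype.slice` indexes, so the slice can be taken directly.

```typescript
interface MentionEntity { type: string; offset: number; length: number; user?: { id: number } }
interface MentionMessage {
  text?: string;
  caption?: string;
  entities?: MentionEntity[];
  caption_entities?: MentionEntity[];
  reply_to_message?: { from?: { id: number } };
}
interface BotInfo { id: number; username?: string }

function mentionsBot(message: MentionMessage, bot: BotInfo): boolean {
  // A reply to one of the bot's own messages counts as a mention.
  if (message.reply_to_message?.from?.id === bot.id) return true;

  // Each entity list carries offsets into its own source string.
  const pairs: Array<[MentionEntity[] | undefined, string | undefined]> = [
    [message.entities, message.text],
    [message.caption_entities, message.caption],
  ];
  const handle = bot.username ? `@${bot.username.toLowerCase()}` : undefined;

  for (const [entities, source] of pairs) {
    if (!entities || source === undefined) continue;
    for (const entity of entities) {
      if (entity.type === 'mention' && handle !== undefined) {
        const mentioned = source.slice(entity.offset, entity.offset + entity.length).toLowerCase();
        if (mentioned === handle) return true;
      }
      // text_mention: used for users without a username; carries a user object.
      if (entity.type === 'text_mention' && entity.user?.id === bot.id) return true;
    }
  }
  return false;
}
```

Pairing each entity list with its own source string is the important part: slicing `message.text` with offsets from `caption_entities` would read the wrong string, or `undefined` on a media message.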

Three fixes from PR review:
1. Version pin/lockfile: gateway optionalDependencies still pinned @openhermit/channel-slack@0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't link the updated adapter. Bump the gateway pin and the lockfile to 0.3.0.
2. 25 MiB cap was only enforced against Slack's reported file.size, which can be missing or wrong; downloadFile() would then read the whole body into memory. Move the cap into SlackApi.downloadFile(url, maxBytes): reject an oversized content-length up front, then stream and abort the moment the body crosses the limit so an oversized/mislabeled file never fully lands in memory. The bridge passes MAX_MEDIA_BYTES (file.size stays as a cheap early-skip).
3. isProcessableMessage() now trims text, so a whitespace-only event with no files is no longer treated as processable.
Adds tests: whitespace-only not processable; downloadFile rejects on content-length and aborts mid-stream over the cap. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
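The "reject on content-length, then stream and abort over the cap" approach from the second fix can be sketched against the standard `Response` and `ReadableStream` APIs. This is an illustration, not `SlackApi.downloadFile()`: the function name `readBodyCapped` and its error messages are hypothetical, and it takes an already-fetched `Response` so it can be exercised without a network call.

```typescript
async function readBodyCapped(response: Response, maxBytes: number): Promise<Uint8Array> {
  // Step 1: reject up front if the server declares an oversized body.
  const header = response.headers.get('content-length');
  const declared = header === null ? NaN : Number(header);
  if (Number.isFinite(declared) && declared > maxBytes) {
    await response.body?.cancel();
    throw new Error(`declared size ${declared} exceeds cap ${maxBytes}`);
  }
  if (!response.body) return new Uint8Array(0);

  // Step 2: stream the body and stop as soon as the running total
  // crosses the cap, so a mislabeled file never fully lands in memory.
  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const result = await reader.read();
    if (result.done) break;
    total += result.value.byteLength;
    if (total > maxBytes) {
      await reader.cancel();
      throw new Error(`body exceeded cap ${maxBytes} after ${total} bytes`);
    }
    chunks.push(result.value);
  }

  // Step 3: concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

The header check is only an optimisation; the streaming check is what actually bounds memory, since `content-length` may be missing or wrong.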

The gateway optionalDependencies still pinned @openhermit/channel-discord at 0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't link the updated adapter and the media changes wouldn't take effect in the bundled CLI. Bump the gateway pin and lockfile to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The gateway optionalDependencies still pinned @openhermit/channel-telegram at 0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't link the updated adapter and the inbound-media changes wouldn't take effect in the bundled CLI. Bump the gateway pin and lockfile to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Three review fixes:
- package.json description no longer says "Text-only v1." (stale npm metadata after the 0.3.0 media/voice rollout).
- On STT failure, log the detail but show the user a generic message instead of forwarding the raw error text into the chat.
- Normalize the audio MIME (strip params like `; codecs=opus`) before push-to-talk detection, so `audio/ogg; codecs=opus` still sends as a voice note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
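The MIME normalization in the third fix amounts to dropping everything after the first `;` before comparing. A minimal sketch follows; `normalizeMime` and `isVoiceNoteMime` are hypothetical names, and treating exactly `audio/ogg` as the push-to-talk type is an assumption based on the "ogg/opus audio" wording in the WhatsApp commit.

```typescript
// Strip parameters such as "; codecs=opus" and normalize case/whitespace.
function normalizeMime(mime: string): string {
  const base = mime.split(';')[0] ?? '';
  return base.trim().toLowerCase();
}

// Assumption: only audio/ogg is sent as a native voice note.
function isVoiceNoteMime(mime: string): boolean {
  return normalizeMime(mime) === 'audio/ogg';
}
```

Without the normalization step, a strict equality check against `audio/ogg` would miss `audio/ogg; codecs=opus` and the reply would go out as a regular audio file instead of a voice note.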

Ship the bundled channel media support — Discord/Slack/Telegram attachments (#175, #176, #177) and their gateway pins (now all 0.3.0) — in the published CLI. Also syncs the stale lockfile cli entry (0.9.0 -> 0.9.2) left by the earlier 0.9.1 bump. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

williamwa and others added 4 commits May 29, 2026 18:01

coderabbitai Bot reviewed May 29, 2026

williamwa mentioned this pull request May 29, 2026

feat(slack): inbound + outbound media attachments #176

Merged

williamwa and others added 9 commits May 29, 2026 18:59

fix(discord): bound CDN attachment fetch with a timeout

b262830

A stalled Discord CDN connection during inbound attachment download could block the per-channel message queue indefinitely. Add a 15s AbortSignal.timeout() to the fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

fix(slack): bound file download with a timeout

edc250d

A stalled url_private connection during inbound file download could block the channel queue indefinitely. Add a 15s AbortSignal.timeout() to the downloadFile fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
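Both timeout fixes rely on `AbortSignal.timeout()`, which returns a signal that aborts on its own after the given number of milliseconds. A minimal sketch of bounding a download this way is below; `fetchWithTimeout`, the `FetchLike` type, and the injectable `fetchImpl` parameter are hypothetical and exist only so the wiring can be checked without a network call.

```typescript
// Narrow structural type for the part of fetch this sketch uses.
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<Response>;

// 15 s matches the value named in the commit messages.
const DOWNLOAD_TIMEOUT_MS = 15_000;

async function fetchWithTimeout(
  url: string,
  timeoutMs: number = DOWNLOAD_TIMEOUT_MS,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  // The signal aborts by itself once timeoutMs elapses, so a stalled
  // connection rejects instead of blocking the channel queue forever.
  return fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
}
```

A caller would typically catch the rejection, log it, and skip the attachment so the rest of the queue keeps moving.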

Merge remote-tracking branch 'origin/feat/whatsapp-attachments' into …

25269d4

…feat/slack-attachments

Merge remote-tracking branch 'origin/feat/discord-attachments' into f…

ea1395d

…eat/slack-attachments # Conflicts: # apps/gateway/package.json # docs/channel-adapter.md # package-lock.json

Merge remote-tracking branch 'origin/feat/slack-attachments' into fea…

5dbf6bb

…t/telegram-inbound-media # Conflicts: # apps/gateway/package.json # docs/channel-adapter.md # package-lock.json

Merge remote-tracking branch 'origin/main' into feat/telegram-inbound…

901ef68

…-media # Conflicts: # apps/gateway/package.json # docs/channel-adapter.md # package-lock.json

williamwa merged commit 06e4828 into main May 29, 2026
1 check passed

williamwa deleted the feat/telegram-inbound-media branch May 29, 2026 12:03

williamwa mentioned this pull request May 29, 2026

chore(cli): bump to 0.9.2 #178

Merged

williamwa merged 15 commits into main from feat/telegram-inbound-media
