feat(telegram): inbound image / document / video #177

Merged: williamwa merged 15 commits into main from feat/telegram-inbound-media on May 29, 2026

@williamwa williamwa commented May 29, 2026

Fourth and final channel in the attachment-support rollout (after #174 WhatsApp, #175 Discord, #176 Slack).

Telegram was lopsided: it could send every media kind outbound and transcribe inbound voice, but inbound photos, documents, and video were silently dropped (not even modeled on TelegramMessage). This makes inbound symmetric with the existing outbound support. No core/protocol/gateway changes.

Inbound (Telegram → agent)

  • Photos (largest size picked), documents, and video are downloaded via the existing getFile + downloadFile helpers and uploaded as session attachments — images become vision input automatically.
  • The message caption is kept as text.
  • Media-only messages now trigger the agent instead of being dropped.
  • The existing voice→STT and TTS-voice-reply paths are unchanged.
  • Files over Telegram's ~20 MB Bot API download limit are skipped (logged).
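
The selection step described above can be sketched roughly as follows. This is an illustration, not the PR's actual `pickMediaFile`: the `PickedMedia` shape, the fallback filenames and MIME types, and the choice of pixel area as the meaning of "largest" are all assumptions. Only the Telegram field names (`file_id`, `width`, `height`, `file_name`, `mime_type`, `file_size`) come from the Bot API.

```typescript
// Field names follow the Telegram Bot API; only the fields used here are declared.
interface TelegramPhotoSize { file_id: string; width: number; height: number; file_size?: number }
interface TelegramDocument { file_id: string; file_name?: string; mime_type?: string; file_size?: number }
interface TelegramVideo { file_id: string; file_name?: string; mime_type?: string; file_size?: number }
interface MediaMessage { photo?: TelegramPhotoSize[]; document?: TelegramDocument; video?: TelegramVideo }

// Hypothetical result shape; the PR's TelegramMediaFile may carry different fields.
interface PickedMedia { fileId: string; filename: string; mimeType: string; size: number | undefined }

function pickMediaFile(msg: MediaMessage): PickedMedia | undefined {
  if (msg.photo && msg.photo.length > 0) {
    // Telegram delivers one photo in several resolutions.
    // Assumption: "largest" means the entry with the most pixels.
    const largest = msg.photo.reduce((best, p) =>
      p.width * p.height > best.width * best.height ? p : best,
    );
    // Assumption: photos are treated as JPEG with a placeholder filename.
    return { fileId: largest.file_id, filename: 'photo.jpg', mimeType: 'image/jpeg', size: largest.file_size };
  }

  const file = msg.document ?? msg.video;
  if (!file) return undefined; // text-only message: nothing to download

  // Assumed fallbacks for when Telegram omits the name or MIME type.
  const fallbackName = msg.document ? 'document' : 'video.mp4';
  const fallbackMime = msg.document ? 'application/octet-stream' : 'video/mp4';
  return {
    fileId: file.file_id,
    filename: file.file_name ?? fallbackName,
    mimeType: file.mime_type ?? fallbackMime,
    size: file.file_size,
  };
}
```

Keeping the selection a pure function of the message is what makes it easy to unit test without touching the network.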

Notes

  • Telegram is a private package bundled into the openhermit CLI, so this ships with the next CLI release. Version bumped 0.2.0 → 0.3.0.
  • Adds the first unit tests for this package (pickMediaFile: largest-photo selection, document/video mapping).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Telegram channel supports inbound photos, documents, videos as attachments (captions preserved), voice/audio transcribed, and outbound attachments sent with matching media types; inbound files > ~20 MB are skipped
  • Documentation

    • Telegram adapter docs updated to describe new media handling and platform limits
  • Tests

    • Added tests covering media selection and handling
  • Chores

    • Package version bumped to 0.3.0


williamwa and others added 4 commits May 29, 2026 18:01
WhatsApp was text-only: inbound media was dropped (only captions kept)
and the agent couldn't send files. Wire the channel to the existing
attachment infrastructure in both directions.

Inbound: images/video/documents are downloaded via Baileys
downloadMediaMessage and uploaded as session attachments (images become
vision input); captions ride along as text. Voice/audio notes are
transcribed via the agent's STT, and replies to a voice note are spoken
back as a WhatsApp voice note when TTS is configured. Media-only messages
now trigger the agent instead of being dropped. Media over the 25 MiB cap
is skipped.

Outbound: the agent's attachment_send deliveries (SSE 'attachment' event)
are routed to the matching Baileys send — image/video/document, and
ogg/opus audio as a native push-to-talk voice note.

No core/protocol/gateway changes — uses SDK uploadAttachment,
postMessage({attachments}), downloadAttachmentBytes, transcribeAudio,
synthesizeAudio. Bumps 0.2.0 -> 0.3.0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Discord was fully text-only: inbound attachments were ignored and the
agent couldn't send files. Wire the channel to the existing attachment
infrastructure both directions.

Inbound: message attachments (guild messages + DM gateway dispatch) are
fetched from the Discord CDN and uploaded as session attachments (images
become vision input); audio attachments are transcribed via the agent's
STT and appended as text. Media-only messages now trigger the agent.
Attachments over the 25 MiB cap are skipped.

Outbound: the agent's attachment_send deliveries (SSE 'attachment' event)
are sent back as Discord file uploads via AttachmentBuilder, with any
caption as the message content.

Discord is bundled into the CLI (private package), so this ships with the
next openhermit release rather than a standalone publish. Bumps 0.2.0 ->
0.3.0 for changelog clarity. Adds the first unit tests for this package.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Slack was fully text-only and even dropped file_share messages (the
subtype guard returned early). Wire the channel to the existing
attachment infrastructure both directions.

Inbound: file_share uploads are now accepted; each file is fetched from
url_private with bot-token auth and uploaded as a session attachment
(images become vision input). Audio files are transcribed via STT and
appended as text. Media-only messages now trigger the agent. Files over
the 25 MiB cap are skipped.

Outbound: the agent's attachment_send deliveries (SSE 'attachment' event)
are uploaded back via files.uploadV2, into the originating thread when
applicable.

Needs files:read (download) and files:write (upload) bot scopes. Slack is
bundled into the CLI (private package), so this ships with the next
openhermit release. Bumps 0.2.0 -> 0.3.0 and adds the first unit tests
(isProcessableMessage gating). Also corrects the manual, which previously
claimed file support that did not exist.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Telegram could send every media kind outbound and transcribe inbound
voice, but inbound photos, documents, and video were silently dropped
(not even modeled on TelegramMessage). Wire them up so inbound is
symmetric with the existing outbound support.

Inbound photos (largest size), documents, and video are now downloaded
via the existing getFile + downloadFile helpers and uploaded as session
attachments (images become vision input); the message caption is kept as
text. Media-only messages now trigger the agent. The existing voice→STT
and TTS-voice-reply paths are unchanged. Files over Telegram's ~20 MB
Bot API download limit are skipped.

Telegram is bundled into the CLI (private package), so this ships with
the next openhermit release. Bumps 0.2.0 -> 0.3.0 and adds the first
unit tests for this package (pickMediaFile).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai Bot commented May 29, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9a0303e2-42cc-4c06-ac90-0993e30cd758

📥 Commits

Reviewing files that changed from the base of the PR and between 62af007 and 901ef68.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
  • apps/gateway/package.json
  • docs/channel-adapter.md
  • docs/manual/17-channels.md
📝 Walkthrough

Walkthrough

Telegram adapter now uploads inbound photos/documents/videos (with captions as fallback text and caption mention detection), transcribes voice/audio when applicable, enforces a ~20 MB download cap (skips oversized files), and maps agent attachment_send responses back to Telegram media sends.

Changes

Telegram media upload and forwarding

| Layer / File(s) | Summary |
|---|---|
| Media type contracts (`apps/channels/telegram/src/telegram-api.ts`) | New `TelegramPhotoSize`, `TelegramDocument`, `TelegramVideo` interfaces and expanded `TelegramMessage` with `caption`, `caption_entities`, `photo`, `document`, and `video`. |
| Media file selection with tests (`apps/channels/telegram/src/bridge.ts`, `apps/channels/telegram/test/bridge.test.ts`) | `TelegramMediaFile` and `pickMediaFile()` choose the largest photo or document/video; tests cover photo selection, document/video mapping, and text-only behavior. |
| Bridge integration and caption/mention handling (`apps/channels/telegram/src/bridge.ts`) | Bridge falls back to caption for text, detects mentions in `caption_entities`, and allows media-only messages to proceed when media resolves. |
| Download, size-check, and upload attachments (`apps/channels/telegram/src/bridge.ts`) | `resolveMediaAttachment()` enforces ~20 MB cap, downloads file bytes, uploads to agent session as attachments; `sendToAgent()` includes attachments when present and handles media-only early returns on empty uploads. |
| Documentation and package metadata (`apps/channels/telegram/package.json`, `apps/channels/telegram/README.md`, `docs/channel-adapter.md`, `docs/manual/17-channels.md`) | Version bump to 0.3.0, added Node test script, and docs updated to describe inbound media handling (vision input for images, captions as text, voice STT), outbound `attachment_send` mapping, and ~20 MB skip behavior. |
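
The download, size-check, and upload step summarized above might look roughly like this. It is a sketch under stated assumptions: the `AttachmentDeps` interface, the function name, the string attachment reference, and reading "~20 MB" as 20 MiB are all hypothetical. The real `getFile`, `downloadFile`, and `uploadAttachment` helpers may have different signatures.

```typescript
// Assumption: the "~20 MB" Bot API download limit is modeled as 20 MiB.
const MAX_DOWNLOAD_BYTES = 20 * 1024 * 1024;

interface MediaRef { fileId: string; filename: string; mimeType: string; size: number | undefined }

// Hypothetical dependency surface, injected so the flow can be tested offline.
interface AttachmentDeps {
  getFile(fileId: string): Promise<{ file_path?: string; file_size?: number }>;
  downloadFile(filePath: string): Promise<Uint8Array>;
  uploadAttachment(bytes: Uint8Array, filename: string, mimeType: string): Promise<string>;
  log(message: string): void;
}

async function resolveMediaAttachmentSketch(
  media: MediaRef,
  deps: AttachmentDeps,
): Promise<string | undefined> {
  // Cheap early skip using the size Telegram reported on the message.
  if (media.size !== undefined && media.size > MAX_DOWNLOAD_BYTES) {
    deps.log(`skipping ${media.filename}: ${media.size} bytes exceeds the download limit`);
    return undefined;
  }

  // getFile returns the path to download from and may repeat the size.
  const file = await deps.getFile(media.fileId);
  if (!file.file_path || (file.file_size ?? 0) > MAX_DOWNLOAD_BYTES) {
    deps.log(`skipping ${media.filename}: no file path or file too large`);
    return undefined;
  }

  const bytes = await deps.downloadFile(file.file_path);
  if (bytes.byteLength > MAX_DOWNLOAD_BYTES) {
    deps.log(`skipping ${media.filename}: downloaded body exceeds the download limit`);
    return undefined;
  }

  // The returned reference is what would be passed along with the message text.
  return deps.uploadAttachment(bytes, media.filename, media.mimeType);
}
```

Returning `undefined` instead of throwing lets the caller decide what a skipped file means: a captioned message can still go through as text, while a media-only message has nothing left to send.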

Sequence Diagram(s)

sequenceDiagram
  participant Telegram
  participant TelegramBridge
  participant TelegramFileAPI
  participant AgentSession
  Telegram->>TelegramBridge: Incoming message (text/caption, media file_id, entities/caption_entities)
  TelegramBridge->>TelegramBridge: resolve text (text || trimmed caption), detect mentions
  TelegramBridge->>TelegramFileAPI: getFile/download(file_id) with ~20MB size check
  TelegramFileAPI-->>TelegramBridge: file bytes (or skip if >20MB)
  TelegramBridge->>AgentSession: uploadAttachment(bytes, filename, mimeType)
  AgentSession-->>TelegramBridge: attachment reference
  TelegramBridge->>AgentSession: postMessage(text, senderPayload, attachments?)
  AgentSession-->>TelegramBridge: attachment_send results
  TelegramBridge->>Telegram: send corresponding media (photo/document/video/voice) per attachment_send

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • HCF-STUDIOS/openhermit#150: Modifies the same TelegramBridge.handleMessageInner voice transcription flow for message-to-agent forwarding.
  • HCF-STUDIOS/openhermit#164: Updates TelegramBridge agent-response handling in the same file for streamed/final reply text suppression.

Poem

🐰 I hop through bytes and tiny chats,
Photos, vids, and voice—that's that!
Captions whispered as plain text,
Twenty megs we kindly text,
Attachments travel, back they come—hooray! 📸

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main feature: adding support for inbound Telegram images, documents, and video. It is specific, directly related to the primary changeset goal, and matches the implemented functionality across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/telegram-inbound-media

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/channels/telegram/src/bridge.ts (1)

126-154: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix missed @mentions in media captions for group messages.

  • isMentioned() only checks message.entities against message.text, but Telegram puts mention entities from photo/document/video captions into caption_entities (with offsets into message.caption). For groups, this can keep mentioned false and suppress triggering even when the caption contains @<bot>.
  • Add caption_entities to the TelegramMessage type and extend isMentioned() to check both entities/text and caption_entities/caption.
💡 Suggested fix
diff --git a/apps/channels/telegram/src/telegram-api.ts b/apps/channels/telegram/src/telegram-api.ts
@@
 export interface TelegramMessage {
   message_id: number;
   from?: TelegramUser;
   chat: TelegramChat;
   date: number;
   text?: string;
   caption?: string;
+  caption_entities?: TelegramMessageEntity[];
   entities?: TelegramMessageEntity[];
   reply_to_message?: TelegramMessage;
   voice?: TelegramVoice;
   audio?: TelegramAudio;
   photo?: TelegramPhotoSize[];
   document?: TelegramDocument;
   video?: TelegramVideo;
 }
diff --git a/apps/channels/telegram/src/bridge.ts b/apps/channels/telegram/src/bridge.ts
@@
   private async isMentioned(message: TelegramMessage): Promise<boolean> {
     const bot = await this.getBotInfo();
 
     // Reply to the bot's message
     if (message.reply_to_message?.from?.id === bot.id) {
       return true;
     }
 
-    // `@mention` in text entities
-    if (message.entities && bot.username) {
+    // `@mention` in text/caption entities
+    const entitySets: Array<{ entities?: typeof message.entities; source?: string }> = [
+      { entities: message.entities, source: message.text },
+      { entities: message.caption_entities, source: message.caption },
+    ];
+    if (bot.username) {
       const botUsername = bot.username.toLowerCase();
-      for (const entity of message.entities) {
-        if (
-          entity.type === 'mention' &&
-          message.text
-        ) {
-          const mentionText = message.text
-            .slice(entity.offset, entity.offset + entity.length)
-            .toLowerCase();
-          if (mentionText === `@${botUsername}`) {
-            return true;
-          }
-        }
-        // text_mention: when user has no username, Telegram uses this with a user object
-        if (entity.type === 'text_mention' && entity.user?.id === bot.id) {
-          return true;
+      for (const { entities, source } of entitySets) {
+        if (!entities || !source) continue;
+        for (const entity of entities) {
+          if (entity.type === 'mention') {
+            const mentionText = source
+              .slice(entity.offset, entity.offset + entity.length)
+              .toLowerCase();
+            if (mentionText === `@${botUsername}`) return true;
+          }
+          // text_mention: when user has no username, Telegram uses this with a user object
+          if (entity.type === 'text_mention' && entity.user?.id === bot.id) {
+            return true;
+          }
         }
       }
     }
 
     return false;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/channels/telegram/src/bridge.ts` around lines 126 - 154, The isMentioned
function currently only inspects message.entities and message.text, missing
mentions in media captions; update the TelegramMessage type to include
caption_entities?: MessageEntity[] and caption?: string, then extend isMentioned
(still handling reply_to_message?.from?.id === bot.id) to iterate both
entities/text and caption_entities/caption: for each entity list, use the
corresponding text source (message.text for entities, message.caption for
caption_entities), extract the slice by entity.offset/length and compare to
`@${bot.username.toLowerCase()}`; also check text_mention entities in both lists
by verifying entity.user?.id === bot.id. Ensure null/undefined guards for
caption and caption_entities.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1cc0112f-f152-4764-b32f-1cfde4deb7cf

📥 Commits

Reviewing files that changed from the base of the PR and between 38db5d9 and 627de91.

📒 Files selected for processing (7)
  • apps/channels/telegram/README.md
  • apps/channels/telegram/package.json
  • apps/channels/telegram/src/bridge.ts
  • apps/channels/telegram/src/telegram-api.ts
  • apps/channels/telegram/test/bridge.test.ts
  • docs/channel-adapter.md
  • docs/manual/17-channels.md

isMentioned() only inspected message.entities against message.text, so a
group photo/document/video captioned "@bot ..." was never recognized as a
mention — Telegram puts caption mentions in caption_entities with offsets
into caption. With inbound media now flowing, such messages would be
silently ignored in groups. Check both entity sources.

Addresses CodeRabbit review on #177.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@williamwa williamwa (Collaborator, Author) commented

Good catch on the caption-mention gap — fixed in 62af007.

isMentioned() now checks both entities/text and caption_entities/caption, so a group photo/document/video captioned @bot … triggers the agent. Added caption_entities to TelegramMessage. Build + typecheck + tests green.

Three fixes from PR review:

1. Version pin/lockfile: gateway optionalDependencies still pinned
   @openhermit/channel-slack@0.2.0 while the workspace moved to 0.3.0, so a
   fresh install wouldn't link the updated adapter. Bump the gateway pin and
   the lockfile to 0.3.0.

2. 25 MiB cap was only enforced against Slack's reported file.size, which can
   be missing or wrong — downloadFile() would then read the whole body into
   memory. Move the cap into SlackApi.downloadFile(url, maxBytes): reject an
   oversized content-length up front, then stream and abort the moment the
   body crosses the limit so an oversized/mislabeled file never fully lands in
   memory. The bridge passes MAX_MEDIA_BYTES (file.size stays as a cheap
   early-skip).

3. isProcessableMessage() now trims text, so a whitespace-only event with no
   files is no longer treated as processable.

Adds tests: whitespace-only not processable; downloadFile rejects on
content-length and aborts mid-stream over the cap.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
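
The capped download described in fix 2 can be sketched as two small pieces: a header check that rejects an oversized `content-length` up front, and a streaming read that stops as soon as the running total crosses the limit. The names `MediaTooLargeError`, `assertContentLengthWithinCap`, and `readCapped` are hypothetical and are not taken from `SlackApi.downloadFile`.

```typescript
class MediaTooLargeError extends Error {}

// Reject early when the server declares a body larger than the cap.
// A missing or unparseable header is let through; the streaming check still applies.
function assertContentLengthWithinCap(header: string | null, maxBytes: number): void {
  if (header === null) return;
  const declared = Number(header);
  if (Number.isFinite(declared) && declared > maxBytes) {
    throw new MediaTooLargeError(`declared size ${declared} exceeds cap ${maxBytes}`);
  }
}

// Accumulate chunks, aborting the moment the total exceeds the cap so an
// oversized or mislabeled body is never fully buffered.
async function readCapped(chunks: AsyncIterable<Uint8Array>, maxBytes: number): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of chunks) {
    total += chunk.byteLength;
    if (total > maxBytes) {
      // Throwing here exits the loop, which also closes the underlying iterator.
      throw new MediaTooLargeError(`body exceeded cap of ${maxBytes} bytes`);
    }
    parts.push(chunk);
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.byteLength;
  }
  return out;
}
```

In runtimes where the `fetch` response body is async-iterable (recent Node.js versions), the body could be passed as `chunks` after checking `response.headers.get('content-length')` with the first helper.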
williamwa and others added 9 commits May 29, 2026 18:59
The gateway optionalDependencies still pinned @openhermit/channel-discord
at 0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't
link the updated adapter and the media changes wouldn't take effect in the
bundled CLI. Bump the gateway pin and lockfile to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The gateway optionalDependencies still pinned @openhermit/channel-telegram
at 0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't
link the updated adapter and the inbound-media changes wouldn't take effect
in the bundled CLI. Bump the gateway pin and lockfile to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three review fixes:
- package.json description no longer says "Text-only v1." (stale npm
  metadata after the 0.3.0 media/voice rollout).
- On STT failure, log the detail but show the user a generic message
  instead of forwarding the raw error text into the chat.
- Normalize the audio MIME (strip params like `; codecs=opus`) before
  push-to-talk detection, so `audio/ogg; codecs=opus` still sends as a
  voice note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
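
The MIME normalization in the third fix amounts to dropping everything after the first `;` before comparing. A minimal sketch, in which `normalizeMime` and `isVoiceNoteMime` are hypothetical names and treating `audio/ogg` as the voice-note trigger is an assumption based on the commit text:

```typescript
// Strip parameters such as "; codecs=opus", then trim and lowercase.
function normalizeMime(mime: string): string {
  return (mime.split(';')[0] ?? '').trim().toLowerCase();
}

// Assumption: an ogg audio container is what qualifies as a push-to-talk voice note.
function isVoiceNoteMime(mime: string): boolean {
  return normalizeMime(mime) === 'audio/ogg';
}
```

With this, `audio/ogg; codecs=opus` and `audio/ogg` take the same path, while other audio types such as `audio/mpeg` do not.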
A stalled Discord CDN connection during inbound attachment download could
block the per-channel message queue indefinitely. Add a 15s
AbortSignal.timeout() to the fetch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A stalled url_private connection during inbound file download could block
the channel queue indefinitely. Add a 15s AbortSignal.timeout() to the
downloadFile fetch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
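
The timeout added in the two commits above can be sketched as follows. `AbortSignal.timeout()` is a standard API; the function name `downloadWithTimeout`, the injectable `fetchImpl` parameter, and the error message are assumptions made for illustration and testing.

```typescript
// Assumption: 15 s matches the value named in the commit messages.
const DOWNLOAD_TIMEOUT_MS = 15_000;

async function downloadWithTimeout(
  url: string,
  timeoutMs: number = DOWNLOAD_TIMEOUT_MS,
  fetchImpl: typeof fetch = fetch, // injectable so the call can be exercised offline
): Promise<Uint8Array> {
  // The signal aborts the request (and the body read) once timeoutMs elapses,
  // so a stalled connection rejects instead of blocking the message queue.
  const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) {
    throw new Error(`download failed: HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

Because messages are processed through a per-channel queue, turning a hang into a rejection is what allows later messages to proceed; the caller still has to catch the error and skip the attachment.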
…eat/slack-attachments

# Conflicts:
#	apps/gateway/package.json
#	docs/channel-adapter.md
#	package-lock.json
…t/telegram-inbound-media

# Conflicts:
#	apps/gateway/package.json
#	docs/channel-adapter.md
#	package-lock.json
…-media

# Conflicts:
#	apps/gateway/package.json
#	docs/channel-adapter.md
#	package-lock.json
@williamwa williamwa merged commit 06e4828 into main May 29, 2026
1 check passed
@williamwa williamwa deleted the feat/telegram-inbound-media branch May 29, 2026 12:03
williamwa added a commit that referenced this pull request May 29, 2026
Ship the bundled channel media support — Discord/Slack/Telegram
attachments (#175, #176, #177) and their gateway pins (now all 0.3.0) —
in the published CLI. Also syncs the stale lockfile cli entry (0.9.0 ->
0.9.2) left by the earlier 0.9.1 bump.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>