feat(telegram): inbound image / document / video #177
WhatsApp was text-only: inbound media was dropped (only captions kept)
and the agent couldn't send files. Wire the channel to the existing
attachment infrastructure in both directions.
Inbound: images/video/documents are downloaded via Baileys
downloadMediaMessage and uploaded as session attachments (images become
vision input); captions ride along as text. Voice/audio notes are
transcribed via the agent's STT, and replies to a voice note are spoken
back as a WhatsApp voice note when TTS is configured. Media-only messages
now trigger the agent instead of being dropped. Media over the 25 MiB cap
is skipped.
Outbound: the agent's attachment_send deliveries (SSE 'attachment' event)
are routed to the matching Baileys send — image/video/document, and
ogg/opus audio as a native push-to-talk voice note.
No core/protocol/gateway changes — uses SDK uploadAttachment,
postMessage({attachments}), downloadAttachmentBytes, transcribeAudio,
synthesizeAudio. Bumps 0.2.0 -> 0.3.0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
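The WhatsApp outbound routing described in the first commit above can be sketched as a pure function that maps an attachment's MIME type to a Baileys message-content object. This is a sketch under stated assumptions, not the adapter's code. `OutboundAttachment` and `toBaileysContent` are hypothetical names. The `image` / `video` / `document` / `audio` + `ptt` keys follow Baileys' `sendMessage` content conventions as commonly documented, so check them against the installed Baileys version. Sending non-ogg audio as a document is also an assumption.

```typescript
// Hypothetical shape of one attachment_send delivery once its bytes are fetched.
interface OutboundAttachment {
  bytes: Uint8Array;
  filename: string;
  mimeType: string;
  caption?: string;
}

// Content objects shaped like what Baileys' sock.sendMessage(jid, content) accepts.
// Baileys expects Buffers; wrap with Buffer.from() at the call site.
type BaileysContent =
  | { image: Uint8Array; caption?: string }
  | { video: Uint8Array; caption?: string }
  | { audio: Uint8Array; mimetype: string; ptt: true }
  | { document: Uint8Array; mimetype: string; fileName: string; caption?: string };

function toBaileysContent(att: OutboundAttachment): BaileysContent {
  const mime = att.mimeType.toLowerCase();
  if (mime.startsWith("image/")) return { image: att.bytes, caption: att.caption };
  if (mime.startsWith("video/")) return { video: att.bytes, caption: att.caption };
  // ogg/opus audio is sent as a native push-to-talk voice note.
  if (mime.startsWith("audio/ogg")) {
    return { audio: att.bytes, mimetype: "audio/ogg; codecs=opus", ptt: true };
  }
  // Everything else (including other audio formats, by assumption) goes out as a document.
  return { document: att.bytes, mimetype: att.mimeType, fileName: att.filename, caption: att.caption };
}
```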
Discord was fully text-only: inbound attachments were ignored and the agent couldn't send files. Wire the channel to the existing attachment infrastructure in both directions.
Inbound: message attachments (guild messages + DM gateway dispatch) are fetched from the Discord CDN and uploaded as session attachments (images become vision input); audio attachments are transcribed via the agent's STT and appended as text. Media-only messages now trigger the agent. Attachments over the 25 MiB cap are skipped.
Outbound: the agent's attachment_send deliveries (SSE 'attachment' event) are sent back as Discord file uploads via AttachmentBuilder, with any caption as the message content.
Discord is bundled into the CLI (private package), so this ships with the next openhermit release rather than a standalone publish. Bumps 0.2.0 -> 0.3.0 for changelog clarity. Adds the first unit tests for this package.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
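The Discord inbound gating (skip anything over 25 MiB, transcribe audio, upload the rest, and trigger the agent even when there is no text) can be sketched as a pure planning step. `InboundAttachment`, `InboundPlan`, and `planInbound` are hypothetical; the attachment type is a subset of the metadata a Discord attachment exposes (`url`, `name`, `size`, `contentType`). The real adapter also fetches and uploads the bytes, which this sketch leaves out.

```typescript
// Hypothetical subset of a Discord attachment's metadata.
interface InboundAttachment {
  url: string;
  name: string;
  size: number; // bytes
  contentType: string | null;
}

const MAX_MEDIA_BYTES = 25 * 1024 * 1024; // 25 MiB

interface InboundPlan {
  upload: InboundAttachment[];     // uploaded as session attachments
  transcribe: InboundAttachment[]; // sent to STT, transcript appended as text
  skipped: InboundAttachment[];    // over the size cap
  shouldTrigger: boolean;          // text or usable media present
}

function planInbound(text: string, attachments: InboundAttachment[]): InboundPlan {
  const plan: InboundPlan = { upload: [], transcribe: [], skipped: [], shouldTrigger: false };
  for (const att of attachments) {
    if (att.size > MAX_MEDIA_BYTES) {
      plan.skipped.push(att);
    } else if (att.contentType?.startsWith("audio/")) {
      plan.transcribe.push(att);
    } else {
      plan.upload.push(att);
    }
  }
  // Media-only messages still trigger the agent; an empty message does not.
  plan.shouldTrigger =
    text.trim().length > 0 || plan.upload.length > 0 || plan.transcribe.length > 0;
  return plan;
}
```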
Slack was fully text-only and even dropped file_share messages (the subtype guard returned early). Wire the channel to the existing attachment infrastructure in both directions.
Inbound: file_share uploads are now accepted; each file is fetched from url_private with bot-token auth and uploaded as a session attachment (images become vision input). Audio files are transcribed via STT and appended as text. Media-only messages now trigger the agent. Files over the 25 MiB cap are skipped.
Outbound: the agent's attachment_send deliveries (SSE 'attachment' event) are uploaded back via files.uploadV2, into the originating thread when applicable. Needs files:read (download) and files:write (upload) bot scopes.
Slack is bundled into the CLI (private package), so this ships with the next openhermit release. Bumps 0.2.0 -> 0.3.0 and adds the first unit tests (isProcessableMessage gating). Also corrects the manual, which previously claimed file support that did not exist.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
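The Slack gating change can be sketched as a predicate over the incoming event. `isProcessableMessage` is the name the commit uses, but this signature and the `SlackMessageEvent` shape are hypothetical. The sketch lets plain messages and `file_share` messages through, ignores every other subtype, and (following the later review fix) treats whitespace-only text with no files as not processable.

```typescript
// Hypothetical subset of a Slack message event.
interface SlackMessageEvent {
  subtype?: string;
  text?: string;
  files?: Array<{ id: string; url_private?: string; size?: number; mimetype?: string }>;
}

function isProcessableMessage(event: SlackMessageEvent): boolean {
  // Previously any subtype returned early; file_share is now allowed through.
  if (event.subtype !== undefined && event.subtype !== "file_share") return false;
  const hasText = (event.text ?? "").trim().length > 0;
  const hasFiles = (event.files?.length ?? 0) > 0;
  return hasText || hasFiles;
}
```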
Telegram could send every media kind outbound and transcribe inbound voice, but inbound photos, documents, and video were silently dropped (not even modeled on TelegramMessage). Wire them up so inbound is symmetric with the existing outbound support.
Inbound photos (largest size), documents, and video are now downloaded via the existing getFile + downloadFile helpers and uploaded as session attachments (images become vision input); the message caption is kept as text. Media-only messages now trigger the agent. The existing voice→STT and TTS-voice-reply paths are unchanged. Files over Telegram's ~20 MB Bot API download limit are skipped.
Telegram is bundled into the CLI (private package), so this ships with the next openhermit release. Bumps 0.2.0 -> 0.3.0 and adds the first unit tests for this package (pickMediaFile).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
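Choosing which file to fetch can be sketched as below. `pickMediaFile` is the helper name the commit mentions, but this signature and return shape are hypothetical. The field names (`file_id`, `file_size`, `width`, `height`, `file_name`, `mime_type`) come from the Bot API `PhotoSize`, `Document`, and `Video` objects. Photos keep the largest size by pixel area, and anything whose reported size exceeds the ~20 MB `getFile` limit is skipped. The `image/jpeg` type and the fallback filenames are assumptions.

```typescript
// Subsets of Telegram Bot API objects (prefixed to avoid clashing with DOM types).
interface TgPhotoSize { file_id: string; width: number; height: number; file_size?: number }
interface TgDocument { file_id: string; file_name?: string; mime_type?: string; file_size?: number }
interface TgVideo { file_id: string; file_name?: string; mime_type?: string; file_size?: number }

interface InboundMessage {
  caption?: string;
  photo?: TgPhotoSize[];
  document?: TgDocument;
  video?: TgVideo;
}

interface MediaFile { fileId: string; filename: string; mimeType: string }

// Approximate Bot API getFile download limit.
const MAX_DOWNLOAD_BYTES = 20 * 1024 * 1024;

function pickMediaFile(msg: InboundMessage): MediaFile | undefined {
  const tooBig = (size?: number) => size !== undefined && size > MAX_DOWNLOAD_BYTES;

  if (msg.photo && msg.photo.length > 0) {
    // Telegram sends several sizes of the same photo; keep the largest.
    const largest = msg.photo.reduce((a, b) => (b.width * b.height > a.width * a.height ? b : a));
    if (tooBig(largest.file_size)) return undefined;
    return { fileId: largest.file_id, filename: "photo.jpg", mimeType: "image/jpeg" };
  }
  if (msg.document) {
    if (tooBig(msg.document.file_size)) return undefined;
    return {
      fileId: msg.document.file_id,
      filename: msg.document.file_name ?? "document",
      mimeType: msg.document.mime_type ?? "application/octet-stream",
    };
  }
  if (msg.video) {
    if (tooBig(msg.video.file_size)) return undefined;
    return {
      fileId: msg.video.file_id,
      filename: msg.video.file_name ?? "video.mp4",
      mimeType: msg.video.mime_type ?? "video/mp4",
    };
  }
  return undefined; // text-only message
}
```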
📝 Walkthrough
Telegram adapter now uploads inbound photos/documents/videos (with captions as fallback text and caption mention detection), transcribes voice/audio when applicable, enforces a ~20 MB download cap (skips oversized files), and maps agent …
Changes: Telegram media upload and forwarding
Sequence Diagram(s)
sequenceDiagram
participant Telegram
participant TelegramBridge
participant TelegramFileAPI
participant AgentSession
Telegram->>TelegramBridge: Incoming message (text/caption, media file_id, entities/caption_entities)
TelegramBridge->>TelegramBridge: resolve text (text || trimmed caption), detect mentions
TelegramBridge->>TelegramFileAPI: getFile/download(file_id) with ~20MB size check
TelegramFileAPI-->>TelegramBridge: file bytes (or skip if >20MB)
TelegramBridge->>AgentSession: uploadAttachment(bytes, filename, mimeType)
AgentSession-->>TelegramBridge: attachment reference
TelegramBridge->>AgentSession: postMessage(text, senderPayload, attachments?)
AgentSession-->>TelegramBridge: attachment_send results
TelegramBridge->>Telegram: send corresponding media (photo/document/video/voice) per attachment_send
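The inbound half of the diagram can be sketched with the SDK calls hidden behind a small interface. `AgentSessionLike`, `InboundMedia`, and `handleInbound` are hypothetical names, and the `uploadAttachment` / `postMessage` signatures are assumptions modeled on the diagram rather than the SDK's real types. Message text wins over the trimmed caption, and a message that carries media but no text still posts.

```typescript
interface AttachmentRef { id: string }

// Hypothetical stand-in for the agent session SDK; the real signatures may differ.
interface AgentSessionLike {
  uploadAttachment(bytes: Uint8Array, filename: string, mimeType: string): Promise<AttachmentRef>;
  postMessage(msg: { text: string; attachments?: AttachmentRef[] }): Promise<void>;
}

interface InboundMedia {
  download(): Promise<Uint8Array>; // e.g. getFile + downloadFile
  filename: string;
  mimeType: string;
}

async function handleInbound(
  session: AgentSessionLike,
  text: string | undefined,
  caption: string | undefined,
  media: InboundMedia | undefined,
): Promise<boolean> {
  // Message text wins; otherwise fall back to the trimmed caption.
  const body = text || caption?.trim() || "";
  const attachments: AttachmentRef[] = [];
  if (media) {
    const bytes = await media.download();
    attachments.push(await session.uploadAttachment(bytes, media.filename, media.mimeType));
  }
  // Nothing to say and nothing attached: do not trigger the agent.
  if (!body && attachments.length === 0) return false;
  await session.postMessage(attachments.length > 0 ? { text: body, attachments } : { text: body });
  return true;
}
```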
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 5 passed
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/channels/telegram/src/bridge.ts (1)
126-154: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fix missed `@mention`s in media captions for group messages.
`isMentioned()` only checks `message.entities` against `message.text`, but Telegram puts mention entities from photo/document/video captions into `caption_entities` (with offsets into `message.caption`). For groups, this can keep `mentioned` false and suppress triggering even when the caption contains `@<bot>`.
- Add `caption_entities` to the `TelegramMessage` type and extend `isMentioned()` to check both `entities`/`text` and `caption_entities`/`caption`.
💡 Suggested fix
diff --git a/apps/channels/telegram/src/telegram-api.ts b/apps/channels/telegram/src/telegram-api.ts
@@ export interface TelegramMessage {
   message_id: number;
   from?: TelegramUser;
   chat: TelegramChat;
   date: number;
   text?: string;
   caption?: string;
+  caption_entities?: TelegramMessageEntity[];
   entities?: TelegramMessageEntity[];
   reply_to_message?: TelegramMessage;
   voice?: TelegramVoice;
   audio?: TelegramAudio;
   photo?: TelegramPhotoSize[];
   document?: TelegramDocument;
   video?: TelegramVideo;
 }
diff --git a/apps/channels/telegram/src/bridge.ts b/apps/channels/telegram/src/bridge.ts
@@ private async isMentioned(message: TelegramMessage): Promise<boolean> {
     const bot = await this.getBotInfo();
     // Reply to the bot's message
     if (message.reply_to_message?.from?.id === bot.id) {
       return true;
     }
-    // `@mention` in text entities
-    if (message.entities && bot.username) {
+    // `@mention` in text/caption entities
+    const entitySets: Array<{ entities?: typeof message.entities; source?: string }> = [
+      { entities: message.entities, source: message.text },
+      { entities: message.caption_entities, source: message.caption },
+    ];
+    if (bot.username) {
       const botUsername = bot.username.toLowerCase();
-      for (const entity of message.entities) {
-        if (
-          entity.type === 'mention' &&
-          message.text
-        ) {
-          const mentionText = message.text
-            .slice(entity.offset, entity.offset + entity.length)
-            .toLowerCase();
-          if (mentionText === `@${botUsername}`) {
-            return true;
-          }
-        }
-        // text_mention: when user has no username, Telegram uses this with a user object
-        if (entity.type === 'text_mention' && entity.user?.id === bot.id) {
-          return true;
+      for (const { entities, source } of entitySets) {
+        if (!entities || !source) continue;
+        for (const entity of entities) {
+          if (entity.type === 'mention') {
+            const mentionText = source
+              .slice(entity.offset, entity.offset + entity.length)
+              .toLowerCase();
+            if (mentionText === `@${botUsername}`) return true;
+          }
+          // text_mention: when user has no username, Telegram uses this with a user object
+          if (entity.type === 'text_mention' && entity.user?.id === bot.id) {
+            return true;
+          }
         }
       }
     }
     return false;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/channels/telegram/src/bridge.ts` around lines 126 - 154, The isMentioned function currently only inspects message.entities and message.text, missing mentions in media captions; update the TelegramMessage type to include caption_entities?: MessageEntity[] and caption?: string, then extend isMentioned (still handling reply_to_message?.from?.id === bot.id) to iterate both entities/text and caption_entities/caption: for each entity list, use the corresponding text source (message.text for entities, message.caption for caption_entities), extract the slice by entity.offset/length and compare to `@${bot.username.toLowerCase()}`; also check text_mention entities in both lists by verifying entity.user?.id === bot.id. Ensure null/undefined guards for caption and caption_entities.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1cc0112f-f152-4764-b32f-1cfde4deb7cf
📒 Files selected for processing (7)
apps/channels/telegram/README.md
apps/channels/telegram/package.json
apps/channels/telegram/src/bridge.ts
apps/channels/telegram/src/telegram-api.ts
apps/channels/telegram/test/bridge.test.ts
docs/channel-adapter.md
docs/manual/17-channels.md
isMentioned() only inspected message.entities against message.text, so a group photo/document/video captioned "@bot ..." was never recognized as a mention — Telegram puts caption mentions in caption_entities with offsets into caption. With inbound media now flowing, such messages would be silently ignored in groups. Check both entity sources. Addresses CodeRabbit review on #177. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
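The caption-mention fix can be restated as a standalone, synchronous function so the caption path is easy to unit-test. This is a simplified sketch of the suggested diff, not the committed code: `mentionsBot`, `Entity`, `Msg`, and `Bot` are hypothetical, the bot identity is passed in rather than fetched via `getBotInfo()`, and, unlike the diff, the `text_mention` check does not depend on the bot having a username. Telegram measures entity offsets in UTF-16 code units, the same unit JavaScript strings index by, so `slice` can be used directly.

```typescript
interface Entity { type: string; offset: number; length: number; user?: { id: number } }
interface Msg {
  text?: string;
  entities?: Entity[];
  caption?: string;
  caption_entities?: Entity[];
  reply_to_message?: { from?: { id: number } };
}
interface Bot { id: number; username?: string }

function mentionsBot(message: Msg, bot: Bot): boolean {
  if (message.reply_to_message?.from?.id === bot.id) return true;
  // Each entity list carries offsets into its own text source.
  const sources: Array<[Entity[] | undefined, string | undefined]> = [
    [message.entities, message.text],
    [message.caption_entities, message.caption],
  ];
  const handle = bot.username ? `@${bot.username.toLowerCase()}` : undefined;
  for (const [entities, source] of sources) {
    if (!entities || !source) continue;
    for (const e of entities) {
      if (
        e.type === "mention" &&
        handle !== undefined &&
        source.slice(e.offset, e.offset + e.length).toLowerCase() === handle
      ) {
        return true;
      }
      // text_mention carries a user object instead of a username.
      if (e.type === "text_mention" && e.user?.id === bot.id) return true;
    }
  }
  return false;
}
```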
Good catch on the caption-mention gap — fixed in 62af007.
Three fixes from PR review:
1. Version pin/lockfile: gateway optionalDependencies still pinned @openhermit/channel-slack@0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't link the updated adapter. Bump the gateway pin and the lockfile to 0.3.0.
2. The 25 MiB cap was only enforced against Slack's reported file.size, which can be missing or wrong; downloadFile() would then read the whole body into memory. Move the cap into SlackApi.downloadFile(url, maxBytes): reject an oversized content-length up front, then stream and abort the moment the body crosses the limit so an oversized/mislabeled file never fully lands in memory. The bridge passes MAX_MEDIA_BYTES (file.size stays as a cheap early-skip).
3. isProcessableMessage() now trims text, so a whitespace-only event with no files is no longer treated as processable.
Adds tests: whitespace-only not processable; downloadFile rejects on content-length and aborts mid-stream over the cap.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
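The capped download in fix 2 can be sketched with the standard Fetch and Streams APIs (Node 18+ or a browser). `readCapped`, `downloadCapped`, and `TooLargeError` are hypothetical stand-ins for `SlackApi.downloadFile(url, maxBytes)`. The sketch rejects early on an oversized `content-length`, then counts bytes while reading and cancels the stream as soon as the running total passes the cap, so a mislabeled file is never fully buffered. Slack serves `url_private` files to requests that carry the bot token as `Authorization: Bearer <token>`.

```typescript
class TooLargeError extends Error {
  constructor(limit: number) {
    super(`download exceeds ${limit} bytes`);
    this.name = "TooLargeError";
  }
}

// Reads a response body into memory, refusing to hold more than maxBytes.
async function readCapped(res: Response, maxBytes: number): Promise<Uint8Array> {
  const declared = Number(res.headers.get("content-length") ?? NaN);
  if (Number.isFinite(declared) && declared > maxBytes) {
    await res.body?.cancel(); // release the connection without reading
    throw new TooLargeError(maxBytes);
  }
  if (!res.body) return new Uint8Array(0);

  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel(); // abort mid-stream; the rest is never read
      throw new TooLargeError(maxBytes);
    }
    chunks.push(value);
  }
  // Concatenate the accepted chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.byteLength;
  }
  return out;
}

// Hypothetical wrapper adding Slack's bot-token auth for url_private.
async function downloadCapped(url: string, token: string, maxBytes: number): Promise<Uint8Array> {
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) throw new Error(`download failed: HTTP ${res.status}`);
  return readCapped(res, maxBytes);
}
```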
The gateway optionalDependencies still pinned @openhermit/channel-discord at 0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't link the updated adapter and the media changes wouldn't take effect in the bundled CLI. Bump the gateway pin and lockfile to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The gateway optionalDependencies still pinned @openhermit/channel-telegram at 0.2.0 while the workspace moved to 0.3.0, so a fresh install wouldn't link the updated adapter and the inbound-media changes wouldn't take effect in the bundled CLI. Bump the gateway pin and lockfile to match. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three review fixes:
- package.json description no longer says "Text-only v1." (stale npm metadata after the 0.3.0 media/voice rollout).
- On STT failure, log the detail but show the user a generic message instead of forwarding the raw error text into the chat.
- Normalize the audio MIME (strip params like `; codecs=opus`) before push-to-talk detection, so `audio/ogg; codecs=opus` still sends as a voice note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
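The MIME normalization in the last fix can be sketched as follows; `normalizeMime`, `isPushToTalk`, and `PUSH_TO_TALK_TYPES` are hypothetical names. Parameters after `;` are dropped and the remainder is lower-cased, so `audio/ogg; codecs=opus` compares equal to `audio/ogg`. Treating `audio/opus` as push-to-talk as well is an assumption of this sketch.

```typescript
// "Audio/OGG; codecs=opus" -> "audio/ogg"
function normalizeMime(mime: string): string {
  return (mime.split(";", 1)[0] ?? "").trim().toLowerCase();
}

// Assumed set of types that go out as native voice notes.
const PUSH_TO_TALK_TYPES = new Set(["audio/ogg", "audio/opus"]);

function isPushToTalk(mime: string): boolean {
  return PUSH_TO_TALK_TYPES.has(normalizeMime(mime));
}
```

Comparing against a set of bare types keeps the check exact, whereas a substring test on the raw header would also match unrelated parameters.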
A stalled Discord CDN connection during inbound attachment download could block the per-channel message queue indefinitely. Add a 15s AbortSignal.timeout() to the fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A stalled url_private connection during inbound file download could block the channel queue indefinitely. Add a 15s AbortSignal.timeout() to the downloadFile fetch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
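The Discord and Slack timeout commits above follow the same pattern, sketched here. `AbortSignal.timeout(ms)` yields a signal that aborts after the delay, and `fetch` then rejects (with a `TimeoutError` on current runtimes), so a stalled CDN or `url_private` connection releases the channel queue. The signal stays attached to the response, so it also bounds reading the body. `fetchWithTimeout` is a hypothetical helper, and the 15 s default mirrors the commits. In this sketch a caller-supplied `signal` is replaced; `AbortSignal.any` could combine the two on newer runtimes. Requires Node 18+ (or a modern browser).

```typescript
const DOWNLOAD_TIMEOUT_MS = 15_000;

async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs: number = DOWNLOAD_TIMEOUT_MS,
): Promise<Response> {
  // The timeout signal aborts the request (and any later body read) after timeoutMs.
  return fetch(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
}
```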
…feat/slack-attachments
…eat/slack-attachments
# Conflicts:
# apps/gateway/package.json
# docs/channel-adapter.md
# package-lock.json
…t/telegram-inbound-media
# Conflicts:
# apps/gateway/package.json
# docs/channel-adapter.md
# package-lock.json
…-media
# Conflicts:
# apps/gateway/package.json
# docs/channel-adapter.md
# package-lock.json
Ship the bundled channel media support — Discord/Slack/Telegram attachments (#175, #176, #177) and their gateway pins (now all 0.3.0) — in the published CLI. Also syncs the stale lockfile cli entry (0.9.0 -> 0.9.2) left by the earlier 0.9.1 bump. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Fourth and final channel in the attachment-support rollout (after #174 WhatsApp, #175 Discord, #176 Slack).
Telegram was lopsided: it could send every media kind outbound and transcribe inbound voice, but inbound photos, documents, and video were silently dropped (not even modeled on `TelegramMessage`). This makes inbound symmetric with the existing outbound support. No core/protocol/gateway changes.
Inbound (Telegram → agent)
- Photos (largest size), documents, and video are downloaded via the existing `getFile` + `downloadFile` helpers and uploaded as session attachments; images become vision input automatically.
- The `caption` is kept as text.
Notes
- Telegram is a `private` package bundled into the `openhermit` CLI, so this ships with the next CLI release. Version bumped 0.2.0 → 0.3.0.
- Adds the first unit tests for this package (`pickMediaFile`: largest-photo selection, document/video mapping).
🤖 Generated with Claude Code