
feat(media): full expired CDN recovery pipeline — retryMediaFromMetadata, fetchGroupHistory, batchRecoverMedia #2465

Open
CyPack wants to merge 5 commits into EvolutionAPI:main from CyPack:fix/media-key-jsonb-updateMediaMessage

Conversation

CyPack commented Mar 8, 2026

Problem

Three compounding bugs silently broke media recovery in Evolution API (EA):

  1. JSONB mediaKey byte order: PostgreSQL JSONB stores object keys lexicographically ("0","1","10",...,"2",...,"9"), so Object.values() returns the bytes in the wrong order and Baileys' HKDF/AES-GCM decryption fails with "Unsupported state or unable to authenticate data".

  2. Baileys RC9 dead code: the reuploadRequest callback never fires because the catch block checks error.status, but Boom errors set output.statusCode. As a result, a sender re-upload is never requested for expired CDN URLs.

  3. Missing self-sufficient recovery pipeline: EA could not recover media unless the message was already in its own DB, and it had no batch pipeline.
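To make bug 2 concrete, here is a minimal sketch of a status reader that understands both error shapes. Checking only a top-level `status` returns nothing for a Boom error, whose status lives at `output.statusCode`. `getHttpStatus` and `isExpiredMediaError` are hypothetical helpers, not Baileys code, and treating 404/410 as "expired" is taken from the reviewer's sequence diagram as an assumption.

```typescript
// Hypothetical helpers illustrating the error.status vs output.statusCode mismatch.
type MaybeHttpError = { status?: unknown; output?: { statusCode?: unknown } };

export function getHttpStatus(err: unknown): number | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const e = err as MaybeHttpError;
  // Boom errors carry the HTTP status at output.statusCode...
  const fromBoom = e.output?.statusCode;
  if (typeof fromBoom === 'number') return fromBoom;
  // ...while many other HTTP error types expose a top-level status.
  const fromStatus = e.status;
  if (typeof fromStatus === 'number') return fromStatus;
  return undefined;
}

// Assumption: 404 and 410 mean "CDN URL expired, re-upload needed".
export function isExpiredMediaError(err: unknown): boolean {
  const status = getHttpStatus(err);
  return status === 404 || status === 410;
}
```

A check written as `err.status === 410` would be false for `{ output: { statusCode: 410 } }`, which is the failure mode described in bug 2.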

Solution (4 commits)

Commit 262c9300 — PostgreSQL JSONB key sorting

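```ts
// BEFORE: Object.values(doc.mediaKey)  ← lexicographic order, wrong bytes
// AFTER: sort the keys numerically before building the byte array
const mediaKey = new Uint8Array(
  Object.keys(doc.mediaKey)
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10))
    .map((k) => doc.mediaKey[k]),
);
```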

Also handles a base64-encoded string mediaKey (e.g. from HTTP request bodies), not just JSONB byte objects.
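A minimal sketch of the full normalization, assuming a hypothetical standalone helper `normalizeMediaKey` (the PR performs these steps inline in `getBase64FromMediaMessage`). It also folds in two points from the Sourcery review below: an explicit `null` guard (because `typeof null === 'object'`) and ignoring keys that are not non-negative integers before sorting. The explicit numeric sort makes the byte order independent of how the engine happens to enumerate object keys.

```typescript
// Hypothetical helper; the PR does this inline before calling Baileys.
export function normalizeMediaKey(raw: unknown): Uint8Array | undefined {
  // typeof null === 'object', so rule out null/undefined before any object checks.
  if (raw === null || raw === undefined) return undefined;

  // Buffer extends Uint8Array, so this keeps both binary forms unchanged.
  if (raw instanceof Uint8Array) return raw;

  // base64 string, e.g. from an HTTP request body.
  if (typeof raw === 'string') return new Uint8Array(Buffer.from(raw, 'base64'));

  // JSONB plain object {"0": b0, "1": b1, ...} (plain arrays also work here).
  if (typeof raw === 'object') {
    const obj = raw as Record<string, number>;
    const indexKeys = Object.keys(obj)
      .filter((k) => /^\d+$/.test(k)) // ignore keys that are not byte indices
      .sort((a, b) => Number(a) - Number(b)); // numeric, not lexicographic, order
    return indexKeys.length > 0 ? new Uint8Array(indexKeys.map((k) => obj[k])) : undefined;
  }

  return undefined;
}
```

For example, `normalizeMediaKey({ "10": 10, "2": 2, "0": 0, "1": 1 })` yields the bytes `0, 1, 2, 10`, and `normalizeMediaKey("AQID")` yields `1, 2, 3`.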

Commit f268571b — Baileys RC9 reuploadRequest dead code workaround

Adds an explicit updateMediaMessage() call, guarded by a 30 s timeout, to the catch block of getBase64FromMediaMessage, bypassing the broken callback mechanism. If that call also fails, it falls back to downloadContentFromMessage.
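The 30 s guard can be sketched as a generic `withTimeout` helper (a hypothetical name, not code from the PR). Unlike a bare `Promise.race`, it clears the timer in `finally`, which addresses the reviewer's note about stray timers accumulating on repeated calls.

```typescript
// Races `work` against a timer and always clears the timer afterwards.
export async function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Runs whichever branch wins, so no timer outlives the call.
    clearTimeout(timer);
  }
}
```

Assuming the Baileys socket's `updateMediaMessage(message)` returns a promise of the refreshed message, the call site would look like `await withTimeout(this.client.updateMediaMessage(msg), 30_000, 'updateMediaMessage')`.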

Commit 5e2f0774 — Two new endpoints for external recovery

POST /chat/retryMediaFromMetadata/{instance} — Download media using caller-supplied metadata (mediaKey, directPath, url) without needing the message in EA's DB. Mirrors OwnPilot's retryMediaFromMetadata() algorithm.
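The two-step algorithm behind this endpoint (direct download, then an explicit re-upload request bounded by a timeout, then one retry) can be sketched with injected operations. `MediaOps`, `download`, `requestReupload` and `downloadWithReupload` are hypothetical names; in EA they would wrap Baileys' `downloadMediaMessage` and `updateMediaMessage`, whose real signatures differ.

```typescript
// Hypothetical abstraction over the two Baileys calls used by the algorithm.
export interface MediaOps<M> {
  download(msg: M): Promise<Buffer>;
  requestReupload(msg: M): Promise<M>;
}

const errMsg = (e: unknown): string => (e instanceof Error ? e.message : String(e));

// Rejects if `work` does not settle within `ms`; the timer is always cleared.
function raceTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`re-upload timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

export async function downloadWithReupload<M>(msg: M, ops: MediaOps<M>, timeoutMs = 30_000): Promise<Buffer> {
  try {
    // Step 1: direct download with the caller-supplied metadata.
    return await ops.download(msg);
  } catch (firstErr) {
    try {
      // Step 2: ask the sender to re-upload, then retry once with the refreshed message.
      const refreshed = await raceTimeout(ops.requestReupload(msg), timeoutMs);
      return await ops.download(refreshed);
    } catch (secondErr) {
      // Keep both causes so the original download error is not lost.
      throw new Error(`direct download failed (${errMsg(firstErr)}); re-upload path failed (${errMsg(secondErr)})`);
    }
  }
}
```

Surfacing both errors in the final message also reflects the review's request not to swallow the original `downloadMediaMessage` error.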

POST /chat/fetchGroupHistory/{instance} — Trigger on-demand WhatsApp history sync for a group. WhatsApp responds with old message protos containing fresh mediaKey + directPath, enabling recovery of messages EA never stored. 30s rate-limit guard included.
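A minimal sketch of a 30 s rate-limit guard, assuming a hypothetical `CooldownGuard` class. The PR only states "1 call/30s", so whether the limit is global or per group is an assumption; the sketch supports both through the `key` argument, and the clock is injectable for testing.

```typescript
// Hypothetical cooldown guard; one timestamp per key ('global' by default).
export class CooldownGuard {
  private readonly lastCall = new Map<string, number>();

  constructor(
    private readonly cooldownMs = 30_000,
    private readonly now: () => number = Date.now,
  ) {}

  /** Returns 0 and records the call if allowed, otherwise the milliseconds left to wait. */
  tryAcquire(key = 'global'): number {
    const t = this.now();
    const last = this.lastCall.get(key);
    if (last !== undefined && t - last < this.cooldownMs) {
      return this.cooldownMs - (t - last);
    }
    this.lastCall.set(key, t);
    return 0;
  }
}
```

A non-zero return value could be mapped to an HTTP 429 with the remaining wait; how the PR actually reports a rejected call is not stated.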

Commit 28fc185f — End-to-end batch recovery pipeline

POST /chat/batchRecoverMedia/{instance} — Full self-sufficient pipeline:

  1. Fetch mediaKey + directPath + url from EA's own Message table
  2. Download via retryMediaFromMetadata (direct → updateMediaMessage fallback)
  3. Upload buffer to MinIO (s3Service.uploadFile)
  4. Upsert prismaRepository.media record
  5. Update message.mediaUrl in DB so future reads skip re-download
```
POST /chat/batchRecoverMedia/goconnectit
{
  "messageIds": ["3EB0...","2A42..."],
  "continueOnError": true,
  "storeToMinIO": true
}
// Example response from a 320-message run:
// {"total":320,"ok":310,"skip":3,"error":7}
```
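The batch control flow and the `ok`/`skip`/`error` tally can be sketched as follows. `batchRecover` and `recoverOne` are hypothetical names: `recoverOne` stands for steps 1 to 5 above for a single message, resolving `'skip'` when the media is already stored and throwing when it is irrecoverable. Sequential processing is an assumption made here to avoid bursts of re-upload requests.

```typescript
export type RecoverOutcome = 'ok' | 'skip';

export interface BatchSummary {
  total: number;
  ok: number;
  skip: number;
  error: number;
}

export async function batchRecover(
  messageIds: string[],
  recoverOne: (id: string) => Promise<RecoverOutcome>,
  continueOnError = true,
): Promise<BatchSummary> {
  const summary: BatchSummary = { total: messageIds.length, ok: 0, skip: 0, error: 0 };
  for (const id of messageIds) {
    try {
      // One message at a time; the outcome picks which counter to bump.
      summary[await recoverOne(id)] += 1;
    } catch (err) {
      summary.error += 1;
      if (!continueOnError) throw err; // abort the batch on the first failure
    }
  }
  return summary;
}
```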

Test Results

| Test | Result |
|---|---|
| Expired CDN SOR (JSONB fix) | 21058 bytes (was failing) |
| retryMediaFromMetadata single | 21288 bytes |
| fetchGroupHistory | sessionId returned, history sync triggered |
| batchRecoverMedia 320 files | 310 ok / 3 skip / 7 irrecoverable |

API Reference

POST /chat/retryMediaFromMetadata/{instance}
Body: { messageId, remoteJid, mediaKey (base64), directPath, url, mimeType?, filename?, convertToMp4? }

POST /chat/fetchGroupHistory/{instance}  
Body: { groupJid (@g.us), count? (max 50), anchorMessageId?, anchorTimestamp? }

POST /chat/batchRecoverMedia/{instance}
Body: { messageIds: string[], continueOnError?: bool, storeToMinIO?: bool }
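As an illustration of the fetchGroupHistory body contract, here is a hypothetical validator. Only the `@g.us` suffix and the maximum of 50 come from the reference above; the `anchorTimestamp` type, defaulting `count` to 50, and clamping larger values instead of rejecting them are assumptions.

```typescript
// Body shape from the API reference; field types beyond groupJid are assumed.
export interface FetchGroupHistoryBody {
  groupJid: string;
  count?: number;
  anchorMessageId?: string;
  anchorTimestamp?: number;
}

const MAX_HISTORY_COUNT = 50;

export function normalizeFetchGroupHistory(body: FetchGroupHistoryBody): FetchGroupHistoryBody & { count: number } {
  if (typeof body.groupJid !== 'string' || !body.groupJid.endsWith('@g.us')) {
    throw new Error('groupJid must be a group JID ending in @g.us');
  }
  const requested = body.count ?? MAX_HISTORY_COUNT; // assumed default
  if (!Number.isInteger(requested) || requested < 1) {
    throw new Error('count must be a positive integer');
  }
  // Clamp rather than reject values above the documented maximum.
  return { ...body, count: Math.min(requested, MAX_HISTORY_COUNT) };
}
```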

CyPack and others added 2 commits March 8, 2026 18:06
…st bug

Baileys RC9 checks error.status but Boom sets output.statusCode, causing
reuploadRequest to never trigger automatically. Fix: in getBase64FromMediaMessage
catch block, explicitly call this.client.updateMediaMessage() with 30s timeout
to get fresh CDN URL, then retry download. Falls back to downloadContentFromMessage
if updateMediaMessage also fails.

Technique validated in OwnPilot retryMediaFromMetadata() — 174/205 SOR files recovered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PostgreSQL JSONB stores object keys lexicographically, so a Uint8Array
serialized as {0:b0,1:b1,...,9:b9,10:b10,...} is retrieved with keys in
order "0","1","10","11",...,"19","2","20",...,"9". Using Object.values()
on this gives bytes in the WRONG order, causing Baileys HKDF/AES-GCM to
fail with "Unsupported state or unable to authenticate data".

Fix: sort keys numerically before constructing Uint8Array, and also handle
base64-encoded string mediaKey (from HTTP request bodies).

This matches OwnPilot retryMediaFromMetadata() which uses:
  new Uint8Array(Buffer.from(base64mediaKey, 'base64'))

Before: updateMediaMessage → "Unsupported state or unable to authenticate data"
After:  updateMediaMessage → re-upload → fresh CDN URL → download ✅

Tested: 2312ZL_2_A_V1.SOR (expired CDN, fromMe=true) → 21058 bytes ✅

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sourcery-ai
Contributor

sourcery-ai bot commented Mar 8, 2026

Reviewer's Guide

Adds robust media re-download handling for expired WhatsApp CDN URLs and fixes Uint8Array reconstruction from PostgreSQL JSONB to prevent cryptographic failures when downloading media via Baileys RC9.

Sequence diagram for WhatsApp media download with explicit updateMediaMessage

sequenceDiagram
  actor Caller
  participant BaileysStartupService
  participant BaileysClient
  participant WhatsAppCDN
  participant SenderDevice

  Caller->>BaileysStartupService: getBase64FromMediaMessage(msg)
  BaileysStartupService->>BaileysClient: downloadMediaMessage(msg, buffer, options, callbacks)
  BaileysClient->>WhatsAppCDN: GET media via directPath/url
  alt Download succeeds
    WhatsAppCDN-->>BaileysClient: 200 OK media bytes
    BaileysClient-->>BaileysStartupService: Buffer
    BaileysStartupService-->>Caller: base64
  else CDN expired or 404/410
    BaileysClient-->>BaileysStartupService: error (Boom with output.statusCode)
    BaileysStartupService->>BaileysStartupService: catch downloadMediaMessage error
    BaileysStartupService->>BaileysStartupService: log Download Media failed

    BaileysStartupService->>BaileysClient: updateMediaMessage(key, message)
    BaileysStartupService-->>BaileysStartupService: 30s timeout guard

    alt SenderDevice online and has file
      BaileysClient->>SenderDevice: Request media reupload
      SenderDevice->>WhatsAppCDN: Upload fresh media
      BaileysClient-->>BaileysStartupService: updatedMsg with fresh CDN URL
      BaileysStartupService->>BaileysClient: downloadMediaMessage(updatedMsg, buffer, options, callbacks)
      BaileysClient->>WhatsAppCDN: GET media via new URL
      WhatsAppCDN-->>BaileysClient: 200 OK media bytes
      BaileysClient-->>BaileysStartupService: Buffer
      BaileysStartupService->>BaileysStartupService: log success after updateMediaMessage
      BaileysStartupService-->>Caller: base64
    else updateMediaMessage fails or times out
      BaileysClient-->>BaileysStartupService: error from updateMediaMessage
      BaileysStartupService->>BaileysStartupService: log updateMediaMessage failure
      BaileysStartupService->>BaileysStartupService: wait 5s
      BaileysStartupService->>BaileysClient: downloadContentFromMessage(mediaKey, directPath, url, mediaType)
      BaileysClient->>WhatsAppCDN: GET media via constructed URL
      alt Fallback succeeds
        WhatsAppCDN-->>BaileysClient: media stream
        BaileysClient-->>BaileysStartupService: async chunks
        BaileysStartupService->>BaileysStartupService: concatenate chunks to Buffer
        BaileysStartupService->>BaileysStartupService: log fallback success
        BaileysStartupService-->>Caller: base64
      else Fallback fails
        WhatsAppCDN-->>BaileysClient: NOT_FOUND or error
        BaileysClient-->>BaileysStartupService: error
        BaileysStartupService->>BaileysStartupService: log fallback failure
        BaileysStartupService-->>Caller: error propagated
      end
    end
  end
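The fallback's "concatenate chunks to Buffer" step can be sketched as below, assuming the download stream is async-iterable (Node.js Readable streams are). `streamToBuffer` is a hypothetical helper name, not a function from the PR or from Baileys.

```typescript
// Collects every chunk from an async-iterable byte stream into one Buffer.
export async function streamToBuffer(stream: AsyncIterable<Uint8Array>): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Wrap plain Uint8Array chunks; Buffer chunks are used as-is.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}
```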

Class diagram for BaileysStartupService media download and reupload handling

classDiagram
  class BaileysStartupService {
    - client BaileysClient
    - logger Logger
    + getBase64FromMediaMessage(msg)
    + mapMediaType(mediaType)
  }

  class BaileysClient {
    + updateMediaMessage(key, message)
    + downloadMediaMessage(message, type, options, callbacks)
    + downloadContentFromMessage(mediaDescriptor, mediaType, options)
  }

  class Logger {
    + info(message)
    + error(message)
  }

  class WhatsAppMessage {
    + key Key
    + message MediaContainer
  }

  class MediaContainer {
    + imageMessage MediaMessage
    + videoMessage MediaMessage
    + audioMessage MediaMessage
    + documentMessage MediaMessage
  }

  class MediaMessage {
    + mediaKey Uint8Array
    + directPath string
    + url string
  }

  BaileysStartupService --> BaileysClient : uses
  BaileysStartupService --> Logger : logs
  BaileysStartupService --> WhatsAppMessage : processes
  WhatsAppMessage --> MediaContainer : has
  MediaContainer --> MediaMessage : contains

Flow diagram for mediaKey Uint8Array reconstruction from JSONB or base64

flowchart TD
  A["Start with mediaMessage.mediaKey"] --> B{Type of mediaKey}

  B -->|String| C["Treat as base64 string"]
  C --> D["Decode with Buffer.from(mediaKey, 'base64')"]
  D --> E["Wrap in new Uint8Array(buffer)"]
  E --> Z["Assign to msg.message[mediaType].mediaKey"]

  B -->|Object and not Buffer and not Uint8Array| F["JSONB plain object {0:b0,1:b1,...}"]
  F --> G["Extract keys: Object.keys(keyObj)"]
  G --> H["Sort keys numerically: keys.sort((a,b) => parseInt(a) - parseInt(b))"]
  H --> I["Map sorted keys to byte values: keys.map(k => keyObj[k])"]
  I --> J["Create Uint8Array from ordered byte array"]
  J --> Z

  B -->|Buffer or Uint8Array or other| K["Assume already valid binary representation"]
  K --> Z

  Z --> L["Use mediaKey to decrypt and download media via Baileys"]

File-Level Changes

Change Details Files
Fix Uint8Array/mediaKey reconstruction to be deterministic and compatible with both base64 strings and PostgreSQL JSONB objects.
  • Treat string mediaKey as base64 and convert using Buffer.from(..., 'base64') into a Uint8Array.
  • Detect non-Buffer, non-Uint8Array object mediaKey values (from JSONB) and reconstruct the byte array by numerically sorting object keys before mapping to a Uint8Array.
  • Preserve existing behavior for already-correct Buffer/Uint8Array mediaKey values.
src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts
Work around Baileys RC9 dead reuploadRequest path by explicitly requesting media re-upload and retrying download, with logging and a fallback path.
  • On downloadMediaMessage failure, log the Baileys RC9 bug context and attempt an explicit updateMediaMessage call with a 30-second timeout.
  • If updateMediaMessage resolves, retry downloadMediaMessage against the updated message and log success; propagate logger configuration as before.
  • If updateMediaMessage fails or times out, fall back to the previous downloadContentFromMessage flow with delay, chunk aggregation into a Buffer, and success/failure logging.
  • Retain and extend error logging so failures in both re-upload and fallback paths are visible.
src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts

sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The mediaKey object branch should explicitly guard against null/undefined before calling Object.keys, since typeof mediaMessage['mediaKey'] === 'object' is true for null and will currently throw at runtime.
  • In the Promise.race around updateMediaMessage, consider capturing and clearing the timeout handle once either branch resolves to avoid accumulating stray timers on repeated calls.
  • When reconstructing the Uint8Array from the JSONB mediaKey object, it might be safer to validate or filter keys that do not parse to a finite integer before sorting, so unexpected keys cannot silently affect byte ordering.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="3896-3897" />
<code_context>
+        // base64-encoded string (e.g. from HTTP request body) → Uint8Array
+        // This matches OwnPilot retryMediaFromMetadata: new Uint8Array(Buffer.from(base64, 'base64'))
+        msg.message[mediaType].mediaKey = new Uint8Array(Buffer.from(mediaMessage['mediaKey'], 'base64'));
+      } else if (
+        typeof mediaMessage['mediaKey'] === 'object' &&
+        !Buffer.isBuffer(mediaMessage['mediaKey']) &&
+        !(mediaMessage['mediaKey'] instanceof Uint8Array)
</code_context>
<issue_to_address>
**issue:** Guard against `mediaKey` being `null` before treating it as a plain object.

Since `typeof null === 'object'`, a `mediaKey` of `null` will hit this branch, be cast to `Record<string, number>`, and cause `Object.keys(keyObj)` to throw. If `null` is a possible value (e.g. malformed input or partial DB data), add an explicit non-null check or a more specific guard (e.g. `mediaMessage['mediaKey'] && typeof mediaMessage['mediaKey'] === 'object'`, or an `Array.isArray`/`instanceof` check) before entering this path.
</issue_to_address>

### Comment 2
<location path="src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts" line_range="3909" />
<code_context>
           { logger: P({ level: 'error' }) as any, reuploadRequest: this.client.updateMediaMessage },
         );
       } catch {
-        this.logger.error('Download Media failed, trying to retry in 5 seconds...');
-        await new Promise((resolve) => setTimeout(resolve, 5000));
-        const mediaType = Object.keys(msg.message).find((key) => key.endsWith('Message'));
-        if (!mediaType) throw new Error('Could not determine mediaType for fallback');
+        this.logger.error('Download Media failed, attempting explicit updateMediaMessage (Baileys RC9 reuploadRequest bug workaround)...');

+        // Baileys RC9 bug: reuploadRequest callback never triggers because downloadMediaMessage
</code_context>
<issue_to_address>
**suggestion:** Capture and log the original error in the outer `catch` for better debuggability.

The empty outer `catch {}` hides the original `downloadMediaMessage` error, so logs will only show failures in the reupload/fallback paths and obscure systemic issues in the initial download (auth/CDN/parsing, etc.). Please change this to `catch (err)` and log `err` (including the stack) with the explanatory message.

```suggestion
      let buffer: Buffer;
          { logger: P({ level: 'error' }) as any, reuploadRequest: this.client.updateMediaMessage },
        );
      } catch (err) {
        this.logger.error(
          'Download Media failed, attempting explicit updateMediaMessage (Baileys RC9 reuploadRequest bug workaround)...',
          err,
        );
```
</issue_to_address>


CyPack and others added 2 commits March 8, 2026 20:14
Two new endpoints mirroring OwnPilot's media recovery algorithms:

POST /chat/retryMediaFromMetadata/{instance}
  - Accepts mediaKey (base64), directPath, url, participant directly
  - No DB lookup needed — works even if message not in EA's Message table
  - Same algorithm as OwnPilot retryMediaFromMetadata():
    Step 1: direct downloadMediaMessage
    Step 2: explicit updateMediaMessage [30s timeout] → retry download

POST /chat/fetchGroupHistory/{instance}
  - Triggers sock.fetchMessageHistory() for a group JID
  - WhatsApp delivers old message protos (with fresh mediaKey/directPath)
    via messaging-history.set event → stored in EA's Message table
  - Rate-limited: 1 call/30s (ban risk)
  - This is how OwnPilot recovers mediaKeys for messages missing from EA DB

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… CDN recovery

POST /chat/batchRecoverMedia/{instance}
- Takes array of messageIds, fetches metadata from EA's own Message table
- Downloads via retryMediaFromMetadata (direct → updateMediaMessage fallback)
- Uploads recovered buffer to MinIO + upserts media record + updates message.mediaUrl
- continueOnError=true for resilient batch processing
- storeToMinIO=true (default) for full end-to-end pipeline

Tested: 310/320 expired SOR files recovered in <5min without OwnPilot.
3 skipped (already stored), 7 irrecoverable (sender permanently offline).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@CyPack CyPack changed the title fix(media): correct mediaKey Uint8Array conversion + explicit updateMediaMessage for Baileys RC9 bug feat(media): full expired CDN recovery pipeline — retryMediaFromMetadata, fetchGroupHistory, batchRecoverMedia Mar 8, 2026
Agent-friendly reference for the 3 new endpoints + 2 bug fixes:
- retryMediaFromMetadata: metadata-based download without DB lookup
- fetchGroupHistory: WhatsApp on-demand history sync trigger
- batchRecoverMedia: end-to-end batch recovery pipeline (download → MinIO → DB)

Includes: algorithm details, edge cases, JSONB mediaKey sort fix,
Baileys RC9 reuploadRequest workaround, env prerequisites,
iterative fetching pattern, and production results (1132/1137 SOR recovered).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Labels

None yet

Projects

None yet


1 participant