[Bug] OTLP gRPC metrics export fails when payload exceeds 32MB limit due to high-cardinality metrics #10240

@Houlong66

Before Creating the Bug Report

  • I found a bug and am not just asking a question; questions belong in GitHub Discussions.
  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

Linux / JDK 8+

RocketMQ version

5.3.1

Describe the Bug

When high-cardinality metrics (e.g., consumer lag, with one series per consumer_group × topic combination) are exported via OTLP gRPC, a single export request can exceed the receiver's 32 MiB (33,554,432-byte) gRPC message size limit. The whole request is then rejected, so all metrics fail to export with:

WARNING: Failed to export metrics. Server responded with gRPC status code 8.
Error message: grpc: received message larger than max (33568683 vs. 33554432)
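
For reference, gRPC status code 8 is RESOURCE_EXHAUSTED, and the limit in the message is exactly 32 MiB. The small check below is only an illustration of the numbers in the log line above; the class and method names are hypothetical and not part of RocketMQ or the OpenTelemetry SDK.

```java
final class GrpcLimitCheck {
    private GrpcLimitCheck() {}

    // Limit reported in the log line: 32 MiB expressed in bytes.
    static final long LIMIT_BYTES = 32L * 1024 * 1024;

    /** Returns how many bytes a message exceeds the limit by (0 if it fits). */
    static long overflowBytes(long messageBytes) {
        return Math.max(0L, messageBytes - LIMIT_BYTES);
    }

    public static void main(String[] args) {
        long received = 33_568_683L; // message size taken from the error message
        System.out.println("limit    = " + LIMIT_BYTES);             // 33554432
        System.out.println("overflow = " + overflowBytes(received)); // 14251
    }
}
```

The request in this log was only about 14 KB over the limit, but the whole export is rejected regardless of how small the overflow is.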

Additionally, even when gzip compression is enabled, a single MetricData can contain tens of thousands of data points (e.g., 83,000+), producing a payload that exceeds backend per-RPC processing limits after decompression.

The OpenTelemetry Java SDK does not support automatic batch splitting for metrics export (see opentelemetry-java#5394).
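
A minimal sketch of what batch splitting could look like, assuming the data points of one metric are available as a plain list. `PointBatcher`, `split`, and the batch size of 10,000 are hypothetical names and values chosen for illustration; this is not the OpenTelemetry API and not necessarily how the fix in #10239 is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class PointBatcher {
    private PointBatcher() {}

    /** Splits points into consecutive batches of at most maxBatchSize elements. */
    static <T> List<List<T>> split(List<T> points, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < points.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, points.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(points.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the 83,000 data points mentioned above.
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < 83_000; i++) {
            points.add(i);
        }
        List<List<Integer>> batches = split(points, 10_000);
        System.out.println(batches.size() + " batches"); // 9 batches
    }
}
```

In a real exporter, each batch would presumably have to be wrapped back into a metric carrying the original name, unit, and resource before being sent as its own export request.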

Steps to Reproduce

  1. Configure a Broker with OTLP gRPC metrics export (metricsExporterType=OTLP_GRPC)
  2. Have a large number of consumer groups and topics (e.g., 20,000+ topics, 20,000+ subscriptions)
  3. Observe that metrics export fails with gRPC RESOURCE_EXHAUSTED or UNKNOWN errors

What Did You Expect to See?

All metrics should be exported successfully regardless of cardinality.

What Did You See Instead?

Every metrics export fails on each export cycle (default 60s), resulting in complete loss of observability.

Additional Context

This issue is tracked upstream in the OpenTelemetry Java SDK as opentelemetry-java#5394. The Python SDK already supports max_export_batch_size, but the Java SDK does not yet have this feature.

Fix: #10239
