pkg/chipingress: batch client with metrics and request splitting by pkcll · Pull Request #2058 · smartcontractkit/chainlink-common

pkcll · 2026-05-14T17:04:20Z

Summary

Add OTel metrics to the batch client, fix shutdown behavior, and harden tests.

Metrics

chip_ingress.batch.send_requests_total — counter with status=success|failure attribute
chip_ingress.batch.request_size_messages — histogram with batch_size attribute
chip_ingress.batch.request_size_bytes — histogram with max_grpc_request_size_bytes attribute
chip_ingress.batch.request_latency_ms — histogram with status attribute
chip_ingress.batch.config.info — gauge recording batch configuration at startup

Shutdown improvements

Close the underlying chipingress.Client in Stop()
Use a standalone timeout context for the shutdown drain so it is not cancelled prematurely by close(stopCh)
Remove closeOnce guard from client.Close (shutdownOnce already serialises)

Bug fixes

Pass caller-provided ctx to recordConfig instead of context.Background()
Remove redundant send_failures_total counter; send_requests_total with its status label already captures failure counts
Remove misleading comment on maxGRPCRequestSize default (10MB matches the chip-ingress server MaxRecvMsgSize, not the client-side 16MB maxMessageSize used for MaxCallRecvMsgSize)

Test improvements

Replace nil client in 17 test call sites with mocks.NewClient(t) — nil is not a valid chipingress.Client and would panic on Stop()
Add WithMaxGRPCRequestSize option and oversize-event metrics test
Add config.info gauge assertion

github-actions · 2026-05-14T17:05:46Z

✅ API Diff Results - `github.com/smartcontractkit/chainlink-common/pkg/chipingress`

✅ Compatible Changes (1)

`batch` (1)

WithMaxGRPCRequestSize — ➕ Added

hendoxc

Can we collapse send_requests_total and add a status label, so we have a single metric for failed and successful sends? (Does it need to be prefixed, like chip_ingress_batch_client_send_requests_total?)

Copilot

Pull request overview

This PR enhances the pkg/chipingress/batch client to be more production-ready by adding OpenTelemetry metrics, enforcing/splitting by gRPC request size, and improving shutdown behavior by closing the underlying chipingress.Client and using a standalone shutdown timeout context.

Changes:

Added OTel metrics for batch send counts, failures, sizes, latency, and a config info gauge.
Implemented request splitting/rejection based on serialized PublishBatch request size, plus regression tests.
Updated shutdown to close the underlying chip ingress client and to use a standalone timeout context during drain.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/chipingress/go.mod | Adds direct OTel deps needed for new metrics + metric test utilities. |
| pkg/chipingress/client_test.go | Adds a Close() test for the chip ingress client. |
| pkg/chipingress/batch/client.go | Adds metrics, request-size splitting/rejection, and updates shutdown/close behavior. |
| pkg/chipingress/batch/client_test.go | Adds metric tests, oversize/splitting regression tests, and Stop() close expectations. |

Comments suppressed due to low confidence (1)

pkg/chipingress/batch/client.go:346

The docstring for WithMaxGRPCRequestSize says it’s only used for metric comparison attributes, but it now also drives request splitting and oversize rejection in sendBatch. Please update the comment to reflect the behavioral impact so callers don’t treat it as metrics-only.

// WithMaxGRPCRequestSize sets the max gRPC request size in bytes used for metric comparison attributes.
func WithMaxGRPCRequestSize(maxReqSize int) Opt {
	return func(c *Client) {
		c.maxGRPCRequestSize = maxReqSize
	}
}

pkcll · 2026-05-14T21:16:20Z

 		client:             client,
 		log:                zap.NewNop().Sugar(),
 		batchSize:          10,
+		maxGRPCRequestSize: 10 * 1024 * 1024, // Match chipingress maxMessageSize default.


Updated misleading comment

pkcll · 2026-05-14T19:45:57Z

+		if err := b.client.Close(); err != nil {
+			b.log.Warnw("failed to close chip ingress client", "error", err)


it doesn't need a nil guard

+	var batches [][]*messageWithCallback
+	current := make([]*messageWithCallback, 0, len(messages))
+	for _, msg := range messages {
+		candidate := append(current, msg)
+		_, candidateBytes := newBatchRequest(candidate)
+		if len(current) > 0 && candidateBytes > maxRequestSize {
+			batches = append(batches, current)
+			current = []*messageWithCallback{msg}
+			continue
+		}
+		current = candidate
+	}


pkcll · 2026-05-14T21:16:42Z

 	b.shutdownOnce.Do(func() {
-		ctx, cancel := b.stopCh.CtxWithTimeout(b.shutdownTimeout)
+		// Use a standalone timeout context so the shutdown wait isn't cancelled
+		// by close(b.stopCh) below.
+		ctx, cancel := context.WithTimeout(context.Background(), b.shutdownTimeout)
 		defer cancel()

 		if b.cancelBatcher != nil {


Introduce ChipIngressBatchEmitterService backed by chipingress batch client, managed as a sub-service of the beholder Client via services.Engine. Refactor DualSourceEmitter to delegate fire-and-forget to ChipIngressEmitter. Pass explicit loggers throughout instead of creating new ones internally. Add batch emitter config fields, feature flag, tests, and benchmarks. Update pkg/loop server and config to propagate batch emitter settings. Depends on #2058 (pkg/chipingress batch client metrics).

pkcll mentioned this pull request May 14, 2026

pkg/beholder: add batch emitter service with service-engine lifecycle #2059

Draft

pkcll force-pushed the infoplat-3436-chipingress-batching-part-1 branch 3 times, most recently from 9e14f91 to 73fc379 Compare May 14, 2026 17:32

pkcll changed the title ~~pkg/chipingress: add batch client metrics, close underlying client on Stop~~ pkg/chipingress: batch client with metrics and request splitting May 14, 2026

pkcll force-pushed the infoplat-3436-chipingress-batching-part-1 branch from 73fc379 to b54eb82 Compare May 14, 2026 17:34

pkcll marked this pull request as ready for review May 14, 2026 17:35

pkcll requested a review from a team as a code owner May 14, 2026 17:35

Copilot AI review requested due to automatic review settings May 14, 2026 17:35

product-security-plaid-production Bot requested a review from thomaska May 14, 2026 17:35

Copilot started reviewing on behalf of pkcll May 14, 2026 17:36

pkcll requested review from hendoxc and jmank88 May 14, 2026 17:38

hendoxc reviewed May 14, 2026

Comment thread pkg/chipingress/batch/client.go Outdated

Copilot AI reviewed May 14, 2026

pkcll force-pushed the infoplat-3436-chipingress-batching-part-1 branch from b54eb82 to 044d3dd Compare May 14, 2026 17:54

jmank88 previously approved these changes May 14, 2026

pkcll dismissed jmank88’s stale review via 26ff00e May 14, 2026 21:19

pkcll force-pushed the infoplat-3436-chipingress-batching-part-1 branch from 712898c to 09be58d Compare May 14, 2026 21:35

pkcll force-pushed the infoplat-3436-chipingress-batching-part-1 branch from 09be58d to fddcb86 Compare May 14, 2026 21:37

pkcll wants to merge 1 commit into main from infoplat-3436-chipingress-batching-part-1

4 participants
