encode_opentelemetry: add cut off for otel payloads for prometheus mimir by cosmo0920 · Pull Request #223 · fluent/cmetrics

cosmo0920 · 2024-09-25T06:43:14Z

This issue is reported in fluent/fluent-bit#9400.

The cause is that Prometheus Mimir limits the timestamps of metrics in the same batch to within 5 minutes:
https://github.com/grafana/mimir/blob/main/pkg/distributor/distributor.go#L1010-L1020

In Prometheus Mimir, metric timestamps in the same batch are limited to within 5 minutes: https://github.com/grafana/mimir/blob/main/pkg/distributor/distributor.go#L1010-L1020

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
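
For illustration only, the cutoff idea described above can be sketched as follows. This is not the code from this PR: `struct sample`, `sample_is_stale`, and `drop_stale_samples` are hypothetical names, and the 300-second window is simply the 5-minute figure mentioned above passed in as a parameter.

```c
#include <stddef.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical sample type; cmetrics uses its own internal structures. */
struct sample {
    uint64_t timestamp_ns;   /* nanoseconds since the Unix epoch */
    double   value;
};

/* Returns 1 when the sample is older than window_sec relative to now_ns. */
static int sample_is_stale(const struct sample *s,
                           uint64_t now_ns, uint64_t window_sec)
{
    uint64_t window_ns = window_sec * NS_PER_SEC;

    if (now_ns < window_ns) {
        /* The window reaches back before the epoch; nothing can be stale. */
        return 0;
    }

    return s->timestamp_ns < now_ns - window_ns;
}

/*
 * Compacts the array in place, keeping only samples inside the window.
 * Returns the number of samples kept; a caller could skip building a
 * payload entirely when this returns 0.
 */
static size_t drop_stale_samples(struct sample *samples, size_t count,
                                 uint64_t now_ns, uint64_t window_sec)
{
    size_t i;
    size_t kept = 0;

    for (i = 0; i < count; i++) {
        if (!sample_is_stale(&samples[i], now_ns, window_sec)) {
            samples[kept++] = samples[i];
        }
    }

    return kept;
}
```

A sample whose timestamp sits exactly on the boundary is kept in this sketch; whether the receiving side treats the boundary as inclusive is an assumption to verify against the Mimir source linked above.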

edsiper · 2024-09-26T18:12:28Z

What is the side effect of this for other endpoints/users? Is it OK to remove metrics for everybody?

ElectricWeasel · 2024-09-27T09:01:48Z

As far as I have investigated, fluent-bit keeps repeating indefinitely (until restarted) metrics from devices or mounts that no longer exist:

  | Sep 27, 2024 @ 10:37:02.140 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:50:15.946Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra6.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:36:48.274 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:49:17.062Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra5.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:36:41.445 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:44:55.162Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra2.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:36:32.213 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:40:47.164Z and is from series node_filesystem_device_error{device="tmpfs", fstype="tmpfs", host_name="petra1.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:36:18.366 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:40:47.164Z and is from series node_filesystem_size_bytes{device="tmpfs", fstype="tmpfs", host_name="petra1.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:36:17.153 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:50:15.946Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra6.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:36:03.301 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:48:17.259Z and is from series node_filesystem_avail_bytes{device="tmpfs", fstype="tmpfs", host_name="petra4.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:35:53.855 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:51:37.82Z and is from series node_filesystem_free_bytes{device="tmpfs", fstype="tmpfs", host_name="petra7.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)
  | Sep 27, 2024 @ 10:35:48.239 | user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-24T11:49:17.062Z and is from series node_filesystem_size_bytes{device="tmpfs", fstype="tmpfs", host_name="petra5.vrit.dev", metrics_agent="fluent-bit", metrics_source="host-metrics", mountpoint="/run/user/2137"} (sampled 1/10)

Trying to push metrics from 3 days ago... (tmpfs filesystem after user session)
I don't think anyone can benefit from this.

Regards
Rafał

cosmo0920 · 2024-09-27T09:36:31Z

> Trying to push metrics from 3 days ago... (tmpfs filesystem after user session) I don't think anyone can benefit from this.

Just to confirm: was this log produced with this patch applied or not?

ElectricWeasel · 2024-09-27T09:50:38Z

> Just to confirm: was this log produced with this patch applied or not?

Ah sorry, it's a standard 3.1.2 version. I can try to compile from this branch and confirm.

Regards
Rafał


cosmo0920 · 2024-09-30T06:40:39Z

> What is the side effect of this for other endpoints/users? Is it OK to remove metrics for everybody?

I added APIs to specify cutoff options. This should avoid breaking changes for users who are already using the otel encoding.
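
As a rough sketch of what an opt-in cutoff could look like (an assumption for illustration, not the API added in this PR), the encoder could take an options struct whose default keeps the existing behaviour. `struct otel_encode_opts`, `otel_encode_opts_init`, and `otel_should_encode` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options struct; the field names are illustrative only. */
struct otel_encode_opts {
    int      apply_cutoff;          /* 0 = encode everything (default) */
    uint64_t cutoff_threshold_ns;   /* oldest timestamp still encoded */
};

/* Defaults leave the cutoff disabled, so existing callers see no change. */
static void otel_encode_opts_init(struct otel_encode_opts *opts)
{
    opts->apply_cutoff = 0;
    opts->cutoff_threshold_ns = 0;
}

/* Returns 1 when a sample with timestamp ts_ns should be encoded. */
static int otel_should_encode(const struct otel_encode_opts *opts,
                              uint64_t ts_ns)
{
    if (opts == NULL || !opts->apply_cutoff) {
        /* No options or cutoff disabled: keep the previous behaviour. */
        return 1;
    }

    return ts_ns >= opts->cutoff_threshold_ns;
}
```

With this shape, only callers that explicitly set `apply_cutoff` would drop stale samples, which is one way to address the concern about removing metrics for everybody.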


Brodiemm · 2024-10-23T23:52:34Z

Is this being planned in for a release soon? Any other testing etc. that is needed?

cosmo0920 · 2024-10-24T01:15:33Z

I believe so. But even once it is merged into the fluent-bit tree, more work is needed to implement the cutoff-related parameters in out_opentelemetry.

cosmo0920 added 3 commits September 25, 2024 15:24

encode_opentelemetry: Add encoding cut off for Prometheus Mimir

c3d4095

In Prometheus Mimir, metric timestamps in the same batch are limited to within 5 minutes: https://github.com/grafana/mimir/blob/main/pkg/distributor/distributor.go#L1010-L1020
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

tests: decoding: Follow the cut off change

38b1ee4

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

encode_opentelemetry: Plug memory leaks on cutoff

5cade1b

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

edsiper mentioned this pull request Sep 26, 2024

node_metrics output plugin producing stale data for no longer existent network devices fluent/fluent-bit#9400

Closed

encode_opentelemetry: tests: Add a mechanism for opt-in to cutoff sta…

f6f0e93

…led otel payloads Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

edsiper requested changes Sep 30, 2024


src/cmt_encode_opentelemetry.c (outdated)

encode_opentelemetry: tests: Make a generic opts instead of cutoff sp…

92a16b9

…ecific one Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 force-pushed the cosmo0920-add-cut-off-for-otel-payloads-for-prometheus-mimir branch from faab663 to 6c74f7e on October 15, 2024 06:29

encode_opentelemetry: Encode otel payloads if valid contexts are exis…

ed94318

…ting Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

cosmo0920 force-pushed the cosmo0920-add-cut-off-for-otel-payloads-for-prometheus-mimir branch from 6c74f7e to ed94318 on October 15, 2024 06:30

cosmo0920 wants to merge 6 commits into master from cosmo0920-add-cut-off-for-otel-payloads-for-prometheus-mimir


4 participants
