Add 0.49 ci #75

Merged: zhoucai-pingcap merged 35 commits into 0.49 from add-0.49-ci on Mar 9, 2026

@zhoucai-pingcap
Collaborator

Summary

This PR introduces several new Vector extension components, enhances existing ones, and adds comprehensive unit test coverage:

New Sinks

  • TiDB sink (src/sinks/tidb/): Write log events to MySQL/TiDB databases with auto-create table, batch inserts, schema-aware type mapping, boolean/numeric value sanitization, and string truncation
  • S3 Content Partitioned sink (src/sinks/s3_content_partitioned/): Upload log content to S3 partitioned by component and hour_partition fields, with optional gzip compression and configurable part size
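
The partitioning described for the S3 Content Partitioned sink can be illustrated with a minimal sketch. The `PartitionKey` name and the `component` / `hour_partition` fields come from the description above; the struct layout, the `object_key` signature, and the `prefix/component/hour_partition/file` key layout are assumptions made for illustration, not the code in this PR.

```rust
/// Hypothetical partition key built from the two event fields named in the
/// PR description. Deriving Eq and Hash lets it group events in a map.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartitionKey {
    pub component: String,
    pub hour_partition: String,
}

/// Build an S3 object key for one partition.
/// Assumed layout: `<prefix>/<component>/<hour_partition>/<file_name>[.gz]`.
pub fn object_key(prefix: &str, key: &PartitionKey, file_name: &str, gzip: bool) -> String {
    // Drop leading/trailing slashes so the key never contains `//`.
    let prefix = prefix.trim_matches('/');
    // Only append the suffix when gzip compression is enabled.
    let ext = if gzip { ".gz" } else { "" };
    let tail = format!("{}/{}/{}{}", key.component, key.hour_partition, file_name, ext);
    if prefix.is_empty() {
        tail
    } else {
        format!("{}/{}", prefix, tail)
    }
}
```

Under these assumptions, events that share the same `component` and `hour_partition` values compare equal as `PartitionKey`s and therefore land under the same key prefix.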

New Sources

  • Delta Lake Watermark source (src/sources/delta_lake_watermark/): Read Delta Lake tables via DuckDB with watermark-based incremental sync, checkpoint persistence, and configurable query patterns
  • File List source (src/sources/file_list/): List and stream files from S3/object stores with built-in log line parsing (Python logging, HTTP access log), path resolution for known o11y data types (raw_logs, slowlog, sql_statement, top_sql, conprof), and checkpoint-based resume
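
Watermark-based incremental sync, as mentioned for the Delta Lake Watermark source, generally means remembering the largest value of an ordering column seen so far and only asking for rows beyond it on the next poll. The sketch below shows that idea in isolation. `WatermarkState`, `Row`, the `ts` column, and the generated SQL text are all hypothetical and are not taken from the PR; the real source runs its queries through DuckDB and persists its checkpoint.

```rust
/// Hypothetical row with an integer ordering column.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub ts: i64,
    pub payload: String,
}

/// Hypothetical in-memory watermark; a real source would persist this value.
#[derive(Debug, Default)]
pub struct WatermarkState {
    pub last_ts: Option<i64>,
}

impl WatermarkState {
    /// Build the next incremental query. Without a watermark, read from the start.
    pub fn query(&self, table: &str, column: &str, limit: usize) -> String {
        match self.last_ts {
            Some(ts) => format!(
                "SELECT * FROM {table} WHERE {column} > {ts} ORDER BY {column} LIMIT {limit}"
            ),
            None => format!("SELECT * FROM {table} ORDER BY {column} LIMIT {limit}"),
        }
    }

    /// Move the watermark forward to the largest value seen; never move it back.
    pub fn advance(&mut self, rows: &[Row]) {
        if let Some(max) = rows.iter().map(|r| r.ts).max() {
            self.last_ts = Some(self.last_ts.map_or(max, |cur| cur.max(max)));
        }
    }
}
```

One design caveat of this simplified form: a strict `>` comparison can skip rows that share the watermark value when a batch is cut off by `LIMIT`, so a production implementation needs a tie-breaking column or a `>=` comparison with de-duplication.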

Enhancements

  • Conprof source: Added K8s topology fetch mode, native jeprof profiling tool support, and new prof modes
  • TopSQL source: Fixed topsql_instance handling for dedicated clusters
  • Stream processing: Multiple performance improvements for file streaming

CI & Infrastructure

  • Added test coverage workflow (.github/workflows/test_coverage.yml)
  • Updated build image workflow with improved architecture support
  • Added demo configurations, scripts, and documentation for data sync scenarios

Unit Tests (49 new tests)

  • TiDB sink (29 tests): infer_mysql_type, escape_ident, convert_bool_string_for_tinyint, extract_max_length, sanitize_numeric_value, is_numeric_column, is_table_not_found_error
  • S3 Content Partitioned sink (12 tests): object_key generation, key_from_event extraction, message_bytes handling, PartitionKey equality
  • File List checkpoint (8 tests): save/load cycle, completed key tracking, path sanitization, error handling, nested directory creation
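
To give a sense of what two of the tested TiDB sink helpers might look like, here is a minimal sketch. The function names `escape_ident` and `convert_bool_string_for_tinyint` appear in the list above, but the signatures and bodies below are assumptions, not the code from this PR. The identifier escaping follows the standard MySQL rule of wrapping a name in backticks and doubling any embedded backtick.

```rust
/// Quote a MySQL/TiDB identifier: wrap in backticks and double embedded ones.
/// (Assumed signature; the PR's helper may differ.)
pub fn escape_ident(name: &str) -> String {
    format!("`{}`", name.replace('`', "``"))
}

/// Map a textual boolean to the 0/1 value a TINYINT(1) column expects.
/// Returns None for anything that is not "true" or "false" (case-insensitive),
/// so the caller can decide how to handle unexpected input.
/// (Assumed behavior; the PR's helper may accept more spellings.)
pub fn convert_bool_string_for_tinyint(value: &str) -> Option<u8> {
    match value.trim().to_ascii_lowercase().as_str() {
        "true" => Some(1),
        "false" => Some(0),
        _ => None,
    }
}
```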

All 448 tests pass (including the 49 new ones).

Test Plan

  • All unit tests pass (cargo test --lib → 448 passed, 0 failed)
  • TiDB sink: type inference, boolean conversion, numeric sanitization, max length extraction
  • S3 sink: object key generation, event field extraction, partition equality
  • File list checkpoint: persistence, corruption recovery, URL sanitization
  • Existing tests unaffected by changes
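
The file list checkpoint behaviors named in the test plan (persistence, recovery from an unreadable file, URL sanitization, nested directory creation) can be sketched with the standard library alone. Everything here is hypothetical: the `Checkpoint` type, the one-key-per-line file format, and `sanitize_for_path` are illustrative choices, and the actual implementation in `src/sources/file_list/` may use a different format and recovery policy.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Turn a URL-like source name into a safe file name by replacing every
/// character outside [A-Za-z0-9._-] with an underscore.
pub fn sanitize_for_path(source: &str) -> String {
    source
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.') { c } else { '_' })
        .collect()
}

/// Hypothetical checkpoint: the set of object keys already processed.
#[derive(Debug, Default, PartialEq)]
pub struct Checkpoint {
    pub completed: BTreeSet<String>,
}

impl Checkpoint {
    /// Write one key per line, creating missing parent directories first.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        let body: String = self.completed.iter().map(|k| format!("{k}\n")).collect();
        fs::write(path, body)
    }

    /// Load a checkpoint; a missing or unreadable file yields an empty one,
    /// which means the source starts over instead of failing.
    pub fn load(path: &Path) -> Checkpoint {
        match fs::read_to_string(path) {
            Ok(text) => Checkpoint {
                completed: text.lines().filter(|l| !l.is_empty()).map(String::from).collect(),
            },
            Err(_) => Checkpoint::default(),
        }
    }
}
```

The line-based format assumes keys never contain newlines; a structured format such as JSON would lift that restriction at the cost of an extra dependency.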


Made-with: Cursor
@pingcap-cla-assistant

pingcap-cla-assistant bot commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot bot added the size/XXL label Mar 5, 2026
@zhoucai-pingcap zhoucai-pingcap merged commit b0a3aaa into 0.49 Mar 9, 2026
2 checks passed
@zhoucai-pingcap zhoucai-pingcap deleted the add-0.49-ci branch March 9, 2026 02:44

2 participants