Skip to content

sink(cloudstorage): add use-table-id-as-path option#4356

Open
yinshuangfei wants to merge 9 commits intopingcap:masterfrom
yinshuangfei:cdc_path_table_id
Open

sink(cloudstorage): add use-table-id-as-path option#4356
yinshuangfei wants to merge 9 commits intopingcap:masterfrom
yinshuangfei:cdc_path_table_id

Conversation

@yinshuangfei
Copy link

@yinshuangfei yinshuangfei commented Mar 5, 2026

What problem does this PR solve?

Issue Number: close #4357

What is changed and how it works?

The 'use-table-id-as-path' configuration option ONLY applies to TICI.

  • Adds config use-table-id-as-path.
  • Adds use_table_id_as_path into API conversion and sink config parsing.
  • Remove partition_id from the path to prevent duplicates.

In this mode, we adjust cloud storage path generation to omit schema when table-id-as-path is enabled and skip DB schema writes, that is:

  • Adding this configuration removes the 'database' prefix.
  • Automatically filtering database-related events.

For example:

The use-table-id-as-path option switches the path to use table_id instead of table_name when it set to true.
With configuration use-table-id-as-path=true in sink uri, for example: --sink-uri="s3://cdc&use-table-id-as-path=true", the cdc path changed from

test_db/table_name/5/2024-01-01/CDC_xxx.json

to

12345/5/2024-01-01/CDC_xxx.json

The reason for this design is:

  • To prevent residual data from being affected by creating a table with the same name after dropping a table.
  • Database events are not processed by TCI;
  • Retaining database events would create a database directory and schema files, increasing the workload of subsequent GC S3 directory creation.

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • New Features

    • Added optional use-table-id-as-path configuration parameter for cloud storage sink, enabling file organization by numeric table IDs instead of table names.
  • Bug Fixes

    • Enhanced validation for exchange partition operations to properly detect and handle missing source table information.
  • Tests

    • Added test coverage for table ID-based path generation and exchange partition error handling.

Summary by CodeRabbit

  • New Features

    • Added a use-table-id-as-path option to organize cloud storage schema/data by table ID.
  • Bug Fixes

    • Validates and rejects malformed exchange-partition DDL events.
    • Skips database-level schema events when table-id-as-path is enabled and table name is absent.
    • Improved path generation and index-file error handling to avoid invalid writes.
  • Tests

    • Added tests covering table-ID path behavior and exchange-partition validation.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Mar 5, 2026
@pingcap-cla-assistant
Copy link

pingcap-cla-assistant bot commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 5, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the cloud storage sink by introducing a new configuration option, use-table-id-as-path. This option provides users with the flexibility to specify whether the output directory structure for table-related data should utilize the table's unique ID instead of its name, which can be beneficial for data management and consistency. The changes involve updating the sink's file writing logic and path generation to respect this new setting, along with corresponding test coverage. Additionally, a critical fix for DDL event processing ensures that exchange partition operations are handled correctly by validating the presence of source table information.

Highlights

  • New Configuration Option: Introduced use-table-id-as-path for the cloud storage sink, allowing output paths to use table IDs instead of table names for better organization and uniqueness.
  • Enhanced Path Generation Logic: Updated internal file path generation functions to conditionally use the table's unique ID based on the new configuration setting.
  • Improved DDL Event Handling: Added validation for ActionExchangeTablePartition DDL events to ensure that source table information is always present, preventing potential errors.
  • Expanded Test Coverage: Included new unit tests to verify the functionality of table ID-based paths and the robustness of the DDL event validation.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • downstreamadapter/sink/cloudstorage/sink.go
    • Validated ActionExchangeTablePartition DDL events for missing source table info.
    • Modified writeFile function to accept tableID and updated its calls.
  • downstreamadapter/sink/cloudstorage/sink_test.go
    • Added test for use-table-id-as-path option in DDL event writing.
    • Included test for invalid ActionExchangeTablePartition DDL events.
  • pkg/sink/cloudstorage/config.go
    • Added UseTableIDAsPath field to urlConfig and Config structs.
    • Implemented parsing of use-table-id-as-path from sink URI.
  • pkg/sink/cloudstorage/config_test.go
    • Updated tests to include use-table-id-as-path in URI parsing and config application.
  • pkg/sink/cloudstorage/path.go
    • Introduced generateTablePath to conditionally use table name or ID.
    • Updated CheckOrWriteSchema and generateDataDirPath to use the new path generation logic.
  • pkg/sink/cloudstorage/path_test.go
    • Added test for data file path generation using table ID.
  • pkg/sink/cloudstorage/table_definition.go
    • Modified GenerateSchemaFilePath to accept useTableIDAsPath and tableID parameters.
    • Updated schema file path generation to use table ID when configured.
  • pkg/sink/cloudstorage/table_definition_test.go
    • Updated tests for GenerateSchemaFilePath to cover the useTableIDAsPath option.
Activity
  • Implemented a new configuration option for cloud storage sink path generation.
  • Refactored DDL event handling to include robust validation for exchange partition operations.
  • Added comprehensive unit tests to ensure the correctness of the new feature and the DDL event fix.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c32ac78-8167-4e49-8cad-4dc5d79e5d50

📥 Commits

Reviewing files that changed from the base of the PR and between 1cc3c85 and 5252b7d.

📒 Files selected for processing (1)
  • pkg/config/sink.go

📝 Walkthrough

Walkthrough

Adds a UseTableIDAsPath option and updates cloud storage path and schema generation, config/API plumbing, sink handling, tests, and index-path error handling to support using table IDs (and omitting schema directories) when enabled.

Changes

Cohort / File(s) Summary
Configuration & API
pkg/config/sink.go, pkg/sink/cloudstorage/config.go, pkg/sink/cloudstorage/config_test.go, api/v2/model.go, api/v2/model_test.go
Introduce UseTableIDAsPath in configs and API model, add SinkURI param parsing, compatibility checks, and propagate flag through API↔internal conversions and tests.
Path & Table-definition logic
pkg/sink/cloudstorage/path.go, pkg/sink/cloudstorage/path_test.go, pkg/sink/cloudstorage/table_definition.go, pkg/sink/cloudstorage/table_definition_test.go
Refactor path generation to accept useTableIDAsPath and tableID, add generateTablePath, change many path helpers to return (string, error), add validations, and update tests for table-ID path behavior.
Sink behavior & tests
downstreamadapter/sink/cloudstorage/sink.go, downstreamadapter/sink/cloudstorage/sink_test.go
Validate exchange-partition DDL source TableInfo, construct and write a separate source schema event, integrate new GenerateSchemaFilePath signature, skip database-level schema writes when table-id-as-path is enabled, and add tests (including invalid exchange-partition cases).
Writer error handling
downstreamadapter/sink/cloudstorage/writer.go
Capture and return errors from GenerateIndexFilePath instead of proceeding with a potentially invalid path.

Sequence Diagram(s)

sequenceDiagram
    participant Changefeed as Changefeed (producer)
    participant Sink as CloudStorageSink
    participant PathGen as PathGenerator (pkg/sink/cloudstorage)
    participant Storage as Cloud Storage / FS

    Changefeed->>Sink: DDL event (maybe exchange partition)
    Sink->>Sink: validate TableInfo / build sourceEvent
    Sink->>PathGen: GenerateSchemaFilePath(useTableIDAsPath, tableID)
    PathGen-->>Sink: schemaPath or error
    alt error
      Sink-->>Changefeed: return error
    else
      Sink->>Storage: write schema file at schemaPath
      Sink->>PathGen: GenerateDataFilePath(...)
      PathGen-->>Sink: dataPath or error
      Sink->>Storage: write data file at dataPath
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • wk989898
  • hongyunyan

Poem

🐰 I nibble paths of schema bright,
Swapping names for IDs at night,
I hop from config into tests,
Skip empty schemas, do my best,
A tiny hop — storage alight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.04% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'sink(cloudstorage): add use-table-id-as-path option' clearly and concisely summarizes the main change: adding a new configuration option for cloud storage sink path generation.
Description check ✅ Passed The PR description includes the required issue reference (close #4357), explains the changes and how they work, provides examples, and addresses the checklist items including tests. The description is substantially complete.
Linked Issues check ✅ Passed The PR implements all key requirements from issue #4357: adds use-table-id-as-path option, adjusts path generation to omit schema/DB when enabled, skips DB schema writes, applies to TICI only, and includes unit test coverage for validation.
Out of Scope Changes check ✅ Passed All code changes are directly related to the use-table-id-as-path feature scope: configuration handling, path generation logic, API conversions, validation, and corresponding test coverage. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a use-table-id-as-path option for the cloud storage sink, enabling paths to be constructed using table IDs instead of table names. While the implementation is well-tested and appears correct, a security audit identified that the use of log.Panic for handling invalid or unexpected input from external sources (DDL events or storage files) could lead to Denial of Service (DoS). It is recommended to replace panics with proper error handling to ensure the stability of the replication process. Additionally, there are minor refactoring opportunities to improve code reuse and test structure.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/cloudstorage/sink_test.go`:
- Around line 289-293: The test is missing setup of the mock PD clock before
creating the sink; call appcontext.SetService to register the mock PD clock
service (same mock used in other tests) prior to invoking newSinkForTest so the
sink and any PD clock accessors see the mock; specifically, in the test where
ctx, cancel := context.WithCancel(context.Background()) and cloudStorageSink,
err := newSinkForTest(...) are created, insert the mock PD clock registration
(via appcontext.SetService(mockPDClock)) using the same mock instance used by
TestWriteDDLEvent/TestWriteDDLEventWithTableIDAsPath and ensure it is cleaned up
or reset after the test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ef1e6151-8c9d-4aad-8422-14f382fcb166

📥 Commits

Reviewing files that changed from the base of the PR and between 039417c and 74c09c3.

📒 Files selected for processing (8)
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/config_test.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition.go`:
- Around line 334-343: The GenerateSchemaFilePath function allows
useTableIDAsPath to produce a "0" path when tableID is zero; add a guard at the
top of GenerateSchemaFilePath to validate tableID (e.g., tableID > 0) when
useTableIDAsPath is true and return a clear error if invalid, before calling
generateTablePath or computing checksum, so callers fail fast instead of writing
into an ambiguous "0" directory; reference: GenerateSchemaFilePath,
useTableIDAsPath, tableID, and generateTablePath.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 402c67b5-c910-49f1-b230-ddc3b6062f96

📥 Commits

Reviewing files that changed from the base of the PR and between 74c09c3 and 9326b6d.

📒 Files selected for processing (2)
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/sink/cloudstorage/table_definition.go

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition_test.go`:
- Around line 524-529: The test assertions in table_definition_test.go expect
the old error text "tableID must be positive" but GenerateSchemaFilePath now
returns "invalid table id for table-id path"; update the two assertions that
call GenerateSchemaFilePath (the cases with tableID 0 and -1) to assert that
err.Error() contains "invalid table id for table-id path" instead of the old
string so the test matches the current implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 213ab268-deb6-4709-a28f-0f9b8b979e5f

📥 Commits

Reviewing files that changed from the base of the PR and between 7296adb and 1e5ab99.

📒 Files selected for processing (2)
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 5, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/sink/cloudstorage/table_definition_test.go (1)

522-529: Add one test that documents intended behavior when tableID is unset.

These assertions currently codify hard-fail on tableID == 0/-1. Please add a companion case that captures the intended production behavior for missing IDs (fallback vs fail-fast), so the contract stays explicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/table_definition_test.go` around lines 522 - 529, Add a
companion unit test for the missing/unset tableID case that calls
def.GenerateSchemaFilePath(false, 0) (i.e., use-table-id-as-path = false with
tableID == 0) and assert the function returns no error and a schema path that
uses the table name/default fallback rather than failing; this makes the
contract explicit alongside the existing failure assertions for
GenerateSchemaFilePath(true, 0) and GenerateSchemaFilePath(true, -1).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/table_definition.go`:
- Around line 355-360: The check that hard-fails when tableID <= 0 inside the
useTableIDAsPath branch should be changed to gracefully fall back to using the
table name path instead of returning
errors.ErrInternalCheckFailed.GenWithStackByArgs; update the logic around the
useTableIDAsPath && t.Table branch (referencing useTableIDAsPath, t.Table,
tableID, and the current error return) so that if tableID is unset/<=0 you skip
the error and proceed to generate a path based on t.Table (or the existing
name-based path logic) rather than aborting.

---

Nitpick comments:
In `@pkg/sink/cloudstorage/table_definition_test.go`:
- Around line 522-529: Add a companion unit test for the missing/unset tableID
case that calls def.GenerateSchemaFilePath(false, 0) (i.e., use-table-id-as-path
= false with tableID == 0) and assert the function returns no error and a schema
path that uses the table name/default fallback rather than failing; this makes
the contract explicit alongside the existing failure assertions for
GenerateSchemaFilePath(true, 0) and GenerateSchemaFilePath(true, -1).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 77b845f5-35c1-4489-aa13-e29feadc64c4

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5ab99 and 3302635.

📒 Files selected for processing (3)
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

@tenfyzhong
Copy link
Collaborator

/check-issue-triage-complete

@yinshuangfei yinshuangfei changed the title sink(cloudstorage): add use-table-id-as-path option DNM: sink(cloudstorage): add use-table-id-as-path option Mar 5, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Mar 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yinshuangfei
Copy link
Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yinshuangfei yinshuangfei changed the title DNM: sink(cloudstorage): add use-table-id-as-path option sink(cloudstorage): add use-table-id-as-path option Mar 6, 2026
@yinshuangfei
Copy link
Author

@coderabbitai resume

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/cloudstorage/path.go (1)

398-418: ⚠️ Potential issue | 🟠 Major

Avoid duplicating the physical table ID in partition paths.

With both UseTableIDAsPath and EnablePartitionSeparator enabled, Line 400-403 and Line 417 both append tbl.TableNameWithPhysicTableID.TableID. For partitioned tables this generates <physicalTableID>/<version>/<physicalTableID>/..., so the extra segment stops carrying distinct partition information and changes the path layout unexpectedly.

💡 One safe direction if physical table ID is already the top-level key
-	if f.config.EnablePartitionSeparator && tbl.TableNameWithPhysicTableID.IsPartition {
+	if f.config.EnablePartitionSeparator &&
+		tbl.TableNameWithPhysicTableID.IsPartition &&
+		!f.config.UseTableIDAsPath {
 		elems = append(elems, fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path.go` around lines 398 - 418, The code appends the
physical TableID twice when both f.config.UseTableIDAsPath and
f.config.EnablePartitionSeparator are true; update the partition-separator block
so it only appends the extra TableID when it is not already included by
generateTablePath. Concretely, in the branch that currently appends
fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID) under
f.config.EnablePartitionSeparator, add a guard to skip this append when
f.config.UseTableIDAsPath is true (or when generateTablePath was called with the
TableID-included flag), referencing f.config.UseTableIDAsPath,
f.config.EnablePartitionSeparator, tbl.TableNameWithPhysicTableID.TableID,
generateTablePath, and f.versionMap[tbl].
🧹 Nitpick comments (1)
pkg/sink/cloudstorage/path.go (1)

218-232: Remove the second invalid-table-ID check.

Line 218 already routes through def.GenerateSchemaFilePath(...), which is the canonical validation point for use-table-id-as-path. The extra branch at Line 230-232 is dead on this path and duplicates the error literal, so the two sites can drift again.

♻️ Suggested cleanup
 	tblSchemaFile, err := def.GenerateSchemaFilePath(f.config.UseTableIDAsPath, table.TableNameWithPhysicTableID.TableID)
 	if err != nil {
 		return false, err
 	}
 	exist, err := f.storage.FileExists(ctx, tblSchemaFile)
 	if err != nil {
 		return false, err
 	}
 	if exist {
 		f.versionMap[table] = table.TableInfoVersion
 		return false, nil
 	}
-	if f.config.UseTableIDAsPath && table.TableNameWithPhysicTableID.TableID <= 0 {
-		return false, errors.ErrInternalCheckFailed.GenWithStackByArgs("invalid table id for table-id path")
-	}

Based on learnings: GenerateSchemaFilePath(useTableIDAsPath bool, tableID int64) intentionally returns ErrInternalCheckFailed when tableID <= 0 and useTableIDAsPath=true.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path.go` around lines 218 - 232, The code duplicates
validation for table IDs: GenerateSchemaFilePath(f.config.UseTableIDAsPath,
table.TableNameWithPhysicTableID.TableID) already returns ErrInternalCheckFailed
when UseTableIDAsPath is true and tableID <= 0, so remove the redundant
conditional that checks f.config.UseTableIDAsPath &&
table.TableNameWithPhysicTableID.TableID <= 0 (the branch after exist check that
returns ErrInternalCheckFailed) to avoid dead/duplicated validation; keep the
call to def.GenerateSchemaFilePath, the FileExists check (f.storage.FileExists),
and the versionMap assignment (f.versionMap[table] = table.TableInfoVersion)
untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@downstreamadapter/sink/cloudstorage/sink.go`:
- Around line 227-234: The early-return in sink.writeFile incorrectly uses
!def.IsTableSchema(), causing invalid table-level DDLs to be silently skipped in
UseTableIDAsPath mode; change the condition to explicitly check for a true
database-level DDL by using def.Table == "" (and keep the surrounding
cfg.UseTableIDAsPath check) so only genuine schema-level events are skipped and
invalid table defs will proceed to GenerateSchemaFilePath and surface validation
errors; reference symbols: sink.writeFile, cfg.UseTableIDAsPath, def.Table,
def.IsTableSchema(), GenerateSchemaFilePath.

In `@pkg/config/sink.go`:
- Around line 954-965: The compatibility check must include the new
UseTableIDAsPath flag so sink-URI-only changes are detected: update
CheckCompatibilityWithSinkURI to parse UseTableIDAsPath from params (as done in
the shown block) and compare it against the current s.UseTableIDAsPath value; if
they differ, populate cfgInSinkURI and cfgInFile for UseTableIDAsPath (similar
to Protocol and TxnAtomicity) and do not let the function return early ignoring
this difference. Ensure you reference the same symbols (UseTableIDAsPath,
CheckCompatibilityWithSinkURI, cfgInSinkURI, cfgInFile, params.Get) when adding
this comparison so URI-driven toggles are correctly treated as
compatibility-breaking.

---

Outside diff comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 398-418: The code appends the physical TableID twice when both
f.config.UseTableIDAsPath and f.config.EnablePartitionSeparator are true; update
the partition-separator block so it only appends the extra TableID when it is
not already included by generateTablePath. Concretely, in the branch that
currently appends fmt.Sprintf("%d", tbl.TableNameWithPhysicTableID.TableID)
under f.config.EnablePartitionSeparator, add a guard to skip this append when
f.config.UseTableIDAsPath is true (or when generateTablePath was called with the
TableID-included flag), referencing f.config.UseTableIDAsPath,
f.config.EnablePartitionSeparator, tbl.TableNameWithPhysicTableID.TableID,
generateTablePath, and f.versionMap[tbl].

---

Nitpick comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 218-232: The code duplicates validation for table IDs:
GenerateSchemaFilePath(f.config.UseTableIDAsPath,
table.TableNameWithPhysicTableID.TableID) already returns ErrInternalCheckFailed
when UseTableIDAsPath is true and tableID <= 0, so remove the redundant
conditional that checks f.config.UseTableIDAsPath &&
table.TableNameWithPhysicTableID.TableID <= 0 (the branch after exist check that
returns ErrInternalCheckFailed) to avoid dead/duplicated validation; keep the
call to def.GenerateSchemaFilePath, the FileExists check (f.storage.FileExists),
and the versionMap assignment (f.versionMap[table] = table.TableInfoVersion)
untouched.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a51876cc-08fb-4742-acf1-cc189eb723ed

📥 Commits

Reviewing files that changed from the base of the PR and between 3302635 and 2cc7aa9.

📒 Files selected for processing (11)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/config/changefeed.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/sink/cloudstorage/path_test.go

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 6, 2026

✅ Actions performed

Reviews resumed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/config/sink.go (1)

954-965: ⚠️ Potential issue | 🟠 Major

Add UseTableIDAsPath to the compatibility delta check.

This block makes use-table-id-as-path URI-governed, but CheckCompatibilityWithSinkURI still only keys off Protocol and TxnAtomicity. A changefeed update that only toggles UseTableIDAsPath can therefore return early and miss a real URI/config conflict.

🐛 Proposed fix
-	cfgParamsChanged := s.Protocol != oldSinkConfig.Protocol ||
-		s.TxnAtomicity != oldSinkConfig.TxnAtomicity
+	cfgParamsChanged := s.Protocol != oldSinkConfig.Protocol ||
+		s.TxnAtomicity != oldSinkConfig.TxnAtomicity ||
+		s.UseTableIDAsPath != oldSinkConfig.UseTableIDAsPath

Also applies to: 997-1014

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/sink.go` around lines 954 - 965, The change makes UseTableIDAsPath
(s.UseTableIDAsPath) driven from the sink URI but CheckCompatibilityWithSinkURI
still ignores that key, so updates toggling UseTableIDAsPath can bypass
compatibility checks; update CheckCompatibilityWithSinkURI to include
UseTableIDAsPath in the compatibility delta comparison (same style as
Protocol/TxnAtomicity), using the same config key string (UseTableIDAsPathKey)
and comparing util.GetOrZero(s.UseTableIDAsPath) with the parsed/enabled value
from cfgInSinkURI/cfgInFile, and ensure cfgInSinkURI and cfgInFile are populated
consistently where UseTableIDAsPath is parsed so the function can detect and
return conflicts correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 127-131: The generateTablePath function currently returns any
tableID as a string which allows useTableIDAsPath=true to produce "0/..." paths;
change the logic to reject non-positive IDs the same way GenerateSchemaFilePath
does: when useTableIDAsPath is true and tableID <= 0 return an error
(ErrInternalCheckFailed) instead of silently returning "0", and update callers
that build data/index directories (the code paths referenced around the
data/index generation using generateTablePath) to handle/propagate that error so
data/index path construction fails early rather than creating mixed ID/name
paths.

---

Duplicate comments:
In `@pkg/config/sink.go`:
- Around line 954-965: The change makes UseTableIDAsPath (s.UseTableIDAsPath)
driven from the sink URI but CheckCompatibilityWithSinkURI still ignores that
key, so updates toggling UseTableIDAsPath can bypass compatibility checks;
update CheckCompatibilityWithSinkURI to include UseTableIDAsPath in the
compatibility delta comparison (same style as Protocol/TxnAtomicity), using the
same config key string (UseTableIDAsPathKey) and comparing
util.GetOrZero(s.UseTableIDAsPath) with the parsed/enabled value from
cfgInSinkURI/cfgInFile, and ensure cfgInSinkURI and cfgInFile are populated
consistently where UseTableIDAsPath is parsed so the function can detect and
return conflicts correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4a1808c-9d41-4b3a-904b-92ae604bef64

📥 Commits

Reviewing files that changed from the base of the PR and between 3302635 and 2cc7aa9.

📒 Files selected for processing (11)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/sink_test.go
  • pkg/config/changefeed.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
  • pkg/sink/cloudstorage/table_definition_test.go

Comment on lines +137 to +139
if urlParameter.UseTableIDAsPath != nil {
c.UseTableIDAsPath = *urlParameter.UseTableIDAsPath
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if urlParameter.UseTableIDAsPath != nil {
c.UseTableIDAsPath = *urlParameter.UseTableIDAsPath
}
c.UseTableIDAsPath = util.GetOrZero(urlParameter.UseTableIDAsPath)

Use the util.GetOrZero function to keep the code style

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/sink/cloudstorage/path_test.go (1)

141-166: Add the partitioned-table case here too.

This only locks in the plain-table path shape. The branch most likely to regress is UseTableIDAsPath=true with EnablePartitionSeparator=true and IsPartition=true, where the partition ID must stay omitted so we don't reintroduce tableID/.../<partitionID>/... duplicates.

💡 Suggested test extension
 func TestGenerateDataFilePathWithTableIDAsPath(t *testing.T) {
 	t.Parallel()

 	ctx, cancel := context.WithCancel(context.TODO())
 	defer cancel()
@@
 	date := f.GenerateDateStr()
 	path, err := f.GenerateDataFilePath(ctx, table, date)
 	require.NoError(t, err)
 	require.Equal(t, fmt.Sprintf("12345/5/CDC_%s_000001.json", table.DispatcherID.String()), path)
+
+	partitioned := table
+	partitioned.TableNameWithPhysicTableID.IsPartition = true
+	f = testFilePathGenerator(ctx, t, dir)
+	f.config.UseTableIDAsPath = true
+	f.config.EnablePartitionSeparator = true
+	f.versionMap[partitioned] = partitioned.TableInfoVersion
+
+	path, err = f.GenerateDataFilePath(ctx, partitioned, date)
+	require.NoError(t, err)
+	require.Equal(t, fmt.Sprintf("12345/5/CDC_%s_000001.json", partitioned.DispatcherID.String()), path)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sink/cloudstorage/path_test.go` around lines 141 - 166, Extend
TestGenerateDataFilePathWithTableIDAsPath to also cover the partitioned-table
case: create a VersionedTableName with IsPartition=true and a non-empty
PartitionID, set f.config.UseTableIDAsPath=true and
f.config.EnablePartitionSeparator=true, populate f.versionMap for that table
version, call f.GenerateDataFilePath(ctx, table, date) and assert no error and
that the resulting path omits the partition ID (matches
"12345/5/CDC_<dispatcher>_000001.json"). This ensures GenerateDataFilePath
respects UseTableIDAsPath + EnablePartitionSeparator when IsPartition is true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/sink/cloudstorage/path.go`:
- Around line 405-440: generateDataDirPath currently reads f.versionMap[tbl]
directly which yields a zero value (0) when the map hasn't been primed; change
FilePathGenerator.generateDataDirPath to check for presence using the comma-ok
idiom (e.g., v, ok := f.versionMap[tbl]) and return a descriptive error if !ok
(mentioning tbl or that CheckOrWriteSchema must run) so callers of
GenerateDataFilePath / GenerateIndexFilePath fail fast rather than producing a
"0" path segment; keep the rest of the path-building logic unchanged and
reference CheckOrWriteSchema in the error message as the priming step.

---

Nitpick comments:
In `@pkg/sink/cloudstorage/path_test.go`:
- Around line 141-166: Extend TestGenerateDataFilePathWithTableIDAsPath to also
cover the partitioned-table case: create a VersionedTableName with
IsPartition=true and a non-empty PartitionID, set f.config.UseTableIDAsPath=true
and f.config.EnablePartitionSeparator=true, populate f.versionMap for that table
version, call f.GenerateDataFilePath(ctx, table, date) and assert no error and
that the resulting path omits the partition ID (matches
"12345/5/CDC_<dispatcher>_000001.json"). This ensures GenerateDataFilePath
respects UseTableIDAsPath + EnablePartitionSeparator when IsPartition is true.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 769312cb-ecc2-4a88-8268-0dc98f576431

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc7aa9 and 1e51de0.

📒 Files selected for processing (7)
  • downstreamadapter/sink/cloudstorage/sink.go
  • downstreamadapter/sink/cloudstorage/writer.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
  • pkg/sink/cloudstorage/path.go
  • pkg/sink/cloudstorage/path_test.go
  • pkg/sink/cloudstorage/table_definition.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/config/sink.go

api/v2/model.go Outdated
SafeMode: c.Sink.SafeMode,
OpenProtocol: openProtocolConfig,
Debezium: debeziumConfig,
UseTableIDAsPath: c.Sink.UseTableIDAsPath,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be added to the cloud storage config

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been moved to cloud storage config.

def.Query = event.Query
def.Type = event.Type
if err := s.writeFile(event, def); err != nil {
if err := s.writeFile(event, def, event.GetTableID()); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the event has been a param, I think event.GetTableID() is not required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized.

func (s *sink) writeFile(v *commonEvent.DDLEvent, def cloudstorage.TableDefinition, tableID int64) error {
// skip write database-level event for 'use-table-id-as-path' mode
if s.cfg.UseTableIDAsPath && def.Table == "" {
log.Debug("skip database schema for table id path",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will lead to the database level DDL not being replicated to the downstream

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a breaking change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this design is:

  • Database events are not processed by TCI;
  • Retaining database events would create a database directory and schema files, increasing the workload of subsequent GC S3 directory creation.

Note: The 'use-table-id-as-path' configuration option ONLY applies to TICI.

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 10, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
api/v2/model_test.go (1)

69-69: Consider adding test coverage for false and nil cases.

The test only verifies the true case. For comprehensive coverage, consider adding assertions for:

  1. UseTableIDAsPath = false (explicit false)
  2. UseTableIDAsPath = nil (unset, should preserve nil or default)

This would help ensure the conversion logic handles all states correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v2/model_test.go` at line 69, Add two more assertions mirroring the
existing true-case test: create one subtest where
Sink.CloudStorageConfig.UseTableIDAsPath is explicitly set to a pointer to false
and assert util.GetOrZero(internalCfg.Sink.CloudStorageConfig.UseTableIDAsPath)
is false; create another subtest where Sink.CloudStorageConfig.UseTableIDAsPath
is left nil (unset), run the same conversion/initialization used in the current
test, and assert the resulting
internalCfg.Sink.CloudStorageConfig.UseTableIDAsPath remains nil (i.e., the
nil/unset state is preserved rather than coerced to true), referencing the same
internalCfg.Sink.CloudStorageConfig.UseTableIDAsPath and util.GetOrZero symbols
used in the existing test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/config/sink.go`:
- Around line 1024-1026: The comparison currently checks pointer addresses via
getUseTableIDAsPath(s) != getUseTableIDAsPath(oldSinkConfig); change it to
compare the underlying boolean values (handling nil) so identical boolean
settings on different pointer instances don't appear as changed. Replace that
part of the cfgParamsChanged expression to compare normalized bool values (e.g.,
boolOrFalse(getUseTableIDAsPath(s)) !=
boolOrFalse(getUseTableIDAsPath(oldSinkConfig))) or modify getUseTableIDAsPath
to return a concrete bool and use it directly; keep references to s,
oldSinkConfig, getUseTableIDAsPath and cfgParamsChanged when making the change.

---

Nitpick comments:
In `@api/v2/model_test.go`:
- Line 69: Add two more assertions mirroring the existing true-case test: create
one subtest where Sink.CloudStorageConfig.UseTableIDAsPath is explicitly set to
a pointer to false and assert
util.GetOrZero(internalCfg.Sink.CloudStorageConfig.UseTableIDAsPath) is false;
create another subtest where Sink.CloudStorageConfig.UseTableIDAsPath is left
nil (unset), run the same conversion/initialization used in the current test,
and assert the resulting internalCfg.Sink.CloudStorageConfig.UseTableIDAsPath
remains nil (i.e., the nil/unset state is preserved rather than coerced to
true), referencing the same internalCfg.Sink.CloudStorageConfig.UseTableIDAsPath
and util.GetOrZero symbols used in the existing test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ecb6828-eb97-4387-a848-c0793c558a87

📥 Commits

Reviewing files that changed from the base of the PR and between 095fb1b and 1cc3c85.

📒 Files selected for processing (5)
  • api/v2/model.go
  • api/v2/model_test.go
  • downstreamadapter/sink/cloudstorage/sink.go
  • pkg/config/sink.go
  • pkg/sink/cloudstorage/config.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/sink/cloudstorage/config.go
  • api/v2/model.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use table_id instead of table_name as cdc path.

3 participants