From fd4dd4c8fecaef4a40b45130fcb23fee822e2c71 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Wed, 18 Feb 2026 17:13:29 -0500 Subject: [PATCH 1/6] MOLT Verify transformations and filter rules --- src/current/molt/molt-verify.md | 135 ++++++++++++++++++++++++++++++-- 1 file changed, 129 insertions(+), 6 deletions(-) diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index 185cfb8773b..89e93fbe509 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -23,7 +23,7 @@ For a demo of MOLT Verify, watch the following video: ## Supported databases -The following source databases are currently supported: +The following source databases are supported: - PostgreSQL 12-16 - MySQL 5.7, 8.0 and later @@ -66,11 +66,13 @@ Flag | Description `--source` | (Required) Connection string for the source database. `--target` | (Required) Connection string for the target database. `--concurrency` | Number of threads to process at a time when reading the tables.
**Default:** 16
For faster verification, set this flag to a higher value. {% comment %}
Note: Table splitting by shard only works for [`INT`]({% link {{site.current_cloud_version}}/int.md %}), [`UUID`]({% link {{site.current_cloud_version}}/uuid.md %}), and [`FLOAT`]({% link {{site.current_cloud_version}}/float.md %}) data types.{% endcomment %} +`--filter-path` | Path to a JSON file that defines filter rules to verify only a subset of data in specified tables. Refer to [Verify a subset of data](#verify-a-subset-of-data). `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `verify-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. -`--metrics-listen-addr` | Address of the metrics endpoint, which has the path `{address}/metrics`.

**Default:** `'127.0.0.1:3030'` | +`--metrics-listen-addr` | Address of the metrics endpoint, which has the path `{address}/metrics`.

**Default:** `'127.0.0.1:3030'` `--row-batch-size` | Number of rows to get from a table at a time.
**Default:** 20000 `--schema-filter` | Verify schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` `--table-filter` | Verify tables that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` +`--transformations-file` | Path to a JSON file that defines transformation rules applied during comparison to verify data that was transformed during [fetch]({% link molt/molt-fetch.md %}#transformations). Use the same transformation file from `molt fetch`. Refer to [Verify transformed data](#verify-transformed-data). ## Usage @@ -109,6 +111,128 @@ When verification completes, the output displays a summary message like the foll - `num_success` is the number of rows that matched. - `num_conditional_success` is the number of rows that matched while having a column mismatch due to a type difference. This value indicates that all other columns that could be compared have matched successfully. You should manually review the warnings and errors in the output to determine whether the column mismatches can be ignored. +### Verify a subset of data + +You can write filter rules to have `molt verify` compare only a subset of rows in specified tables. This allows you to verify specific data ranges or conditions without processing entire tables. + +Filter rules apply `WHERE` clauses to specified tables during verification. Columns referenced in filter expressions **must** be indexed. + +{{site.data.alerts.callout_info}} +Only PostgreSQL and MySQL sources are supported for selective data verification. +{{site.data.alerts.end}} + +#### Step 1. Create a filter rules file + +Create a JSON file that defines the filter rules. The following example defines filter rules on two tables, `public.filtertbl` and `public.filtertbl2`: + +~~~ json +{ + "filters": [ + { + "resource_specifier": { + "schema": "public", + "table": "filtertbl" + }, + "expr": "x < 10" + }, + { + "resource_specifier": { + "schema": "public", + "table": "filtertbl2" + }, + "source_expr": "id BETWEEN 5 AND 15", + "target_expr": "15 > id > 5" + } + ] +} +~~~ + +- `resource_specifier`: Identifies which schemas and tables to filter. Schema and table names are case-insensitive. 
+ - `schema`: Schema name containing the table. + - `table`: Table name to apply the filter to. +- `expr`: SQL expression that applies to both source and target databases. The expression must be valid for both database dialects. +- `source_expr` and `target_expr`: SQL expressions that apply to the source and target databases, respectively. These must be defined together, and cannot be used with `expr`. + +#### Step 2. Run `molt verify` with the filter file + +Use the `--filter-path` flag to specify the filter rules file: + +{% include_cached copy-clipboard.html %} +~~~ shell +molt verify \ + --source 'postgres://user:password@localhost/molt' \ + --target 'postgres://root@localhost:26257/molt?sslmode=disable' \ + --filter-path='./filter-rules.json' +~~~ + +When verification completes, the output displays a summary showing the number of rows verified in each filtered table: + +~~~ json +{"level":"info","message":"starting verify on public.filtertbl, shard 1/1"} +{"level":"info","type":"summary","table_schema":"public","table_name":"filtertbl","num_truth_rows":5,"num_success":5,"num_conditional_success":0,"num_missing":0,"num_mismatch":0,"num_extraneous":0,"num_column_mismatch":0,"message":"finished row verification on public.filtertbl (shard 1/1)"} +~~~ + +### Verify transformed data + +If you applied [transformations during `molt fetch`]({% link molt/molt-fetch.md %}#transformations), you can apply the same transformations with MOLT Verify to match source data with the transformed target data. + +{{site.data.alerts.callout_info}} +Only table and schema renames are supported. +{{site.data.alerts.end}} + +#### Step 1. Create a transformation file + +Create a JSON file that defines the transformation rules. MOLT Verify applies these transformations during comparison only and does not modify the source database. + +The following example assumes that MOLT Fetch renamed table `t` to `t2` and schema `public` to `public2`. 
The same transformation rule is applied during verification: + +~~~ json +{ + "transforms": [ + { + "id": 1, + "resource_specifier": { + "schema": "public", + "table": "t" + }, + "table_rename_opts": { + "value": "t2" + }, + "schema_rename_opts": { + "value": "public2" + } + } + ] +} +~~~ + +- `resource_specifier`: Identifies which schemas and tables to transform. Schema and table names are case-insensitive. + - `schema`: Schema name containing the table. + - `table`: Table name to transform. +- `table_rename_opts`: Rename the table on the target database. + - `value`: The target table name to compare against. +- `schema_rename_opts`: Rename the schema on the target database. + - `value`: The target schema name to compare against. + +#### Step 2. Run `molt verify` with the transformation file + +Use the `--transformations-file` flag to specify the transformation file: + +{% include_cached copy-clipboard.html %} +~~~ shell +molt verify \ + --source 'postgres://user:password@localhost/molt' \ + --target 'postgres://root@localhost:26257/molt?sslmode=disable' \ + --transformations-file 'transformation-rules.json' +~~~ + +When verification completes, the output displays a summary: + +~~~ json +{"level":"info","message":"starting verify on public.t, shard 1/1"} +{"level":"info","type":"summary","table_schema":"public","table_name":"t","num_truth_rows":10,"num_success":10,"num_conditional_success":0,"num_missing":0,"num_mismatch":0,"num_extraneous":0,"num_column_mismatch":0,"message":"finished row verification on public.t (shard 1/1)"} +~~~ + ## Docker usage {% include molt/molt-docker.md %} @@ -116,13 +240,12 @@ When verification completes, the output displays a summary message like the foll ## Known limitations - MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. To configure the row batch size, use the `--row_batch_size` [flag](#flags). 
+- MOLT Verify only supports comparing one MySQL database to a whole CockroachDB schema (which is assumed to be `public`). - MOLT Verify checks for collation mismatches on [primary key]({% link {{site.current_cloud_version}}/primary-key.md %}) columns. This may cause validation to fail when a [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) is used as a primary key and the source and target databases are using different [collations]({% link {{site.current_cloud_version}}/collate.md %}). - MOLT Verify might give an error in case of schema changes on either the source or target database. - [Geospatial types]({% link {{site.current_cloud_version}}/spatial-data-overview.md %}#spatial-objects) cannot yet be compared. - -The following limitation is specific to MySQL: - -- MOLT Verify only supports comparing one MySQL database to a whole CockroachDB schema (which is assumed to be `public`). +- Only PostgreSQL and MySQL sources are supported for [verifying a subset of data](#verify-a-subset-of-data). +- Only table and schema renames are supported when [verifying transformed data](#verify-transformed-data). ## See also From 849a2efc91ca38a74344c6aada60f800a1ea27f4 Mon Sep 17 00:00:00 2001 From: Ryan Kuo <8740013+taroface@users.noreply.github.com> Date: Tue, 24 Feb 2026 15:04:45 -0500 Subject: [PATCH 2/6] Apply suggestions from code review Co-authored-by: bsanchez-the-roach --- src/current/molt/molt-verify.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index 89e93fbe509..e4dbaca0838 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -118,7 +118,7 @@ You can write filter rules to have `molt verify` compare only a subset of rows i Filter rules apply `WHERE` clauses to specified tables during verification. Columns referenced in filter expressions **must** be indexed. 
{{site.data.alerts.callout_info}} -Only PostgreSQL and MySQL sources are supported for selective data verification. +Selective data verification is only supported for PostgreSQL and MySQL sources. {{site.data.alerts.end}} #### Step 1. Create a filter rules file @@ -148,8 +148,8 @@ Create a JSON file that defines the filter rules. The following example defines ~~~ - `resource_specifier`: Identifies which schemas and tables to filter. Schema and table names are case-insensitive. - - `schema`: Schema name containing the table. - - `table`: Table name to apply the filter to. + - `schema`: Name of the schema containing the table. + - `table`: Name of the table to apply the filter to. - `expr`: SQL expression that applies to both source and target databases. The expression must be valid for both database dialects. - `source_expr` and `target_expr`: SQL expressions that apply to the source and target databases, respectively. These must be defined together, and cannot be used with `expr`. @@ -207,7 +207,7 @@ The following example assumes that MOLT Fetch renamed table `t` to `t2` and sche ~~~ - `resource_specifier`: Identifies which schemas and tables to transform. Schema and table names are case-insensitive. - - `schema`: Schema name containing the table. + - `schema`: Name of the schema containing the table. - `table`: Table name to transform. - `table_rename_opts`: Rename the table on the target database. - `value`: The target table name to compare against. 
From e3569b7173a8ad6b1445ce582498bbf7ffca4a74 Mon Sep 17 00:00:00 2001 From: Ryan Kuo <8740013+taroface@users.noreply.github.com> Date: Tue, 24 Feb 2026 15:10:25 -0500 Subject: [PATCH 3/6] Apply suggestion from @bsanchez-the-roach Co-authored-by: bsanchez-the-roach --- src/current/molt/molt-verify.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index e4dbaca0838..645b8618ad6 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -72,7 +72,7 @@ Flag | Description `--row-batch-size` | Number of rows to get from a table at a time.
**Default:** 20000 `--schema-filter` | Verify schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` `--table-filter` | Verify tables that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` -`--transformations-file` | Path to a JSON file that defines transformation rules applied during comparison to verify data that was transformed during [fetch]({% link molt/molt-fetch.md %}#transformations). Use the same transformation file from `molt fetch`. Refer to [Verify transformed data](#verify-transformed-data). +`--transformations-file` | Path to a JSON file that defines transformation rules to be applied during comparison. If verifying data that was [transformed during a bulk load with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), use the same transformation file from that `molt fetch` run. Refer to [Verify transformed data](#verify-transformed-data). ## Usage From 23d3c32390bec13b9ba1d8cc8ece27e5bdf342ec Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Tue, 24 Feb 2026 15:44:20 -0500 Subject: [PATCH 4/6] address reviewer comments --- src/current/molt/molt-verify.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index 645b8618ad6..dfa47ff2bb5 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -72,7 +72,7 @@ Flag | Description `--row-batch-size` | Number of rows to get from a table at a time.
**Default:** 20000 `--schema-filter` | Verify schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` `--table-filter` | Verify tables that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` -`--transformations-file` | Path to a JSON file that defines transformation rules to be applied during comparison. If verifying data that was [transformed during a bulk load with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), use the same transformation file from that `molt fetch` run. Refer to [Verify transformed data](#verify-transformed-data). +`--transformations-file` | Path to a JSON file that defines transformation rules to be applied during comparison. If verifying data that was [transformed during a bulk load with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), use the same transformations from that `molt fetch` run. Refer to [Verify transformed data](#verify-transformed-data). ## Usage @@ -96,6 +96,10 @@ molt verify \ --target 'postgresql://{username}:{password}@{host}:{port}/{database}?sslmode=verify-full' ~~~ +{{site.data.alerts.callout_info}} +MySQL tables belong directly to the database, not to a separate schema. MOLT Verify compares MySQL databases with the CockroachDB `public` schema. +{{site.data.alerts.end}} + Use the optional [flags](#flags) to customize the verification results. When verification completes, the output displays a summary message like the following: @@ -115,7 +119,7 @@ When verification completes, the output displays a summary message like the foll You can write filter rules to have `molt verify` compare only a subset of rows in specified tables. This allows you to verify specific data ranges or conditions without processing entire tables. -Filter rules apply `WHERE` clauses to specified tables during verification. Columns referenced in filter expressions **must** be indexed. +Filter rules apply `WHERE` clauses to specified source tables during verification. Columns referenced in filter expressions **must** be indexed. {{site.data.alerts.callout_info}} Selective data verification is only supported for PostgreSQL and MySQL sources. 
@@ -123,7 +127,7 @@ Selective data verification is only supported for PostgreSQL and MySQL sources. #### Step 1. Create a filter rules file -Create a JSON file that defines the filter rules. The following example defines filter rules on two tables, `public.filtertbl` and `public.filtertbl2`: +Create a JSON file that defines the filter rules. The following example defines filter rules on two source tables, `public.filtertbl` and `public.filtertbl2`: ~~~ json { @@ -150,7 +154,7 @@ Create a JSON file that defines the filter rules. The following example defines - `resource_specifier`: Identifies which schemas and tables to filter. Schema and table names are case-insensitive. - `schema`: Name of the schema containing the table. - `table`: Name of the table to apply the filter to. -- `expr`: SQL expression that applies to both source and target databases. The expression must be valid for both database dialects. +- `expr`: SQL expression that applies to both source and target databases. The expression must be valid for both the source and target dialect. - `source_expr` and `target_expr`: SQL expressions that apply to the source and target databases, respectively. These must be defined together, and cannot be used with `expr`. #### Step 2. Run `molt verify` with the filter file @@ -182,9 +186,9 @@ Only table and schema renames are supported. #### Step 1. Create a transformation file -Create a JSON file that defines the transformation rules. MOLT Verify applies these transformations during comparison only and does not modify the source database. +Create a JSON file that defines the transformation rules. Each rule can rename a source schema, table, or both. MOLT Verify applies these transformations during comparison only and does not modify the source database. -The following example assumes that MOLT Fetch renamed table `t` to `t2` and schema `public` to `public2`. 
The same transformation rule is applied during verification: +The following example assumes that MOLT Fetch renamed source table `t` to `t2` on the target, and source schema `public` to `public2` on the target. The same transformation rule is applied during verification: ~~~ json { @@ -206,12 +210,12 @@ The following example assumes that MOLT Fetch renamed table `t` to `t2` and sche } ~~~ -- `resource_specifier`: Identifies which schemas and tables to transform. Schema and table names are case-insensitive. - - `schema`: Name of the schema containing the table. - - `table`: Table name to transform. -- `table_rename_opts`: Rename the table on the target database. +- `resource_specifier`: Identifies which source schemas and tables to transform. Schema and table names are case-insensitive. + - `schema`: Name of the source schema containing the table. + - `table`: Source table name to transform. +- `table_rename_opts`: Rename the source table on the target database. - `value`: The target table name to compare against. -- `schema_rename_opts`: Rename the schema on the target database. +- `schema_rename_opts`: Rename the source schema on the target database. - `value`: The target schema name to compare against. #### Step 2. Run `molt verify` with the transformation file @@ -240,7 +244,6 @@ When verification completes, the output displays a summary: ## Known limitations - MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. To configure the row batch size, use the `--row_batch_size` [flag](#flags). -- MOLT Verify only supports comparing one MySQL database to a whole CockroachDB schema (which is assumed to be `public`). - MOLT Verify checks for collation mismatches on [primary key]({% link {{site.current_cloud_version}}/primary-key.md %}) columns. 
This may cause validation to fail when a [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) is used as a primary key and the source and target databases are using different [collations]({% link {{site.current_cloud_version}}/collate.md %}). - MOLT Verify might give an error in case of schema changes on either the source or target database. - [Geospatial types]({% link {{site.current_cloud_version}}/spatial-data-overview.md %}#spatial-objects) cannot yet be compared. From 25ef172af3ca64e0e973bf29e3181d1651c1cd4c Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Wed, 25 Feb 2026 12:56:10 -0500 Subject: [PATCH 5/6] clarify usage --- src/current/molt/molt-verify.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index dfa47ff2bb5..2b9ba7ca36c 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -72,7 +72,7 @@ Flag | Description `--row-batch-size` | Number of rows to get from a table at a time.
**Default:** 20000 `--schema-filter` | Verify schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` `--table-filter` | Verify tables that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` -`--transformations-file` | Path to a JSON file that defines transformation rules to be applied during comparison. If verifying data that was [transformed during a bulk load with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), use the same transformations from that `molt fetch` run. Refer to [Verify transformed data](#verify-transformed-data). +`--transformations-file` | Path to a JSON file that defines transformation rules to be applied during comparison. Refer to [Verify transformed data](#verify-transformed-data). ## Usage @@ -178,7 +178,7 @@ When verification completes, the output displays a summary showing the number of ### Verify transformed data -If you applied [transformations during `molt fetch`]({% link molt/molt-fetch.md %}#transformations), you can apply the same transformations with MOLT Verify to match source data with the transformed target data. +If you applied [transformations with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), a [MOLT Replicator userscript]({% link molt/userscript-cookbook.md %}#rename-tables), or another tool, you can apply the same transformations with MOLT Verify to match source data with the transformed target data. {{site.data.alerts.callout_info}} Only table and schema renames are supported. @@ -188,7 +188,7 @@ Only table and schema renames are supported. Create a JSON file that defines the transformation rules. Each rule can rename a source schema, table, or both. MOLT Verify applies these transformations during comparison only and does not modify the source database. -The following example assumes that MOLT Fetch renamed source table `t` to `t2` on the target, and source schema `public` to `public2` on the target. The same transformation rule is applied during verification: +The following example assumes that another process renamed source table `t` to `t2` on the target, and source schema `public` to `public2` on the target. 
The same transformation rule is applied during verification: ~~~ json { From 6cc20d3785686e64b3ad4fbc3f388147cb91ca49 Mon Sep 17 00:00:00 2001 From: Ryan Kuo Date: Wed, 25 Feb 2026 14:03:19 -0500 Subject: [PATCH 6/6] remove callout --- src/current/molt/molt-verify.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index 2b9ba7ca36c..4f135d6d96b 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -180,10 +180,6 @@ When verification completes, the output displays a summary showing the number of If you applied [transformations with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), a [MOLT Replicator userscript]({% link molt/userscript-cookbook.md %}#rename-tables), or another tool, you can apply the same transformations with MOLT Verify to match source data with the transformed target data. -{{site.data.alerts.callout_info}} -Only table and schema renames are supported. -{{site.data.alerts.end}} - #### Step 1. Create a transformation file Create a JSON file that defines the transformation rules. Each rule can rename a source schema, table, or both. MOLT Verify applies these transformations during comparison only and does not modify the source database.
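To make the rule semantics in this series concrete, the following is a minimal sketch, not MOLT's implementation. It checks the two constraints stated for filter rules (`source_expr` and `target_expr` must be defined together, and cannot be used with `expr`) and resolves which target schema and table a source table would be compared against under a rename rule. The function names `check_filter_rule` and `resolve_target` are hypothetical, and the sketch assumes that `resource_specifier` values are exact names rather than patterns.

```python
import json


def check_filter_rule(rule: dict) -> None:
    """Raise ValueError if a filter rule breaks the documented field constraints."""
    has_expr = "expr" in rule
    has_source = "source_expr" in rule
    has_target = "target_expr" in rule
    # source_expr and target_expr must be defined together.
    if has_source != has_target:
        raise ValueError("source_expr and target_expr must be defined together")
    # expr cannot be combined with the per-database expressions.
    if has_expr and has_source:
        raise ValueError("expr cannot be used with source_expr/target_expr")


def resolve_target(transforms: list, schema: str, table: str) -> tuple:
    """Return the (schema, table) on the target that a source table maps to."""
    for rule in transforms:
        spec = rule["resource_specifier"]
        # Schema and table names are compared case-insensitively.
        # Assumption: exact-name matching only (no pattern support).
        if (spec["schema"].lower() == schema.lower()
                and spec["table"].lower() == table.lower()):
            new_schema = rule.get("schema_rename_opts", {}).get("value", schema)
            new_table = rule.get("table_rename_opts", {}).get("value", table)
            return new_schema, new_table
    # No rule matches: the names are compared unchanged.
    return schema, table


if __name__ == "__main__":
    # Same rule as the transformation file example: t -> t2, public -> public2.
    config = json.loads("""
    {
      "transforms": [
        {
          "id": 1,
          "resource_specifier": {"schema": "public", "table": "t"},
          "table_rename_opts": {"value": "t2"},
          "schema_rename_opts": {"value": "public2"}
        }
      ]
    }
    """)
    print(resolve_target(config["transforms"], "PUBLIC", "T"))  # ('public2', 't2')
    check_filter_rule({"expr": "x < 10"})  # passes silently
```

A check like this can be run against a rules file before invoking `molt verify`, so that a malformed rule is caught before any tables are read. It does not validate the SQL expressions themselves.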