diff --git a/src/current/_includes/molt/classic-bulk-load-all-sources.md b/src/current/_includes/molt/classic-bulk-load-all-sources.md new file mode 100644 index 00000000000..9bfd649be18 --- /dev/null +++ b/src/current/_includes/molt/classic-bulk-load-all-sources.md @@ -0,0 +1,254 @@ +A [*Classic Bulk Load Migration*]({% link molt/migration-approach-classic-bulk-load.md %}) is the simplest way of [migrating data to CockroachDB]({% link molt/migration-overview.md %}). In this approach, you stop application traffic to the source database and migrate data to the target cluster using [MOLT Fetch]({% link molt/molt-fetch.md %}) during a **significant downtime window**. Application traffic is then cut over to the target after schema finalization and data verification. + +- All source data is migrated to the target [at once]({% link molt/migration-considerations-granularity.md %}). + +- This approach does not utilize [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Rollback]({% link molt/migration-considerations-rollback.md %}) is manual, but in most cases it's simple, as the source database is preserved and write traffic begins on the target all at once. If you wish to roll back before the target has received any writes that are not present on the source database, nothing needs to be done. If you wish to roll back after the target has received writes that are not present on the source database, you must manually replicate these new rows on the source. + +This approach is best for small databases (<100 GB), internal tools, dev/staging environments, and production environments that can handle business disruption. It's a simple approach that guarantees full data consistency and is easy to execute with limited resources, but it can only be performed if your system can handle significant downtime. + +This page describes an example scenario. 
While the commands provided can be copied and pasted, you may need to adjust them to suit your specific environment.
+
+
+Classic Bulk Load Migration flow +
+
+## Example scenario
+
+You have a small (50 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You schedule a maintenance window for Saturday from 2 AM to 6 AM, and announce it to your users several weeks in advance.
+
+The application runs on a Kubernetes cluster.
+
+**Estimated system downtime:** 4 hours.
+
+## Before the migration
+
+- Install the [MOLT (Migrate Off Legacy Technology)]({% link molt/molt-fetch-installation.md %}#installation) tools.
+- Review the [MOLT Fetch]({% link molt/molt-fetch-best-practices.md %}) documentation.
+- [Develop a migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan) and [prepare for the migration]({% link molt/migration-strategy.md %}#prepare-for-migration).
+- **Recommended:** Perform a dry run of this full set of instructions in a development environment that closely resembles your production environment. This can help you get a realistic sense of the time and complexity it requires.
+- Announce the maintenance window to your users.
+- Understand the prerequisites and limitations of the MOLT tools:
+
+{% include molt/oracle-migration-prerequisites.md %} +
+ +{% include molt/molt-limitations.md %} + +## Step 1: Prepare the source database + +In this step, you will: + +- [Create a dedicated migration user on your source database](#create-migration-user-on-source-database). + +{% include molt/migration-prepare-database.md %} + +## Step 2: Prepare the target database + +In this step, you will: + +- [Provision and run a new CockroachDB cluster](#provision-a-cockroachdb-cluster). +- [Define the tables on the target cluster](#define-the-target-tables) to match those on the source. +- [Create a SQL user on the target cluster](#create-the-sql-user) with the necessary write permissions. + +### Provision a CockroachDB cluster + +Use one of the following options to create and run a new CockroachDB cluster. This is your migration **target**. + +#### Option 1: Create a secure cluster locally + +If you have the CockroachDB binary installed locally, you can manually deploy a multi-node, self-hosted CockroachDB cluster on your local machine. + +Learn how to [deploy a CockroachDB cluster locally]({% link {{ site.versions["stable"] }}/secure-a-cluster.md %}). + +#### Option 2: Create a CockroachDB Self-Hosted cluster on AWS + +You can manually deploy a multi-node, self-hosted CockroachDB cluster on Amazon's AWS EC2 platform, using AWS's managed load-balancing service to distribute client traffic. + +Learn how to [deploy a CockroachDB cluster on AWS]({% link {{ site.versions["stable"] }}/deploy-cockroachdb-on-aws.md %}). + +#### Option 3: Create a CockroachDB Cloud cluster + +CockroachDB Cloud is a fully-managed service run by Cockroach Labs, which simplifies the deployment and management of CockroachDB. + +[Sign up for a CockroachDB Cloud account](https://cockroachlabs.cloud) and [create a cluster]({% link cockroachcloud/create-your-cluster.md %}) using [trial credits]({% link cockroachcloud/free-trial.md %}). 
+ +### Define the target tables + +{% include molt/migration-prepare-schema.md %} + +### Create the SQL user + +{% include molt/migration-create-sql-user.md %} + +## Step 3: Stop application traffic + +With both the source and target databases prepared for the data load, it's time to stop application traffic to the source. At the start of the maintenance window, scale down the Kubernetes cluster to zero pods. + +{% include_cached copy-clipboard.html %} +~~~shell +kubectl scale deployment app --replicas=0 +~~~ + +{{ site.data.alerts.callout_danger }} +Application downtime begins now. + +It is strongly recommended that you perform a dry run of this migration in a test environment. This will allow you to practice using the MOLT tools in real time, and it will give you an accurate sense of how long application downtime might last. +{{ site.data.alerts.end }} + +## Step 4: Load data into CockroachDB + +In this step, you will: + +- [Configure MOLT Fetch with the flags needed for your migration](#configure-molt-fetch). +- [Run MOLT Fetch](#run-molt-fetch). +- [Understand how to continue a load after an interruption](#continue-molt-fetch-after-an-interruption). + +### Configure MOLT Fetch + +The [MOLT Fetch documentation]({% link molt/molt-fetch.md %}) includes detailed information about how to [configure MOLT Fetch]({% link molt/molt-fetch.md %}#run-molt-fetch), and how to [monitor MOLT Fetch metrics]({% link molt/molt-fetch-monitoring.md %}). + +When you run `molt fetch`, you can configure the following options for data load: + + + + + + + + + + + +- [Specify source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases): Specify URL‑encoded source and target connections. +- [Select data to migrate]({% link molt/molt-fetch.md %}#select-data-to-migrate): Specify schema and table names to migrate. 
+- [Define intermediate file storage]({% link molt/molt-fetch.md %}#define-intermediate-storage): Export data to cloud storage or a local file server.
+- [Define fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode): Specify whether data is only exported to, or only loaded from, intermediate storage.
+- [Shard tables]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export): Divide larger tables into multiple shards during data export.
+- [Data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from): Choose between `IMPORT INTO` and `COPY FROM`.
+- [Table handling mode]({% link molt/molt-fetch.md %}#handle-target-tables): Determine how existing target tables are initialized before load.
+- [Define data transformations]({% link molt/molt-fetch.md %}#define-transformations): Define any row-level transformations to apply to the data before it reaches the target.
+- [Monitor fetch metrics]({% link molt/molt-fetch-monitoring.md %}): Configure metrics collection during initial data load.
+
+Read through the documentation to understand how to configure your `molt fetch` command and its flags. Follow [best practices]({% link molt/molt-fetch-best-practices.md %}), especially those related to security.
+
+At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--bucket-path 's3://bucket/path' \
+--ignore-replication-check
+~~~
+
+However, depending on the needs of your migration, you may need to set many more flags and to prepare accompanying `.json` files.
+
+### Run MOLT Fetch
+
+Perform the bulk load of the source data.
+
+1. Run the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data into CockroachDB.
This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It limits the migration to a single schema and filters for three specific tables. The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`. Include the `--ignore-replication-check` flag to skip replication checkpoint queries, which eliminates the need to configure the source database for logical replication. + +
+ {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --ignore-replication-check + ~~~ +
+ +
+ {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --ignore-replication-check + ~~~ +
+ +
+ The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string. + + {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --source-cdb $SOURCE_CDB \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --ignore-replication-check + ~~~ +
+ +{% include molt/fetch-data-load-output.md %} + +### Continue MOLT Fetch after an interruption + +{% include molt/fetch-continue-after-interruption.md %} + +## Step 5: Verify the data + +In this step, you will use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and target data is consistent. This ensures that the data load was successful. + +### Run MOLT Verify + +{% include molt/verify-output.md %} + +## Step 6: Finalize the target schema + +### Add constraints and indexes + +{% include molt/migration-modify-target-schema.md %} + +## Step 7: Cut over application traffic + +With the target cluster verified and finalized, it's time to resume application traffic. + +### Modify application code + +In the application back end, make sure that the application now directs traffic to the CockroachDB cluster. For example: + +~~~yml +env: + - name: DATABASE_URL + value: postgres://root@localhost:26257/defaultdb?sslmode=verify-full +~~~ + +### Resume application traffic + +Scale up the Kubernetes deployment to the original number of replicas: + +{% include_cached copy-clipboard.html %} +~~~shell +kubectl scale deployment app --replicas=3 +~~~ + +This ends downtime. 
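As a final sanity check during cutover, you can confirm that the application's connection string targets CockroachDB's default SQL port (26257). An illustrative sketch using the example `DATABASE_URL` shown above (the parsing is deliberately simple, not general-purpose):

```shell
# Example DATABASE_URL from the deployment manifest above.
DATABASE_URL='postgres://root@localhost:26257/defaultdb?sslmode=verify-full'

# Extract the port: the digits between the last ':' and the following '/'.
PORT=$(printf '%s' "$DATABASE_URL" | sed -E 's|^.*:([0-9]+)/.*$|\1|')

if [ "$PORT" = "26257" ]; then
  echo "DATABASE_URL targets CockroachDB's default SQL port"
fi
```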
+ +## Troubleshooting + +{% include molt/molt-troubleshooting-fetch.md %} + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) +- [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) diff --git a/src/current/_includes/molt/delta-all-sources.md b/src/current/_includes/molt/delta-all-sources.md new file mode 100644 index 00000000000..0ccf648d9e7 --- /dev/null +++ b/src/current/_includes/molt/delta-all-sources.md @@ -0,0 +1,553 @@ +A [*Delta Migration*]({% link molt/migration-approach-delta.md %}) uses an initial data load, followed by [continuous replication]({% link molt/migration-considerations-replication.md %}), to [migrate data to CockroachDB]({% link molt/migration-overview.md %}). In this approach, you migrate most application data to the target using [MOLT Fetch]({% link molt/molt-fetch.md %}) **before** stopping application traffic to the source database. You then use [MOLT Replicator]({% link molt/molt-replicator.md %}) to keep the target database in sync with any changes in the source database (the migration _delta_), before finally halting traffic to the source and cutting over to the target after schema finalization and data verification. + +- All source data is migrated to the target [at once]({% link molt/migration-considerations-granularity.md %}). + +- This approach utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Failback replication]({% link molt/migration-considerations-rollback.md %}) is supported, though this example will not use it. See [Phased Delta Migration with Failback Replication]({% link molt/migration-approach-phased-delta-failback.md %}) for an example of a migration that uses failback replication. + +This approach is best for production environments that need to minimize system downtime. + +This page describes an example scenario. 
While the commands provided can be copied and pasted, you may need to adjust them to suit your specific environment.
+
+
+Delta migration flow +
+
+## Example scenario
+
+You have a 300 GB database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. Business cannot accommodate a full maintenance window, but it can accommodate a brief (<60 second) halt in traffic.
+
+The application runs on a Kubernetes cluster.
+
+**Estimated system downtime:** 3-5 minutes.
+
+## Before the migration
+
+- Install the [MOLT (Migrate Off Legacy Technology)]({% link molt/molt-fetch-installation.md %}#installation) tools.
+- Review the [MOLT Fetch]({% link molt/molt-fetch-best-practices.md %}) and [MOLT Replicator]({% link molt/molt-replicator.md %}) documentation.
+- [Develop a migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan) and [prepare for the migration]({% link molt/migration-strategy.md %}#prepare-for-migration).
+- **Recommended:** Perform a dry run of this full set of instructions in a development environment that closely resembles your production environment. This can help you get a realistic sense of the time and complexity it requires.
+- Understand the prerequisites and limitations of the MOLT tools:
+
+{% include molt/oracle-migration-prerequisites.md %} +
+ +{% include molt/molt-limitations.md %} + +## Step 1: Prepare the source database + +In this step, you will: + +- [Create a dedicated migration user on your source database](#create-migration-user-on-source-database). +- [Configure the source database for replication](#configure-source-database-for-replication). + +{% include molt/migration-prepare-database.md %} + +## Step 2: Prepare the target database + +### Define the target tables + +{% include molt/migration-prepare-schema.md %} + +### Create the SQL user + +{% include molt/migration-create-sql-user.md %} + +### Configure GC TTL + +Before starting the [initial data load](#run-molt-fetch), configure the [garbage collection (GC) TTL]({% link {{ site.current_cloud_version }}/configure-replication-zones.md %}#gc-ttlseconds) on the source CockroachDB cluster to ensure that historical data remains available when replication begins. The GC TTL must be long enough to cover the full duration of the data load. + +Increase the GC TTL before starting the data load. For example, to set the GC TTL to 24 hours: + +{% include_cached copy-clipboard.html %} +~~~ sql +ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = 86400; +~~~ + +{{site.data.alerts.callout_info}} +The GC TTL duration must be higher than your expected time for the initial data load. +{{site.data.alerts.end}} + +Once replication has started successfully (which automatically protects its own data range), you can restore the GC TTL to its original value. For example, to restore to 5 minutes: + +{% include_cached copy-clipboard.html %} +~~~ sql +ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = 300; +~~~ + +For details, refer to [Protect Changefeed Data from Garbage Collection]({% link {{ site.current_cloud_version }}/protect-changefeed-data.md %}). + +## Step 3: Load data into CockroachDB + +In this step, you will: + +- [Configure MOLT Fetch with the flags needed for your migration](#configure-molt-fetch). 
+- [Run MOLT Fetch](#run-molt-fetch). +- [Understand how to continue a load after an interruption](#continue-molt-fetch-after-an-interruption). + +### Configure MOLT Fetch + +The [MOLT Fetch documentation]({% link molt/molt-fetch.md %}) includes detailed information about how to [configure MOLT Fetch]({% link molt/molt-fetch.md %}#run-molt-fetch), and how to [monitor MOLT Fetch metrics]({% link molt/molt-fetch-monitoring.md %}). + +When you run `molt fetch`, you can configure the following options for data load: + + + + + + + + + + + +- [Specify source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases): Specify URL‑encoded source and target connections. +- [Select data to migrate]({% link molt/molt-fetch.md %}#select-data-to-migrate): Specify schema and table names to migrate. +- [Define intermediate file storage]({% link molt/molt-fetch.md %}#define-intermediate-storage): Export data to cloud storage or a local file server. +- [Define fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode): Specifies whether data will only be loaded into/from intermediate storage. +- [Shard tables]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export): Divide larger tables into multiple shards during data export. +- [Data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from): Choose between `IMPORT INTO` and `COPY FROM`. +- [Table handling mode]({% link molt/molt-fetch.md %}#handle-target-tables): Determine how existing target tables are initialized before load. +- [Define data transformations]({% link molt/molt-fetch.md %}#define-transformations): Define any row-level transformations to apply to the data before it reaches the target. +- [Monitor fetch metrics]({% link molt/molt-fetch-monitoring.md %}): Configure metrics collection during initial data load. + +Read through the documentation to understand how to configure your `molt fetch` command and its flags. 
Follow [best practices]({% link molt/molt-fetch-best-practices.md %}), especially those related to security. + +At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags: + +{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source $SOURCE \ +--target $TARGET \ +--bucket-path 's3://bucket/path' \ +--ignore-replication-check +~~~ + +However, depending on the needs of your migration, you may have many more flags set, and you may need to prepare some accompanying .json files. + +### Run MOLT Fetch + + + +Perform the initial load of the source data. + +1. Issue the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data to CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It also limits the migration to a single schema and filters three specific tables to migrate. The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`. + +
+ You **must** include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load. + + {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --pglogical-replication-slot-name molt_slot \ + --pglogical-publication-and-slot-drop-and-recreate + ~~~ +
+ +
+ {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists + ~~~ +
+ +
+ The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string. + + {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --source-cdb $SOURCE_CDB \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists + ~~~ +
+ +{% include molt/fetch-data-load-output.md %} + +### Continue MOLT Fetch after an interruption + +{% include molt/fetch-continue-after-interruption.md %} + +## Step 4: Verify the initial data load + +Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and target data is consistent. This ensures that the data load was successful. + +### Run MOLT Verify + +{% include molt/verify-output.md %} + +## Step 5: Finalize the target schema + +### Add constraints and indexes + +{% include molt/migration-modify-target-schema.md %} + +## Step 6: Begin forward replication + +In this step, you will: + +- [Configure MOLT Replicator with the flags needed for your migration](#configure-molt-replicator). +- [Start MOLT Replicator](#start-molt-replicator). +- [Understand how to continue replication after an interruption](#continue-molt-replicator-after-an-interruption). + +### Configure MOLT Replicator + +When you run `replicator`, you can configure the following options for replication: + +- [Replication connection strings](#replication-connection-strings): Specify URL-encoded source and target database connections. +- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior. +
+- [Tuning parameters](#tuning-parameters): Optimize replication performance and resource usage. +
+- [Replicator metrics](#replicator-metrics): Monitor replication progress and performance. + +#### Replication connection strings + +MOLT Replicator uses `--sourceConn` and `--targetConn` to specify the source and target database connections. + +`--sourceConn` specifies the connection string of the source database: + +
+~~~ +--sourceConn 'postgresql://{username}:{password}@{host}:{port}/{database}' +~~~ +
+ +
+~~~ +--sourceConn 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}' +~~~ +
+ +
+~~~ +--sourceConn 'oracle://{username}:{password}@{host}:{port}/{service_name}' +~~~ + +For Oracle Multitenant databases, also specify `--sourcePDBConn` with the PDB connection string: + +~~~ +--sourcePDBConn 'oracle://{username}:{password}@{host}:{port}/{pdb_service_name}' +~~~ +
+ +`--targetConn` specifies the target CockroachDB connection string: + +~~~ +--targetConn 'postgresql://{username}:{password}@{host}:{port}/{database}' +~~~ + +{{site.data.alerts.callout_success}} +Follow best practices for securing connection strings. Refer to [Secure connections](#secure-connections). +{{site.data.alerts.end}} + +#### Replicator flags + +{% include molt/replicator-flags-usage.md %} + +
+ +#### Tuning parameters + +{% include molt/optimize-replicator-performance.md %} +
+ +#### Replicator metrics + +MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: + +~~~ +--metricsAddr :30005 +~~~ + +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=postgres). +
+ +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=mysql). +
+ +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=oracle). +
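If you collect metrics with Prometheus, the `/_/varz` endpoint can be scraped directly. A sketch of a scrape job, assuming metrics are exposed on port `30005` as in the example above (the job name is arbitrary):

```yml
# Hypothetical Prometheus scrape job for MOLT Replicator metrics.
scrape_configs:
  - job_name: 'molt-replicator'
    metrics_path: '/_/varz'
    static_configs:
      - targets: ['localhost:30005']
```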
+ +### Start MOLT Replicator + + + +With initial load complete, start replication of ongoing changes on the source to CockroachDB using [MOLT Replicator]({% link molt/molt-replicator.md %}). + +{{site.data.alerts.callout_info}} +MOLT Fetch captures a consistent point-in-time checkpoint at the start of the data load (shown as `cdc_cursor` in the fetch output). Starting replication from this checkpoint ensures that all changes made during and after the data load are replicated to CockroachDB, preventing data loss or duplication. The following steps use the checkpoint values from the fetch output to start replication at the correct position. +{{site.data.alerts.end}} + +
+1. Run the `replicator` command, using the same slot name that you specified with `--pglogical-replication-slot-name` and the publication name created by `--pglogical-publication-and-slot-drop-and-recreate` in the [Fetch command](#run-molt-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database: + + {% include_cached copy-clipboard.html %} + ~~~ shell + replicator pglogical \ + --sourceConn $SOURCE \ + --targetConn $TARGET \ + --targetSchema defaultdb.migration_schema \ + --slotName molt_slot \ + --publicationName molt_fetch \ + --stagingSchema defaultdb._replicator \ + --stagingCreateSchema \ + --metricsAddr :30005 \ + -v + ~~~ +
+ +
+1. Run the `replicator` command, specifying the GTID from the [checkpoint recorded during data load](#run-molt-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database. If you [filtered tables during the initial load](#schema-and-table-filtering), [write a userscript to filter tables on replication]({% link molt/userscript-cookbook.md %}#filter-multiple-tables) and specify the path with `--userscript`. + + {% include_cached copy-clipboard.html %} + ~~~ shell + replicator mylogical \ + --sourceConn $SOURCE \ + --targetConn $TARGET \ + --targetSchema defaultdb.public \ + --defaultGTIDSet 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29 \ + --stagingSchema defaultdb._replicator \ + --stagingCreateSchema \ + --metricsAddr :30005 \ + --userscript table_filter.ts \ + -v + ~~~ + + {{site.data.alerts.callout_success}} + For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. + {{site.data.alerts.end}} +
+ +
+1. Run the `replicator` command, specifying the backfill and starting SCN from the [checkpoint recorded during data load](#run-molt-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database. If you [filtered tables during the initial load](#schema-and-table-filtering), [write a userscript to filter tables on replication]({% link molt/userscript-cookbook.md %}#filter-multiple-tables) and specify the path with `--userscript`. + + {% include_cached copy-clipboard.html %} + ~~~ shell + replicator oraclelogminer \ + --sourceConn $SOURCE \ + --sourcePDBConn $SOURCE_PDB \ + --targetConn $TARGET \ + --sourceSchema MIGRATION_USER \ + --targetSchema defaultdb.migration_schema \ + --backfillFromSCN 26685444 \ + --scn 26685786 \ + --stagingSchema defaultdb._replicator \ + --stagingCreateSchema \ + --metricsAddr :30005 \ + --userscript table_filter.ts \ + -v + ~~~ + + {{site.data.alerts.callout_info}} + When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. + {{site.data.alerts.end}} +
+ +#### Check that replication is working + +1. Verify that Replicator is processing changes successfully. To do so, check the MOLT Replicator logs. Since you enabled debug logging with `-v`, you should see connection and row processing messages: + +
+ You should see periodic primary keepalive messages: + + ~~~ + DEBUG [Aug 25 14:38:10] primary keepalive received ReplyRequested=false ServerTime="2025-08-25 14:38:09.556773 -0500 CDT" ServerWALEnd=0/49913A58 + DEBUG [Aug 25 14:38:15] primary keepalive received ReplyRequested=false ServerTime="2025-08-25 14:38:14.556836 -0500 CDT" ServerWALEnd=0/49913E60 + ~~~ + + When rows are successfully replicated, you should see debug output like the following: + + ~~~ + DEBUG [Aug 25 14:40:02] upserted rows conflicts=0 duration=7.855333ms proposed=1 target="\"molt\".\"public\".\"tbl1\"" upserted=1 + DEBUG [Aug 25 14:40:02] progressed to LSN: 0/49915DD0 + ~~~ +
+ +
+ You should see binlog syncer connection and row processing: + + ~~~ + [2025/08/25 15:29:09] [info] binlogsyncer.go:463 begin to sync binlog from GTID set 77263736-7899-11f0-81a5-0242ac120002:1-38 + [2025/08/25 15:29:09] [info] binlogsyncer.go:409 Connected to mysql 8.0.43 server + INFO [Aug 25 15:29:09] connected to MySQL version 8.0.43 + ~~~ + + When rows are successfully replicated, you should see debug output like the following: + + ~~~ + DEBUG [Aug 25 15:29:38] upserted rows conflicts=0 duration=1.801ms proposed=1 target="\"molt\".\"public\".\"tbl1\"" upserted=1 + DEBUG [Aug 25 15:29:38] progressed to consistent point: 77263736-7899-11f0-81a5-0242ac120002:1-39 + ~~~ +
+ +
+ When transactions are read from the Oracle source, you should see registered transaction IDs (XIDs): + + ~~~ + DEBUG [Jul 3 15:55:12] registered xid 0f001f0040060000 + DEBUG [Jul 3 15:55:12] registered xid 0b001f00bb090000 + ~~~ + + When rows are successfully replicated, you should see debug output like the following: + + ~~~ + DEBUG [Jul 3 15:55:12] upserted rows conflicts=0 duration=2.620009ms proposed=13 target="\"molt_movies\".\"USERS\".\"CUSTOMER_CONTACT\"" upserted=13 + DEBUG [Jul 3 15:55:12] upserted rows conflicts=0 duration=2.212807ms proposed=16 target="\"molt_movies\".\"USERS\".\"CUSTOMER_DEVICE\"" upserted=16 + ~~~ +
+ + These messages confirm successful replication. You can disable verbose logging after verifying the connection. + +### Continue MOLT Replicator after an interruption + +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `pglogical` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator). + +Be sure to specify the same `--slotName` value that you used during your [initial replication command](#start-molt-replicator). The replication slot on the source PostgreSQL database automatically tracks the LSN (Log Sequence Number) checkpoint, so replication will resume from where it left off. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator pglogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.migration_schema \ +--slotName molt_slot \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +-v +~~~ +
+ +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `mylogical` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator). + +Replicator will automatically use the saved GTID (Global Transaction Identifier) from the `memo` table in the staging schema (in this example, `defaultdb._replicator.memo`) and track advancing GTID checkpoints there. To have Replicator start from a different GTID instead of resuming from the checkpoint, clear the `memo` table with `DELETE FROM defaultdb._replicator.memo;` and run the `replicator` command with a new `--defaultGTIDSet` value. + +{{site.data.alerts.callout_success}} +For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive: it allows the user to view all server processes and server status. +{{site.data.alerts.end}} + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator mylogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.public \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +--userscript table_filter.ts \ +-v +~~~ +
+ +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `oraclelogminer` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator). + +Replicator will automatically find the correct restart SCN (System Change Number) from the `_oracle_checkpoint` table in the staging schema. The restart point is determined by the non-committed row with the smallest `startscn` column value. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator oraclelogminer \ +--sourceConn $SOURCE \ +--sourcePDBConn $SOURCE_PDB \ +--sourceSchema MIGRATION_USER \ +--targetSchema defaultdb.migration_schema \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +--userscript table_filter.ts \ +-v +~~~ + +{{site.data.alerts.callout_info}} +When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. +{{site.data.alerts.end}} +
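+
+To preview the restart point before resuming, you can inspect the checkpoint table in the staging schema on the target. This query is illustrative; aside from `startscn`, the checkpoint table's column set may vary by Replicator version:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+-- Run on the target CockroachDB cluster. The restart point corresponds to
+-- the non-committed row with the smallest startscn.
+SELECT startscn
+FROM defaultdb._replicator._oracle_checkpoint
+ORDER BY startscn ASC
+LIMIT 1;
+~~~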
+ +Replication resumes from the last checkpoint without performing a fresh load. Monitor the metrics endpoint at `http://localhost:30005/_/varz` to track replication progress. + +## Step 7: Stop application traffic + +Once the initial data load has been verified and the target schema has been finalized, it's time to begin the cutover process. First, stop application traffic to the source. Scale the application's Kubernetes deployment down to zero replicas: + +{% include_cached copy-clipboard.html %} +~~~shell +kubectl scale deployment app --replicas=0 +~~~ + +{{ site.data.alerts.callout_danger }} +Application downtime begins now. + +It is strongly recommended that you perform a dry run of this migration in a test environment. This will allow you to practice using the MOLT tools in real time, and it will give you an accurate sense of how long application downtime might last. +{{ site.data.alerts.end }} + +## Step 8: Stop forward replication + +Before you can cut over traffic to the target, all remaining changes on the source database must finish replicating to the target. Once the source is no longer receiving write traffic, MOLT Replicator may take several seconds to finish replicating the final changes. This is known as _drainage_. + +{% include molt/migration-stop-replication.md %} + +## Step 9: Verify the replicated data + +Repeat [Step 4](#step-4-verify-the-initial-data-load) to verify the updated data. + +## Step 10: Cut over application traffic + +With the target cluster verified and finalized, it's time to resume application traffic. + +### Modify application code + +In the application back end, make sure that the application now directs traffic to the CockroachDB cluster. 
For example: + +~~~yml +env: + - name: DATABASE_URL + value: postgres://root@localhost:26257/defaultdb?sslmode=verify-full +~~~ + +### Resume application traffic + +Scale up the Kubernetes deployment to the original number of replicas: + +{% include_cached copy-clipboard.html %} +~~~shell +kubectl scale deployment app --replicas=3 +~~~ + +This ends downtime. + +## Troubleshooting + +{% include molt/molt-troubleshooting-fetch.md %} +{% include molt/molt-troubleshooting-replication.md %} + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) \ No newline at end of file diff --git a/src/current/_includes/molt/fetch-continue-after-interruption.md b/src/current/_includes/molt/fetch-continue-after-interruption.md new file mode 100644 index 00000000000..f69ccc70d9b --- /dev/null +++ b/src/current/_includes/molt/fetch-continue-after-interruption.md @@ -0,0 +1,75 @@ +If MOLT Fetch fails while loading data into CockroachDB from intermediate files, it exits with an error message, fetch ID, and *continuation token* for each table that failed to load on the target database. 
+ +~~~json +{"level":"info","table":"public.employees","file_name":"shard_01_part_00000001.csv.gz","message":"creating or updating token for duplicate key value violates unique constraint \"employees_pkey\"; Key (id)=(1) already exists."} +{"level":"info","table":"public.employees","continuation_token":"a1b2c3d4-e5f6-7890-abcd-ef1234567890","message":"created continuation token"} +{"level":"info","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","message":"continue from this fetch ID"} +{"level":"error","message":"Error: error from fetching table for public.employees: error importing data: duplicate key value violates unique + constraint \"employees_pkey\" (SQLSTATE 23505)"} +~~~ + +You can use this information to [continue the task from the *continuation point*]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption) where it was interrupted. + +Continuation is only possible under the following conditions: + +- All data has been exported from the source database into intermediate files on [cloud]({% link molt/molt-fetch.md %}#bucket-path) or [local storage]({% link molt/molt-fetch.md %}#local-path). +- The *initial load* of source data into the target CockroachDB database is incomplete. +- The load uses [`IMPORT INTO` rather than `COPY FROM`](#data-load-mode). + +{{site.data.alerts.callout_info}} +Only one fetch ID and set of continuation tokens, each token corresponding to a table, are active at any time. See [List active continuation tokens]({% link molt/molt-fetch.md %}#list-active-continuation-tokens). +{{site.data.alerts.end}} + +The following command reattempts the data load starting from a specific continuation file, but you can also use individual continuation tokens to [reattempt the data load for individual tables]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). + +
+ +{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source $SOURCE \ +--target $TARGET \ +--schema-filter 'migration_schema' \ +--table-filter 'employees|payments|orders' \ +--bucket-path 's3://migration/data/cockroach' \ +--table-handling truncate-if-exists \ +--ignore-replication-check \ +--fetch-id f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c \ +--continuation-file-name shard_01_part_00000001.csv.gz +~~~ +
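+
+If you are unsure which continuation tokens are active, you can list them against the target before reattempting the load. This is a sketch of the `molt fetch tokens list` subcommand described under [List active continuation tokens]({% link molt/molt-fetch.md %}#list-active-continuation-tokens); flags may vary by MOLT version:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# List active continuation tokens (one per failed table) recorded on the target.
+molt fetch tokens list \
+--conn $TARGET \
+-n 10
+~~~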
+ +
+ +{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source $SOURCE \ +--target $TARGET \ +--table-filter 'employees|payments|orders' \ +--bucket-path 's3://migration/data/cockroach' \ +--table-handling truncate-if-exists \ +--ignore-replication-check \ +--fetch-id f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c \ +--continuation-file-name shard_01_part_00000001.csv.gz +~~~ +
+ +
+The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string. + +{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source $SOURCE \ +--source-cdb $SOURCE_CDB \ +--target $TARGET \ +--schema-filter 'migration_schema' \ +--table-filter 'employees|payments|orders' \ +--bucket-path 's3://migration/data/cockroach' \ +--table-handling truncate-if-exists \ +--ignore-replication-check \ +--fetch-id f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c \ +--continuation-file-name shard_01_part_00000001.csv.gz +~~~ +
\ No newline at end of file diff --git a/src/current/_includes/molt/fetch-data-load-output.md b/src/current/_includes/molt/fetch-data-load-output.md index eb5f4319391..f6c3425a63b 100644 --- a/src/current/_includes/molt/fetch-data-load-output.md +++ b/src/current/_includes/molt/fetch-data-load-output.md @@ -1,6 +1,6 @@ 1. Check the output to observe `fetch` progress. - {% if page.name == "migrate-load-replicate.md" %} + {% if page.name contains "delta" %}
If you included the `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` flags, a publication named `molt_fetch` is automatically created: @@ -97,7 +97,7 @@ {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["public.employees","public.payments","public.payments"],"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"} ~~~ - {% if page.name != "migrate-bulk-load.md" %} + {% if page.name contains "delta" %} This message includes a `cdc_cursor` value. You must set the `--defaultGTIDSet` replication flag to this value when [starting Replicator](#start-replicator): {% include_cached copy-clipboard.html %} @@ -112,7 +112,7 @@ {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.payments"],"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"} ~~~ - {% if page.name != "migrate-bulk-load.md" %} + {% if page.name contains "delta" %} This message shows the appropriate values for the `--backfillFromSCN` and `--scn` flags to use when [starting Replicator](#start-replicator): {% include_cached copy-clipboard.html %} diff --git a/src/current/_includes/molt/fetch-metrics.md b/src/current/_includes/molt/fetch-metrics.md index 2a3e39f2ad3..51caaa390ff 100644 --- a/src/current/_includes/molt/fetch-metrics.md +++ b/src/current/_includes/molt/fetch-metrics.md @@ -1,5 +1,3 @@ -### Fetch metrics - By default, MOLT Fetch exports [Prometheus](https://prometheus.io/) metrics at `http://127.0.0.1:3030/metrics`. 
You can override the address with `--metrics-listen-addr '{host}:{port}'`, where the endpoint will be `http://{host}:{port}/metrics`. Cockroach Labs recommends monitoring the following metrics during data load: @@ -14,9 +12,9 @@ Cockroach Labs recommends monitoring the following metrics during data load: | `molt_fetch_table_export_duration_ms` | Duration (in milliseconds) of a table's export. For example:
`molt_fetch_table_export_duration_ms{table="public.users"}` | | `molt_fetch_table_import_duration_ms` | Duration (in milliseconds) of a table's import. For example:
`molt_fetch_table_import_duration_ms{table="public.users"}` | -To visualize the preceding metrics, use the Grafana dashboard [bundled with your binary (`grafana_dashboard.json`)]({% link molt/molt-fetch.md %}#installation). The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json). +To visualize the preceding metrics, use the Grafana dashboard [bundled with your binary (`grafana_dashboard.json`)]({% link molt/molt-fetch-installation.md %}). The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json). -{% if page.name != "migrate-bulk-load.md" %} +{% if page.name contains "delta" %} {{site.data.alerts.callout_success}} For details on Replicator metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). {{site.data.alerts.end}} diff --git a/src/current/_includes/molt/fetch-replication-output.md b/src/current/_includes/molt/fetch-replication-output.md deleted file mode 100644 index 84496eda6b4..00000000000 --- a/src/current/_includes/molt/fetch-replication-output.md +++ /dev/null @@ -1,26 +0,0 @@ -1. Check the output to observe `replicator` progress. - - A `starting replicator` message indicates that the task has started: - - ~~~ json - {"level":"info","time":"2025-02-10T14:28:13-05:00","message":"starting replicator"} - ~~~ - - The `staging database name` message contains the name of the staging schema. The schema name contains a replication marker for streaming changes, which is used for [resuming replication]({% link molt/molt-fetch.md %}#resume-replication), or performing [failback to the source database]({% link molt/migrate-failback.md %}). 
- - - ~~~ json - {"level":"info","time":"2025-02-10T14:28:13-05:00","message":"staging database name: _replicator_1739215693817700000"} - ~~~ - - `upserted rows` log messages indicate that changes were replicated to CockroachDB: - - ~~~ shell - DEBUG [Jan 22 13:52:40] upserted rows conflicts=0 duration=7.620208ms proposed=1 target="\"molt\".\"migration_schema\".\"employees\"" upserted=1 - ~~~ - - {% if page.name != "migrate-resume-replication.md" %} - {{site.data.alerts.callout_success}} - If replication is interrupted, you can [resume replication]({% link molt/migrate-resume-replication.md %}). - {{site.data.alerts.end}} - {% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/migration-create-sql-user.md b/src/current/_includes/molt/migration-create-sql-user.md index dd2e078e3a4..8993dd91ab6 100644 --- a/src/current/_includes/molt/migration-create-sql-user.md +++ b/src/current/_includes/molt/migration-create-sql-user.md @@ -45,7 +45,7 @@ ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO crdb_user; ~~~ -Depending on the MOLT Fetch [data load mode](#data-load-mode) you will use, grant the necessary privileges to run either [`IMPORT INTO`](#import-into-privileges) or [`COPY FROM`](#copy-from-privileges) on the target tables: +Depending on the MOLT Fetch [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) you will use, grant the necessary privileges to run either [`IMPORT INTO`](#import-into-privileges) or [`COPY FROM`](#copy-from-privileges) on the target tables: #### `IMPORT INTO` privileges @@ -81,7 +81,7 @@ Grant [`admin`]({% link {{site.current_cloud_version}}/security-reference/author GRANT admin TO crdb_user; ~~~ -{% if page.name != "migrate-bulk-load.md" %} +{% if page.name contains "delta" %} #### Replication privileges Grant permissions to create the staging schema for replication: diff --git a/src/current/_includes/molt/migration-prepare-database.md 
b/src/current/_includes/molt/migration-prepare-database.md index a37f685adf5..86f572a0a7b 100644 --- a/src/current/_includes/molt/migration-prepare-database.md +++ b/src/current/_includes/molt/migration-prepare-database.md @@ -1,4 +1,13 @@ -#### Create migration user on source database +### Create migration user on source database + +{% if page.source_db_not_selectable %} +{% else %} +
+ + + +
+{% endif %} Create a dedicated migration user (for example, `MIGRATION_USER`) on the source database. This user is responsible for reading data from source tables during the migration. You will pass this username in the [source connection string](#source-connection-string). @@ -18,7 +27,7 @@ GRANT SELECT ON ALL TABLES IN SCHEMA migration_schema TO migration_user; ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT SELECT ON TABLES TO migration_user; ~~~ -{% if page.name != "migrate-bulk-load.md" %} +{% if page.name contains "delta" %} Grant the `SUPERUSER` role to the user (recommended for replication configuration): {% include_cached copy-clipboard.html %} @@ -54,7 +63,7 @@ GRANT SELECT ON mysql.gtid_executed TO 'migration_user'@'%'; FLUSH PRIVILEGES; ~~~ -{% if page.name != "migrate-bulk-load.md" %} +{% if page.name contains "delta" %} For replication, grant additional privileges for binlog access: {% include_cached copy-clipboard.html %} @@ -77,7 +86,7 @@ When migrating from Oracle Multitenant (PDB/CDB), this should be a [common user] Grant the user privileges to connect, read metadata, and `SELECT` and `FLASHBACK` the tables you plan to migrate. The tables should all reside in a single schema (for example, `migration_schema`). For details, refer to [Schema and table filtering](#schema-and-table-filtering). -##### Oracle Multitenant (PDB/CDB) user privileges +#### Oracle Multitenant (PDB/CDB) user privileges Connect to the Oracle CDB as a DBA and grant the following: @@ -126,7 +135,7 @@ GRANT SELECT ON V_$TRANSACTION TO C##MIGRATION_USER; GRANT SELECT, FLASHBACK ON migration_schema.tbl TO C##MIGRATION_USER; ~~~ -##### Single-tenant Oracle user privileges +#### Single-tenant Oracle user privileges Connect to the Oracle database as a DBA and grant the following: @@ -157,8 +166,17 @@ GRANT SELECT, FLASHBACK ON migration_schema.tbl TO MIGRATION_USER; ~~~
-{% if page.name != "migrate-bulk-load.md" %} -#### Configure source database for replication +{% if page.name contains "delta" %} +### Configure source database for replication + +{% if page.source_db_not_selectable %} +{% else %} +
+ + + +
+{% endif %} {{site.data.alerts.callout_info}} Connect to the primary instance (PostgreSQL primary, MySQL primary/master, or Oracle primary), **not** a replica. Replicas cannot provide the necessary replication checkpoints and transaction metadata required for ongoing replication. @@ -210,7 +228,8 @@ GTID replication sends all database changes to Replicator. To limit replication
-##### Enable ARCHIVELOG and FORCE LOGGING + +#### Enable ARCHIVELOG and FORCE LOGGING Enable `ARCHIVELOG` mode for LogMiner to access archived redo logs: @@ -253,7 +272,7 @@ ALTER DATABASE FORCE LOGGING; SELECT force_logging FROM v$database; -- Expected: YES ~~~ -##### Create source sentinel table +#### Create source sentinel table Create a checkpoint table called `REPLICATOR_SENTINEL` in the Oracle schema you will migrate: @@ -272,7 +291,7 @@ Grant privileges to modify the checkpoint table. In Oracle Multitenant, grant th GRANT SELECT, INSERT, UPDATE ON migration_schema."REPLICATOR_SENTINEL" TO C##MIGRATION_USER; ~~~ -##### Grant LogMiner privileges +#### Grant LogMiner privileges Grant LogMiner privileges. In Oracle Multitenant, grant the permissions on the CDB: @@ -302,7 +321,7 @@ The user must: - Retrieve active transaction information to determine the starting point for ongoing replication. - Update the internal [`REPLICATOR_SENTINEL` table](#create-source-sentinel-table) created on the Oracle source schema by the DBA. -##### Verify LogMiner privileges +#### Verify LogMiner privileges Query the locations of redo files in the Oracle database: diff --git a/src/current/_includes/molt/migration-schema-design-practices.md b/src/current/_includes/molt/migration-schema-design-practices.md index 644fad6de81..6df1dfa02a7 100644 --- a/src/current/_includes/molt/migration-schema-design-practices.md +++ b/src/current/_includes/molt/migration-schema-design-practices.md @@ -31,12 +31,12 @@ Convert the source table definitions into CockroachDB-compatible equivalents. Co ~~~
- - MOLT Fetch can automatically define matching CockroachDB tables using the {% if page.name != "migration-strategy.md" %}[`drop-on-target-and-recreate`](#table-handling-mode){% else %}[`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#target-table-handling){% endif %} option. + - MOLT Fetch can automatically define matching CockroachDB tables using the {% if page.name != "migration-strategy.md" %}[`drop-on-target-and-recreate`](#table-handling-mode){% else %}[`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#handle-target-tables){% endif %} option. - If you define the target tables manually, review how MOLT Fetch handles [type mismatches]({% link molt/molt-fetch.md %}#mismatch-handling). You can use the {% if page.name != "migration-strategy.md" %}[MOLT Schema Conversion Tool](#schema-conversion-tool){% else %}[MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}){% endif %} to create matching table definitions.
- - By default, table and column names are case-insensitive in MOLT Fetch. If using the [`--case-sensitive`]({% link molt/molt-fetch.md %}#global-flags) flag, schema, table, and column names must match Oracle's default uppercase identifiers. Use quoted names on the target to preserve case. For example, the following CockroachDB SQL statement will error: + - By default, table and column names are case-insensitive in MOLT Fetch. If using the [`--case-sensitive`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag, schema, table, and column names must match Oracle's default uppercase identifiers. Use quoted names on the target to preserve case. For example, the following CockroachDB SQL statement will error: ~~~ sql CREATE TABLE co.stores (... store_id ...); @@ -57,6 +57,6 @@ Convert the source table definitions into CockroachDB-compatible equivalents. Co Avoid using sequential keys. To learn more about the performance issues that can result from their use, refer to the [guidance on indexing with sequential keys]({% link {{site.current_cloud_version}}/sql-faqs.md %}#how-do-i-generate-unique-slowly-increasing-sequential-numbers-in-cockroachdb). If a sequential key is necessary in your CockroachDB table, you must create it manually, after using [MOLT Fetch]({% link molt/molt-fetch.md %}) to load and replicate the data. {{site.data.alerts.end}} -- Review [Transformations]({% link molt/molt-fetch.md %}#transformations) to understand how computed columns and partitioned tables can be mapped to the target, and how target tables can be renamed. +- Review [Transformations]({% link molt/molt-fetch.md %}#define-transformations) to understand how computed columns and partitioned tables can be mapped to the target, and how target tables can be renamed. - By default on CockroachDB, `INT` is an alias for `INT8`, which creates 64-bit signed integers. PostgreSQL and MySQL default to 32-bit integers. 
Depending on your source database or application requirements, you may need to change the integer size to `4`. For more information, refer to [Considerations for 64-bit signed integers]({% link {{ site.current_cloud_version }}/int.md %}#considerations-for-64-bit-signed-integers). \ No newline at end of file diff --git a/src/current/_includes/molt/migration-stop-replication.md b/src/current/_includes/molt/migration-stop-replication.md index 613805d6473..b9376bdf40c 100644 --- a/src/current/_includes/molt/migration-stop-replication.md +++ b/src/current/_includes/molt/migration-stop-replication.md @@ -1,8 +1,43 @@ -{% if page.name != "migrate-failback.md" %} -1. Stop application traffic to your source database. **This begins downtime.** -{% endif %} +
+1. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated. There are several ways to determine that replication has fully drained: + - When replication is caught up, you will not see new `upserted rows` logs. + - If you set up the replication metrics endpoint with [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) in the preceding steps, metrics are available at: + + ~~~ + http://{host}:{port}/_/varz + ~~~ + + Use the following Prometheus alert expression to observe when the combined rate of upserts and deletes is `0` for each schema: + + ~~~ + sum by (schema) (rate(apply_upserts_total[$__rate_interval]) + rate(apply_deletes_total[$__rate_interval])) + ~~~ + - You can also check Prometheus metrics associated with replication lag, including [`target_apply_transaction_lag_seconds`]({% link molt/replicator-metrics.md %}#target-apply-transaction-lag-seconds), [`core_source_lag_seconds`]({% link molt/replicator-metrics.md %}#core-source-lag-seconds), and [`source_commit_to_apply_lag_seconds`]({% link molt/replicator-metrics.md %}#source-commit-to-apply-lag-seconds). + +2. Cancel replication by entering `ctrl-c` to issue a `SIGTERM` signal. This returns an exit code `0`. +
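+
+The drain checks above can also be performed from the command line by polling the metrics endpoint directly. This is a sketch that assumes metrics are served on port `30005`, as in the examples that set `--metricsAddr :30005`:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# The apply_upserts_total and apply_deletes_total counters should stop
+# increasing between successive runs once replication has drained.
+curl -s http://localhost:30005/_/varz | grep -E 'apply_(upserts|deletes)_total'
+~~~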
+ +
+1. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated. There are several ways to determine that replication has fully drained: + - When replication is caught up, you will not see new `upserted rows` logs. + - If you set up the replication metrics endpoint with [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) in the preceding steps, metrics are available at: + + ~~~ + http://{host}:{port}/_/varz + ~~~ + + Use the following Prometheus alert expression to observe when the combined rate of upserts and deletes is `0` for each schema: + + ~~~ + sum by (schema) (rate(apply_upserts_total[$__rate_interval]) + rate(apply_deletes_total[$__rate_interval])) + ~~~ + - You can also check Prometheus metrics associated with replication lag, including [`source_commit_to_apply_lag_seconds`]({% link molt/replicator-metrics.md %}#source-commit-to-apply-lag-seconds). + +2. Cancel replication by entering `ctrl-c` to issue a `SIGTERM` signal. This returns an exit code `0`. +
-1. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated to CockroachDB. There are two ways to determine that replication has fully drained: +
+1. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated. There are several ways to determine that replication has fully drained: - When replication is caught up, you will not see new `upserted rows` logs. - If you set up the replication metrics endpoint with [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) in the preceding steps, metrics are available at: @@ -15,5 +50,7 @@ ~~~ sum by (schema) (rate(apply_upserts_total[$__rate_interval]) + rate(apply_deletes_total[$__rate_interval])) ~~~ + - You can also check Prometheus metrics associated with replication lag, including [`target_apply_transaction_lag_seconds`]({% link molt/replicator-metrics.md %}#target-apply-transaction-lag-seconds), [`core_source_lag_seconds`]({% link molt/replicator-metrics.md %}#core-source-lag-seconds), [`source_commit_to_apply_lag_seconds`]({% link molt/replicator-metrics.md %}#source-commit-to-apply-lag-seconds), and [`stage_commit_lag_seconds`]({% link molt/replicator-metrics.md %}#stage-commit-lag-seconds). -1. Cancel replication to CockroachDB by entering `ctrl-c` to issue a `SIGTERM` signal. This returns an exit code `0`. \ No newline at end of file +2. Cancel replication by entering `ctrl-c` to issue a `SIGTERM` signal. This returns an exit code `0`. +
\ No newline at end of file diff --git a/src/current/_includes/molt/molt-connection-strings.md b/src/current/_includes/molt/molt-connection-strings.md index c865d976b4d..11a4540b467 100644 --- a/src/current/_includes/molt/molt-connection-strings.md +++ b/src/current/_includes/molt/molt-connection-strings.md @@ -4,7 +4,7 @@ Define the connection strings for the [source](#source-connection-string) and [t The `--source` flag specifies the connection string for the source database: -{% if page.name != "migrate-bulk-load.md" %} +{% if page.name contains "delta" %} {{site.data.alerts.callout_info}} The source connection **must** point to the primary instance (PostgreSQL primary, MySQL primary/master, or Oracle primary). Replicas cannot provide the necessary replication checkpoints and transaction metadata required for ongoing replication. {{site.data.alerts.end}} diff --git a/src/current/_includes/molt/molt-drop-constraints-indexes.md b/src/current/_includes/molt/molt-drop-constraints-indexes.md index c360991eff5..7c98fc6bf54 100644 --- a/src/current/_includes/molt/molt-drop-constraints-indexes.md +++ b/src/current/_includes/molt/molt-drop-constraints-indexes.md @@ -1,11 +1,11 @@ To optimize data load performance, drop all non-`PRIMARY KEY` [constraints]({% link {{ site.current_cloud_version }}/alter-table.md %}#drop-constraint) and [indexes]({% link {{site.current_cloud_version}}/drop-index.md %}) on the target CockroachDB database before migrating: -{% if page.name == "molt-fetch.md" %} +{% if page.name == "molt-fetch-best-practices.md" %} - [`FOREIGN KEY`]({% link {{ site.current_cloud_version }}/foreign-key.md %}) - [`UNIQUE`]({% link {{ site.current_cloud_version }}/unique.md %}) - [Secondary indexes]({% link {{ site.current_cloud_version }}/schema-design-indexes.md %}) - [`CHECK`]({% link {{ site.current_cloud_version }}/check.md %}) - [`DEFAULT`]({% link {{ site.current_cloud_version }}/default-value.md %}) - - [`NOT NULL`]({% link {{ site.current_cloud_version 
}}/not-null.md %}) (you do not need to drop this constraint when using `drop-on-target-and-recreate` for [table handling](#target-table-handling)) + - [`NOT NULL`]({% link {{ site.current_cloud_version }}/not-null.md %}) (you do not need to drop this constraint when using `drop-on-target-and-recreate` for [table handling]({% link molt/molt-fetch.md %}#handle-target-tables)) {{site.data.alerts.callout_danger}} Do **not** drop [`PRIMARY KEY`]({% link {{ site.current_cloud_version }}/primary-key.md %}) constraints. diff --git a/src/current/_includes/molt/molt-limitations-fetch.md b/src/current/_includes/molt/molt-limitations-fetch.md new file mode 100644 index 00000000000..6b98096da7a --- /dev/null +++ b/src/current/_includes/molt/molt-limitations-fetch.md @@ -0,0 +1,25 @@ +- Only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded with [`--export-concurrency`]({% link molt/molt-fetch-best-practices.md %}#configure-the-source-database-and-connection). +- `GEOMETRY` and `GEOGRAPHY` types are not supported. + +{% if page.name contains "molt-fetch" %} +The following limitation is specific to PostgreSQL sources: + +- `OID LOB` types in PostgreSQL are not supported, although similar types like `BYTEA` are supported. +
+ +The following limitations are specific to Oracle sources: + +- Migrations must be performed from a single Oracle schema. You **must** include [`--schema-filter`]({% link molt/molt-fetch.md %}#select-data-to-migrate) so that MOLT Fetch only loads data from the specified schema. Refer to [Schema and table filtering]({% link molt/molt-fetch.md %}#select-data-to-migrate). + - Specifying [`--table-filter`]({% link molt/molt-fetch.md %}#select-data-to-migrate) is also strongly recommended to ensure that only necessary tables are migrated from the Oracle schema. +- Oracle advises against `LONG RAW` columns and [recommends converting them to `BLOB`](https://www.orafaq.com/wiki/LONG_RAW#History). `LONG RAW` can only store binary values up to 2GB, and only one `LONG RAW` column per table is supported. +{% else %} +
+- `OID LOB` types in PostgreSQL are not supported, although similar types like `BYTEA` are supported. +
+ +
+- Migrations must be performed from a single Oracle schema. You **must** include [`--schema-filter`]({% link molt/molt-fetch.md %}#select-data-to-migrate) so that MOLT Fetch only loads data from the specified schema. Refer to [Schema and table filtering]({% link molt/molt-fetch.md %}#select-data-to-migrate). + - Specifying [`--table-filter`]({% link molt/molt-fetch.md %}#select-data-to-migrate) is also strongly recommended to ensure that only necessary tables are migrated from the Oracle schema. +- Oracle advises against `LONG RAW` columns and [recommends converting them to `BLOB`](https://www.orafaq.com/wiki/LONG_RAW#History). `LONG RAW` can only store binary values up to 2GB, and only one `LONG RAW` column per table is supported. +
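To find `LONG RAW` columns that should be converted to `BLOB` before migration, a query along these lines can help (a sketch; substitute the owner of your migration schema for `MIGRATION_SCHEMA`):

~~~ sql
-- List LONG RAW columns in the schema to be migrated:
SELECT table_name, column_name
FROM all_tab_columns
WHERE owner = 'MIGRATION_SCHEMA' AND data_type = 'LONG RAW';
~~~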
+{% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-limitations-replicator.md b/src/current/_includes/molt/molt-limitations-replicator.md new file mode 100644 index 00000000000..16781d17bda --- /dev/null +++ b/src/current/_includes/molt/molt-limitations-replicator.md @@ -0,0 +1,39 @@ +- Replication modes require connection to the primary instance (PostgreSQL primary, MySQL primary/master, or Oracle primary). MOLT cannot obtain replication checkpoints or transaction metadata from replicas. +- Running DDL on the source or target while replication is in progress can cause replication failures. +- `TRUNCATE` operations on the source are not captured. Only `INSERT`, `UPDATE`, `UPSERT`, and `DELETE` events are replicated. +- Changes to virtual columns are not replicated automatically. To migrate these columns, you must define them explicitly with [transformation rules]({% link molt/molt-fetch.md %}#define-transformations). + +{% if page.name contains "molt-replicator" %} +The following limitation is specific to MySQL sources: + +- MySQL replication is supported only with [GTID](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html)-based configurations. Binlog-based features that do not use GTID are not supported. + + +The following limitations are specific to Oracle sources: + +- Replication will not work for tables or column names exceeding 30 characters. This is a [limitation of Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4). +- The following data types are not supported for replication: + - User-defined types (UDTs) + - Nested tables + - `VARRAY` + - `LONGBLOB`/`CLOB` columns (over 4000 characters) +- If your Oracle workload executes `UPDATE` statements that modify only LOB columns, these `UPDATE` statements are not supported by Oracle LogMiner and will not be replicated. 
+- If you are using Oracle 11 and execute `UPDATE` statements on `XMLTYPE` or LOB columns, those changes are not supported by Oracle LogMiner and will be excluded from ongoing replication. +- If you are migrating LOB columns from Oracle 12c, use [AWS DMS Binary Reader](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.CDC) instead of LogMiner. Oracle LogMiner does not support LOB replication in 12c. +{% else %} +
+- MySQL replication is supported only with [GTID](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html)-based configurations. Binlog-based features that do not use GTID are not supported. +
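You can confirm that a MySQL source is eligible by checking that GTID mode is enabled (a minimal sketch):

~~~ sql
-- Both values should be ON for GTID-based replication:
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';
~~~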
+ +
+- Replication will not work for tables or column names exceeding 30 characters. This is a [limitation of Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4). +- The following data types are not supported for replication: + - User-defined types (UDTs) + - Nested tables + - `VARRAY` + - `LONGBLOB`/`CLOB` columns (over 4000 characters) +- If your Oracle workload executes `UPDATE` statements that modify only LOB columns, these `UPDATE` statements are not supported by Oracle LogMiner and will not be replicated. +- If you are using Oracle 11 and execute `UPDATE` statements on `XMLTYPE` or LOB columns, those changes are not supported by Oracle LogMiner and will be excluded from ongoing replication. +- If you are migrating LOB columns from Oracle 12c, use [AWS DMS Binary Reader](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.CDC) instead of LogMiner. Oracle LogMiner does not support LOB replication in 12c. +
+{% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-limitations-verify.md b/src/current/_includes/molt/molt-limitations-verify.md new file mode 100644 index 00000000000..0ef97eb5462 --- /dev/null +++ b/src/current/_includes/molt/molt-limitations-verify.md @@ -0,0 +1,14 @@ +- MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. To configure the row batch size, use the `--row_batch_size` [flag]({% link molt/molt-verify.md %}#flags). +- MOLT Verify checks for collation mismatches on [primary key]({% link {{site.current_cloud_version}}/primary-key.md %}) columns. This may cause validation to fail when a [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) is used as a primary key and the source and target databases are using different [collations]({% link {{site.current_cloud_version}}/collate.md %}). +- MOLT Verify might give an error in case of schema changes on either the source or target database. +- [Geospatial types]({% link {{site.current_cloud_version}}/spatial-data-overview.md %}#spatial-objects) cannot yet be compared. + +{% if page.name contains "molt-verify" %} +The following limitation is specific to MySQL sources: + +- MOLT Verify only supports comparing one MySQL database to a whole CockroachDB schema (which is assumed to be `public`). +{% else %} +
+- MOLT Verify only supports comparing one MySQL database to a whole CockroachDB schema (which is assumed to be `public`). +
+{% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-limitations.md b/src/current/_includes/molt/molt-limitations.md index 4e41fb29b93..b529137bade 100644 --- a/src/current/_includes/molt/molt-limitations.md +++ b/src/current/_includes/molt/molt-limitations.md @@ -1,41 +1,16 @@ ### Limitations -#### Fetch limitations +#### MOLT Fetch limitations -
-- `OID LOB` types in PostgreSQL are not supported, although similar types like `BYTEA` are supported. -
+{% include molt/molt-limitations-fetch.md %} -
-- Migrations must be performed from a single Oracle schema. You **must** include [`--schema-filter`](#schema-and-table-filtering) so that MOLT Fetch only loads data from the specified schema. Refer to [Schema and table filtering](#schema-and-table-filtering). - - Specifying [`--table-filter`](#schema-and-table-filtering) is also strongly recommended to ensure that only necessary tables are migrated from the Oracle schema. -- Oracle advises against `LONG RAW` columns and [recommends converting them to `BLOB`](https://www.orafaq.com/wiki/LONG_RAW#History). `LONG RAW` can only store binary values up to 2GB, and only one `LONG RAW` column per table is supported. -
+{% if page.name contains "delta" %} +#### MOLT Replicator limitations -- Only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded with [`--export-concurrency`]({% link molt/molt-fetch.md %}#best-practices). +{% include molt/molt-limitations-replicator.md %} -{% if page.name != "migrate-bulk-load.md" %} -#### Replicator limitations +{% endif %} -- Replication modes require connection to the primary instance (PostgreSQL primary, MySQL primary/master, or Oracle primary). MOLT cannot obtain replication checkpoints or transaction metadata from replicas. +#### MOLT Verify limitations -
-- MySQL replication is supported only with [GTID](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html)-based configurations. Binlog-based features that do not use GTID are not supported. -
- -
-- Replication will not work for tables or column names exceeding 30 characters. This is a [limitation of Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html#GUID-7594F0D7-0ACD-46E6-BD61-2751136ECDB4). -- The following data types are not supported for replication: - - User-defined types (UDTs) - - Nested tables - - `VARRAY` - - `LONGBLOB`/`CLOB` columns (over 4000 characters) -- If your Oracle workload executes `UPDATE` statements that modify only LOB columns, these `UPDATE` statements are not supported by Oracle LogMiner and will not be replicated. -- If you are using Oracle 11 and execute `UPDATE` statements on `XMLTYPE` or LOB columns, those changes are not supported by Oracle LogMiner and will be excluded from ongoing replication. -- If you are migrating LOB columns from Oracle 12c, use [AWS DMS Binary Reader](https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.CDC) instead of LogMiner. Oracle LogMiner does not support LOB replication in 12c. -
- -- Running DDL on the source or target while replication is in progress can cause replication failures. -- `TRUNCATE` operations on the source are not captured. Only `INSERT`, `UPDATE`, `UPSERT`, and `DELETE` events are replicated. -- Changes to virtual columns are not replicated automatically. To migrate these columns, you must define them explicitly with [transformation rules]({% link molt/molt-fetch.md %}#transformations). -{% endif %} \ No newline at end of file +{% include molt/molt-limitations-verify.md %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-setup.md b/src/current/_includes/molt/molt-setup.md deleted file mode 100644 index 9b54d1dc5e6..00000000000 --- a/src/current/_includes/molt/molt-setup.md +++ /dev/null @@ -1,98 +0,0 @@ -
- - - -
- - -## Before you begin - -- Create a CockroachDB [{{ site.data.products.cloud }}]({% link cockroachcloud/create-your-cluster.md %}) or [{{ site.data.products.core }}]({% link {{ site.current_cloud_version }}/install-cockroachdb-mac.md %}) cluster. -- Install the [MOLT (Migrate Off Legacy Technology)]({% link releases/molt.md %}#installation) tools. -- Review the [Fetch]({% link molt/molt-fetch.md %}#best-practices) and {% if page.name != "migrate-bulk-load.md" %}[Replicator]({% link molt/molt-replicator.md %}#best-practices){% endif %} best practices. -- Review [Migration Strategy]({% link molt/migration-strategy.md %}). - -
-{% include molt/oracle-migration-prerequisites.md %} -
- -{% include molt/molt-limitations.md %} - -## Prepare the source database - -{% include molt/migration-prepare-database.md %} - -## Prepare the target database - -### Define the target tables - -{% include molt/migration-prepare-schema.md %} - -### Create the SQL user - -{% include molt/migration-create-sql-user.md %} - -{% if page.name != "migrate-bulk-load.md" %} -### Configure GC TTL - -Before starting the [initial data load](#start-fetch), configure the [garbage collection (GC) TTL]({% link {{ site.current_cloud_version }}/configure-replication-zones.md %}#gc-ttlseconds) on the source CockroachDB cluster to ensure that historical data remains available when replication begins. The GC TTL must be long enough to cover the full duration of the data load. - -Increase the GC TTL before starting the data load. For example, to set the GC TTL to 24 hours: - -{% include_cached copy-clipboard.html %} -~~~ sql -ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = 86400; -~~~ - -{{site.data.alerts.callout_info}} -The GC TTL duration must be higher than your expected time for the initial data load. -{{site.data.alerts.end}} - -Once replication has started successfully (which automatically protects its own data range), you can restore the GC TTL to its original value. For example, to restore to 5 minutes: - -{% include_cached copy-clipboard.html %} -~~~ sql -ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = 300; -~~~ - -For details, refer to [Protect Changefeed Data from Garbage Collection]({% link {{ site.current_cloud_version }}/protect-changefeed-data.md %}). -{% endif %} - -## Configure Fetch - -When you run `molt fetch`, you can configure the following options for data load: - -- [Connection strings](#connection-strings): Specify URL‑encoded source and target connections. -- [Intermediate file storage](#intermediate-file-storage): Export data to cloud storage or a local file server. 
-- [Table handling mode](#table-handling-mode): Determine how existing target tables are initialized before load. -- [Schema and table filtering](#schema-and-table-filtering): Specify schema and table names to migrate. -- [Data load mode](#data-load-mode): Choose between `IMPORT INTO` and `COPY FROM`. -- [Fetch metrics](#fetch-metrics): Configure metrics collection during initial data load. - -
- - - -
- -### Connection strings - -{% include molt/molt-connection-strings.md %} - -### Intermediate file storage - -{% include molt/fetch-intermediate-file-storage.md %} - -### Table handling mode - -{% include molt/fetch-table-handling.md %} - -### Schema and table filtering - -{% include molt/fetch-schema-table-filtering.md %} - -### Data load mode - -{% include molt/fetch-data-load-modes.md %} - -{% include molt/fetch-metrics.md %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-troubleshooting-failback.md b/src/current/_includes/molt/molt-troubleshooting-failback.md index 26d68f40da9..fee32c13c4a 100644 --- a/src/current/_includes/molt/molt-troubleshooting-failback.md +++ b/src/current/_includes/molt/molt-troubleshooting-failback.md @@ -20,7 +20,7 @@ This error occurs when the [CockroachDB changefeed]({% link {{ site.current_clou transient error: 400 Bad Request: unknown schema: ~~~ -The webhook URL path is specified in the `INTO` clause when you [create the changefeed]({% link molt/migrate-failback.md %}#create-the-cockroachdb-changefeed). For example: `webhook-https://replicator-host:30004/database/schema`. +The webhook URL path is specified in the `INTO` clause when you create the changefeed. For example: `webhook-https://replicator-host:30004/database/schema`. 
**Resolution:** Verify the webhook path format matches your target database type: diff --git a/src/current/_includes/molt/molt-troubleshooting-fetch.md b/src/current/_includes/molt/molt-troubleshooting-fetch.md index 31c20563098..d171ff278cc 100644 --- a/src/current/_includes/molt/molt-troubleshooting-fetch.md +++ b/src/current/_includes/molt/molt-troubleshooting-fetch.md @@ -1,6 +1,6 @@ ### Fetch issues -##### Fetch exits early due to mismatches +#### Fetch exits early due to mismatches When run in `none` or `truncate-if-exists` mode, `molt fetch` exits early in the following cases, and will output a log with a corresponding `mismatch_tag` and `failable_mismatch` set to `true`: @@ -59,7 +59,7 @@ If you receive `ORA-01950: no privileges on tablespace 'USERS'`, it means the Or ALTER USER migration_schema QUOTA UNLIMITED ON USERS; ~~~ -##### No tables to drop and recreate on target +#### No tables to drop and recreate on target When expecting a bulk load but seeing `no tables to drop and recreate on the target`, ensure the migration user has `SELECT` and `FLASHBACK` privileges on each table to be migrated. For example: @@ -69,11 +69,11 @@ GRANT SELECT, FLASHBACK ON migration_schema.payments TO C##MIGRATION_USER; GRANT SELECT, FLASHBACK ON migration_schema.orders TO C##MIGRATION_USER; ~~~ -##### Table or view does not exist +#### Table or view does not exist -If the Oracle migration user lacks privileges on certain tables, you may receive errors stating that the table or view does not exist. Either use `--table-filter` to {% if page.name != "migrate-load-replicate.md" %}[limit the tables to be migrated]({% link molt/migrate-load-replicate.md %}#schema-and-table-filtering){% else %}[limit the tables to be migrated](#schema-and-table-filtering){% endif %}, or grant the migration user `SELECT` privileges on all objects in the schema. 
Refer to {% if page.name != "migrate-load-replicate.md" %}[Create migration user on source database]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database){% else %}[Create migration user on source database](#create-migration-user-on-source-database){% endif %}. +If the Oracle migration user lacks privileges on certain tables, you may receive errors stating that the table or view does not exist. Either use `--table-filter` to {% if page.name contains "delta" %}[limit the tables to be migrated](#schema-and-table-filtering){% else %}[limit the tables to be migrated]({% link molt/molt-fetch.md %}#schema-and-table-selection){% endif %}, or grant the migration user `SELECT` privileges on all objects in the schema. Refer to {% if page.name contains "delta" %}[Create migration user on source database](#create-migration-user-on-source-database){% else %}[Create migration user on source database]({% link {{site.current_cloud_version}}/create-user.md %}){% endif %}. -##### Oracle sessions remain open after forcefully stopping `molt` or `replicator` +#### Oracle sessions remain open after forcefully stopping `molt` or `replicator` If you shut down `molt` or `replicator` unexpectedly (e.g., with `kill -9` or a system crash), Oracle sessions opened by these tools may remain active. 
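To locate and clean up such sessions, something like the following can be run as a privileged Oracle user (a sketch; it assumes the migration user is `C##MIGRATION_USER`):

~~~ sql
-- Find sessions still held open by the migration user:
SELECT sid, serial#, status
FROM v$session
WHERE username = 'C##MIGRATION_USER';

-- Terminate a lingering session using the sid and serial# values returned above:
ALTER SYSTEM KILL SESSION '<sid>,<serial#>' IMMEDIATE;
~~~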
diff --git a/src/current/_includes/molt/molt-troubleshooting-replication.md b/src/current/_includes/molt/molt-troubleshooting-replication.md index 431cb1c8b66..7f01a9680e6 100644 --- a/src/current/_includes/molt/molt-troubleshooting-replication.md +++ b/src/current/_includes/molt/molt-troubleshooting-replication.md @@ -84,7 +84,7 @@ could not connect to source database: failed to connect to `user=migration_user run SELECT pg_create_logical_replication_slot('molt_slot', 'pgoutput'); in source database ~~~ -**Resolution:** {% if page.name != "migrate-load-replicate.md" %}[Create the replication slot]({% link molt/migrate-load-replicate.md %}#configure-source-database-for-replication){% else %}[Create the replication slot](#configure-source-database-for-replication){% endif %} or verify the correct slot name: +**Resolution:** {% if page.name contains "delta" %}[Create the replication slot](#configure-source-database-for-replication){% else %}[Create the replication slot]({% link molt/delta-migration-postgres.md %}#configure-source-database-for-replication){% endif %} or verify the correct slot name: {% include_cached copy-clipboard.html %} ~~~ sql @@ -92,7 +92,7 @@ SELECT pg_create_logical_replication_slot('molt_slot', 'pgoutput'); ~~~ -{% if page.name == "migrate-resume-replication.md" %} +{% if page.name contains "delta" %} ##### Resuming from stale location
@@ -177,7 +177,7 @@ Invalid GTIDs can occur when GTIDs are purged due to insufficient binlog retenti **Resolution:** Use a valid GTID from `SHOW MASTER STATUS` (MySQL < 8.0) or `SHOW BINARY LOG STATUS` (MySQL 8.0+) and ensure you're connecting to the primary host. If GTIDs are being purged, increase binlog retention. -{% if page.name == "migrate-resume-replication.md" %} +{% if page.name contains "delta" %} ##### Stale GTID from cache **Resolution:** Clear the `_replicator` database memo table: diff --git a/src/current/_includes/molt/optimize-replicator-performance.md b/src/current/_includes/molt/optimize-replicator-performance.md index 3c6ce6f4aad..1878bf6c634 100644 --- a/src/current/_includes/molt/optimize-replicator-performance.md +++ b/src/current/_includes/molt/optimize-replicator-performance.md @@ -14,4 +14,4 @@ The following parameters apply to PostgreSQL, Oracle, and CockroachDB (failback) | [`--enableParallelApplies`]({% link molt/replicator-flags.md %}#enable-parallel-applies) | Improve apply throughput for independent tables and table groups that share foreign key dependencies. Increases memory and target connection usage, so ensure you increase [`--targetMaxPoolSize`]({% link molt/replicator-flags.md %}#target-max-pool-size) or reduce [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism). | | [`--flushPeriod`]({% link molt/replicator-flags.md %}#flush-period) | Set to the maximum allowable time between flushes (for example, `10s` if data must be applied within 10 seconds). Works with [`--flushSize`]({% link molt/replicator-flags.md %}#flush-size) to control when buffered mutations are committed to the target. | | [`--quiescentPeriod`]({% link molt/replicator-flags.md %}#quiescent-period) | Lower this value if constraint violations resolve quickly on your workload to make retries more frequent and reduce latency. Do not lower if constraint violations take time to resolve. 
| -| [`--scanSize`]({% link molt/replicator-flags.md %}#scan-size) | Applies to {% if page.name != "migrate-failback".md" %}[failback]({% link molt/migrate-failback.md %}){% else %}failback{% endif %} (`replicator start`) scenarios **only**. Balance memory usage and throughput. Increase to read more rows at once from the CockroachDB staging cluster for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability. | \ No newline at end of file +| [`--scanSize`]({% link molt/replicator-flags.md %}#scan-size) | Applies to [failback]({% link molt/molt-replicator.md %}#failback-replication) (`replicator start`) scenarios **only**. Balance memory usage and throughput. Increase to read more rows at once from the CockroachDB staging cluster for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability. | \ No newline at end of file diff --git a/src/current/_includes/molt/phased-bulk-load-all-sources.md b/src/current/_includes/molt/phased-bulk-load-all-sources.md new file mode 100644 index 00000000000..6d97972e925 --- /dev/null +++ b/src/current/_includes/molt/phased-bulk-load-all-sources.md @@ -0,0 +1,337 @@ +A [*Phased Bulk Load Migration*]({% link molt/migration-approach-phased-bulk-load.md %}) involves [migrating data to CockroachDB]({% link molt/migration-overview.md %}) in several phases. Data can be sliced per tenant, per service, per region, or per table to suit the needs of the migration. In this approach, you stop application traffic to the source database _only_ for the tables in a particular slice of data. You then migrate that phase of data to the target cluster using [MOLT Fetch]({% link molt/molt-fetch.md %}) during a **downtime window**. Application traffic is then cut over to those target tables after schema finalization and data verification. This process is repeated for each phase of data. 
+
+- Data is migrated to the target [in phases]({% link molt/migration-considerations-granularity.md %}).
+
+- This approach does not utilize [continuous replication]({% link molt/migration-considerations-replication.md %}).
+
+- [Rollback]({% link molt/migration-considerations-rollback.md %}) is manual. If you wish to roll back before the target has received any writes that are not present on the source database, nothing needs to be done. If you wish to roll back after the target has received writes that are not present on the source database, you must manually replicate these new rows on the source.
+
+This approach is comparable to the [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}), but dividing the data into multiple phases allows each downtime window to be shorter and each phase of the migration to be less complex. Depending on how you divide the data, it may also allow your downtime windows to affect only a subset of users. For example, dividing the data per region could mean that, when migrating the data from Region A, application usage in Region B may remain unaffected. However, this approach can increase overall migration complexity: the migration takes longer overall, you must do the work of partitioning the data, and you run both the source and the target database concurrently for a longer period.
+
+This approach is best for databases that are too large to migrate all at once, internal tools, dev/staging environments, and production environments that can handle business disruption. It can only be performed if your system can handle downtime for each migration phase, and if your source database can easily be divided into the phases you need.
+
+This page describes an example scenario. While the commands provided can be copy-and-pasted, they may need to be altered or reconsidered to suit the needs of your specific environment.
+
+Phased Bulk Load Migration flow +
+
+## Example scenario
+
+You have a moderately sized (500 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You will divide this migration into four geographic regions (A, B, C, and D). You schedule a maintenance window for each region over four subsequent evenings, and you announce them to your users (per region) several weeks in advance.
+
+The application runs on a Kubernetes cluster with an NGINX Ingress Controller.
+
+**Estimated system downtime:** 4 hours per region.
+
+## Before the migration
+
+- Install the [MOLT (Migrate Off Legacy Technology)]({% link molt/molt-fetch-installation.md %}#installation) tools.
+- Review the [MOLT Fetch]({% link molt/molt-fetch-best-practices.md %}) documentation.
+- [Develop a migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan) and [prepare for the migration]({% link molt/migration-strategy.md %}#prepare-for-migration).
+- **Recommended:** Perform a dry run of this full set of instructions in a development environment that closely resembles your production environment. This can help you get a realistic sense of the time and complexity it requires.
+- Announce the maintenance window to your users on a per-region basis.
+- Understand the prerequisites and limitations of the MOLT tools:
+
+{% include molt/oracle-migration-prerequisites.md %} +
+ +{% include molt/molt-limitations.md %} + + +## Step 1: Prepare the source database + +In this step, you will: + +- [Create a dedicated migration user on your source database](#create-migration-user-on-source-database). + +{% include molt/migration-prepare-database.md %} + +## Step 2: Prepare the target database + +In this step, you will: + +- [Provision and run a new CockroachDB cluster](#provision-a-cockroachdb-cluster). +- [Define the tables on the target cluster](#define-the-target-tables) to match those on the source. +- [Create a SQL user on the target cluster](#create-the-sql-user) with the necessary write permissions. + +### Provision a CockroachDB cluster + +Use one of the following options to create and run a new CockroachDB cluster. This is your migration **target**. + +#### Option 1: Create a secure cluster locally + +If you have the CockroachDB binary installed locally, you can manually deploy a multi-node, self-hosted CockroachDB cluster on your local machine. + +Learn how to [deploy a CockroachDB cluster locally]({% link {{ site.versions["stable"] }}/secure-a-cluster.md %}). + +#### Option 2: Create a CockroachDB Self-Hosted cluster on AWS + +You can manually deploy a multi-node, self-hosted CockroachDB cluster on Amazon's AWS EC2 platform, using AWS's managed load-balancing service to distribute client traffic. + +Learn how to [deploy a CockroachDB cluster on AWS]({% link {{ site.versions["stable"] }}/deploy-cockroachdb-on-aws.md %}). + +#### Option 3: Create a CockroachDB Cloud cluster + +CockroachDB Cloud is a fully-managed service run by Cockroach Labs, which simplifies the deployment and management of CockroachDB. + +[Sign up for a CockroachDB Cloud account](https://cockroachlabs.cloud) and [create a cluster]({% link cockroachcloud/create-your-cluster.md %}) using [trial credits]({% link cockroachcloud/free-trial.md %}). 
+
+### Define the target tables
+
+{% include molt/migration-prepare-schema.md %}
+
+### Create the SQL user
+
+{% include molt/migration-create-sql-user.md %}
+
+## Migrating each phase
+
+Steps 3-7 are run for each phase of the data migration. Within the first migration downtime window, you will run through these steps for Region A. You will repeat these steps for the other regions during each subsequent downtime window.
+
+## Step 3: Stop application traffic
+
+With both the source and target databases prepared for the data load, it's time to stop application traffic to the source for a particular region.
+
+If the Kubernetes cluster that deploys the application has per-region deployments (for example, `app-us`, `app-eu`, `app-apac`), you can scale down only the deployment for that region.
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+kubectl scale deploy/app-eu --replicas=0
+~~~
+
+Alternatively, this can be handled by the NGINX Ingress Controller by adding the following to your Ingress configuration, ensuring that the conditional statement is suitable for your deployment:
+
+{% include_cached copy-clipboard.html %}
+~~~ yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: app
+  annotations:
+    nginx.ingress.kubernetes.io/server-snippet: |
+      if ($http_x_region = "eu") {
+        return 503;
+      }
+spec:
+  ingressClassName: nginx
+  rules:
+  - host: api.example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: app
+            port:
+              number: 80
+~~~
+
+{{ site.data.alerts.callout_danger }}
+Application downtime begins now, for users in the given region.
+
+It is strongly recommended that you perform a dry run of this migration in a test environment. This will allow you to practice using the MOLT tools in real time, and it will give you an accurate sense of how long application downtime might last.
+{{ site.data.alerts.end }} + +## Step 4: Load data into CockroachDB + +In this step, you will: + +- [Configure MOLT Fetch with the flags needed for your migration](#configure-molt-fetch). +- [Run MOLT Fetch](#run-molt-fetch). +- [Understand how to continue a load after an interruption](#continue-molt-fetch-after-an-interruption). + +### Configure MOLT Fetch + +The [MOLT Fetch documentation]({% link molt/molt-fetch.md %}) includes detailed information about how to [configure MOLT Fetch]({% link molt/molt-fetch.md %}#run-molt-fetch), and how to [monitor MOLT Fetch metrics]({% link molt/molt-fetch-monitoring.md %}). + +When you run `molt fetch`, you can configure the following options for data load: + + + + + + + + + + + +- [Specify source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases): Specify URL‑encoded source and target connections. +- [Select data to migrate]({% link molt/molt-fetch.md %}#select-data-to-migrate): Specify schema and table names to migrate. **Important for a phased migration.** +- [Define intermediate file storage]({% link molt/molt-fetch.md %}#define-intermediate-storage): Export data to cloud storage or a local file server. +- [Define fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode): Specifies whether data will only be loaded into/from intermediate storage. +- [Shard tables]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export): Divide larger tables into multiple shards during data export. +- [Data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from): Choose between `IMPORT INTO` and `COPY FROM`. +- [Table handling mode]({% link molt/molt-fetch.md %}#handle-target-tables): Determine how existing target tables are initialized before load. +- [Define data transformations]({% link molt/molt-fetch.md %}#define-transformations): Define any row-level transformations to apply to the data before it reaches the target. 
+- [Monitor fetch metrics]({% link molt/molt-fetch-monitoring.md %}): Configure metrics collection during initial data load.
+
+Read through the documentation to understand how to configure your `molt fetch` command and its flags. Follow [best practices]({% link molt/molt-fetch-best-practices.md %}), especially those related to security.
+
+At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags. For a phased migration, you may also choose to include [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) or [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) flags:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--table-filter '.*user.*' \
+--bucket-path 's3://bucket/path' \
+--ignore-replication-check
+~~~
+
+However, depending on the needs of your migration, you may set many more flags, and you may need to prepare some accompanying `.json` files.
+
+### Run MOLT Fetch
+
+Perform the bulk load of the source data.
+
+1. Run the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data into CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It limits the migration to a single schema and filters for three specific tables. The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`. Include the `--ignore-replication-check` flag to skip replication checkpoint queries, which eliminates the need to configure the source database for logical replication.
+
+<section class="filter-content" markdown="1" data-scope="postgres">
+ {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --ignore-replication-check + ~~~ +
+ +
+ {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --ignore-replication-check + ~~~ +
+ +
+ The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string. + + {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --source-cdb $SOURCE_CDB \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --ignore-replication-check + ~~~ +
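+
+The fetch commands above pass `$SOURCE` and `$TARGET` as environment variables, and both must be URL-encoded connection strings. As a minimal sketch (the credentials, host, and database name here are hypothetical), you can percent-encode a password with Python's standard library before exporting the variable:
+
```python
from urllib.parse import quote

# Hypothetical credentials; substitute your own.
user = "migration_user"
password = "p@ssw0rd!"  # contains characters that must be percent-encoded
host = "source-db.example.com"

# safe="" ensures reserved characters such as '@' and '/' are encoded too.
source_url = f"postgres://{user}:{quote(password, safe='')}@{host}:5432/migration_db"
print(f"export SOURCE='{source_url}'")
```
+
+Paste the printed `export` line into your shell (and build `$TARGET` the same way) so that `molt fetch` reads a correctly encoded connection string.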
+
+{% include molt/fetch-data-load-output.md %}
+
+### Continue MOLT Fetch after an interruption
+
+{% include molt/fetch-continue-after-interruption.md %}
+
+## Step 5: Verify the data
+
+In this step, you will use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and target data is consistent. This ensures that the data load was successful. Use MOLT Verify's [`--schema-filter`]({% link molt/molt-verify.md %}#flags) or [`--table-filter`]({% link molt/molt-verify.md %}#flags) to select only the tables that are relevant for the given phase.
+
+### Run MOLT Verify
+
+{% include molt/verify-output.md %}
+
+## Step 6: Finalize the target schema
+
+### Add constraints and indexes
+
+{% include molt/migration-modify-target-schema.md %}
+
+## Step 7: Cut over application traffic
+
+With the target cluster verified and finalized, it's time to resume application traffic for the current migration phase.
+
+### Modify application code
+
+In the application back end, update the application to route traffic for this migration phase to the CockroachDB cluster. A simple example:
+
+~~~yml
+env:
+  - name: DATABASE_URL_US_EAST
+    value: postgres://root@cockroachdb.us-east:26257/defaultdb?sslmode=verify-full
+  - name: DATABASE_URL_US_WEST
+    value: postgres://legacy-db.us-west:5432/defaultdb  # Still on source
+~~~
+
+In your application code, route database connections based on the user's region:
+
+~~~python
+import os
+
+def get_db_connection(user_region):
+    if user_region == "us-east":
+        return os.getenv("DATABASE_URL_US_EAST")  # CockroachDB
+    else:
+        return os.getenv("DATABASE_URL_US_WEST")  # Source database
+~~~
+
+### Resume application traffic
+
+If you halted traffic by scaling down a regional Kubernetes deployment, scale it back up.
+
+{% include_cached copy-clipboard.html %}
+~~~shell
+kubectl scale deploy/app-eu --replicas=3
+~~~
+
+Or, if this was handled by the NGINX Ingress Controller, remove the 503 block that was added in step 3:
+
+{% include_cached copy-clipboard.html %}
+~~~yml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: app
+# annotations:
+#   nginx.ingress.kubernetes.io/server-snippet: |
+#     if ($http_x_region = "eu") {
+#       return 503;
+#     }
spec:
+  ingressClassName: nginx
+  rules:
+  - host: api.example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: app
+            port:
+              number: 80
+~~~
+
+This ends downtime for the current migration phase.
+
+## Repeat for each phase
+
+During the next scheduled, regional downtime window, [return to step 3](#step-3-stop-application-traffic) to migrate the next phase of data. Repeat steps 3-7 for each phase of data, until every region's data has been migrated and all application traffic has been cut over to the target.
+
+## Troubleshooting
+
+{% include molt/molt-troubleshooting-fetch.md %}
+
+## See also
+
+- [Migration Overview]({% link molt/migration-overview.md %})
+- [Migration Considerations]({% link molt/migration-considerations.md %})
+- [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %})
+- [MOLT Fetch]({% link molt/molt-fetch.md %})
+- [MOLT Verify]({% link molt/molt-verify.md %})
\ No newline at end of file
diff --git a/src/current/_includes/molt/phased-delta-failback-all-sources.md b/src/current/_includes/molt/phased-delta-failback-all-sources.md
new file mode 100644
index 00000000000..776fc39316a
--- /dev/null
+++ b/src/current/_includes/molt/phased-delta-failback-all-sources.md
@@ -0,0 +1,1040 @@
+A [*Phased Delta Migration with Failback Replication*]({% link molt/migration-approach-phased-delta-failback.md %}) involves [migrating data to CockroachDB]({% link molt/migration-overview.md %}) in several phases.
Data can be sliced per tenant, per service, per region, or per table to suit the needs of the migration. **For each given migration phase**, you use [MOLT Fetch]({% link molt/molt-fetch.md %}) to perform an initial bulk load of the data, you use [MOLT Replicator]({% link molt/molt-replicator.md %}) to update the target database via forward replication and to activate failback replication, and then you cut over application traffic to CockroachDB after schema finalization and data verification. This process is repeated for each phase of data. + +- Data is migrated to the target [in phases]({% link molt/migration-considerations-granularity.md %}). + +- This approach utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Rollback]({% link molt/migration-considerations-rollback.md %}) is achieved via failback replication. + +This approach is comparable to the [Delta Migration]({% link molt/migration-approach-delta.md %}), but dividing the data into multiple phases allows each downtime window to be shorter, and it allows each phase of the migration to be less complex. Depending on how you divide the data, it also may allow your downtime windows to affect only a subset of users. For example, dividing the data per region could mean that, when migrating the data from Region A, application usage in Region B may remain unaffected. This approach may increase overall migration complexity: its duration is longer, you will need to do the work of partitioning the data, and you will have a longer period when you run both the source and the target database concurrently. + +[Failback replication]({% link molt/migration-considerations-rollback.md %}) keeps the source database up to date with changes that occur in the target database once the target database begins receiving write traffic. 
Failback replication ensures that, if something goes wrong during the migration process, traffic can easily be returned to the source database without data loss. Like forward replication, in this approach, failback replication is run on a per-phase basis. It can persist indefinitely, until you're comfortable maintaining the target database as your sole data store. + +This approach is best for databases that are too large to migrate all at once, and in business cases where downtime must be minimal. It's also suitable for risk-averse situations in which a safe rollback path must be ensured. It can only be performed if your team can handle the complexity of this approach, and if your source database can easily be divided into the phases you need. + +This page describes an example scenario. While the commands provided can be copy-and-pasted, they may need to be altered or reconsidered to suit the needs of your specific environment. + +
+Phased Delta Migration flow +
+
+## Example scenario
+
+You have a moderately sized (500 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You will divide this migration into four geographic regions (A, B, C, and D).
+
+The application runs on a Kubernetes cluster with an NGINX Ingress Controller.
+
+Your business cannot accommodate major performance issues that might arise after the migration. Therefore, you want to enable failback replication so that you can easily return to using your original database with minimal interruption.
+
+**Estimated system downtime:** 3-5 minutes per region.
+
+## Before the migration
+
+- Install the [MOLT (Migrate Off Legacy Technology)]({% link molt/molt-fetch-installation.md %}#installation) tools.
+- Review the [MOLT Fetch]({% link molt/molt-fetch-best-practices.md %}) and [MOLT Replicator]({% link molt/molt-replicator.md %}) documentation.
+- [Develop a migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan) and [prepare for the migration]({% link molt/migration-strategy.md %}#prepare-for-migration).
+- **Recommended:** Perform a dry run of this full set of instructions in a development environment that closely resembles your production environment. This can help you get a realistic sense of the time and complexity it requires.
+- Understand the prerequisites and limitations of the MOLT tools:
+
+<section class="filter-content" markdown="1" data-scope="oracle">
+{% include molt/oracle-migration-prerequisites.md %} +
+
+{% include molt/molt-limitations.md %}
+
+## Step 1: Prepare the source database
+
+In this step, you will:
+
+- [Create a dedicated migration user on your source database](#create-migration-user-on-source-database).
+- [Configure the source database for replication](#configure-source-database-for-replication).
+
+{% include molt/migration-prepare-database.md %}
+
+## Step 2: Prepare the target database
+
+### Define the target tables
+
+{% include molt/migration-prepare-schema.md %}
+
+### Create the SQL user
+
+{% include molt/migration-create-sql-user.md %}
+
+### Configure GC TTL
+
+Before starting the [initial data load](#run-molt-fetch), configure the [garbage collection (GC) TTL]({% link {{ site.current_cloud_version }}/configure-replication-zones.md %}#gc-ttlseconds) on the target CockroachDB cluster to ensure that historical data remains available when replication begins. The GC TTL must be long enough to cover the full duration of the data load.
+
+Increase the GC TTL before starting the data load. For example, to set the GC TTL to 24 hours:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = 86400;
+~~~
+
+{{site.data.alerts.callout_info}}
+The GC TTL duration must be longer than the expected duration of the initial data load.
+{{site.data.alerts.end}}
+
+Once replication has started successfully (which automatically protects its own data range), you can restore the GC TTL to its original value. For example, to restore to 5 minutes:
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+ALTER DATABASE defaultdb CONFIGURE ZONE USING gc.ttlseconds = 300;
+~~~
+
+For details, refer to [Protect Changefeed Data from Garbage Collection]({% link {{ site.current_cloud_version }}/protect-changefeed-data.md %}).
+
+## Migrating each phase
+
+Steps 3-12 are run for each phase of the data migration. When migrating the first phase, you will run through these steps for Region A.
You will repeat these steps for the other regions during each subsequent migration phase.
+
+## Step 3: Load data into CockroachDB
+
+In this step, you will:
+
+- [Configure MOLT Fetch with the flags needed for your migration](#configure-molt-fetch).
+- [Run MOLT Fetch](#run-molt-fetch).
+- [Understand how to continue a load after an interruption](#continue-molt-fetch-after-an-interruption).
+
+### Configure MOLT Fetch
+
+The [MOLT Fetch documentation]({% link molt/molt-fetch.md %}) includes detailed information about how to [configure MOLT Fetch]({% link molt/molt-fetch.md %}#run-molt-fetch), and how to [monitor MOLT Fetch metrics]({% link molt/molt-fetch-monitoring.md %}).
+
+When you run `molt fetch`, you can configure the following options for data load:
+
+
+
+
+
+
+
+
+
+
+
+
+- [Specify source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases): Specify URL-encoded source and target connections.
+- [Select data to migrate]({% link molt/molt-fetch.md %}#select-data-to-migrate): Specify schema and table names to migrate. **Important for a phased migration.**
+- [Define intermediate file storage]({% link molt/molt-fetch.md %}#define-intermediate-storage): Export data to cloud storage or a local file server.
+- [Define fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode): Specify whether data is loaded only into, or only from, intermediate storage.
+- [Shard tables]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export): Divide larger tables into multiple shards during data export.
+- [Data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from): Choose between `IMPORT INTO` and `COPY FROM`.
+- [Table handling mode]({% link molt/molt-fetch.md %}#handle-target-tables): Determine how existing target tables are initialized before load.
+- [Define data transformations]({% link molt/molt-fetch.md %}#define-transformations): Define any row-level transformations to apply to the data before it reaches the target.
+- [Monitor fetch metrics]({% link molt/molt-fetch-monitoring.md %}): Configure metrics collection during initial data load.
+
+Read through the documentation to understand how to configure your `molt fetch` command and its flags. Follow [best practices]({% link molt/molt-fetch-best-practices.md %}), especially those related to security.
+
+At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags. For a phased migration, you may also choose to include [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) or [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) flags:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--table-filter '.*user.*' \
+--bucket-path 's3://bucket/path' \
+--ignore-replication-check
+~~~
+
+However, depending on the needs of your migration, you may set many more flags, and you may need to prepare some accompanying `.json` files.
+
+### Run MOLT Fetch
+
+
+
+
+Perform the initial load of the source data.
+
+1. Issue the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data to CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It also limits the migration to a single schema and filters for three specific tables to migrate.
The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`. + +
+ You **must** include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load. + + {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists \ + --pglogical-replication-slot-name molt_slot \ + --pglogical-publication-and-slot-drop-and-recreate + ~~~ +
+ +
+ {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --target $TARGET \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists + ~~~ +
+ +
+ The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string. + + {% include_cached copy-clipboard.html %} + ~~~ shell + molt fetch \ + --source $SOURCE \ + --source-cdb $SOURCE_CDB \ + --target $TARGET \ + --schema-filter 'migration_schema' \ + --table-filter 'employees|payments|orders' \ + --bucket-path 's3://migration/data/cockroach' \ + --table-handling truncate-if-exists + ~~~ +
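+
+The `--schema-filter` and `--table-filter` values in the commands above are treated as regular expressions matched against object names. As a rough way to preview which tables a pattern would select — using Python's `re` module as a stand-in for MOLT's actual matcher, with hypothetical table names:
+
```python
import re

# Hypothetical table names on the source database.
tables = ["employees", "payments", "orders", "audit_log", "user_sessions"]

def preview_filter(pattern, names):
    """Return the names that a filter regex would select (full match)."""
    return [n for n in names if re.fullmatch(pattern, n)]

print(preview_filter("employees|payments|orders", tables))
print(preview_filter(".*user.*", tables))
```
+
+Previewing a filter this way before running `molt fetch` helps confirm that a phase migrates exactly the tables you intend.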
+ +{% include molt/fetch-data-load-output.md %} + +### Continue MOLT Fetch after an interruption + +{% include molt/fetch-continue-after-interruption.md %} + +## Step 4: Verify the initial data load + +In this step, you will use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and target data is consistent. This ensures that the data load was successful. Use MOLT Verify's [`--schema-filter`]({% link molt/molt-verify.md %}#flags) or [`--table-filter`]({% link molt/molt-verify.md %}#flags) to select only the tables that are relevant for the given phase. + +### Run MOLT Verify + +{% include molt/verify-output.md %} + +## Step 5: Finalize the target schema + +### Add constraints and indexes + +{% include molt/migration-modify-target-schema.md %} + +## Step 6: Begin forward replication + +In this step, you will: + +- [Configure MOLT Replicator with the flags needed for your migration](#configure-molt-replicator-forward-replication). +- [Start MOLT Replicator](#start-molt-replicator-forward-replication). +- [Understand how to continue replication after an interruption](#continue-molt-replicator-after-an-interruption-forward-replication). + +### Configure MOLT Replicator (forward replication) + +When you run `replicator`, you can configure the following options for replication: + +- [Replication connection strings](#replication-connection-strings): Specify URL-encoded source and target database connections. +- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior. +
+- [Tuning parameters](#tuning-parameters): Optimize replication performance and resource usage. +
+- [Replicator metrics](#replicator-metrics): Monitor replication progress and performance. + +#### Replication connection strings + +MOLT Replicator uses `--sourceConn` and `--targetConn` to specify the source and target database connections. + +`--sourceConn` specifies the connection string of the source database: + +
+~~~ +--sourceConn 'postgresql://{username}:{password}@{host}:{port}/{database}' +~~~ +
+ +
+~~~ +--sourceConn 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}' +~~~ +
+ +
+~~~ +--sourceConn 'oracle://{username}:{password}@{host}:{port}/{service_name}' +~~~ + +For Oracle Multitenant databases, also specify `--sourcePDBConn` with the PDB connection string: + +~~~ +--sourcePDBConn 'oracle://{username}:{password}@{host}:{port}/{pdb_service_name}' +~~~ +
+ +`--targetConn` specifies the target CockroachDB connection string: + +~~~ +--targetConn 'postgresql://{username}:{password}@{host}:{port}/{database}' +~~~ + +{{site.data.alerts.callout_success}} +Follow best practices for securing connection strings. Refer to [Secure connections](#secure-connections). +{{site.data.alerts.end}} + +#### Replicator flags + +{% include molt/replicator-flags-usage.md %} + +
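+
+Usernames and passwords embedded in `--sourceConn` and `--targetConn` must be URL-encoded. A minimal sketch of assembling both values (the hosts, databases, and credentials are hypothetical):
+
```python
from urllib.parse import quote

def conn_string(scheme, user, password, host, port, database):
    """Assemble a connection URL, percent-encoding the user and password."""
    return (f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")

# Hypothetical values; substitute your own.
source = conn_string("postgresql", "migration_user", "s3cret/pwd", "pg.example.com", 5432, "migration_db")
target = conn_string("postgresql", "crdb_user", "s3cret/pwd", "crdb.example.com", 26257, "defaultdb")
print("--sourceConn", f"'{source}'")
print("--targetConn", f"'{target}'")
```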
+ +#### Tuning parameters + +{% include molt/optimize-replicator-performance.md %} +
+ +#### Replicator metrics + +MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: + +~~~ +--metricsAddr :30005 +~~~ + +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=postgres). +
+ +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=mysql). +
+ +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=oracle). +
+ +### Start MOLT Replicator (forward replication) + + + +With initial load complete, start replication of ongoing changes on the source to CockroachDB using [MOLT Replicator]({% link molt/molt-replicator.md %}). + +{{site.data.alerts.callout_info}} +MOLT Fetch captures a consistent point-in-time checkpoint at the start of the data load (shown as `cdc_cursor` in the fetch output). Starting replication from this checkpoint ensures that all changes made during and after the data load are replicated to CockroachDB, preventing data loss or duplication. The following steps use the checkpoint values from the fetch output to start replication at the correct position. +{{site.data.alerts.end}} + +
+ +Run the `replicator` command, using the same slot name that you specified with `--pglogical-replication-slot-name` and the publication name created by `--pglogical-publication-and-slot-drop-and-recreate` in the [Fetch command](#run-molt-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database: + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator pglogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.migration_schema \ +--slotName molt_slot \ +--publicationName molt_fetch \ +--stagingSchema defaultdb._replicator \ +--stagingCreateSchema \ +--metricsAddr :30005 \ +-v +~~~ +
+ +
+ +Run the `replicator` command, specifying the GTID from the [checkpoint recorded during data load](#run-molt-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database. If you [filtered tables during the initial load](#schema-and-table-filtering), [write a userscript to filter tables on replication]({% link molt/userscript-cookbook.md %}#filter-multiple-tables) and specify the path with `--userscript`. + + {% include_cached copy-clipboard.html %} + ~~~ shell + replicator mylogical \ + --sourceConn $SOURCE \ + --targetConn $TARGET \ + --targetSchema defaultdb.public \ + --defaultGTIDSet 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29 \ + --stagingSchema defaultdb._replicator \ + --stagingCreateSchema \ + --metricsAddr :30005 \ + --userscript table_filter.ts \ + -v + ~~~ + + {{site.data.alerts.callout_success}} + For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. + {{site.data.alerts.end}} +
+ +
+ +Run the `replicator` command, specifying the backfill and starting SCN from the [checkpoint recorded during data load](#run-molt-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database. If you [filtered tables during the initial load](#schema-and-table-filtering), [write a userscript to filter tables on replication]({% link molt/userscript-cookbook.md %}#filter-multiple-tables) and specify the path with `--userscript`. + + {% include_cached copy-clipboard.html %} + ~~~ shell + replicator oraclelogminer \ + --sourceConn $SOURCE \ + --sourcePDBConn $SOURCE_PDB \ + --targetConn $TARGET \ + --sourceSchema MIGRATION_USER \ + --targetSchema defaultdb.migration_schema \ + --backfillFromSCN 26685444 \ + --scn 26685786 \ + --stagingSchema defaultdb._replicator \ + --stagingCreateSchema \ + --metricsAddr :30005 \ + --userscript table_filter.ts \ + -v + ~~~ + + {{site.data.alerts.callout_info}} + When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. + {{site.data.alerts.end}} +
+ +#### Check that replication is working + +Verify that Replicator is processing changes successfully. To do so, check the MOLT Replicator logs. Since you enabled debug logging with `-v`, you should see connection and row processing messages: + +
+You should see periodic primary keepalive messages: + +~~~ +DEBUG [Aug 25 14:38:10] primary keepalive received ReplyRequested=false ServerTime="2025-08-25 14:38:09.556773 -0500 CDT" ServerWALEnd=0/49913A58 +DEBUG [Aug 25 14:38:15] primary keepalive received ReplyRequested=false ServerTime="2025-08-25 14:38:14.556836 -0500 CDT" ServerWALEnd=0/49913E60 +~~~ + +When rows are successfully replicated, you should see debug output like the following: + +~~~ +DEBUG [Aug 25 14:40:02] upserted rows conflicts=0 duration=7.855333ms proposed=1 target="\"molt\".\"public\".\"tbl1\"" upserted=1 +DEBUG [Aug 25 14:40:02] progressed to LSN: 0/49915DD0 +~~~ +
+ +
+You should see binlog syncer connection and row processing: + +~~~ +[2025/08/25 15:29:09] [info] binlogsyncer.go:463 begin to sync binlog from GTID set 77263736-7899-11f0-81a5-0242ac120002:1-38 +[2025/08/25 15:29:09] [info] binlogsyncer.go:409 Connected to mysql 8.0.43 server +INFO [Aug 25 15:29:09] connected to MySQL version 8.0.43 +~~~ + +When rows are successfully replicated, you should see debug output like the following: + +~~~ +DEBUG [Aug 25 15:29:38] upserted rows conflicts=0 duration=1.801ms proposed=1 target="\"molt\".\"public\".\"tbl1\"" upserted=1 +DEBUG [Aug 25 15:29:38] progressed to consistent point: 77263736-7899-11f0-81a5-0242ac120002:1-39 +~~~ +
+ +
+When transactions are read from the Oracle source, you should see registered transaction IDs (XIDs): + +~~~ +DEBUG [Jul 3 15:55:12] registered xid 0f001f0040060000 +DEBUG [Jul 3 15:55:12] registered xid 0b001f00bb090000 +~~~ + +When rows are successfully replicated, you should see debug output like the following: + +~~~ +DEBUG [Jul 3 15:55:12] upserted rows conflicts=0 duration=2.620009ms proposed=13 target="\"molt_movies\".\"USERS\".\"CUSTOMER_CONTACT\"" upserted=13 +DEBUG [Jul 3 15:55:12] upserted rows conflicts=0 duration=2.212807ms proposed=16 target="\"molt_movies\".\"USERS\".\"CUSTOMER_DEVICE\"" upserted=16 +~~~ +
+ +These messages confirm successful replication. You can disable verbose logging after verifying the connection. + +### Continue MOLT Replicator after an interruption (forward replication) + +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `pglogical` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator-forward-replication). + +Be sure to specify the same `--slotName` value that you used during your [initial replication command](#start-molt-replicator-forward-replication). The replication slot on the source PostgreSQL database automatically tracks the LSN (Log Sequence Number) checkpoint, so replication will resume from where it left off. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator pglogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.migration_schema \ +--slotName molt_slot \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +-v +~~~ +
+ +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `mylogical` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator-forward-replication). + +Replicator will automatically use the saved GTID (Global Transaction Identifier) from the `memo` table in the staging schema (in this example, `defaultdb._replicator.memo`) and track advancing GTID checkpoints there. To have Replicator start from a different GTID instead of resuming from the checkpoint, clear the `memo` table with `DELETE FROM defaultdb._replicator.memo;` and run the `replicator` command with a new `--defaultGTIDSet` value. + +{{site.data.alerts.callout_success}} +For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. +{{site.data.alerts.end}} + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator mylogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.public \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +--userscript table_filter.ts \ +-v +~~~ +
+ +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `oraclelogminer` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator-forward-replication). + +Replicator will automatically find the correct restart SCN (System Change Number) from the `_oracle_checkpoint` table in the staging schema. The restart point is determined by the non-committed row with the smallest `startscn` column value. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator oraclelogminer \ +--sourceConn $SOURCE \ +--sourcePDBConn $SOURCE_PDB \ +--sourceSchema MIGRATION_USER \ +--targetSchema defaultdb.migration_schema \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +--userscript table_filter.ts \ +-v +~~~ + +{{site.data.alerts.callout_info}} +When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. +{{site.data.alerts.end}} +
+
+Replication resumes from the last checkpoint without performing a fresh load. Monitor the metrics endpoint at `http://localhost:30005/_/varz` to track replication progress.
+
+## Step 7: Stop application traffic
+
+Once the initial data load has been verified and the target schema has been finalized, it's time to begin the cutover process. First, stop application traffic to the source for this particular region.
+
+If the Kubernetes cluster that deploys the application has per-region deployments (for example, `app-us`, `app-eu`, `app-apac`), you can scale down only the deployment for that region.
+
+{% include_cached copy-clipboard.html %}
+~~~shell
+kubectl scale deploy/app-eu --replicas=0
+~~~
+
+Alternatively, you can handle this with the NGINX Ingress Controller by adding the following to your NGINX configuration. Ensure that the conditional statement is suitable for your deployment:
+
+{% include_cached copy-clipboard.html %}
+~~~yml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: app
+  annotations:
+    nginx.ingress.kubernetes.io/server-snippet: |
+      if ($http_x_region = "eu") {
+        return 503;
+      }
+spec:
+  ingressClassName: nginx
+  rules:
+  - host: api.example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: app
+            port:
+              number: 80
+~~~
+
+{{ site.data.alerts.callout_danger }}
+Application downtime begins now for users in the given region.
+
+It is strongly recommended that you perform a dry run of this migration in a test environment. This will allow you to practice using the MOLT tools in real time, and it will give you an accurate sense of how long application downtime might last.
+{{ site.data.alerts.end }}
+
+## Step 8: Stop forward replication
+
+Before you can cut over traffic to the target, the changes to the source database need to finish being written to the target. Once the source is no longer receiving write traffic, MOLT Replicator will take a few seconds to finish replicating the final changes. 
This is known as _drainage_. + +{% include molt/migration-stop-replication.md %} + +## Step 9: Verify the replicated data + +Repeat [Step 4](#step-4-verify-the-initial-data-load) to verify the updated data. + +## Step 10: Begin failback replication + +In this step, you will: + +- [Prepare both databases for failback replication](#prepare-your-source-and-target-databases-for-failback-replication) +- [Configure MOLT Replicator with the flags needed for your migration](#configure-molt-replicator-failback-replication). +- [Start MOLT Replicator](#start-molt-replicator-failback-replication). +- [Understand how to continue replication after an interruption](#continue-molt-replicator-after-an-interruption-failback-replication). + +### Prepare your source and target databases for failback replication + +#### Prepare the CockroachDB cluster + +{{site.data.alerts.callout_success}} +For details on enabling CockroachDB changefeeds, refer to [Create and Configure Changefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}). +{{site.data.alerts.end}} + +If you are migrating to a CockroachDB {{ site.data.products.core }} cluster, [enable rangefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) on the cluster: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET CLUSTER SETTING kv.rangefeed.enabled = true; +~~~ + +Use the following optional settings to increase changefeed throughput. + +{{site.data.alerts.callout_danger}} +The following settings can impact source cluster performance and stability, especially SQL foreground latency during writes. For details, refer to [Advanced Changefeed Configuration]({% link {{ site.current_cloud_version }}/advanced-changefeed-configuration.md %}). 
+{{site.data.alerts.end}} + +To lower changefeed emission latency, but increase SQL foreground latency: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval = '250ms'; +~~~ + +To lower the [closed timestamp]({% link {{ site.current_cloud_version }}/architecture/transaction-layer.md %}#closed-timestamps) lag duration: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET CLUSTER SETTING kv.closed_timestamp.target_duration = '1s'; +~~~ + +To improve catchup speeds but increase cluster CPU usage: + +{% include_cached copy-clipboard.html %} +~~~ sql +SET CLUSTER SETTING kv.rangefeed.concurrent_catchup_iterators = 64; +~~~ + +#### Grant target database user permissions + +You should have already created a migration user on the target database (your **original source database**) with the necessary privileges. Refer to [Create migration user on source database](#create-migration-user-on-source-database). + +For failback replication, grant the user additional privileges to write data back to the target database: + +
+{% include_cached copy-clipboard.html %} +~~~ sql +-- Grant INSERT and UPDATE on tables to fail back to +GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA migration_schema TO migration_user; +ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT INSERT, UPDATE ON TABLES TO migration_user; +~~~ +
+ +
+{% include_cached copy-clipboard.html %} +~~~ sql +-- Grant INSERT and UPDATE on tables to fail back to +GRANT SELECT, INSERT, UPDATE ON migration_db.* TO 'migration_user'@'%'; +FLUSH PRIVILEGES; +~~~ +
+ +
+{% include_cached copy-clipboard.html %} +~~~ sql +-- Grant INSERT, UPDATE, and FLASHBACK on tables to fail back to +GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.employees TO MIGRATION_USER; +GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.payments TO MIGRATION_USER; +GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.orders TO MIGRATION_USER; +~~~ +
+ +#### Create a CockroachDB changefeed + +On the target cluster, create a CockroachDB changefeed to send changes to MOLT Replicator. + +1. Get the current logical timestamp from CockroachDB, after [ensuring that forward replication has fully drained](#step-8-stop-forward-replication): + + {% include_cached copy-clipboard.html %} + ~~~ sql + SELECT cluster_logical_timestamp(); + ~~~ + + ~~~ + cluster_logical_timestamp + ---------------------------------- + 1759246920563173000.0000000000 + ~~~ + +1. Create the CockroachDB changefeed pointing to the MOLT Replicator webhook endpoint. Use `cursor` to specify the logical timestamp from the preceding step. For details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink). + + {{site.data.alerts.callout_info}} + Explicitly set a default `10s` [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) value in the `CREATE CHANGEFEED` statement. This value ensures that the webhook can report failures in inconsistent networking situations and make crash loops more visible. + {{site.data.alerts.end}} + +
+ The target schema is specified in the webhook URL path in the fully-qualified format `/database/schema`. The path specifies the database and schema on the target PostgreSQL database. For example, `/migration_db/migration_schema` routes changes to the `migration_schema` schema in the `migration_db` database. + + {% include_cached copy-clipboard.html %} + ~~~ sql + CREATE CHANGEFEED FOR TABLE employees, payments, orders \ + INTO 'webhook-https://replicator-host:30004/migration_db/migration_schema?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ + WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; + ~~~ +
+ +
+ MySQL tables belong directly to the database, not to a separate schema. The webhook URL path specifies the database name on the target MySQL database. For example, `/migration_db` routes changes to the `migration_db` database. + + {% include_cached copy-clipboard.html %} + ~~~ sql + CREATE CHANGEFEED FOR TABLE employees, payments, orders \ + INTO 'webhook-https://replicator-host:30004/migration_db?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ + WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; + ~~~ +
+ +
+ The webhook URL path specifies the schema name on the target Oracle database. Oracle capitalizes identifiers by default. For example, `/MIGRATION_SCHEMA` routes changes to the `MIGRATION_SCHEMA` schema. + + {% include_cached copy-clipboard.html %} + ~~~ sql + CREATE CHANGEFEED FOR TABLE employees, payments, orders \ + INTO 'webhook-https://replicator-host:30004/MIGRATION_SCHEMA?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ + WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; + ~~~ +
+ + The output shows the job ID: + + ~~~ + job_id + ----------------------- + 1101234051444375553 + ~~~ + + {{site.data.alerts.callout_success}} + Ensure that only **one** changefeed points to MOLT Replicator at a time to avoid mixing streams of incoming data. + {{site.data.alerts.end}} + +1. Monitor the changefeed status, specifying the job ID: + + ~~~ sql + SHOW CHANGEFEED JOB 1101234051444375553; + ~~~ + + ~~~ + job_id | ... | status | running_status | ... + ----------------------+-----+---------+-------------------------------------------+---- + 1101234051444375553 | ... | running | running: resolved=1759246920563173000,0 | ... + ~~~ + + To confirm the changefeed is active and replicating changes to the target database, check that `status` is `running` and `running_status` shows `running: resolved={timestamp}`. + + {{site.data.alerts.callout_danger}} + `running: resolved` may be reported even if data isn't being sent properly. This typically indicates incorrect host/port configuration or network connectivity issues. + {{site.data.alerts.end}} + +1. Verify that Replicator is reporting incoming HTTP requests from the changefeed. To do so, check the MOLT Replicator logs. Since you enabled debug logging with `-v`, you should see periodic HTTP request successes: + + ~~~ + DEBUG [Aug 25 11:52:47] httpRequest="&{0x14000b068c0 45 200 3 9.770958ms false false}" + DEBUG [Aug 25 11:52:48] httpRequest="&{0x14000d1a000 45 200 3 13.438125ms false false}" + ~~~ + + These debug messages confirm successful changefeed connections to MOLT Replicator. You can disable verbose logging after verifying the connection. + +### Configure MOLT Replicator (failback replication) + +When you run `replicator`, you can configure the following options for replication: + +- [Connection strings](#connection-strings): Specify URL‑encoded source and target connections. +- [TLS certificate and key](#tls-certificate-and-key): Configure secure TLS connections. 
+- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior. +
+- [Tuning parameters](#tuning-parameters): Optimize failback performance and resource usage. +
+- [Replicator metrics](#replicator-metrics): Monitor failback replication performance.
+
+#### Connection strings
+
+MOLT Replicator uses `--sourceConn` and `--targetConn` to specify the source and target database connections.
+
+{{site.data.alerts.callout_info}}
+For MOLT Replicator, the source is always the **replication** source, while the target is always the **replication** target. This is distinct from the **migration** source and target. In this example migration, the new CockroachDB cluster is the migration target, but because failback replication moves data from the migration target back to the migration source, the **replication** target is the original source database. In essence, the `--sourceConn` and `--targetConn` strings should be reversed for failback replication.
+{{site.data.alerts.end}}
+
+`--sourceConn` specifies the connection string of the CockroachDB cluster:
+
+~~~
+--sourceConn 'postgresql://{username}:{password}@{host}:{port}/{database}'
+~~~
+
+`--targetConn` specifies the original source database:
+
+~~~ +--targetConn 'postgresql://{username}:{password}@{host}:{port}/{database}' +~~~ +
+ +
+~~~ +--targetConn 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}' +~~~ +
+ +
+~~~ +--targetConn 'oracle://{username}:{password}@{host}:{port}/{service_name}' +~~~ +
+ +{{site.data.alerts.callout_success}} +Follow best practices for securing connection strings. Refer to [Secure connections](#secure-connections). +{{site.data.alerts.end}} + +##### Secure connections + +{% include molt/molt-secure-connection-strings.md %} + +#### TLS certificate and key + +Always use **secure TLS connections** for failback replication to protect data in transit. Do **not** use insecure configurations in production: avoid the `--disableAuthentication` and `--tlsSelfSigned` Replicator flags and `insecure_tls_skip_verify=true` query parameter in the changefeed webhook URI. + +Generate self-signed TLS certificates or certificates from an external CA. Ensure the TLS server certificate and key are accessible on the MOLT Replicator host machine via a relative or absolute file path. When you [start failback with Replicator](#start-replicator), specify the paths with `--tlsCertificate` and `--tlsPrivateKey`. For example: + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator start \ +... \ +--tlsCertificate ./certs/server.crt \ +--tlsPrivateKey ./certs/server.key +~~~ + +The client certificates defined in the changefeed webhook URI must correspond to the server certificates specified in the `replicator` command. This ensures proper TLS handshake between the changefeed and MOLT Replicator. 
To include client certificates in the changefeed webhook URL, encode them with `base64` and then URL-encode the output with `jq`: + +{% include_cached copy-clipboard.html %} +~~~ shell +base64 -i ./client.crt | jq -R -r '@uri' +base64 -i ./client.key | jq -R -r '@uri' +base64 -i ./ca.crt | jq -R -r '@uri' +~~~ + +When you [create the changefeed](#create-a-cockroachdb-changefeed), pass the encoded certificates in the changefeed URL, where `client_cert`, `client_key`, and `ca_cert` are [webhook sink parameters]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-parameters): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE CHANGEFEED FOR TABLE table1, table2 +INTO 'webhook-https://host:port/database/schema?client_cert={base64_and_url_encoded_cert}&client_key={base64_and_url_encoded_key}&ca_cert={base64_and_url_encoded_ca}' +WITH ...; +~~~ + +For additional details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink). + +#### Replicator flags + +| Flag | Description | +|---------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for the changefeed checkpoint table. Schema name must be fully qualified in the format `database.schema`. | +| [`--bindAddr`]({% link molt/replicator-flags.md %}#bind-addr) | **Required.** Network address to bind the webhook sink for the changefeed. For example, `:30004`. | +| [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) | Path to the server TLS certificate for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). 
| [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) | Path to the server TLS private key for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). |
+| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. |
+| [`--userscript`]({% link molt/replicator-flags.md %}#userscript) | Path to a [userscript]({% link molt/userscript-overview.md %}) that enables data filtering, routing, or transformations. For examples, refer to [Userscript Cookbook]({% link molt/userscript-cookbook.md %}). |
+
+- The staging schema is first created during [initial replication setup]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load) with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema).
+
+- When configuring a [secure changefeed](#tls-certificate-and-key) for failback, you **must** include [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key), which specify the paths to the server certificate and private key for the webhook sink connection.
+
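The `base64`-plus-`jq` encoding described in [TLS certificate and key](#tls-certificate-and-key) can also be reproduced in Python when `jq` is unavailable; a minimal sketch, using placeholder PEM bytes rather than a real certificate:

```python
import base64
from urllib.parse import quote, unquote

def encode_cert_param(pem_bytes: bytes) -> str:
    """Base64-encode a PEM blob, then percent-encode it for use as a
    client_cert/client_key/ca_cert parameter in the webhook URL."""
    b64 = base64.b64encode(pem_bytes).decode("ascii")
    return quote(b64, safe="")  # '+', '/', and '=' must be percent-encoded

# Placeholder certificate body; substitute the contents of ./client.crt.
pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
param = encode_cert_param(pem)

# The encoding round-trips back to the original bytes:
assert base64.b64decode(unquote(param)) == pem
```

The resulting string can be pasted directly into the `client_cert`, `client_key`, or `ca_cert` query parameters of the changefeed webhook URI.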
+#### Tuning parameters
+
+{% include molt/optimize-replicator-performance.md %}
+
+
+#### Replicator metrics
+
+MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`:
+
+~~~
+--metricsAddr :30005
+~~~
+
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=cockroachdb).
+
+### Start MOLT Replicator (failback replication)
+
+With forward replication stopped and the replicated data verified, start replicating ongoing changes from CockroachDB back to the original source database using [MOLT Replicator]({% link molt/molt-replicator.md %}).
+
+{{site.data.alerts.callout_info}}
+The changefeed you created earlier uses the `cursor` logical timestamp that you captured after forward replication fully drained. Starting from this cursor ensures that all changes made on CockroachDB after cutover are replicated to the original source database, preventing data loss or duplication.
+{{site.data.alerts.end}}
+
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `start` command to begin failback replication from CockroachDB to your source database. In this example, `--metricsAddr :30005` enables a Prometheus endpoint for monitoring replication metrics, and `--bindAddr :30004` sets up the webhook endpoint for the changefeed. Because this is failback, `--targetConn` specifies the connection string of the original source database.
+
+`--stagingSchema` specifies the staging database name (`defaultdb._replicator` in this example) used for replication checkpoints and metadata. This staging database was created during [initial forward replication](#step-6-begin-forward-replication) when you first ran MOLT Replicator with `--stagingCreateSchema`. 
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+replicator start \
+--targetConn $SOURCE \
+--stagingConn $STAGING \
+--stagingSchema defaultdb._replicator \
+--metricsAddr :30005 \
+--bindAddr :30004 \
+--tlsCertificate ./certs/server.crt \
+--tlsPrivateKey ./certs/server.key \
+-v
+~~~
+
+### Continue MOLT Replicator after an interruption (failback replication)
+
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `pglogical` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator-failback-replication). + +Be sure to specify the same `--slotName` value that you used during your [initial replication command](#start-molt-replicator-failback-replication). The replication slot on the source PostgreSQL database automatically tracks the LSN (Log Sequence Number) checkpoint, so replication will resume from where it left off. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator pglogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.migration_schema \ +--slotName molt_slot \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +-v +~~~ +
+ +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `mylogical` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator-failback-replication). + +Replicator will automatically use the saved GTID (Global Transaction Identifier) from the `memo` table in the staging schema (in this example, `defaultdb._replicator.memo`) and track advancing GTID checkpoints there. To have Replicator start from a different GTID instead of resuming from the checkpoint, clear the `memo` table with `DELETE FROM defaultdb._replicator.memo;` and run the `replicator` command with a new `--defaultGTIDSet` value. + +{{site.data.alerts.callout_success}} +For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. +{{site.data.alerts.end}} + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator mylogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--targetSchema defaultdb.public \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +--userscript table_filter.ts \ +-v +~~~ +
+ +
+Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `oraclelogminer` command using the same `--stagingSchema` value from your [initial replication command](#start-molt-replicator-failback-replication). + +Replicator will automatically find the correct restart SCN (System Change Number) from the `_oracle_checkpoint` table in the staging schema. The restart point is determined by the non-committed row with the smallest `startscn` column value. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator oraclelogminer \ +--sourceConn $SOURCE \ +--sourcePDBConn $SOURCE_PDB \ +--sourceSchema MIGRATION_USER \ +--targetSchema defaultdb.migration_schema \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator \ +--metricsAddr :30005 \ +--userscript table_filter.ts \ +-v +~~~ + +{{site.data.alerts.callout_info}} +When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. +{{site.data.alerts.end}} +
+
+Replication resumes from the last checkpoint without performing a fresh load. Monitor the metrics endpoint at `http://localhost:30005/_/varz` to track replication progress.
+
+## Step 11: Cut over application traffic
+
+With the target cluster verified and finalized, it's time to resume application traffic for the current migration phase.
+
+### Modify application code
+
+In the application back end, update the application to route traffic for this migration phase to the CockroachDB cluster. A simple example:
+
+~~~yml
+env:
+  - name: DATABASE_URL_US_EAST
+    value: postgres://root@cockroachdb.us-east:26257/defaultdb?sslmode=verify-full
+  - name: DATABASE_URL_US_WEST
+    value: postgres://legacy-db.us-west:5432/defaultdb # Still on source
+~~~
+
+In your application code, route database connections based on the user's region:
+
+~~~python
+import os
+
+def get_db_connection(user_region):
+    if user_region == "us-east":
+        return os.getenv("DATABASE_URL_US_EAST") # CockroachDB
+    else:
+        return os.getenv("DATABASE_URL_US_WEST") # Source database
+~~~
+
+### Resume application traffic
+
+If you halted traffic by scaling down a regional Kubernetes deployment, scale it back up.
+
+{% include_cached copy-clipboard.html %}
+~~~shell
+kubectl scale deploy/app-eu --replicas=3
+~~~
+
+Or, if this was handled by the NGINX Ingress Controller, remove the 503 block that was added in Step 7:
+
+{% include_cached copy-clipboard.html %}
+~~~yml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: app
+# annotations:
+#   nginx.ingress.kubernetes.io/server-snippet: |
+#     if ($http_x_region = "eu") {
+#       return 503;
+#     }
+spec:
+  ingressClassName: nginx
+  rules:
+  - host: api.example.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: app
+            port:
+              number: 80
+~~~
+
+This ends downtime for the current migration phase.
+
+## Step 12: Stop failback replication
+
+After traffic has been cut over to the target, you can maintain failback replication indefinitely. 
Once you decide that you want to use the CockroachDB cluster as your sole data store going forward, you can end failback replication with the following steps. + +{% include molt/migration-stop-replication.md %} + +## Repeat for each phase + +During the next scheduled migration phase, [return to step 3](#step-3-load-data-into-cockroachdb) to migrate the next phase of data. Repeat steps 3-12 for each phase of data, until every region's data has been migrated and all application traffic has been cut over to the target. + +## Troubleshooting + +{% include molt/molt-troubleshooting-fetch.md %} +{% include molt/molt-troubleshooting-replication.md %} +{% include molt/molt-troubleshooting-failback.md %} + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) \ No newline at end of file diff --git a/src/current/_includes/molt/replicator-flags-usage.md b/src/current/_includes/molt/replicator-flags-usage.md index e7617b9c5de..2c8aadf4c1e 100644 --- a/src/current/_includes/molt/replicator-flags-usage.md +++ b/src/current/_includes/molt/replicator-flags-usage.md @@ -1,6 +1,6 @@ Configure the following [MOLT Replicator]({% link molt/molt-replicator.md %}) flags for continuous replication. For details on all available flags, refer to [Replicator Flags]({% link molt/replicator-flags.md %}). -{% if page.name == "migrate-load-replicate.md" %} +{% if page.name contains "delta" %}
| Flag | Description | |--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -42,20 +42,6 @@ You can find the starting GTID in the `cdc_cursor` field of the `fetch complete` You can find the SCN values in the message `replication-only mode should include the following replicator flags` after the [initial data load](#start-fetch) completes.
-{% elsif page.name == "migrate-failback.md" %} -| Flag | Description | -|---------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for the changefeed checkpoint table. Schema name must be fully qualified in the format `database.schema`. | -| [`--bindAddr`]({% link molt/replicator-flags.md %}#bind-addr) | **Required.** Network address to bind the webhook sink for the changefeed. For example, `:30004`. | -| [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) | Path to the server TLS certificate for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). | -| [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) | Path to the server TLS private key for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key).Q | -| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | -| [`--userscript`]({% link molt/replicator-flags.md %}#userscript) | Path to a [userscript]({% link molt/userscript-overview.md %}) that enables data filtering, routing, or transformations. For examples, refer to [Userscript Cookbook]({% link molt/userscript-cookbook.md %}). | - -- The staging schema is first created during [initial replication setup]({% link molt/migrate-load-replicate.md %}#start-replicator) with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema). 
- -- When configuring a [secure changefeed](#tls-certificate-and-key) for failback, you **must** include [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key), which specify the paths to the server certificate and private key for the webhook sink connection. - {% else %} | Flag | Description | |---------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| diff --git a/src/current/_includes/releases/v24.2/feature-highlights-migrations.html b/src/current/_includes/releases/v24.2/feature-highlights-migrations.html index 7df377498f3..fac0c718033 100644 --- a/src/current/_includes/releases/v24.2/feature-highlights-migrations.html +++ b/src/current/_includes/releases/v24.2/feature-highlights-migrations.html @@ -31,7 +31,7 @@ MOLT Fetch transformation rules

- Column exclusion, computed columns, and partitioned tables are now supported in table migrations with MOLT Fetch. They are supported via a new transformations framework that allows the user to specify a JSON file with instructions on how MOLT Fetch should treat certain schemas, tables, or underlying columns. + Column exclusion, computed columns, and partitioned tables are now supported in table migrations with MOLT Fetch. They are supported via a new transformations framework that allows the user to specify a JSON file with instructions on how MOLT Fetch should treat certain schemas, tables, or underlying columns.

All★★ diff --git a/src/current/_includes/v23.1/sidebar-data/migrate.json b/src/current/_includes/v23.1/sidebar-data/migrate.json index 81d046ba2d9..bfeae2cdd39 100644 --- a/src/current/_includes/v23.1/sidebar-data/migrate.json +++ b/src/current/_includes/v23.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-granularity.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -46,7 +52,7 @@ { "title": "MOLT Tools", "items": [ - { +{ "title": "Schema Conversion Tool", "urls": [ "/cockroachcloud/migrations-page.html" @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + 
"urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,16 +109,69 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] }, + { + "title": "Userscripts", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/userscript-overview.html" + ] + }, + { + "title": "Quickstart", + "urls": [ + "/molt/userscript-quickstart.html" + ] + }, + { + "title": "API", + "urls": [ + "/molt/userscript-api.html" + ] + }, + { + "title": "Cookbook", + "urls": [ + "/molt/userscript-cookbook.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/userscript-metrics.html" + ] + } + ] + }, { "title": "Metrics", "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +183,127 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + 
"title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v23.2/sidebar-data/migrate.json b/src/current/_includes/v23.2/sidebar-data/migrate.json index 0458b56a1f8..4dd440717de 100644 --- a/src/current/_includes/v23.2/sidebar-data/migrate.json +++ b/src/current/_includes/v23.2/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", 
+ "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + 
"/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v24.1/sidebar-data/migrate.json b/src/current/_includes/v24.1/sidebar-data/migrate.json index 0458b56a1f8..4dd440717de 100644 --- a/src/current/_includes/v24.1/sidebar-data/migrate.json +++ 
b/src/current/_includes/v24.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and 
Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + 
"urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v24.2/sidebar-data/migrate.json b/src/current/_includes/v24.2/sidebar-data/migrate.json index 0458b56a1f8..4dd440717de 100644 --- a/src/current/_includes/v24.2/sidebar-data/migrate.json +++ b/src/current/_includes/v24.2/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + 
"urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + 
"/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v24.3/sidebar-data/migrate.json b/src/current/_includes/v24.3/sidebar-data/migrate.json index 0458b56a1f8..4dd440717de 100644 --- a/src/current/_includes/v24.3/sidebar-data/migrate.json +++ b/src/current/_includes/v24.3/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-load-replicate.html" + 
"/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + 
"items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v25.1/sidebar-data/migrate.json b/src/current/_includes/v25.1/sidebar-data/migrate.json index a2a50c4e81a..4dd440717de 100644 --- 
a/src/current/_includes/v25.1/sidebar-data/migrate.json +++ b/src/current/_includes/v25.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - 
"title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + 
"items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v25.2/sidebar-data/migrate.json b/src/current/_includes/v25.2/sidebar-data/migrate.json index 6c997dd96a1..4dd440717de 100644 --- a/src/current/_includes/v25.2/sidebar-data/migrate.json +++ b/src/current/_includes/v25.2/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + 
"title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", 
+ "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ @@ -207,4 +393,4 @@ ] } ] -} +} \ No newline at end of file diff --git a/src/current/_includes/v25.3/sidebar-data/migrate.json b/src/current/_includes/v25.3/sidebar-data/migrate.json index a2a50c4e81a..4dd440717de 100644 --- a/src/current/_includes/v25.3/sidebar-data/migrate.json +++ b/src/current/_includes/v25.3/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Load and Replicate", + "title": "Continuous Replication", "urls": [ - 
"/molt/migrate-load-replicate.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Resume Replication", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Failback", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Strategy", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + 
"title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v25.4/sidebar-data/migrate.json b/src/current/_includes/v25.4/sidebar-data/migrate.json index a2a50c4e81a..783baeb0601 100644 --- 
a/src/current/_includes/v25.4/sidebar-data/migrate.json +++ b/src/current/_includes/v25.4/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations.html" ] }, { - "title": "Load and Replicate", + "title": "Migration Granularity", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Resume Replication", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Failback", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-transformation.html" + ] + }, + { + "title": "Validation Strategy", + "urls": [ + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - 
"title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Best Practices", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback 
Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v26.1/sidebar-data/migrate.json b/src/current/_includes/v26.1/sidebar-data/migrate.json index a2a50c4e81a..783baeb0601 100644 --- a/src/current/_includes/v26.1/sidebar-data/migrate.json +++ b/src/current/_includes/v26.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations.html" ] }, { - "title": "Load and Replicate", + "title": "Migration Granularity", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-granularity.html" ] }, { - "title": "Resume Replication", + "title": "Continuous Replication", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Failback", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-transformation.html" + ] + }, + { + "title": "Validation Strategy", + "urls": [ + "/molt/migration-considerations-validation.html" + ] + }, + { + "title": "Rollback Plan", + "urls": [ + "/molt/migration-considerations-rollback.html" ] } ] @@ -54,8 +60,43 @@ }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" 
+ ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,15 +109,15 @@ ] }, { - "title": "Flags", + "title": "Installation", "urls": [ - "/molt/replicator-flags.html" + "/molt/molt-replicator-installation.html" ] }, { - "title": "Metrics", + "title": "Commands and Flags", "urls": [ - "/molt/replicator-metrics.html" + "/molt/replicator-flags.html" ] }, { @@ -113,6 +154,24 @@ ] } ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/replicator-metrics.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -124,6 +183,133 @@ } ] }, + { + "title": "Migration Best Practices", + "urls": [ + "/molt/migration-strategy.html" + ] + }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/classic-bulk-load-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/classic-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/classic-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Phased Bulk Load Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-bulk-load-postgres.html" + ] + }, + { + 
"title": "From MySQL", + "urls": [ + "/molt/phased-bulk-load-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-bulk-load-oracle.html" + ] + } + ] + }, + { + "title": "Delta Migration", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/delta-migration-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/delta-migration-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/delta-migration-oracle.html" + ] + } + ] + }, + { + "title": "Phased Delta Migration with Failback Replication", + "items": [ + { + "title": "Overview", + "urls": [ + "/molt/migration-approach-phased-delta-failback.html" + ] + }, + { + "title": "From PostgreSQL", + "urls": [ + "/molt/phased-delta-failback-postgres.html" + ] + }, + { + "title": "From MySQL", + "urls": [ + "/molt/phased-delta-failback-mysql.html" + ] + }, + { + "title": "From Oracle", + "urls": [ + "/molt/phased-delta-failback-oracle.html" + ] + } + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/advisories/a144650.md b/src/current/advisories/a144650.md index 969f5c24488..814bc8fc947 100644 --- a/src/current/advisories/a144650.md +++ b/src/current/advisories/a144650.md @@ -106,11 +106,11 @@ Follow these steps after [`detect_144650.sh` finds a corrupted job or problemati #### MOLT Fetch -By default, MOLT Fetch uses [`IMPORT INTO`]({% link v25.1/import-into.md %}) to load data into CockroachDB, and can therefore be affected by this issue. [As recommended in the migration documentation]({% link molt/migrate-load-replicate.md %}#stop-replication-and-verify-data), a run of [MOLT Fetch]({% link molt/molt-fetch.md %}) should be followed by a run of [MOLT Verify]({% link molt/molt-verify.md %}) to ensure that all data on the target side matches the data on the source side. 
+By default, MOLT Fetch uses [`IMPORT INTO`]({% link v25.1/import-into.md %}) to load data into CockroachDB, and can therefore be affected by this issue. [As recommended in the migration documentation]({% link molt/migration-overview.md %}), a run of [MOLT Fetch]({% link molt/molt-fetch.md %}) should be followed by a run of [MOLT Verify]({% link molt/molt-verify.md %}) to ensure that all data on the target side matches the data on the source side. - If you ran MOLT Verify after completing your MOLT Fetch run, and Verify did not find mismatches, then MOLT Fetch was unaffected by this issue. -- If you did not run Verify after Fetch, analyze the exported files that exist in your configured [Fetch data path]({% link molt/molt-fetch.md %}#data-path) to determine the expected number of rows. Then follow steps 1-3 in the [`IMPORT`](#import) section. +- If you did not run Verify after Fetch, analyze the exported files that exist in your configured [Fetch data path]({% link molt/molt-fetch.md %}#define-intermediate-storage) to determine the expected number of rows. Then follow steps 1-3 in the [`IMPORT`](#import) section. 
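The Fetch-then-Verify sequence the advisory describes can be sketched as follows. This is a minimal, non-authoritative sketch: the connection strings and table names are placeholders, and the flags shown are taken from the MOLT pages touched by this diff.

~~~ shell
# Hypothetical connection strings; replace with your own.
export SOURCE='postgres://migration_user:password@source-host:5432/defaultdb'
export TARGET='postgres://crdb_user:password@cockroach-host:26257/defaultdb?sslmode=verify-full'

# Bulk load the source data into CockroachDB.
molt fetch \
  --source $SOURCE \
  --target $TARGET \
  --table-filter 'employees|payments|orders' \
  --table-handling truncate-if-exists

# Then confirm that rows on the target match the source before relying on the load.
molt verify \
  --source $SOURCE \
  --target $TARGET \
  --table-filter 'employees|payments|orders'
~~~

If `molt verify` reports no mismatches, the Fetch run was unaffected by this issue; otherwise, follow the remediation steps in the [`IMPORT`](#import) section.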
#### Physical Cluster Replication

diff --git a/src/current/images/molt/migration_flow.svg b/src/current/images/molt/migration_flow.svg
deleted file mode 100644
index e66dbb43359..00000000000
--- a/src/current/images/molt/migration_flow.svg
+++ /dev/null
@@ -1,887 +0,0 @@
[887 deleted lines of SVG markup omitted]
\ No newline at end of file
diff --git a/src/current/images/molt/molt-fetch-flow-1.png b/src/current/images/molt/molt-fetch-flow-1.png
new file mode 100644
index 00000000000..5ce30eac061
Binary files /dev/null and b/src/current/images/molt/molt-fetch-flow-1.png differ
diff --git a/src/current/images/molt/molt_classic_bulk_load_flow.svg b/src/current/images/molt/molt_classic_bulk_load_flow.svg
new file mode 100644
index 00000000000..760fe1efd2a
--- /dev/null
+++ b/src/current/images/molt/molt_classic_bulk_load_flow.svg
@@ -0,0 +1,626 @@
[626 added lines of SVG markup omitted]
\ No newline at end of file
diff --git a/src/current/images/molt/molt_delta_flow.svg b/src/current/images/molt/molt_delta_flow.svg
new file mode 100644
index 00000000000..2eb954be0a0
--- /dev/null
+++ b/src/current/images/molt/molt_delta_flow.svg
@@ -0,0 +1,908 @@
[908 added lines of SVG markup omitted]
\ No newline at end of file
diff --git a/src/current/images/molt/molt_flow_generic.svg b/src/current/images/molt/molt_flow_generic.svg
new file mode 100644
index 00000000000..c7d57fc4634
--- /dev/null
+++ b/src/current/images/molt/molt_flow_generic.svg
@@ -0,0 +1,856 @@
[856 added lines of SVG markup omitted]
\ No newline at end of file
diff --git a/src/current/images/molt/molt_phased_bulk_load_flow.svg b/src/current/images/molt/molt_phased_bulk_load_flow.svg
new file mode 100644
index 00000000000..53fab0351ba
--- /dev/null
+++ b/src/current/images/molt/molt_phased_bulk_load_flow.svg
@@ -0,0 +1,666 @@
[666 added lines of SVG markup omitted]
\ No newline at end of file
diff --git a/src/current/images/molt/molt_phased_delta_flow.svg b/src/current/images/molt/molt_phased_delta_flow.svg
new file mode 100644
index 00000000000..8490e17abc8
--- /dev/null
+++ b/src/current/images/molt/molt_phased_delta_flow.svg
@@ -0,0 +1,1154 @@
[1154 added lines of SVG markup omitted]
\ No newline at end of file
diff --git a/src/current/molt/classic-bulk-load-mysql.md b/src/current/molt/classic-bulk-load-mysql.md
new file mode 100644
index 00000000000..fa5e15a971a
--- /dev/null
+++ b/src/current/molt/classic-bulk-load-mysql.md
@@ -0,0 +1,18 @@
+---
+title: Classic Bulk Load Migration from MySQL
+summary: Learn what a Classic Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools.
+toc: true
+docs_area: migrate
+source_db_not_selectable: true
+---
+
+
+
+{% include molt/classic-bulk-load-all-sources.md %}
diff --git a/src/current/molt/classic-bulk-load-oracle.md b/src/current/molt/classic-bulk-load-oracle.md
new file mode 100644
index 00000000000..df74e4b325b
--- /dev/null
+++ b/src/current/molt/classic-bulk-load-oracle.md
@@ -0,0 +1,18 @@
+---
+title: Classic Bulk Load Migration from Oracle
+summary: Learn what a Classic Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools.
+toc: true
+docs_area: migrate
+source_db_not_selectable: true
+---
+
+
+
+{% include molt/classic-bulk-load-all-sources.md %}
diff --git a/src/current/molt/classic-bulk-load-postgres.md b/src/current/molt/classic-bulk-load-postgres.md
new file mode 100644
index 00000000000..c7055f433fd
--- /dev/null
+++ b/src/current/molt/classic-bulk-load-postgres.md
@@ -0,0 +1,18 @@
+---
+title: Classic Bulk Load Migration from PostgreSQL
+summary: Learn what a Classic Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools.
+toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/classic-bulk-load-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/delta-migration-mysql.md b/src/current/molt/delta-migration-mysql.md new file mode 100644 index 00000000000..ddaa9b07d3b --- /dev/null +++ b/src/current/molt/delta-migration-mysql.md @@ -0,0 +1,18 @@ +--- +title: Delta Migration from MySQL +summary: Learn what a Delta Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/delta-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/delta-migration-oracle.md b/src/current/molt/delta-migration-oracle.md new file mode 100644 index 00000000000..30fadea72f0 --- /dev/null +++ b/src/current/molt/delta-migration-oracle.md @@ -0,0 +1,18 @@ +--- +title: Delta Migration from Oracle +summary: Learn what a Delta Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/delta-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/delta-migration-postgres.md b/src/current/molt/delta-migration-postgres.md new file mode 100644 index 00000000000..6abc2491b19 --- /dev/null +++ b/src/current/molt/delta-migration-postgres.md @@ -0,0 +1,18 @@ +--- +title: Delta Migration from PostgreSQL +summary: Learn what a Delta Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. 
+toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/delta-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/migrate-bulk-load.md b/src/current/molt/migrate-bulk-load.md deleted file mode 100644 index 3247c617175..00000000000 --- a/src/current/molt/migrate-bulk-load.md +++ /dev/null @@ -1,89 +0,0 @@ ---- -title: Bulk Load Migration -summary: Learn how to migrate data from a source database (such as PostgreSQL, MySQL, or Oracle) into a CockroachDB cluster. -toc: true -docs_area: migrate ---- - -Perform a one-time bulk load of source data into CockroachDB. - -{% include molt/crdb-to-crdb-migration.md %} - -{% include molt/molt-setup.md %} - -## Start Fetch - -Perform the bulk load of the source data. - -1. Run the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data into CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It limits the migration to a single schema and filters for three specific tables. The [data load mode](#data-load-mode) defaults to `IMPORT INTO`. Include the `--ignore-replication-check` flag to skip replication checkpoint queries, which eliminates the need to configure the source database for logical replication. - -
- {% include_cached copy-clipboard.html %} - ~~~ shell - molt fetch \ - --source $SOURCE \ - --target $TARGET \ - --schema-filter 'migration_schema' \ - --table-filter 'employees|payments|orders' \ - --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists \ - --ignore-replication-check - ~~~ -
- -
- {% include_cached copy-clipboard.html %} - ~~~ shell - molt fetch \ - --source $SOURCE \ - --target $TARGET \ - --table-filter 'employees|payments|orders' \ - --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists \ - --ignore-replication-check - ~~~ -
- -
- The command assumes an Oracle Multitenant (CDB/PDB) source. `--source-cdb` specifies the container database (CDB) connection string. - - {% include_cached copy-clipboard.html %} - ~~~ shell - molt fetch \ - --source $SOURCE \ - --source-cdb $SOURCE_CDB \ - --target $TARGET \ - --schema-filter 'migration_schema' \ - --table-filter 'employees|payments|orders' \ - --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists \ - --ignore-replication-check - ~~~ -
- -{% include molt/fetch-data-load-output.md %} - -## Verify the data load - -{% include molt/verify-output.md %} - -## Add constraints and indexes - -{% include molt/migration-modify-target-schema.md %} - -## Cutover - -Perform a cutover by resuming application traffic, now to CockroachDB. - -## Troubleshooting - -{% include molt/molt-troubleshooting-fetch.md %} - -## See also - -- [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Strategy]({% link molt/migration-strategy.md %}) -- [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) -- [MOLT Fetch]({% link molt/molt-fetch.md %}) -- [MOLT Verify]({% link molt/molt-verify.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) \ No newline at end of file diff --git a/src/current/molt/migrate-failback.md b/src/current/molt/migrate-failback.md deleted file mode 100644 index 0ae69698c7a..00000000000 --- a/src/current/molt/migrate-failback.md +++ /dev/null @@ -1,327 +0,0 @@ ---- -title: Migration Failback -summary: Learn how to fail back from a CockroachDB cluster to a PostgreSQL, MySQL, or Oracle database. -toc: true -docs_area: migrate ---- - -{{site.data.alerts.callout_info}} -These instructions assume you have already [installed MOLT and completed the prerequisites]({% link molt/migrate-load-replicate.md %}#before-you-begin) for your source dialect. -{{site.data.alerts.end}} - -
- - - -
- -## Prepare the CockroachDB cluster - -{{site.data.alerts.callout_success}} -For details on enabling CockroachDB changefeeds, refer to [Create and Configure Changefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}). -{{site.data.alerts.end}} - -If you are migrating to a CockroachDB {{ site.data.products.core }} cluster, [enable rangefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) on the cluster: - -{% include_cached copy-clipboard.html %} -~~~ sql -SET CLUSTER SETTING kv.rangefeed.enabled = true; -~~~ - -Use the following optional settings to increase changefeed throughput. - -{{site.data.alerts.callout_danger}} -The following settings can impact source cluster performance and stability, especially SQL foreground latency during writes. For details, refer to [Advanced Changefeed Configuration]({% link {{ site.current_cloud_version }}/advanced-changefeed-configuration.md %}). -{{site.data.alerts.end}} - -To lower changefeed emission latency, but increase SQL foreground latency: - -{% include_cached copy-clipboard.html %} -~~~ sql -SET CLUSTER SETTING kv.rangefeed.closed_timestamp_refresh_interval = '250ms'; -~~~ - -To lower the [closed timestamp]({% link {{ site.current_cloud_version }}/architecture/transaction-layer.md %}#closed-timestamps) lag duration: - -{% include_cached copy-clipboard.html %} -~~~ sql -SET CLUSTER SETTING kv.closed_timestamp.target_duration = '1s'; -~~~ - -To improve catchup speeds but increase cluster CPU usage: - -{% include_cached copy-clipboard.html %} -~~~ sql -SET CLUSTER SETTING kv.rangefeed.concurrent_catchup_iterators = 64; -~~~ - -## Grant target database user permissions - -You should have already created a migration user on the target database (your **original source database**) with the necessary privileges. 
Refer to [Create migration user on source database]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database). - -For failback replication, grant the user additional privileges to write data back to the target database: - -
-{% include_cached copy-clipboard.html %} -~~~ sql --- Grant INSERT and UPDATE on tables to fail back to -GRANT INSERT, UPDATE ON ALL TABLES IN SCHEMA migration_schema TO migration_user; -ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT INSERT, UPDATE ON TABLES TO migration_user; -~~~ -
- -
-{% include_cached copy-clipboard.html %} -~~~ sql --- Grant INSERT and UPDATE on tables to fail back to -GRANT SELECT, INSERT, UPDATE ON migration_db.* TO 'migration_user'@'%'; -FLUSH PRIVILEGES; -~~~ -
- -
-{% include_cached copy-clipboard.html %} -~~~ sql --- Grant INSERT, UPDATE, and FLASHBACK on tables to fail back to -GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.employees TO MIGRATION_USER; -GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.payments TO MIGRATION_USER; -GRANT SELECT, INSERT, UPDATE, FLASHBACK ON migration_schema.orders TO MIGRATION_USER; -~~~ -
- -## Configure Replicator - -When you run `replicator`, you can configure the following options for replication: - -- [Connection strings](#connection-strings): Specify URL‑encoded source and target connections. -- [TLS certificate and key](#tls-certificate-and-key): Configure secure TLS connections. -- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior. -
-- [Tuning parameters](#tuning-parameters): Optimize failback performance and resource usage. -
-- [Replicator metrics](#replicator-metrics): Monitor failback replication performance. - -
- - - -
- -### Connection strings - -For failback, MOLT Replicator uses `--targetConn` to specify the destination database where you want to replicate CockroachDB changes, and `--stagingConn` for the CockroachDB staging database. - -`--targetConn` is the connection string of the database you want to replicate changes to (the database you originally migrated from). - -For example: - -
-~~~ ---targetConn 'postgres://postgres:postgres@localhost:5432/molt?sslmode=verify-full' -~~~ -
- -
-~~~ ---targetConn 'mysql://user:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' -~~~ -
- -
-~~~ ---targetConn 'oracle://C%23%23MIGRATION_USER:password@host:1521/ORCLPDB1' -~~~ -
- -`--stagingConn` is the CockroachDB connection string for staging operations: - -~~~ ---stagingConn 'postgres://crdb_user@localhost:26257/defaultdb?sslmode=verify-full' -~~~ - -#### Secure connections - -{% include molt/molt-secure-connection-strings.md %} - -### TLS certificate and key - -Always use **secure TLS connections** for failback replication to protect data in transit. Do **not** use insecure configurations in production: avoid the `--disableAuthentication` and `--tlsSelfSigned` Replicator flags and `insecure_tls_skip_verify=true` query parameter in the changefeed webhook URI. - -Generate self-signed TLS certificates or certificates from an external CA. Ensure the TLS server certificate and key are accessible on the MOLT Replicator host machine via a relative or absolute file path. When you [start failback with Replicator](#start-replicator), specify the paths with `--tlsCertificate` and `--tlsPrivateKey`. For example: - -{% include_cached copy-clipboard.html %} -~~~ shell -replicator start \ -... \ ---tlsCertificate ./certs/server.crt \ ---tlsPrivateKey ./certs/server.key -~~~ - -The client certificates defined in the changefeed webhook URI must correspond to the server certificates specified in the `replicator` command. This ensures proper TLS handshake between the changefeed and MOLT Replicator. 
To include client certificates in the changefeed webhook URL, encode them with `base64` and then URL-encode the output with `jq`. The encoded value must be a single line; if your `base64` implementation wraps long output (GNU `base64` does by default), disable wrapping (for example, with `base64 -w 0`):

{% include_cached copy-clipboard.html %}
~~~ shell
base64 -i ./client.crt | jq -R -r '@uri'
base64 -i ./client.key | jq -R -r '@uri'
base64 -i ./ca.crt | jq -R -r '@uri'
~~~

When you [create the changefeed](#create-the-cockroachdb-changefeed), pass the encoded certificates in the changefeed URL, where `client_cert`, `client_key`, and `ca_cert` are [webhook sink parameters]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-parameters):

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE table1, table2
INTO 'webhook-https://host:port/database/schema?client_cert={base64_and_url_encoded_cert}&client_key={base64_and_url_encoded_key}&ca_cert={base64_and_url_encoded_ca}'
WITH ...;
~~~

For additional details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink).

### Replicator flags

{% include molt/replicator-flags-usage.md %}

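The certificate encoding steps above can be wrapped in a small helper that also guards against line-wrapped `base64` output. This is a sketch only: the function name and stand-in file are illustrative, and in practice you would point it at `client.crt`, `client.key`, and `ca.crt`.

```shell
# Illustrative helper: emit a file as single-line base64, then URL-encode it
# for use as the client_cert, client_key, and ca_cert webhook parameters.
# tr -d '\n' guards against base64 implementations that wrap long output.
encode_cert() {
  base64 < "$1" | tr -d '\n' | jq -sRr '@uri'
}

# Demo on a stand-in file (not a real certificate).
printf 'dummy' > /tmp/demo.pem
encode_cert /tmp/demo.pem   # -> ZHVtbXk%3D
```

The `%3D` in the demo output is the URL-encoded base64 padding character (`=`), which would otherwise be misparsed in the webhook query string.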
-### Tuning parameters - -{% include molt/optimize-replicator-performance.md %} -
- -### Replicator metrics - -MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: - -~~~ ---metricsAddr :30005 -~~~ - -For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=cockroachdb). - -## Stop forward replication - -{% include molt/migration-stop-replication.md %} - -## Start Replicator - -1. Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `start` command to begin failback replication from CockroachDB to your source database. In this example, `--metricsAddr :30005` enables a Prometheus endpoint for monitoring replication metrics, and `--bindAddr :30004` sets up the webhook endpoint for the changefeed. - - `--stagingSchema` specifies the staging database name (`defaultdb._replicator` in this example) used for replication checkpoints and metadata. This staging database was created during [initial forward replication]({% link molt/migrate-load-replicate.md %}#start-replicator) when you first ran MOLT Replicator with `--stagingCreateSchema`. - - {% include_cached copy-clipboard.html %} - ~~~ shell - replicator start \ - --targetConn $TARGET \ - --stagingConn $STAGING \ - --stagingSchema defaultdb._replicator \ - --metricsAddr :30005 \ - --bindAddr :30004 \ - --tlsCertificate ./certs/server.crt \ - --tlsPrivateKey ./certs/server.key \ - -v - ~~~ - -## Create the CockroachDB changefeed - -Create a CockroachDB changefeed to send changes to MOLT Replicator. - -1. 
After [ensuring that forward replication has fully drained](#stop-forward-replication), get the current logical timestamp from CockroachDB:

    {% include_cached copy-clipboard.html %}
    ~~~ sql
    SELECT cluster_logical_timestamp();
    ~~~

    ~~~
    cluster_logical_timestamp
    ----------------------------------
    1759246920563173000.0000000000
    ~~~

1. Create the CockroachDB changefeed pointing to the MOLT Replicator webhook endpoint. Use `cursor` to specify the logical timestamp from the preceding step. For details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink).

    {{site.data.alerts.callout_info}}
    Explicitly set [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) to its default value of `10s` in the `CREATE CHANGEFEED` statement. This ensures that the webhook can report failures in inconsistent networking situations and makes crash loops more visible.
    {{site.data.alerts.end}}

- The target schema is specified in the webhook URL path in the fully-qualified format `/database/schema`. The path specifies the database and schema on the target PostgreSQL database. For example, `/migration_db/migration_schema` routes changes to the `migration_schema` schema in the `migration_db` database. - - {% include_cached copy-clipboard.html %} - ~~~ sql - CREATE CHANGEFEED FOR TABLE employees, payments, orders \ - INTO 'webhook-https://replicator-host:30004/migration_db/migration_schema?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ - WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; - ~~~ -
- -
- MySQL tables belong directly to the database, not to a separate schema. The webhook URL path specifies the database name on the target MySQL database. For example, `/migration_db` routes changes to the `migration_db` database. - - {% include_cached copy-clipboard.html %} - ~~~ sql - CREATE CHANGEFEED FOR TABLE employees, payments, orders \ - INTO 'webhook-https://replicator-host:30004/migration_db?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ - WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; - ~~~ -
- -
- The webhook URL path specifies the schema name on the target Oracle database. Oracle capitalizes identifiers by default. For example, `/MIGRATION_SCHEMA` routes changes to the `MIGRATION_SCHEMA` schema. - - {% include_cached copy-clipboard.html %} - ~~~ sql - CREATE CHANGEFEED FOR TABLE employees, payments, orders \ - INTO 'webhook-https://replicator-host:30004/MIGRATION_SCHEMA?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ - WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; - ~~~ -
- - The output shows the job ID: - - ~~~ - job_id - ----------------------- - 1101234051444375553 - ~~~ - - {{site.data.alerts.callout_success}} - Ensure that only **one** changefeed points to MOLT Replicator at a time to avoid mixing streams of incoming data. - {{site.data.alerts.end}} - -1. Monitor the changefeed status, specifying the job ID: - - ~~~ sql - SHOW CHANGEFEED JOB 1101234051444375553; - ~~~ - - ~~~ - job_id | ... | status | running_status | ... - ----------------------+-----+---------+-------------------------------------------+---- - 1101234051444375553 | ... | running | running: resolved=1759246920563173000,0 | ... - ~~~ - - To confirm the changefeed is active and replicating changes to the target database, check that `status` is `running` and `running_status` shows `running: resolved={timestamp}`. - - {{site.data.alerts.callout_danger}} - `running: resolved` may be reported even if data isn't being sent properly. This typically indicates incorrect host/port configuration or network connectivity issues. - {{site.data.alerts.end}} - -1. Verify that Replicator is reporting incoming HTTP requests from the changefeed. To do so, check the MOLT Replicator logs. Since you enabled debug logging with `-v`, you should see periodic HTTP request successes: - - ~~~ - DEBUG [Aug 25 11:52:47] httpRequest="&{0x14000b068c0 45 200 3 9.770958ms false false}" - DEBUG [Aug 25 11:52:48] httpRequest="&{0x14000d1a000 45 200 3 13.438125ms false false}" - ~~~ - - These debug messages confirm successful changefeed connections to MOLT Replicator. You can disable verbose logging after verifying the connection. 
- -## Troubleshooting - -{% include molt/molt-troubleshooting-failback.md %} - -## See also - -- [MOLT Replicator]({% link molt/molt-replicator.md %}) -- [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Strategy]({% link molt/migration-strategy.md %}) -- [MOLT Fetch]({% link molt/molt-fetch.md %}) diff --git a/src/current/molt/migrate-load-replicate.md b/src/current/molt/migrate-load-replicate.md deleted file mode 100644 index 3802550852a..00000000000 --- a/src/current/molt/migrate-load-replicate.md +++ /dev/null @@ -1,314 +0,0 @@ ---- -title: Load and Replicate -summary: Learn how to migrate data from a source database (such as PostgreSQL, MySQL, or Oracle) into a CockroachDB cluster. -toc: true -docs_area: migrate ---- - -Perform an initial bulk load of the source data using [MOLT Fetch]({% link molt/molt-fetch.md %}), then use [MOLT Replicator]({% link molt/molt-replicator.md %}) to replicate ongoing changes to the target. - -{% include molt/crdb-to-crdb-migration.md %} - -{% include molt/molt-setup.md %} - -## Start Fetch - -Perform the initial load of the source data. - -1. Issue the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data to CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It also limits the migration to a single schema and filters three specific tables to migrate. The [data load mode](#data-load-mode) defaults to `IMPORT INTO`. - -
- You **must** include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load. - - {% include_cached copy-clipboard.html %} - ~~~ shell - molt fetch \ - --source $SOURCE \ - --target $TARGET \ - --schema-filter 'migration_schema' \ - --table-filter 'employees|payments|orders' \ - --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists \ - --pglogical-replication-slot-name molt_slot \ - --pglogical-publication-and-slot-drop-and-recreate - ~~~ -
- -
- {% include_cached copy-clipboard.html %} - ~~~ shell - molt fetch \ - --source $SOURCE \ - --target $TARGET \ - --table-filter 'employees|payments|orders' \ - --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists - ~~~ -
- -
- The command assumes an Oracle Multitenant (CDB/PDB) source. `--source-cdb` specifies the container database (CDB) connection string. - - {% include_cached copy-clipboard.html %} - ~~~ shell - molt fetch \ - --source $SOURCE \ - --source-cdb $SOURCE_CDB \ - --target $TARGET \ - --schema-filter 'migration_schema' \ - --table-filter 'employees|payments|orders' \ - --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists - ~~~ -
- -{% include molt/fetch-data-load-output.md %} - -## Verify the data load - -Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and target data is consistent. This ensures that the data load was successful. - -{% include molt/verify-output.md %} - -## Configure Replicator - -When you run `replicator`, you can configure the following options for replication: - -- [Replication connection strings](#replication-connection-strings): Specify URL-encoded source and target database connections. -- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior. -
-- [Tuning parameters](#tuning-parameters): Optimize replication performance and resource usage. -
-- [Replicator metrics](#replicator-metrics): Monitor replication progress and performance. - -
- - - -
- -### Replication connection strings - -MOLT Replicator uses `--sourceConn` and `--targetConn` to specify the source and target database connections. - -`--sourceConn` specifies the connection string of the source database: - -
-~~~ ---sourceConn 'postgresql://{username}:{password}@{host}:{port}/{database}' -~~~ -
- -
-~~~ ---sourceConn 'mysql://{username}:{password}@{protocol}({host}:{port})/{database}' -~~~ -
- -
-~~~ ---sourceConn 'oracle://{username}:{password}@{host}:{port}/{service_name}' -~~~ - -For Oracle Multitenant databases, also specify `--sourcePDBConn` with the PDB connection string: - -~~~ ---sourcePDBConn 'oracle://{username}:{password}@{host}:{port}/{pdb_service_name}' -~~~ -
- -`--targetConn` specifies the target CockroachDB connection string: - -~~~ ---targetConn 'postgresql://{username}:{password}@{host}:{port}/{database}' -~~~ - -{{site.data.alerts.callout_success}} -Follow best practices for securing connection strings. Refer to [Secure connections](#secure-connections). -{{site.data.alerts.end}} - -### Replicator flags - -{% include molt/replicator-flags-usage.md %} - -
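The commands throughout this page read the source and target connection strings from the `$SOURCE` and `$TARGET` environment variables. A minimal sketch of exporting them — hostnames and credentials are illustrative — with reserved password characters percent-encoded via `jq`:

```shell
# Illustrative values only; substitute your own credentials and hosts.
DB_PASSWORD='p@ss/w:rd'                             # hypothetical raw password
ENC=$(printf '%s' "$DB_PASSWORD" | jq -sRr '@uri')  # percent-encode reserved characters

export SOURCE="postgresql://migration_user:${ENC}@source-host:5432/migration_db"
export TARGET='postgresql://crdb_user@target-host:26257/defaultdb?sslmode=verify-full'

echo "$SOURCE"   # -> postgresql://migration_user:p%40ss%2Fw%3Ard@source-host:5432/migration_db
```

Encoding matters because characters such as `@`, `/`, and `:` in a raw password would otherwise be parsed as URL delimiters.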
-### Tuning parameters - -{% include molt/optimize-replicator-performance.md %} -
- -### Replicator metrics - -MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: - -~~~ ---metricsAddr :30005 -~~~ - -
-For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=postgres). -
- -
-For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=mysql). -
- -
-For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=oracle). -
- -## Start Replicator - -With initial load complete, start replication of ongoing changes on the source to CockroachDB using [MOLT Replicator]({% link molt/molt-replicator.md %}). - -{{site.data.alerts.callout_info}} -MOLT Fetch captures a consistent point-in-time checkpoint at the start of the data load (shown as `cdc_cursor` in the fetch output). Starting replication from this checkpoint ensures that all changes made during and after the data load are replicated to CockroachDB, preventing data loss or duplication. The following steps use the checkpoint values from the fetch output to start replication at the correct position. -{{site.data.alerts.end}} - -
-1. Run the `replicator` command, using the same slot name that you specified with `--pglogical-replication-slot-name` and the publication name created by `--pglogical-publication-and-slot-drop-and-recreate` in the [Fetch command](#start-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database: - - {% include_cached copy-clipboard.html %} - ~~~ shell - replicator pglogical \ - --sourceConn $SOURCE \ - --targetConn $TARGET \ - --targetSchema defaultdb.migration_schema \ - --slotName molt_slot \ - --publicationName molt_fetch \ - --stagingSchema defaultdb._replicator \ - --stagingCreateSchema \ - --metricsAddr :30005 \ - -v - ~~~ -
- -
-1. Run the `replicator` command, specifying the GTID from the [checkpoint recorded during data load](#start-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database. If you [filtered tables during the initial load](#schema-and-table-filtering), [write a userscript to filter tables on replication]({% link molt/userscript-cookbook.md %}#filter-multiple-tables) and specify the path with `--userscript`. - - {% include_cached copy-clipboard.html %} - ~~~ shell - replicator mylogical \ - --sourceConn $SOURCE \ - --targetConn $TARGET \ - --targetSchema defaultdb.public \ - --defaultGTIDSet 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29 \ - --stagingSchema defaultdb._replicator \ - --stagingCreateSchema \ - --metricsAddr :30005 \ - --userscript table_filter.ts \ - -v - ~~~ - - {{site.data.alerts.callout_success}} - For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. - {{site.data.alerts.end}} -
- -
-1. Run the `replicator` command, specifying the backfill and starting SCN from the [checkpoint recorded during data load](#start-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database. If you [filtered tables during the initial load](#schema-and-table-filtering), [write a userscript to filter tables on replication]({% link molt/userscript-cookbook.md %}#filter-multiple-tables) and specify the path with `--userscript`. - - {% include_cached copy-clipboard.html %} - ~~~ shell - replicator oraclelogminer \ - --sourceConn $SOURCE \ - --sourcePDBConn $SOURCE_PDB \ - --targetConn $TARGET \ - --sourceSchema MIGRATION_USER \ - --targetSchema defaultdb.migration_schema \ - --backfillFromSCN 26685444 \ - --scn 26685786 \ - --stagingSchema defaultdb._replicator \ - --stagingCreateSchema \ - --metricsAddr :30005 \ - --userscript table_filter.ts \ - -v - ~~~ - - {{site.data.alerts.callout_info}} - When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. - {{site.data.alerts.end}} -
- -## Verify replication - -1. Verify that Replicator is processing changes successfully. To do so, check the MOLT Replicator logs. Since you enabled debug logging with `-v`, you should see connection and row processing messages: - -
- You should see periodic primary keepalive messages: - - ~~~ - DEBUG [Aug 25 14:38:10] primary keepalive received ReplyRequested=false ServerTime="2025-08-25 14:38:09.556773 -0500 CDT" ServerWALEnd=0/49913A58 - DEBUG [Aug 25 14:38:15] primary keepalive received ReplyRequested=false ServerTime="2025-08-25 14:38:14.556836 -0500 CDT" ServerWALEnd=0/49913E60 - ~~~ - - When rows are successfully replicated, you should see debug output like the following: - - ~~~ - DEBUG [Aug 25 14:40:02] upserted rows conflicts=0 duration=7.855333ms proposed=1 target="\"molt\".\"public\".\"tbl1\"" upserted=1 - DEBUG [Aug 25 14:40:02] progressed to LSN: 0/49915DD0 - ~~~ -
- -
- You should see binlog syncer connection and row processing: - - ~~~ - [2025/08/25 15:29:09] [info] binlogsyncer.go:463 begin to sync binlog from GTID set 77263736-7899-11f0-81a5-0242ac120002:1-38 - [2025/08/25 15:29:09] [info] binlogsyncer.go:409 Connected to mysql 8.0.43 server - INFO [Aug 25 15:29:09] connected to MySQL version 8.0.43 - ~~~ - - When rows are successfully replicated, you should see debug output like the following: - - ~~~ - DEBUG [Aug 25 15:29:38] upserted rows conflicts=0 duration=1.801ms proposed=1 target="\"molt\".\"public\".\"tbl1\"" upserted=1 - DEBUG [Aug 25 15:29:38] progressed to consistent point: 77263736-7899-11f0-81a5-0242ac120002:1-39 - ~~~ -
- -
- When transactions are read from the Oracle source, you should see registered transaction IDs (XIDs): - - ~~~ - DEBUG [Jul 3 15:55:12] registered xid 0f001f0040060000 - DEBUG [Jul 3 15:55:12] registered xid 0b001f00bb090000 - ~~~ - - When rows are successfully replicated, you should see debug output like the following: - - ~~~ - DEBUG [Jul 3 15:55:12] upserted rows conflicts=0 duration=2.620009ms proposed=13 target="\"molt_movies\".\"USERS\".\"CUSTOMER_CONTACT\"" upserted=13 - DEBUG [Jul 3 15:55:12] upserted rows conflicts=0 duration=2.212807ms proposed=16 target="\"molt_movies\".\"USERS\".\"CUSTOMER_DEVICE\"" upserted=16 - ~~~ -
- - These messages confirm successful replication. You can disable verbose logging after verifying the connection. - -## Stop replication and verify data - -{% include molt/migration-stop-replication.md %} - -1. Repeat [Verify the data load](#verify-the-data-load) to verify the updated data. - -## Add constraints and indexes - -{% include molt/migration-modify-target-schema.md %} - -## Cutover - -Perform a cutover by resuming application traffic, now to CockroachDB. - -## Troubleshooting - -{% include molt/molt-troubleshooting-fetch.md %} -{% include molt/molt-troubleshooting-replication.md %} - -## See also - -- [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Strategy]({% link molt/migration-strategy.md %}) -- [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) -- [MOLT Fetch]({% link molt/molt-fetch.md %}) -- [MOLT Verify]({% link molt/molt-verify.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) diff --git a/src/current/molt/migrate-resume-replication.md b/src/current/molt/migrate-resume-replication.md deleted file mode 100644 index 03e15b2cabd..00000000000 --- a/src/current/molt/migrate-resume-replication.md +++ /dev/null @@ -1,99 +0,0 @@ ---- -title: Resume Replication -summary: Resume replication after an interruption. -toc: true -docs_area: migrate ---- - -Resume replication using [MOLT Replicator]({% link molt/molt-replicator.md %}) by running `replicator` with the same arguments used during [initial replication setup]({% link molt/migrate-load-replicate.md %}?filters=postgres#start-replicator). Replicator will automatically resume from the saved checkpoint in the existing staging schema. - -{{site.data.alerts.callout_info}} -These instructions assume you have already started replication at least once. To start replication for the first time, refer to [Load and Replicate]({% link molt/migrate-load-replicate.md %}#start-replicator). -{{site.data.alerts.end}} - -
- - - -
- -## Resume replication after interruption - -
-Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `pglogical` command using the same `--stagingSchema` value from your [initial replication command]({% link molt/migrate-load-replicate.md %}?filters=postgres#start-replicator). - -Be sure to specify the same `--slotName` value that you used during your [initial replication command]({% link molt/migrate-load-replicate.md %}?filters=postgres#start-replicator). The replication slot on the source PostgreSQL database automatically tracks the LSN (Log Sequence Number) checkpoint, so replication will resume from where it left off. - -{% include_cached copy-clipboard.html %} -~~~ shell -replicator pglogical \ ---sourceConn $SOURCE \ ---targetConn $TARGET \ ---targetSchema defaultdb.migration_schema \ ---slotName molt_slot \ ---stagingSchema defaultdb._replicator \ ---metricsAddr :30005 \ --v -~~~ -
- -
-Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `mylogical` command using the same `--stagingSchema` value from your [initial replication command]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-replicator). - -Replicator will automatically use the saved GTID (Global Transaction Identifier) from the `memo` table in the staging schema (in this example, `defaultdb._replicator.memo`) and track advancing GTID checkpoints there. To have Replicator start from a different GTID instead of resuming from the checkpoint, clear the `memo` table with `DELETE FROM defaultdb._replicator.memo;` and run the `replicator` command with a new `--defaultGTIDSet` value. - -{{site.data.alerts.callout_success}} -For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. -{{site.data.alerts.end}} - -{% include_cached copy-clipboard.html %} -~~~ shell -replicator mylogical \ ---sourceConn $SOURCE \ ---targetConn $TARGET \ ---targetSchema defaultdb.public \ ---stagingSchema defaultdb._replicator \ ---metricsAddr :30005 \ ---userscript table_filter.ts \ --v -~~~ -
- -
-Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `oraclelogminer` command using the same `--stagingSchema` value from your [initial replication command]({% link molt/migrate-load-replicate.md %}?filters=oracle#start-replicator). - -Replicator will automatically find the correct restart SCN (System Change Number) from the `_oracle_checkpoint` table in the staging schema. The restart point is determined by the non-committed row with the smallest `startscn` column value. - -{% include_cached copy-clipboard.html %} -~~~ shell -replicator oraclelogminer \ ---sourceConn $SOURCE \ ---sourcePDBConn $SOURCE_PDB \ ---sourceSchema MIGRATION_USER \ ---targetSchema defaultdb.migration_schema \ ---targetConn $TARGET \ ---stagingSchema defaultdb._replicator \ ---metricsAddr :30005 \ ---userscript table_filter.ts \ --v -~~~ - -{{site.data.alerts.callout_info}} -When [filtering out tables in a schema with a userscript]({% link molt/userscript-cookbook.md %}#filter-multiple-tables), replication performance may decrease because filtered tables are still included in LogMiner queries and processed before being discarded. -{{site.data.alerts.end}} -
- -Replication resumes from the last checkpoint without performing a fresh load. Monitor the metrics endpoint at `http://localhost:30005/_/varz` to track replication progress. - -## Troubleshooting - -{% include molt/molt-troubleshooting-replication.md %} - -## See also - -- [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Strategy]({% link molt/migration-strategy.md %}) -- [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) -- [MOLT Fetch]({% link molt/molt-fetch.md %}) -- [MOLT Verify]({% link molt/molt-verify.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) \ No newline at end of file diff --git a/src/current/molt/migrate-to-cockroachdb.md b/src/current/molt/migrate-to-cockroachdb.md index a64832549da..015dc6a2ae3 100644 --- a/src/current/molt/migrate-to-cockroachdb.md +++ b/src/current/molt/migrate-to-cockroachdb.md @@ -5,33 +5,33 @@ toc: true docs_area: migrate --- -MOLT Fetch supports various migration flows using [MOLT Fetch modes]({% link molt/molt-fetch.md %}#fetch-mode). +MOLT Fetch supports various migration flows using [MOLT Fetch modes]({% link molt/molt-fetch.md %}#define-fetch-mode). {% include molt/crdb-to-crdb-migration.md %} | Migration flow | Mode | Description | Best for | |---------------------------------------------------------------------|------------------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------| -| [Bulk load]({% link molt/migrate-bulk-load.md %}) | `--mode data-load` | Perform a one-time bulk load of source data into CockroachDB. 
| Testing, migrations with [planned downtime]({% link molt/migration-strategy.md %}#approach-to-downtime) | -| [Load and replicate]({% link molt/migrate-load-replicate.md %}) | MOLT Fetch + MOLT Replicator | Load source data using MOLT Fetch, then replicate subsequent changes using MOLT Replicator. | [Minimal downtime]({% link molt/migration-strategy.md %}#approach-to-downtime) migrations | -| [Resume replication]({% link molt/migrate-resume-replication.md %}) | `--mode replication-only` | Resume replication from a checkpoint after interruption. | Resuming interrupted migrations, post-load sync | -| [Failback]({% link molt/migrate-failback.md %}) | `--mode failback` | Replicate changes from CockroachDB back to the source database. | [Rollback]({% link molt/migrate-failback.md %}) scenarios | +| Bulk load | `--mode data-load` | Perform a one-time bulk load of source data into CockroachDB. | Testing, migrations with [planned downtime]({% link molt/migration-considerations.md %}#permissible-downtime) | +| Load and replicate | MOLT Fetch + MOLT Replicator | Load source data using MOLT Fetch, then replicate subsequent changes using MOLT Replicator. | [Minimal downtime]({% link molt/migration-considerations.md %}#permissible-downtime) migrations | +| Resume replication | `--mode replication-only` | Resume replication from a checkpoint after interruption. | Resuming interrupted migrations, post-load sync | +| Failback | `--mode failback` | Replicate changes from CockroachDB back to the source database. | Rollback scenarios | ### Bulk load -For migrations that tolerate downtime, use `data-load` mode to perform a one-time bulk load of source data into CockroachDB. Refer to [Bulk Load]({% link molt/migrate-bulk-load.md %}). +For migrations that tolerate downtime, use `data-load` mode to perform a one-time bulk load of source data into CockroachDB. 
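+
+For example, a bulk load might be invoked as follows (a sketch only; `$SOURCE` and `$TARGET` are assumed to hold your source and target connection strings, and other flags may be required for your environment):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Sketch: one-time bulk load of all source data into CockroachDB.
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--mode data-load
+~~~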
### Migrations with minimal downtime To minimize downtime during migration, MOLT Fetch supports replication streams that sync ongoing changes from the source database to CockroachDB. Instead of performing the entire data load during a planned downtime window, you can perform an initial load followed by continuous replication. Writes are only briefly paused to allow replication to drain before final cutover. The length of the pause depends on the volume of write traffic and the amount of replication lag between the source and CockroachDB. -- Use MOLT Fetch for data loading followed by MOLT Replicator for replication. Refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}). +- Use MOLT Fetch for data loading followed by MOLT Replicator for replication. ### Recovery and rollback strategies If the migration is interrupted or you need to abort cutover, MOLT Fetch supports safe recovery flows: -- Use `replication-only` to resume a previously interrupted replication stream. Refer to [Resume Replication]({% link molt/migrate-resume-replication.md %}). -- Use `failback` to reverse the migration, syncing changes from CockroachDB back to the original source. This ensures data consistency on the source so that you can retry later. Refer to [Migration Failback]({% link molt/migrate-failback.md %}). +- Use `replication-only` to resume a previously interrupted replication stream. +- Use `failback` to reverse the migration, syncing changes from CockroachDB back to the original source. This ensures data consistency on the source so that you can retry later. 
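+
+For illustration, both recovery flows are selected with MOLT Fetch's `--mode` flag (a sketch; `$SOURCE` and `$TARGET` are placeholders for your connection strings):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Sketch: resume an interrupted replication stream from its checkpoint.
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--mode replication-only
+
+# Sketch: replicate changes from CockroachDB back to the source
+# database so the source stays consistent for a later retry.
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--mode failback
+~~~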
## See also diff --git a/src/current/molt/migration-approach-classic-bulk-load.md b/src/current/molt/migration-approach-classic-bulk-load.md new file mode 100644 index 00000000000..227f3ae9c6c --- /dev/null +++ b/src/current/molt/migration-approach-classic-bulk-load.md @@ -0,0 +1,48 @@ +--- +title: Classic Bulk Load Migration +summary: Learn what a Classic Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +A *Classic Bulk Load Migration* is the simplest way of [migrating data to CockroachDB]({% link molt/migration-overview.md %}). In this approach, you stop application traffic to the source database and migrate data to the target cluster using [MOLT Fetch]({% link molt/molt-fetch.md %}) during a **significant downtime window**. Application traffic is then cut over to the target after schema finalization and data verification. + +- All source data is migrated to the target [at once]({% link molt/migration-considerations-granularity.md %}). + +- This approach does not utilize [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Rollback]({% link molt/migration-considerations-rollback.md %}) is manual, but in most cases it's simple, as the source database is preserved and write traffic begins on the target all at once. If you wish to roll back before the target has received any writes that are not present on the source database, nothing needs to be done. If you wish to roll back after the target has received writes that are not present on the source database, you must manually replicate these new rows on the source. + +This approach is best for small databases (<100 GB), internal tools, dev/staging environments, and production environments that can handle business disruption. 
It's a simple approach that guarantees full data consistency and is easy to execute with limited resources, but it can only be performed if your system can handle significant downtime. + +This page describes an example scenario. While the commands provided can be copy-and-pasted, they may need to be altered or reconsidered to suit the needs of your specific environment. + +
+Classic Bulk Load Migration flow +
+ +## Example scenario + +You have a small (50 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You schedule a maintenance window for Saturday from 2 AM to 6 AM, and announce it to your users several weeks in advance. + +The application runs on a Kubernetes cluster. + +**Estimated system downtime:** 4 hours. + +## Step-by-step walkthroughs + +The following walkthroughs demonstrate how to use the MOLT tools to perform this migration for each supported source database: + +- [Classic Bulk Load Migration from PostgreSQL]({% link molt/classic-bulk-load-postgres.md %}) +- [Classic Bulk Load Migration from MySQL]({% link molt/classic-bulk-load-mysql.md %}) +- [Classic Bulk Load Migration from Oracle]({% link molt/classic-bulk-load-oracle.md %}) + +{% include molt/crdb-to-crdb-migration.md %} + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-approach-delta.md b/src/current/molt/migration-approach-delta.md new file mode 100644 index 00000000000..0594c9f2c68 --- /dev/null +++ b/src/current/molt/migration-approach-delta.md @@ -0,0 +1,48 @@ +--- +title: Delta Migration +summary: Learn what a Delta Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +A *Delta Migration* uses an initial data load, followed by [continuous replication]({% link molt/migration-considerations-replication.md %}), to [migrate data to CockroachDB]({% link molt/migration-overview.md %}). 
In this approach, you migrate most application data to the target using [MOLT Fetch]({% link molt/molt-fetch.md %}) **before** stopping application traffic to the source database. You then use [MOLT Replicator]({% link molt/molt-replicator.md %}) to keep the target database in sync with any changes in the source database (the migration _delta_), before finally halting traffic to the source and cutting over to the target after schema finalization and data verification. + +- All source data is migrated to the target [at once]({% link molt/migration-considerations-granularity.md %}). + +- This approach utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Failback replication]({% link molt/migration-considerations-rollback.md %}) is supported, though this example will not use it. See [Phased Delta Migration with Failback Replication]({% link molt/migration-approach-phased-delta-failback.md %}) for an example of a migration that uses failback replication. + +This approach is best for production environments that need to minimize system downtime. + +This page describes an example scenario. While the commands provided can be copy-and-pasted, they may need to be altered or reconsidered to suit the needs of your specific environment. + +
+Delta migration flow +
+ +## Example scenario + +You have a small (300 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. Business cannot accommodate a full maintenance window, but it can accommodate a brief (<60 second) halt in traffic. + +The application runs on a Kubernetes cluster. + +**Estimated system downtime:** 3-5 minutes. + +## Step-by-step walkthroughs + +The following walkthroughs demonstrate how to use the MOLT tools to perform this migration for each supported source database: + +- [Delta Migration from PostgreSQL]({% link molt/delta-migration-postgres.md %}) +- [Delta Migration from MySQL]({% link molt/delta-migration-mysql.md %}) +- [Delta Migration from Oracle]({% link molt/delta-migration-oracle.md %}) + +{% include molt/crdb-to-crdb-migration.md %} + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-approach-phased-bulk-load.md b/src/current/molt/migration-approach-phased-bulk-load.md new file mode 100644 index 00000000000..1f02b4fc6a0 --- /dev/null +++ b/src/current/molt/migration-approach-phased-bulk-load.md @@ -0,0 +1,50 @@ +--- +title: Phased Bulk Load Migration +summary: Learn what a Phased Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +A *Phased Bulk Load Migration* involves [migrating data to CockroachDB]({% link molt/migration-overview.md %}) in several phases. Data can be sliced per tenant, per service, per region, or per table to suit the needs of the migration. 
In this approach, you stop application traffic to the source database _only_ for the tables in a particular slice of data. You then migrate that phase of data to the target cluster using [MOLT Fetch]({% link molt/molt-fetch.md %}) during a **downtime window**. Application traffic is then cut over to those target tables after schema finalization and data verification. This process is repeated for each phase of data. + +- Data is migrated to the target [in phases]({% link molt/migration-considerations-granularity.md %}). + +- This approach does not utilize [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Rollback]({% link molt/migration-considerations-rollback.md %}) is manual. If you wish to roll back before the target has received any writes that are not present on the source database, nothing needs to be done. If you wish to roll back after the target has received writes that are not present on the source database, you must manually replicate these new rows on the source. + +This approach is comparable to the [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}), but dividing the data into multiple phases allows each downtime window to be shorter, and it allows each phase of the migration to be less complex. Depending on how you divide the data, it also may allow your downtime windows to affect only a subset of users. For example, dividing the data per region could mean that, when migrating the data from Region A, application usage in Region B may remain unaffected. This approach may increase overall migration complexity: its duration is longer, you will need to do the work of partitioning the data, and you will have a longer period when you run both the source and the target database concurrently. + +This approach is best for databases that are too large to migrate all at once, internal tools, dev/staging environments, and production environments that can handle business disruption. 
It can only be performed if your system can handle downtime for each migration phase, and if your source database can easily be divided into the phases you need. + +This page describes an example scenario. While the commands provided can be copy-and-pasted, they may need to be altered or reconsidered to suit the needs of your specific environment. + +
+Phased Bulk Load Migration flow +
+ +## Example scenario + +You have a moderately sized (500 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You will divide this migration into four geographic regions (A, B, C, and D). You schedule a maintenance window for each region over four consecutive evenings, and you announce them to your users (per region) several weeks in advance. + +The application runs on a Kubernetes cluster. + +**Estimated system downtime:** 4 hours per region. + +## Step-by-step walkthroughs + +The following walkthroughs demonstrate how to use the MOLT tools to perform this migration for each supported source database: + +- [Phased Bulk Load Migration from PostgreSQL]({% link molt/phased-bulk-load-postgres.md %}) +- [Phased Bulk Load Migration from MySQL]({% link molt/phased-bulk-load-mysql.md %}) +- [Phased Bulk Load Migration from Oracle]({% link molt/phased-bulk-load-oracle.md %}) + +{% include molt/crdb-to-crdb-migration.md %} + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) \ No newline at end of file diff --git a/src/current/molt/migration-approach-phased-delta-failback.md b/src/current/molt/migration-approach-phased-delta-failback.md new file mode 100644 index 00000000000..4a504c35f03 --- /dev/null +++ b/src/current/molt/migration-approach-phased-delta-failback.md @@ -0,0 +1,52 @@ +--- +title: Phased Delta Migration with Failback Replication +summary: Learn what a Phased Delta Migration with Failback Replication is, how it relates to the migration considerations, and how to perform it using MOLT tools.
+toc: true +docs_area: migrate +--- + +A *Phased Delta Migration with Failback Replication* involves [migrating data to CockroachDB]({% link molt/migration-overview.md %}) in several phases. Data can be sliced per tenant, per service, per region, or per table to suit the needs of the migration. **For each given migration phase**, you use [MOLT Fetch]({% link molt/molt-fetch.md %}) to perform an initial bulk load of the data, you use [MOLT Replicator]({% link molt/molt-replicator.md %}) to update the target database via forward replication and to activate failback replication, and then you cut over application traffic to CockroachDB after schema finalization and data verification. This process is repeated for each phase of data. + +- Data is migrated to the target [in phases]({% link molt/migration-considerations-granularity.md %}). + +- This approach utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}). + +- [Rollback]({% link molt/migration-considerations-rollback.md %}) is achieved via failback replication. + +This approach is comparable to the [Delta Migration]({% link molt/migration-approach-delta.md %}), but dividing the data into multiple phases allows each downtime window to be shorter, and it allows each phase of the migration to be less complex. Depending on how you divide the data, it also may allow your downtime windows to affect only a subset of users. For example, dividing the data per region could mean that, when migrating the data from Region A, application usage in Region B may remain unaffected. This approach may increase overall migration complexity: its duration is longer, you will need to do the work of partitioning the data, and you will have a longer period when you run both the source and the target database concurrently. 
+ +[Failback replication]({% link molt/migration-considerations-rollback.md %}) keeps the source database up to date with changes that occur in the target database once the target database begins receiving write traffic. Failback replication ensures that, if something goes wrong during the migration process, traffic can easily be returned to the source database without data loss. Like forward replication, in this approach, failback replication is run on a per-phase basis. It can persist indefinitely, until you're comfortable maintaining the target database as your sole data store. + +This approach is best for databases that are too large to migrate all at once, and in business cases where downtime must be minimal. It's also suitable for risk-averse situations in which a safe rollback path must be ensured. It can only be performed if your team can handle the complexity of this approach, and if your source database can easily be divided into the phases you need. + +This page describes an example scenario. While the commands provided can be copy-and-pasted, they may need to be altered or reconsidered to suit the needs of your specific environment. + +
+Phased Delta Migration flow +
+ +## Example scenario + +You have a moderately sized (500 GB) database that provides the data store for a web application. You want to migrate the entirety of this database to a new CockroachDB cluster. You will divide this migration into four geographic regions (A, B, C, and D). + +The application runs on a Kubernetes cluster with an NGINX Ingress Controller. + +**Estimated system downtime:** 3-5 minutes per region. + +## Step-by-step walkthroughs + +The following walkthroughs demonstrate how to use the MOLT tools to perform this migration for each supported source database: + +- [Phased Delta Migration with Failback Replication from PostgreSQL]({% link molt/phased-delta-failback-postgres.md %}) +- [Phased Delta Migration with Failback Replication from MySQL]({% link molt/phased-delta-failback-mysql.md %}) +- [Phased Delta Migration with Failback Replication from Oracle]({% link molt/phased-delta-failback-oracle.md %}) + +{% include molt/crdb-to-crdb-migration.md %} + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) \ No newline at end of file diff --git a/src/current/molt/migration-considerations-granularity.md b/src/current/molt/migration-considerations-granularity.md new file mode 100644 index 00000000000..7ecd2dfd752 --- /dev/null +++ b/src/current/molt/migration-considerations-granularity.md @@ -0,0 +1,84 @@ +--- +title: Migration Granularity +summary: Learn how to think about phased data migration, and whether or not to approach your migration in phases. +toc: true +docs_area: migrate +--- + +You may choose to migrate all of your data into a CockroachDB cluster at once. However, for larger data stores it's recommended that you migrate data in separate phases.
This can help break the migration down into manageable slices, and it can help limit the effects of migration difficulties. + +This page explains when to choose each approach, how to define phases, and how to use MOLT tools effectively in either context. + +In general: + +- Choose to migrate your data **all at once** if your data volume is modest, if you want to minimize migration complexity, or if you don't mind taking on a greater risk of something going wrong. + +- Choose a **phased migration** if your data volume is large, especially if you can naturally partition workload by tenant, service/domain, table/shard, geography, or time. A phased migration helps to reduce risk by limiting the workloads that would be adversely affected by a migration failure. It also helps to limit the downtime per phase, and allows the application to continue serving unaffected subsets of data during the migration of a phase. However, breaking the migration into phases increases the time and complexity of the whole migration. + +## How to divide migrations into phases + +Here are some common ways to divide migrations: + +* **Per-tenant**: Multi-tenant apps route traffic and data per customer/tenant. Migrate a small cohort first, then migrate progressively larger cohorts. This aligns with access controls and isolates blast radius. + +* **Per-service/domain**: In microservice architectures, migrate data owned by a service or domain (e.g., billing, catalog) and route only that service to CockroachDB while others continue on the source. This requires clear data ownership and integration contracts. + +* **Per-table or shard**: Start with non-critical tables, large-but-isolated tables, or shard ranges. For monolith schemas, you can still phase by tables with few foreign-key dependencies and clear read/write paths. + +* **Per-region/market**: If traffic is regionally segmented, migrate one region/market at a time and validate latency, capacity, and routing rules before expanding. 
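+
+For example, a per-table or per-service slice such as those described above can be expressed as a MOLT Fetch filter (a sketch; the regular expression and the `$SOURCE` and `$TARGET` connection-string variables are assumptions for your environment):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Sketch: migrate only the tables that belong to the current phase.
+molt fetch \
+--source $SOURCE \
+--target $TARGET \
+--table-filter 'billing_.*'
+~~~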
+ +Tips for picking slices: + +- Prefer slices with clear routing keys (tenant_id, region_id) to simplify cutover and verification. + +- Start with lower-impact slices to exercise the migration process before migrating high-value cohorts. + +## Tradeoffs + +| | All at once | Phased | +|---|---|---| +| Downtime | A single downtime window, but it affects the whole database | Multiple short windows, each with limited impact | +| Risk | Higher blast radius if issues surface post-cutover | Lower blast radius, issues confined to a slice | +| Complexity | Simpler orchestration, enables a single cutover | More orchestration, repeated verify and cutover steps | +| Validation | One-time, system-wide | Iterative per slice; faster feedback loops | +| Timeline | Shorter migration time | Longer calendar time but safer path | +| Best for | Small/medium datasets, simple integrations | Larger datasets, data with natural partitions or multiple tenants, risk-averse migrations | + +## Decision framework + +Use these questions to guide your approach: + +**How large is your dataset and how long will a full migration take?** +If you can migrate the entire dataset within an acceptable downtime window, all-at-once is simpler. If the migration would take hours or days, phased migrations reduce the risk and downtime per phase. + +**Does your data have natural partitions?** +If you can clearly partition by tenant, service, region, or table with minimal cross-dependencies, phased migration is well-suited. If your data is highly interconnected with complex foreign-key relationships, all-at-once may be easier. + +**What is your risk tolerance?** +If a migration failure affecting the entire system is unacceptable, phased migration limits the blast radius. If you can afford to roll back the entire migration in case of issues, all-at-once is faster. 
+ +**How much downtime can you afford per cutover?** +Phased migrations spread downtime across multiple smaller windows, each affecting only a subset of users or services. All-at-once requires a single larger window affecting everyone. + +**What is your team's capacity for orchestration?** +Phased migrations require repeated cycles of migration, validation, and cutover, with careful coordination of routing and monitoring. All-at-once is a single coordinated event. + +**Can you route traffic selectively?** +Phased migrations may require the ability to route specific tenants, services, or regions to CockroachDB while others remain on the source. If your application can't easily support this, all-at-once may be necessary. + +## MOLT toolkit support + +Phased and unphased migrations are both supported natively by MOLT. + +By default, [MOLT Fetch]({% link molt/molt-fetch.md %}) moves all data from the source database to CockroachDB. However, you can use the [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter), [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter), and [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) flags to selectively migrate data from the source to the target. Learn more about [schema and table selection]({% link molt/molt-fetch.md %}#schema-and-table-selection) and [selective data movement]({% link molt/molt-fetch.md %}#select-data-to-migrate), both of which can enable a phased migration. + +Similarly, you can use [MOLT Verify]({% link molt/molt-verify.md %})'s `--schema-filter` and `--table-filter` flags to run validation checks on subsets of the data in your source and target databases. In a phased migration, you will likely want to verify data at the end of each migration phase, rather than at the end of the entire migration. + +[MOLT Replicator]({% link molt/molt-replicator.md %}) replicates full tables by default.
If you choose to combine phased migration with [continuous replication]({% link molt/migration-considerations-replication.md %}), you will either need to select phases that include whole tables, or else use [userscripts]({% link molt/userscript-overview.md %}) to select rows to replicate. + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Continuous Replication]({% link molt/migration-considerations-replication.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) diff --git a/src/current/molt/migration-considerations-replication.md b/src/current/molt/migration-considerations-replication.md new file mode 100644 index 00000000000..4ded649973f --- /dev/null +++ b/src/current/molt/migration-considerations-replication.md @@ -0,0 +1,97 @@ +--- +title: Continuous Replication +summary: Learn when and how to use continuous replication during data migration to minimize downtime and keep the target synchronized with the source. +toc: true +docs_area: migrate +--- + +Continuous replication can be used during a migration to keep a CockroachDB target cluster synchronized with a live source database. This is often used to minimize downtime at cutover. It can complement bulk data loading or be used independently. + +This page explains when to choose continuous replication, how to combine it with bulk loading, and how to use MOLT tools effectively for each approach. + +In general: + +- Choose to **bulk load only** if you can schedule a downtime window long enough to complete the entire data load and do not need to capture ongoing changes during migration. + +- Choose a **hybrid approach (bulk load + continuous replication)** when you need to minimize downtime and keep the target synchronized with ongoing source database changes until cutover.
+ +- You can choose **continuous replication only** for tables with transient data, or in other contexts where you only need to capture ongoing changes and are not concerned with migrating a large initial dataset. While it's possible to only use continuous replication, this is less common for an entire migration. This is because with large data volumes, streaming changes from the initial set of data can be much slower than bulk loading the dataset. + +## Permissible downtime + +Downtime is the primary factor to consider in determining your migration's approach to continuous replication. + +If your migration can accommodate a window of **planned downtime** that's made known to your users in advance, a bulk load approach is simpler. A pure bulk load approach is well-suited for test or pre-production refreshes, or with migrations that can successfully move data within a planned downtime window. + +If your migration needs to **minimize downtime**, you will likely need to keep the source database live for as long as possible, continuing to allow write traffic to the source until cutover. In this case, an initial bulk load will need to be followed by a replication period, during which you stream incremental changes from the source to the target CockroachDB cluster. This is ideal for large datasets that are impractical to move within a narrow downtime window, or when you need validation time with a live, continuously synced target before switching traffic. The final downtime is minimized to a brief pause to let replication drain before switching traffic, with the pause length driven by write volume and observed replication lag. + +If you're migrating your data [in multiple phases]({% link molt/migration-considerations-granularity.md %}), consider that each phase can have its own separate downtime window and cutover, and that migrating in phases can reduce the length of each individual downtime window.
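+
+One way to gauge the final pause is to watch replication lag on MOLT Replicator's metrics endpoint before cutover (a sketch; it assumes Replicator was started with `--metricsAddr :30005`, as in the walkthroughs):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Sketch: inspect Replicator metrics to confirm replication has drained
+# before pausing writes and switching traffic.
+curl http://localhost:30005/_/varz
+~~~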
+ +## Tradeoffs + +This table considers the tradeoffs of the two main options for this variable, "bulk load only" and a "hybrid approach." + +| | Bulk load only | Hybrid (bulk + replication) | +|---|---|---| +| **Downtime** | Requires full downtime for entire load | Minimal final downtime (brief pause to drain) | +| **Performance** | Fastest overall if window allows | Spreads work: bulk moves mass, replication handles ongoing changes | +| **Complexity** | Fewer moving parts, simpler orchestration | Requires replication infrastructure and monitoring | +| **Risk management** | Full commit at once; rollback more disruptive | Supports failback flows for rollback options | +| **Cutover** | Traffic off until entire load completes | Traffic paused briefly while replication drains | +| **Timeline** | Shortest migration time if downtime permits | Longer preparation but safer path | +| **Best for** | Simple moves, test environments, scheduled maintenance | Production migrations, large datasets, high availability requirements | + +## Decision framework + +Use these questions to guide your approach: + +**What downtime can you tolerate?** +If you can't guarantee a window long enough for the full load, favor the hybrid approach to minimize downtime at cutover. + +**How large is the dataset and how fast can you bulk-load it?** +If load time fits inside downtime, bulk-only is simplest. Otherwise, consider a hybrid approach. + +**How active is the source (write rate and burstiness)?** +Higher write rates mean a longer final drain; this pushes you toward hybrid with close monitoring of replication lag before cutover. + +**How much validation do you require pre-cutover?** +Hybrid gives you time to validate a live, synchronized target before switching traffic. 
+ +## MOLT toolkit support + +The MOLT toolkit provides two complementary tools for data migration: [MOLT Fetch]({% link molt/molt-fetch.md %}) for bulk loading the initial dataset, and [MOLT Replicator]({% link molt/molt-replicator.md %}) for continuous replication. These tools work independently or together depending on your chosen replication approach. + +### Bulk load only + +Use MOLT Fetch to [export and load data]({% link molt/molt-fetch.md %}#bulk-data-load) to CockroachDB. + +For pure bulk migrations, set the [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flag to skip gathering replication checkpoints. This simplifies the workflow when you don't need to track change positions for subsequent replication. + +MOLT Fetch supports both `IMPORT INTO` (default, for highest throughput with offline tables) and `COPY FROM` (for online tables) loading methods. Because a pure bulk load approach will likely involve substantial application downtime, you may benefit from using `IMPORT INTO`. In this case, do not use the [`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) flag. Learn more about Fetch's [data load modes]({% link molt/molt-fetch.md %}#import-into-vs-copy-from). + +A migration that does not utilize continuous replication does not need MOLT Replicator. + +### Hybrid (bulk load + continuous replication) + +Use MOLT Fetch to [export and load the initial dataset]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication) to CockroachDB. Then start MOLT Replicator to [begin streaming changes]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load) from the source database to CockroachDB. + +When you run MOLT Fetch without [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check), it emits a checkpoint value that marks the point in time when the bulk load snapshot was taken.
After MOLT Fetch completes, the checkpoint is stored in the target database. MOLT Replicator then uses this checkpoint to begin streaming changes from exactly that point, ensuring no data is missed between the bulk load and continuous replication. Learn more about [replication checkpoints]({% link molt/molt-replicator.md %}#replication-checkpoints). + +MOLT Fetch supports both `IMPORT INTO` (default, for highest throughput with offline tables) and `COPY FROM` (for online tables) loading methods. Because a hybrid approach will likely aim to have less downtime, you may need to use `COPY FROM` if your tables remain online. In this case, use the [`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) flag. Learn more about Fetch's [data load modes]({% link molt/molt-fetch.md %}#import-into-vs-copy-from). + +MOLT Replicator replicates full tables by default. If you choose to combine continuous replication with a [phased migration]({% link molt/migration-considerations-granularity.md %}), you will either need to select phases that include whole tables, or else use [userscripts]({% link molt/replicator-flags.md %}#userscript) to select rows to replicate. + +MOLT Replicator can be stopped after cutover, or it can remain online to continue streaming changes indefinitely. + +### Continuous replication only + +If you're only interested in capturing recent changes, skip MOLT Fetch entirely and just use MOLT Replicator. 
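As a rough sketch, the hybrid workflow described above is a bulk load followed by a replication stream. The connection strings below are placeholders, and the Replicator subcommand is an assumption for a PostgreSQL source (other dialects use other subcommands); consult the MOLT Fetch and MOLT Replicator references for the exact flags your environment requires.

```shell
# Placeholder connection strings; substitute your own.
SOURCE="postgres://migration_user@source-host:5432/appdb"
TARGET="postgres://root@crdb-host:26257/defaultdb?sslmode=verify-full"

# 1. Bulk load. Omitting --ignore-replication-check makes MOLT Fetch
#    record the checkpoint that replication later resumes from.
molt fetch \
  --source "$SOURCE" \
  --target "$TARGET"

# 2. Stream subsequent changes from that checkpoint onward
#    (subcommand assumed for a PostgreSQL source).
replicator pglogical \
  --sourceConn "$SOURCE" \
  --targetConn "$TARGET"
```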
+ +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Migration Granularity]({% link molt/migration-considerations-granularity.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-considerations-rollback.md b/src/current/molt/migration-considerations-rollback.md new file mode 100644 index 00000000000..ed5121135e9 --- /dev/null +++ b/src/current/molt/migration-considerations-rollback.md @@ -0,0 +1,84 @@ +--- +title: Rollback Plan +summary: Learn how to plan rollback options to limit risk and preserve data integrity during migration. +toc: true +docs_area: migrate +--- + +A rollback plan defines how you will undo or recover from a failed migration. A clear rollback strategy limits risk during migration, minimizes business impact, and preserves data integrity so that you can retry the migration with confidence. + +This page explains common rollback options, their trade-offs, and how the MOLT toolkit supports each approach. + +In general: + +- **Manual reconciliation** is sufficient for low-risk systems or low-complexity migrations where automated rollback is not necessary. + +- Utilize **failback replication** to maintain synchronization between the CockroachDB cluster and the original source database after cutover to CockroachDB. + +- Utilize **bidirectional replication** (simultaneous forward and failback replication) to maximize database synchronization without requiring app changes, accepting the operational overhead of running two replication streams. + +- Choose a **dual-write** strategy for the fastest rollback with minimal orchestration, accepting higher application complexity during the trial window. + +## Why plan for rollback + +Many things can go wrong during a migration. 
Performance issues may surface under production load that didn't appear in testing. Application compatibility problems might emerge that require additional code changes. Data discrepancies could be discovered that necessitate investigation and remediation. In any of these scenarios, the ability to quickly and safely return to the source database is critical to minimizing business impact. + +Your rollback strategy should align with your migration's risk profile, downtime tolerance, and operational capabilities. High-stakes production migrations typically require faster rollback paths with minimal data loss, while test environments or low-traffic systems can tolerate simpler manual approaches. + +### Failback replication + +[Continuous (forward) replication]({% link molt/migration-considerations-replication.md %}), which serves to minimize downtime windows, keeps two databases in sync by replicating changes from the source to the target. In contrast, **failback replication** synchronizes data in the opposite direction, from the target back to the source. + +Failback replication is useful for rollback because it keeps the source database synchronized with writes that occur on CockroachDB after cutover. If problems emerge during your trial period and you need to roll back, the source database already has all the data that was written to CockroachDB. This enables a quick rollback without data loss. + +Failback and forward replication can be used simultaneously (**bidirectional replication**). This is especially useful if the source and the target databases can receive simultaneous, but disparate write traffic. In that case, bidirectional replication is necessary to ensure that both databases stay in sync. It's also useful if downtime windows are long or if cutover is gradual, increasing the likelihood that the two databases receive independent writes. 
+ +### Dual-write + +Failback replication requires an external replication system (like [MOLT Replicator]({% link molt/molt-replicator.md %})) to keep two databases synchronized. Alternatively, you can modify the application code itself to enable **dual-writes**, wherein the application writes to both the source database and CockroachDB during a trial window. If rollback is needed, you can then redirect traffic to the source without additional data movement. + +This enables faster rollback, but increases application complexity as you need to manage two write paths. + +## Decision framework + +Use these questions to guide your rollback strategy: + +**How quickly do you need to roll back if problems occur?** +If you need immediate rollback, choose dual-write or bidirectional replication. If you can tolerate some delay to activate failback replication, one-way, on-demand failback replication is sufficient. For low-risk migrations with generous time windows, manual reconciliation may be acceptable. + +**How much data can you afford to lose during rollback?** +If you cannot lose any data written after cutover, choose bidirectional replication or on-demand failback (both preserve all writes). Dual-write can also preserve data if implemented carefully. Manual reconciliation typically accepts some data loss. + +**Will writes occur to both databases during the trial period?** +If traffic might split between source and target (e.g., during gradual cutover or in multi-region scenarios), bidirectional replication keeps both databases synchronized. If traffic cleanly shifts from source to target, on-demand failback or dual-write is sufficient. + +**Can you modify the application code?** +If application changes are expensive or risky, use database-level replication (bidirectional or on-demand failback) instead of dual-write. + +**What is your team's operational capacity?** +Bidirectional replication requires monitoring and managing two active replication streams. 
On-demand failback requires a tested runbook for activating failback quickly. Dual-write requires application-layer resilience and observability. Manual reconciliation has the lowest operational complexity.
+
+**What are your database capabilities?**
+Ensure your source database supports the change data capture requirements for the migration window. Verify that CockroachDB changefeeds can provide the necessary failback support for your environment.
+
+## MOLT toolkit support
+
+[MOLT Replicator]({% link molt/molt-replicator.md %}) uses change data capture to stream changes from one database to another. It's used for both [forward replication]({% link molt/migration-considerations-replication.md %}) and [failback replication](#failback-replication).
+
+To use MOLT Replicator in failback mode, run the [`replicator start`]({% link molt/replicator-flags.md %}#commands) command with its various [flags]({% link molt/replicator-flags.md %}).
+
+When enabling failback replication, the original source database becomes the replication target, and the original target CockroachDB cluster becomes the replication source. Use the [`--sourceConn`]({% link molt/replicator-flags.md %}#source-conn) flag to indicate the CockroachDB cluster, and use the [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) flag to indicate the PostgreSQL, MySQL, or Oracle database from which data is being migrated.
+
+MOLT Replicator can be stopped after cutover, or it can remain online to continue streaming changes indefinitely.
+
+Rollback plans that do not utilize failback replication will require external tooling, or in the case of a dual-write strategy, changes to application code. You can still use [MOLT Verify]({% link molt/molt-verify.md %}) to ensure parity between the two databases.
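Putting those flags together, a failback invocation might look like the following sketch. The connection strings are placeholders, and supporting flags (staging schema, TLS certificates, changefeed configuration) are omitted; check the Replicator flag reference before running.

```shell
# Failback direction: CockroachDB is the replication source, and the
# original database is the replication target.
replicator start \
  --sourceConn "postgres://root@crdb-host:26257/defaultdb?sslmode=verify-full" \
  --targetConn "postgres://migration_user@source-host:5432/appdb"
```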
+ +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Continuous Replication]({% link molt/migration-considerations-replication.md %}) +- [Validation Strategy]({% link molt/migration-considerations-validation.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-considerations-transformation.md b/src/current/molt/migration-considerations-transformation.md new file mode 100644 index 00000000000..f3f6079d1f8 --- /dev/null +++ b/src/current/molt/migration-considerations-transformation.md @@ -0,0 +1,92 @@ +--- +title: Data Transformation Strategy +summary: Learn about the different approaches to applying data transformations during a migration and how to choose the right strategy for your use case. +toc: true +docs_area: migrate +--- + +Data transformations are applied to data as it moves from the source to the target. Transformations ensure that the data is compatible, consistent, and valuable in the destination. They are a key part of a migration to CockroachDB. When planning a migration, it's important to determine **what** transformations are necessary and **where** they need to occur. + +This page explains the types of transformations to expect, where they can be applied, and how these choices shape your use of MOLT tooling. + +## Common transformation types + +If the source and target schemas are not identical, some sort of transformation is likely to be necessary during a migration. The set of necessary transformations will depend on the differences between your source database schema and your target CockroachDB schema, as well as any data quality or formatting requirements for your application. + +- **Type mapping**: Align source types with CockroachDB types, especially for dialect-specific types. 
+- **Format conversion**: Change the format or encoding of certain values to align with the target schema (for example, `2024-03-01T00:00:00Z` to `03/01/2024`).
+- **Field renaming**: Rename fields to fit target schemas or conventions.
+- **Primary key strategy**: Replace source sequences or auto-increment patterns with CockroachDB-friendly IDs (UUIDs, sequences).
+- **Table reshaping**: Consolidate partitioned tables, rename tables, or retarget to different schemas.
+- **Column changes**: Exclude deprecated columns, or map computed columns.
+- **Row filtering**: Move only a subset of rows by tenant, region, or timeframe.
+- **Null/default handling**: Replace, remove, or infer missing values.
+- **Constraints and indexes**: Drop non-primary-key constraints and secondary indexes before bulk load for performance, then recreate after.
+
+## Where to transform
+
+Transformations can occur in the source database, in the target database, or in flight (between the source and the target). Where to perform the transformations is largely determined by technical constraints, including the mutability of the source database and the choice of tooling.
+
+#### Transform in the source database
+
+Apply transformations directly on the source database before migrating data. This is only possible if the source database can be modified to accommodate the transformations suited for the target database.
+
+This provides the advantage of allowing ample time, before the downtime window, to perform the transformations, but it is often not possible due to technical constraints.
+
+#### Transform in the target database
+
+Apply transformations in the CockroachDB cluster after data has been loaded. For any transformations that occur in the target cluster, it's recommended that they occur before cutover, to ensure that live data complies with CockroachDB best practices. Note, however, that transformations performed before cutover may extend downtime.
+
+#### Transform in flight
+
+Apply transformations within the migration pipeline, between the source and target databases. This allows the source database to remain as it is, and it allows the target database to be designed using CockroachDB best practices. It also enables testability by separating transformations from either database.
+
+However, in-flight transformations may require more complex tooling. In-flight transformation is well supported by the [MOLT toolkit](#molt-toolkit-support).
+
+## Decision framework
+
+Use these questions to guide your transformation strategy:
+
+- **What is your downtime tolerance?** Near-zero downtime pushes you toward in-flight transforms that apply consistently to bulk and streaming loads.
+- **Will transformation logic be reused post-cutover?** If you need ongoing sync or failback, prefer deterministic, version-controlled in-flight transformations.
+- **How complex are the transformations?** Simple schema reshaping favors MOLT Fetch transformations or target DDL. Complex value normalization or routing favors in-flight userscripts.
+- **Can you modify the source database?** Source-side transformations require permission and capacity to create views, staging tables, or run transformation queries.
+
+## MOLT toolkit support
+
+The MOLT toolkit provides functionality for implementing transformations at each stage of the migration pipeline.
+
+### MOLT Schema Conversion Tool
+
+While not a part of the transformation process itself, the [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) automates the creation of the target database schema based on the schema of the source database. This reduces downstream transformation pressure by addressing DDL incompatibilities upfront.
+ +### MOLT Fetch + +[MOLT Fetch]({% link molt/molt-fetch.md %}) supports transformations during a bulk data load: + +- **Row filtering**: [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) specifies a JSON file with table-to-SQL-predicate mappings evaluated in the source dialect before export. Ensure filtered columns are indexed for performance. +- **Schema shaping**: [`--transformations-file`]({% link molt/molt-fetch-commands-and-flags.md %}#transformations-file) defines table renames, n→1 merges (consolidate partitioned tables), and column exclusions. For n→1 merges, use [`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) or [`--direct-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#direct-copy) and pre-create the target table. +- **Type alignment**: [`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#type-map-file) specifies explicit type mappings when auto-creating target tables. +- **Table lifecycle**: [`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) controls whether to truncate, drop-and-recreate, or assume tables exist. + +### MOLT Replicator + +[MOLT Replicator]({% link molt/molt-replicator.md %}) uses TypeScript [userscripts]({% link molt/userscript-overview.md %}) to implement in-flight transformations for continuous replication. Common use cases include: + +- [Renaming tables]({% link molt/userscript-cookbook.md %}#rename-tables): Map source table names to different names on the target. +- [Renaming columns]({% link molt/userscript-cookbook.md %}#rename-columns): Map source column names to different names on the target. +- [Row filtering]({% link molt/userscript-cookbook.md %}#select-data-to-replicate): Filter out specific rows based on conditions, such as excluding soft-deleted records or test data. +- [Table filtering]({% link molt/userscript-cookbook.md %}#filter-multiple-tables): Exclude specific tables from replication. 
+- [Column filtering]({% link molt/userscript-cookbook.md %}#filter-columns): Remove sensitive or unnecessary columns from replicated data. +- [Data transformation]({% link molt/userscript-cookbook.md %}#compute-new-columns): Transform column values, compute new columns, or change data types during replication. +- [Table partitioning]({% link molt/userscript-cookbook.md %}#route-table-partitions): Distribute rows from a single source table across multiple target tables based on partitioning rules. + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Migration Granularity]({% link molt/migration-considerations-granularity.md %}) +- [Continuous Replication]({% link molt/migration-considerations-replication.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-considerations-validation.md b/src/current/molt/migration-considerations-validation.md new file mode 100644 index 00000000000..4e407fbe8cd --- /dev/null +++ b/src/current/molt/migration-considerations-validation.md @@ -0,0 +1,114 @@ +--- +title: Validation Strategy +summary: Learn when and how to validate data during migration to ensure correctness, completeness, and consistency. +toc: true +docs_area: migrate +--- + +Validation strategies are critical to ensuring a successful data migration. They're how you confirm that the right data has been moved correctly, is complete, and is usable in the new environment. A validation strategy is defined by **what** validations you want to run and **when** you want to run them. + +This page explains how to think about different validation strategies and how to use MOLT tooling to enable validation. 
+ +## What to validate + +Running any of the following validations can help you feel confident that the data in the CockroachDB cluster matches the data in the migration source database. + +- **Row Count Validation**: Ensures the number of records matches between source and target. + +- **Checksum/Hash Validation**: Compares hashed values of rows or columns to detect changes or corruption. + +- **Data Sampling**: Randomly sample and manually compare rows between systems. + +- **Column-Level Comparison**: Validate individual field values across systems. + +- **Business Rule Validation**: Apply domain rules to validate logic or derived values. + +- **Boundary Testing**: Ensure edge-case data (nulls, max values, etc.) are correctly migrated. + +- **Referential Integrity**: Validate that relationships (foreign keys) are intact in the target. + +- **Data Type Validation**: Confirm that fields conform to expected types/formats. + +- **Null/Default Value Checks**: Validate expected default values or NULLs post-migration. + +- **ETL Process Validation**: Check logs, counts, or errors from migration tools. + +- **Automated Testing**: Use scripts or tools to compare results and flag mismatches. + +The rigor of your validations (the set of validations you perform) will depend on your organization's risk tolerance and the complexity of the migration. + +## When to validate + +A migration can be a long process, and depending on the choices made in designing a migration, it can be complex. If the dataset is small or the migration is low in complexity, it may be sufficient to simply run validations when you're ready to cut over application traffic to CockroachDB. However, there are several opportunities to validate your data in advance of cutover. + +It's often useful to find natural checkpoints in your migration flow to run validations, and to increase the rigor of those validations as you approach cutover. 
+
+If performing a migration [in phases]({% link molt/migration-considerations-granularity.md %}), the checkpoints below can be considered in the context of each individual phase. A rigorous validation approach might choose to run validations after each phase, while a more risk-tolerant approach might choose to run them after all of the phases have been migrated but before cutover.
+
+### Pre-migration (design and dry-run)
+
+Validate the converted schema and resolve type mapping issues. Run a dry-run migration on test data and begin query validation to catch behavioral differences early.
+
+### After a bulk data load
+
+Run comprehensive validations to confirm schema and row-level parity before re-adding constraints and indexes that were dropped to accelerate load.
+
+### During continuous replication
+
+If using [continuous replication]({% link molt/migration-considerations-replication.md %}), run validation periodically to ensure the target converges with the source. Use live-aware validation to reduce false positives from in-flight changes. This gives you confidence that replication is working correctly.
+
+### Before cutover
+
+Once replication has drained, run final validation on the complete cutover scope and verify critical application queries.
+
+### After cutover
+
+After traffic moves to CockroachDB, run targeted validation on critical tables and application smoke tests to confirm steady state.
+
+## Decision framework
+
+Use these questions to help you determine which validations to perform, and when to perform them:
+
+**What is your data volume and validation timeline?**
+Larger datasets require more validation time. Consider concurrency tuning, phased validation, or off-peak runs to fit within windows.
+
+**Are there intentional schema or type differences?**
+Expect validation to flag type conversions and collation differences. Decide upfront whether to accept conditional successes or redesign to enable strict parity.
+
+**What is your organization's risk tolerance?**
+High-risk migrations may require comprehensive validation at every checkpoint, including both automated and manual verification. Lower-risk migrations may accept sampling or targeted validation.
+
+**Are you migrating in phases?**
+Phased migrations offer natural checkpoints for validation between phases. Decide whether to validate after each phase or defer to pre-cutover validation.
+
+**How will you handle validation failures?**
+Determine in advance whether mismatches will block cutover, trigger investigation, or allow you to proceed conditionally. Establish clear thresholds and escalation paths.
+
+## MOLT toolkit support
+
+[MOLT Verify]({% link molt/molt-verify.md %}) performs structural and row-level comparison between the source database and the CockroachDB cluster. MOLT Verify performs the following verifications to ensure data integrity during a migration:
+
+- Table Verification: Check that the structure of tables is the same between the source database and the target database.
+
+- Column Definition Verification: Check that the column names, data types, constraints, nullability, and other attributes are the same between the source database and the target database.
+
+- Row Value Verification: Check that the actual data in the tables is the same between the source database and the target database.
+
+Other validations beyond those supported by MOLT Verify would need to be run by a third-party tool, but could be run in tandem with MOLT Verify.
+
+If performing a [phased migration]({% link molt/migration-considerations-granularity.md %}), you can use MOLT Verify's `--schema-filter` and `--table-filter` flags to limit validation to specific schemas or tables.
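For example, a phased run scoped to a single hypothetical schema and table set might look like the following sketch. The connection strings and filter values are placeholders; check the MOLT Verify reference for how filter patterns are matched in your version.

```shell
# Placeholder connection strings and a hypothetical "billing" phase.
molt verify \
  --source "postgres://migration_user@source-host:5432/appdb" \
  --target "postgres://root@crdb-host:26257/defaultdb?sslmode=verify-full" \
  --schema-filter "billing" \
  --table-filter "invoices|payments"
```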
+ +If using [continuous replication]({% link molt/migration-considerations-replication.md %}), you can utilize MOLT Verify's [selective data verification]({% link molt/molt-verify.md %}#verify-a-subset-of-data) to validate replicated changes as they are written to the target. + +Check MOLT Verify's [known limitations]({% link molt/molt-verify.md %}#known-limitations) to ensure the tool's suitability for your validation strategy. + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Migration Granularity]({% link molt/migration-considerations-granularity.md %}) +- [Continuous Replication]({% link molt/migration-considerations-replication.md %}) +- [Data Transformation Strategy]({% link molt/migration-considerations-transformation.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) diff --git a/src/current/molt/migration-considerations.md b/src/current/molt/migration-considerations.md new file mode 100644 index 00000000000..5bc90bb9a66 --- /dev/null +++ b/src/current/molt/migration-considerations.md @@ -0,0 +1,71 @@ +--- +title: Migration Considerations +summary: Learn what to consider when making high-level decisions about a migration. +toc: true +docs_area: migrate +--- + +When planning a migration to CockroachDB, you need to make several high-level decisions that will shape your migration approach. This page provides an overview of key migration variables and the factors that influence them. Each variable has multiple options, and the combination you choose will largely define your migration strategy. + +For detailed migration sequencing and tool usage, see [Migration Overview]({% link molt/migration-overview.md %}). For detailed planning guidance, see [Migration Best Practices]({% link molt/migration-strategy.md %}). 
+ +## Migration variables + +Learn more about each migration variable by clicking the links in the left-hand column. + +| Variable | Description | +|---|---| +| [**Migration granularity**]({% link molt/migration-considerations-granularity.md %}) | Do you want to migrate all of your data at once, or do you want to split your data up into phases and migrate one phase at a time? | +| [**Continuous replication**]({% link molt/migration-considerations-replication.md %}) | After the initial data load (or after the initial load of each phase), do you want to stream further changes to that data from the source to the target? | +| [**Data transformation strategy**]({% link molt/migration-considerations-transformation.md %}) | If there are discrepancies between the source and target schema, how will you define those data transformations, and when will those transformations occur? | +| [**Validation strategy**]({% link molt/migration-considerations-validation.md %}) | How and when will you verify that the data in CockroachDB matches the source database? | +| [**Rollback plan**]({% link molt/migration-considerations-rollback.md %}) | What approach will you use to roll back the migration if issues arise during or after cutover? | + +The combination of these variables largely defines your migration approach. While you'll typically choose one primary option for each variable, some migrations may involve a hybrid approach depending on your specific requirements. + +## Factors to consider + +When deciding on the options for each migration variable, consider the following business and technical requirements: + +### Permissible downtime + +How much downtime can your application tolerate during the migration? 
This is one of the most critical factors in determining your migration approach, and it may influence your choices for [migration granularity]({% link molt/migration-considerations-granularity.md %}) and [continuous replication]({% link molt/migration-considerations-replication.md %}).
+
+- **Planned downtime** is made known to your users in advance. It involves taking the application offline, conducting the migration, and bringing the application back online on CockroachDB.
+
+    To succeed, you should estimate the amount of downtime required to migrate your data, and ideally schedule the downtime outside of peak hours. Scheduling downtime is easiest if your application traffic is "periodic", meaning that it varies by the time of day, day of week, or day of month.
+
+    If you can support planned downtime, you may want to migrate your data all at once and _without_ continuous replication.
+
+- **Minimal downtime** impacts as few customers as possible, ideally without impacting their regular usage. If your application is intentionally offline at certain times (e.g., outside business hours), you can migrate the data without users noticing. Alternatively, if your application's functionality is not time-sensitive (e.g., it sends batched messages or emails), you can queue requests while the system is offline and process them after completing the migration to CockroachDB.
+
+In addition to downtime duration, consider whether your application could support windows of **reduced functionality** in which some, but not all, application functionality is brought offline. For example, you can disable writes but not reads while you migrate the application data, and queue data to be written after completing the migration.
+
+### Migration timeframe and allowable complexity
+
+When do you need to complete the migration? How many team members can be allocated for this effort? How much complex orchestration can your team manage? 
These factors may influence your choices for [migration granularity]({% link molt/migration-considerations-granularity.md %}), [continuous replication]({% link molt/migration-considerations-replication.md %}), and [rollback plan]({% link molt/migration-considerations-rollback.md %}).
+
+- Migrations with a short timeline, or which cannot accommodate high complexity, may favor migrating data all at once, without continuous replication, and accepting manual reconciliation in the event of migration failure.
+
+- Migrations with a long timeline, or which can accommodate complexity, may want to migrate data in phases. If the migration requires minimal downtime, these migrations may also want to utilize continuous replication. If the migration has a low risk tolerance, these migrations may also want to enable failback.
+
+### Risk tolerance
+
+How much risk is your organization willing to accept during the migration? This may influence your choices for [migration granularity]({% link molt/migration-considerations-granularity.md %}), [validation strategy]({% link molt/migration-considerations-validation.md %}), and [rollback plan]({% link molt/migration-considerations-rollback.md %}).
+
+- Risk-averse migrations should prefer phased migrations that limit the blast radius of any issues. Start with low-risk slices (e.g., a small cohort of tenants or a non-critical service), validate thoroughly, and progressively expand to higher-value workloads. These migrations may also prefer rollback plans that enable quick recovery in the event of migration issues.
+
+- For risk-tolerant migrations, it may be acceptable to migrate all of your data at once. Less stringent validation strategies and manual reconciliation in the event of a migration failure may also be acceptable.
+
+___
+
+These factors are only a subset of what you should consider when designing your CockroachDB migration; weigh them alongside your specific business requirements and technical constraints. 
It's recommended that you document these decisions and the reasoning behind them as part of your [migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan). + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Best Practices]({% link molt/migration-strategy.md %}) +- [Bulk vs. Phased Migration]({% link molt/migration-considerations-granularity.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-overview.md b/src/current/molt/migration-overview.md index 163dc26d8aa..74801d8548d 100644 --- a/src/current/molt/migration-overview.md +++ b/src/current/molt/migration-overview.md @@ -5,34 +5,37 @@ toc: true docs_area: migrate --- -The MOLT (Migrate Off Legacy Technology) toolkit enables safe, minimal-downtime database migrations to CockroachDB. MOLT combines schema transformation, distributed data load, continuous replication, and row-level validation into a highly configurable workflow that adapts to diverse production environments. +A migration involves transferring data from a pre-existing **source** database onto a **target** CockroachDB cluster. Migrating data is a complex, multi-step process, and a data migration can take many different forms depending on your specific business and technical constraints. + +Cockroach Labs provides a [MOLT (Migrate Off Legacy Technology)]({% link releases/molt.md %}) toolkit to aid in migrations. 
This page provides an overview of the following: -- Overall [migration sequence](#migration-sequence) +- The generic [migration sequence](#migration-sequence) - [MOLT tools](#molt-tools) -- Supported [migration flows](#migration-flows) +- [Variables](#migration-variables) to consider when choosing a migration approach +- [Common migration approaches](#common-migration-approaches) ## Migration sequence -{{site.data.alerts.callout_success}} -Before you begin the migration, review [Migration Strategy]({% link molt/migration-strategy.md %}). -{{site.data.alerts.end}} +A migration to CockroachDB generally follows a variant of this sequence: -A migration to CockroachDB generally follows this sequence: +1. **Assess and discover**: Inventory the source database, flag unsupported features, and make a migration plan. +1. **Prepare the environment**: Configure networking, users and permissions, bucket locations, replication settings, and more. +1. **Convert the source schema**: Generate CockroachDB-compatible [DDL]({% link {{ site.current_cloud_version }}/sql-statements.md %}#data-definition-statements). Apply the converted schema to the target database. Drop constraints and indexes to facilitate data load. +1. **Load data into CockroachDB**: Bulk load the source data into the CockroachDB cluster. +1. **Finalize target schema**: Recreate indexes or constraints on CockroachDB that you previously dropped to facilitate data load. +1. **_(Optional)_ Replicate ongoing changes**: Keep CockroachDB in sync with the source. This may be necessary for migrations that minimize downtime. +1. **Stop application traffic**: Limit user read/write traffic to the source database. _This begins application downtime._ +1. **Verify data consistency**: Confirm that the CockroachDB data is consistent with the source. +1. **_(Optional)_ Enable failback**: Replicate data from the target back to the source, enabling a reversion to the source database in the event of migration failure. +1. 
**Cut over application traffic**: Resume normal application use, with the CockroachDB cluster as the target database. _This ends application downtime._ -MOLT tooling overview - -1. Prepare the source database: Configure users, permissions, and replication settings as needed. -1. Convert the source schema: Use the [Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) to generate CockroachDB-compatible [DDL]({% link {{ site.current_cloud_version }}/sql-statements.md %}#data-definition-statements). Apply the converted schema to the target database. Drop constraints and indexes to facilitate data load. -1. Load data into CockroachDB: Use [MOLT Fetch]({% link molt/molt-fetch.md %}) to bulk-ingest your source data. -1. (Optional) Verify consistency before replication: Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the data loaded into CockroachDB is consistent with the source. -1. Finalize target schema: Recreate indexes or constraints on CockroachDB that you previously dropped to facilitate data load. -1. Replicate ongoing changes: Enable continuous replication with [MOLT Replicator]({% link molt/molt-replicator.md %}) to keep CockroachDB in sync with the source. -1. Verify consistency before cutover: Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the CockroachDB data is consistent with the source. -1. Cut over to CockroachDB: Redirect application traffic to the CockroachDB cluster. +The MOLT (Migrate Off Legacy Technology) toolkit enables safe, minimal-downtime database migrations to CockroachDB. MOLT combines schema transformation, distributed data load, continuous replication, and row-level validation into a highly configurable workflow that adapts to diverse production environments. -For more details, refer to [Migration flows](#migration-flows). +
+MOLT toolkit flow +
## MOLT tools @@ -56,13 +59,13 @@ MOLT [Fetch](#fetch), [Replicator](#replicator), and [Verify](#verify) are CLI-b Fetch Initial data load - PostgreSQL 11-16, MySQL 5.7-8.0+, Oracle Database 19c (Enterprise Edition) and 21c (Express Edition), CockroachDB + PostgreSQL 11-16, MySQL 5.7-8.0+, Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) GA Replicator Continuous replication - CockroachDB, PostgreSQL 11-16, MySQL 5.7+ and 8.0+, Oracle Database 19c+ + PostgreSQL 11-16, MySQL 5.7+ and 8.0+, Oracle Database 19c+, CockroachDB GA @@ -87,20 +90,20 @@ The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) [MOLT Fetch]({% link molt/molt-fetch.md %}) performs the initial data load to CockroachDB. It supports: -- [Multiple migration flows](#migration-flows) via `IMPORT INTO` or `COPY FROM`. -- Data movement via [cloud storage, local file servers, or direct copy]({% link molt/molt-fetch.md %}#data-path). -- [Concurrent data export]({% link molt/molt-fetch.md %}#best-practices) from multiple source tables and shards. -- [Schema transformation rules]({% link molt/molt-fetch.md %}#transformations). -- After exporting data with `IMPORT INTO`, safe [continuation]({% link molt/molt-fetch.md %}#fetch-continuation) to retry failed or interrupted tasks from specific checkpoints. +- Multiple migration flows via `IMPORT INTO` or `COPY FROM`. +- Data movement via [cloud storage, local file servers, or direct copy]({% link molt/molt-fetch.md %}#define-intermediate-storage). +- [Concurrent data export]({% link molt/molt-fetch-best-practices.md %}) from multiple source tables and shards. +- [Schema transformation rules]({% link molt/molt-fetch.md %}#define-transformations). +- After exporting data with `IMPORT INTO`, safe [continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption) to retry failed or interrupted tasks from specific checkpoints. 
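To make the concurrent-export behavior above concrete, the following sketch estimates Fetch's footprint on the source database. It is a hypothetical illustration (the flag values and row size are assumptions, not defaults), based on two rules of thumb from the Fetch best-practices guidance: source connections scale as `--export-concurrency` multiplied by `--table-concurrency`, and per-table export memory as `--row-batch-size` multiplied by `--export-concurrency` multiplied by the average row size.

```python
# Rough sizing sketch for a MOLT Fetch run; all values are illustrative assumptions.

def fetch_source_connections(export_concurrency: int, table_concurrency: int) -> int:
    # Fetch opens one source connection per shard, for each table exported
    # concurrently: shards (--export-concurrency) x tables (--table-concurrency).
    return export_concurrency * table_concurrency

def fetch_export_memory_bytes(row_batch_size: int, export_concurrency: int,
                              avg_row_bytes: int) -> int:
    # Per-table memory estimate for data export:
    # --row-batch-size x --export-concurrency x average row size.
    return row_batch_size * export_concurrency * avg_row_bytes

# Example: 4 shards per table, 2 tables at a time, 100,000-row batches, ~512-byte rows.
print(fetch_source_connections(4, 2))              # 8 source connections
print(fetch_export_memory_bytes(100_000, 4, 512))  # 204800000 bytes (~205 MB) per table
```

When exporting several tables concurrently, sum the estimate for the tables with the largest rows, and size both the Fetch machine and the source database's connection limits accordingly.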
### Replicator -[MOLT Replicator]({% link molt/molt-replicator.md %}) provides continuous replication capabilities for minimal-downtime migrations. It supports: +[MOLT Replicator]({% link molt/molt-replicator.md %}) provides [continuous replication](#continuous-replication) capabilities for minimal-downtime migrations. It supports: - Continuous replication from source databases to CockroachDB. - [Multiple consistency modes]({% link molt/molt-replicator.md %}#consistency-modes) for balancing throughput and transactional guarantees. - Failback replication from CockroachDB back to source databases. -- [Performance tuning]({% link molt/molt-replicator.md %}#optimize-performance) for high-throughput workloads. +- [Performance tuning]({% link molt/molt-replicator-best-practices.md %}#optimize-performance) for high-throughput workloads. ### Verify @@ -110,35 +113,56 @@ The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) - Column definition. - Row-level data. -## Migration flows +## Migration variables -MOLT supports various migration flows using [MOLT Fetch]({% link molt/molt-fetch.md %}) for data loading and [MOLT Replicator]({% link molt/molt-replicator.md %}) for ongoing replication. +You must decide how you want your migration to handle each of the following variables. These decisions will depend on your specific business and technical considerations. The MOLT toolkit supports any set of decisions made for the [supported source databases](#molt-tools). -| Migration flow | Tools | Description | Best for | -|------------------------------------------------------------------------|------------------------------|----------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------| -| [Bulk load]({% link molt/migrate-bulk-load.md %}) | MOLT Fetch | Perform a one-time bulk load of source data into CockroachDB. 
| Testing, migrations with [planned downtime]({% link molt/migration-strategy.md %}#approach-to-downtime) | -| [Data load and replication]({% link molt/migrate-load-replicate.md %}) | MOLT Fetch + MOLT Replicator | Load source data with Fetch, then replicate subsequent changes continuously with Replicator. | [Minimal downtime]({% link molt/migration-strategy.md %}#approach-to-downtime) migrations | -| [Resume replication]({% link molt/migrate-resume-replication.md %}) | MOLT Replicator | Resume replication from a checkpoint after interruption. | Resuming interrupted migrations, post-load sync | -| [Failback]({% link molt/migrate-failback.md %}) | MOLT Replicator | Replicate changes from CockroachDB back to the source database. | [Rollback]({% link molt/migrate-failback.md %}) scenarios | +### Migration granularity + +You may choose to migrate all of your data into a CockroachDB cluster at once. However, for larger data stores it's recommended that you migrate data in separate phases. This can help break the migration down into manageable slices, and it can help limit the effects of migration difficulties. + +### Continuous replication + +After data is migrated from the source into CockroachDB, you may choose to continue streaming subsequent changes to that data from the source to the target. This is important for migrations that aim to minimize application downtime, as they may require the source database to continue receiving writes until application traffic is fully cut over to CockroachDB. -### Bulk load +### Data transformation strategy -For migrations that tolerate downtime, use MOLT Fetch in `data-load` mode to perform a one-time bulk load of source data into CockroachDB. Refer to [Bulk Load]({% link molt/migrate-bulk-load.md %}). +If there are discrepancies between the source and target schemas, you must define the rules that determine the necessary data transformations. 
These transformations can be applied in the source database, in flight, or in the target database. -### Migrations with minimal downtime +### Validation strategy -To minimize downtime during migration, use MOLT Fetch for initial data loading followed by MOLT Replicator for continuous replication. Instead of loading all data during a planned downtime window, you can run an initial load followed by continuous replication. Writes are paused only briefly to allow replication to drain before the final cutover. The duration of this pause depends on the volume of write traffic and the replication lag between the source and CockroachDB. +There are several different ways of verifying that the data in the source and the target match one another. You must decide what validation checks you want to perform, and when in the migration process you want to perform them. -Refer to [Load and Replicate]({% link molt/migrate-load-replicate.md %}) for detailed instructions. ### Rollback plan -### Recovery and rollback strategies +Until the migration is complete, a migration failure may lead you to roll back application traffic entirely to the source database. You may therefore need a way of keeping the source database up to date with new writes to the target. This is especially important for risk-averse migrations that aim to minimize downtime. -If the migration is interrupted or cutover must be aborted, MOLT Replicator provides safe recovery options: --- + +[Learn more about the different migration variables]({% link molt/migration-considerations.md %}), how to weigh the options for each variable, and how to use the MOLT toolkit for each. + +## Common migration approaches + + + +MOLT supports various migration flows using [MOLT Fetch]({% link molt/molt-fetch.md %}) for data loading and [MOLT Replicator]({% link molt/molt-replicator.md %}) for ongoing replication. 
+ +| Migration approach | Description | Best for | +|--------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------| +| [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}) | Perform a one-time bulk load of source data into CockroachDB. | Simple migrations with planned downtime. | +| [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) | Divide your data into separate phases and bulk load each phase. | Larger migrations with planned downtime per phase. | +| [Delta Migration]({% link molt/migration-approach-delta.md %}) | Perform an initial data load, then replicate ongoing changes continuously. | Minimal-downtime migrations. | +| [Phased Delta Migration with Failback Replication]({% link molt/migration-approach-phased-delta-failback.md %}) | Divide your data into separate phases. For each phase, perform an initial data load, then replicate ongoing changes continuously. Enable failback replication. | Risk-averse migrations with minimal downtime per phase. | -- Resume a previously interrupted replication stream. Refer to [Resume Replication]({% link molt/migrate-resume-replication.md %}). -- Use failback mode to reverse the migration, synchronizing changes from CockroachDB back to the original source. This ensures data consistency on the source so that you can retry the migration later. Refer to [Migration Failback]({% link molt/migrate-failback.md %}). +Each approach has a detailed walkthrough guide that uses the [MOLT toolkit](#molt-tools). While these approaches are among the most common, you may need to modify the instructions to suit the specific needs of your migration. 
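A quick way to choose between a bulk load approach and a delta approach is to check whether the one-time load can finish inside your planned downtime window; if it cannot, continuous replication is usually the better fit. The sketch below is a hypothetical back-of-the-envelope check; the throughput, window, and safety-factor values are assumptions to be replaced with measurements from a dry run.

```python
def bulk_load_fits_window(data_gb: float, load_gb_per_hour: float,
                          window_hours: float, safety_factor: float = 2.0) -> bool:
    # A bulk load approach is only advisable if the full data load, with headroom
    # for verification and retries, completes within the planned downtime window.
    estimated_hours = data_gb / load_gb_per_hour
    return estimated_hours * safety_factor <= window_hours

# 80 GB at an assumed 40 GB/hour fits a 6-hour window; 500 GB does not.
print(bulk_load_fits_window(80, 40, 6))   # True
print(bulk_load_fits_window(500, 40, 6))  # False
```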
## See also -- [Migration Strategy]({% link molt/migration-strategy.md %}) +- [Migration Best Practices]({% link molt/migration-strategy.md %}) - [MOLT Releases]({% link releases/molt.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) diff --git a/src/current/molt/migration-strategy.md b/src/current/molt/migration-strategy.md index eb2b46f65f5..237196867f1 100644 --- a/src/current/molt/migration-strategy.md +++ b/src/current/molt/migration-strategy.md @@ -1,5 +1,5 @@ --- -title: Migration Strategy +title: Migration Best Practices summary: Build a migration strategy before performing a database migration to CockroachDB. toc: true docs_area: migrate @@ -10,10 +10,9 @@ A successful migration to CockroachDB requires planning for downtime, applicatio This page outlines key decisions, infrastructure considerations, and best practices for a resilient and repeatable high-level migration strategy: - [Develop a migration plan](#develop-a-migration-plan). -- Evaluate your [downtime approach](#approach-to-downtime). - [Size the target CockroachDB cluster](#capacity-planning). - Implement [application changes](#application-changes) to address necessary [schema changes](#schema-design-best-practices), [transaction contention](#handling-transaction-contention), and [unimplemented features](#unimplemented-features-and-syntax-incompatibilities). -- [Prepare for migration](#prepare-for-migration) by running a [pre-mortem](#run-a-migration-pre-mortem), setting up [metrics](#set-up-monitoring-and-alerting), [loading test data](#load-test-data), [validating application queries](#validate-queries) for correctness and performance, performing a [migration dry run](#perform-a-dry-run), and reviewing your [cutover strategy](#cutover-strategy). 
+- [Prepare for migration](#prepare-for-migration) by running a [pre-mortem](#run-a-migration-pre-mortem), setting up [metrics](#set-up-monitoring-and-alerting), [loading test data](#load-test-data), [validating application queries](#validate-queries) for correctness and performance, performing a [migration dry run](#perform-a-dry-run), and reviewing your cutover strategy. {{site.data.alerts.callout_success}} For help migrating to CockroachDB, contact our sales team. {{site.data.alerts.end}} @@ -31,20 +30,6 @@ Consider the following as you plan your migration: Create a document summarizing the migration's purpose, technical details, and team members involved. -## Approach to downtime - -It's important to fully [prepare the migration](#prepare-for-migration) in order to be certain that the migration can be completed successfully during the downtime window. - -- *Planned downtime* is made known to your users in advance. Once you have [prepared for the migration](#prepare-for-migration), you take the application offline, [conduct the migration]({% link molt/migration-overview.md %}), and bring the application back online on CockroachDB. To succeed, you should estimate the amount of downtime required to migrate your data, and ideally schedule the downtime outside of peak hours. Scheduling downtime is easiest if your application traffic is "periodic", meaning that it varies by the time of day, day of week, or day of month. - - Migrations with planned downtime are only recommended if you can complete the bulk data load (e.g., using the MOLT Fetch [`data-load` mode]({% link molt/molt-fetch.md %}#fetch-mode)) within the downtime window. Otherwise, you can [minimize downtime using continuous replication]({% link molt/migration-overview.md %}#migrations-with-minimal-downtime). - -- *Minimal downtime* impacts as few customers as possible, ideally without impacting their regular usage. 
If your application is intentionally offline at certain times (e.g., outside business hours), you can migrate the data without users noticing. Alternatively, if your application's functionality is not time-sensitive (e.g., it sends batched messages or emails), you can queue requests while the system is offline and process them after completing the migration to CockroachDB. - - MOLT enables [migrations with minimal downtime]({% link molt/migration-overview.md %}#migrations-with-minimal-downtime), using [MOLT Replicator]({% link molt/molt-replicator.md %}) for continuous replication of source changes to CockroachDB. - -- *Reduced functionality* takes some, but not all, application functionality offline. For example, you can disable writes but not reads while you migrate the application data, and queue data to be written after completing the migration. - ## Capacity planning To size the target CockroachDB cluster, consider your data volume and workload characteristics: @@ -110,9 +95,9 @@ Based on the error budget you [defined in your migration plan](#develop-a-migrat ### Load test data -It's useful to load test data into CockroachDB so that you can [test your application queries](#validate-queries). Refer to [Migration flows]({% link molt/migration-overview.md %}#migration-flows). +It's useful to load test data into CockroachDB so that you can [test your application queries](#validate-queries). -MOLT Fetch [supports both `IMPORT INTO` and `COPY FROM`]({% link molt/molt-fetch.md %}#data-load-mode) for loading data into CockroachDB: +MOLT Fetch [supports both `IMPORT INTO` and `COPY FROM`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) for loading data into CockroachDB: - Use `IMPORT INTO` for maximum throughput when the target tables can be offline. 
For a bulk data migration, most users should use `IMPORT INTO` because the tables will be offline anyway, and `IMPORT INTO` can [perform the data import much faster]({% link {{ site.current_cloud_version }}/import-performance-best-practices.md %}) than `COPY FROM`. - Use `COPY FROM` (or `--direct-copy`) when the target must remain queryable during load. @@ -147,7 +132,7 @@ To further minimize potential surprises when you conduct the migration, practice Performing a dry run is highly recommended. In addition to demonstrating how long the migration may take, a dry run also helps to ensure that team members understand what they need to do during the migration, and that changes to the application are coordinated. -## Cutover strategy + ## See also - [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) - [Schema Design Overview]({% link {{ site.current_cloud_version }}/schema-design-overview.md %}) - [Primary key best practices]({% link {{ site.current_cloud_version }}/schema-design-table.md %}#primary-key-best-practices) - [Secondary index best practices]({% link {{ site.current_cloud_version }}/schema-design-indexes.md %}#best-practices) diff --git a/src/current/molt/molt-fetch-best-practices.md b/src/current/molt/molt-fetch-best-practices.md new file mode 100644 index 00000000000..10fe5ca6600 --- /dev/null +++ b/src/current/molt/molt-fetch-best-practices.md @@ -0,0 +1,71 @@ +--- +title: MOLT Fetch Best Practices +summary: Learn best practices for using MOLT Fetch to migrate data to CockroachDB. +toc: true +docs_area: migrate +--- + +This page describes best practices for using [MOLT Fetch]({% link molt/molt-fetch.md %}) to ensure reliable, secure, and performant data migration to CockroachDB. + +## Test and validate + +To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. 
Use a test or development environment that closely resembles production. + +## Configure the source database and connection + +- To prevent connections from terminating prematurely during the [data export phase]({% link molt/molt-fetch.md %}#data-export-phase), set the following to high values on the source database: + + - **Maximum allowed number of connections.** MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards ([`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags)) multiplied by the number of tables ([`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags)) being exported concurrently. + + {{site.data.alerts.callout_info}} + With the default numerical range sharding, only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded. PostgreSQL users can enable [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to use statistics-based sharding for tables with primary keys of any data type. For details, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). + {{site.data.alerts.end}} + + - **Maximum lifetime of a connection.** + +- If a PostgreSQL database is set as a [source]({% link molt/molt-fetch.md %}#specify-source-and-target-databases), ensure that [`idle_in_transaction_session_timeout`](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) on PostgreSQL is either disabled or set to a value longer than the duration of the [data export phase]({% link molt/molt-fetch.md %}#data-export-phase). Otherwise, the connection will be prematurely terminated. 
To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of [`molt_fetch_table_export_duration_ms`]({% link molt/molt-fetch-monitoring.md %}#metrics) for all exported tables. + +## Optimize performance + +- {% include molt/molt-drop-constraints-indexes.md %} + +- For PostgreSQL sources using [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), run [`ANALYZE`]({% link {{ site.current_cloud_version }}/create-statistics.md %}) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance. + +- To prevent memory outages during `READ COMMITTED` [data export]({% link molt/molt-fetch.md %}#data-export-phase) of tables with large rows, estimate the amount of memory used to export a table: + + ~~~ + --row-batch-size * --export-concurrency * average size of the table rows + ~~~ + + If you are exporting more than one table at a time (i.e., [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run `molt fetch`, and adjust [`--row-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#row-batch-size) accordingly. For details on how concurrency and sharding interact, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). + +- If a table in the source database is much larger than the other tables, [filter and export the largest table]({% link molt/molt-fetch.md %}#schema-and-table-selection) in its own `molt fetch` task. Repeat this for each of the largest tables. Then export the remaining tables in another task. + +- Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. 
Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric]({% link molt/molt-fetch-monitoring.md %}#metrics) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). + +## Import and continuation handling + +- When using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) during the [data import phase]({% link molt/molt-fetch.md %}#data-import-phase) to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, [manually resume or cancel the job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs). Then resume `molt fetch` using [continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption), or restart the task from the beginning. + +## Security + +Cockroach Labs strongly recommends the following security practices. + +### Connection security + +{% include molt/molt-secure-connection-strings.md %} + +{{site.data.alerts.callout_info}} +By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the [`--allow-tls-mode-disable`]({% link molt/molt-fetch-commands-and-flags.md %}#allow-tls-mode-disable) flag. Do this **only** when testing, or if a secure SSL/TLS connection to the source or target database is not possible. 
+{{site.data.alerts.end}} + +### Cloud storage security + +{% include molt/fetch-secure-cloud-storage.md %} + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/molt-fetch-commands-and-flags.md b/src/current/molt/molt-fetch-commands-and-flags.md new file mode 100644 index 00000000000..f38a1f8f851 --- /dev/null +++ b/src/current/molt/molt-fetch-commands-and-flags.md @@ -0,0 +1,91 @@ +--- +title: MOLT Fetch Commands and Flags +summary: Reference documentation for MOLT Fetch commands and flags. +toc: true +docs_area: migrate +--- + +This page lists the [MOLT Fetch]({% link molt/molt-fetch.md %}) commands and the flags that you can use to configure a MOLT Fetch command execution. + +## Commands + +| Command | Usage | +|---------|---------------------------------------------------------------------------------------------------| +| `fetch` | Start the fetch task. This loads data from a source database to a target CockroachDB database. | + +### Subcommands + +| Command | Usage | +|--------------|----------------------------------------------------------------------| +| `tokens list` | List active [continuation tokens]({% link molt/molt-fetch.md %}#list-active-continuation-tokens). 
| + +## Flags + +### Global flags + +| Flag | Description | +|---------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `--source` | (Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to [Source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases). | +| `--source-cdb` | Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to [Source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases). | +| `--target` | (Required) Connection string for the target database. For details, refer to [Source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases). | +| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | +| `--assume-role` | Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. 
For details, refer to [Cloud Storage Authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}). | +| `--bucket-path` | The path within the [cloud storage]({% link molt/molt-fetch.md %}#bucket-path) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`. | +| `--case-sensitive` | Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is **not** included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`).

**Default:** `false` | +| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage]({% link molt/molt-fetch.md %}#define-intermediate-storage). **Note:** Cleanup does not occur on [continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). | +| `--compression` | Compression method for data when using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) (`gzip`/`none`).

**Default:** `gzip` | +| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). | +| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). | +| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | +| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | +| `--direct-copy` | Enables [direct copy]({% link molt/molt-fetch.md %}#direct-copy), which copies data directly from source to target without using an intermediate store. | +| `--export-concurrency` | Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the [data export phase]({% link molt/molt-fetch.md %}#data-export-phase) and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export).

**Default:** `4` | +| `--export-retry-max-attempts` | Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `3` | +| `--export-retry-max-duration` | Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `5m0s` | +| `--filter-path` | Path to a JSON file defining row-level filters for the [data import phase]({% link molt/molt-fetch.md %}#data-import-phase). Refer to [Selective data movement]({% link molt/molt-fetch.md %}#select-data-to-migrate). | +| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | +| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--ignore-replication-check` | Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica. | +| `--import-batch-size` | The number of files to be imported at a time to the target database during the [data import phase]({% link molt/molt-fetch.md %}#data-import-phase). This applies only when using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) for data movement. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption) the entire batch.

**Default:** `1000` | +| `--import-region` | The region of the [cloud storage]({% link molt/molt-fetch.md %}#bucket-path) bucket. This applies only to [Amazon S3 buckets]({% link molt/molt-fetch.md %}#bucket-path). Set this flag only if you need to specify an `AWS_REGION` explicitly when using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) for data movement. For example, `--import-region=ap-south-1`. | +| `--local-path` | The path within the [local file server]({% link molt/molt-fetch.md %}#local-path) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | +| `--local-path-crdb-access-addr` | Address of a [local file server]({% link molt/molt-fetch.md %}#local-path) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | +| `--local-path-listen-addr` | Write intermediate files to a [local file server]({% link molt/molt-fetch.md %}#local-path) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | +| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | +| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | +| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to [Monitoring]({% link molt/molt-fetch-monitoring.md %}).

**Default:** `'127.0.0.1:3030'` | +| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to [Fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode).

**Default:** `data-load` | +| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (i.e., a job or continuous integration). | +| `--pglogical-replication-slot-name` | Name of a PostgreSQL replication slot that will be created before taking a snapshot of data. Must match the slot name specified with `--slotName` in the [MOLT Replicator command]({% link molt/molt-replicator.md %}#replication-checkpoints). For details, refer to [Initial bulk load (before replication)]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication). | +| `--pglogical-publication-and-slot-drop-and-recreate` | Drop the PostgreSQL publication and replication slot if they exist, then recreate them. Creates a publication named `molt_fetch` and the replication slot specified with `--pglogical-replication-slot-name`. For details, refer to [Initial bulk load (before replication)]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication).

**Default:** `false` | +| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | +| `--row-batch-size` | Number of rows per shard to export at a time. For details on sharding, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). See also [Best practices]({% link molt/molt-fetch-best-practices.md %}).

**Default:** `100000` | +| `--schema-filter` | Move schemas that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).<br>

**Default:** `'.*'` | +| `--skip-pk-check` | Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to [Skip primary key matching]({% link molt/molt-fetch.md %}#skip-primary-key-matching).

**Default:** `false` | +| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

**Default:** `4` | +| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | +| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | +| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling]({% link molt/molt-fetch.md %}#handle-target-tables).

**Default:** `none` | +| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations]({% link molt/molt-fetch.md %}#define-transformations). | +| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping]({% link molt/molt-fetch.md %}#type-mapping). | +| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | +| `--use-copy` | Use [`COPY FROM`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data movement]({% link molt/molt-fetch.md %}#import-into-vs-copy-from). | +| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage]({% link molt/molt-fetch.md %}#bucket-path) URIs. | +| `--use-stats-based-sharding` | Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). | + + +### `tokens list` flags + +| Flag | Description | +|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------| +| `--conn-string` | (Required) Connection string for the target database. For details, see [List active continuation tokens]({% link molt/molt-fetch.md %}#list-active-continuation-tokens). | +| `-n`, `--num-results` | Number of results to return. 
 | + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [MOLT Fetch Monitoring]({% link molt/molt-fetch-monitoring.md %}) +- [MOLT Fetch Troubleshooting]({% link molt/molt-fetch-troubleshooting.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-fetch-installation.md b/src/current/molt/molt-fetch-installation.md new file mode 100644 index 00000000000..f6ba16fb19c --- /dev/null +++ b/src/current/molt/molt-fetch-installation.md @@ -0,0 +1,54 @@ +--- +title: MOLT Fetch Installation +summary: Learn how to install MOLT Fetch and configure prerequisites for data migration. +toc: true +docs_area: migrate +--- + +This page explains the prerequisites for using [MOLT Fetch]({% link molt/molt-fetch.md %}) and then describes how to install it. + +## Prerequisites + +### Supported databases + +The following source databases are supported: + +- PostgreSQL 11-16 +- MySQL 5.7, 8.0 and later +- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) + +### Database configuration + +Ensure that the source and target schemas are identical, unless you enable automatic schema creation with the [`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#handle-target-tables) option. If you are creating the target schema manually, review the behaviors in [Mismatch handling]({% link molt/molt-fetch.md %}#mismatch-handling). + +{{site.data.alerts.callout_info}} +MOLT Fetch does not support migrating sequences. If your source database contains sequences, refer to the [guidance on indexing with sequential keys]({% link {{site.current_cloud_version}}/sql-faqs.md %}#how-do-i-generate-unique-slowly-increasing-sequential-numbers-in-cockroachdb). If a sequential key is necessary in your CockroachDB table, you must create it manually. <br>
After using MOLT Fetch to load the data onto the target, but before cutover, make sure to update each sequence's current value using [`setval()`]({% link {{site.current_cloud_version}}/functions-and-operators.md %}#sequence-functions) so that new inserts continue from the correct point. +{{site.data.alerts.end}} + +If you plan to use cloud storage for the data migration, follow [Cloud storage security]({% link molt/molt-fetch-best-practices.md %}#cloud-storage-security) best practices. + +### User permissions + +The SQL user running MOLT Fetch requires specific privileges on both the source and target databases: + +| Database | Required Privileges | Examples | +|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source | | [Create PostgreSQL migration user]({% link molt/classic-bulk-load-postgres.md %}#create-migration-user-on-source-database) | +| MySQL source | | [Create MySQL migration user]({% link molt/classic-bulk-load-mysql.md %}#create-migration-user-on-source-database) | +| Oracle source | | [Create Oracle migration user]({% link molt/classic-bulk-load-oracle.md %}#create-migration-user-on-source-database) | +| CockroachDB target | | [Create CockroachDB user]({% link molt/classic-bulk-load-postgres.md %}#create-the-sql-user) | + +## Installation + +{% include molt/molt-install.md %} + +### Docker usage + +{% include molt/molt-docker.md %} + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Fetch Commands and Flags]({% link molt/molt-fetch-commands-and-flags.md %}) +- [MOLT Fetch Best Practices]({% 
link molt/molt-fetch-best-practices.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-fetch-monitoring.md b/src/current/molt/molt-fetch-monitoring.md new file mode 100644 index 00000000000..b9dea6c4383 --- /dev/null +++ b/src/current/molt/molt-fetch-monitoring.md @@ -0,0 +1,33 @@ +--- +title: MOLT Fetch Metrics +summary: Learn how to monitor MOLT Fetch during data migration using Prometheus metrics. +toc: true +docs_area: migrate +--- + +This page lists the [MOLT Fetch]({% link molt/molt-fetch.md %}) metrics that you can use to observe the progress of a MOLT Fetch command execution. + +## Metrics + +By default, MOLT Fetch exports [Prometheus](https://prometheus.io/) metrics at `127.0.0.1:3030/metrics`. You can configure this endpoint with the [`--metrics-listen-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#metrics-listen-addr) flag. + +Cockroach Labs recommends monitoring the following metrics: + +| Metric Name | Description | +|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------| +| `molt_fetch_num_tables` | Number of tables that will be moved from the source. | +| `molt_fetch_num_task_errors` | Number of errors encountered by the fetch task. | +| `molt_fetch_overall_duration` | Duration (in seconds) of the fetch task. | +| `molt_fetch_rows_exported` | Number of rows that have been exported from a table. For example:
`molt_fetch_rows_exported{table="public.users"}` | +| `molt_fetch_rows_imported` | Number of rows that have been imported from a table. For example:
`molt_fetch_rows_imported{table="public.users"}` | +| `molt_fetch_table_export_duration_ms` | Duration (in milliseconds) of a table's export. For example:
`molt_fetch_table_export_duration_ms{table="public.users"}` | +| `molt_fetch_table_import_duration_ms` | Duration (in milliseconds) of a table's import. For example:
`molt_fetch_table_import_duration_ms{table="public.users"}` | + +You can also use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view the preceding metrics. + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) diff --git a/src/current/molt/molt-fetch-troubleshooting.md b/src/current/molt/molt-fetch-troubleshooting.md new file mode 100644 index 00000000000..79eac88df6a --- /dev/null +++ b/src/current/molt/molt-fetch-troubleshooting.md @@ -0,0 +1,23 @@ +--- +title: MOLT Fetch Troubleshooting +summary: Troubleshoot common issues when using MOLT Fetch for data migration. +toc: true +docs_area: migrate +--- + +This page describes common issues that can occur while using [MOLT Fetch]({% link molt/molt-fetch.md %}) and suggests ways to troubleshoot those issues. + +
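For example, if a fetch task fails partway through, you can resume it from the point of interruption instead of restarting the entire data load. The following commands are a sketch only: the connection strings are placeholders, and the fetch ID and continuation token come from the output of `tokens list` and the logs of the original fetch task.

```shell
# List active continuation tokens stored on the target cluster.
# The connection string below is a placeholder.
molt fetch tokens list \
  --conn-string 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full'

# Re-run the original fetch command with the reported fetch ID (and,
# optionally, a continuation token) to resume from the failure point.
molt fetch \
  --source 'postgres://migration_user:password@localhost:5432/source_db' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
  --bucket-path 's3://migration-bucket/fetch' \
  --fetch-id <fetch-id> \
  --continuation-token <continuation-token>
```

If `--continuation-token` is omitted, the task restarts for all failed tables. For details, refer to [Fetch continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption).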
+ + + +
+ +{% include molt/molt-troubleshooting-fetch.md %} + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/molt-fetch.md b/src/current/molt/molt-fetch.md index 03f66eea319..867f95c49b0 100644 --- a/src/current/molt/molt-fetch.md +++ b/src/current/molt/molt-fetch.md @@ -7,158 +7,51 @@ docs_area: migrate MOLT Fetch moves data from a source database into CockroachDB as part of a [database migration]({% link molt/migration-overview.md %}). -MOLT Fetch uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to move the source data to cloud storage (Google Cloud Storage, Amazon S3, or Azure Blob Storage), a local file server, or local memory. Once the data is exported, MOLT Fetch loads the data into a target CockroachDB database. For details, refer to [Migration phases](#migration-phases). +MOLT Fetch uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to move the source data to cloud storage (Google Cloud Storage, Amazon S3, or Azure Blob Storage), a local file server, or local memory. Once the data is exported, MOLT Fetch loads the data into a target CockroachDB database. -## Terminology +You can use MOLT Fetch to migrate data from a PostgreSQL, MySQL, or Oracle source database. Read more about [MOLT Fetch prerequisites]({% link molt/molt-fetch-installation.md %}#prerequisites). -- *Shard*: A portion of a table's data exported concurrently during the data export phase. Tables are divided into shards to enable parallel processing. For details, refer to [Table sharding](#table-sharding). -- *Continuation token*: An identifier that marks the progress of a fetch task. 
Used to resume data loading from the point of interruption if a fetch task fails. For details, refer to [Fetch continuation](#fetch-continuation). -- *Intermediate files*: Temporary data files written to cloud storage or a local file server during the data export phase. These files are used to stage exported data before importing it into CockroachDB during the data import phase. For details, refer to [Data path](#data-path). +## How it works -## Prerequisites +MOLT Fetch operates in two distinct phases to move data from the source databases to CockroachDB. The [data export phase](#data-export-phase) moves data to intermediate storage (either cloud storage or a local file server). The [data import phase](#data-import-phase) moves data from that intermediate storage to the CockroachDB cluster. For details on available modes, refer to [Define fetch mode](#define-fetch-mode). -### Supported databases - -The following source databases are supported: - -- PostgreSQL 11-16 -- MySQL 5.7, 8.0 and later -- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) - -### Database configuration - -Ensure that the source and target schemas are identical, unless you enable automatic schema creation with the [`drop-on-target-and-recreate`](#target-table-handling) option. If you are creating the target schema manually, review the behaviors in [Mismatch handling](#mismatch-handling). - -{{site.data.alerts.callout_info}} -MOLT Fetch does not support migrating sequences. If your source database contains sequences, refer to the [guidance on indexing with sequential keys]({% link {{site.current_cloud_version}}/sql-faqs.md %}#how-do-i-generate-unique-slowly-increasing-sequential-numbers-in-cockroachdb). If a sequential key is necessary in your CockroachDB table, you must create it manually. 
After using MOLT Fetch to load the data onto the target, but before cutover, make sure to update each sequence's current value using [`setval()`]({% link {{site.current_cloud_version}}/functions-and-operators.md %}#sequence-functions) so that new inserts continue from the correct point. -{{site.data.alerts.end}} +
+MOLT Fetch flow draft +
-If you plan to use cloud storage for the data migration, follow the steps in [Cloud storage security](#cloud-storage-security). +### Data export phase -### User permissions +In this first phase, MOLT Fetch connects to the source database and exports table data to intermediate storage. -The SQL user running MOLT Fetch requires specific privileges on both the source and target databases: +- [**Selective data movement**](#select-data-to-migrate): By default, MOLT Fetch moves all data from the --source database to CockroachDB. If instead you want to move a subset of the available data, use the [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter), [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter), and [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) flags. -| Database | Required Privileges | Details | -|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source | | [Create PostgreSQL migration user]({% link molt/migrate-bulk-load.md %}#create-migration-user-on-source-database) | -| MySQL source | | [Create MySQL migration user]({% link molt/migrate-bulk-load.md %}?filters=mysql#create-migration-user-on-source-database) | -| Oracle source | | [Create Oracle migration user]({% link molt/migrate-bulk-load.md %}?filters=oracle#create-migration-user-on-source-database) | -| CockroachDB target | | [Create CockroachDB user]({% link molt/migrate-bulk-load.md %}#create-the-sql-user) | +- [**Table sharding for concurrent 
export**](#shard-tables-for-concurrent-export): Multiple tables and _table shards_ can be exported simultaneously using [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#table-concurrency) and [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency), with large tables divided into shards for parallel processing. -## Installation +- [**Load into intermediate storage**](#define-intermediate-storage): Define whether data is written to cloud storage (Amazon S3, Google Cloud Storage, Azure Blob Storage), a local file server, or directly to CockroachDB memory. Intermediate storage enables [continuation after a MOLT Fetch failure](#continue-molt-fetch-after-interruption) by storing _continuation tokens_. -{% include molt/molt-install.md %} +### Data import phase -### Docker usage +MOLT Fetch loads the exported data from intermediate storage to the target CockroachDB database. -{% include molt/molt-docker.md %} +- [**`IMPORT INTO` vs. `COPY FROM`**](#import-into-vs-copy-from): This phase uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) (faster, tables offline during import) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) (slower, tables remain queryable) to move data. -## Migration phases +- [**Target table handling**](#handle-target-tables): Target tables can be automatically created, truncated, or left unchanged based on [`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) settings. -MOLT Fetch operates in distinct phases to move data from source databases to CockroachDB. For details on available modes, refer to [Fetch mode](#fetch-mode). +- [**Schema/table transformations**](#define-transformations): Use JSON to map computed columns from source to target, map partitioned tables to a single target table, rename tables on the target database, or rename database schemas. 
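Together, the export and import phases are typically driven by a single `molt fetch` invocation. The following command is an illustrative sketch only: the connection strings, bucket path, and flag values are placeholders to adapt for your environment.

```shell
# Illustrative sketch: replace the connection strings and bucket path
# with values for your environment.
molt fetch \
  --source 'postgres://migration_user:password@localhost:5432/source_db' \
  --target 'postgres://root@localhost:26257/defaultdb?sslmode=verify-full' \
  --bucket-path 's3://migration-bucket/fetch' \
  --table-handling truncate-if-exists \
  --table-concurrency 4 \
  --export-concurrency 4
```

With these settings, up to 16 threads (the product of `--table-concurrency` and `--export-concurrency`) export data concurrently to the S3 bucket, and the staged files are then loaded into the target using `IMPORT INTO` unless `--use-copy` or `--direct-copy` is specified.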
-### Data export phase +Refer to [the MOLT Fetch flags]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to learn how to use any flag for the `molt fetch` command. -MOLT Fetch connects to the source database and exports table data to intermediate storage. Data is written to [cloud storage](#bucket-path) (Amazon S3, Google Cloud Storage, Azure Blob Storage), a [local file server](#local-path), or [directly to CockroachDB memory](#direct-copy). Multiple tables and table shards can be exported simultaneously using [`--table-concurrency`](#global-flags) and [`--export-concurrency`](#global-flags), with large tables divided into shards for parallel processing. For details, refer to: +## Run MOLT Fetch -- [Fetch mode](#fetch-mode) -- [Table sharding](#table-sharding) +The following section describes how to use the [`molt fetch`]({% link molt/molt-fetch-commands-and-flags.md %}#commands) command and how to set its main [flags]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). -### Data import phase - -MOLT Fetch loads the exported data into the target CockroachDB database. The process uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) (faster, tables offline during import) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) (slower, tables remain queryable) to move data. Data files are imported in configurable batches using [`--import-batch-size`](#global-flags), and target tables can be automatically created, truncated, or left unchanged based on [`--table-handling`](#global-flags) settings. For details, refer to: - -- [Data movement](#data-load-mode) -- [Target table handling](#target-table-handling) - -## Commands - -| Command | Usage | -|---------|---------------------------------------------------------------------------------------------------| -| `fetch` | Start the fetch task. This loads data from a source database to a target CockroachDB database. 
| - -### Subcommands - -| Command | Usage | -|--------------|----------------------------------------------------------------------| -| `tokens list` | List active [continuation tokens](#list-active-continuation-tokens). | - -## Flags - -### Global flags - -| Flag | Description | -|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--source` | (Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to [Source and target databases](#source-and-target-databases). | -| `--source-cdb` | Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to [Source and target databases](#source-and-target-databases). | -| `--target` | (Required) Connection string for the target database. For details, refer to [Source and target databases](#source-and-target-databases). | -| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | -| `--assume-role` | Service account to use for assume role authentication. `--use-implicit-auth` must be included. 
For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to [Cloud Storage Authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}). | -| `--bucket-path` | The path within the [cloud storage](#bucket-path) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`. | -| `--case-sensitive` | Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is **not** included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`).

**Default:** `false` | -| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage](#data-path). **Note:** Cleanup does not occur on [continuation](#fetch-continuation). | -| `--compression` | Compression method for data when using [`IMPORT INTO`](#data-load-mode) (`gzip`/`none`).

**Default:** `gzip` | -| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | -| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | -| `--direct-copy` | Enables [direct copy](#direct-copy), which copies data directly from source to target without using an intermediate store. | -| `--export-concurrency` | Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the [data export phase](#data-export-phase) and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to [Table sharding](#table-sharding).

**Default:** `4` | -| `--export-retry-max-attempts` | Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `3` | -| `--export-retry-max-duration` | Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `5m0s` | -| `--filter-path` | Path to a JSON file defining row-level filters for the [data import phase](#data-import-phase). Refer to [Selective data movement](#selective-data-movement). | -| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | -| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--ignore-replication-check` | Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica. | -| `--import-batch-size` | The number of files to be imported at a time to the target database during the [data import phase](#data-import-phase). This applies only when using [`IMPORT INTO`](#data-load-mode) for data movement. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.

**Default:** `1000` | -| `--import-region` | The region of the [cloud storage](#bucket-path) bucket. This applies only to [Amazon S3 buckets](#bucket-path). Set this flag only if you need to specify an `AWS_REGION` explicitly when using [`IMPORT INTO`](#data-load-mode) for data movement. For example, `--import-region=ap-south-1`. | -| `--local-path` | The path within the [local file server](#local-path) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | -| `--local-path-crdb-access-addr` | Address of a [local file server](#local-path) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | -| `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-path) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | -| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | -| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | -| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to [Monitoring](#monitoring).

**Default:** `'127.0.0.1:3030'` | -| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to [Fetch mode](#fetch-mode).

**Default:** `data-load` | -| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (i.e., a job or continuous integration). | -| `--pglogical-replication-slot-name` | Name of a PostgreSQL replication slot that will be created before taking a snapshot of data. Must match the slot name specified with `--slotName` in the [MOLT Replicator command]({% link molt/molt-replicator.md %}#replication-checkpoints). For details, refer to [Load before replication](#load-before-replication). | -| `--pglogical-publication-and-slot-drop-and-recreate` | Drop the PostgreSQL publication and replication slot if they exist, then recreate them. Creates a publication named `molt_fetch` and the replication slot specified with `--pglogical-replication-slot-name`. For details, refer to [Load before replication](#load-before-replication).

**Default:** `false` | -| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | -| `--row-batch-size` | Number of rows per shard to export at a time. For details on sharding, refer to [Table sharding](#table-sharding). See also [Best practices](#best-practices).

**Default:** `100000` | -| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression). Not used with MySQL sources. For Oracle sources, this filter is case-insensitive.

**Default:** `'.*'` | -| `--skip-pk-check` | Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to [Skip primary key matching](#skip-primary-key-matching).

**Default:** `false` | -| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

**Default:** `4` | -| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | -| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | -| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling](#target-table-handling).

**Default:** `none` | -| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations](#transformations). | -| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping](#type-mapping). | -| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | -| `--use-copy` | Use [`COPY FROM`](#data-load-mode) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data load mode](#data-load-mode). | -| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#bucket-path) URIs. | -| `--use-stats-based-sharding` | Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to [Table sharding](#table-sharding). | - - -### `tokens list` flags - -| Flag | Description | -|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------| -| `--conn-string` | (Required) Connection string for the target database. For details, see [List active continuation tokens](#list-active-continuation-tokens). | -| `-n`, `--num-results` | Number of results to return. | - - -## Usage - -The following sections describe how to use the `molt fetch` [flags](#flags). - -### Source and target databases +### Specify source and target databases {{site.data.alerts.callout_success}} -Follow the recommendations in [Connection security](#connection-security). +Follow the recommendations in [Connection security]({% link molt/molt-fetch-best-practices.md %}#connection-security). {{site.data.alerts.end}} -`--source` specifies the connection string of the source database. +[`--source`]({% link molt/molt-fetch-commands-and-flags.md %}#source) specifies the connection string of the source database. 
PostgreSQL or CockroachDB connection string: @@ -181,7 +74,7 @@ Oracle connection string: --source 'oracle://{username}:{password}@{host}:{port}/{service_name}' ~~~ -For Oracle Multitenant databases, `--source-cdb` specifies the container database (CDB) connection. `--source` specifies the pluggable database (PDB): +For Oracle Multitenant databases, [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection. [`--source`]({% link molt/molt-fetch-commands-and-flags.md %}#source) specifies the pluggable database (PDB): {% include_cached copy-clipboard.html %} ~~~ @@ -189,16 +82,16 @@ For Oracle Multitenant databases, `--source-cdb` specifies the container databas --source-cdb 'oracle://{username}:{password}@{host}:{port}/{cdb_service_name}' ~~~ -`--target` specifies the [CockroachDB connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url): +[`--target`]({% link molt/molt-fetch-commands-and-flags.md %}#target) specifies the [CockroachDB connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url): {% include_cached copy-clipboard.html %} ~~~ --target 'postgresql://{username}:{password}@{host}:{port}/{database}' ~~~ -### Fetch mode +### Define fetch mode -`--mode` specifies the MOLT Fetch behavior. +[`--mode`]({% link molt/molt-fetch-commands-and-flags.md %}#mode) specifies the MOLT Fetch behavior. `data-load` (default) instructs MOLT Fetch to load the source data into CockroachDB: @@ -221,29 +114,82 @@ For Oracle Multitenant databases, `--source-cdb` specifies the container databas --mode import-only ~~~ -### Data load mode +### Select data to migrate -MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to load data into CockroachDB. 
+By default, MOLT Fetch moves all data from the [`--source`]({% link molt/molt-fetch-commands-and-flags.md %}#source) database to CockroachDB. Use the following flags to move a subset of data. -By default, MOLT Fetch uses `IMPORT INTO`: +#### Schema and table selection -- `IMPORT INTO` achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{site.current_cloud_version}}/import-into.md %}#considerations) to achieve its import speed. Tables are taken back online once an [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. See [Best practices](#best-practices). -- `IMPORT INTO` supports compression using the `--compression` flag, which reduces the amount of storage used. +[`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's `migration_schema` schema: -`--use-copy` configures MOLT Fetch to use `COPY FROM`: +{% include_cached copy-clipboard.html %} +~~~ +--schema-filter 'migration_schema' +~~~ -- `COPY FROM` enables your tables to remain online and accessible. However, it is slower than using [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}). -- `COPY FROM` does not support compression. +{{site.data.alerts.callout_info}} +[`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. 
+{{site.data.alerts.end}} + +[`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) and [`--table-exclusion-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-exclusion-filter) specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name: + +{% include_cached copy-clipboard.html %} +~~~ +--table-filter '.*user.*' --table-exclusion-filter '.*temp.*' +~~~ + +To filter tables during replication, use [MOLT Replicator]({% link molt/molt-replicator.md %}) with [userscripts]({% link molt/userscript-cookbook.md %}#filter-a-single-table). + +#### Row-level filtering + +Use [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) to specify the path to a JSON file that defines row-level filtering for data load. This enables you to move a subset of data in a table, rather than all data in the table. To apply row-level filters during replication, use [MOLT Replicator]({% link molt/molt-replicator.md %}) with [userscripts]({% link molt/userscript-cookbook.md %}#select-data-to-replicate). + +{% include_cached copy-clipboard.html %} +~~~ +--filter-path 'data-filter.json' +~~~ + +The JSON file should contain one or more entries in `filters`, each with a `resource_specifier` (`schema` and `table`) and a SQL expression `expr`. For example, the following exports only rows from `migration_schema.t1` where `v > 100`: + +~~~ json +{ + "filters": [ + { + "resource_specifier": { + "schema": "migration_schema", + "table": "t1" + }, + "expr": "v > 100" + } + ] +} +~~~ + +`expr` is case-sensitive and must be valid in your source dialect.
For example, when using Oracle as the source, quote all identifiers and escape embedded quotes: + +~~~ json +{ + "filters": [ + { + "resource_specifier": { + "schema": "C##FETCHORACLEFILTERTEST", + "table": "FILTERTBL" + }, + "expr": "ABS(\"X\") > 10 AND CEIL(\"X\") < 100 AND FLOOR(\"X\") > 0 AND ROUND(\"X\", 2) < 100.00 AND TRUNC(\"X\", 0) > 0 AND MOD(\"X\", 2) = 0 AND FLOOR(\"X\" / 3) > 1" + } + ] +} +~~~ {{site.data.alerts.callout_info}} -`COPY FROM` is also used for [direct copy](#direct-copy). +If the expression references columns that are not indexed, MOLT Fetch will emit a warning like: `filter expression 'v > 100' contains column 'v' which is not indexed. This may lead to performance issues.` {{site.data.alerts.end}} -### Table sharding +### Shard tables for concurrent export During the [data export phase](#data-export-phase), MOLT Fetch can divide large tables into multiple shards for concurrent export. -To control the number of shards created per table, use the `--export-concurrency` flag. For example: +To control the number of shards created per table, use the [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency) flag. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -251,14 +197,14 @@ To control the number of shards created per table, use the `--export-concurrency ~~~ {{site.data.alerts.callout_success}} -For performance considerations with concurrency settings, refer to [Best practices](#best-practices). +For performance considerations with concurrency settings, refer to [Best practices]({% link molt/molt-fetch-best-practices.md %}). {{site.data.alerts.end}} Two sharding mechanisms are available: - **Range-based sharding (default):** Tables are divided based on numerical ranges found in primary key values. 
Only tables with [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) primary keys can use range-based sharding. Tables with other primary key data types export as a single shard. -- **Stats-based sharding (PostgreSQL only):** Enable with [`--use-stats-based-sharding`](#global-flags) for PostgreSQL 11+ sources. Tables are divided by analyzing the [`pg_stats`](https://www.postgresql.org/docs/current/view-pg-stats.htm) view to create more evenly distributed shards, up to a maximum of 200 shards. Primary keys of any data type are supported. +- **Stats-based sharding (PostgreSQL only):** Enable with [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#use-stats-based-sharding) for PostgreSQL 11+ sources. Tables are divided by analyzing the [`pg_stats`](https://www.postgresql.org/docs/current/view-pg-stats.html) view to create more evenly distributed shards, up to a maximum of 200 shards. Primary keys of any data type are supported. Stats-based sharding requires that the user has `SELECT` permissions on source tables and on each table's `pg_stats` view. The latter permission is automatically granted to users that can read the table. @@ -280,7 +226,7 @@ Large tables may take time to analyze, but `ANALYZE` can run in the background. Migration without running `ANALYZE` will still work, but shard distribution may be less even. {{site.data.alerts.end}} -When using `--use-stats-based-sharding`, monitor the log output for each table you want to migrate. +When using [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#use-stats-based-sharding), monitor the log output for each table you want to migrate.
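As a minimal end-to-end sketch of the guidance above (the connection strings, schema, and table name are placeholders, not values from this page), refresh planner statistics and then run the fetch with stats-based sharding enabled:

```shell
# Placeholder example: refresh planner statistics so pg_stats reflects
# current data, then fetch with stats-based sharding enabled.
psql 'postgresql://migration_user@source-host:5432/source_db' \
  -c 'ANALYZE migration_schema.t1;'

molt fetch \
  --source 'postgresql://migration_user@source-host:5432/source_db' \
  --target 'postgresql://root@localhost:26257/defaultdb' \
  --use-stats-based-sharding \
  --export-concurrency 8
```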
If stats-based sharding is successful on a table, MOLT logs the following `INFO` message: @@ -294,25 +240,25 @@ If stats-based sharding fails on a table, MOLT logs the following `WARNING` mess Warning: failed to shard table {table_name} using stats based sharding: {reason_for_failure}, falling back to non stats based sharding ~~~ -The number of shards is dependent on the number of distinct values in the first primary key column of the table to be migrated. If this is different from the number of shards requested with `--export-concurrency`, MOLT logs the following `WARNING` and continues with the migration: +The number of shards is dependent on the number of distinct values in the first primary key column of the table to be migrated. If this is different from the number of shards requested with [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency), MOLT logs the following `WARNING` and continues with the migration: ~~~ number of shards formed: {num_shards_formed} is not equal to number of shards requested: {num_shards_requested} for table {table_name} ~~~ -Because stats-based sharding analyzes the entire table, running `--use-stats-based-sharding` with [`--filter-path`](#global-flags) (refer to [Selective data movement](#selective-data-movement)) will cause imbalanced shards to form. +Because stats-based sharding analyzes the entire table, running [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#use-stats-based-sharding) with [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) (refer to [Select data to migrate](#select-data-to-migrate)) will cause imbalanced shards to form. -### Data path +### Define intermediate storage MOLT Fetch can move the source data to CockroachDB via [cloud storage](#bucket-path), a [local file server](#local-path), or [directly](#direct-copy) without an intermediate store.
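Because direct copy holds in-flight data in memory, it helps to estimate peak memory before choosing that path. A rough sizing sketch using the formula `average row size * --row-batch-size * --export-concurrency * --table-concurrency`, assuming a 512-byte average row and the default flag values (`--row-batch-size 100000`, `--export-concurrency 4`, `--table-concurrency 4`):

```shell
# Rough peak-memory estimate for --direct-copy:
#   average row size * --row-batch-size * --export-concurrency * --table-concurrency
# The 512-byte average row size is an assumption; measure your own tables.
echo $(( 512 * 100000 * 4 * 4 ))  # prints 819200000 (~0.8 GB in flight)
```

If the estimate approaches the available RAM on the machine running `molt fetch`, prefer cloud storage or a local file server as the intermediate store.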
#### Bucket path {{site.data.alerts.callout_success}} -Only the path specified in `--bucket-path` is used. Query parameters, such as credentials, are ignored. To authenticate cloud storage, follow the steps in [Secure cloud storage](#cloud-storage-security). +Only the path specified in [`--bucket-path`]({% link molt/molt-fetch-commands-and-flags.md %}#bucket-path) is used. Query parameters, such as credentials, are ignored. To authenticate cloud storage, follow the steps in [Secure cloud storage]({% link molt/molt-fetch-best-practices.md %}#cloud-storage-security). {{site.data.alerts.end}} -`--bucket-path` instructs MOLT Fetch to write intermediate files to a path within [Google Cloud Storage](https://cloud.google.com/storage/docs/buckets), [Amazon S3](https://aws.amazon.com/s3/), or [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) to which you have the necessary permissions. Use additional [flags](#global-flags), shown in the following examples, to specify authentication or region parameters as required for bucket access. +[`--bucket-path`]({% link molt/molt-fetch-commands-and-flags.md %}#bucket-path) instructs MOLT Fetch to write intermediate files to a path within [Google Cloud Storage](https://cloud.google.com/storage/docs/buckets), [Amazon S3](https://aws.amazon.com/s3/), or [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) to which you have the necessary permissions. Use additional [flags]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), shown in the following examples, to specify authentication or region parameters as required for bucket access. 
Connect to a Google Cloud Storage bucket with [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}#google-cloud-storage-implicit) and [assume role]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}#set-up-google-cloud-storage-assume-role): @@ -332,7 +278,7 @@ Connect to an Amazon S3 bucket and explicitly specify the `ap-south-1` region: ~~~ {{site.data.alerts.callout_info}} -When `--import-region` is set, `IMPORT INTO` must be used for [data movement](#data-load-mode). +When [`--import-region`]({% link molt/molt-fetch-commands-and-flags.md %}#import-region) is set, `IMPORT INTO` must be used for [data movement](#import-into-vs-copy-from). {{site.data.alerts.end}} Connect to an Azure Blob Storage container with [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}?filters=azure#azure-blob-storage-implicit-authentication): @@ -345,7 +291,7 @@ Connect to an Azure Blob Storage container with [implicit authentication]({% lin #### Local path -`--local-path` instructs MOLT Fetch to write intermediate files to a path within a [local file server]({% link {{site.current_cloud_version}}/use-a-local-file-server.md %}). `local-path-listen-addr` specifies the address of the local file server. For example: +[`--local-path`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path) instructs MOLT Fetch to write intermediate files to a path within a [local file server]({% link {{site.current_cloud_version}}/use-a-local-file-server.md %}). [`--local-path-listen-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path-listen-addr) specifies the address of the local file server.
For example: {% include_cached copy-clipboard.html %} ~~~ @@ -353,9 +299,9 @@ Connect to an Azure Blob Storage container with [implicit authentication]({% lin --local-path-listen-addr 'localhost:3000' ~~~ -In some cases, CockroachDB will not be able to use the local address specified by `--local-path-listen-addr`. This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect. +In some cases, CockroachDB will not be able to use the local address specified by [`--local-path-listen-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path-listen-addr). This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect. -For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, such that the {{ site.data.products.cloud }} cluster is in a different physical location than the machine running `molt fetch`, then CockroachDB cannot reach an address such as `localhost:3000`. In these situations, use `--local-path-crdb-access-addr` to specify an address for the local file server that is **publicly accessible**. For example: +For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, such that the {{ site.data.products.cloud }} cluster is in a different physical location than the machine running `molt fetch`, then CockroachDB cannot reach an address such as `localhost:3000`. In these situations, use [`--local-path-crdb-access-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path-crdb-access-addr) to specify an address for the local file server that is **publicly accessible**. 
For example: {% include_cached copy-clipboard.html %} ~~~ @@ -370,90 +316,38 @@ For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, #### Direct copy -`--direct-copy` specifies that MOLT Fetch should use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store: +[`--direct-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#direct-copy) specifies that MOLT Fetch should use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store: -- Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight: +- Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight: ~~~ average size of each row * --row-batch-size * --export-concurrency * --table-concurrency ~~~ -- Direct copy does not support compression or [continuation](#fetch-continuation). -- The [`--use-copy`](#data-load-mode) flag is redundant with `--direct-copy`. - -### Schema and table selection - -By default, MOLT Fetch moves all data from the [`--source`](#source-and-target-databases) database to CockroachDB. Use the following flags to move a subset of data. - -`--schema-filter` specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's `migration_schema` schema: - -{% include_cached copy-clipboard.html %} -~~~ ---schema-filter 'migration_schema' -~~~ - -{{site.data.alerts.callout_info}} -`--schema-filter` does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. -{{site.data.alerts.end}} - -`--table-filter` and `--table-exclusion-filter` specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. 
For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name: - -{% include_cached copy-clipboard.html %} -~~~ ---table-filter '.*user.*' --table-exclusion-filter '.*temp.*' -~~~ - -To filter tables during replication, use [MOLT Replicator]({% link molt/molt-replicator.md %}) with [userscripts]({% link molt/userscript-cookbook.md %}#filter-a-single-table). +- Direct copy does not support compression or [continuation](#continue-molt-fetch-after-interruption). +- The [`--use-copy`](#import-into-vs-copy-from) flag is redundant with [`--direct-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#direct-copy). +### `IMPORT INTO` vs. `COPY FROM` -### Selective data movement +MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to load data into CockroachDB. -Use `--filter-path` to specify the path to a JSON file that defines row-level filtering for data load. This enables you to move a subset of data in a table, rather than all data in the table. To apply row-level filters during replication, use [MOLT Replicator]({% link molt/molt-replicator.md %}) with [userscripts]({% link molt/userscript-cookbook.md %}#select-data-to-replicate). +By default, MOLT Fetch uses `IMPORT INTO`: -{% include_cached copy-clipboard.html %} -~~~ ---filter-path 'data-filter.json' -~~~ +- `IMPORT INTO` achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{site.current_cloud_version}}/import-into.md %}#considerations) to achieve its import speed. Tables are taken back online once an [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. See [Best practices]({% link molt/molt-fetch-best-practices.md %}). 
+- `IMPORT INTO` supports compression using the [`--compression`]({% link molt/molt-fetch-commands-and-flags.md %}#compression) flag, which reduces the amount of storage used. -The JSON file should contain one or more entries in `filters`, each with a `resource_specifier` (`schema` and `table`) and a SQL expression `expr`. For example, the following example exports only rows from `migration_schema.t1` where `v > 100`: +[`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) configures MOLT Fetch to use `COPY FROM`: -~~~ json -{ - "filters": [ - { - "resource_specifier": { - "schema": "migration_schema", - "table": "t1" - }, - "expr": "v > 100" - } - ] -} -~~~ - -`expr` is case-sensitive and must be valid in your source dialect. For example, when using Oracle as the source, quote all identifiers and escape embedded quotes: - -~~~ json -{ - "filters": [ - { - "resource_specifier": { - "schema": "C##FETCHORACLEFILTERTEST", - "table": "FILTERTBL" - }, - "expr": "ABS(\"X\") > 10 AND CEIL(\"X\") < 100 AND FLOOR(\"X\") > 0 AND ROUND(\"X\", 2) < 100.00 AND TRUNC(\"X\", 0) > 0 AND MOD(\"X\", 2) = 0 AND FLOOR(\"X\" / 3) > 1" - } - ] -} -~~~ +- `COPY FROM` enables your tables to remain online and accessible. However, it is slower than using [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}). +- `COPY FROM` does not support compression. {{site.data.alerts.callout_info}} -If the expression references columns that are not indexed, MOLT Fetch will emit a warning like: `filter expression ‘v > 100' contains column ‘v' which is not indexed. This may lead to performance issues.` +`COPY FROM` is also used for [direct copy](#direct-copy). {{site.data.alerts.end}} -### Target table handling +### Handle target tables -`--table-handling` defines how MOLT Fetch loads data on the CockroachDB tables that [match the selection](#schema-and-table-selection). 
+[`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) defines how MOLT Fetch loads data on the CockroachDB tables that [match the selection](#schema-and-table-selection). To load the data without changing the existing data in the tables, use `none`: @@ -480,21 +374,21 @@ When using the `drop-on-target-and-recreate` option, MOLT Fetch creates a new Co #### Mismatch handling -If either [`none`](#target-table-handling) or [`truncate-if-exists`](#target-table-handling) is set, `molt fetch` loads data into the existing tables on the target CockroachDB database. If the target schema mismatches the source schema, `molt fetch` will exit early in certain cases, and will need to be re-run from the beginning. For details, refer to [Fetch exits early due to mismatches](#fetch-exits-early-due-to-mismatches). +If either [`none`](#handle-target-tables) or [`truncate-if-exists`](#handle-target-tables) is set, `molt fetch` loads data into the existing tables on the target CockroachDB database. If the target schema mismatches the source schema, `molt fetch` will exit early in certain cases, and will need to be re-run from the beginning. For details, refer to [Fetch exits early due to mismatches]({% link molt/molt-fetch-troubleshooting.md %}#fetch-exits-early-due-to-mismatches). {{site.data.alerts.callout_info}} -This does not apply when [`drop-on-target-and-recreate`](#target-table-handling) is specified, since this option automatically creates a compatible CockroachDB schema. +This does not apply when [`drop-on-target-and-recreate`](#handle-target-tables) is specified, since this option automatically creates a compatible CockroachDB schema. {{site.data.alerts.end}} #### Skip primary key matching -`--skip-pk-check` removes the [requirement that source and target tables share matching primary keys](#fetch-exits-early-due-to-mismatches) for data load. 
When this flag is set: +[`--skip-pk-check`]({% link molt/molt-fetch-commands-and-flags.md %}#skip-pk-check) removes the [requirement that source and target tables share matching primary keys]({% link molt/molt-fetch-troubleshooting.md %}#fetch-exits-early-due-to-mismatches) for data load. When this flag is set: - The data load proceeds even if the source or target table lacks a primary key, or if their primary key columns do not match. -- [Table sharding](#table-sharding) is disabled. Each table is exported in a single batch within one shard, bypassing `--export-concurrency` and `--row-batch-size`. As a result, memory usage and execution time may increase due to full table scans. +- [Table sharding](#shard-tables-for-concurrent-export) is disabled. Each table is exported in a single batch within one shard, bypassing [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency) and [`--row-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#row-batch-size). As a result, memory usage and execution time may increase due to full table scans. - If the source table contains duplicate rows but the target has [`PRIMARY KEY`]({% link {{ site.current_cloud_version }}/primary-key.md %}) or [`UNIQUE`]({% link {{ site.current_cloud_version }}/unique.md %}) constraints, duplicate rows are deduplicated during import. -When `--skip-pk-check` is set, all tables are treated as if they lack a primary key, and are thus exported in a single unsharded batch. To avoid performance issues, use this flag with `--table-filter` to target only tables **without** a primary key. +When [`--skip-pk-check`]({% link molt/molt-fetch-commands-and-flags.md %}#skip-pk-check) is set, all tables are treated as if they lack a primary key, and are thus exported in a single unsharded batch. To avoid performance issues, use this flag with [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) to target only tables **without** a primary key. 
For example: @@ -506,7 +400,7 @@ molt fetch \ --skip-pk-check ~~~ -Example log output when `--skip-pk-check` is enabled: +Example log output when [`--skip-pk-check`]({% link molt/molt-fetch-commands-and-flags.md %}#skip-pk-check) is enabled: ~~~json {"level":"info","message":"sharding is skipped for table public.nopktbl - flag skip-pk-check is specified and thus no PK for source table is specified"} @@ -514,7 +408,8 @@ Example log output when `--skip-pk-check` is enabled: #### Type mapping -If [`drop-on-target-and-recreate`](#target-table-handling) is set, MOLT Fetch automatically creates a CockroachDB schema that is compatible with the source data. The column types are determined as follows: +If [`drop-on-target-and-recreate`](#handle-target-tables) is set, MOLT Fetch automatically creates a CockroachDB schema that is compatible with the source data. The column types are determined as follows: + - PostgreSQL types are mapped to existing CockroachDB [types]({% link {{site.current_cloud_version}}/data-types.md %}) that have the same [`OID`]({% link {{site.current_cloud_version}}/oid.md %}). - The following MySQL types are mapped to corresponding CockroachDB types: @@ -572,7 +467,7 @@ If [`drop-on-target-and-recreate`](#target-table-handling) is set, MOLT Fetch au | `SDO_GEOMETRY` | [`GEOMETRY`]({% link {{site.current_cloud_version}}/architecture/glossary.md %}#geometry) | Spatial type (PostGIS-style) | | `XMLTYPE` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Stored as text | -- To override the default mappings for automatic schema creation, you can map source to target CockroachDB types explicitly. These are defined in the JSON file indicated by the `--type-map-file` flag. 
The allowable custom mappings are valid CockroachDB aliases, casts, and the following mappings specific to MOLT Fetch and [Verify]({% link molt/molt-verify.md %}): +- To override the default mappings for automatic schema creation, you can map source to target CockroachDB types explicitly. These are defined in the JSON file indicated by the [`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#type-map-file) flag. The allowable custom mappings are valid CockroachDB aliases, casts, and the following mappings specific to MOLT Fetch and [Verify]({% link molt/molt-verify.md %}): - [`TIMESTAMP`]({% link {{site.current_cloud_version}}/timestamp.md %}) <> [`TIMESTAMPTZ`]({% link {{site.current_cloud_version}}/timestamp.md %}) - [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %}) <> [`UUID`]({% link {{site.current_cloud_version}}/uuid.md %}) @@ -582,7 +477,7 @@ If [`drop-on-target-and-recreate`](#target-table-handling) is set, MOLT Fetch au - [`JSONB`]({% link {{site.current_cloud_version}}/jsonb.md %}) <> [`TEXT`]({% link {{site.current_cloud_version}}/string.md %}) - [`INET`]({% link {{site.current_cloud_version}}/inet.md %}) <> [`TEXT`]({% link {{site.current_cloud_version}}/string.md %}) -`--type-map-file` specifies the path to the JSON file containing the explicit type mappings. For example: +[`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#type-map-file) specifies the path to the JSON file containing the explicit type mappings. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -616,7 +511,7 @@ The following JSON example defines two type mappings: - `source_type` specifies the source type to be mapped. - `crdb_type` specifies the target CockroachDB [type]({% link {{ site.current_cloud_version }}/data-types.md %}) to be mapped. -### Transformations +### Define transformations You can define transformation rules to be performed on the target database during the fetch task. 
These can be used to: @@ -625,7 +520,7 @@ You can define transformation rules to be performed on the target database durin - Rename tables on the target database. - Rename database schemas. -Transformation rules are defined in the JSON file indicated by the `--transformations-file` flag. For example: +Transformation rules are defined in the JSON file indicated by the [`--transformations-file`]({% link molt/molt-fetch-commands-and-flags.md %}#transformations-file) flag. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -711,8 +606,8 @@ The following JSON example defines three transformation rules: rule `1` [maps co For n-to-1 mappings: - - Use [`--use-copy`](#data-load-mode) or [`--direct-copy`](#direct-copy) for data movement. - - Manually create the target table. Do not use [`--table-handling drop-on-target-and-recreate`](#target-table-handling). + - Use [`--use-copy`](#import-into-vs-copy-from) or [`--direct-copy`](#direct-copy) for data movement. + - Manually create the target table. Do not use [`--table-handling drop-on-target-and-recreate`](#handle-target-tables). [Example rule `2`](#transformation-rules-example) maps all table names with prefix `charges_part` to a single `charges` table on CockroachDB (an n-to-1 mapping). This assumes that all matching `charges_part.*` tables have the same table definition: @@ -757,7 +652,7 @@ Each rule is applied in the order it is defined. If two rules overlap, the later To verify that the logging shows that the computed columns are being created: -When running `molt fetch`, set `--logging debug` and look for `ALTER TABLE ... ADD COLUMN` statements with the `STORED` or `VIRTUAL` keywords in the log output: +When running `molt fetch`, set [`--logging`]({% link molt/molt-fetch-commands-and-flags.md %}#logging) `debug` and look for `ALTER TABLE ... 
ADD COLUMN` statements with the `STORED` or `VIRTUAL` keywords in the log output: ~~~ json {"level":"debug","time":"2024-07-22T12:01:51-04:00","message":"running: ALTER TABLE IF EXISTS public.computed ADD COLUMN computed_col INT8 NOT NULL AS ((col1 + col2)) STORED"} @@ -779,7 +674,7 @@ SHOW CREATE TABLE computed; | ) ~~~ -### Fetch continuation +## Continue MOLT Fetch after interruption If MOLT Fetch fails while loading data into CockroachDB from intermediate files, it exits with an error message, fetch ID, and [continuation token](#list-active-continuation-tokens) for each table that failed to load on the target database. You can use this information to continue the task from the *continuation point* where it was interrupted. @@ -792,14 +687,14 @@ Continuation is only possible under the following conditions: Only one fetch ID and set of continuation tokens, each token corresponding to a table, are active at any time. See [List active continuation tokens](#list-active-continuation-tokens). {{site.data.alerts.end}} -To retry all data starting from the continuation point, reissue the `molt fetch` command and include the `--fetch-id`. +To retry all data starting from the continuation point, reissue the `molt fetch` command and include the [`--fetch-id`]({% link molt/molt-fetch-commands-and-flags.md %}#fetch-id). {% include_cached copy-clipboard.html %} ~~~ --fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007 ~~~ -To retry a specific table that failed, include both `--fetch-id` and `--continuation-token`. The latter flag specifies a token string that corresponds to a specific table on the source database. A continuation token is written in the `molt fetch` output for each failed table. If the fetch task encounters a subsequent error, it generates a new token for each failed table. See [List active continuation tokens](#list-active-continuation-tokens). 
+To retry a specific table that failed, include both [`--fetch-id`]({% link molt/molt-fetch-commands-and-flags.md %}#fetch-id) and [`--continuation-token`]({% link molt/molt-fetch-commands-and-flags.md %}#continuation-token). The latter flag specifies a token string that corresponds to a specific table on the source database. A continuation token is written in the `molt fetch` output for each failed table. If the fetch task encounters a subsequent error, it generates a new token for each failed table. See [List active continuation tokens](#list-active-continuation-tokens). {{site.data.alerts.callout_info}} This will retry only the table that corresponds to the continuation token. If the fetch task succeeds, there may still be source data that is not yet loaded into CockroachDB. @@ -811,7 +706,7 @@ This will retry only the table that corresponds to the continuation token. If th --continuation-token 011762e5-6f70-43f8-8e15-58b4de10a007 ~~~ -To retry all data starting from a specific file, include both `--fetch-id` and `--continuation-file-name`. The latter flag specifies the filename of an intermediate file in [cloud or local storage](#data-path). All filenames are prepended with `part_` and have the `.csv.gz` or `.csv` extension, depending on compression type (gzip by default). For example: +To retry all data starting from a specific file, include both [`--fetch-id`]({% link molt/molt-fetch-commands-and-flags.md %}#fetch-id) and [`--continuation-file-name`]({% link molt/molt-fetch-commands-and-flags.md %}#continuation-file-name). The latter flag specifies the filename of an intermediate file in [cloud or local storage](#define-intermediate-storage). All filenames are prepended with `part_` and have the `.csv.gz` or `.csv` extension, depending on compression type (gzip by default). 
For example: {% include_cached copy-clipboard.html %} ~~~ @@ -823,9 +718,9 @@ To retry all data starting from a specific file, include both `--fetch-id` and ` Continuation is not possible when using [direct copy](#direct-copy). {{site.data.alerts.end}} -#### List active continuation tokens +### List active continuation tokens -To view all active continuation tokens, issue a `molt fetch tokens list` command along with `--conn-string`, which specifies the [connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url) for the target CockroachDB database. For example: +To view all active continuation tokens, issue a `molt fetch tokens list` command along with [`--conn-string`]({% link molt/molt-fetch-commands-and-flags.md %}#conn-string), which specifies the [connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url) for the target CockroachDB database. For example: {% include_cached copy-clipboard.html %} ~~~ shell @@ -842,9 +737,9 @@ molt fetch tokens list \ Continuation Tokens. ~~~ -### CDC cursor +## Enable replication -A change data capture (CDC) cursor is written to the output as `cdc_cursor` at the beginning and end of the fetch task. +A change data capture (CDC) cursor is written to the MOLT Fetch output as `cdc_cursor` at the beginning and end of the fetch task. For MySQL: @@ -858,39 +753,25 @@ For Oracle: {"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":3,"tables":["migration_schema.employees"],"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"} ~~~ -Use the `cdc_cursor` value as the checkpoint for MySQL or Oracle replication with [MOLT Replicator]({% link molt/molt-replicator.md %}#replication-checkpoints). 
- -You can also use the `cdc_cursor` value with an external change data capture (CDC) tool to continuously replicate subsequent changes from the source database to CockroachDB. - -## Security - -Cockroach Labs strongly recommends the following security practices. +This `cdc_cursor` value is also included in the output of a fetch task from a PostgreSQL source. However, for a PostgreSQL source, you can instead enable replication with the [`--pglogical-replication-slot-name`]({% link molt/molt-fetch-commands-and-flags.md %}#pglogical-replication-slot-name) and [`--pglogical-publication-and-slot-drop-and-recreate`]({% link molt/molt-fetch-commands-and-flags.md %}#pglogical-publication-and-slot-drop-and-recreate) flags, both of which must be set. -### Connection security +Use the `cdc_cursor` value as the checkpoint for MySQL or Oracle replication with MOLT Replicator. Use the [`--pglogical-replication-slot-name`]({% link molt/molt-fetch-commands-and-flags.md %}#pglogical-replication-slot-name) value as the checkpoint for PostgreSQL replication with MOLT Replicator. Refer to [Replication checkpoints]({% link molt/molt-replicator.md %}#replication-checkpoints) in the MOLT Replicator documentation. -{% include molt/molt-secure-connection-strings.md %} - -{{site.data.alerts.callout_info}} -By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the `--allow-tls-mode-disable` flag. Do this **only** when testing, or if a secure SSL/TLS connection to the source or target database is not possible. -{{site.data.alerts.end}} - -### Cloud storage security - -{% include molt/fetch-secure-cloud-storage.md %} +You can also use the `cdc_cursor` value with an external change data capture (CDC) tool to continuously replicate subsequent changes from the source database to CockroachDB.
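+When automating a migration, you can pull the `cdc_cursor` value out of the MOLT Fetch output programmatically. The following is a minimal sketch: the sample line is abbreviated from the Oracle summary output above, and the `sed` extraction is illustrative shell scripting, not part of the MOLT tooling.

```shell
# Abbreviated "fetch complete" summary line, as emitted at the end of a MOLT Fetch task.
line='{"level":"info","type":"summary","cdc_cursor":"backfillFromSCN=26685444,scn=26685786","message":"fetch complete"}'

# Extract the cdc_cursor field so it can be passed along as the replication checkpoint.
cursor=$(printf '%s' "$line" | sed -E 's/.*"cdc_cursor":"([^"]*)".*/\1/')
echo "$cursor"
```

+The extracted value can then be supplied to MOLT Replicator as the checkpoint when starting replication.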
-## Common workflows +## Common uses ### Bulk data load +When migrating data to CockroachDB in a bulk load (without utilizing [continuous replication]({% link molt/migration-considerations-replication.md %}) to minimize system downtime), run the `molt fetch` command with the required flags, as shown below: +
-To perform a bulk data load migration from your source database to CockroachDB, run the `molt fetch` command with the required flags. - -Specify the source and target database connections. For connection string formats, refer to [Source and target databases](#source-and-target-databases). +Specify the source and target database connections. For connection string formats, refer to [Source and target databases](#specify-source-and-target-databases).
{% include_cached copy-clipboard.html %} @@ -901,7 +782,7 @@ Specify the source and target database connections. For connection string format
-For Oracle Multitenant (CDB/PDB) sources, also include `--source-cdb` to specify the container database (CDB) connection string. +For Oracle Multitenant (CDB/PDB) sources, also include [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) to specify the container database (CDB) connection string. {% include_cached copy-clipboard.html %} ~~~ @@ -944,7 +825,7 @@ Optionally, filter the source data to migrate. By default, all schemas and table
-For Oracle sources, `--schema-filter` is case-insensitive. You can use either lowercase or uppercase: +For Oracle sources, [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) is case-insensitive. You can use either lowercase or uppercase: {% include_cached copy-clipboard.html %} ~~~ @@ -954,7 +835,7 @@ For Oracle sources, `--schema-filter` is case-insensitive. You can use either lo
-For MySQL sources, omit `--schema-filter` because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. If needed, use `--table-filter` to select specific tables: +For MySQL sources, omit [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. If needed, use [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) to select specific tables: {% include_cached copy-clipboard.html %} ~~~ @@ -962,14 +843,14 @@ For MySQL sources, omit `--schema-filter` because MySQL tables belong directly t ~~~
-Specify how to handle target tables. By default, `--table-handling` is set to `none`, which loads data without changing existing data in the tables. For details, refer to [Target table handling](#target-table-handling): +Specify how to handle target tables. By default, [`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) is set to `none`, which loads data without changing existing data in the tables. For details, refer to [Target table handling](#handle-target-tables): {% include_cached copy-clipboard.html %} ~~~ --table-handling truncate-if-exists ~~~ -When performing a bulk load without subsequent replication, use `--ignore-replication-check` to skip querying for replication checkpoints (such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle). This is appropriate when: +When performing a bulk load without subsequent replication, use [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) to skip querying for replication checkpoints (such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle). This is appropriate when: - Performing a one-time data migration with no plan to replicate ongoing changes. - Exporting data from a read replica where replication checkpoints are unavailable. 
@@ -979,7 +860,7 @@ When performing a bulk load without subsequent replication, use `--ignore-replic --ignore-replication-check ~~~ -At minimum, the `molt fetch` command should include the source, target, data path, and `--ignore-replication-check` flags: +At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags: {% include_cached copy-clipboard.html %} ~~~ shell @@ -990,9 +871,14 @@ molt fetch \ --ignore-replication-check ~~~ -For detailed steps, refer to [Bulk load migration]({% link molt/migrate-bulk-load.md %}). +For detailed walkthroughs of migrations that use `molt fetch` in this way, refer to these common migration approaches: + +- [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}) +- [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) -### Load before replication +### Initial bulk load (before replication) + +In a migration that utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}), perform an initial data load before [setting up ongoing replication with MOLT Replicator]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load). Run the `molt fetch` command without [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check), as shown below:
@@ -1000,15 +886,14 @@ For detailed steps, refer to [Bulk load migration]({% link molt/migrate-bulk-loa
-To perform an initial data load before setting up ongoing replication with [MOLT Replicator]({% link molt/molt-replicator.md %}), run the `molt fetch` command without `--ignore-replication-check`. This captures replication checkpoints during the data load. - The workflow is the same as [Bulk data load](#bulk-data-load), except: -- Exclude `--ignore-replication-check`. MOLT Fetch will query and record replication checkpoints. +- Exclude [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check). MOLT Fetch will query and record replication checkpoints. +- After the data load completes, check the [CDC cursor](#enable-replication) in the output for the checkpoint value to use with MOLT Replicator. +
-- You must include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load. +- You must include [`--pglogical-replication-slot-name`]({% link molt/molt-fetch-commands-and-flags.md %}#pglogical-replication-slot-name) and [`--pglogical-publication-and-slot-drop-and-recreate`]({% link molt/molt-fetch-commands-and-flags.md %}#pglogical-publication-and-slot-drop-and-recreate) to automatically create the publication and replication slot during the data load.
-- After the data load completes, check the [CDC cursor](#cdc-cursor) in the output for the checkpoint value to use with MOLT Replicator. At minimum, the `molt fetch` command should include the source, target, and data path flags: @@ -1052,87 +937,24 @@ The output will include a `cdc_cursor` value at the end of the fetch task: ~~~
-Use this `cdc_cursor` value when starting MOLT Replicator to ensure replication begins from the correct position. For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}). +Use this `cdc_cursor` value when starting MOLT Replicator to ensure replication begins from the correct position.
-## Monitoring - -### Metrics - -By default, MOLT Fetch exports [Prometheus](https://prometheus.io/) metrics at `127.0.0.1:3030/metrics`. You can configure this endpoint with the `--metrics-listen-addr` [flag](#global-flags). - -Cockroach Labs recommends monitoring the following metrics: - -| Metric Name | Description | -|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------| -| `molt_fetch_num_tables` | Number of tables that will be moved from the source. | -| `molt_fetch_num_task_errors` | Number of errors encountered by the fetch task. | -| `molt_fetch_overall_duration` | Duration (in seconds) of the fetch task. | -| `molt_fetch_rows_exported` | Number of rows that have been exported from a table. For example:
`molt_fetch_rows_exported{table="public.users"}` | -| `molt_fetch_rows_imported` | Number of rows that have been imported from a table. For example:
`molt_fetch_rows_imported{table="public.users"}` | -| `molt_fetch_table_export_duration_ms` | Duration (in milliseconds) of a table's export. For example:
`molt_fetch_table_export_duration_ms{table="public.users"}` | -| `molt_fetch_table_import_duration_ms` | Duration (in milliseconds) of a table's import. For example:
`molt_fetch_table_import_duration_ms{table="public.users"}` | - -To visualize the preceding metrics, use the Grafana dashboard [bundled with your binary]({% link molt/molt-fetch.md %}#installation) (`grafana_dashboard.json`). The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json). - -## Best practices - -### Test and validate - -To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. Use a test or development environment that closely resembles production. - -### Configure the source database and connection - -- To prevent connections from terminating prematurely during the [data export phase](#data-export-phase), set the following to high values on the source database: +For detailed walkthroughs of migrations that use `molt fetch` in this way, refer to these common migration approaches: - - **Maximum allowed number of connections.** MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards ([`--export-concurrency`](#global-flags)) multiplied by the number of tables ([`--table-concurrency`](#global-flags)) being exported concurrently. +- [Delta Migration]({% link molt/migration-approach-delta.md %}) +- [Phased Delta Migration with Failback Replication]({% link molt/migration-approach-phased-delta-failback.md %}) - {{site.data.alerts.callout_info}} - With the default numerical range sharding, only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded. 
PostgreSQL users can enable [`--use-stats-based-sharding`](#global-flags) to use statistics-based sharding for tables with primary keys of any data type. For details, refer to [Table sharding](#table-sharding). - {{site.data.alerts.end}} - - - **Maximum lifetime of a connection.** - -- If a PostgreSQL database is set as a [source](#source-and-target-databases), ensure that [`idle_in_transaction_session_timeout`](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) on PostgreSQL is either disabled or set to a value longer than the duration of the [data export phase](#data-export-phase). Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of [`molt_fetch_table_export_duration_ms`](#monitoring) for all exported tables. - -### Optimize performance - -- {% include molt/molt-drop-constraints-indexes.md %} - -- For PostgreSQL sources using [`--use-stats-based-sharding`](#global-flags), run [`ANALYZE`]({% link {{ site.current_cloud_version }}/create-statistics.md %}) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance. - -- To prevent memory outages during `READ COMMITTED` [data export](#data-export-phase) of tables with large rows, estimate the amount of memory used to export a table: - - ~~~ - --row-batch-size * --export-concurrency * average size of the table rows - ~~~ - - If you are exporting more than one table at a time (i.e., [`--table-concurrency`](#global-flags) is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run `molt fetch`, and adjust `--row-batch-size` accordingly. For details on how concurrency and sharding interact, refer to [Table sharding](#table-sharding). 
- -- If a table in the source database is much larger than the other tables, [filter and export the largest table](#schema-and-table-selection) in its own `molt fetch` task. Repeat this for each of the largest tables. Then export the remaining tables in another task. - -- Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric](#monitoring) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding](#table-sharding). - -### Import and continuation handling - -- When using [`IMPORT INTO`](#data-load-mode) during the [data import phase](#data-import-phase) to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, [manually resume or cancel the job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs). Then resume `molt fetch` using [continuation](#fetch-continuation), or restart the task from the beginning. - -## Troubleshooting - -
- - - -
+## Known limitations -{% include molt/molt-troubleshooting-fetch.md %} +{% include molt/molt-limitations-fetch.md %} ## See also +- [MOLT Fetch Installation]({% link molt/molt-fetch-installation.md %}) +- [MOLT Fetch Commands and Flags]({% link molt/molt-fetch-commands-and-flags.md %}) +- [MOLT Fetch Metrics]({% link molt/molt-fetch-monitoring.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [MOLT Fetch Troubleshooting]({% link molt/molt-fetch-troubleshooting.md %}) - [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Strategy]({% link molt/migration-strategy.md %}) -- [MOLT Replicator]({% link molt/molt-replicator.md %}) -- [MOLT Verify]({% link molt/molt-verify.md %}) -- [Load and replicate]({% link molt/migrate-load-replicate.md %}) -- [Resume Replication]({% link molt/migrate-resume-replication.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) \ No newline at end of file +- [MOLT Replicator]({% link molt/molt-replicator.md %}) \ No newline at end of file diff --git a/src/current/molt/molt-replicator-best-practices.md b/src/current/molt/molt-replicator-best-practices.md new file mode 100644 index 00000000000..7f2fc29bfd1 --- /dev/null +++ b/src/current/molt/molt-replicator-best-practices.md @@ -0,0 +1,149 @@ +--- +title: MOLT Replicator Best Practices +summary: Learn best practices for using MOLT Replicator for continuous replication. +toc: true +docs_area: migrate +--- + +This page describes best practices for using [MOLT Replicator]({% link molt/molt-replicator.md %}) to ensure reliable, secure, and performant data migration to CockroachDB. + +## Test and validate + +To verify that your connections and configuration work properly, run MOLT Replicator in a staging environment before replicating any data in production. Use a test or development environment that closely resembles production. 
+ +## Optimize performance + +{% include molt/optimize-replicator-performance.md %} + +## Security + +Cockroach Labs **strongly** recommends the following: + +### Connection security and credentials + +{% include molt/molt-secure-connection-strings.md %} + +### CockroachDB changefeed security + +For failback scenarios, secure the connection from CockroachDB to MOLT Replicator using TLS certificates. Generate TLS certificates using self-signed certificates, certificate authorities like Let's Encrypt, or your organization's certificate management system. + +#### TLS from CockroachDB to Replicator + +Configure MOLT Replicator with server certificates using the [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) flags to specify the certificate and private key file paths. For example: + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator start \ +--tlsCertificate ./certs/server.crt \ +--tlsPrivateKey ./certs/server.key \ +... +~~~ + +These server certificates must correspond to the client certificates specified in the changefeed webhook URL to ensure proper TLS handshake. + +Encode client certificates for changefeed webhook URLs: + +- Webhook URLs: Use both URL encoding and base64 encoding: `base64 -i ./client.crt | jq -R -r '@uri'` +- Non-webhook contexts: Use base64 encoding only: `base64 -w 0 ca.cert` + +#### JWT authentication + +You can use JSON Web Tokens (JWT) to authorize incoming changefeed connections and restrict writes to a subset of SQL databases or user-defined schemas in the target cluster. + +Replicator supports JWT claims that allow writes to specific databases, schemas, or all of them. JWT tokens must be signed using RSA or EC keys. HMAC and `None` signatures are automatically rejected. + +To configure JWT authentication: + +1. Add PEM-formatted public signing keys to the `_replicator.jwt_public_keys` table in the staging database. + +1. 
To revoke a specific token, add its `jti` value to the `_replicator.jwt_revoked_ids` table in the staging database. + +The Replicator process re-reads these tables every minute to pick up changes. + +To pass the JWT token from the changefeed to the Replicator webhook sink, use the [`webhook_auth_header` option]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE CHANGEFEED ... WITH webhook_auth_header='Bearer {token}'; +~~~ + +##### Token quickstart + +The following example uses `OpenSSL` to generate keys, but any PEM-encoded RSA or EC keys will work. + +{% include_cached copy-clipboard.html %} +~~~ shell +# Generate an EC private key using OpenSSL. +openssl ecparam -out ec.key -genkey -name prime256v1 + +# Write the public key components to a separate file. +openssl ec -in ec.key -pubout -out ec.pub + +# Upload the public key for all instances of Replicator to find it. +cockroach sql -e "INSERT INTO _replicator.jwt_public_keys (public_key) VALUES ('$(cat ec.pub)')" + +# Reload configuration, or wait one minute. +killall -HUP replicator + +# Generate a token which can write to the ycsb.public schema. +# The key can be decoded using the debugger at https://jwt.io. +# Add the contents of out.jwt to the CREATE CHANGEFEED command: +# WITH webhook_auth_header='Bearer {out.jwt}' +replicator make-jwt -k ec.key -a ycsb.public -o out.jwt +~~~ + +##### External JWT providers + +The `make-jwt` command also supports a [`--claim`]({% link molt/replicator-flags.md %}#claim) flag, which prints a JWT claim that can be signed by your existing JWT provider. The PEM-formatted public key or keys for that provider must be inserted into the `_replicator.jwt_public_keys` table. The `iss` (issuer) and `jti` (token ID) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety.
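Before adding a provider-signed token to the changefeed, you can decode its payload segment locally to confirm that the custom claim survived signing. The following sketch builds a hypothetical unsigned token in place of `out.jwt`; with a real token, only the final command is needed. (For payloads whose encoded length is not a multiple of four, base64 padding must be restored before decoding.)

{% include_cached copy-clipboard.html %}
~~~ shell
# Build a hypothetical unsigned token standing in for out.jwt.
HEADER=$(printf '{"alg":"ES256"}' | base64 | tr -d '=\n' | tr '+/' '-_')
PAYLOAD=$(printf '{"iss":"replicator","jti":"example"}' | base64 | tr -d '=\n' | tr '+/' '-_')
TOKEN="$HEADER.$PAYLOAD.signature"

# Extract the payload segment and decode it to inspect the claims.
printf '%s' "$TOKEN" | cut -d '.' -f 2 | tr '_-' '/+' | base64 -d
~~~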
+ +{{site.data.alerts.callout_success}} +You can repeat the [`-a`]({% link molt/replicator-flags.md %}#allow) flag to create a claim for multiple schemas. +{{site.data.alerts.end}} + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator make-jwt -a 'database.schema' --claim +~~~ + +~~~json +{ + "iss": "replicator", + "jti": "d5ffa211-8d54-424b-819a-bc19af9202a5", + "https://github.com/cockroachdb/replicator": { + "schemas": [ + [ + "database", + "schema" + ] + ] + } +} +~~~ + +### Production considerations + +- Avoid [`--disableAuthentication`]({% link molt/replicator-flags.md %}#disable-authentication) and [`--tlsSelfSigned`]({% link molt/replicator-flags.md %}#tls-self-signed) flags in production environments. These flags should only be used for testing or development purposes. + +### Supply chain security + +Use the `version` command to verify the integrity of your MOLT Replicator build and identify potential upstream vulnerabilities. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator version +~~~ + +The output includes: + +- Module name +- go.mod checksum +- Version + +Use this information to determine if your build may be subject to vulnerabilities from upstream packages. Cockroach Labs uses Dependabot to automatically upgrade Go modules, and the team regularly merges Dependabot updates to address security issues. 
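As a complement to the `version` command, you can record the binary's checksum at deployment time and re-verify it later. This is a generic sketch using `sha256sum` (not a Replicator feature); the binary path is a placeholder:

{% include_cached copy-clipboard.html %}
~~~ shell
# Record the SHA-256 checksum of the deployed binary.
sha256sum ./replicator > replicator.sha256

# Later, confirm the binary on disk still matches the recorded checksum.
sha256sum -c replicator.sha256
~~~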
+ +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Replicator Installation]({% link molt/molt-replicator-installation.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-replicator-installation.md b/src/current/molt/molt-replicator-installation.md new file mode 100644 index 00000000000..b836a83ce45 --- /dev/null +++ b/src/current/molt/molt-replicator-installation.md @@ -0,0 +1,56 @@ +--- +title: MOLT Replicator Installation +summary: Learn how to install MOLT Replicator and configure prerequisites for continuous replication. +toc: true +docs_area: migrate +--- + +This page explains the prerequisites for using [MOLT Replicator]({% link molt/molt-replicator.md %}) and then describes how to install it. + +## Prerequisites + +### Supported databases + +MOLT Replicator supports the following source and target databases: + +- PostgreSQL 11-16 +- MySQL 5.7, 8.0 and later +- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) +- CockroachDB (all currently [supported versions]({% link releases/release-support-policy.md %}#supported-versions)) + +### Database configuration + +The source database must be configured for replication: + +| Database | Configuration Requirements | Examples |
+|-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source | | [Configure PostgreSQL for replication]({% link molt/delta-migration-postgres.md %}#configure-source-database-for-replication) | +| MySQL source | | [Configure MySQL for replication]({% link molt/delta-migration-mysql.md %}#configure-source-database-for-replication) | +| Oracle source | | [Configure Oracle for replication]({% link molt/delta-migration-oracle.md %}#configure-source-database-for-replication) | +| CockroachDB source (failback) | | [Configure CockroachDB for replication]({% link molt/phased-delta-failback-postgres.md %}#prepare-the-cockroachdb-cluster) | + +### User permissions + +The SQL user running MOLT Replicator requires specific privileges on both the source and target databases: + +| Database | Required Privileges | Examples | 
+|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source | | [Create PostgreSQL migration user]({% link molt/delta-migration-postgres.md %}#create-migration-user-on-source-database) | +| MySQL source | | [Create MySQL migration user]({% link molt/delta-migration-mysql.md %}#create-migration-user-on-source-database) | +| Oracle source | | [Create Oracle migration user]({% link molt/delta-migration-oracle.md %}#create-migration-user-on-source-database)

[Create sentinel table]({% link molt/delta-migration-oracle.md %}#create-source-sentinel-table)

[Grant LogMiner privileges]({% link molt/delta-migration-oracle.md %}#grant-logminer-privileges) | +| CockroachDB target (forward replication) | | [Create CockroachDB user]({% link molt/delta-migration-postgres.md %}#create-the-sql-user) | +| PostgreSQL, MySQL, or Oracle target (failback) | | [Grant PostgreSQL user permissions]({% link molt/phased-delta-failback-postgres.md %}#grant-target-database-user-permissions)

[Grant MySQL user permissions]({% link molt/phased-delta-failback-mysql.md %}?filter=mysql#grant-target-database-user-permissions)

[Grant Oracle user permissions]({% link molt/phased-delta-failback-oracle.md %}?filter=oracle#grant-target-database-user-permissions) | + +## Installation + +{% include molt/molt-install.md %} + +### Docker usage + +{% include molt/molt-docker.md %} + +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-replicator-troubleshooting.md b/src/current/molt/molt-replicator-troubleshooting.md new file mode 100644 index 00000000000..ed4ab1fa7cf --- /dev/null +++ b/src/current/molt/molt-replicator-troubleshooting.md @@ -0,0 +1,25 @@ +--- +title: MOLT Replicator Troubleshooting +summary: Troubleshoot common issues with MOLT Replicator during continuous replication. +toc: true +docs_area: migrate +--- + +This page describes common issues that can occur while using [MOLT Replicator]({% link molt/molt-replicator.md %}) and suggests ways to troubleshoot those issues. + +
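For example, when forward replication from a PostgreSQL source appears stalled, a useful first check is whether the replication slot on the source is active and its confirmed position is advancing. The slot name below is a placeholder for the slot created during your data load:

{% include_cached copy-clipboard.html %}
~~~ sql
SELECT slot_name, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'molt_slot';
~~~

If `active` is `f`, or `confirmed_flush_lsn` does not advance while Replicator is running, verify the slot name passed to `--slotName` and check the Replicator logs for connection errors.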
+ + + +
+ +{% include molt/molt-troubleshooting-replication.md %} + +{% include molt/molt-troubleshooting-failback.md %} + +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Replicator Installation]({% link molt/molt-replicator-installation.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-replicator.md b/src/current/molt/molt-replicator.md index 319ed1c2bcb..086f8867f20 100644 --- a/src/current/molt/molt-replicator.md +++ b/src/current/molt/molt-replicator.md @@ -7,107 +7,30 @@ docs_area: migrate MOLT Replicator continuously replicates changes from a source database to CockroachDB as part of a [database migration]({% link molt/migration-overview.md %}). It supports migrations from a source database to CockroachDB with minimal downtime, and enables backfill from CockroachDB to your source database for failback scenarios to preserve a rollback option during a migration window. -MOLT Replicator consumes change data from PostgreSQL [logical replication](https://www.postgresql.org/docs/current/logical-replication.html) streams, MySQL [GTID-based replication](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html), Oracle [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html), and [CockroachDB changefeeds]({% link {{ site.current_cloud_version }}/change-data-capture-overview.md %}) (for failback). For details, refer to [How it works](#how-it-works). 
+MOLT Replicator consumes change data from PostgreSQL [logical replication](https://www.postgresql.org/docs/current/logical-replication.html) streams, MySQL [GTID-based replication](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html), Oracle [LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html), and [CockroachDB changefeeds]({% link {{ site.current_cloud_version }}/change-data-capture-overview.md %}) (for failback). Read more about [MOLT Replicator prerequisites]({% link molt/molt-replicator-installation.md %}#prerequisites). ## Terminology - *Checkpoint*: The position in the source database's transaction log from which replication begins or resumes: LSN (PostgreSQL), GTID (MySQL), or SCN (Oracle). - *Staging database*: A CockroachDB database used by Replicator to store replication metadata, checkpoints, and buffered mutations. Specified with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) and automatically created with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema). For details, refer to [Staging database](#staging-database). -- *Forward replication*: Replicate changes from a source database (PostgreSQL, MySQL, or Oracle) to CockroachDB during a migration. For usage details, refer to [Forward replication with initial load](#forward-replication-with-initial-load). -- *Failback*: Replicate changes from CockroachDB back to the source database. Used for migration rollback or to maintain data consistency on the source during migration. For usage details, refer to [Failback to source database](#failback-to-source-database). 
- -## Prerequisites - -### Supported databases - -MOLT Replicator supports the following source and target databases: - -- PostgreSQL 11-16 -- MySQL 5.7, 8.0 and later -- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) -- CockroachDB (all currently [supported versions]({% link releases/release-support-policy.md %}#supported-versions)) - -### Database configuration - -The source database must be configured for replication: - -| Database | Configuration Requirements | Details | -|-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source | | [Configure PostgreSQL for replication]({% link molt/migrate-load-replicate.md %}#configure-source-database-for-replication) | -| MySQL source | | [Configure MySQL for replication]({% link molt/migrate-load-replicate.md %}?filters=mysql#configure-source-database-for-replication) | -| Oracle source | | [Configure Oracle for replication]({% link molt/migrate-load-replicate.md %}?filters=oracle#configure-source-database-for-replication) | -| CockroachDB source (failback) | | [Configure CockroachDB for replication]({% link molt/migrate-failback.md %}#prepare-the-cockroachdb-cluster) | - -### User permissions - -The SQL user running MOLT Replicator requires specific privileges on both the source and target 
databases: - -| Database | Required Privileges | Details | -|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source | | [Create PostgreSQL migration user]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database) | -| MySQL source | | [Create MySQL migration user]({% link molt/migrate-load-replicate.md %}?filters=mysql#create-migration-user-on-source-database) | -| Oracle source | | [Create Oracle migration user]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-migration-user-on-source-database)

[Create sentinel table]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-source-sentinel-table)

[Grant LogMiner privileges]({% link molt/migrate-load-replicate.md %}?filters=oracle#grant-logminer-privileges) | -| CockroachDB target (forward replication) | | [Create CockroachDB user]({% link molt/migrate-load-replicate.md %}#create-the-sql-user) | -| PostgreSQL, MySQL, or Oracle target (failback) | | [Grant PostgreSQL user permissions]({% link molt/migrate-failback.md %}#grant-target-database-user-permissions)

[Grant MySQL user permissions]({% link molt/migrate-failback.md %}?filter=mysql#grant-target-database-user-permissions)

[Grant Oracle user permissions]({% link molt/migrate-failback.md %}?filter=oracle#grant-target-database-user-permissions) | - -## Installation - -{% include molt/molt-install.md %} - -### Docker usage - -{% include molt/molt-docker.md %} +- *Forward replication*: Replicate changes from a source database (PostgreSQL, MySQL, or Oracle) to CockroachDB during a migration. For usage details, refer to [Forward replication (after initial load)](#forward-replication-after-initial-load). +- *Failback*: Replicate changes from CockroachDB back to the source database. Used for migration rollback or to maintain data consistency on the source during migration. For usage details, refer to [Failback replication](#failback-replication). ## How it works MOLT Replicator supports forward replication from PostgreSQL, MySQL, and Oracle, and failback from CockroachDB: -- PostgreSQL source ([`pglogical`](#commands)): MOLT Replicator uses [PostgreSQL logical replication](https://www.postgresql.org/docs/current/logical-replication.html), which is based on publications and replication slots. You create a publication for the target tables, and a slot marks consistent replication points. MOLT Replicator consumes this logical feed directly and applies the data in sorted batches to the target. - -- MySQL source ([`mylogical`](#commands)): MOLT Replicator relies on [MySQL GTID-based replication](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to read change data from MySQL binlogs. It works with MySQL versions that support GTID-based replication and applies transactionally consistent feeds to the target. Binlog features that do not use GTIDs are not supported. - -- Oracle source ([`oraclelogminer`](#commands)): MOLT Replicator uses [Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html) to capture change data from Oracle redo logs. Both Oracle Multitenant (CDB/PDB) and single-tenant Oracle architectures are supported. 
Replicator periodically queries LogMiner-populated views and processes transactional data in ascending SCN windows for reliable throughput while maintaining consistency. - -- Failback from CockroachDB ([`start`](#commands)): MOLT Replicator acts as an HTTP webhook sink for a single CockroachDB changefeed. Replicator receives mutations from source cluster nodes, can optionally buffer them in a CockroachDB staging cluster, and then applies time-ordered transactional batches to the target database. Mutations are applied as [`UPSERT`]({% link {{ site.current_cloud_version }}/upsert.md %}) or [`DELETE`]({% link {{ site.current_cloud_version }}/delete.md %}) statements while respecting [foreign-key]({% link {{ site.current_cloud_version }}/foreign-key.md %}) and table dependencies. - -### Consistency modes - -MOLT Replicator supports three consistency modes for balancing throughput and transactional guarantees: - -1. *Consistent* (failback mode only, default for CockroachDB sources): Preserves per-row order and source transaction atomicity. Concurrent transactions are controlled by [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism). - -1. *BestEffort* (failback mode only): Relaxes atomicity across tables that do not have foreign key constraints between them (maintains coherence within FK-connected groups). Enable with [`--bestEffortOnly`]({% link molt/replicator-flags.md %}#best-effort-only) or allow auto-entry via [`--bestEffortWindow`]({% link molt/replicator-flags.md %}#best-effort-window) set to a positive duration (such as `1s`). - - {{site.data.alerts.callout_info}} - For independent tables (with no foreign key constraints), BestEffort mode applies changes immediately as they arrive, without waiting for the resolved timestamp. This provides higher throughput for tables that have no relationships with other tables. - {{site.data.alerts.end}} - -1. 
*Immediate* (default for PostgreSQL, MySQL, and Oracle sources): Applies updates as they arrive to Replicator with no buffering or waiting for resolved timestamps. For CockroachDB sources, provides highest throughput but requires no foreign keys on the target schema. - -## Commands - -MOLT Replicator provides the following commands: - -| Command | Description | -|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `pglogical` | Replicate from PostgreSQL source to CockroachDB target using logical replication. | -| `mylogical` | Replicate from MySQL source to CockroachDB target using GTID-based replication. | -| `oraclelogminer` | Replicate from Oracle source to CockroachDB target using Oracle LogMiner. | -| `start` | Replicate from CockroachDB source to PostgreSQL, MySQL, or Oracle target ([failback mode](#failback-to-source-database)). Requires a CockroachDB changefeed with rangefeeds enabled. | -| `make-jwt` | Generate JWT tokens for authorizing changefeed connections in failback scenarios. Supports signing tokens with RSA or EC keys, or generating claims for external JWT providers. For details, refer to [JWT authentication](#jwt-authentication). | -| `version` | Display version information and Go module dependencies with checksums. For details, refer to [Supply chain security](#supply-chain-security). | - -For command-specific flags and examples, refer to [Usage](#usage) and [Common workflows](#common-workflows). +- PostgreSQL source ([`pglogical`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator uses [PostgreSQL logical replication](https://www.postgresql.org/docs/current/logical-replication.html), which is based on publications and replication slots. 
You create a publication for the target tables, and a slot marks consistent replication points. MOLT Replicator consumes this logical feed directly and applies the data in sorted batches to the target. -## Flags +- MySQL source ([`mylogical`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator relies on [MySQL GTID-based replication](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to read change data from MySQL binlogs. It works with MySQL versions that support GTID-based replication and applies transactionally consistent feeds to the target. Binlog features that do not use GTIDs are not supported. -Refer to [Replicator Flags]({% link molt/replicator-flags.md %}). +- Oracle source ([`oraclelogminer`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator uses [Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html) to capture change data from Oracle redo logs. Both Oracle Multitenant (CDB/PDB) and single-tenant Oracle architectures are supported. Replicator periodically queries LogMiner-populated views and processes transactional data in ascending SCN windows for reliable throughput while maintaining consistency. -## Usage +- Failback from CockroachDB ([`start`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator acts as an HTTPS [webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink) for a single CockroachDB changefeed. Replicator receives mutations from source cluster nodes, can optionally buffer them in a CockroachDB staging cluster, and then applies time-ordered transactional batches to the target database. Mutations are applied as [`UPSERT`]({% link {{ site.current_cloud_version }}/upsert.md %}) or [`DELETE`]({% link {{ site.current_cloud_version }}/delete.md %}) statements while respecting [foreign-key]({% link {{ site.current_cloud_version }}/foreign-key.md %}) and table dependencies. 
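On the CockroachDB side, the changefeed that feeds this webhook sink might be created as follows. This is a sketch: the table name, host, port, and certificate parameters are placeholders, and the failback guide contains the authoritative command:

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE database.public.table_name
  INTO 'webhook-https://replicator-host:30004/database/public?client_cert={base64-encoded certificate}&client_key={base64-encoded key}&ca_cert={base64-encoded CA certificate}'
  WITH updated, resolved = '1s', webhook_sink_config = '{"Flush":{"Bytes":1048576}}';
~~~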
### Replicator commands -MOLT Replicator provides four commands for different replication scenarios. For detailed workflows, refer to [Common workflows](#common-workflows). +MOLT Replicator provides four commands for different replication scenarios. For example commands, refer to [Common uses](#common-uses). Use `pglogical` to replicate from PostgreSQL to CockroachDB: @@ -140,7 +63,7 @@ replicator start ### Source connection strings {{site.data.alerts.callout_success}} -Follow the security recommendations in [Connection security and credentials](#connection-security-and-credentials). +Follow the security recommendations in [Connection security and credentials]({% link molt/molt-replicator-best-practices.md %}#connection-security-and-credentials). {{site.data.alerts.end}} [`--sourceConn`]({% link molt/replicator-flags.md %}#source-conn) specifies the connection string of the source database for forward replication. @@ -195,28 +118,28 @@ For failback, [`--stagingConn`]({% link molt/replicator-flags.md %}#staging-conn ~~~ {{site.data.alerts.callout_info}} -For failback, [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) specifies the original source database (PostgreSQL, MySQL, or Oracle). For details, refer to [Failback to source database](#failback-to-source-database). +For failback, [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) specifies the original source database (PostgreSQL, MySQL, or Oracle). For details, refer to [Failback replication](#failback-replication). {{site.data.alerts.end}} ### Replication checkpoints MOLT Replicator requires a checkpoint value to start replication from the correct position in the source database's transaction log. -For PostgreSQL, use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the [replication slot created during the data load]({% link molt/migrate-load-replicate.md %}#start-fetch). 
The slot automatically tracks the LSN (Log Sequence Number): +For PostgreSQL, use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the [replication slot created during the data load]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication). The slot automatically tracks the LSN (Log Sequence Number): {% include_cached copy-clipboard.html %} ~~~ --slotName molt_slot ~~~ -For MySQL, set [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) to the [`cdc_cursor` value]({% link molt/molt-fetch.md %}#cdc-cursor) from the MOLT Fetch output: +For MySQL, set [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) to the [`cdc_cursor` value]({% link molt/molt-fetch.md %}#enable-replication) from the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ --defaultGTIDSet '4c658ae6-e8ad-11ef-8449-0242ac140006:1-29' ~~~ -For Oracle, set [`--scn`]({% link molt/replicator-flags.md %}#scn) and [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) to the [`cdc_cursor` values]({% link molt/molt-fetch.md %}#cdc-cursor) from the MOLT Fetch output: +For Oracle, set [`--scn`]({% link molt/replicator-flags.md %}#scn) and [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) to the [`cdc_cursor` values]({% link molt/molt-fetch.md %}#enable-replication) from the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ @@ -241,161 +164,83 @@ The staging database is used to: - Maintain consistency for time-ordered transactional batches while respecting table dependencies. - Provide restart capabilities after failures. -## Security - -Cockroach Labs **strongly** recommends the following: - -### Connection security and credentials - -{% include molt/molt-secure-connection-strings.md %} - -### CockroachDB changefeed security - -For failback scenarios, secure the connection from CockroachDB to MOLT Replicator using TLS certificates. 
Generate TLS certificates using self-signed certificates, certificate authorities like Let's Encrypt, or your organization's certificate management system. - -#### TLS from CockroachDB to Replicator - -Configure MOLT Replicator with server certificates using the [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) flags to specify the certificate and private key file paths. For example: - -{% include_cached copy-clipboard.html %} -~~~ shell -replicator start \ ---tlsCertificate ./certs/server.crt \ ---tlsPrivateKey ./certs/server.key \ -... -~~~ - -These server certificates must correspond to the client certificates specified in the changefeed webhook URL to ensure proper TLS handshake. - -Encode client certificates for changefeed webhook URLs: - -- Webhook URLs: Use both URL encoding and base64 encoding: `base64 -i ./client.crt | jq -R -r '@uri'` -- Non-webhook contexts: Use base64 encoding only: `base64 -w 0 ca.cert` - -#### JWT authentication +### Consistency modes -You can use JSON Web Tokens (JWT) to authorize incoming changefeed connections and restrict writes to a subset of SQL databases or user-defined schemas in the target cluster. +MOLT Replicator supports three consistency modes for [failback replication](#failback-replication), allowing you to balance throughput and transactional guarantees. [Forward replication](#forward-replication-after-initial-load) uses a fixed _immediate_ mode optimized for each source database. The three consistency modes that MOLT Replicator supports are: -Replicator accepts any JWT token that meets the following requirements: +1. *Consistent* (failback mode only, default for CockroachDB sources): Preserves per-row order and source transaction atomicity. Concurrent transactions are controlled by [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism). -- Tokens must be signed using RSA or EC keys. 
HMAC and `None` signatures are automatically rejected. -- Tokens must include a `jti` (JWT ID) claim for revocation support. -- Tokens must include a custom claim with the schema authorization list. +1. *BestEffort* (failback mode only): Relaxes atomicity across tables that do not have foreign key constraints between them (maintains coherence within FK-connected groups). Enable with [`--bestEffortOnly`]({% link molt/replicator-flags.md %}#best-effort-only). You can also allow auto-entry via [`--bestEffortWindow`]({% link molt/replicator-flags.md %}#best-effort-window) set to a positive duration (such as `1s`). In this case, Replicator will switch from _Consistent_ to _BestEffort_ if the age of a mutation exceeds the duration of this window. -{{site.data.alerts.callout_success}} -You can generate tokens using the [`make-jwt` command](#generate-jwt-tokens). -{{site.data.alerts.end}} + {{site.data.alerts.callout_info}} + For independent tables (with no foreign key constraints), BestEffort mode applies changes immediately as they arrive, without waiting for the resolved timestamp. This provides higher throughput for tables that have no relationships with other tables. + {{site.data.alerts.end}} -To configure JWT authentication: +1. *Immediate* (default for PostgreSQL, MySQL, and Oracle sources): Applies updates as they arrive to Replicator with no buffering or waiting for resolved timestamps. For CockroachDB sources, provides highest throughput but requires no foreign keys on the target schema. -1. Add PEM-formatted public signing keys to the `_replicator.jwt_public_keys` table in the staging database. +### Userscripts -1. To revoke a specific token, add its `jti` value to the `_replicator.jwt_revoked_ids` table in the staging database. +MOLT Replicator can apply *userscripts*, specified with the [`--userscript` flag]({% link molt/replicator-flags.md %}#userscript), to customize how data is processed and transformed as it moves through the live replication pipeline. 
Userscripts are customized TypeScript files that apply transformation logic to rows of data on a per-schema and per-table basis. -The Replicator process re-reads these tables every minute to pick up changes. +Userscripts are intended to address unique business or data transformation needs. They perform operations that cannot be handled by the source change data capture (CDC) stream, such as filtering out specific tables, rows, or columns; routing data from a single source table to multiple target tables; transforming column values or adding computed columns; and implementing custom error handling. These transformations occur in-flight, between the source and target databases. -To pass the JWT token from the changefeed to the Replicator webhook sink, use the [`webhook_auth_header` option]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options): +To have MOLT Replicator apply a userscript, include the [`--userscript`]({% link molt/replicator-flags.md %}#userscript) flag with any [Replicator command]({% link molt/replicator-flags.md %}). The flag accepts the path to a TypeScript file. {% include_cached copy-clipboard.html %} -~~~ sql -CREATE CHANGEFEED ... WITH webhook_auth_header='Bearer '; +~~~ +--userscript 'path/to/script.ts' ~~~ -##### Generate JWT tokens - -The `make-jwt` command generates JWT tokens or claims for authorizing changefeed connections. It requires a signing key ([`-k`]({% link molt/replicator-flags.md %}#key)) and the database or schema to authorize ([`-a`]({% link molt/replicator-flags.md %}#allow)). You can output a signed token to a file ([`-o`]({% link molt/replicator-flags.md %}#out)) or generate an unsigned claim ([`--claim`]({% link molt/replicator-flags.md %}#claim)) for signing with an external JWT provider. +For more information, read the [userscript documentation]({% link molt/userscript-overview.md %}). 
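+For illustration only, the following sketch shows the general shape of a userscript that filters rows and renames a column. The module name, table name, and API calls here are assumptions based on typical userscript structure; confirm the exact API against the userscript documentation before use:

{% include_cached copy-clipboard.html %}
~~~ typescript
// Hypothetical sketch; verify module and function names against the userscript API docs.
import * as api from "replicator@v1";

api.configureTable("defaultdb.public.employees", {
  // Transform each incoming row before it is applied to the target.
  map: (doc) => {
    // Filter out soft-deleted rows; returning null drops the mutation.
    if (doc.is_deleted === "true") {
      return null;
    }
    // Rename a source column to match the target schema.
    doc.full_name = doc.name;
    delete doc.name;
    return doc;
  },
});
~~~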
Learn how to use the [userscript API]({% link molt/userscript-api.md %}) and refer to the [userscript cookbook examples]({% link molt/userscript-cookbook.md %}). -The format of the `-a` argument depends on your target database. For CockroachDB and PostgreSQL, which have a schema concept, use the `database.schema` format: +### Monitoring -{% include_cached copy-clipboard.html %} -~~~ shell -replicator make-jwt -k ec.key -a database_name.schema_name -o out.jwt -~~~ +#### Metrics -For MySQL and Oracle, which do not have a schema concept, use only the database name: +MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: -{% include_cached copy-clipboard.html %} -~~~ shell -replicator make-jwt -k ec.key -a database_name -o out.jwt +~~~ +--metricsAddr :30005 ~~~ -{{site.data.alerts.callout_success}} -You can repeat the [`-a`]({% link molt/replicator-flags.md %}#allow) flag to authorize multiple schemas. -{{site.data.alerts.end}} - -##### Token quickstart - -The following example uses `OpenSSL` to generate keys, but any PEM-encoded RSA or EC keys will work. When using this example, ensure the `-a` argument format matches your target database as specified in [Generate JWT tokens](#generate-jwt-tokens). - -{% include_cached copy-clipboard.html %} -~~~ shell -# Generate an EC private key using OpenSSL. -openssl ecparam -out ec.key -genkey -name prime256v1 - -# Write the public key components to a separate file. -openssl ec -in ec.key -pubout -out ec.pub - -# Upload the public key for all instances of Replicator to find it. -cockroach sql -e "INSERT INTO _replicator.jwt_public_keys (public_key) VALUES ('$(cat ec.pub)')" - -# Reload configuration, or wait one minute. 
-killall -HUP replicator +Metrics can additionally be written to snapshot files at repeated intervals. Metrics snapshotting is disabled by default. If metrics have been enabled, metrics snapshotting can also be enabled with the [`--metricsSnapshotPeriod`]({% link molt/replicator-flags.md %}#metrics-snapshot-period) flag. For example, the following flag enables metrics snapshotting every 15 seconds: -# Generate a token which can write to the ycsb.public schema. -# The key can be decoded using the debugger at https://jwt.io. -# Add the contents of out.jwt to the CREATE CHANGEFEED command: -# WITH webhook_auth_header='Bearer {out.jwt}' -replicator make-jwt -k ec.key -a ycsb.public -o out.jwt +~~~ +--metricsSnapshotPeriod 15s ~~~ -##### External JWT providers +Metrics snapshots enable access to metrics when the Prometheus server is unavailable, and they can be sent to [CockroachDB support]({% link {{ site.current_cloud_version }}/support-resources.md %}) to help quickly resolve an issue. -To use an external JWT provider, generate a claim with the `--claim` flag. The PEM-formatted public key or keys for that provider must be inserted into the `_replicator.jwt_public_keys` table. The `iss` (issuers) and `jti` (token id) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety: +For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). -{% include_cached copy-clipboard.html %} -~~~ shell -replicator make-jwt -a 'database.schema' --claim -~~~ +#### Logging -~~~json -{ - "iss": "replicator", - "jti": "d5ffa211-8d54-424b-819a-bc19af9202a5", - "https://github.com/cockroachdb/replicator": { - "schemas": [ - [ - "database", - "schema" - ] - ] - } -} -~~~ +By default, MOLT Replicator writes two streams of logs: operational logs to `stdout` (including `warning`, `info`, `trace`, and some errors) and final errors to `stderr`. 
-### Production considerations +Redirect both streams to ensure all logs are captured for troubleshooting: -- Avoid [`--disableAuthentication`]({% link molt/replicator-flags.md %}#disable-authentication) and [`--tlsSelfSigned`]({% link molt/replicator-flags.md %}#tls-self-signed) flags in production environments. These flags should only be used for testing or development purposes. +{% include_cached copy-clipboard.html %} +~~~shell +# Merge both streams to console +./replicator ... 2>&1 -### Supply chain security +# Redirect both streams to a file +./replicator ... > output.log 2>&1 -Use the `version` command to verify the integrity of your MOLT Replicator build and identify potential upstream vulnerabilities. +# Merge streams to console while saving to file +./replicator > >(tee replicator.log) 2>&1 -{% include_cached copy-clipboard.html %} -~~~ shell -replicator version +# Use logDestination flag to write all logs to a file +./replicator --logDestination replicator.log ... ~~~ -The output includes: - -- Module name -- go.mod checksum -- Version +Enable debug logging with [`-v`]({% link molt/replicator-flags.md %}#verbose). For more granularity and system insights, enable trace logging with [`-vv`]({% link molt/replicator-flags.md %}#verbose). Pay close attention to warning- and error-level logs, as these indicate when Replicator is misbehaving. -Use this information to determine if your build may be subject to vulnerabilities from upstream packages. Cockroach Labs uses Dependabot to automatically upgrade Go modules, and the team regularly merges Dependabot updates to address security issues. 
+## Common uses -## Common workflows +### Forward replication (after initial load) -### Forward replication with initial load +In a migration that utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}), run the `replicator` command after [using MOLT Fetch to perform the initial data load]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication). Include the required flags, as shown below:
@@ -404,7 +249,7 @@ Use this information to determine if your build may be subject to vulnerabilitie
-To start replication after an [initial data load with MOLT Fetch]({% link molt/migrate-load-replicate.md %}#start-fetch), use the `pglogical` command: +To start replication after an initial data load with MOLT Fetch, use the `pglogical` command: {% include_cached copy-clipboard.html %} ~~~ shell @@ -413,7 +258,7 @@ replicator pglogical
-To start replication after an [initial data load with MOLT Fetch]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-fetch), use the `mylogical` command: +To start replication after an initial data load with MOLT Fetch, use the `mylogical` command: {% include_cached copy-clipboard.html %} ~~~ shell @@ -422,7 +267,7 @@ replicator mylogical
-To start replication after an [initial data load with MOLT Fetch]({% link molt/migrate-load-replicate.md %}?filters=oracle#start-fetch), use the `oraclelogminer` command: +To start replication after an initial data load with MOLT Fetch, use the `oraclelogminer` command: {% include_cached copy-clipboard.html %} ~~~ shell @@ -464,7 +309,7 @@ Specify the target schema on CockroachDB with [`--targetSchema`]({% link molt/re To replicate from the correct position, specify the appropriate checkpoint value.
-Use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the slot [created during the data load]({% link molt/molt-fetch.md %}#load-before-replication), which automatically tracks the LSN (Log Sequence Number) checkpoint: +Use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the slot [created during the data load]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication), which automatically tracks the LSN (Log Sequence Number) checkpoint: {% include_cached copy-clipboard.html %} ~~~ @@ -513,7 +358,6 @@ replicator pglogical \ --stagingCreateSchema ~~~ -For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}#start-replicator).
@@ -528,7 +372,6 @@ replicator mylogical \ --stagingCreateSchema ~~~ -For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-replicator).
@@ -546,63 +389,16 @@ replicator oraclelogminer \ --stagingCreateSchema ~~~ -For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}?filters=oracle#start-replicator). -
- -### Resume after interruption - -
- - - -
- -When resuming replication after an interruption, MOLT Replicator automatically uses the stored checkpoint to resume from the correct position. - -Rerun the same `replicator` command used during [forward replication](#forward-replication-with-initial-load), specifying the same fully-qualified [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) value as before. Omit [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) and any checkpoint flags. For example: - -
-{% include_cached copy-clipboard.html %} -~~~ shell -replicator pglogical \ ---sourceConn $SOURCE \ ---targetConn $TARGET \ ---slotName molt_slot \ ---stagingSchema defaultdb._replicator -~~~ - -For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}).
-
-{% include_cached copy-clipboard.html %} -~~~ shell -replicator mylogical \ ---sourceConn $SOURCE \ ---targetConn $TARGET \ ---stagingSchema defaultdb._replicator -~~~ - -For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}?filters=mysql). -
- -
-{% include_cached copy-clipboard.html %} -~~~ shell -replicator oraclelogminer \ ---sourceConn $SOURCE \ ---sourcePDBConn $SOURCE_PDB \ ---sourceSchema MIGRATION_USER \ ---targetConn $TARGET \ ---stagingSchema defaultdb._replicator -~~~ +For detailed walkthroughs of migrations that use `replicator` in this way, refer to these common migration approaches: -For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}?filters=oracle). -
+- [Delta Migration]({% link molt/migration-approach-delta.md %}) +- [Phased Delta Migration with Failback Replication]({% link molt/migration-approach-phased-delta-failback.md %}) -### Failback to source database +### Failback replication -When replicating from CockroachDB back to the source database, MOLT Replicator acts as a webhook sink for a CockroachDB changefeed. +A migration that utilizes [failback replication]({% link molt/migration-considerations-rollback.md %}) replicates data from the CockroachDB cluster back to the source database. In this case, MOLT Replicator acts as an HTTPS [webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink) for a CockroachDB changefeed. Use the `start` command for failback: @@ -625,7 +421,7 @@ Specify the CockroachDB connection string with [`--stagingConn`]({% link molt/re --stagingConn $STAGING ~~~ -Specify the staging database name with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) in fully-qualified `database.schema` format. This should be the same staging database created during [Forward replication with initial load](#forward-replication-with-initial-load): +Specify the staging database name with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) in fully-qualified `database.schema` format. This should be the same staging database created during [Forward replication with initial load](#forward-replication-after-initial-load): {% include_cached copy-clipboard.html %} ~~~ @@ -660,89 +456,99 @@ replicator start \ --tlsPrivateKey ./certs/server.key ~~~ -After starting `replicator`, create a CockroachDB changefeed to send changes to MOLT Replicator. For detailed steps, refer to [Migration failback]({% link molt/migrate-failback.md %}). +
+ +After starting `replicator`, create a CockroachDB changefeed to send changes to MOLT Replicator. For a detailed example, refer to [Phased Delta Migration with Failback Replication]({% link molt/phased-delta-failback-postgres.md %}#create-a-cockroachdb-changefeed). {{site.data.alerts.callout_info}} -When [creating the CockroachDB changefeed]({% link molt/migrate-failback.md %}#create-the-cockroachdb-changefeed), you specify the target database and schema in the webhook URL path. For PostgreSQL targets, use the fully-qualified format `/database/schema` (`/migration_db/migration_schema`). For MySQL targets, use the database name (`/migration_db`). For Oracle targets, use the uppercase schema name (`/MIGRATION_SCHEMA`). +When [creating the CockroachDB changefeed]({% link molt/phased-delta-failback-postgres.md %}#create-a-cockroachdb-changefeed), you specify the target database and schema in the webhook URL path. For PostgreSQL targets, use the fully-qualified format `/database/schema` (`/migration_db/migration_schema`). For MySQL targets, use the database name (`/migration_db`). For Oracle targets, use the uppercase schema name (`/MIGRATION_SCHEMA`). Explicitly set a default `10s` [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) value in the `CREATE CHANGEFEED` statement. This value ensures that the webhook can report failures in inconsistent networking situations and make crash loops more visible. {{site.data.alerts.end}} +
-## Monitoring +
-### Metrics +After starting `replicator`, create a CockroachDB changefeed to send changes to MOLT Replicator. For a detailed example, refer to [Phased Delta Migration with Failback Replication]({% link molt/phased-delta-failback-mysql.md %}#create-a-cockroachdb-changefeed). -MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: - -~~~ ---metricsAddr :30005 -~~~ +{{site.data.alerts.callout_info}} +When [creating the CockroachDB changefeed]({% link molt/phased-delta-failback-mysql.md %}#create-a-cockroachdb-changefeed), you specify the target database and schema in the webhook URL path. For PostgreSQL targets, use the fully-qualified format `/database/schema` (`/migration_db/migration_schema`). For MySQL targets, use the database name (`/migration_db`). For Oracle targets, use the uppercase schema name (`/MIGRATION_SCHEMA`). -Metrics can additionally be written to snapshot files at repeated intervals. Metrics snapshotting is disabled by default. If metrics have been enabled, metrics snapshotting can also be enabled with the [`--metricsSnapshotPeriod`]({% link molt/replicator-flags.md %}#metrics-snapshot-period) flag. For example, the following flag enables metrics snapshotting every 15 seconds: +Explicitly set a default `10s` [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) value in the `CREATE CHANGEFEED` statement. This value ensures that the webhook can report failures in inconsistent networking situations and make crash loops more visible. +{{site.data.alerts.end}} +
-~~~ ---metricsSnapshotPeriod 15s -~~~ +
-Metrics snapshots enable access to metrics when the Prometheus server is unavailable, and they can be sent to [CockroachDB support]({% link {{ site.current_cloud_version }}/support-resources.md %}) to help quickly resolve an issue. +After starting `replicator`, create a CockroachDB changefeed to send changes to MOLT Replicator. For a detailed example, refer to [Phased Delta Migration with Failback Replication]({% link molt/phased-delta-failback-oracle.md %}#create-a-cockroachdb-changefeed). -For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). +{{site.data.alerts.callout_info}} +When [creating the CockroachDB changefeed]({% link molt/phased-delta-failback-oracle.md %}#create-a-cockroachdb-changefeed), you specify the target database and schema in the webhook URL path. For PostgreSQL targets, use the fully-qualified format `/database/schema` (`/migration_db/migration_schema`). For MySQL targets, use the database name (`/migration_db`). For Oracle targets, use the uppercase schema name (`/MIGRATION_SCHEMA`). -### Logging +Explicitly set a default `10s` [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) value in the `CREATE CHANGEFEED` statement. This value ensures that the webhook can report failures in inconsistent networking situations and make crash loops more visible. +{{site.data.alerts.end}} +
-By default, MOLT Replicator writes two streams of logs: operational logs to `stdout` (including `warning`, `info`, `trace`, and some errors) and final errors to `stderr`. +### Resuming after an interruption -Redirect both streams to ensure all logs are captured for troubleshooting: +Whether you're using Replicator to perform [forward replication](#forward-replication-after-initial-load) or [failback replication](#failback-replication), an unexpected issue may cause replication to stop. Rerun the `replicator` command as shown below: -{% include_cached copy-clipboard.html %} -~~~shell -# Merge both streams to console -./replicator ... 2>&1 +
+ + + +
-# Redirect both streams to a file -./replicator ... > output.log 2>&1 +When resuming replication after an interruption, MOLT Replicator automatically uses the stored checkpoint to resume from the correct position. -# Merge streams to console while saving to file -./replicator > >(tee replicator.log) 2>&1 +Rerun the same `replicator` command used during forward replication, specifying the same fully-qualified [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) value as before. Omit [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) and any checkpoint flags. For example: -# Use logDestination flag to write all logs to a file -./replicator --logDestination replicator.log ... +
+{% include_cached copy-clipboard.html %} +~~~ shell +replicator pglogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--slotName molt_slot \ +--stagingSchema defaultdb._replicator ~~~ -Enable debug logging with [`-v`]({% link molt/replicator-flags.md %}#verbose). For more granularity and system insights, enable trace logging with [`-vv`]({% link molt/replicator-flags.md %}#verbose). Pay close attention to warning- and error-level logs, as these indicate when Replicator is misbehaving. - -## Best practices - -### Test and validate - -To verify that your connections and configuration work properly, run MOLT Replicator in a staging environment before replicating any data in production. Use a test or development environment that closely resembles production. - -### Optimize performance - -{% include molt/optimize-replicator-performance.md %} - -## Troubleshooting +
-
- - - -
+
+{% include_cached copy-clipboard.html %} +~~~ shell +replicator mylogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator +~~~ -{% include molt/molt-troubleshooting-replication.md %} +
-{% include molt/molt-troubleshooting-failback.md %} +
+{% include_cached copy-clipboard.html %} +~~~ shell +replicator oraclelogminer \ +--sourceConn $SOURCE \ +--sourcePDBConn $SOURCE_PDB \ +--sourceSchema MIGRATION_USER \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator +~~~ -## Examples +
-For detailed examples of using MOLT Replicator usage, refer to the migration workflow tutorials: +## Known limitations -- [Load and Replicate]({% link molt/migrate-load-replicate.md %}): Load data with MOLT Fetch and set up ongoing replication with MOLT Replicator. -- [Resume Replication]({% link molt/migrate-resume-replication.md %}): Resume replication after an interruption. -- [Migration failback]({% link molt/migrate-failback.md %}): Replicate changes from CockroachDB back to the initial source database. +{% include molt/molt-limitations-replicator.md %} ## See also +- [MOLT Replicator Installation]({% link molt/molt-replicator-installation.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [MOLT Replicator Best Practices]({% link molt/molt-replicator-best-practices.md %}) +- [MOLT Replicator Troubleshooting]({% link molt/molt-replicator-troubleshooting.md %}) - [Migration Overview]({% link molt/migration-overview.md %}) - [Migration Strategy]({% link molt/migration-strategy.md %}) - [MOLT Fetch]({% link molt/molt-fetch.md %}) \ No newline at end of file diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index 4f135d6d96b..0a7ca70936e 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -33,7 +33,7 @@ The following source databases are supported: {% include molt/molt-install.md %} -# Setup +## Setup Complete the following items before using MOLT Verify: @@ -178,7 +178,7 @@ When verification completes, the output displays a summary showing the number of ### Verify transformed data -If you applied [transformations with MOLT Fetch]({% link molt/molt-fetch.md %}#transformations), a [MOLT Replicator userscript]({% link molt/userscript-cookbook.md %}#rename-tables), or another tool, you can apply the same transformations with MOLT Verify to match source data with the transformed target data. 
+If you applied [transformations with MOLT Fetch]({% link molt/molt-fetch.md %}#define-transformations), a [MOLT Replicator userscript]({% link molt/userscript-cookbook.md %}#rename-tables), or another tool, you can apply the same transformations with MOLT Verify to match source data with the transformed target data. #### Step 1. Create a transformation file @@ -239,12 +239,7 @@ When verification completes, the output displays a summary: ## Known limitations -- MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. To configure the row batch size, use the `--row_batch_size` [flag](#flags). -- MOLT Verify checks for collation mismatches on [primary key]({% link {{site.current_cloud_version}}/primary-key.md %}) columns. This may cause validation to fail when a [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) is used as a primary key and the source and target databases are using different [collations]({% link {{site.current_cloud_version}}/collate.md %}). -- MOLT Verify might give an error in case of schema changes on either the source or target database. -- [Geospatial types]({% link {{site.current_cloud_version}}/spatial-data-overview.md %}#spatial-objects) cannot yet be compared. -- Only PostgreSQL and MySQL sources are supported for [verifying a subset of data](#verify-a-subset-of-data). -- Only table and schema renames are supported when [verifying transformed data](#verify-transformed-data). 
+{% include molt/molt-limitations-verify.md %} ## See also diff --git a/src/current/molt/phased-bulk-load-mysql.md b/src/current/molt/phased-bulk-load-mysql.md new file mode 100644 index 00000000000..5b6fd69dcab --- /dev/null +++ b/src/current/molt/phased-bulk-load-mysql.md @@ -0,0 +1,18 @@ +--- +title: Phased Bulk Load Migration from MySQL +summary: Learn what a Phased Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/phased-bulk-load-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/phased-bulk-load-oracle.md b/src/current/molt/phased-bulk-load-oracle.md new file mode 100644 index 00000000000..e85ccfa3c5b --- /dev/null +++ b/src/current/molt/phased-bulk-load-oracle.md @@ -0,0 +1,18 @@ +--- +title: Phased Bulk Load Migration from Oracle +summary: Learn what a Phased Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/phased-bulk-load-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/phased-bulk-load-postgres.md b/src/current/molt/phased-bulk-load-postgres.md new file mode 100644 index 00000000000..39f4ea59064 --- /dev/null +++ b/src/current/molt/phased-bulk-load-postgres.md @@ -0,0 +1,18 @@ +--- +title: Phased Bulk Load Migration from PostgreSQL +summary: Learn what a Phased Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. 
+toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/phased-bulk-load-all-sources.md %} diff --git a/src/current/molt/phased-delta-failback-mysql.md b/src/current/molt/phased-delta-failback-mysql.md new file mode 100644 index 00000000000..e96c5cfc806 --- /dev/null +++ b/src/current/molt/phased-delta-failback-mysql.md @@ -0,0 +1,18 @@ +--- +title: Phased Delta Migration with Failback Replication from MySQL +summary: Learn what a Phased Delta Migration with Failback Replication is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/phased-delta-failback-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/phased-delta-failback-oracle.md b/src/current/molt/phased-delta-failback-oracle.md new file mode 100644 index 00000000000..128cda41bc5 --- /dev/null +++ b/src/current/molt/phased-delta-failback-oracle.md @@ -0,0 +1,18 @@ +--- +title: Phased Delta Migration with Failback Replication from Oracle +summary: Learn what a Phased Delta Migration with Failback Replication is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/phased-delta-failback-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/phased-delta-failback-postgres.md b/src/current/molt/phased-delta-failback-postgres.md new file mode 100644 index 00000000000..39dbce4f9de --- /dev/null +++ b/src/current/molt/phased-delta-failback-postgres.md @@ -0,0 +1,18 @@ +--- +title: Phased Delta Migration with Failback Replication from PostgreSQL +summary: Learn what a Phased Delta Migration with Failback Replication is, how it relates to the migration considerations, and how to perform it using MOLT tools. 
+toc: true +docs_area: migrate +source_db_not_selectable: true +--- + + + +{% include molt/phased-delta-failback-all-sources.md %} \ No newline at end of file diff --git a/src/current/molt/replicator-flags.md b/src/current/molt/replicator-flags.md index 8ae5b08aec1..c6acea0fbfa 100644 --- a/src/current/molt/replicator-flags.md +++ b/src/current/molt/replicator-flags.md @@ -1,11 +1,30 @@ --- -title: Replicator Flags +title: MOLT Replicator Commands and Flags summary: Flag reference for MOLT Replicator toc: false docs_area: migrate --- -This page lists all available flags for the [MOLT Replicator commands]({% link molt/molt-replicator.md %}#commands): `start`, `pglogical`, `mylogical`, `oraclelogminer`, and `make-jwt`. +This page lists the [MOLT Replicator]({% link molt/molt-replicator.md %}) commands and the flags that you can use to configure a MOLT Replicator command execution. + +## Commands + +MOLT Replicator provides the following commands: + +| Command | Description | +|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `pglogical` | Replicate from PostgreSQL source to CockroachDB target using logical replication. | +| `mylogical` | Replicate from MySQL source to CockroachDB target using GTID-based replication. | +| `oraclelogminer` | Replicate from Oracle source to CockroachDB target using Oracle LogMiner. | +| `start` | Replicate from CockroachDB source to PostgreSQL, MySQL, or Oracle target ([failback mode]({% link molt/molt-replicator.md %}#failback-replication)). Requires a CockroachDB changefeed with rangefeeds enabled. | +| `make-jwt` | Generate JWT tokens for authorizing changefeed connections in failback scenarios. Supports signing tokens with RSA or EC keys, or generating claims for external JWT providers. 
For details, refer to [JWT authentication]({% link molt/molt-replicator-best-practices.md %}#jwt-authentication). | +| `version` | Display version information and Go module dependencies with checksums. For details, refer to [Supply chain security]({% link molt/molt-replicator-best-practices.md %}#supply-chain-security). | + +For command-specific flags and examples, refer to MOLT Replicator's [How it works]({% link molt/molt-replicator.md %}#how-it-works) and [Common uses]({% link molt/molt-replicator.md %}#common-uses) documentation. + +## Flags + +This page lists all available flags for the [MOLT Replicator commands](#commands): `start`, `pglogical`, `mylogical`, `oraclelogminer`, and `make-jwt`. | Flag | Commands | Type | Description | |---------------------------------------------------------------------------------------------|-----------------------------------------------------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -58,7 +77,7 @@ This page lists all available flags for the [MOLT Replicator commands]({% link m | `--schemaRefresh` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How often a watcher will refresh its schema. If this value is zero or negative, refresh behavior will be disabled.

**Default:** `1m0s` | | `--scn` | `oraclelogminer` | `INT` | **Required** the first time `replicator` is run. The snapshot System Change Number (SCN) from the initial data load, which provides a replication marker for streaming changes. | | `--scnWindowSize` | `oraclelogminer` | `INT` | The maximum size of SCN bounds per pull iteration from LogMiner. This helps prevent timeout errors when processing large SCN ranges. Set to `0` or a negative value to disable the cap.

**Default:** `3250` | -| `--slotName` | `pglogical` | `STRING` | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command]({% link molt/molt-fetch.md %}#load-before-replication).

**Default:** `"replicator"` | +| `--slotName` | `pglogical` | `STRING` | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication).

**Default:** `"replicator"` | | `--sourceConn` | `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | The source database's connection string. When replicating from Oracle, this is the connection string of the Oracle container database (CDB). | | `--sourcePDBConn` | `oraclelogminer` | `STRING` | Connection string for the Oracle pluggable database (PDB). Only required when using an [Oracle multitenant configuration](https://docs.oracle.com/en/database/oracle/oracle-database/21/cncpt/CDBs-and-PDBs.html). [`--sourceConn`](#source-conn) **must** be included. | | `--sourceSchema` | `oraclelogminer` | `STRING` | **Required.** Source schema name on Oracle where tables will be replicated from. | diff --git a/src/current/molt/replicator-metrics.md b/src/current/molt/replicator-metrics.md index de0343902c7..53db6f250fa 100644 --- a/src/current/molt/replicator-metrics.md +++ b/src/current/molt/replicator-metrics.md @@ -1,18 +1,18 @@ --- -title: Replicator Metrics +title: MOLT Replicator Metrics summary: Learn how to monitor stages of the MOLT Replicator pipeline. toc: true docs_area: migrate --- -[MOLT Replicator]({% link molt/molt-replicator.md %}) exposes Prometheus metrics at each stage of the [replication pipeline](#replication-pipeline). When using Replicator to perform [forward replication]({% link molt/migrate-load-replicate.md %}#start-replicator) or [failback]({% link molt/migrate-failback.md %}), you should monitor the health of each relevant pipeline stage to quickly detect issues. +[MOLT Replicator]({% link molt/molt-replicator.md %}) exposes Prometheus metrics at each stage of the [replication pipeline](#replication-pipeline). When using Replicator to perform [forward replication]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load) or [failback]({% link molt/molt-replicator.md %}#failback-replication), you should monitor the health of each relevant pipeline stage to quickly detect issues. 
-This page describes and provides usage guidelines for Replicator metrics, according to the replication source:
+This page describes Replicator metrics and provides usage guidelines for each, organized by replication source:

 - PostgreSQL
 - MySQL
 - Oracle
-- CockroachDB (during [failback]({% link molt/migrate-failback.md %}))
+- CockroachDB (during [failback]({% link molt/molt-replicator.md %}#failback-replication))
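Replicator serves these metrics over HTTP in Prometheus text format. If you collect them with Prometheus, a minimal scrape job might look like the following sketch; the job name, metrics path, and target address are illustrative assumptions, so substitute the listen address you configure for your own Replicator process:

```yaml
# Sketch of a Prometheus scrape job for Replicator metrics.
# The metrics path and target address are assumptions for illustration;
# use the address your Replicator process actually listens on.
scrape_configs:
  - job_name: "molt-replicator"
    metrics_path: "/_/varz"
    static_configs:
      - targets: ["localhost:30005"]
```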
@@ -87,11 +87,11 @@ OK ### Visualize metrics
-Use the Replicator Grafana dashboard [bundled with your binary]({% link molt/molt-replicator.md %}#installation) (`replicator_grafana_dashboard.json`) to visualize metrics. The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json). +Use the Replicator Grafana dashboard [bundled with your binary]({% link molt/molt-replicator-installation.md %}) (`replicator_grafana_dashboard.json`) to visualize metrics. The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json).
-Use the Replicator Grafana dashboards [bundled with your binary]({% link molt/molt-replicator.md %}#installation) to visualize metrics. The general Replicator dashboard (`replicator_grafana_dashboard.json`) displays overall replication metrics, and the Oracle-specific dashboard (`replicator_oracle_grafana_dashboard.json`) displays [Oracle source metrics](#oracle-source). The bundled dashboards match your binary version. Alternatively, you can download the latest dashboards for [Replicator](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json) and [Oracle source metrics](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json). +Use the Replicator Grafana dashboards [bundled with your binary]({% link molt/molt-replicator-installation.md %}) to visualize metrics. The general Replicator dashboard (`replicator_grafana_dashboard.json`) displays overall replication metrics, and the Oracle-specific dashboard (`replicator_oracle_grafana_dashboard.json`) displays [Oracle source metrics](#oracle-source). The bundled dashboards match your binary version. Alternatively, you can download the latest dashboards for [Replicator](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json) and [Oracle source metrics](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json).
## Overall replication metrics @@ -101,38 +101,43 @@ Use the Replicator Grafana dashboards [bundled with your binary]({% link molt/mo Monitor the following metrics to track the overall health of the [replication pipeline](#replication-pipeline):
-- `core_source_lag_seconds` +- `core_source_lag_seconds` - Description: Age of the most recently received checkpoint. This represents the time from source commit to `COMMIT` event processing. - Interpretation: If consistently increasing, Replicator is falling behind in reading source changes, and cannot keep pace with database changes.
-- `core_source_lag_seconds` +- `core_source_lag_seconds` - Description: Age of the most recently received checkpoint. This represents the time elapsed since the latest received resolved timestamp. - Interpretation: If consistently increasing, Replicator is falling behind in reading source changes, and cannot keep pace with database changes.
-- `target_apply_mutation_age_seconds` +- `target_apply_mutation_age_seconds` - Description: End-to-end replication lag per mutation from source commit to target apply. Measures the difference between current wall time and the mutation's [MVCC timestamp]({% link {{ site.current_cloud_version }}/architecture/storage-layer.md %}#mvcc). - Interpretation: Higher values mean that older mutations are being applied, and indicate end-to-end pipeline delays. Compare across tables to find bottlenecks.
-- `target_apply_queue_utilization_percent` +- `target_apply_queue_utilization_percent` - Description: Percentage of target apply queue capacity utilization. - Interpretation: Values above 90 percent indicate severe backpressure throughout the pipeline, and potential data processing delays. Increase [`--targetApplyQueueSize`]({% link molt/replicator-flags.md %}#target-apply-queue-size) or investigate target database performance.
-- `target_apply_queue_utilization_percent` +- `target_apply_queue_utilization_percent` - Description: Percentage of target apply queue capacity utilization. - Interpretation: Values above 90 percent indicate severe backpressure throughout the pipeline, and potential data processing delays. Investigate target database performance.
- -
### Replication lag +Monitor the following metrics to track end-to-end replication lag: + +
+- `source_commit_to_apply_lag_seconds`
+  - Description: Time delta between writing a mutation to the source and writing it to the target.
+  - Interpretation: This approximates the [minimum downtime window]({% link molt/migration-considerations-replication.md %}#permissible-downtime) needed to drain in-flight changes at cutover. Low values (seconds or hundreds of milliseconds) allow for minimal downtime on cutover.
+
-Monitor the following metric to track end-to-end replication lag: -- `target_apply_transaction_lag_seconds` +
+- `target_apply_transaction_lag_seconds` - Description: Age of the transaction applied to the target table, measuring time from source commit to target apply. - Interpretation: Consistently high values indicate bottlenecks in the pipeline. Compare with `core_source_lag_seconds` to determine if the delay is in source read or target apply.
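As a sketch of how these lag metrics can be acted on, the following reads a scraped snapshot in Prometheus text format and applies the comparison described above. The metric names come from this page, but the sample values and the 30-second threshold are illustrative, not recommendations:

```shell
# Illustrative snapshot; real values come from scraping Replicator's
# metrics endpoint. The threshold below is an example, not a recommendation.
snapshot='core_source_lag_seconds 2.5
target_apply_transaction_lag_seconds 45.0'

# If transaction lag is high but source lag is comparatively low, the
# delay is on the target apply side rather than the source read side.
echo "$snapshot" | awk '
  $1 == "core_source_lag_seconds"              { src = $2 }
  $1 == "target_apply_transaction_lag_seconds" { txn = $2 }
  END {
    if (txn > 30 && src < txn / 2) {
      print "lag is in target apply"
    } else if (txn > 30) {
      print "lag is in source read"
    } else {
      print "lag ok"
    }
  }'
# prints: lag is in target apply
```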
@@ -142,10 +147,10 @@ Monitor the following metric to track end-to-end replication lag: Monitor the following metrics to track checkpoint progress: -- `target_applied_timestamp_seconds` +- `target_applied_timestamp_seconds` - Description: Wall time (Unix timestamp) of the most recently applied resolved timestamp. - Interpretation: Use to verify continuous progress. Stale values indicate apply stalls. -- `target_pending_timestamp_seconds` +- `target_pending_timestamp_seconds` - Description: Wall time (Unix timestamp) of the most recently received resolved timestamp. - Interpretation: A gap between this metric and `target_applied_timestamp_seconds` indicates apply backlog, meaning that the pipeline cannot keep up with incoming changes.
@@ -159,16 +164,16 @@ Monitor the following metrics to track checkpoint progress:
#### CockroachDB source -- `checkpoint_committed_age_seconds` +- `checkpoint_committed_age_seconds` - Description: Age of the committed checkpoint. - Interpretation: Increasing values indicate checkpoint commits are falling behind, which affects crash recovery capability. -- `checkpoint_proposed_age_seconds` +- `checkpoint_proposed_age_seconds` - Description: Age of the proposed checkpoint. - Interpretation: A gap with `checkpoint_committed_age_seconds` indicates checkpoint commit lag. -- `checkpoint_commit_duration_seconds` +- `checkpoint_commit_duration_seconds` - Description: Amount of time taken to save the committed checkpoint to the staging database. - Interpretation: High values indicate staging database bottlenecks due to write contention or performance issues. -- `checkpoint_proposed_going_backwards_errors_total` +- `checkpoint_proposed_going_backwards_errors_total` - Description: Number of times an error condition occurred where the changefeed was restarted. - Interpretation: Indicates source changefeed restart or time regression. Requires immediate investigation of source changefeed stability.
@@ -177,59 +182,59 @@ Monitor the following metrics to track checkpoint progress: #### Oracle source {{site.data.alerts.callout_success}} -To visualize the following metrics, import the Oracle Grafana dashboard [bundled with your binary]({% link molt/molt-replicator.md %}#installation) (`replicator_oracle_grafana_dashboard.json`). The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json). +To visualize the following metrics, import the Oracle Grafana dashboard [bundled with your binary]({% link molt/molt-replicator-installation.md %}) (`replicator_oracle_grafana_dashboard.json`). The bundled dashboard matches your binary version. Alternatively, you can download the [latest dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json). {{site.data.alerts.end}} -- `oraclelogminer_scn_interval_size` +- `oraclelogminer_scn_interval_size` - Description: Size of the interval from the start SCN to the current Oracle SCN. - Interpretation: Values larger than the [`--scnWindowSize`]({% link molt/replicator-flags.md %}#scn-window-size) flag value indicate replication lag, or that replication is idle. -- `oraclelogminer_time_per_window_seconds` +- `oraclelogminer_time_per_window_seconds` - Description: Amount of time taken to fully process an SCN interval. - Interpretation: Large values indicate Oracle slowdown, blocked replication loop, or slow processing. -- `oraclelogminer_query_redo_logs_duration_seconds` +- `oraclelogminer_query_redo_logs_duration_seconds` - Description: Amount of time taken to query redo logs from LogMiner. - Interpretation: High values indicate Oracle is under load or the SCN interval is too large. -- `oraclelogminer_num_inflight_transactions_in_memory` +- `oraclelogminer_num_inflight_transactions_in_memory` - Description: Current number of in-flight transactions in memory. 
- Interpretation: High counts indicate long-running transactions on source. Monitor for memory usage. -- `oraclelogminer_num_async_checkpoints_in_queue` +- `oraclelogminer_num_async_checkpoints_in_queue` - Description: Checkpoints queued for processing against staging database. - Interpretation: Values close to the `--checkpointQueueBufferSize` flag value indicate checkpoint processing cannot keep up with incoming checkpoints. -- `oraclelogminer_upsert_checkpoints_duration` +- `oraclelogminer_upsert_checkpoints_duration` - Description: Amount of time taken to upsert checkpoint batch into staging database. - Interpretation: High values indicate the staging database is under heavy load or batch size is too large. -- `oraclelogminer_delete_checkpoints_duration` +- `oraclelogminer_delete_checkpoints_duration` - Description: Amount of time taken to delete old checkpoints from the staging database. - Interpretation: High values indicate staging database load or long-running transactions preventing checkpoint deletion. -- `mutation_total` - - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). +- `mutation_total` + - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). - Interpretation: Use to monitor replication throughput and identify traffic patterns.
#### MySQL source -- `mylogical_dial_success_total` +- `mylogical_dial_success_total` - Description: Number of times Replicator successfully started logical replication. - Interpretation: Multiple successes may indicate reconnects. Monitor for connection stability. -- `mylogical_dial_failure_total` +- `mylogical_dial_failure_total` - Description: Number of times Replicator failed to start logical replication. - Interpretation: Nonzero values indicate connection issues. Check network connectivity and source database health. -- `mutation_total` - - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). +- `mutation_total` + - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). - Interpretation: Use to monitor replication throughput and identify traffic patterns.
#### PostgreSQL source -- `pglogical_dial_success_total` +- `pglogical_dial_success_total` - Description: Number of times Replicator successfully started logical replication (executed `START_REPLICATION` command). - Interpretation: Multiple successes may indicate reconnects. Monitor for connection stability. -- `pglogical_dial_failure_total` +- `pglogical_dial_failure_total` - Description: Number of times Replicator failed to start logical replication (failure to execute `START_REPLICATION` command). - Interpretation: Nonzero values indicate connection issues. Check network connectivity and source database health. -- `mutation_total` +- `mutation_total` - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). - Interpretation: Use to monitor replication throughput and identify traffic patterns.
@@ -243,13 +248,13 @@ To visualize the following metrics, import the Oracle Grafana dashboard [bundled For checkpoint terminology, refer to the [MOLT Replicator documentation]({% link molt/molt-replicator.md %}#terminology). {{site.data.alerts.end}} -- `stage_commit_lag_seconds` +- `stage_commit_lag_seconds` - Description: Time between writing a mutation to source and writing it to staging. - Interpretation: High values indicate delays in getting data into the staging layer. -- `stage_mutations_total` - - Description: Number of mutations staged for each table. +- `stage_mutations_total` + - Description: Number of mutations staged for each table. - Interpretation: Use to monitor staging throughput per table. -- `stage_duration_seconds` +- `stage_duration_seconds` - Description: Amount of time taken to successfully stage mutations. - Interpretation: High values indicate write performance issues on the staging database.
@@ -259,16 +264,16 @@ For checkpoint terminology, refer to the [MOLT Replicator documentation]({% link [Core sequencer](#replication-pipeline) metrics track mutation processing, ordering, and transaction coordination. -- `core_sweep_duration_seconds` +- `core_sweep_duration_seconds` - Description: Duration of each schema sweep operation, which looks for and applies staged mutations. - Interpretation: Long durations indicate that large backlogs, slow staging reads, or slow target writes are affecting throughput. -- `core_sweep_mutations_applied_total` +- `core_sweep_mutations_applied_total` - Description: Total count of mutations read from staging and successfully applied to the target database during a sweep. - Interpretation: Use to monitor processing throughput. A flat line indicates no mutations are being applied. -- `core_sweep_success_timestamp_seconds` +- `core_sweep_success_timestamp_seconds` - Description: Wall time (Unix timestamp) at which a sweep attempt last succeeded. - Interpretation: If this value stops updating and becomes stale, it indicates that the sweep has stopped. -- `core_parallelism_utilization_percent` +- `core_parallelism_utilization_percent` - Description: Percentage of the configured parallelism that is actively being used for concurrent transaction processing. - Interpretation: High utilization indicates bottlenecks in mutation processing. @@ -277,25 +282,25 @@ For checkpoint terminology, refer to the [MOLT Replicator documentation]({% link [Target apply](#replication-pipeline) metrics track mutation application to the target database. -- `target_apply_queue_size` +- `target_apply_queue_size` - Description: Number of transactions waiting in the target apply queue. - Interpretation: High values indicate target apply cannot keep up with incoming transactions. -- `apply_duration_seconds` +- `apply_duration_seconds` - Description: Amount of time taken to successfully apply mutations to a table. 
- Interpretation: High values indicate target database performance issues or contention. -- `apply_upserts_total` - - Description: Number of rows upserted to the target. +- `apply_upserts_total` + - Description: Number of rows upserted to the target. - Interpretation: Use to monitor write throughput. Should grow steadily during active replication. -- `apply_deletes_total` +- `apply_deletes_total` - Description: Number of rows deleted from the target. - Interpretation: Use to monitor delete throughput. Compare with delete operations on the source database. -- `apply_errors_total` +- `apply_errors_total` - Description: Number of times an error was encountered while applying mutations. - Interpretation: Growing error count indicates target database issues or constraint violations. -- `apply_conflicts_total` +- `apply_conflicts_total` - Description: Number of rows that experienced a compare-and-set (CAS) conflict. - Interpretation: High counts indicate concurrent modifications or stale data conflicts. May require conflict resolution tuning. -- `apply_resolves_total` +- `apply_resolves_total` - Description: Number of rows that experienced a compare-and-set (CAS) conflict and were successfully resolved. - Interpretation: Compare with `apply_conflicts_total` to verify conflict resolution is working. Should be close to or equal to conflicts. 
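To make the conflict-resolution comparison concrete, here is a minimal sketch that checks `apply_resolves_total` against `apply_conflicts_total` from a scraped snapshot. The sample counter values and the 95 percent ratio are illustrative assumptions, not documented thresholds:

```shell
# Illustrative counter values; real values come from Replicator's
# metrics endpoint. The 0.95 ratio is an example threshold.
snapshot='apply_conflicts_total 120
apply_resolves_total 118'

# Resolves should stay close to conflicts; a widening gap means CAS
# conflicts are not being resolved and may need tuning.
echo "$snapshot" | awk '
  $1 == "apply_conflicts_total" { c = $2 }
  $1 == "apply_resolves_total"  { r = $2 }
  END {
    if (c > 0 && r / c < 0.95) {
      print "WARN: unresolved CAS conflicts"
    } else {
      print "conflict resolution healthy"
    }
  }'
# prints: conflict resolution healthy
```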
@@ -511,5 +516,5 @@ Include this bundled metrics snapshot file on a [support ticket]({% link {{ site - [MOLT Replicator]({% link molt/molt-replicator.md %}) - [Replicator Flags]({% link molt/replicator-flags.md %}) -- [Load and Replicate]({% link molt/migrate-load-replicate.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) +- [MOLT Replicator Best Practices]({% link molt/molt-replicator-best-practices.md %}) +- [MOLT Replicator Troubleshooting]({% link molt/molt-replicator-troubleshooting.md %}) diff --git a/src/current/molt/userscript-api.md b/src/current/molt/userscript-api.md index 4efdc135203..eb425ef02a1 100644 --- a/src/current/molt/userscript-api.md +++ b/src/current/molt/userscript-api.md @@ -13,7 +13,7 @@ The [userscript cookbook]({% link molt/userscript-cookbook.md %}) includes examp ## Access this API -To access the userscript API, [install MOLT Replicator **v1.3.0 or later**]({% link molt/molt-replicator.md %}#installation). The userscript API is accessible via the `replicator` library, which you should import at the top of your TypeScript file: `import * as api from "replicator@v2";`. The `replicator` library is included in the MOLT Replicator binary itself, so you do not need to install any external packages in order to run userscripts. +To access the userscript API, [install MOLT Replicator **v1.3.0 or later**]({% link molt/molt-replicator-installation.md %}). The userscript API is accessible via the `replicator` library, which you should import at the top of your TypeScript file: `import * as api from "replicator@v2";`. The `replicator` library is included in the MOLT Replicator binary itself, so you do not need to install any external packages in order to run userscripts. 
In addition to importing the API from the `replicator` library, you can download the [userscript type definitions file](https://replicator.cockroachdb.com/userscripts/replicator@v2.d.ts) and the [tsconfig.json file](https://replicator.cockroachdb.com/userscripts/tsconfig.json). Place these files in your working directory to enable autocomplete, inline documentation, and real-time error detection directly in your IDE. diff --git a/src/current/molt/userscript-cookbook.md b/src/current/molt/userscript-cookbook.md index 35a07b497a1..67ac56bd88e 100644 --- a/src/current/molt/userscript-cookbook.md +++ b/src/current/molt/userscript-cookbook.md @@ -9,12 +9,12 @@ Userscripts allow you to define custom schema and table transformations. When sp This cookbook provides ready-to-use examples that demonstrate real-world uses of the [userscript API]({% link molt/userscript-api.md %}). You can copy and paste them into your own code, and you can adapt them for your specific use cases. -Userscripts are comparable to MOLT Fetch's [transformations]({% link molt/molt-fetch.md %}#transformations), which are used during the initial bulk load phase of a migration. When performing an [initial data load followed by live replication]({% link molt/migrate-load-replicate.md %}), **apply equivalent transformations in both the Fetch command and Replicator userscript** to ensure data consistency. Below each example, you will see the equivalent way of carrying out that transformation using MOLT Fetch, if it's possible to do so. +Userscripts are comparable to MOLT Fetch's [transformations]({% link molt/molt-fetch.md %}#define-transformations), which are used during the initial bulk load phase of a migration. When performing an [initial data load followed by live replication]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load), **apply equivalent transformations in both the Fetch command and Replicator userscript** to ensure data consistency. 
Below each example, you will see the equivalent way of carrying out that transformation using MOLT Fetch, if it's possible to do so. ## Before you begin - Make sure that you understand the [purpose and usage of userscripts]({% link molt/userscript-overview.md %}). Take a look at the [userscript API]({% link molt/userscript-api.md %}). Understand [what you cannot do]({% link molt/userscript-overview.md %}#unsupported-typescript-features) in a userscript. -- [Install MOLT Replicator]({% link molt/molt-replicator.md %}#installation). The userscript API is accessible via the `replicator` library, which is already included in MOLT Replicator. +- [Install MOLT Replicator]({% link molt/molt-replicator-installation.md %}). The userscript API is accessible via the `replicator` library, which is already included in MOLT Replicator. - [Install TypeScript](https://www.typescriptlang.org/download/), and install a TypeScript-compatible IDE (for example, VS Code). - Download the [userscript type definitions file](https://replicator.cockroachdb.com/userscripts/replicator@v2.d.ts) and the [tsconfig.json file](https://replicator.cockroachdb.com/userscripts/tsconfig.json). Place these files in your working directory to enable autocomplete, inline documentation, and real-time error detection directly in your IDE. @@ -188,7 +188,7 @@ is_deleted STRING, ssn STRING, credit_card_number STRING #### MOLT Fetch equivalent -You can selectively replicate data using the [`--filter-path`]({% link molt/molt-fetch.md %}#selective-data-movement) flag, which accepts a path to a JSON file that specifies row-level filter expressions. +You can selectively replicate data using the [`--filter-path`]({% link molt/molt-fetch.md %}#select-data-to-migrate) flag, which accepts a path to a JSON file that specifies row-level filter expressions. 
**Make sure to replace the `/path/to/soft_delete_filter.json` placeholder with the path to your json file, and make sure that the source and target connection strings have been exported to the environment.** @@ -266,7 +266,7 @@ is_deleted STRING, ssn STRING, credit_card_number STRING #### MOLT Fetch equivalent -Filter columns using the [`--transformations-file`]({% link molt/molt-fetch.md %}#transformations) flag, which accepts a path to a JSON file that specifies column exclusions. +Filter columns using the [`--transformations-file`]({% link molt/molt-fetch.md %}#define-transformations) flag, which accepts a path to a JSON file that specifies column exclusions. **Make sure to replace the `/path/to/exclude_qty_column.json` placeholder with the path to your json file, and make sure that the source and target connection strings have been exported to the environment.** @@ -375,7 +375,7 @@ employee_id STRING, employee_name STRING, department STRING #### MOLT Fetch equivalent -MOLT Fetch does not have direct support for column renaming. You may need to rename the column on the target database after the [initial data load from MOLT Fetch]({% link molt/migrate-load-replicate.md %}#start-fetch). +MOLT Fetch does not have direct support for column renaming. You may need to rename the column on the target database after the [initial data load from MOLT Fetch]({% link molt/migration-approach-delta.md %}). ### Rename primary keys @@ -458,7 +458,7 @@ id1 STRING, id2 STRING, name STRING #### MOLT Fetch equivalent -MOLT Fetch does not have direct support for primary key renaming. You may need to reconfigure the primary keys on the target database after the [initial data load from MOLT Fetch]({% link molt/migrate-load-replicate.md %}#start-fetch). +MOLT Fetch does not have direct support for primary key renaming. You may need to reconfigure the primary keys on the target database after the [initial data load from MOLT Fetch]({% link molt/migration-approach-delta.md %}). 
### Route table partitions @@ -571,7 +571,7 @@ api.configureTargetSchema(TARGET_SCHEMA_NAME, { #### MOLT Fetch equivalent -Rename tables using the [`--transformations-file`]({% link molt/molt-fetch.md %}#transformations) flag, which accepts a path to a JSON file that specifies table mappings. +Rename tables using the [`--transformations-file`]({% link molt/molt-fetch.md %}#define-transformations) flag, which accepts a path to a JSON file that specifies table mappings. **Make sure to replace the `/path/to/rename_tables.json` placeholder with the path to your json file, and make sure that the source and target connection strings have been exported to the environment.** @@ -663,7 +663,7 @@ is_deleted STRING, ssn STRING, credit_card_number STRING #### MOLT Fetch equivalent -Creating computed columns is not supported by MOLT Fetch transforms. MOLT Fetch can only preserve computed columns that exist on the source. You may need to calculate this column for the target database table after the [initial data load from MOLT Fetch]({% link molt/migrate-load-replicate.md %}#start-fetch). +Creating computed columns is not supported by MOLT Fetch transforms. MOLT Fetch can only preserve computed columns that exist on the source. You may need to calculate this column for the target database table after the [initial data load from MOLT Fetch]({% link molt/migration-approach-delta.md %}). ### Combine multiple transforms @@ -738,8 +738,8 @@ is_deleted STRING To implement this transformation with MOLT Fetch, create: -- A `soft_delete_filter.json` file (to be included via the [`--filter-path`]({% link molt/molt-fetch.md %}#selective-data-movement) flag). -- A `pii_removal_transform.json` file (to be included via the [`--transformations-file`]({% link molt/molt-fetch.md %}#transformations) flag). +- A `soft_delete_filter.json` file (to be included via the [`--filter-path`]({% link molt/molt-fetch.md %}#select-data-to-migrate) flag). 
+- A `pii_removal_transform.json` file (to be included via the [`--transformations-file`]({% link molt/molt-fetch.md %}#define-transformations) flag). Run MOLT Fetch with both the `--filter-path` and `--transformations-file` flags. diff --git a/src/current/molt/userscript-overview.md b/src/current/molt/userscript-overview.md index 055f8a9cdcd..f8197d24a11 100644 --- a/src/current/molt/userscript-overview.md +++ b/src/current/molt/userscript-overview.md @@ -21,7 +21,7 @@ Userscripts are [written in TypeScript]({% link molt/userscript-cookbook.md %}) Userscripts act as a customizable processing layer within MOLT Replicator's live replication lifecycle. They are used to intercept, inspect, and modify the flow of data as it moves from the source database to the target database, enabling full control over how rows are transformed, filtered, or written; as well as providing the ability to run custom transactional logic against the target database. -Userscripts are comparable to MOLT Fetch's [transformations]({% link molt/molt-fetch.md %}#transformations), which are used during the initial bulk load phase of a migration. However, userscripts provide much greater customizability. When performing an [initial data load followed by live replication]({% link molt/migrate-load-replicate.md %}), **apply equivalent transformations in both the Fetch command and Replicator userscript** to ensure data consistency. +Userscripts are comparable to MOLT Fetch's [transformations]({% link molt/molt-fetch.md %}#define-transformations), which are used during the initial bulk load phase of a migration. However, userscripts provide much greater customizability. When performing an [initial data load followed by live replication]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load), **apply equivalent transformations in both the Fetch command and Replicator userscript** to ensure data consistency. 
The following diagram illustrates how userscripts fit into the replication pipeline: @@ -47,7 +47,7 @@ Userscripts run in a sandboxed JavaScript runtime inside [MOLT Replicator]({% li ## Usage -To have MOLT Replicator apply a userscript, include the [`--userscript`]({% link molt/replicator-flags.md %}#userscript) flag with any [Replicator command]({% link molt/molt-replicator.md %}#commands). The flag accepts a path to a TypeScript filename. +To have MOLT Replicator apply a userscript, include the [`--userscript`]({% link molt/replicator-flags.md %}#userscript) flag with any [Replicator command]({% link molt/replicator-flags.md %}). The flag accepts a path to a TypeScript filename. {% include_cached copy-clipboard.html %} ~~~ diff --git a/src/current/molt/userscript-quickstart.md b/src/current/molt/userscript-quickstart.md index da7ae54aa49..e96a2ac0576 100644 --- a/src/current/molt/userscript-quickstart.md +++ b/src/current/molt/userscript-quickstart.md @@ -9,7 +9,7 @@ This quickstart guides you through creating, validating, and deploying your firs ## Before you begin -- [Install MOLT Replicator **v1.3.0 or later**]({% link molt/molt-replicator.md %}#installation) for full compatibility with the userscript API. +- [Install MOLT Replicator **v1.3.0 or later**]({% link molt/molt-replicator-installation.md %}) for full compatibility with the userscript API. - Install a TypeScript-compatible IDE (for example, [VS Code](https://code.visualstudio.com/)). ## Step 1: Set up your environment diff --git a/src/current/releases/molt.md b/src/current/releases/molt.md index ce08a0ad9ca..9211f9c9709 100644 --- a/src/current/releases/molt.md +++ b/src/current/releases/molt.md @@ -115,9 +115,9 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.3.1 is [available](#installation). -- MOLT Fetch now supports [sharding]({% link molt/molt-fetch.md %}#table-sharding) of primary keys of any data type on PostgreSQL 11+ sources. 
This can be enabled with the [`--use-stats-based-sharding`]({% link molt/molt-fetch.md %}#global-flags) flag. -- Added the [`--ignore-replication-check`]({% link molt/molt-fetch.md %}#global-flags) flag to allow data loads with planned downtime and no replication setup. The `--pglogical-ignore-wal-check` flag has been removed. -- Added the `--enableParallelApplies` [replication flag]({% link molt/molt-replicator.md %}#flags) to enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of increased target pool and memory usage. +- MOLT Fetch now supports [sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export) of primary keys of any data type on PostgreSQL 11+ sources. This can be enabled with the [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag. +- Added the [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag to allow data loads with planned downtime and no replication setup. The `--pglogical-ignore-wal-check` flag has been removed. +- Added the `--enableParallelApplies` [replication flag]({% link molt/replicator-flags.md %}) to enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of increased target pool and memory usage. - Improved cleanup logic for scheduled tasks to ensure progress reporting and prevent indefinite hangs. - Added parallelism gating to ensure the parallelism setting is smaller than the `targetMaxPoolSize`. This helps prevent a potential indefinite hang. - Added new metrics that track start and end times for progress reports (`core_progress_reports_started_count` and `core_progress_reports_ended_count`) and error reports (`core_error_reports_started_count` and `core_error_reports_ended_count`). 
These provide visibility into the core sequencer progress and help identify hangs in the applier and progress tracking pipeline. @@ -159,7 +159,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer ##### Bug fixes -- MOLT Fetch [failback]({% link molt/migrate-failback.md %}) now reliably creates changefeeds with a sorted list of table names so that create changefeed operations can be properly deduplicated. +- MOLT Fetch failback now reliably creates changefeeds with a sorted list of table names so that create changefeed operations can be properly deduplicated. - Fixed an issue where shard connections failed to recognize custom types (e.g., `ENUM`) in primary keys during table migration. This occurred because the type map from the original `pgx.Conn` was not cloned. The type map is now properly cloned and attached to each shard connection. - Fixed a bug that could cause an integer overflow, which impacts retrieving the correct shards for exporting data. @@ -167,7 +167,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.2.6 is [available](#installation). -- Fixed a bug in [`--direct-copy` mode]({% link molt/molt-fetch.md %}#direct-copy) that occurred when [`--case-sensitive`]({% link molt/molt-fetch.md %}#global-flags) was set to `false` (default). Previously, the `COPY` query could use incorrect column names in some cases during data transfer, causing errors. The query now uses the correct column names. +- Fixed a bug in [`--direct-copy` mode]({% link molt/molt-fetch.md %}#direct-copy) that occurred when [`--case-sensitive`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) was set to `false` (default). Previously, the `COPY` query could use incorrect column names in some cases during data transfer, causing errors. The query now uses the correct column names. - Fixed a bug in how origin messages were handled during replication from PostgreSQL sources. 
This allows replication to successfully continue. - `ENUM` types can now be replicated from MySQL 8.0 sources. @@ -185,7 +185,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer - MOLT Fetch failback to CockroachDB is now disallowed. - MOLT Verify can now compare tables that are named differently on the source and target schemas. - The `molt` logging date format is now period-delimited for Windows compatibility. -- During replication, an index is now created on all tables by default, improving replication performance. Because index creation can cause the replication process to initialize more slowly, this behavior can be disabled using the `--stageDisableCreateTableReaderIndex` [replication flag]({% link molt/molt-replicator.md %}#flags). +- During replication, an index is now created on all tables by default, improving replication performance. Because index creation can cause the replication process to initialize more slowly, this behavior can be disabled using the `--stageDisableCreateTableReaderIndex` [replication flag]({% link molt/replicator-flags.md %}#stage-disable-create-table-reader-index). - Added a failback metric that tracks the time to write a source commit to the staging schema for a given mutation. - Added a failback metric that tracks the time to write a source commit to the target database for a given mutation. @@ -193,41 +193,41 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.2.3 is [available](#installation). -- MOLT Fetch users can now set [`--table-concurrency`]({% link molt/molt-fetch.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch.md %}#global-flags) to values greater than `1` for MySQL sources. -- MOLT Fetch now supports case-insensitive comparison of table and column names by default. Previously, case-sensitive comparisons could result in `no matching table on target` errors. 
To disable case-sensitive comparisons explicitly, set [`--case-sensitive=false`]({% link molt/molt-fetch.md %}#global-flags). If `=` is **not** included (e.g., `--case-sensitive false`), this is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`). +- MOLT Fetch users can now set [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to values greater than `1` for MySQL sources. +- MOLT Fetch now supports case-insensitive comparison of table and column names by default. Previously, case-sensitive comparisons could result in `no matching table on target` errors. To disable case-sensitive comparisons explicitly, set [`--case-sensitive=false`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). If `=` is **not** included (e.g., `--case-sensitive false`), this is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`). ### February 5, 2025 `molt` 1.2.2 is [available](#installation). -- Added an [`--import-region`]({% link molt/molt-fetch.md %}#global-flags) flag that is used to set the `AWS_REGION` query parameter explicitly in the [`s3` URL]({% link molt/molt-fetch.md %}#bucket-path). -- Fixed the [`truncate-if-exists`]({% link molt/molt-fetch.md %}#target-table-handling) schema mode for cases where there are uppercase table or schema names. +- Added an [`--import-region`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag that is used to set the `AWS_REGION` query parameter explicitly in the [`s3` URL]({% link molt/molt-fetch.md %}#bucket-path). +- Fixed the [`truncate-if-exists`]({% link molt/molt-fetch.md %}#handle-target-tables) schema mode for cases where there are uppercase table or schema names. - Fixed an issue with unsigned `BIGINT` values overflowing in replication. 
-- Added a `--schemaRefresh` [replication flag]({% link molt/molt-replicator.md %}#flags) that is used to configure the schema watcher refresh delay in the replication phase. Previously, the refresh delay was set to a constant value of 1 minute. Set the flag as follows: `--replicator-flags "--schemaRefresh {value}"`. +- Added a `--schemaRefresh` [replication flag]({% link molt/replicator-flags.md %}#schema-refresh) that is used to configure the schema watcher refresh delay in the replication phase. Previously, the refresh delay was set to a constant value of 1 minute. Set the flag as follows: `--replicator-flags "--schemaRefresh {value}"`. ### December 13, 2024 `molt` 1.2.1 is [available](#installation). -- MOLT Fetch users now can use [`--assume-role`]({% link molt/molt-fetch.md %}#global-flags) to specify a service account for assume role authentication to cloud storage. `--assume-role` must be used with `--use-implicit-auth`, or the flag will be ignored. +- MOLT Fetch users can now use [`--assume-role`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to specify a service account for assume role authentication to cloud storage. `--assume-role` must be used with `--use-implicit-auth`, or the flag will be ignored. - MySQL 5.7 and later are now supported with MOLT Fetch replication modes. - Fetch replication mode now defaults to a less verbose `INFO` logging level. To specify `DEBUG` logging, pass in the `--replicator-flags '-v'` setting, or `--replicator-flags '-vv'` for trace logging. - MySQL columns of type `BIGINT UNSIGNED` or `SERIAL` are now auto-mapped to [`DECIMAL`]({% link {{ site.current_cloud_version }}/decimal.md %}) type in CockroachDB. MySQL regular `BIGINT` types are mapped to [`INT`]({% link {{ site.current_cloud_version }}/int.md %}) type in CockroachDB. 
-- The `pglogical` replication workflow was modified in order to enforce safer and simpler defaults for the [`data-load`]({% link molt/molt-fetch.md %}#fetch-mode), `data-load-and-replication`, and `replication-only` workflows for PostgreSQL sources. Fetch now ensures that the publication is created before the slot, and that `replication-only` defaults to using publications and slots created either in previous Fetch runs or manually. +- The `pglogical` replication workflow was modified in order to enforce safer and simpler defaults for the [`data-load`]({% link molt/molt-fetch.md %}#define-fetch-mode), `data-load-and-replication`, and `replication-only` workflows for PostgreSQL sources. Fetch now ensures that the publication is created before the slot, and that `replication-only` defaults to using publications and slots created either in previous Fetch runs or manually. - Fixed scan iterator query ordering for `BINARY` and `TEXT` (of same collation) PKs so that they lead to the correct queries and ordering. -- For a MySQL source in `replication-only` mode, the [`--stagingSchema` replicator flag]({% link molt/replicator-flags.md %}#staging-schema) can now be used to resume streaming replication after being interrupted. Otherwise, the [`--defaultGTIDSet` replicator flag]({% link molt/replicator-flags.md %}#default-gtid-set) is used to start initial replication after a previous Fetch run in [`data-load`]({% link molt/molt-fetch.md %}#fetch-mode) mode, or as an override to the current replication stream. +- For a MySQL source in `replication-only` mode, the [`--stagingSchema` replicator flag]({% link molt/replicator-flags.md %}#staging-schema) can now be used to resume streaming replication after being interrupted. 
Otherwise, the [`--defaultGTIDSet` replicator flag]({% link molt/replicator-flags.md %}#default-gtid-set) is used to start initial replication after a previous Fetch run in [`data-load`]({% link molt/molt-fetch.md %}#define-fetch-mode) mode, or as an override to the current replication stream. ### October 29, 2024 `molt` 1.2.0 is [available](#installation). - Added `failback` mode to MOLT Fetch, which allows the user to replicate changes on CockroachDB back to the initial source database. Failback is supported for MySQL and PostgreSQL databases. -- The [`--pprof-list-addr` flag]({% link molt/molt-fetch.md %}#global-flags), which specifies the address of the `pprof` endpoint, is now configurable. The default value is `'127.0.0.1:3031'`. -- [Fetch modes]({% link molt/molt-fetch.md %}#fetch-mode) involving replication now state that MySQL 8.0 and later are supported for replication between MySQL and CockroachDB. -- [Partitioned tables]({% link molt/molt-fetch.md %}#transformations) can now be moved to CockroachDB using [`IMPORT INTO`]({% link {{ site.current_cloud_version }}/import-into.md %}). -- Improved logging for the [Fetch]({% link molt/molt-fetch.md %}) schema check phases under the `trace` logging level, which is set with [`--logging trace`]({% link molt/molt-fetch.md %}#global-flags). +- The [`--pprof-list-addr` flag]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), which specifies the address of the `pprof` endpoint, is now configurable. The default value is `'127.0.0.1:3031'`. +- [Fetch modes]({% link molt/molt-fetch.md %}#define-fetch-mode) involving replication now state that MySQL 8.0 and later are supported for replication between MySQL and CockroachDB. +- [Partitioned tables]({% link molt/molt-fetch.md %}#define-transformations) can now be moved to CockroachDB using [`IMPORT INTO`]({% link {{ site.current_cloud_version }}/import-into.md %}). 
+- Improved logging for the [Fetch]({% link molt/molt-fetch.md %}) schema check phases under the `trace` logging level, which is set with [`--logging trace`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). - Added a [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) for monitoring MOLT tasks. -- Fetch now logs the name of the staging database in the target CockroachDB cluster used to store metadata for [replication modes]({% link molt/molt-fetch.md %}#fetch-mode). +- Fetch now logs the name of the staging database in the target CockroachDB cluster used to store metadata for [replication modes]({% link molt/molt-fetch.md %}#define-fetch-mode). - String [primary keys]({% link {{ site.current_cloud_version }}/primary-key.md %}) that use `C` [collations]({% link {{ site.current_cloud_version }}/collate.md %}) on PostgreSQL can now be compared to the default `en_US.utf8` on CockroachDB. - MOLT is now distributed under the [Cockroach Labs Product License Agreement](https://www.cockroachlabs.com/cockroach-labs-product-license-agreement/), which is bundled with the binary. @@ -235,7 +235,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.7 is [available](#installation). -- When a [Fetch transformation rule]({% link molt/molt-fetch.md %}#transformations) is used to rename a table or map partitioned tables, a script in the format `partitionTableScript.{timestamp}.ts` is now automatically generated to ensure that [replication]({% link molt/molt-fetch.md %}#fetch-mode) works properly if enabled. +- When a [Fetch transformation rule]({% link molt/molt-fetch.md %}#define-transformations) is used to rename a table or map partitioned tables, a script in the format `partitionTableScript.{timestamp}.ts` is now automatically generated to ensure that [replication]({% link molt/molt-fetch.md %}#define-fetch-mode) works properly if enabled. 
### August 19, 2024 @@ -248,8 +248,8 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.5 is [available](#installation). - **Deprecated** the `--ongoing-replication` flag in favor of `--mode data-load-and-replication`, using the new `--mode` flag. Users should replace all instances of `--ongoing-replication` with `--mode data-load-and-replication`. -- Fetch can now be run in an export-only mode by specifying [`--mode export-only`]({% link molt/molt-fetch.md %}#fetch-mode). This will export all the data in `csv` or `csv.gz` format to the specified cloud or local store. -- Fetch can now be run in an import-only mode by specifying [`--mode import-only`]({% link molt/molt-fetch.md %}#fetch-mode). This will load all data in the specified cloud or local store into the target CockroachDB database, effectively skipping the export data phase. +- Fetch can now be run in an export-only mode by specifying [`--mode export-only`]({% link molt/molt-fetch.md %}#define-fetch-mode). This will export all the data in `csv` or `csv.gz` format to the specified cloud or local store. +- Fetch can now be run in an import-only mode by specifying [`--mode import-only`]({% link molt/molt-fetch.md %}#define-fetch-mode). This will load all data in the specified cloud or local store into the target CockroachDB database, effectively skipping the export data phase. - Strings for the `--mode` flag are now word-separated by hyphens instead of underscores. For example, `replication-only` instead of `replication_only`. ### August 8, 2024 @@ -257,7 +257,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.4 is [available](#installation). - Added a replication-only mode for Fetch that allows the user to run ongoing replication without schema creation or initial data load. 
This requires users to set `--mode replication_only` and `--replicator-flags` to specify the `defaultGTIDSet` ([MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical)) or `slotName` ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical)). -- Partitioned tables can now be mapped to renamed tables on the target database, using the Fetch [transformations framework]({% link molt/molt-fetch.md %}#transformations). +- Partitioned tables can now be mapped to renamed tables on the target database, using the Fetch [transformations framework]({% link molt/molt-fetch.md %}#define-transformations). - Added a new `--metrics-scrape-interval` flag to allow users to specify their Prometheus scrape interval and apply a sleep at the end to allow for the final metrics to be scraped. - Previously, there was a mismatch between the errors logged in log lines and those recorded in the exceptions table when an `IMPORT INTO` or `COPY FROM` operation failed due to a non-PostgreSQL error. Now, all errors will lead to an exceptions table entry that allows the user to continue progress from a certain table's file. - Fixed a bug that will allow Fetch to properly determine a GTID if there are multiple `source_uuids`. @@ -267,8 +267,8 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.3 is [available](#installation). - `'infinity'::timestamp` values can now be moved with Fetch. -- Fixed an issue where connections were not being closed immediately after sharding was completed. This could lead to errors if the [maximum number of connections]({% link molt/molt-fetch.md %}#best-practices) was set to a low value. -- Fetch users can now exclude specific tables from migration using the [`--table-exclusion-filter` flag]({% link molt/molt-fetch.md %}#global-flags). +- Fixed an issue where connections were not being closed immediately after sharding was completed. 
This could lead to errors if the [maximum number of connections]({% link molt/molt-fetch-best-practices.md %}#configure-the-source-database-and-connection) was set to a low value. +- Fetch users can now exclude specific tables from migration using the [`--table-exclusion-filter` flag]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). ### July 18, 2024 @@ -276,16 +276,16 @@ Cockroach Labs recommends using the latest available version of each tool. Refer - Fetch users can now specify columns to exclude from table migrations in order to migrate a subset of their data. This is supported in the schema creation, export, import, and direct copy phases. - Fetch now automatically maps a partitioned table from a PostgreSQL source to the target CockroachDB schema. -- Fetch now supports column exclusions and computed column mappings via a new [transformations framework]({% link molt/molt-fetch.md %}#transformations). -- The new Fetch [`--transformations-file`]({% link molt/molt-fetch.md %}#global-flags) flag specifies a JSON file for schema/table/column transformations, which has validation utilities built in. +- Fetch now supports column exclusions and computed column mappings via a new [transformations framework]({% link molt/molt-fetch.md %}#define-transformations). +- The new Fetch [`--transformations-file`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag specifies a JSON file for schema/table/column transformations, which has validation utilities built in. ### July 10, 2024 `molt` 1.1.1 is [available](#installation). -- Fixed a bug that led to incorrect list continuation file behavior if a trailing slash was provided in [`--bucket-path`]({% link molt/molt-fetch.md %}#global-flags). +- Fixed a bug that led to incorrect list continuation file behavior if a trailing slash was provided in [`--bucket-path`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). - Fixed a bug with extracting the filename from a failed import URL. 
Previously, an older filename was being used, which could result in duplicated data. Now, the filename that is used in import matches what is stored in the exceptions log table. -- Added a [`--use-implicit-auth`]({% link molt/molt-fetch.md %}#global-flags) flag that determines whether [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) is used for cloud storage import URIs. +- Added a [`--use-implicit-auth`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag that determines whether [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) is used for cloud storage import URIs. ### July 8, 2024 @@ -300,14 +300,14 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.0.0 is [available](#installation). -- Renamed the `--table-splits` flag to [`--concurrency-per-table`]({% link molt/molt-fetch.md %}#global-flags), which is more descriptive. -- Increased the default value of [`--import-batch-size`]({% link molt/molt-fetch.md %}#global-flags) to `1000`. This leads to better performance on the target post-migration. Each individual import job will take longer, since more data is now imported in each batch, but the sum total of all jobs should take the same (or less) time. +- Renamed the `--table-splits` flag to [`--concurrency-per-table`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), which is more descriptive. +- Increased the default value of [`--import-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to `1000`. This leads to better performance on the target post-migration. Each individual import job will take longer, since more data is now imported in each batch, but the sum total of all jobs should take the same (or less) time. ### May 29, 2024 `molt` 0.3.0 is [available](#installation). 
-- Added an [`--import-batch-size`]({% link molt/molt-fetch.md %}#global-flags) flag, which configures the number of files to be imported in each `IMPORT` job. +- Added an [`--import-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag, which configures the number of files to be imported in each `IMPORT` job. - In some cases on the previous version, binaries would not work due to how `molt` was being built. Updated the build method to use static linking, which creates binaries that should be more portable. - [`VARBIT`]({% link {{ site.current_cloud_version }}/bit.md %}) <> [`BOOL`]({% link {{ site.current_cloud_version }}/bool.md %}) conversion is now allowed for Fetch and Verify. The bit array is first converted to `UINT64`. A resulting `1` or `0` is converted to `true` or `false` accordingly. If the `UINT64` is another value, an error is emitted. @@ -316,7 +316,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 0.2.1 is [available](#installation). - MOLT tools now enforce secure connections to databases as a default. The `--allow-tls-mode-disable` flag allows users to override that behavior if secure access is not possible. -- When using MySQL as a source, [`--table-concurrency`]({% link molt/molt-fetch.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch.md %}#global-flags) are strictly set to `1`. +- When using MySQL as a source, [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) are strictly set to `1`. - Fixed a bug involving history retention for [`DECIMAL`]({% link {{ site.current_cloud_version }}/decimal.md %}) values. ### May 3, 2024 @@ -324,9 +324,9 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 0.2.0 is [available](#installation). 
- Fetch now supports CockroachDB [multi-region tables]({% link {{ site.current_cloud_version }}/multiregion-overview.md %}). -- Fetch now supports continuous replication for PostgreSQL and MySQL source databases via the [`--ongoing-replication`]({% link molt/molt-fetch.md %}#global-flags) flag. When Fetch finishes the initial data load phase, it will start the replicator process as a subprocess, which runs indefinitely until the user ends the process with a `SIGTERM` (`ctrl-c`). -- Replicator flags for ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical#postgresql-logical-replication) and [MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical#mysqlmariadb-replication)) are now supported, allowing users to further configure the [`--ongoing-replication`]({% link molt/molt-fetch.md %}#global-flags) mode for their use case. -- Added the [`--type-map-file`]({% link molt/molt-fetch.md %}#global-flags) flag, which enables custom type mapping for schema creation. +- Fetch now supports continuous replication for PostgreSQL and MySQL source databases via the [`--ongoing-replication`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag. When Fetch finishes the initial data load phase, it will start the replicator process as a subprocess, which runs indefinitely until the user ends the process with a `SIGTERM` (`ctrl-c`). +- Replicator flags for [PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical#postgresql-logical-replication) and [MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical#mysqlmariadb-replication) are now supported, allowing users to further configure the [`--ongoing-replication`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) mode for their use case. +- Added the [`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag, which enables custom type mapping for schema creation. 
- Fixed a bug where primary key positions could be missed when creating a schema with multiple primary keys. - Added a default mode for MySQL sources that ensures consistency and does not leverage parallelism. New text is displayed that alerts the user and links to documentation in cases where fetching from MySQL might not be consistent. - Logging for continuation tokens is now omitted when data export does not successfully complete.