All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Skip
AggregateFunctioncolumns on Clickhouse when describing the schema.
- Previous fix for ordering issue was not properly handling blocks batched together and was working at block level. Fix now extended to work when also inside a batch.
- Fixed a potential ordering issue if some tables are seen before others but rows are still dependent of each other.
-
Fixed delete operation on composite keys.
-
Read tables information only for given schema to avoid walking non-related tables.
-
Improved signal handling by properly logging which signal was received.
-
If a
clickhouse_table_options.partition_fieldsalready contains some partition for_block_timestamp_, the default(toYYYYMM(_block_timestamp_))will not be added anymore. -
The default partition on
_block_timestamp_is now(toYYYYMM(_block_timestamp_))instead of just_block_timestamp_.
- Automatically skip block flush if the average flush duration is above the time between blocks, preventing drift in these situations
-
An
clickhouse_table_optionsoptionorder_by_fieldsis required now when usingsubstreams-sink-sql from proto, for example you should go from:message Transfer { option (schema.table) = { name: "tranfers" }; string id = 1; string from = 2; string to = 3; }
To:
message Transfer {
option (schema.table) = {
name: "tranfers",
clickhouse_table_options: {
order_by_fields: [
{ name: "id" }
]
}
};
string id = 1;
string from = 2;
string to = 3;
}The selected column will be used to as the ORDER BY <column> clause when generating the table schema using ReplacingMergeTree.
-
Fixed generated schema column
_blocks_.numberto be of typeUint64. -
Fixed generated schema column
<table>._block_number_on all tables to be of typeUInt64.
- Replaced hardcoded field names with dialect constants in PostgreSQL inserters for better consistency
- Updated
accumulator_inserter.goto useDialectFieldBlockNumberandDialectFieldBlockTimestampconstants - Updated
row_inserter.goto useDialectFieldBlockNumberandDialectFieldBlockTimestampconstants - Ensures all proto-related code uses centralized field name constants instead of hardcoded strings
- Updated
- Added flush retry mechanism with configurable retry count and incremental backoff delay
- New
--flush-retry-countflag (default: 3) to set number of retry attempts for flush operations - New
--flush-retry-delayflag (default: 1s) to set base delay for incremental retry backoff - Automatic retry on flush failures with exponential delay (1s, 2s, 3s, etc.)
- Improved error handling and logging for flush operations
- New
- Added support for
--clickhouse-clusteron allCREATEstatements. - Added support for specifying the output-module on the command-line in 'from-proto' mode, without requiring a "sink config" in the package.
ex:
substreams-sink-sql from-proto psql://.. substreams.yaml map_my_data - Fixed a bug causing a panic when the protobuf file containing the outputType is not the last file
KNOWN ISSUES:
from-protoexecution mode does not support 'repeated fields for native types' in protobuf definition, ex:repeated stringorrepeated bytesOnlyrepeated MyStructis supported for now.
-
Added support for
Upsertoperation in Postgres, Clickhouse isn't supported right now. -
Improved batch block flush logic to flush after a certain number of blocks instead of taking by modulo.
-
Fixed cursor loading on Clickhouse.
-
Fixed a bug with not flushing rows on stream completion.
-
Added
--clickhouse-clusterflag. If non-empty the sink will apply aON CLUSTER <cluster>clause when setting up tables and replace table engines with theirReplicatedcounterparts. -
Added flushed block number and time drift metrics.
-
Added more flags to configure flushing intervals.
Available flags are
batch-block-flush-interval,batch-row-flush-intervalandlive-block-flush-interval, checksubstreams-sink-sql run --helpfor full documentation about the flags.
-
Added support for the Clickhouse
Datetype. -
Deprecated the existing
flush-intervalflag in favor ofbatch-block-flush-interval. -
Fixed handling of the Clickhouse
Arraytype. -
Removed the check for duplicate primary keys on the Clickhouse dialect. This allows inserting multiple rows with the same primary key.
- Added a check for non-existent columns in Clickhouse
- Added support for
Nullabletypes in Clickhouse. - Bump substreams to
v0.11.1 - Bump substreams-sink to
v0.5.0
- Fix major bug when receiving empty
MapOutput
- Bump substreams to v1.10.3 to support new manifest data like
protobuf:excludePaths
- Added the --cursor-table and --history-table flags to allow running to sinks on the same database (be careful that you have no collision in table names)
- bumped substreams to v1.8.2, add some default network endpoints
- Bumped substreams to v1.7.3
- Enable gzip compression on substreams data on the wire
- Fix another case where 'infinite-retry' would not work and the program would stop on an error.
- Enable multiple Substreams authentication methods (API key, JWT), using flags
--api-key-envvarand--api-token-envvar. - Deprecates the use of
SF_API_TOKENenvironment variable, now use defaultSUBSTREAMS_API_TOKENor set your own using--api-token-envvar.
-
Fixed spurious error reporting when the sinker is terminating or has been canceled.
-
Updated
substreamsdependency to latest versionv1.3.7.
- Fixed the timestamp parsing in Clickhouse dialect.
- Fixed the schema in the tutorial for clickhouse.
- Add
--networkflag to override the default value in the manifest or spkg
- Bump version of
schemadependency to fix errors with new Clickhouse versions now usingsystem.tablestable instead ofinformation_schema.tablesview.
- Fix a critical bug breaking the reorg management when more than one row needs to be reverted.
⚠️ This release candidate contains a critical bug in the reorg management and should not be used. Upgrade immediately to v4.0.0-rc.3
- Support more networks with default mappings (ex: solana, soon: optimism, soon: bitcoin)
- Add command "create-user " to help creating more SQL users, read-only or otherwise
- add 'enabled' field under "DBTConfig"
- Removed PgwebFrontend and WireProtocolAccess fields from the SinkConfig message: they will now be deployed when on a development environment, so they are not mentionned here anymore.
⚠️ This release candidate contains a critical bug in the reorg management and should not be used. Upgrade immediately to v4.0.0-rc.3
- Fix an issue preventing the
setupcommand from running on a clickhouse backend because of the reorg settings.
⚠️ This release candidate contains a critical bug in the reorg management and should not be used. Upgrade immediately to v4.0.0-rc.3
- This release brings support for managing reorgs in Postgres database, enabled by default when
--undo-buffer-sizeto 0.
-
A change in your SQL schema may be required to keep existing substreams:SQL integrations working:
- The presence of a primary key (single key or composite) is now MANDATORY on every table.
- The
sf.substreams.sink.database.v1.TableChangemessage, generated inside substreams, must now exactly match its primary key with the one in the SQL schema. - You will need to re-run
setupon your existing PostgreSQL databases to add thesubstreams_historytable. You can use the new--system-tables-onlyflag to perform only that.
-
Since reorgs management is not yet supported on Clickhouse, users will have to set
--undo-buffer-sizeto a non-zero value (12was the previous default)
- Added support for
rest_frontendfield withenabledboolean flag, aimed at this backend implementation: https://github.com/semiotic-ai/sql-wrapper
- Fixed regression:
runcommand was incorrectly only processing blocks staying behind the "FinalBlocks" cliff.
- Fixed support for tables with primary keys misaligned with
database_changes's keys (fixing Clickhouse use case)
- Added support for selecting engine
postgresorclickhouseto sinkconfig protobuf definition
- Fixed missing uint8 and uint16 cast for clickhouse driver conversion
- Fixed default endpoint sfor networks 'goerli' and 'mumbai'
- Added
--postgraphileflag tosetup, which will add a @skip comment on cursor table so Postgraphile doesn't try to serve cursors (it resulted in a name collision with Postgraphile internal names) - Fixed a bug with Clickhouse driver where different integer sizes need explicit conversion
- Fixed an issue where the schema encoded in the SinkConfig part of a manifest would not be encoded correctly, leading to garbled (base64) bytes being sent to the SQL server instead of the schema.
This release brings a major refactoring enabling support for multiple database drivers and not just Postgres anymore. Our first newly supported driver is Clickhouse which defines itself as The fastest and most resource efficient open-source database for real-time apps and analytics. In the future, further database driver could be supported like MySQL, MSSQL and any other that can talk the SQL protocol.
Now that we support multiple driver, keeping the substreams-sink-postgres didn't make sense anymore. As such, we have renamed the project from substreams-sink-postgresql to substreams-sink-sql since it now supports Clickhouse out of the box. The binary and Go modules have been renamed in consequence.
Another major change brought by this release is the usage of Substreams "Deployable Unit". What we call a "Deployable Unit" is a Substreams manifest that fully defines a deployment packaged as a single artifact. This change how the sink is operated; the SQL schema, output module and "Network" identifer are now passed in the "SinkConfig" section of the Substreams manifest instead of being accepted at command line.
Read the Operators section below to learn how to migrate to this new version.
Passing the schema and the module_name to the run and setup commands is no longer accepted via arguments, they need to be written to the substreams.yaml file.
Before:
substreams-sink-sql setup "psql://..." "path/to/schema.sql"
substreams-sink-sql run "psql://..." mainnet.eth.streamingfast.io:443 https://github.com/streamingfast/substreams-eth-block-meta/releases/download/v0.5.1/substreams-eth-block-meta-v0.5.1.spkg db_out [<range>]Now:
- Create a deployable unit file, let's call it
substreams.prod.yamlwith content:
specVersion: v0.1.0
package:
name: "<name>"
version: v0.0.1
imports:
sql: https://github.com/streamingfast/substreams-sink-sql/releases/download/protodefs-v1.0.1/substreams-sink-sql-protodefs-v1.0.1.spkg
main: https://github.com/streamingfast/substreams-eth-block-meta/releases/download/v0.5.1/substreams-eth-block-meta-v0.5.1.spkg
network: mainnet
sink:
module: main:db_out
type: sf.substreams.sink.sql.v1.Service
config:
schema: "./path/to/schema.sql"In this <name> is the same name as what <manifest> defines was, https://github.com/streamingfast/substreams-eth-block-meta/releases/download/v0.5.1/substreams-eth-block-meta-v0.5.1.spkg is the current manifest you deploy.
The ./path/to/schema.sql would point to your schema file (path resolved relative to parent directory of substreams.prod.yaml).
The 'network: mainnet' will be used to resolve to an endpoint. You can configure each network to have its own endpoint via environment variables SUBSTREAMS_ENDPOINTS_CONFIG_<NETWORK> or override this mechanism completely by using the --endpoint flag. Most used networks have default endpoints.
- Setup your database:
substreams-sink-sql setup <dsn> substreams.prod.yaml- Run the sink:
substreams-sink-sql run <dsn> substreams.prod.yamlSimilar changes have been applied to other commands as well.
-
Added average flush duration to sink stats.
-
Added log line when flush time to database is
> 5sinINFOand inWARNif> 30s.
- Fixed
pprofHTTP routes not properly registered.
- Renamed metric Prometheus metric
substreams_sink_postgres_flushed_entries_counttosubstreams_sink_postgres_flushed_rows_count, adjust your dashboard if needed and change it found aGaugeto aCounter.
-
Refactored internal code to support multiple database drivers.
-
Experimental
clickhouseis now supported as a newclickhouseis now supported* Added driver abstractionYou can connect to Clickhouse by using the following DSN:
- Not encrypted:
clickhouse://<host>:9000/<database>?username=<user>&password=<password> - Encrypted:
clickhouse://<host>:9440/<database>?secure=true&skip_verify=true&username=<user>&password=<password>
If you want to send custom args to the connection, you can use by sending as query params.
- Not encrypted:
- Bumped
logginglibrary to latest version which should fixed problem where containerized workload are not printing logs out in JSON format.
This is a bug fix release containing a fix for inserting rows into a table for which no primary key constraint exist. For now, we still requires internally that your provide an id in your DatabaseChange of your row, a future update will lift that limitations.
This releases brings improvements to reported progress message while your Substreams executes which should greatly enhanced progression tracking
Note
Stay tuned, we are planning even more useful progression tracking now that we've updated progression data sent back to the client!
This releases also introduces a new mode to dump data in the database at high speed, useful for large amount of data insertion.
Bumped substreams-sink v0.3.1 and substreams to v1.1.12 to support the new progress message format. Progression now relates to stages instead of modules. You can get stage information using the substreams info command starting from version v1.1.12.
Important
This client only support progress messages sent from a server using substreams version >=v1.1.12
substreams_sink_progress_messageremoved in favor ofsubstreams_sink_progress_message_total_processed_blockssubstreams_sink_progress_message_last_end_blockremoved in favor ofsubstreams_sink_progress_message_last_block(per stage)
- Added
substreams_sink_progress_message_last_contiguous_block(per stage) - Added
substreams_sink_progress_message_running_jobs(per stage)
A new injection method has been added to this substreams-sink-postgres release. It's a 2 steps method that leverage COPY FROM SQL operations to inject at high speed a great quantity of data.
Note
This method will be useful if you insert a lot of data into the database. If the standard ingestion speed satisfy your needs, continue to use it, the new feature is an advanced use case.
See the High Throughput Injection section of the README.md file to check how to use it.
-
Added newer method of populating the database via CSV (thanks @gusinacio!).
Newer commands:
generate-csv: Generates CSVs for each tableinject-csv: Injects generated CSV rows for<table>
-
gRPC
InvalidArgumenterror(s) are not retried anymore like specifying and invalid start block or argument in your request. -
Breaking Flag shorthand
-pfor--plaintexthas been re-assigned to Substreams params definition, to align withsubstreams run/guion that aspect. There is no shorthand anymore for--plaintext.If you were using before
-p, please convert to--plaintext.Note We expect that this is affecting very few users as
--plaintextis usually used only on developers machine.
- Added support for
--params, -p(can be repeated multiple times) on the form-p <module>=<value>.
-
Added logging of new
Sessionreceived values (linear_handoff_block,max_parallel_workersandresolved_start_block). -
Added
--header, -H(can be repeated multiple times) flag to pass extra headers to the server.
- Now reporting available columns when an unknown column is encountered.
- Batches written to the database now respects the insertion ordering has received from your Substreams. This fixes for example auto-increment to be as defined on the chain.
- Fixed problem where string had unicode character and caused
pq: invalid message format.
-
The
substreams-sink-postgres setupcommand has been fixed to use the correct schema defined by the DSN. -
The
cursorstable suggestion when the table is not found has been updated to be in-sync with table used insubstreams-sink-postgres setup.
- Now using Go Protobuf generate bindings from https://github.com/streamingfast/substreams-sink-database-changes.
-
Added
Composite keyssupport following the update insubstreams-database-changeThe code was updated to use
oneOfprimary keys (pk and composite) to keep backward compatibility. Therefore, Substreams using older versions ofDatabaseChangecan still use newer versions ofpostgres-sinkwithout problems. To use composite key, define your schema to use Postgres composite keys, update to latest version ofsubstreams-database-changesand update your code to send aCompositePrimaryKeykey object for theprimary_keyfield of theTableChangemessage. -
Added escape to value in case the postgres data type is
BYTES. We now escape the byte array.
- Added back support for old Substreams Database Change Protobuf package id
sf.substreams.database.v1.DatabaseChanges.
- Reduced the amount of allocations and escaping performed which should increase ingestion speed, this will be more visible for Substreams where a lot of entities and columns are processed.
- The
schemais correctly respected now for the thecursorstable.
It appeared that the cursor was not saved properly until the first graceful shutdown of substreams-sink-postgres. Furthermore, the on exit save was actually wrong because it was saving the cursor without flushing accumulated data which is wrong (e.g. that we had N blocks in memory unflushed and a cursor, and we were saving this cursor to the database without having flushed the in memory logic).
This bug has been introduced in v2.0.0 by mistake which means if we synced a new database with v2.0.0+, there is a good chance your are actually missing some data in your database. It's highly recommended that you re-synchronize your database from scratch.
Note If your are using the same
.spkgthat you are using right now, database ingestion from scratch should go at very high speed because you will be reading from previously cached output, so the bottleneck should be network and the database write performance.
In the release, we change a big how cursor is associated to the <module>'s hash in the database and how it's stored.
Prior this version, when loading the cursor back from the database on restart, we were retrieving the cursor associated to the <module>'s hash received by substreams-sink-postgres run. The consequence of that is that if you change the .spkg version you were sinking with, on restart we would find no cursor since the module's hash of this new .spkg would have changed and which you mean a full sync back would be happening because we would start without a cursor.
This silent behavior is problematic because it could seen like the cursor was lost somehow while actually, we just picked up a new one from scratch because the .spkg changed.
This release brings in a new flag substreams-sink-postgres run --on-module-hash-mistmatch=error (default value shown) where it would control how we should react to a changes in the module's hash since last run.
- If
erroris used (default), it will exit with an error explaining the problem and how to fix it. - If
warnis used, it does the same as 'ignore' but it will log a warning message when it happens. - If
ignoreis set, we pick the cursor at the highest block number and use it as the starting point. Subsequent updates to the cursor will overwrite the module hash in the database.
There is a possibility that multiple cursors exists in your database, hence why we pick the one with the highest block. If it's the case, you will be warned that multiple cursors exists. You can run substreams-sink-postgres tools cursor cleanup <manifest> <module> --dsn=<dsn> which will delete now useless cursors.
The ignore value can be used to change to a new .spkg while retaining the previous data in the database, the database schema will start to be different after a certain point where the new .spkg became active.
- Added
substreams-sink-postgres run --on-module-hash-mistmatch=errorto control how a change in module's hash should be handled.
- Changed behavior of how cursor are retrieved on restart.
-
Fixed
cursornot being saved correctly until the binary exits. -
Fixed wrong handling of updating the cursor, we were not checking if a row was updated when doing the flush operation.
-
Fixed a bug where it was possible if the sink was terminating to write a cursor for data not yet flushed. This was happening if the
substreams-sink-postgres runwas stopped before we ever written a cursor, which normally happens each 1000 blocks. We don't expect anybody to have been hit by this but if you are unsure, you should check data for the 1000 first blocks of you sink (for example from 11 000 000 to 11 001 000 if your module start block was 11 000 000).
-
Column's schema type that are not known by the
sqllibrary we know will now be transferred as-is to the database.There is a lot of column's for which the
sqllibrary we use have to Go representation for by default. This is the case for example for thenumericcolumn's type. Previously, this would be reported directly as an error, Now, we pass the received value from your Substreams unmodified to the database engine. It will be your responsibility to send the data in the right format accepted by the database. We send the value as-is, without escaping and without sanitization, so this is a risk if you don't control the Substreams.
- When doing
substreams-sink-postgres run, the<manifest>argument now accepts directory like..
- Fixed timestamp received in RFC3339 format.
- Diminish amount of allocations done to perform fields transformation.
-
Fixed some places where escaping for either identifier or value was not done properly.
-
Fixed double escaping of boolean values.
- Added proper escaping for table & column names to allow keyword column names to use keywords as column names such as
toandfrometc.
This release drops support for Substreams RPC protocol sf.substreams.v1 and switch to Substreams RPC protocol sf.substreams.rpc.v2. As a end user, right now the transition is seamless. All StreamingFast endpoints have been updated to to support the legacy Substreams RPC protocol sf.substreams.v1 as well as the newer Substreams RPC protocol sf.substreams.rpc.v2.
Support for legacy Substreams RPC protocol sf.substreams.v1 is expected to end by June 6 2023. What this means is that you will need to update to at least this release if you are running substreams-sink-postgres in production. Otherwise, after this date, your current binary will stop working and will return errors that sf.substreams.v1.Blocks is not supported on the endpoint.
From a database and operator standpoint, this binary is fully backward compatible with your current schema. Updating to this binary will continue to sink just like if you used a prior release.
The errors coming from Postgres are not retried anymore and will stop the binary immediately.
If you were using environment variable to configure the binary, note that the environment prefix has changed from SINK_ to SINK_POSTGRES_.
- Deprecated The flag
--irreversible-onlyis deprecated, use--final-blocks-onlyinstead.
-
Added command
substreams-sink-postgres tools --dsn <dsn> cursor readto read the current cursors stored in your database. -
Dangerous Added command
substreams-sink-postgres tools --dsn <dsn> cursor write <module_hash> <cursor>to update the cursor in your database for the given<module_hash>Warning This is a destructive operation, be sure you understand the consequences of updating the cursor.
-
Dangerous Added command
substreams-sink-postgres tools --dsn <dsn> cursor delete [<module_hash>|--all]to delete the cursor associated with the given module's hash or all cursors if--allis used.Warning This is a destructive operation, be sure you understand the consequences of updating the cursor.
This is the latest release before upgrading to Substreams RPC v2.
-
Added
--infinite-retryto never exit on error and retry indefinitely instead. -
Added
--development-modeto run in development mode.Warning You should use that flag for testing purposes, development mode drastically reduce performance you get from the server.
-
Added
--irreversible-onlyto only deal with final (irreversible) blocks.