PS-11001: Implement logic for purge_binlogs mode (local filesystem storage only)#133
PS-11001: Implement logic for purge_binlogs mode (local filesystem storage only)#133kamil-holubicki wants to merge 2 commits into
Conversation
https://perconadev.atlassian.net/browse/PS-11001 Added a new 'binlog_server list <json_config_file>' subcommand that enumerates every binlog file currently present in the configured storage and prints them, in chronological (storage) order, as a JSON document on stdout. The per-record schema matches the existing 'search_by_timestamp' / 'search_by_gtid_set' responses (name, size, uri, previous_gtids, added_gtids, min_timestamp, max_timestamp) and the 'binsrv::models::search_response' class is reused as-is. Unlike the search subcommands, 'list' returns 'status:success' with an empty 'result' array on an empty storage instead of raising an error. This makes 'list' the primary diagnostic command operators can use to distinguish a freshly initialized (empty) storage from a corrupted one. Co-authored-by: Cursor <cursoragent@cursor.com>
| #include <string_view> | ||
| #include <system_error> | ||
|
|
||
| #include <fcntl.h> |
There was a problem hiding this comment.
To be honest, I don't like polluting standard-conformant c++ code with posix. I also understand that in our project we will definitely need fsync() for files and / or directories. What I suggest is to create a separate header, say util/native_file_operations_helpers.hpp with c++-conformant interface (no posix includes). At the same time put the implementation with all the posix-spacific calls to the util/native_file_operations_helpers_posix.cpp. Who knows, may be one day we will have util/native_file_operations_helpers_windows.cpp.
In this case we will include either util/native_file_operations_helpers_posix.cpp or util/native_file_operations_helpers_windows.cpp to the project conditionally using CMake means.
| --exec $BINSRV purge_binlogs $MYSQL_TMP_DIR/no_such_config.json $first_binlog > $read_from_file | ||
| --source include/read_file_to_var.inc | ||
| --assert(`SELECT JSON_EXTRACT('$result', '$.status') = 'error'`) | ||
|
|
There was a problem hiding this comment.
You should probably add "trying to execute binlog_purge with currently active file specified in the command line". If I haven't missed this among other cases.
There was a problem hiding this comment.
Please see case 9 below
There was a problem hiding this comment.
I knew I missed it :)
| // remove several objects back-to-back from the same backend and | ||
| // want to amortise the fsync cost over the whole batch by passing | ||
| // 'do_fsync=true' on the final call only. | ||
| void remove_object(std::string_view name, bool do_fsync = true); |
There was a problem hiding this comment.
I don't like the idea of exposing do_fsync() to public interface. I'd rather have
2 private methods
virtual void do_remove_object(std::string_view name) = 0;
virtual void do_fsync() = 0;and 2 public methods:
void remove_object(std::string_view name) {
do_remove_object(name);
do_fsync();
}
void remove_objects(std::collection<std::string_view> names) {
for(const auto &name: names) {
do_remove_object();
}
do_fsync();
}What do you think?
There was a problem hiding this comment.
I think the way I proposed is more flexible. But if we agree that do_fsync is must have after remove_object or remove_objects, we can go your way. To be honest I think both approaches are more or less equal.
If you insist, I will change.
There was a problem hiding this comment.
I would insist, flushing is not something users of the class should care about. For them, it should just work.
|
|
||
| void filesystem_storage_backend::do_put_object(std::string_view name, | ||
| util::const_byte_span content) { | ||
| // atomic-overwrite is implemented via the standard POSIX |
There was a problem hiding this comment.
I have one concern here. The suggested scehema with single rename is totally OK for POSIX, renaming a file to an existing one is a well-defined atomic operation. No questions here.
However, as you may have already noticed our S3 uploads are not optimal. We always completely owerwrite full existing object. However, there is an area for improvemnt here. We can initiate a multipart upload and compose a new object from a reference to an existing one and a new data block.
In other words, if we have
/path/object1.data (1024 bytes)
we can create
/path/object2.data (2048 bytes total : 1024 from /path/object1.data and 1024 bytes from new data block upload)
However, S3 does NOT have a rename operation at all. So, you won't be able to just change the name of
/path/object2.data back to /path/object1.data.
You can only
- delete
/path/object1.data` - copy
/path/object2.datato/path/object1.data - delete
/path/object2.data`
So, my question here is what atomic rename scheme we should use so that we would have a single recovery algorithm for both file and s3 storage backends.
Still not 100% sure here but one of the options could be (for file storage backend)
- create .tmp1 with new content
- move original to .tmp2
- rename .tmp1 to original
- remove .tmp2
There was a problem hiding this comment.
As per documentation S3 PutObject is atomic. So our implementation of s3_storage_backend::do_put_object is atomic. No changes needed here.
In other words: create - rename pattern is only for posix. For S3, PutObject is already atomic.
There was a problem hiding this comment.
Yes, Kamil. Currently, PutObject is atomic and no problems with it. Here I am going a bit ahead of time when we decide to simulate this append operation with MultipartUploads. But again, this is definitely another task.
6564e25 to
f227868
Compare
Add 'binlog_server purge_binlogs <config> <binlog_name>' which removes every binlog file with sequence number less than or equal to <binlog_name> (a contiguous prefix). The current tail is refused: emptying the storage would lose the resume position and force the next 'fetch' / 'pull' to re-stream from the beginning of the source's retained binlog history. Phase 1 only: 'file' backend; v2 will extend to 's3' once 'remove_object' is wired to 'DeleteObject'. Self-healing recovery from a purge crashed between the index rewrite and the payload/metadata deletion is left for later phases - existing constructor validators refuse to open such a storage and surface it for manual cleanup. Co-authored-by: Cursor <cursoragent@cursor.com>
f227868 to
78d4661
Compare
Add 'binlog_server purge_binlogs <binlog_name>' which removes
every binlog file with sequence number less than or equal to <binlog_name>
(a contiguous prefix). The current tail is refused: emptying the storage
would lose the resume position and force the next 'fetch' / 'pull' to
re-stream from the beginning of the source's retained binlog history.
Phase 1 only: 'file' backend; v2 will extend to 's3' once 'remove_object'
is wired to 'DeleteObject'.
Self-healing recovery from a purge crashed
between the index rewrite and the payload/metadata deletion is left for
later phases - existing constructor validators refuse to open such
a storage and surface it for manual cleanup.