Skip to content

PS-11001: Enable purge_binlogs on the s3 storage backend#136

Open
kamil-holubicki wants to merge 3 commits into
Percona-Lab:mainfrom
kamil-holubicki:PS-11001-part-3
Open

PS-11001: Enable purge_binlogs on the s3 storage backend#136
kamil-holubicki wants to merge 3 commits into
Percona-Lab:mainfrom
kamil-holubicki:PS-11001-part-3

Conversation

@kamil-holubicki
Copy link
Copy Markdown
Collaborator

No description provided.

https://perconadev.atlassian.net/browse/PS-11001

Added a new 'binlog_server list <json_config_file>' subcommand that
enumerates every binlog file currently present in the configured
storage and prints them, in chronological (storage) order, as a JSON
document on stdout. The per-record schema matches the existing
'search_by_timestamp' / 'search_by_gtid_set' responses (name, size,
uri, previous_gtids, added_gtids, min_timestamp, max_timestamp) and
the 'binsrv::models::search_response' class is reused as-is.
Unlike the search subcommands, 'list' returns 'status:success' with
an empty 'result' array on an empty storage instead of raising an
error. This makes 'list' the primary diagnostic command operators
can use to distinguish a freshly initialized (empty) storage from a
corrupted one.

Co-authored-by: Cursor <cursoragent@cursor.com>
}
if (::fsync(file_descriptor) != 0) {
const auto saved_errno = errno;
::close(file_descriptor);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be boost::scoped_exit to avoid calling ::close() from 2 places

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

util::exception_location().raise<std::runtime_error>(
"cannot close underlying tmp object file");
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we could also call fsync() for file data as well here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup

void remove_object(std::string_view name);
// Batch remove: drops each named object, then runs a single
// durability barrier for the whole batch.
void remove_objects(std::span<const std::string> names);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the asymmetry here (remove_object() accepts std::string_view, whereas remove_objects() expects std::strings)
Ideally, it would be great to have something like

void remove_objects(std::span<const std::string_view> names);

but I understand that it will be very inconvenient to use (even for our use case)

Comment on lines +50 to +58
for (const auto &name : names) {
try {
do_remove_object(name);
} catch (...) { // NOLINT(bugprone-empty-catch)
// best-effort: per-object failures are intentionally swallowed
// (see 'remove_objects' contract in the header)
}
}
do_fsync();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be do our best trying to remove individual objects but still throw if at least one of the deletes failed.

Suggested change
for (const auto &name : names) {
try {
do_remove_object(name);
} catch (...) { // NOLINT(bugprone-empty-catch)
// best-effort: per-object failures are intentionally swallowed
// (see 'remove_objects' contract in the header)
}
}
do_fsync();
bool error_occurred{false};
for (const auto &name : names) {
try {
do_remove_object(name);
} catch (...) {
// best-effort: per-object failures are intentionally swallowed
// (see 'remove_objects' contract in the header)
error_occurred = true;
}
}
do_fsync();
if (error_occurred) {
throw ...;
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it so that the first error bubbles up to the caller and is included in the JSON response as a warning. If needed in the future, we can collect files + failures and bubble them up, so the JSON response could contain all failed files and the exact reasons.


void s3_storage_backend::aws_context::delete_object(
const qualified_object_path &target) const {
Aws::S3Crt::Model::DeleteObjectRequest delete_object_request;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just add a // TODO: comment here

Consider deleting several objects in one request by using `DeleteObjects()` AWS SDK C++ API call

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Add 'binlog_server purge_binlogs <config> <binlog_name>' which removes
every binlog file with sequence number less than or equal to <binlog_name>
(a contiguous prefix). The current tail is refused: emptying the storage
would lose the resume position and force the next 'fetch' / 'pull' to
re-stream from the beginning of the source's retained binlog history.

Phase 1 only: 'file' backend; v2 will extend to 's3' once 'remove_object'
is wired to 'DeleteObject'.

Self-healing recovery from a purge crashed
between the index rewrite and the payload/metadata deletion is left for
later phases - existing constructor validators refuse to open such
a storage and surface it for manual cleanup.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Collaborator

@percona-ysorokin percona-ysorokin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants