feat(homeserver, testnet): add configurable setup for pubky-homeserver and pubky-testnet#361
feat(homeserver, testnet): add configurable setup for pubky-homeserver and pubky-testnet#361techraed wants to merge 10 commits intopubky:mainfrom
pubky-homeserver and pubky-testnet#361Conversation
pubky-homeserver and pubky-testnetpubky-homeserver and pubky-testnet
pubky-homeserver and pubky-testnetpubky-homeserver and pubky-testnet
|
cc @SeverinAlexB @86667 Also I have a question regarding
If the second choice is the intended use case, can I request to implement it in other PR, not in the current one. As that case requires to think thoroughly how to handle the case with metadata is postgres. |
|
Hey @techraed thanks for your work! Nicely done, I have a couple comments around making changes as simple as possible in open codebases. There are a lot of changes here. Im not sure why the cli has changed, and the refactor and directory rename makes it difficult to understand what is happening and is technically a breaking change. Please separate behavioral changes from refactors either in different PRs or at least different commits, and seriously consider whether a rename of a common type is necessary. It may be that the renaming/refactor is absolutely necessary for the desired behaviour to work, if so please justify it.
Hm, if the second option is more difficult then I agree lets keep it for a separate piece of work. Achieving only 1. is still progress in the right direction. |
Yeah, you right about that. Sorry, forgot that for open codebases it's usually preferred small and only relative changes. Anyway, let me try to justify these changes. The CLIThe original issue states the following: "It's not possible to choose the data dir for the homeserver". And the same issue is for the Frankly speaking, the Problem-1 Config and the data path are the sameThe problem is that under the path there must the config and the secrets file. Therefore if you have some other files location, you must transfer there your config and secrets file, because the setup code for the homeserver has a strong dependency of the config and secrets files from the data directory (must be in the same directory). Problem-2 Low UX mitigationsAlright, you could mitigate that just by the following changes to storage config toml: pub enum StorageConfigToml {
/// Files are stored on the local file system.
FileSystem { path: PathBuf }, // <------------- Added here the path
// other variants here
}That would allow not touching the CLI, but the UX would be very low — basically, you would need to create a directory, inside the directory you must define a config file, which must define the path. SolutionSo what's the best and ergonomic way (imho) to define a custom data dir without having to think of config and secretes files being close to data dir? Change the CLI! # Calling like this would create everything inside the ~/.pubky
./target/release/homeserver
# Remains the same behavior
./target/release/after_this_pr_homeserver
# Calling like this would create everything inside the /etc/some/path/to/data
./target/release/homeserver --data-dir /etc/some/path/to/data
# Remains the same behavior
./target/release/after_this_pr_homeserver --data-dir /var/lib/homeserver/data
# Now config and data are in the separate dirs. Secrets will be created in the data dir.
./target/release/after_this_pr_homeserver --data-dir ~/data --config /etc/configs/homeserver_config.toml
# Now everything is accessible from their own paths
./target/release/after_this_pr_homeserver --data-dir /var/lib/homeserver/data --config /etc/configs/homeserver_config.toml --secret-key /etc/secrets/homeserver_secretsRenaming because of the CLI changesSo the CLI changes bring the separation between different setup sources. So we can't say now the path is to persistent data directory. It's now a path to different configs. That's now a directory is renamed:
|
|
@86667 shall I remain the same the PR? or divide changes related to logic and renamings (different commits)? |
|
Hey @techraed
I understand that there is some way to justify 41 files changes and breaking the interface between modules. The question is whether its worth the pain. Ideally, we would keep changes as minimal as possible so that we can quickly test and confirm all is well, then iterate quickly.
This would be great, if we could have a minimal PR which solves a single problem. This being said, I do agree that the task here is fairly under-specified and the development of it is itself research into what is best. I think that maybe we could achieve what we need (data dir specified so that it can be reused later) by the more simple change of allowing the path of That way, we could later add an optional var to Looking forward to seeing what you think, thanks again for getting invovled! |
|
I re-considered what were doing here and decided to update the original issue. The actual need is in the testnet only, the homeserver I think we can get away with not changing. See this issue comment - #185 (comment) |
|
Sure, that's a right thing to do. After the #361 (comment) I thought in the same direction. Will do |
|
hey @86667 Only focused on providing an ability to use persistent data storage (i.e., file system) for tests. Basic scenario which was described in the #185 (comment) is implemented in here - https://github.com/pubky/pubky-core/pull/361/changes#diff-87ec3cd0feb2ec7271c55b5a28635507f0a79f26433b869d2977b872c07cf6a9R922. Summary of changes:
|
There was a problem hiding this comment.
I may be wrong here, but it seems like the quickest path to get the functionality working is implemented and pushed without much thought around the code base's abstractions, understand-ability and ease to build upon.
With AI, code is cheap. But so also is review, research, design discussion, etc. Therefore as code cost decreases the expected code quality increases.
Happy to review once more, but i don't have the resources to continue a back and forth for issues which a simple AI "review this code" prompt can find. Hope you understand.
| std::fs::create_dir_all(path)?; | ||
|
|
||
| // Check if we can write to the data directory | ||
| let test_file_path = path.join("test_write_f2d560932f9b437fa9ef430ba436d611"); // random file name to not conflict with anything |
There was a problem hiding this comment.
This could conflict with itself. Generate a value or include timestamp maybe?
| let signer = pubky.signer(keypair); | ||
| signer | ||
| .pkdns() | ||
| .publish_homeserver_force(Some(&homeserver.public_key())) |
There was a problem hiding this comment.
I would expect the homeserver to automatically republish the record to the new DHT. Does this not happen? Or is this so that the test doesnt have to wait?
| let keypair = keypair.unwrap_or_else(pubky_common::crypto::Keypair::random); | ||
| std::fs::create_dir_all(&data_dir)?; | ||
|
|
||
| debug_assert!( |
There was a problem hiding this comment.
Why use debug here? Feels like something we would always want to check for?
| } | ||
|
|
||
| fn ensure_data_dir_exists_and_is_writable(&self) -> anyhow::Result<()> { | ||
| Ok(()) // Always ok because this is validated by the tempfile crate. |
There was a problem hiding this comment.
// Always ok because this is validated by the tempfile crate.
This comment is still useful for the MockDataDirKind::Temp case i think. Maybe match on MockDataDirKind and put the comment in the Temp branch
| const TEST_DB_NAME_KEY: &'static str = "pubky-test-db-name"; | ||
| const TEST_PERSIST_KEY: &'static str = "pubky-test-persist"; | ||
| const DEFAULT_CONNECTION_STRING: &'static str = | ||
| "postgres://localhost:5432/postgres?pubky-test=true"; |
There was a problem hiding this comment.
Why have you changed the default connection string? This will break devs with an authenticated local setup (which is likely everyone)
| } | ||
|
|
||
| /// Returns true if the root of this [`MockDataDir`] is a temporary directory. | ||
| pub fn is_temp(&self) -> bool { |
There was a problem hiding this comment.
Did you intend to use this? If no, remove
| /// Same as [`MockDataDir::test()`] but with a real directory that is not cleaned up on drop. | ||
| /// Use this for integration tests that need to verify persistence across process restarts. | ||
| #[cfg(any(test, feature = "testing"))] | ||
| pub fn test_persistent_data_dir(data_dir: PathBuf) -> Self { |
There was a problem hiding this comment.
Unused, was this intended for use? The integration test uses .with_data_dir()
| let mock_dir = MockDataDir::new(config, Some(keypair))?; | ||
| let mock_dir = if let Some(data_dir) = self.data_dir { | ||
| config.storage = StorageConfigToml::FileSystem; | ||
| MockDataDir::new_persistent_data_dir(data_dir, config, Some(keypair))? |
There was a problem hiding this comment.
This all feels a bit sloppy. We have data_dir, config.storage and MockDataDirKind all determining whether were using memory or file system, and acting on that info in different places.
There was a problem hiding this comment.
We should figure out a way to have a single decision point with all MockDataDirKind logic in one place. The memory/file system decision is spread across three layers, they should ideally be in-sync at construction time
| self.db_name.clone(), | ||
| self.connection_string.clone(), | ||
| ); | ||
| if self.should_drop { |
There was a problem hiding this comment.
Very fishy. Drop is for dropping.
db_dropper: Option<std::sync::Arc<TestDbDropper>>, is optional for this reason, just dont set it. Here youre setting the dropper and passing in a flag telling it to carry out its only purpose or not. This is sloppy and how code bases get into spaghetti mess.
If you need the TestDbDropper metadata then maybe consider pulling it into a separate TestDbInfo struct, and only creating the dropper when cleanup is actually wanted.
Resolves issue #185
Changes summary:
1. homeserver CLI now accepts 3 path args: to data dir, to secret file and to config file. The resolution algo:- If path is defined, then the value is used
- If path is not define, then data dir path is used as a base path
- If data dir is not defined, then home directory is used (
~/.pubky)The CLI behavior didn't change if no params are provided or only if data dir param is provided - that way changes are backward compatible.
2.
PersistentDataDiris renamed toHomeserverPaths, and the latter one stores all three resolved paths. Consequently, the moduledata_directoryis also renamed tohomeserver_config.3. Due to changes semantics described upper,
DataDiris now renamed toSetupSource(also module is renamed). The trait is no longer aCloneimplementor, as homeserver app covers the trait object withArc.4.
MockDataDiris renamed toMockSetupSource. The module is also renamed. The type now can be instantiated with a path to data dir.5.
EphermalTestnetcan now define a data dir in its' builder. If data dir was defined for an ephermal testnet, then a file system will be used by open-dal. Otherwise testnet storage remains in-memory.UPD: the original issue was changed. See further comments to understand the context and changes summary.