Implement interoperation test API #375
Conversation
Force-pushed from c48f135 to a3793a7, then from a3793a7 to 740078e.
This is really nice; I like the container definitions. Have you done any (possibly manual/one-off) testing to make sure the networking etc works out for these containers to allow a realistic test to be done?
I was working on containerizing the functionality in monolithic_integration_test; I'm probably going to refactor that work to use these binaries instead, and it'd be fantastic if I could run them in a container. (Unfortunately, I think until Daphne provides an implementation of the test API too, we'll need a separate interface for the monolithic_integration_test tests, even if Daphne itself is containerized. I guess the cleanest way to implement that is to phrase the monolithic_integration_test functionality in terms of these official-test-API containers, so at least the bulk of the functionality is only written once.)
```rust
),
(1, _) => (
    Role::Helper,
    // TODO(issue #370): Task::new() requires that we have a collector authentication
```
Fix in this PR?
janus/janus_server/src/task.rs, lines 235 to 237 in 2940c07:

```rust
if collector_auth_tokens.is_empty() {
    return Err(Error::InvalidParameter("collector_auth_tokens"));
}
```
```rust
        .default_value("8080")
        .help("Port number to listen on."),
)
.arg(
```
Can we remove this flag entirely and assume 5432? The fact that Postgres exists is an implementation detail of the container, right?
If we want to keep this flag for e.g. out-of-container local testing, make this a full DB connection string?
Yes, this is for the end_to_end test. I'll make that change.
```rust
{
    let task_id_bytes = base64::decode_config(request.task_id, URL_SAFE_NO_PAD)
        .context("Invalid base64url content in \"taskId\"")?;
    let task_id = TaskId::get_decoded(&task_id_bytes).context("Invalid length of TaskId")?;
```
```diff
-    let task_id = TaskId::get_decoded(&task_id_bytes).context("Invalid length of TaskId")?;
+    let task_id = TaskId::get_decoded(&task_id_bytes).context("invalid length of TaskId")?;
```
lowercase errors (throughout)
```rust
// We use std::process instead of tokio::process so that we can kill the child processes from
// a Drop implementation. tokio::process::Child::kill() is async, and could not be called from
// there.
let mut client_command = Command::new(env!("CARGO_BIN_EXE_janus_interop_client"));
```
Why do these need to be written as multiple statements? I think e.g. `arg` is meant to be called in a chained fashion, so this could be written as:

```rust
let client_command = Command::new(...)
    .arg("--port")
    .arg(format!("{}", client_port))
    .stdout(Stdio::piped())
    .stderr(Stdio::piped())
    .spawn();
```
The builder methods each return &mut Command, so at minimum I need the let binding, since I'm not spawning from a temporary value. Beyond that, I pair arguments here for aesthetics, and then chain the rest of the builder methods at once within the for loop.
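The distinction can be sketched with `std::process` directly (a minimal illustration, not the PR's code):

```rust
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Chaining all the way through to spawn() works on a temporary, because
    // each builder method returns &mut Command and spawn() takes &mut self:
    let child = Command::new("echo")
        .arg("hello")
        .stdout(Stdio::piped())
        .spawn()?;
    let output = child.wait_with_output()?;
    assert_eq!(String::from_utf8_lossy(&output.stdout).trim(), "hello");

    // But keeping the Command around for later (e.g. to add more arguments
    // in a loop) requires a named binding, since the chain only yields a
    // &mut borrow, never an owned Command:
    let mut command = Command::new("echo");
    command.arg("world").stdout(Stdio::piped());
    let output = command.spawn()?.wait_with_output()?;
    assert_eq!(String::from_utf8_lossy(&output.stdout).trim(), "world");
    Ok(())
}
```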
interop_binaries/tests/end_to_end.rs
```rust
impl Drop for DropGuard {
    fn drop(&mut self) {
        let mut any_error = false;
        for mut process in self.0.drain(..) {
```
This type is complicated by having to deal with multiple Children IMO. It would be more easily stated if each DropGuard owned a single Child and implemented the desired drop logic for it.
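A sketch of that single-child shape (the type name `ChildGuard` is hypothetical):

```rust
use std::process::{Child, Command, Stdio};

/// Each guard owns exactly one child process and cleans it up on drop,
/// so the per-process cleanup logic is stated once.
struct ChildGuard(Child);

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Errors are ignored: the child may have already exited.
        let _ = self.0.kill();
        let _ = self.0.wait();
    }
}

fn main() -> std::io::Result<()> {
    let guard = ChildGuard(
        Command::new("sleep")
            .arg("30")
            .stdout(Stdio::null())
            .spawn()?,
    );
    drop(guard); // kills the child rather than waiting 30 seconds
    Ok(())
}
```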
interop_binaries/tests/end_to_end.rs
```rust
}

/// RAII guard to ensure that child processes are cleaned up during test failures.
struct DropGuard(Vec<Child>);
```
Optional: creating a DropGuard could also set up forwarding of its stdout/stderr. This would simplify setup later, but also merges mostly-unrelated concerns into a single type. But you could write the process creation as a one-liner e.g.:
```rust
let my_child = DropGuard::new(Command::new(...)
    .arg(...)
    .stdin(...)
    .etc()
    .spawn());
```
If you take this suggestion, maybe name the type something like TestProcessChild, reflecting the more-general purpose of this type?
interop_binaries/tests/end_to_end.rs
```rust
    "Ports selected for HTTP servers were not unique",
);

// Step 1: Create and start containers. (here, we just run the binaries instead)
```
Q: is the reason we start binaries instead of containers "container build is not automated"?
Yes, plus Cargo ensures that binary targets are up-to-date before running this integration test.
janus_core/src/hpke.rs
```rust
}

/// Generate a new HPKE keypair and return it as an HpkeConfig (public portion) and
/// HpkePrivateKey (private portion).
pub fn generate_hpke_config_and_private_key() -> (HpkeConfig, HpkePrivateKey) {
```
I think this function should either remain gated by the test-util feature, or be generalized to allow different Kem, Kdf, Aead selections & allow the config ID to be specified.
I agree. Eventually we'll need something like this function in support of automated onboarding of tasks, but for now, is it so bad for interop_binaries to have to enable test-util on its Janus dependencies?
I'd like to avoid it, because turning on test-util there would have the effect of turning it on for every build in the workspace. This was half of the cause of our last run-in with openssl-dev build-time dependencies.
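For reference, the generalization suggested above might read roughly as follows; this is only a sketch, and the parameter type names are assumed from janus_core rather than taken from the PR:

```rust
pub fn generate_hpke_config_and_private_key(
    id: HpkeConfigId,
    kem_id: HpkeKemId,
    kdf_id: HpkeKdfId,
    aead_id: HpkeAeadId,
) -> (HpkeConfig, HpkePrivateKey) {
    // ...
}
```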
```rust
/// be controlled by a controller retrieved from any of the clones.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub struct MockClock {
```
I think MockClock should remain guarded by the test-util feature, since it is intended for use only in tests. (What's the technical reason you'd like to drop this feature guard?)
I need this functionality to allow submitting reports with crafted timestamps. If I were to turn on the janus_core/test-util feature from the interop_binaries dependencies, then that would result in it being enabled for all builds throughout the workspace, and we'd be building kube from our janus_server release builds, etc.
That's unfortunate. I suppose this is another artifact of the interplay between workspaces & features/how feature unification is implemented.
I think one of the solutions at https://nickb.dev/blog/cargo-workspace-and-the-feature-unification-pitfall might help us. I think our release builds effectively use solution #2 already, since we specify the --bin flag to cargo build (but maybe specifying -p is also required?). Are you sure our release builds would encounter this pitfall?
In any case, I don't think we need to hold this PR up over this issue, but I'd definitely like to resolve it somehow. Would you mind filing an issue? I'd like to be able to use features w/o dependency resolution changing what features are enabled on a per-package basis; hopefully one of the blog solutions will work, though I'd be happier if we didn't have to go as far as introducing sub-workspaces.
I tested cargo build --bin aggregator with and without -p janus_server, and the package specification is necessary to keep the feature from leaking over.
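Concretely, that build invocation would look something like this (package and binary names from this repo; a sketch, not verified CI configuration):

```shell
# Without -p, feature unification is computed across the whole workspace
# and test-util leaks into the release binary; with -p it is resolved for
# janus_server alone.
cargo build --release -p janus_server --bin aggregator
```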
Yes, I rigged up a quick and dirty test of the containers themselves with bash, curl, and jq.
```rust
use warp::{hyper::StatusCode, reply::Response, Filter, Reply};

#[derive(Debug, Deserialize)]
struct EndpointRequest {
```
If we're going to ignore every field of the request, then why parse it at all? Just to detect if the test runner is sending invalid messages?
```rust
        .context("Invalid base64url content in \"taskId\"")?;
    let task_id = TaskId::get_decoded(&task_id_bytes).context("Invalid length of TaskId")?;
    let leader_url = Url::parse(&request.leader).context("Bad leader URL")?;
    let helper_url = Url::parse(&request.helper).context("Bad helper URL")?;
```
If you opt into the serde feature on url, then url::Url is Serialize/Deserialize and so you can have leader: Url, helper: Url in UploadRequest.
```rust
use tokio::sync::Mutex;
use warp::{hyper::StatusCode, reply::Response, Filter, Reply};

static DAP_AUTH_TOKEN: &str = "DAP-Auth-Token";
```
There's janus_server::task::DAP_AUTH_HEADER for this
```rust
Number(u64),
NumberArray(Vec<u64>),
```
Is u64 big enough? Prio3Aes128Sum uses Field128, so in the end, the result type is u128. Making this generic over prio::vdaf::Vdaf would be kind of a pain but since this struct is just used for serializing to JSON, maybe using u128 instead of u64 suffices.
I specified that the result should be encoded as a JSON number, and serde_json's support for u128 is off by default, and has some warts (due to the limitations of JSON). I'll open an issue on the document to consider changing result types.
JS numbers are only safe up to 53 bits, so even u64 is not necessarily safe.
The most common solution to this I've seen is to roundtrip numbers to JSON by encoding/decoding to a JSON string representing the number, i.e. 12345 might be encoded into JSON as "12345".
I was hoping that serde might have something as easy as tagging the field, but it looks like implementing this would require some custom code today: serde-rs/json#329 & serde-rs/json#833 have examples.
In any case, I think using strings instead of numbers would (probably?) need a spec update.
```rust
.into_iter()
.map(|counter| {
    u64::try_from(counter).context(
        "entry in aggregate result was too large to represent natively in JSON",
```
JSON numbers are floating point, which should be big enough to represent u128. I'm not sure if u128 is big enough that loss of precision can occur, though.
JSON numbers can only "safely" represent integers up to about 53 bits[1]. If we care to fix this (maybe not for now, since this is test code & can simply arrange to not use such large numbers), the standard method I've seen is to represent the integer as a JSON string like "12345".
```diff
@@ -0,0 +1,503 @@
+use anyhow::Context;
```
We should consider factoring some of this logic out into a standalone collector library that handles making collect requests, polling collect URIs and combining aggregate shares into aggregate results. We could distribute it in janus_client. Not needed in this PR, but a future enhancement.
Agreed, but maybe it should go in a separate crate, to keep janus_client slim for WASM users.
Agreed, it would be good to have a janus_collector implementation eventually (we've got several partial implementations sitting in several different tests, now), but IMO it should go into a separate crate from janus_client.
```rust
};

let (hpke_config, private_key) = generate_hpke_config_and_private_key(
    HpkeConfigId::from(0u8),
```
Teleporting over from a comment I made on the collector code... This works fine if we assume that HPKE config IDs are unique per-task, but I think that won't be true anymore once ietf-wg-ppm/draft-ietf-ppm-dap#289 lands in DAP.
tgeoghegan left a comment:
A few more nits but I think this is a sound implementation of https://divergentdave.github.io/draft-dcook-ppm-dap-interop-test-design/draft-dcook-ppm-dap-interop-test-design.html
interop_binaries/tests/end_to_end.rs
```rust
fn drop(&mut self) {
    if self.0.try_wait().unwrap().is_none() {
        self.0.kill().unwrap();
        self.0.wait().unwrap();
```
I think the call to Child::wait is unnecessary after Child::kill. The docs aren't really clear, but empirically I've found that no zombie is left behind after calling Child::kill, and the sentence "This is equivalent to sending a SIGKILL on Unix platforms." implies you don't need to wait.
Makes sense, that way we rely on init to clean up the process once the test exits.
interop_binaries/tests/end_to_end.rs
```rust
.header(CONTENT_TYPE, JSON_MEDIA_TYPE)
.json(&json!({
```
```diff
-.header(CONTENT_TYPE, JSON_MEDIA_TYPE)
-.json(&json!({
+.json(&json!({
```
RequestBuilder::json automatically sets Content-Type: application/json so I think quite a few calls to RequestBuilder::header can be deleted.
interop_binaries/tests/end_to_end.rs
```rust
impl Drop for ChildProcessCleanupDropGuard {
    fn drop(&mut self) {
        if self.0.try_wait().unwrap().is_none() {
```
This code is racy: if the child process has not exited when the try_wait() call occurs, but exits before the call to kill(), the unwrap() on the return value to kill() will panic since kill() will return Err(InvalidInput).
I think the intent is to kill the process if it's still running. If that's accurate, I recommend calling kill() unconditionally and ignoring the error if it is InvalidInput.
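That race-free version might look like this (function name hypothetical; `std::process::Child::kill` documents the `InvalidInput` error for an already-exited child):

```rust
use std::io::ErrorKind;
use std::process::{Child, Command};

// Kill unconditionally, treating "already exited" as success, so there is
// no window between try_wait() and kill() for the child to exit in.
fn kill_if_running(child: &mut Child) {
    match child.kill() {
        Ok(()) => (),
        Err(e) if e.kind() == ErrorKind::InvalidInput => (), // already exited
        Err(e) => panic!("failed to kill child process: {e}"),
    }
}

fn main() -> std::io::Result<()> {
    let mut child = Command::new("true").spawn()?;
    child.wait()?; // the child has definitely exited by now
    kill_if_running(&mut child); // must not panic
    Ok(())
}
```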
interop_binaries/tests/end_to_end.rs
```rust
/// Pass output from a child process's stderr pipe to eprint!(), so that it can be captured and
/// stored by the test harness.
async fn forward_stderr(stdout: ChildStderr) -> io::Result<()> {
```
```diff
-async fn forward_stderr(stdout: ChildStderr) -> io::Result<()> {
+async fn forward_stderr(stderr: ChildStderr) -> io::Result<()> {
```
(mega-nit)
This adds three new binaries (and container images) to support draft-dcook-ppm-dap-interop-test-design. It includes an end-to-end test that exercises the binaries using the test APIs.
The following items are made public or moved out from behind "test-util" features to support this:
- `janus_server::aggregator::aggregator_filter()`
- `janus_core::hpke::generate_hpke_config_and_private_key()`
- `janus_core::time::MockClock`