From 221f8230c8cebc024a8c31fe1afa299ca7d261ae Mon Sep 17 00:00:00 2001 From: PredictiveManish Date: Tue, 17 Mar 2026 21:54:33 +0530 Subject: [PATCH 1/2] docs: Add prescriptive testing strategy guide --- docs/contributing/tests.md | 303 ++++++++++++++++++++++++++----------- 1 file changed, 217 insertions(+), 86 deletions(-) diff --git a/docs/contributing/tests.md b/docs/contributing/tests.md index 5886d9e1..38c2fe97 100644 --- a/docs/contributing/tests.md +++ b/docs/contributing/tests.md @@ -1,98 +1,229 @@ # Writing Tests -tl;dr: - - Setting up the `py_api` fixture to test directly against a REST API endpoint is really slow, only use it for migration/integration tests. - - Getting a database fixture and doing a database call is slow, consider mocking if appropriate. +This guide specifies the required testing strategy for the OpenML server API. +All tests MUST follow this strategy to ensure correctness while maintaining fast test execution. -## Overhead from Fixtures -Sometimes, you want to interact with the REST API through the `py_api` fixture, -or want access to a database with `user_test` or `expdb_test` fixtures. -Be warned that these come with considerable relative overhead, which adds up when running thousands of tests. +## Testing Strategy Overview + +Write tests at FOUR levels. Each level serves a specific purpose: + +| Level | Purpose | Overhead | +|-------|---------|----------| +| Migration | Verify Python API matches PHP API | ~300ms | +| Integration | Test full FastAPI stack | ~60ms | +| Database | Test DB functions directly | ~4ms | +| Unit | Test input/output processing | ~1ms | + +## 1. Migration Tests + +You MUST write migration tests for every endpoint to verify correctness during PHP to Python migration. + +**When to write these:** Test each endpoint against the PHP API to verify equivalent behavior. + +**Key requirement:** Use `persisted_flow` (or similar) fixture to persist data to the database before the test, so the PHP API can also access it. 
Without persistence, the PHP API cannot see the test data.
+
+### Example
+
+```python
+@pytest.mark.mut
+async def test_flow_exists(
+    persisted_flow: Flow,
+    py_api: httpx.AsyncClient,
+    php_api: httpx.AsyncClient,
+) -> None:
+    path = f"exists/{persisted_flow.name}/{persisted_flow.external_version}"
+    py_response, php_response = await asyncio.gather(
+        py_api.get(f"/flows/{path}"),
+        php_api.get(f"/flow/{path}"),
+    )
+
+    assert py_response.status_code == php_response.status_code
+    assert py_response.json() == {"flow_id": persisted_flow.id}
+```
+
+### Persisting Data for PHP API Tests
+
+When testing against the PHP API, you MUST persist data to the database because:
+
+- The PHP API reads from the same database as the Python API
+- The `expdb_test` fixture rolls back changes after each test by default
+- Without persistence, the PHP API will not see the test data
+
+Use `persist=True` with `temporary_records`, or use a `persisted_*` fixture:
+
+```python
+# Option 1: Use persisted fixture (recommended)
+@pytest.fixture
+async def persisted_flow(flow: Flow, expdb_test: AsyncConnection) -> AsyncIterator[Flow]:
+    await expdb_test.commit()  # Persist so the PHP API can see the data
+    yield flow
+    await expdb_test.rollback()  # Roll back anything the test left uncommitted
+    # Cleanup...
+ +# Option 2: Use persist parameter +async with temporary_records(connection, insert_queries, delete_queries, persist=True): + # Data is visible to both APIs pass ``` -When individually adding/removing components, we measure (for 5000 repeats, n=1): - -| expdb | user | api | exp call | user call | api get | time (s) | -|-------|------|-----|----------|-----------|---------|----------:| -| ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | 1.78 | -| ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | 3.45 | -| ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | 3.22 | -| ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | 298.48 | -| ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | 4.44 | -| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | 285.69 | -| ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | 4.91 | -| ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | 5.81 | -| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | 307.91 | - -Adding a fixture that just returns some value adds only minimal overhead (1.91s), -so the burden comes from establishing the database connection itself. - -We make the following observations: - -- Adding a database fixture adds the same overhead as instantiating an entirely new test. -- Overhead of adding multiple database fixtures is not free -- The `py_api` fixture adds two orders of magnitude more overhead - -We want our tests to be fast, so we want to avoid using these fixtures when we reasonably can. -We restrict usage of `py_api` fixtures to integration/migration tests, since it is very slow. -These only run on CI before merges. -For database fixtures - -We will write some fixtures that can be used to e.g., get a `User` without accessing the database. -The validity of these users will be tested against the database in only a single test. - -### Mocking -Mocking can help us reduce the reliance on database connections in tests. -A mocked function can prevent accessing the database, and instead return a predefined value instead. - -It has a few upsides: - - It's faster than using a database fixture (see below). - - The test is not dependent on the database: you can run the test without a database. 
- -But it also has downsides: - - Behavior changes in the database, such as schema changes, are not automatically reflected in the tests. - - The database layer (e.g., queries) are not actually tested. - -Basically, the mocked behavior may not match real behavior when executed on a database. -For this reason, for each mocked entity, we should add a test that verifies that if the database layer -is invoked with the database, it returns the expected output that matches the mock. -This is additional overhead in development, but hopefully it pays back in more granular test feedback and faster tests. - -On the speed of mocks, consider these two tests: - -```diff -@pytest.mark.parametrize('execution_number', range(5000)) -def test_private_dataset_owner_access( - execution_number, - admin, -+ mocker, -- expdb_test: Connection, +## 2. Integration Tests + +You SHOULD write integration tests for critical paths that exercise the full FastAPI stack. + +**When to write these:** Test expected paths that require the complete application stack (client, dependency injection, database). Limit to ONE test per kind of expected response. + +**Important:** Creating the FastAPI TestClient has significant overhead (~60ms per test). Therefore, you MUST limit integration tests to critical paths only. Use database or unit tests for additional coverage. + +### Example + +```python +async def test_flow_exists(flow: Flow, py_api: httpx.AsyncClient) -> None: + """Test flow exists - found case.""" + response = await py_api.get(f"/flows/exists/{flow.name}/{flow.external_version}") + assert response.status_code == HTTPStatus.OK + assert response.json() == {"flow_id": flow.id} + + +async def test_flow_exists_not_exists(py_api: httpx.AsyncClient) -> None: + """Test flow exists - not found case.""" + response = await py_api.get("/flows/exists/foo/bar") + assert response.status_code == HTTPStatus.NOT_FOUND +``` + +## 3. 
Database Tests + +You SHOULD write database tests to verify database query behavior directly. + +**When to write these:** Test database queries, ORM functions, and data retrieval logic. + +**Advantage:** Fast execution (~4ms per test), no FastAPI overhead. + +### Example + +```python +async def test_database_flow_exists(flow: Flow, expdb_test: AsyncConnection) -> None: + retrieved_flow = await database.flows.get_by_name(flow.name, flow.external_version, expdb_test) + assert retrieved_flow is not None + assert retrieved_flow.id == flow.id + + +async def test_database_flow_not_found(expdb_test: AsyncConnection) -> None: + retrieved_flow = await database.flows.get_by_name("foo", "bar", expdb_test) + assert retrieved_flow is None +``` + +### Database Fixtures and Rollback + +The `expdb_test` and `user_test` fixtures use automatic rollback: + +```python +@contextlib.asynccontextmanager +async def automatic_rollback(engine: AsyncEngine) -> AsyncIterator[AsyncConnection]: + async with engine.connect() as connection: + transaction = await connection.begin() + yield connection + if transaction.is_active: + await transaction.rollback() +``` + +**Behavior:** +- Default: Changes are rolled back after each test (safe, no cleanup needed) +- Use `persist=True` or `persisted_*` fixtures ONLY when data must be visible to external systems (e.g., PHP API) + +## 4. Unit Tests + +You SHOULD write unit tests to verify input/output processing and error handling. + +**When to write these:** Test edge cases, error handling, input validation, and output formatting. Use `pytest.mark.parametrize` to test multiple inputs efficiently. + +**Advantage:** Fast execution (~1ms per test), no database or API overhead. 
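+The parametrization advice above can be sketched with a pure input check. This is a minimal sketch: the `is_valid_external_version` helper is hypothetical (not part of the server code), standing in for any input-processing function.
+
+```python
+import re
+
+import pytest
+
+
+# Hypothetical validator, standing in for any pure input-processing helper.
+def is_valid_external_version(version: str) -> bool:
+    """Accept dotted numeric versions such as '1' or '1.0.2'."""
+    return re.fullmatch(r"\d+(\.\d+)*", version) is not None
+
+
+@pytest.mark.parametrize(
+    ("version", "expected"),
+    [
+        ("1", True),
+        ("1.0.2", True),
+        ("", False),
+        ("1..2", False),
+    ],
+)
+def test_is_valid_external_version(version: str, expected: bool) -> None:
+    # Each tuple runs as its own test case, with no database or API fixtures.
+    assert is_valid_external_version(version) is expected
+```
+
+Because each parametrized case adds only ~1ms, broad input coverage stays cheap at this level.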
+
+### Example
+
+```python
+@pytest.mark.parametrize(
+    ("name", "external_version"),
+    [("my_flow", "1.0"), ("another_flow", "2")],  # illustrative inputs
+)
+async def test_flow_exists_calls_db_correctly(
+    name: str,
+    external_version: str,
+    expdb_test: AsyncConnection,
+    mocker: MockerFixture,
 ) -> None:
-+    mock = mocker.patch('database.datasets.get')
-+    class Dataset(NamedTuple):
-+        uploader: int
-+        visibility: Visibility
-+    mock.return_value = Dataset(uploader=1, visibility=Visibility.PRIVATE)
-
-    _get_dataset_raise_otherwise(
-        dataset_id=1,
-        user=admin,
--        expdb=expdb_test,
-+        expdb=None,
+    mocked_db = mocker.patch("database.flows.get_by_name", new_callable=mocker.AsyncMock)
+    await flow_exists(name, external_version, expdb_test)
+    mocked_db.assert_called_once_with(
+        name=name,
+        external_version=external_version,
+        expdb=mocker.ANY,
     )
+
+
+async def test_flow_exists_handles_not_found(
+    mocker: MockerFixture, expdb_test: AsyncConnection
+) -> None:
+    mocker.patch("database.flows.get_by_name", return_value=None)
+    with pytest.raises(FlowNotFoundError) as error:
+        await flow_exists("foo", "bar", expdb_test)
+    assert error.value.status_code == HTTPStatus.NOT_FOUND
+```
+
+### When to Mock
+
+| Use Case | Approach |
+|----------|----------|
+| Testing input/output processing | Mock the database call |
+| Testing error handling | Mock return value to trigger error |
+| Testing database behavior | Use database tests instead |
+
+## Test Markers
+
+Use pytest markers to control test execution:
+
+- `@pytest.mark.mut` - Migration tests (run against both PHP and Python APIs)
+- Default - Fast unit/database/integration tests (run locally and on CI)
+
+```bash
+# Run only fast tests locally
+pytest -m "not mut"
+
+# Run only the migration tests
+pytest -m "mut"
 ```
-There is only a single database call in the test. It fetches a record on an indexed field and does not require any joins.
-Despite the database call being very light, the database-included test is ~50% slower than the mocked version (3.50s vs 5.04s).
+
+## Fixtures Reference
+
+| Fixture | Purpose | Rollback | Use When |
+|---------|---------|----------|----------|
+| `expdb_test` | DB connection to experiment db | Automatic | Testing DB queries |
+| `user_test` | DB connection to user db | Automatic | Testing user data |
+| `py_api` | FastAPI test client with DB | Automatic | Integration tests |
+| `php_api` | PHP API client | N/A | Migration tests |
+| `flow` | Creates temp flow in DB | Automatic | Unit/DB tests |
+| `persisted_flow` | Persists flow for both APIs | Manual cleanup | Migration tests |
+
+## Performance Data
+
+Understanding fixture overhead helps you write efficient tests.
+
+### Fixture Overhead (5000 iterations)
+
+| Fixtures Used | Time (s) |
+|---------------|----------|
+| None | 1.78 |
+| `expdb_test` | 3.45 |
+| `user_test` | 3.22 |
+| `py_api` | 298.48 |
+| `expdb_test` + `user_test` | 4.44 |
+| All three | 307.91 |
+
+### Key Observations
+
+- **Database fixtures** add roughly 0.3ms of overhead each ((3.45s - 1.78s) / 5000 runs)
+- **FastAPI TestClient** adds ~60ms overhead (two orders of magnitude more than DB)
+- Adding a second database fixture still adds overhead (4.44s for both vs 3.45s for one)
+- Use `pytest.mark.parametrize` at the database/unit level for multiple inputs
+
+### Guidelines
+
+1. DO use database tests with `pytest.mark.parametrize` for multiple inputs
+2. DO mock database calls in unit tests for speed
+3. DO limit integration tests to one per expected response type
+4. DO NOT use `py_api` fixture unless testing the full FastAPI stack
+5. DO NOT create a new database fixture per input in parameterized tests
\ No newline at end of file

From 00a1721b62ee2d2cc4363581dd765448fb204710 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue, 17 Mar 2026 16:28:21 +0000
Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 docs/contributing/tests.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/contributing/tests.md b/docs/contributing/tests.md
index 38c2fe97..ade0760f 100644
--- a/docs/contributing/tests.md
+++ b/docs/contributing/tests.md
@@ -226,4 +226,4 @@ Understanding fixture overhead helps you write efficient tests.
 2. DO mock database calls in unit tests for speed
 3. DO limit integration tests to one per expected response type
 4. DO NOT use `py_api` fixture unless testing the full FastAPI stack
-5. DO NOT create a new database fixture per input in parameterized tests
\ No newline at end of file
+5. DO NOT create a new database fixture per input in parameterized tests