feat: Add GET/POST /task/list endpoint #277
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,104 @@ | |
| import httpx | ||
|
|
||
|
|
||
| async def test_list_tasks_default(py_api: httpx.AsyncClient) -> None: | ||
| """Default call returns active tasks with correct shape.""" | ||
| response = await py_api.post("/tasks/list", json={}) | ||
| assert response.status_code == HTTPStatus.OK | ||
| tasks = response.json() | ||
| assert isinstance(tasks, list) | ||
| assert len(tasks) > 0 | ||
| assert all(task["status"] == "active" for task in tasks) | ||
| # verify shape of first task | ||
| task = tasks[0] | ||
| assert "task_id" in task | ||
| assert "task_type_id" in task | ||
| assert "task_type" in task | ||
| assert "did" in task | ||
| assert "name" in task | ||
| assert "format" in task | ||
| assert "status" in task | ||
| assert "input" in task | ||
| assert "quality" in task | ||
| assert "tag" in task | ||
|
|
||
|
|
||
| async def test_list_tasks_filter_type(py_api: httpx.AsyncClient) -> None: | ||
| """Filter by task_type_id returns only tasks of that type.""" | ||
| response = await py_api.post("/tasks/list", json={"task_type_id": 1}) | ||
| assert response.status_code == HTTPStatus.OK | ||
| tasks = response.json() | ||
| assert len(tasks) > 0 | ||
| assert all(t["task_type_id"] == 1 for t in tasks) | ||
|
|
||
|
|
||
| async def test_list_tasks_filter_tag(py_api: httpx.AsyncClient) -> None: | ||
| """Filter by tag returns only tasks with that tag.""" | ||
| response = await py_api.post("/tasks/list", json={"tag": "OpenML100"}) | ||
| assert response.status_code == HTTPStatus.OK | ||
| tasks = response.json() | ||
| assert len(tasks) > 0 | ||
| assert all("OpenML100" in t["tag"] for t in tasks) | ||
|
Comment on lines +38 to +44
Contributor
suggestion (testing): Add coverage for the remaining filters (for example `data_tag`, `task_id`, `data_id`, `data_name`, and `number_features`). Current tests exercise `task_type_id`, `tag`, pagination, and `number_instances`, but not these. Consider adding targeted tests such as the ones in the suggested implementation that follows.
Suggested implementation:

async def test_list_tasks_filter_type(py_api: httpx.AsyncClient) -> None:
    """Filter by task_type_id returns only tasks of that type."""
    response = await py_api.post("/tasks/list", json={"task_type_id": 1})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    assert all(t["task_type_id"] == 1 for t in tasks)

async def test_list_tasks_default(py_api: httpx.AsyncClient) -> None:
    """Default call returns active tasks with correct shape."""
    response = await py_api.post("/tasks/list", json={})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    assert isinstance(tasks, list)
    assert len(tasks) > 0
    # verify shape of first task
    task = tasks[0]

async def test_list_tasks_filter_data_tag(py_api: httpx.AsyncClient) -> None:
    """Filter by data_tag returns only tasks whose dataset has that tag."""
    # Get a tag actually present on at least one dataset
    response = await py_api.post("/tasks/list", json={})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    # Find first task that has at least one data tag
    tagged_task = next(
        t for t in tasks if t.get("data_tags") is not None and len(t.get("data_tags")) > 0
    )
    data_tag = tagged_task["data_tags"][0]
    response = await py_api.post("/tasks/list", json={"data_tag": data_tag})
    assert response.status_code == HTTPStatus.OK
    filtered = response.json()
    assert len(filtered) > 0
    assert all(
        data_tag in t.get("data_tags", []) for t in filtered
    ), "All returned tasks should have the dataset tag used for filtering"

async def test_list_tasks_filter_task_id(py_api: httpx.AsyncClient) -> None:
    """Filter by task_id returns exactly the requested tasks."""
    # Discover a small set of valid task IDs from the default listing
    response = await py_api.post("/tasks/list", json={})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    assert len(tasks) >= 2
    requested_ids = sorted(t["task_id"] for t in tasks[:3])
    response = await py_api.post("/tasks/list", json={"task_id": requested_ids})
    assert response.status_code == HTTPStatus.OK
    filtered = response.json()
    returned_ids = sorted(t["task_id"] for t in filtered)
    assert set(returned_ids) == set(
        requested_ids
    ), "Filtering by task_id should return exactly the requested tasks"
    # If the API defines a particular ordering, assert on that here as well
    # (e.g. sorted by task_id). The current check ensures no duplicates.
    assert len(returned_ids) == len(set(returned_ids))

async def test_list_tasks_filter_data_id(py_api: httpx.AsyncClient) -> None:
    """Filter by data_id returns only tasks from those datasets."""
    response = await py_api.post("/tasks/list", json={})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    assert len(tasks) > 0
    # Take a couple of distinct data_ids from the available tasks
    data_ids = []
    for t in tasks:
        if t["data_id"] not in data_ids:
            data_ids.append(t["data_id"])
        if len(data_ids) == 2:
            break
    response = await py_api.post("/tasks/list", json={"data_id": data_ids})
    assert response.status_code == HTTPStatus.OK
    filtered = response.json()
    assert len(filtered) > 0
    assert all(
        t["data_id"] in data_ids for t in filtered
    ), "All returned tasks should belong to one of the requested data_ids"

async def test_list_tasks_filter_data_name(py_api: httpx.AsyncClient) -> None:
    """Filter by data_name returns only tasks whose dataset has that name."""
    response = await py_api.post("/tasks/list", json={})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    assert len(tasks) > 0
    data_name = tasks[0]["data_name"]
    # Use the exact name observed in the listing
    response = await py_api.post("/tasks/list", json={"data_name": data_name})
    assert response.status_code == HTTPStatus.OK
    filtered = response.json()
    assert len(filtered) > 0
    assert all(
        t["data_name"] == data_name for t in filtered
    ), "All returned tasks should have the requested data_name"

async def test_list_tasks_filter_number_features_range(py_api: httpx.AsyncClient) -> None:
    """Filter by number_features quality range returns tasks within that range."""
    # Probe existing tasks to discover a concrete number_features value
    response = await py_api.post("/tasks/list", json={})
    assert response.status_code == HTTPStatus.OK
    tasks = response.json()
    # Find a task with a defined number_features value
    task_with_features = next(
        t for t in tasks if t.get("number_features") is not None
    )
    num_features = task_with_features["number_features"]
    # Build a narrow range around that concrete value to exercise the quality clause
    payload = {
        "number_features": {
            "min": num_features,
            "max": num_features,
        }
    }
    response = await py_api.post("/tasks/list", json=payload)
    assert response.status_code == HTTPStatus.OK
    filtered = response.json()
    assert len(filtered) > 0
    assert all(
        t.get("number_features") is not None
        and payload["number_features"]["min"]
        <= t["number_features"]
        <= payload["number_features"]["max"]
        for t in filtered
    ), "All returned tasks should have number_features within the requested range"

These tests assume:
To fully align with your existing API:
|
||
|
|
||
|
|
||
| async def test_list_tasks_pagination(py_api: httpx.AsyncClient) -> None: | ||
| """Pagination returns correct number of results.""" | ||
| limit = 5 | ||
| response = await py_api.post( | ||
| "/tasks/list", | ||
| json={"pagination": {"limit": limit, "offset": 0}}, | ||
| ) | ||
| assert response.status_code == HTTPStatus.OK | ||
| assert len(response.json()) == limit | ||
|
|
||
|
|
||
| async def test_list_tasks_pagination_offset(py_api: httpx.AsyncClient) -> None: | ||
| """Offset returns different results than no offset.""" | ||
| r1 = await py_api.post("/tasks/list", json={"pagination": {"limit": 5, "offset": 0}}) | ||
| r2 = await py_api.post("/tasks/list", json={"pagination": {"limit": 5, "offset": 5}}) | ||
| ids1 = [t["task_id"] for t in r1.json()] | ||
| ids2 = [t["task_id"] for t in r2.json()] | ||
| assert ids1 != ids2 | ||
sourcery-ai[bot] marked this conversation as resolved.
|
||
|
|
||
|
|
||
| async def test_list_tasks_number_instances_range(py_api: httpx.AsyncClient) -> None: | ||
| """number_instances range filter returns tasks whose dataset matches.""" | ||
| min_instances, max_instances = 100, 1000 | ||
| response = await py_api.post( | ||
| "/tasks/list", | ||
| json={"number_instances": f"{min_instances}..{max_instances}"}, | ||
| ) | ||
| assert response.status_code == HTTPStatus.OK | ||
| tasks = response.json() | ||
| assert len(tasks) > 0 | ||
| for task in tasks: | ||
| qualities = {q["name"]: q["value"] for q in task["quality"]} | ||
| assert min_instances <= float(qualities["NumberOfInstances"]) <= max_instances | ||
|
|
||
|
|
||
| async def test_list_tasks_no_results(py_api: httpx.AsyncClient) -> None: | ||
| """Nonexistent tag returns 404 NoResultsError.""" | ||
| response = await py_api.post("/tasks/list", json={"tag": "nonexistent_tag_xyz"}) | ||
| assert response.status_code == HTTPStatus.NOT_FOUND | ||
| assert response.headers["content-type"] == "application/problem+json" | ||
| error = response.json() | ||
| assert error["status"] == HTTPStatus.NOT_FOUND | ||
| assert error["code"] == "372" | ||
|
|
||
|
|
||
| async def test_list_tasks_get(py_api: httpx.AsyncClient) -> None: | ||
| """GET /tasks/list with no body also works.""" | ||
| response = await py_api.get("/tasks/list") | ||
| assert response.status_code == HTTPStatus.OK | ||
| assert isinstance(response.json(), list) | ||
|
|
||
|
|
||
| async def test_list_tasks_invalid_range_format(py_api: httpx.AsyncClient) -> None: | ||
| """Invalid number_instances range returns 422 validation error.""" | ||
| response = await py_api.post("/tasks/list", json={"number_instances": "1...2"}) | ||
| assert response.status_code == HTTPStatus.UNPROCESSABLE_ENTITY | ||
|
|
||
|
|
||
| async def test_get_task(py_api: httpx.AsyncClient) -> None: | ||
| response = await py_api.get("/tasks/59") | ||
| assert response.status_code == HTTPStatus.OK | ||
|
|
||
issue (complexity): Consider extracting small helper functions for IN-clause construction, range parsing, WHERE-clause assembly, the dataset-status subquery, and task enrichment to make `list_tasks` shorter and easier to follow.

You can keep the new behavior but reduce complexity by extracting a few focused helpers and reusing them.

1. Extract a generic `IN` clause builder

You repeat `",".join(...)` and f-string interpolation for IDs and names. A tiny helper keeps SQL construction consistent and reduces visual noise. This shrinks the `list_tasks` body and centralizes the `IN (...)` formatting.

2. Split range parsing from quality SQL rendering

`_quality_clause` currently both parses the range and renders the SQL. Splitting the two concerns makes each part easier to read and test: a small parser for the range string, then a generic quality clause builder.
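
A minimal sketch of that split, assuming ranges are written either as `min..max` (the form used in the `number_instances` test) or as a single integer (an assumption). The names `parse_range` and `quality_clause`, and the `data_quality` table with its `data`, `quality`, and `value` columns, are hypothetical and not taken from the PR.

```python
import re
from typing import Any, NamedTuple

# Accepts "100" or "100..1000"; anything else (e.g. "1...2") is rejected.
_RANGE = re.compile(r"(\d+)(?:\.\.(\d+))?")


class Range(NamedTuple):
    low: int
    high: int


def parse_range(text: str) -> Range:
    """Parse a range string without knowing anything about SQL."""
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid range: {text!r}")
    low = int(match.group(1))
    # A single value is treated as the degenerate range low..low (assumption).
    high = int(match.group(2)) if match.group(2) is not None else low
    return Range(low, high)


def quality_clause(quality: str, bounds: Range, key: str) -> tuple[str, dict[str, Any]]:
    """Render one quality filter as a SQL fragment plus its bound parameters.

    `key` only namespaces the parameter names so several clauses can coexist.
    The table and column names are hypothetical.
    """
    sql = (
        "d.did IN (SELECT dq.data FROM data_quality AS dq "
        f"WHERE dq.quality = :{key}_name "
        f"AND dq.value BETWEEN :{key}_low AND :{key}_high)"
    )
    params = {
        f"{key}_name": quality,
        f"{key}_low": bounds.low,
        f"{key}_high": bounds.high,
    }
    return sql, params
```

Keeping `parse_range` free of SQL means the invalid-format case can be tested on its own, independently of how the endpoint turns the error into a response.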

This leaves `list_tasks` with just the high-level mapping.

3. Extract WHERE-clause assembly into a helper

The top of `list_tasks` builds many related conditions inline. Move that to a dedicated function that returns both the SQL fragments and the parameter dict.
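
One possible shape for such a helper, shown as a sketch only. The `TaskFilters` model, the column names (`t.ttid`, `tt.tag`, `t.task_id`), and the helper names `in_clause`, `build_where`, and `render_where` are all hypothetical; the actual request schema and SQL in the PR may differ. The small `in_clause` function also illustrates the `IN` builder from point 1.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TaskFilters:
    """Hypothetical filter model; the real request schema may differ."""

    task_type_id: Optional[int] = None
    tag: Optional[str] = None
    task_id: list[int] = field(default_factory=list)


def in_clause(column: str, ids: list[int]) -> str:
    """Render `column IN (1,2,3)`; int() rejects non-numeric input."""
    return f"{column} IN ({','.join(str(int(i)) for i in ids)})"


def build_where(filters: TaskFilters) -> tuple[list[str], dict[str, Any]]:
    """Collect SQL fragments and their bound parameters in one place."""
    clauses: list[str] = []
    params: dict[str, Any] = {}
    if filters.task_type_id is not None:
        clauses.append("t.ttid = :task_type_id")
        params["task_type_id"] = filters.task_type_id
    if filters.tag is not None:
        clauses.append("tt.tag = :tag")
        params["tag"] = filters.tag
    if filters.task_id:
        clauses.append(in_clause("t.task_id", filters.task_id))
    return clauses, params


def render_where(clauses: list[str]) -> str:
    """Join fragments with AND, or return an empty string when there are none."""
    return ("WHERE " + " AND ".join(clauses)) if clauses else ""
```

Returning the fragments and parameters together keeps each condition next to the value it binds, which makes it harder for the two to drift apart.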

`list_tasks` reduces to calling that helper and running the query.

4. Extract the dataset-status subquery

Moving the status subquery out of the main query makes the main SQL much easier to read.
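
As an illustration, the subquery can live in a module-level constant that the main query interpolates by name. This is a sketch under assumptions: the `task` and `dataset_status` tables, their columns, and the "latest status per dataset wins" rule are hypothetical here, and SQLite is used only so the example runs on its own.

```python
import sqlite3

# Hypothetical schema: dataset_status(did, status, status_date).
# Selects the most recent status row for every dataset.
LATEST_STATUS_SUBQUERY = """
    SELECT ds1.did, ds1.status
    FROM dataset_status AS ds1
    WHERE ds1.status_date = (
        SELECT MAX(ds2.status_date)
        FROM dataset_status AS ds2
        WHERE ds2.did = ds1.did
    )
"""


def tasks_query(status_subquery: str = LATEST_STATUS_SUBQUERY) -> str:
    """Main query; the status logic is referenced instead of inlined."""
    return f"""
        SELECT t.task_id, s.status
        FROM task AS t
        JOIN ({status_subquery}) AS s ON s.did = t.did
        ORDER BY t.task_id
    """


def demo() -> list[tuple[int, str]]:
    """Run the query against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task (task_id INTEGER, did INTEGER);
        CREATE TABLE dataset_status (did INTEGER, status TEXT, status_date TEXT);
        INSERT INTO task VALUES (1, 10), (2, 20);
        INSERT INTO dataset_status VALUES
            (10, 'in_preparation', '2024-01-01'),
            (10, 'active', '2024-02-01'),
            (20, 'active', '2024-01-01'),
            (20, 'deactivated', '2024-03-01');
        """
    )
    rows = conn.execute(tasks_query()).fetchall()
    conn.close()
    return rows
```

Because the constant is plain SQL with no user input, interpolating it into the main query does not introduce an injection risk; user-supplied values should still go through bound parameters.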

Then, in `list_tasks`, the main query only needs to reference that subquery.

5. Unify enrichment of `tasks` (inputs/qualities/tags)

You can use a generic helper for the “append or init list” pattern to avoid repetition.
A single “ensure keys” helper then guarantees that every task has each of those list keys.
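
Both helpers can be sketched in a few lines. The names `append_to` and `ensure_keys` and the shape of the `tasks` mapping (task ID to task dict) are assumptions made for illustration; the keys `input`, `quality`, and `tag` are the ones the default-listing test checks for.

```python
from collections.abc import Iterable
from typing import Any

Tasks = dict[int, dict[str, Any]]


def append_to(tasks: Tasks, task_id: int, key: str, item: Any) -> None:
    """Append `item` to tasks[task_id][key], creating the list on first use."""
    tasks[task_id].setdefault(key, []).append(item)


def ensure_keys(tasks: Tasks, keys: Iterable[str]) -> None:
    """Give every task an empty list for each key it does not have yet."""
    keys = tuple(keys)  # allow generators to be iterated once per task
    for task in tasks.values():
        for key in keys:
            task.setdefault(key, [])


# Usage: enrich two tasks, then normalise their shape.
tasks: Tasks = {1: {"task_id": 1}, 2: {"task_id": 2}}
append_to(tasks, 1, "tag", "OpenML100")
append_to(tasks, 1, "tag", "study_14")
append_to(tasks, 1, "quality", {"name": "NumberOfInstances", "value": "150"})
ensure_keys(tasks, ("input", "quality", "tag"))
```

After the last call, task 2 has empty `input`, `quality`, and `tag` lists, so callers never need to check whether a key exists.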

These extractions keep all current behavior and SQL intact, but make `list_tasks` mostly a high-level orchestration over small, single-purpose helpers, which should address the complexity concerns.