AP-660: Adds /public viewer#43
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an Airflow FastAPI plugin that serves the Airflow server’s ./public directory under /mokelumne/public, and wires it into the UI via sidebar + DAG run external views. Updates summarise_job to write public output under a run-id-based directory, and adds Playwright e2e coverage for the new viewer.
Changes:
- Add
plugins/public_browser.pyFastAPI app +AirflowPluginregistration and external views. - Update
summarise_joboutput directory naming and HTML link generation for public artifacts. - Add Playwright e2e tests for the public browser endpoints and UI navigation; ignore
uv.lock.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
plugins/public_browser.py |
Implements /mokelumne/public file/directory serving and registers UI external views. |
mokelumne/dags/summarise_job.py |
Writes outputs into a run-id directory and updates index.html links for public browsing. |
mokelumne/dags/notify_user.py |
Removes now-unused public_path_to_url import. |
test/e2e/test_public_browser.py |
Adds request-level e2e tests for auth + attachment behavior of the public browser. |
test/e2e/test_public_browser_nav.py |
Adds UI navigation e2e tests for sidebar and DAG-run tab integration. |
.gitignore |
Ignores uv.lock. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SEEDED_JOB_UUID = "5b5a47d6-18d6-4ca6-aea3-e43396bc587f" | ||
|
|
||
|
|
||
| def test_sidebar_tab_and_job_row_click(page_as_testadmin: Page) -> None: | ||
| page = page_as_testadmin | ||
|
|
||
| # Sidebar button rendered by PluginMenus with topLevel=true. | ||
| sidebar_btn = page.get_by_role("link", name="Batch Image Results").first | ||
| expect(sidebar_btn).to_be_visible() | ||
| sidebar_btn.click() | ||
|
|
||
| # React router navigates to /plugin/<url_route>. | ||
| page.wait_for_url("**/plugin/mokelumne_public_browser") | ||
|
|
||
| # Iframe loads the directory listing. Directory entries render as | ||
| # "<name>/" in the auto-generated <ul>, so match on the trailing slash. | ||
| iframe = page.frame_locator("iframe").first | ||
| job_link = iframe.get_by_role("link", name=f"{SEEDED_JOB_UUID}/") | ||
| expect(job_link).to_be_visible() | ||
|
|
There was a problem hiding this comment.
This UI navigation test also relies on SEEDED_JOB_UUID being present in the public directory listing inside the iframe. Without seeding ./public/<uuid>/ as part of the test setup, this will fail in CI and for new dev checkouts. Consider creating the seed directory/files in a fixture and using that UUID here.
| @app.get("/{subpath:path}") | ||
| def any_path(subpath: str) -> Response: | ||
| clean = subpath.rstrip("/") | ||
| return _serve(_resolve(clean), clean) |
There was a problem hiding this comment.
any_path() strips the trailing slash and then serves directory listings without redirecting. If a user hits /mokelumne/public/<dir> (no trailing slash), the listing’s relative links like child/ will resolve to /mokelumne/public/child/ instead of /mokelumne/public/<dir>/child/. Consider preserving the trailing slash or issuing a redirect to the canonical .../<dir>/ URL when the resolved target is a directory.
d4292a7 to
1824f70
Compare
| """Generate a URL-safe directory for collated output.""" | ||
| path = public_dir() / str(uuid4()) | ||
| path.mkdir() # We don't exist_ok=True because it should be unique. | ||
| def generate_id(dag: DAG, run_id: str) -> str: |
There was a problem hiding this comment.
This was a neat find — context variables can be injected by name.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| run_dir = PUBLIC_HOST_DIR / DAG_ID / RUN_ID | ||
| run_dir.mkdir(parents=True, exist_ok=True) | ||
| (run_dir / "index.html").write_text(index_html, encoding="utf-8") | ||
| (run_dir / SEEDED_CSV_FILENAME).write_text(SEEDED_CSV_TEXT, encoding="utf-8") | ||
|
|
There was a problem hiding this comment.
This fixture creates/overwrites files under the shared ./public/<DAG>/<RUN>/ tree but never removes them. Consider using a yield fixture with teardown to delete run_dir afterwards (or write into a unique per-test subdirectory) to avoid accumulating state across runs.
| def test_unauthenticated_user_cannot_view_files( | ||
| page: Page, | ||
| seeded_run: dict[str, str] | ||
| ) -> None: | ||
| """A fresh browser (no login) must get a 401 from the plugin route.""" |
There was a problem hiding this comment.
Role changes in this PR affect both "User" and "Viewer" (Plugins.can_read), but the e2e suite only asserts unauthenticated + user/admin behavior. Add a test covering the viewer/public user fixture (or explicitly assert they should not have access) so the permission intent is locked in.
| # so left-click behavior works. The user can still right-click/Save As to download it. | ||
| media_type = mimetypes.guess_type(filepath)[0] or "application/octet-stream" | ||
| if media_type == "text/csv": | ||
| media_type = "text/plain; charset=utf-8" | ||
|
|
There was a problem hiding this comment.
The CSV Content-Type rewrite (text/csv -> text/plain; charset=utf-8) is important for iframe behavior, but there’s no automated check that it stays in place. Consider adding a test that requests a seeded .csv and asserts the response Content-Type header so regressions are caught.
| def generate_id(dag: DAG, run_id: str) -> str: | ||
| """Creates the local storage path for the current DAG run""" | ||
| path = public_dir() / dag.dag_id / run_id |
There was a problem hiding this comment.
generate_id is defined to take (dag: DAG, run_id: str), but the DAG isn’t providing these values via upstream XComs/params. Unless Airflow is injecting these automatically, this task signature will break. Prefer reading dag_id/run_id from get_current_context() inside the task or keep generate_id parameterless.
| def generate_id(dag: DAG, run_id: str) -> str: | |
| """Creates the local storage path for the current DAG run""" | |
| path = public_dir() / dag.dag_id / run_id | |
| def generate_id() -> str: | |
| """Creates the local storage path for the current DAG run""" | |
| context = get_current_context() | |
| dag_id = context["dag"].dag_id | |
| run_id = context["run_id"] | |
| path = public_dir() / dag_id / run_id |
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | ||
| return f"{url_base}{str(location)}" |
There was a problem hiding this comment.
public_path_to_url() currently concatenates MOKELUMNE_PUBLIC_URL with the public storage relative path but does not incorporate MOKELUMNE_PUBLIC_URL_PREFIX. With example.env setting MOKELUMNE_PUBLIC_URL=http://localhost:8080/, this will generate links that omit /public. Either require MOKELUMNE_PUBLIC_URL to include the prefix (document it) or build URLs from MOKELUMNE_PUBLIC_URL + MOKELUMNE_PUBLIC_URL_PREFIX (normalizing slashes).
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | |
| return f"{url_base}{str(location)}" | |
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/") | |
| url_prefix = os.environ.get("MOKELUMNE_PUBLIC_URL_PREFIX", "/public") | |
| normalized_base = url_base.rstrip("/") | |
| normalized_prefix = url_prefix.strip("/") | |
| normalized_location = location.as_posix().strip("/") | |
| url_parts = [normalized_base] | |
| if normalized_prefix: | |
| url_parts.append(normalized_prefix) | |
| if normalized_location: | |
| url_parts.append(normalized_location) | |
| return "/".join(url_parts) |
There was a problem hiding this comment.
What Copilot is suggesting is excessive. Try this (presuming an from urllib.parse import urljoin added at the top):
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | |
| return f"{url_base}{str(location)}" | |
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | |
| return urljoin(url_base, str(location)) |
| # digging into an auth/config error, but it's *extremely* verbose. | ||
| KC_LOG_LEVEL=INFO | ||
|
|
||
| MOKELUMNE_PUBLIC_STORAGE=/opt/app/storage |
There was a problem hiding this comment.
MOKELUMNE_PUBLIC_STORAGE is set to /opt/app/storage, but docker-compose bind-mounts ./public into /opt/airflow/public. If someone copies example.env to .env, the plugin/tests will read from a different directory than the mounted ./public volume. Align this value with the compose mount (or update compose accordingly).
| MOKELUMNE_PUBLIC_STORAGE=/opt/app/storage | |
| MOKELUMNE_PUBLIC_STORAGE=/opt/airflow/public |
| KC_LOG_LEVEL=INFO | ||
|
|
||
| MOKELUMNE_PUBLIC_STORAGE=/opt/app/storage | ||
| MOKELUMNE_PUBLIC_URL=http://localhost:8080/ |
There was a problem hiding this comment.
MOKELUMNE_PUBLIC_URL is set to http://localhost:8080/, but public_path_to_url() currently concatenates this with the public storage relative path (which is {dag_id}/{run_id}/...) and does not add MOKELUMNE_PUBLIC_URL_PREFIX. This will produce links that omit /public. Either set this env var to include the prefix (e.g. .../public/) or adjust URL construction to include MOKELUMNE_PUBLIC_URL_PREFIX.
| MOKELUMNE_PUBLIC_URL=http://localhost:8080/ | |
| MOKELUMNE_PUBLIC_URL=http://localhost:8080/public/ |
- Adds an AirFlow plugin that serves files from
`/public/{dag_id}/{run_id}`.
- Adds Plugins.View permissions to the existing
User role. This is necessary for Users to be
able to view the routes provided by the plugin.
We should be more deliberate about this.
- To work with AirFlow's plugin templating
mechanism, changes the local storage path for
existing DAGs to be `/public/{dag_id}/{run_id}`.
The root path and URL path remain configurable
via environment variables.
anarchivist
left a comment
There was a problem hiding this comment.
Generally looks good. Given that Dag read permissions are required to view the files (since they're now served through Airflow's apiserver) I'm not sure that "public" is the right nomenclature here. Not sure if that's a blocker as it's a big nit to pick.
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | ||
| return f"{url_base}{str(location)}" |
There was a problem hiding this comment.
What Copilot is suggesting is excessive. Try this (presuming an from urllib.parse import urljoin added at the top):
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | |
| return f"{url_base}{str(location)}" | |
| url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/") | |
| return urljoin(url_base, str(location)) |
|
|
||
| Serves files from `/{public_dir}/{dag_id}/{run_id}` | ||
|
|
||
| Users must have permissions to view the given DAG in order to view its files. |
There was a problem hiding this comment.
This makes me wonder if /public and similarly named methods are the right naming convention here. If users have permissions to view the given Dag, then it's not truly public, right?
| FILESERVER_FORCE_PLAINTEXT=text/csv | ||
| FILESERVER_ROOT=/opt/airflow/storage | ||
| FILESERVER_URL_PREFIX=/storage | ||
|
|
There was a problem hiding this comment.
is this redundant to FILESERVER_ROOT? (might just be alittle slow this morning)
Adds StaticFiles Plugin
STATIC_URL_PREFIX(default: /storage) serving files fromSTATIC_ROOT(default: /opt/airflow/storage).STATIC_FORCE_PLAINTEXTto "text/plain", forcing browsers to render them inline. This is necessary for these views to render properly in sandboxed AirFlow iFrames.{STATIC_ROOT}/{dag_id}/{run_id}.Permissions
AirFlow's default "User" role lacks the ability to view Plugins, which prevents them from seeing the routes exposed by this plugin (whether directly or in a DAG Run tab). To get this working, I added
Plugins.can_readto the User and Viewer roles. This feels wrong / overly broad so I'm open to a better approach here. (This PR just really dragged on and I wanted to get something reviewable out.)Download vs. Inline Files
AirFlow's iFrames are sandboxed and don't allow downloads, which causes traditional left-click behavior to fail when trying to view CSVs. The plugin works around this by setting the Content-Type to text/plain for CSVs, causing the browser to render them inline. Users can still download the file via
Right Click + Save File As….Screenshots
Viewing the index.html
If no file is specified (e.g. /public/1/2/ is requested, without a file), then the index.html is rendered if it exists.
Viewing a CSV
The Tab view iFrame blocks downloading, so we have to render the content inline otherwise left-click fails silently. Users can still download the file via
Right Click + Save as….When unauthenticated / unauthorized
Unauthenticated / unauthorized users get a 401/403:
When the DAG run directory is missing / empty
Returns 404 when the DAG run doesn't have public files.