Skip to content

AP-660: Adds /public viewer#43

Open
danschmidt5189 wants to merge 6 commits intomainfrom
AP-660
Open

AP-660: Adds /public viewer#43
danschmidt5189 wants to merge 6 commits intomainfrom
AP-660

Conversation

@danschmidt5189
Copy link
Copy Markdown
Member

@danschmidt5189 danschmidt5189 commented Apr 18, 2026

Adds StaticFiles Plugin

  • Configures FastApi.StaticFiles as a plugin mounted at STATIC_URL_PREFIX (default: /storage) serving files from STATIC_ROOT (default: /opt/airflow/storage).
  • Adds custom middleware to map responses with a Content Type in STATIC_FORCE_PLAINTEXT to "text/plain", forcing browsers to render them inline. This is necessary for these views to render properly in sandboxed AirFlow iFrames.
  • Adds a "Files" tab to the DAG Run view showing files at {STATIC_ROOT}/{dag_id}/{run_id}.

Permissions

AirFlow's default "User" role lacks the ability to view Plugins, which prevents them from seeing the routes exposed by this plugin (whether directly or in a DAG Run tab). To get this working, I added Plugins.can_read to the User and Viewer roles. This feels wrong / overly broad so I'm open to a better approach here. (This PR just really dragged on and I wanted to get something reviewable out.)

Download vs. Inline Files

AirFlow's iFrames are sandboxed and don't allow downloads, which causes traditional left-click behavior to fail when trying to view CSVs. The plugin works around this by setting the Content-Type to text/plain for CSVs, causing the browser to render them inline. Users can still download the file via Right Click + Save File As….

Screenshots

Viewing the index.html

If no file is specified (e.g. /public/1/2/ is requested, without a file), then the index.html is rendered if it exists.

Screenshot 2026-04-24 at 11 19 12 AM

Viewing a CSV

The Tab view iFrame blocks downloading, so we have to render the content inline otherwise left-click fails silently. Users can still download the file via Right Click + Save as….

Screenshot 2026-04-24 at 11 19 16 AM

When unauthenticated / unauthorized

Unauthenticated / unauthorized users get a 401/403:

Screenshot 2026-04-24 at 11 19 58 AM

When the DAG run directory is missing / empty

Returns 404 when the DAG run doesn't have public files.

Screenshot 2026-04-24 at 11 23 36 AM

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an Airflow FastAPI plugin that serves the Airflow server’s ./public directory under /mokelumne/public, and wires it into the UI via sidebar + DAG run external views. Updates summarise_job to write public output under a run-id-based directory, and adds Playwright e2e coverage for the new viewer.

Changes:

  • Add plugins/public_browser.py FastAPI app + AirflowPlugin registration and external views.
  • Update summarise_job output directory naming and HTML link generation for public artifacts.
  • Add Playwright e2e tests for the public browser endpoints and UI navigation; ignore uv.lock.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
plugins/public_browser.py Implements /mokelumne/public file/directory serving and registers UI external views.
mokelumne/dags/summarise_job.py Writes outputs into a run-id directory and updates index.html links for public browsing.
mokelumne/dags/notify_user.py Removes now-unused public_path_to_url import.
test/e2e/test_public_browser.py Adds request-level e2e tests for auth + attachment behavior of the public browser.
test/e2e/test_public_browser_nav.py Adds UI navigation e2e tests for sidebar and DAG-run tab integration.
.gitignore Ignores uv.lock.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/e2e/test_public_browser_nav.py Outdated
Comment on lines +13 to +32
SEEDED_JOB_UUID = "5b5a47d6-18d6-4ca6-aea3-e43396bc587f"


def test_sidebar_tab_and_job_row_click(page_as_testadmin: Page) -> None:
page = page_as_testadmin

# Sidebar button rendered by PluginMenus with topLevel=true.
sidebar_btn = page.get_by_role("link", name="Batch Image Results").first
expect(sidebar_btn).to_be_visible()
sidebar_btn.click()

# React router navigates to /plugin/<url_route>.
page.wait_for_url("**/plugin/mokelumne_public_browser")

# Iframe loads the directory listing. Directory entries render as
# "<name>/" in the auto-generated <ul>, so match on the trailing slash.
iframe = page.frame_locator("iframe").first
job_link = iframe.get_by_role("link", name=f"{SEEDED_JOB_UUID}/")
expect(job_link).to_be_visible()

Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This UI navigation test also relies on SEEDED_JOB_UUID being present in the public directory listing inside the iframe. Without seeding ./public/<uuid>/ as part of the test setup, this will fail in CI and for new dev checkouts. Consider creating the seed directory/files in a fixture and using that UUID here.

Copilot uses AI. Check for mistakes.
Comment thread mokelumne/dags/summarise_job.py Outdated
Comment thread plugins/public_browser.py Outdated
Comment on lines +86 to +89
@app.get("/{subpath:path}")
def any_path(subpath: str) -> Response:
clean = subpath.rstrip("/")
return _serve(_resolve(clean), clean)
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any_path() strips the trailing slash and then serves directory listings without redirecting. If a user hits /mokelumne/public/<dir> (no trailing slash), the listing’s relative links like child/ will resolve to /mokelumne/public/child/ instead of /mokelumne/public/<dir>/child/. Consider preserving the trailing slash or issuing a redirect to the canonical .../<dir>/ URL when the resolved target is a directory.

Copilot uses AI. Check for mistakes.
Comment thread plugins/public_browser.py Outdated
Comment thread test/e2e/test_public_browser_nav.py Outdated
Comment thread test/e2e/test_dag_run_file_browser.py Outdated
@danschmidt5189 danschmidt5189 force-pushed the AP-660 branch 2 times, most recently from d4292a7 to 1824f70 Compare April 24, 2026 18:19
"""Generate a URL-safe directory for collated output."""
path = public_dir() / str(uuid4())
path.mkdir() # We don't exist_ok=True because it should be unique.
def generate_id(dag: DAG, run_id: str) -> str:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a neat find — context variables can be injected by name.

Comment thread plugins/oidc_security_manager.py
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +56 to +60
run_dir = PUBLIC_HOST_DIR / DAG_ID / RUN_ID
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "index.html").write_text(index_html, encoding="utf-8")
(run_dir / SEEDED_CSV_FILENAME).write_text(SEEDED_CSV_TEXT, encoding="utf-8")

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture creates/overwrites files under the shared ./public/<DAG>/<RUN>/ tree but never removes them. Consider using a yield fixture with teardown to delete run_dir afterwards (or write into a unique per-test subdirectory) to avoid accumulating state across runs.

Copilot uses AI. Check for mistakes.
Comment on lines +70 to +74
def test_unauthenticated_user_cannot_view_files(
page: Page,
seeded_run: dict[str, str]
) -> None:
"""A fresh browser (no login) must get a 401 from the plugin route."""
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Role changes in this PR affect both "User" and "Viewer" (Plugins.can_read), but the e2e suite only asserts unauthenticated + user/admin behavior. Add a test covering the viewer/public user fixture (or explicitly assert they should not have access) so the permission intent is locked in.

Copilot uses AI. Check for mistakes.
Comment thread plugins/dag_run_file_browser.py Outdated
Comment on lines +41 to +45
# so left-click behavior works. The user can still right-click/Save As to download it.
media_type = mimetypes.guess_type(filepath)[0] or "application/octet-stream"
if media_type == "text/csv":
media_type = "text/plain; charset=utf-8"

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSV Content-Type rewrite (text/csv -> text/plain; charset=utf-8) is important for iframe behavior, but there’s no automated check that it stays in place. Consider adding a test that requests a seeded .csv and asserts the response Content-Type header so regressions are caught.

Copilot uses AI. Check for mistakes.
Comment on lines +27 to +29
def generate_id(dag: DAG, run_id: str) -> str:
"""Creates the local storage path for the current DAG run"""
path = public_dir() / dag.dag_id / run_id
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate_id is defined to take (dag: DAG, run_id: str), but the DAG isn’t providing these values via upstream XComs/params. Unless Airflow is injecting these automatically, this task signature will break. Prefer reading dag_id/run_id from get_current_context() inside the task or keep generate_id parameterless.

Suggested change
def generate_id(dag: DAG, run_id: str) -> str:
"""Creates the local storage path for the current DAG run"""
path = public_dir() / dag.dag_id / run_id
def generate_id() -> str:
"""Creates the local storage path for the current DAG run"""
context = get_current_context()
dag_id = context["dag"].dag_id
run_id = context["run_id"]
path = public_dir() / dag_id / run_id

Copilot uses AI. Check for mistakes.
Comment thread mokelumne/util/storage.py Outdated
Comment on lines 46 to 47
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return f"{url_base}{str(location)}"
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public_path_to_url() currently concatenates MOKELUMNE_PUBLIC_URL with the public storage relative path but does not incorporate MOKELUMNE_PUBLIC_URL_PREFIX. With example.env setting MOKELUMNE_PUBLIC_URL=http://localhost:8080/, this will generate links that omit /public. Either require MOKELUMNE_PUBLIC_URL to include the prefix (document it) or build URLs from MOKELUMNE_PUBLIC_URL + MOKELUMNE_PUBLIC_URL_PREFIX (normalizing slashes).

Suggested change
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return f"{url_base}{str(location)}"
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/")
url_prefix = os.environ.get("MOKELUMNE_PUBLIC_URL_PREFIX", "/public")
normalized_base = url_base.rstrip("/")
normalized_prefix = url_prefix.strip("/")
normalized_location = location.as_posix().strip("/")
url_parts = [normalized_base]
if normalized_prefix:
url_parts.append(normalized_prefix)
if normalized_location:
url_parts.append(normalized_location)
return "/".join(url_parts)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Copilot is suggesting is excessive. Try this (presuming an from urllib.parse import urljoin added at the top):

Suggested change
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return f"{url_base}{str(location)}"
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return urljoin(url_base, str(location))

Comment thread example.env
# digging into an auth/config error, but it's *extremely* verbose.
KC_LOG_LEVEL=INFO

MOKELUMNE_PUBLIC_STORAGE=/opt/app/storage
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MOKELUMNE_PUBLIC_STORAGE is set to /opt/app/storage, but docker-compose bind-mounts ./public into /opt/airflow/public. If someone copies example.env to .env, the plugin/tests will read from a different directory than the mounted ./public volume. Align this value with the compose mount (or update compose accordingly).

Suggested change
MOKELUMNE_PUBLIC_STORAGE=/opt/app/storage
MOKELUMNE_PUBLIC_STORAGE=/opt/airflow/public

Copilot uses AI. Check for mistakes.
Comment thread example.env
KC_LOG_LEVEL=INFO

MOKELUMNE_PUBLIC_STORAGE=/opt/app/storage
MOKELUMNE_PUBLIC_URL=http://localhost:8080/
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MOKELUMNE_PUBLIC_URL is set to http://localhost:8080/, but public_path_to_url() currently concatenates this with the public storage relative path (which is {dag_id}/{run_id}/...) and does not add MOKELUMNE_PUBLIC_URL_PREFIX. This will produce links that omit /public. Either set this env var to include the prefix (e.g. .../public/) or adjust URL construction to include MOKELUMNE_PUBLIC_URL_PREFIX.

Suggested change
MOKELUMNE_PUBLIC_URL=http://localhost:8080/
MOKELUMNE_PUBLIC_URL=http://localhost:8080/public/

Copilot uses AI. Check for mistakes.
- Adds an AirFlow plugin that serves files from
  `/public/{dag_id}/{run_id}`.
- Adds Plugins.View permissions to the existing
  User role. This is necessary for Users to be
  able to view the routes provided by the plugin.
  We should be more deliberate about this.
- To work with AirFlow's plugin templating
  mechanism, changes the local storage path for
  existing DAGs to be `/public/{dag_id}/{run_id}`.
  The root path and URL path remain configurable
  via environment variables.
Copy link
Copy Markdown
Member

@anarchivist anarchivist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Given that Dag read permissions are required to view the files (since they're now served through Airflow's apiserver) I'm not sure that "public" is the right nomenclature here. Not sure if that's a blocker as it's a big nit to pick.

Comment thread mokelumne/util/storage.py Outdated
Comment on lines 46 to 47
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return f"{url_base}{str(location)}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Copilot is suggesting is excessive. Try this (presuming an from urllib.parse import urljoin added at the top):

Suggested change
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return f"{url_base}{str(location)}"
url_base = os.environ.get("MOKELUMNE_PUBLIC_URL", "http://localhost:8080/public/")
return urljoin(url_base, str(location))

Comment thread plugins/dag_run_file_browser.py Outdated

Serves files from `/{public_dir}/{dag_id}/{run_id}`

Users must have permissions to view the given DAG in order to view its files.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder if /public and similarly named methods are the right naming convention here. If users have permissions to view the given Dag, then it's not truly public, right?

Copy link
Copy Markdown
Member

@anarchivist anarchivist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+ with a clean rebase and tests passing. just trying to clarify the multiple env variables that seem to have the same value.

Comment thread example.env
FILESERVER_FORCE_PLAINTEXT=text/csv
FILESERVER_ROOT=/opt/airflow/storage
FILESERVER_URL_PREFIX=/storage

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this redundant to FILESERVER_ROOT? (might just be alittle slow this morning)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants