DOC update remaining gallery examples to load datasets from paths #1964

Merged
rcap107 merged 1 commit into skrub-data:main from MuditAtrey:doc/update-remaining-gallery-examples-to-use-paths
Mar 23, 2026

Conversation

@MuditAtrey
Contributor

Description

Part of #1934.

Updates the remaining gallery examples to load datasets from file paths
instead of accessing dataframes directly from the bunch object, following
the pattern established in #1852 and #1940.
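
As a rough illustration of the pattern (not the actual diff), the sketch below builds a stand-in bunch with `types.SimpleNamespace` and a temporary CSV, then reads the data from the stored path rather than from an in-memory dataframe attribute. The `make_fake_bunch` and `load_from_path` helpers, the toy rows, and the use of the standard-library `csv` module are assumptions made to keep the example self-contained; the real gallery examples use skrub's fetchers and a dataframe library.

```python
import csv
import tempfile
from pathlib import Path
from types import SimpleNamespace


def make_fake_bunch(directory):
    """Hypothetical stand-in for a dataset bunch: writes a toy CSV and exposes its path."""
    path = Path(directory) / "employee_salaries.csv"
    # Made-up placeholder rows, not real dataset values.
    rows = [
        {"department": "A", "salary": "100"},
        {"department": "B", "salary": "200"},
    ]
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["department", "salary"])
        writer.writeheader()
        writer.writerows(rows)
    return SimpleNamespace(employee_salaries_path=str(path))


def load_from_path(path):
    """Read the dataset from disk, the way a user would load their own file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    bunch = make_fake_bunch(tmp)
    # Load from the path stored on the bunch instead of a dataframe attribute.
    records = load_from_path(bunch.employee_salaries_path)
```

Reading from the path keeps the example closer to what a reader would do with their own data on disk.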

Files changed:

  • examples/FIXME/07_grid_searching_with_the_tablevectorizer.py
  • examples/data_ops/1110_data_ops_intro.py

Note: examples/FIXME/08_join_aggregation_full.py uses fetch_figshare,
which is no longer available in the codebase, so I left it untouched.

Checklist

  • I have read the contributing guidelines
  • I have added tests that verify the bug fix
  • I have added an entry to CHANGES.rst describing the fix
  • My code follows the code style of this project
  • I have checked my code and corrected any misspellings

How Has This Been Tested?

pre-commit run --all-files passes cleanly.
examples/FIXME/07_grid_searching_with_the_tablevectorizer.py runs without errors.
In examples/data_ops/1110_data_ops_intro.py, the data loads correctly from paths;
a preexisting downstream error (in the ML pipeline) is unrelated to this PR.

AI Disclosure

  • This PR contains AI-generated code
  • I have tested the code generated in my PR
  • I have read and understood every line that has been generated by the AI agent
  • I can explain what the AI-generated code does

MuditAtrey force-pushed the doc/update-remaining-gallery-examples-to-use-paths branch from af7f6c7 to c24212c on March 14, 2026 14:46
MuditAtrey marked this pull request as draft on March 14, 2026 14:54
@MuditAtrey
Contributor Author

@rcap107 while updating 1110_data_ops_intro.py, I found that fetch_employee_salaries(split="test").employee_salaries_path points to an empty CSV: the dataframe is sliced to [id_split:] on line 71 and then sliced again when writing to the path on line 77. The fix looks straightforward, but I wanted to confirm before touching _fetching.py.
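
A toy sketch of why slicing twice can leave nothing to write, using plain lists instead of a dataframe. The values of `rows` and `id_split` are hypothetical, and the sketch assumes the second slice reuses the same index; the actual code in _fetching.py may differ.

```python
rows = list(range(10))  # toy stand-in for the full dataset
id_split = 8            # hypothetical split index

# First slice: keep only the "test" portion (the last two rows here).
test_rows = rows[id_split:]

# Second slice with the same index, applied to the already-sliced data:
# the index now exceeds the length, so the result is empty.
written_rows = test_rows[id_split:]

# Slicing only once preserves the test rows that should be written.
fixed_rows = test_rows
```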

Comment thread CHANGES.rst Outdated
MuditAtrey force-pushed the doc/update-remaining-gallery-examples-to-use-paths branch from c24212c to 02406ae on March 17, 2026 17:47
rcap107 added this to the Release 0.8.1 milestone on Mar 18, 2026
@rcap107
Member

rcap107 commented Mar 20, 2026

> @rcap107 while updating 1110_data_ops_intro.py, I found that fetch_employee_salaries(split="test").employee_salaries_path points to an empty CSV: the dataframe is sliced to [id_split:] on line 71 and then sliced again when writing to the path on line 77. The fix looks straightforward, but I wanted to confirm before touching _fetching.py.

Hi @MuditAtrey, yes you're right, there is an issue there. Please fix it and thanks

MuditAtrey force-pushed the doc/update-remaining-gallery-examples-to-use-paths branch from 02406ae to b0ef47d on March 23, 2026 15:57
@rcap107
Member

rcap107 commented Mar 23, 2026

Hi @MuditAtrey, thanks for the fix. Do you still need to do something with the PR, or can we switch it to ready for review?

@MuditAtrey
Contributor Author

@rcap107 I was about to ask you the same. Please switch it to ready.

rcap107 marked this pull request as ready for review on March 23, 2026 17:20
Member

rcap107 left a comment

Looks good to me, thanks a lot @MuditAtrey

rcap107 merged commit 0f42a7e into skrub-data:main on Mar 23, 2026
29 checks passed
Development

Successfully merging this pull request may close these issues.

DOC - Update the examples in the gallery so that they load datasets from their path

2 participants