Example notebooks demonstrating how to use Backblaze B2 Cloud Storage
with AI and data workflows. Each subdirectory is a self-contained example with its own
README.md, dependencies, and runnable notebook.
Train a PyTorch CIFAR-10 image classifier on data hosted in Backblaze B2. Demonstrates
a custom PyTorch Dataset that streams training images from a B2 bucket via the
S3-compatible API, learning to recognize the 10 CIFAR-10 categories
(airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, trucks).
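The streaming idea can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the class name `B2ImageDataset`, the `(key, label)` sample list, and the `decode` callback are all hypothetical. The class is written as a plain map-style dataset (`__len__` plus `__getitem__`), which `torch.utils.data.DataLoader` accepts, so the sketch has no hard PyTorch dependency. The `client` is assumed to be a boto3 S3 client pointed at a B2 endpoint.

```python
from typing import Any, Callable, List, Optional, Tuple


class B2ImageDataset:
    """Hypothetical map-style dataset that fetches one B2 object per sample."""

    def __init__(
        self,
        client: Any,                      # assumed: boto3 S3 client for a B2 endpoint
        bucket: str,
        samples: List[Tuple[str, int]],   # (object key, class label) pairs
        decode: Optional[Callable[[bytes], Any]] = None,
    ) -> None:
        self.client = client
        self.bucket = bucket
        self.samples = samples
        # Without a decoder, the raw object bytes are returned unchanged.
        self.decode = decode or (lambda raw: raw)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        key, label = self.samples[index]
        # One S3 GetObject call per sample; "Body" is a readable stream.
        response = self.client.get_object(Bucket=self.bucket, Key=key)
        raw = response["Body"].read()
        return self.decode(raw), label
```

In a notebook, `decode` would typically turn the bytes into an image tensor, and the dataset would then be handed to a `DataLoader` for batching.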
End-to-end Ray Train + Ray Tune example with checkpoints on
Backblaze B2. Reads California Housing parquet from a public B2 bucket, trains a small
PyTorch regression model with TorchTrainer, writes checkpoints back to a private B2
bucket via RunConfig(storage_path="s3://..."), and runs an optional Tuner sweep over
learning rates. Companion to the Ray Train persistent-storage user guide.
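As a rough sketch of the checkpoint wiring described above, the snippet below builds an `s3://` storage path and passes it to `RunConfig`. The helper names, the run name `b2-demo-run`, the `ray-results` prefix, and the worker count are hypothetical. It also assumes the B2 endpoint and credentials are already visible to Ray's storage layer through the environment; whether that works can depend on the installed Ray and pyarrow versions, so the example's own README.md remains the authority.

```python
from typing import Any, Callable


def b2_storage_path(bucket: str, prefix: str = "") -> str:
    """Build the s3:// URI that RunConfig(storage_path=...) expects."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def build_trainer(train_loop: Callable[..., Any], bucket: str,
                  prefix: str = "ray-results", num_workers: int = 2) -> Any:
    """Create a TorchTrainer whose checkpoints land in the given bucket."""
    # Imported lazily so the path helper works without Ray installed.
    from ray.train import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    return TorchTrainer(
        train_loop,
        scaling_config=ScalingConfig(num_workers=num_workers),
        run_config=RunConfig(
            name="b2-demo-run",  # hypothetical run name
            storage_path=b2_storage_path(bucket, prefix),
        ),
    )
```

Calling `build_trainer(train_loop, "<your private bucket>").fit()` would then run training and persist checkpoints under that path.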
Speech-to-text transcription on Backblaze B2 with OpenAI Whisper.
Streams a public-domain demo audio clip (jfk.flac) from a public B2 bucket via
the S3-compatible API, runs Whisper for ASR, and optionally writes the
transcript JSON back to a private B2 bucket. Starting point for batch
transcription pipelines on B2-hosted audio archives.
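The flow above might look roughly like this. It is a sketch under assumptions, not the notebook's code: the function names and the `base` model size are hypothetical, `client` is assumed to be a boto3 S3 client configured for B2, and the audio is copied to a temporary file because `transcribe` is given a file path here.

```python
import json
import os
import tempfile
from typing import Any, Dict


def fetch_to_tempfile(client: Any, bucket: str, key: str) -> str:
    """Stream an object to a temp file in 1 MiB chunks and return its path."""
    suffix = os.path.splitext(key)[1]  # keep ".flac" etc. for the decoder
    body = client.get_object(Bucket=bucket, Key=key)["Body"]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        for chunk in iter(lambda: body.read(1 << 20), b""):
            tmp.write(chunk)
    return tmp.name


def transcribe_object(client: Any, bucket: str, key: str,
                      model_name: str = "base") -> Dict[str, Any]:
    """Download one audio object and run Whisper on it."""
    import whisper  # the openai-whisper package, imported lazily

    path = fetch_to_tempfile(client, bucket, key)
    try:
        model = whisper.load_model(model_name)
        return model.transcribe(path)  # dict with "text", "segments", ...
    finally:
        os.remove(path)


def upload_transcript(client: Any, bucket: str, key: str,
                      result: Dict[str, Any]) -> None:
    """Write the transcript as JSON to a (private) bucket."""
    client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(result).encode("utf-8"),
        ContentType="application/json",
    )
```

For a batch pipeline, the same three steps would run in a loop over the keys returned by listing the audio bucket.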
Each example directory has its own README.md with detailed setup instructions, but in
short:
- In a browser, click one of the launch badges (Colab, Binder, Codespaces) on the example you want.
- Locally, clone this repo, cd into the example directory, and follow its README.md (typically pip install -r requirements.txt && jupyter lab <notebook>.ipynb).
Most examples need a Backblaze B2 application key. Generate one by following https://www.backblaze.com/docs/cloud-storage-application-keys, then export the values as the standard AWS-named environment variables (the AWS SDKs, including boto3, read these when connecting to B2's S3-compatible API):
export AWS_ENDPOINT_URL_S3="https://s3.<region>.backblazeb2.com" # region from B2 console
export AWS_ACCESS_KEY_ID="<your B2 application key ID>"
export AWS_SECRET_ACCESS_KEY="<your B2 application key>"

For Colab / Codespaces / Kaggle / Binder, see the per-example README.md for the secret-store path that fits each runtime.
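To show how these variables might be consumed from Python, here is a minimal sketch that validates them and builds a boto3 client explicitly. The function names and the `user_agent_extra` value are hypothetical, and passing the endpoint explicitly is one option among several, since a recent boto3 can also pick up `AWS_ENDPOINT_URL_S3` on its own.

```python
import os
from typing import Any, Dict, Mapping

REQUIRED_VARS = (
    "AWS_ENDPOINT_URL_S3",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def b2_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map the exported variables to boto3.client keyword arguments."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with a readable message instead of a signing error later.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "endpoint_url": env["AWS_ENDPOINT_URL_S3"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def make_b2_client(env: Mapping[str, str] = os.environ) -> Any:
    """Create an S3 client for B2; boto3 is imported lazily."""
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        config=Config(user_agent_extra="b2-example-notebook"),  # hypothetical tag
        **b2_client_kwargs(env),
    )
```

Validating up front keeps a missing secret from surfacing as an opaque authentication failure halfway through a notebook run.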
New examples are welcome. Each example lives in its own top-level directory and is expected to include:
- A descriptive README.md (purpose, how to run, secret setup if needed)
- The notebook(s) themselves, with launch badges that point at the path on main
- A requirements.txt (or equivalent) so the example is reproducible
- A .github/workflows/test-<name>.yml workflow that executes the notebook end-to-end against the shared backblaze-samples-ci bucket
See CLAUDE.md for the full conventions: writing style (no em
dashes), repo layout, two-bucket pattern, custom user_agent_extra on every
B2 boto3 client, headless-execution env vars, pre-commit hooks, and the
per-notebook CI workflow shape.
Before opening a PR, install and run pre-commit locally:
pip install pre-commit
pre-commit install
pre-commit run --all-files

CI runs the same hooks via .github/workflows/lint.yml on every push.
- Backblaze B2 Cloud Storage
- Backblaze developer docs
- More B2 sample apps across languages and frameworks: https://github.com/backblaze-b2-samples