Backblaze B2 sample notebooks

Example notebooks demonstrating how to use Backblaze B2 Cloud Storage with AI and data workflows. Each subdirectory is a self-contained example with its own README.md, dependencies, and runnable notebook.

Examples

Train a PyTorch CIFAR-10 image classifier on data hosted in Backblaze B2. Demonstrates a custom PyTorch Dataset that streams training images from a B2 bucket via the S3-compatible API. The model learns to recognize the 10 CIFAR-10 categories (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, trucks).

Open In Colab Open In Binder Open in GitHub Codespaces

End-to-end Ray Train + Ray Tune example with checkpoints on Backblaze B2. Reads California Housing parquet from a public B2 bucket, trains a small PyTorch regression model with TorchTrainer, writes checkpoints back to a private B2 bucket via RunConfig(storage_path="s3://..."), and runs an optional Tuner sweep over learning rates. Companion to the Ray Train persistent-storage user guide.

Open In Colab Open In Binder Open in GitHub Codespaces

Speech-to-text transcription on Backblaze B2 with OpenAI Whisper. Streams a public-domain demo audio clip (jfk.flac) from a public B2 bucket via the S3-compatible API, runs Whisper for ASR, and optionally writes the transcript JSON back to a private B2 bucket. Starting point for batch transcription pipelines on B2-hosted audio archives.

Open In Colab Open In Binder Open in GitHub Codespaces

How to run an example

Each example directory has its own README.md with detailed setup instructions, but in short:

  • In a browser, click one of the launch badges (Colab, Binder, Codespaces) on the example you want.
  • Locally, clone this repo, cd into the example directory, and follow its README.md (typically pip install -r requirements.txt && jupyter lab <notebook>.ipynb).

Backblaze B2 prerequisites

Most examples need a Backblaze B2 application key. Generate one by following https://www.backblaze.com/docs/cloud-storage-application-keys, then export the values as the standard AWS-named environment variables. The AWS SDKs (including boto3) read these variables, so the same client code works against B2's S3-compatible API:

export AWS_ENDPOINT_URL_S3="https://s3.<region>.backblazeb2.com"  # region from B2 console
export AWS_ACCESS_KEY_ID="<your B2 application key ID>"
export AWS_SECRET_ACCESS_KEY="<your B2 application key>"

For Colab / Codespaces / Kaggle / Binder, see the per-example README.md for the secret-store path that fits each runtime.
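
As a rough illustration of how a notebook might consume these variables, the sketch below collects them into keyword arguments for boto3.client and fails early with a readable message if one is unset. The helper names b2_client_kwargs and make_b2_client are hypothetical and are not taken from the examples. Recent boto3 releases also pick up AWS_ENDPOINT_URL_S3 on their own, so passing endpoint_url explicitly is only one option.

```python
import os

# The three variables exported in the prerequisites above.
REQUIRED_VARS = ("AWS_ENDPOINT_URL_S3", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def b2_client_kwargs(env=None):
    """Build boto3.client keyword arguments from the B2 environment variables.

    Hypothetical helper: raises RuntimeError naming every unset variable.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "service_name": "s3",
        "endpoint_url": env["AWS_ENDPOINT_URL_S3"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def make_b2_client(env=None):
    """Create an S3 client pointed at B2 (requires boto3 to be installed)."""
    import boto3  # imported lazily so b2_client_kwargs works without boto3

    return boto3.client(**b2_client_kwargs(env))
```

Calling make_b2_client() in the first notebook cell surfaces a missing secret immediately, rather than as an authentication error several cells later.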

Contributing

New examples are welcome. Each example lives in its own top-level directory and is expected to include:

  • A descriptive README.md (purpose, how to run, secret setup if needed)
  • The notebook(s) themselves, with launch badges that point at the path on main
  • A requirements.txt (or equivalent) so the example is reproducible
  • A .github/workflows/test-<name>.yml workflow that executes the notebook end-to-end against the shared backblaze-samples-ci bucket

See CLAUDE.md for the full conventions: writing style (no em dashes), repo layout, two-bucket pattern, custom user_agent_extra on every B2 boto3 client, headless-execution env vars, pre-commit hooks, and the per-notebook CI workflow shape.
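
For orientation, the user_agent_extra convention could look something like the sketch below. It relies on botocore's Config(user_agent_extra=...) option, which appends a string to the client's User-Agent header. The tag format and the helper names user_agent_tag and tagged_config are assumptions made for illustration; CLAUDE.md defines the actual value to use.

```python
def user_agent_tag(example_name, prefix="b2-sample-notebook"):
    """Return a User-Agent suffix identifying the example.

    The "<prefix>/<example>" format is an assumption, not the repo's convention.
    """
    cleaned = example_name.strip().lower().replace(" ", "-")
    if not cleaned:
        raise ValueError("example_name must not be empty")
    return f"{prefix}/{cleaned}"


def tagged_config(example_name):
    """Build a botocore Config carrying the tag (requires botocore)."""
    from botocore.config import Config  # lazy import keeps user_agent_tag dependency-free

    return Config(user_agent_extra=user_agent_tag(example_name))
```

A client would then be created with something like boto3.client("s3", config=tagged_config("my example")).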

Before opening a PR, install and run pre-commit locally:

pip install pre-commit
pre-commit install
pre-commit run --all-files

CI runs the same hooks via .github/workflows/lint.yml on every push.
