This project turns Jupyter notebooks into services that can be called through a REST API. Instead of running a notebook by hand, a client sends a request, the notebook is executed in the cloud, and the result comes back in a structured form.
We built this around FastAPI, Docker, Papermill, and AWS. A request comes in through the API, the notebook is started as an AWS Batch job, and the executed notebook is stored in S3.
We made this in collaboration with a client from LifeWatch, based on the need to make notebook-based workflows easier to deploy and reuse without rebuilding everything by hand after every change.
The main goal is to make notebooks easier to use as services instead of standalone files. In our setup, that means:
- extracting notebook parameters automatically,
- exposing notebooks through an API,
- running them inside containers,
- and storing the output after execution.
The idea is simple: push a notebook and get a matching REST-accessible service instead of a manual notebook workflow.
The project has a few main parts:
`pipeline.py` looks through notebooks in the current working directory and collects variables that start with `param_`. Those are written into `paramdump.json`, which is then used by the API to know which notebooks exist and which parameters they accept.
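The extraction step can be sketched roughly like this; the regex-based scan and the function name are illustrative, not the repo's exact code:

```python
import json
import re

# Rough sketch of the param_ scan: find top-level assignments like
# `param_example = 42` in code cells of an nbformat-style notebook
# and record their default values.
PARAM_RE = re.compile(r"^(param_\w+)\s*=\s*(.+)$")

def extract_params(notebook_json: dict) -> dict:
    """Collect defaults for param_-prefixed variables from code cells."""
    params = {}
    for cell in notebook_json.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for line in "".join(cell.get("source", [])).splitlines():
            match = PARAM_RE.match(line)
            if match:
                name, literal = match.groups()
                try:
                    params[name] = json.loads(literal)  # numbers, strings, lists
                except json.JSONDecodeError:
                    params[name] = literal.strip()  # keep raw text otherwise
    return params
```

In the real pipeline the collected defaults for all notebooks would then be written to `paramdump.json` and uploaded to S3.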
`app.py` contains the FastAPI app. This is the part that handles requests such as:
- starting a notebook run,
- checking the status of a job,
- listing jobs,
- listing available notebooks,
- viewing notebook parameter defaults,
- and downloading the executed output notebook.
`docker/run_from_json.py` handles the actual notebook run inside the container. It reads the given parameters, patches the notebook, and executes it with Papermill.
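Inside the container, that step might look roughly like this; the function split and file handling are illustrative, with Papermill doing the actual execution:

```python
import json

def load_params(path: str) -> dict:
    """Read the JSON parameter file passed to the container."""
    with open(path) as f:
        params = json.load(f)
    # Only param_-prefixed keys are meant to override notebook defaults.
    return {k: v for k, v in params.items() if k.startswith("param_")}

def run(notebook_in: str, notebook_out: str, params_path: str) -> None:
    """Execute the notebook with Papermill, injecting the given parameters."""
    import papermill as pm  # imported lazily; only needed inside the container
    pm.execute_notebook(notebook_in, notebook_out,
                        parameters=load_params(params_path))
```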
The Docker image is defined in `docker/dockerfile`.
`lambda/aws_batch.py` connects the API to AWS Batch and S3. It submits jobs, maps AWS Batch states to simpler API states, lists jobs, and fetches output notebooks after execution.
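The Batch glue might look something like this; the queue and job definition names are assumptions, and the state mapping reflects the four simplified states the API exposes:

```python
import json

# Collapse AWS Batch's detailed job states into the API's simplified ones.
STATE_MAP = {
    "SUBMITTED": "queued", "PENDING": "queued", "RUNNABLE": "queued",
    "STARTING": "running", "RUNNING": "running",
    "SUCCEEDED": "succeeded", "FAILED": "failed",
}

def simplify_state(batch_state: str) -> str:
    """Map a raw AWS Batch state onto the API's four states."""
    return STATE_MAP.get(batch_state, "queued")

def submit_notebook_job(notebook: str, params: dict) -> str:
    """Submit a notebook run to AWS Batch; returns the Batch job id."""
    import boto3  # imported lazily so the mapping above stays dependency-free
    batch = boto3.client("batch", region_name="eu-west-1")
    response = batch.submit_job(
        jobName=f"notebook2rest-{notebook}",
        jobQueue="notebook2rest-queue",        # assumed queue name
        jobDefinition="notebook2rest-jobdef",  # assumed definition name
        containerOverrides={"environment": [
            {"name": "NOTEBOOK", "value": notebook},
            {"name": "PARAMS_JSON", "value": json.dumps(params)},
        ]},
    )
    return response["jobId"]
```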
This is the basic flow:
- A client sends a `POST` request for a notebook.
- The API loads `paramdump.json` from S3 and checks the notebook and its default parameters.
- If extra parameters are included, they are validated first.
- The API submits an AWS Batch job.
- The notebook runs inside a container with Papermill.
- The output notebook is written to S3.
- The client can later check the status or download the executed notebook output.
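From the client's side, the flow above can be exercised with plain HTTP; the base URL here is an assumption about a local deployment:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment of the API

def build_start_request(notebook: str, params: dict) -> urllib.request.Request:
    """Build the POST /jobs/{notebook} call that kicks off a run."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{notebook}",
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def start_job(notebook: str, params: dict) -> dict:
    """Send the request and return the API's JSON response."""
    with urllib.request.urlopen(build_start_request(notebook, params)) as resp:
        return json.load(resp)
```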
Right now the API has these routes:
- `POST /jobs/{notebook}`
- `GET /jobs`
- `GET /jobs/notebooks`
- `GET /jobs/notebooks/{name}`
- `GET /jobs/{job_id}`
- `GET /jobs/{job_id}/output`
Example body for starting a notebook:
```json
{
  "param_example": 42
}
```

The API returns simplified job states instead of raw AWS Batch states:

- `queued`
- `running`
- `succeeded`
- `failed`
- `app.py` - FastAPI app
- `pipeline.py` - notebook parameter extraction
- `lambda/aws_batch.py` - AWS Batch and S3 helper functions
- `lambda/config.py` - AWS config
- `api_repo/pipeline.sh` - older helper script for notebook conversion
- `docker/run_from_json.py` - notebook execution script
- `docker/dockerfile` - Docker image definition
There are a few things the current code expects:
- the AWS resources already exist,
- the S3 bucket is called
notebook2rest, - the AWS region is
eu-west-1, paramdump.jsonhas been generated and uploaded before the API is used,- and notebook parameters use the
param_prefix.
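These expectations would typically be centralized in something like `lambda/config.py`; the constant names below are assumptions, but the values match the ones listed above:

```python
# Hypothetical shape of the AWS configuration module; constant names are
# illustrative, values follow the assumptions the code makes.
S3_BUCKET = "notebook2rest"
AWS_REGION = "eu-west-1"
PARAMDUMP_KEY = "paramdump.json"  # must be generated and uploaded beforehand
PARAM_PREFIX = "param_"
```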
Some of the main dependencies used in this repo are:
- FastAPI
- Uvicorn
- Papermill
- nbformat / nbconvert / Jupyter
- boto3
The Docker setup also separates common runtime dependencies from notebook-specific dependencies through:
- `docker/requirements.txt`
- `docker/requirements_notebook.txt`
- Jona Aalten
- Mike van der Deure
- Theo Gatea
- Colin de Koning