Documenting SQL case studies from Danny Ma's 8 Week SQL Challenge for learning and practice purposes.
Install Docker and Docker Compose, then start the containers:

```bash
docker compose up
```

SQLPad can be accessed at http://localhost:3000 or at the port specified in the `compose.yml` file.
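As an optional sanity check, a minimal sketch like the one below (assuming SQLPad is exposed on the default port 3000) confirms the UI is reachable:

```python
import urllib.request

# Assumes SQLPad is exposed on the default port from compose.yml.
SQLPAD_URL = "http://localhost:3000"

with urllib.request.urlopen(SQLPAD_URL, timeout=5) as response:
    # HTTP 200 means the SQLPad UI is up and serving requests.
    print(f"SQLPad responded with HTTP {response.status}")
```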
Stop and remove the containers with:
```bash
docker compose down
```

This project uses uv as its project manager. There are at least three simple ways to set up the Python interpreter.
Option 1: a uv-managed Python interpreter:

```bash
uv sync --frozen --all-groups --managed-python
```

See the uv documentation for the `frozen`, `managed-python`, and `all-groups` options.
Option 2: a conda-managed Python interpreter:

```bash
conda search python | grep " 3\.\(12\)\."
conda create --name sql_case_studies -y python=3.12
conda activate sql_case_studies
uv sync --frozen --all-groups --no-managed-python
```
Option 3: a pyenv-managed Python interpreter:

```bash
# List available Python versions
pyenv install --list | grep " 3\.\(12\)\."
# As an example, install Python 3.12.8
pyenv install 3.12.8
pyenv local 3.12.8
uv sync --frozen --all-groups --no-managed-python
```

The Athena class can be used to interact with Amazon Athena. To use this client, the AWS principal (e.g., an IAM role or IAM user) must have the necessary permissions for Athena.
Customized S3 permissions are needed if a non-default bucket is to be used to store the query results (see below for more details).
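For orientation only, the snippet below sketches the kind of S3 policy statement a custom query-results bucket typically requires; the bucket name is hypothetical and the exact action list should be verified against the AWS Athena documentation and your account's policies:

```python
import json

# Hypothetical bucket name; replace with the bucket used for query results.
RESULTS_BUCKET = "my-athena-results-bucket"

# Illustrative IAM policy statement for reading and writing query results.
results_bucket_statement = {
    "Effect": "Allow",
    "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
    ],
    "Resource": [
        f"arn:aws:s3:::{RESULTS_BUCKET}",
        f"arn:aws:s3:::{RESULTS_BUCKET}/*",
    ],
}

print(json.dumps(results_bucket_statement, indent=2))
```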
The required permissions can be encapsulated in a boto3 session instance and passed as the first argument to the constructor of the Athena client. The create_session utility function can be used to create the session instance. The parameters are:
- `profile_name`: The AWS credentials profile name to use.
- `role_arn`: The IAM role ARN to assume. If provided, the `profile_name` must have the `sts:AssumeRole` permission.
- `duration_seconds`: The duration, in seconds, for which the temporary credentials are valid. If role chaining occurs, the maximum duration is 1 hour.
```python
import os

from src.utils import create_session

boto3_session = create_session(
    profile_name="aws-profile-name",
    role_arn=os.getenv("ATHENA_IAM_ROLE_ARN"),
)
```

The parquet data files for the case studies must be stored in an S3 bucket. All DDL queries are stored in the `sql` directory under each case study directory; these must be adjusted to point to the correct S3 URIs. The data files can be uploaded to an S3 bucket using the AWS CLI or the console.
```bash
# Create a bucket
$ aws s3api create-bucket --bucket sql-case-studies --profile profile-name

# Upload all data files to the bucket
$ aws s3 cp data/ s3://sql-case-studies/ --recursive --profile profile-name
```
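As a scripted alternative to the CLI, a minimal boto3 sketch along these lines can upload the files; the profile name, bucket name, and local `data/` layout are assumptions to adjust for your setup:

```python
from pathlib import Path

import boto3

# Assumed profile and bucket names; replace with your own.
session = boto3.Session(profile_name="profile-name")
s3 = session.client("s3")
bucket = "sql-case-studies"

# Upload every parquet file under data/, preserving the relative paths as keys.
for path in Path("data").rglob("*.parquet"):
    key = path.relative_to("data").as_posix()
    s3.upload_file(str(path), bucket, key)
    print(f"Uploaded {path} to s3://{bucket}/{key}")
```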
Optionally, query results can be configured to be stored in a custom S3 bucket instead of the default bucket (i.e., `aws-athena-query-results-accountid-region`). The query result S3 URI can be stored as an environment variable, e.g. `ATHENA_S3_OUTPUT=s3://bucket-name/path/to/output/`, which can then be passed as the `s3_output` argument to the Athena class constructor. The client creates the default bucket if the `s3_output` argument is not provided.
```python
import os

from src.athena import Athena
from src.utils import create_session

boto3_session = create_session(
    profile_name="aws-profile-name",
    role_arn=os.getenv("ATHENA_IAM_ROLE_ARN"),
)
s3_output = os.getenv("ATHENA_S3_OUTPUT", "")
athena = Athena(boto3_session=boto3_session, s3_output=s3_output)
```

Each case study folder contains a `notebooks` directory with Jupyter notebooks that can be used to run the SQL queries.
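Inside those notebooks, usage typically follows a pattern like the sketch below. The query-running method name (`run_query`) and the example table are hypothetical placeholders; check the `src.athena` module for the actual interface.

```python
import os

from src.athena import Athena
from src.utils import create_session

boto3_session = create_session(
    profile_name="aws-profile-name",
    role_arn=os.getenv("ATHENA_IAM_ROLE_ARN"),
)
athena = Athena(
    boto3_session=boto3_session,
    s3_output=os.getenv("ATHENA_S3_OUTPUT", ""),
)

# Hypothetical method name and table; the real client may expose a
# different API for submitting queries and fetching results.
results = athena.run_query("SELECT * FROM case_study_db.some_table LIMIT 5;")
print(results)
```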