Commit 0639079

refactor(specs): restructure into archive/active pattern
- Move completed specs to specs/archive/
  - core-pipeline (v0.1.0 - initial implementation)
  - gcs-bigquery-storage (v0.1.0 - storage backend)
- Create specs/active/ for in-progress features
- Add READMEs explaining:
  - Workflow for new features
  - Archive contents and outcomes
  - Design decisions and learnings

This makes it clear what's done vs what's being designed, and preserves design history for future reference.
1 parent 13d9adf commit 0639079

14 files changed

Lines changed: 227 additions & 0 deletions

specs/README.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# EventKit Specifications

This directory contains design specifications for EventKit features.

## Structure

```
specs/
├── archive/     # Completed features (historical reference)
├── active/      # Features currently being designed/implemented
└── README.md    # This file
```

## Workflow

### 1. Designing a New Feature

Create a new spec in `active/`:

```bash
mkdir specs/active/feature-name
```

Typical structure:
```
specs/active/feature-name/
├── spec.md       # What to build (user stories, requirements)
├── plan.md       # How to build it (architecture, components)
├── tasks.md      # Implementation checklist
└── decisions.md  # Key design decisions (ADR-style)
```
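
The typical structure above could also be scaffolded with a small script. This is a minimal sketch, not part of the commit: the helper name `scaffold_spec` and the stub contents are assumptions made for illustration, and only the four file names come from the tree above.

```python
from pathlib import Path

# Stub contents are placeholders (assumption); the file names mirror the tree above.
SPEC_FILES = {
    "spec.md": "# Spec\n\nWhat to build (user stories, requirements).\n",
    "plan.md": "# Plan\n\nHow to build it (architecture, components).\n",
    "tasks.md": "# Tasks\n\n- [ ] First task\n",
    "decisions.md": "# Decisions\n\nKey design decisions (ADR-style).\n",
}


def scaffold_spec(feature_name: str, root: Path = Path("specs/active")) -> Path:
    """Create specs/active/<feature_name>/ with one stub per spec document."""
    spec_dir = root / feature_name
    # exist_ok=False so an existing spec is never silently overwritten.
    spec_dir.mkdir(parents=True, exist_ok=False)
    for name, stub in SPEC_FILES.items():
        (spec_dir / name).write_text(stub, encoding="utf-8")
    return spec_dir
```

Calling `scaffold_spec("feature-name")` from the repository root would create the directory and its four stub files, and it raises `FileExistsError` if the spec directory is already there.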

### 2. During Implementation

- Work from the spec
- Update tasks.md as you complete work
- Document any deviations or learnings

### 3. After Completion

Move the spec to the archive and reference the issue in the commit message:

```bash
git mv specs/active/feature-name specs/archive/feature-name
git commit -m "docs(specs): archive feature-name spec (closes #123)"
```

The spec becomes historical context for:
- Understanding design decisions
- Future refactoring
- Learning how the system evolved

## Archive Contents

### core-pipeline (v0.1.0)
Initial EventKit implementation covering:
- Event schema models (RawEvent, TypedEvent)
- Validation & adaptation (validators, adapters)
- Stream-based routing (sequencer)
- Storage abstraction (EventStore protocol)
- Queue implementations (AsyncQueue, PubSubQueue)
- Ring buffer with WAL (durability layer)
- API endpoints (collection, convenience)

**Status**: ✅ Complete
**Timeline**: Q1 2025
**Issues**: Core pipeline implementation

### gcs-bigquery-storage (v0.1.0)
GCS + BigQuery storage backend:
- Parquet serialization
- Hive-partitioned file structure
- BigQuery loader (batch loading)
- Warehouse integration

**Status**: ✅ Complete
**Timeline**: Q1 2025
**Issues**: Storage implementation

## Active Specs

_No features currently in design phase._

## Tips

- **Keep specs lightweight** - Focus on decisions and design, not implementation details
- **Reference issues** - Link specs to GitHub issues for tracking
- **Archive when done** - Don't let specs rot in active/
- **Living docs elsewhere** - Specs are design history; user docs live in README/ARCHITECTURE/Nextra

## Related

- [ARCHITECTURE.md](../ARCHITECTURE.md) - High-level system overview
- [README.md](../README.md) - User-facing documentation
- [CONTRIBUTING.md](../CONTRIBUTING.md) - Development workflow
- [WORKFLOW.md](../WORKFLOW.md) - Spec-driven development process

specs/active/README.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
# Active Specifications

_No features currently in design phase._

When you start designing a new feature:

1. Create a directory: `mkdir specs/active/feature-name`
2. Add your spec documents (spec.md, plan.md, tasks.md)
3. Work from the spec during implementation
4. Move to archive when complete

## Template Structure

```
specs/active/feature-name/
├── spec.md       # What to build
│   - User stories
│   - Requirements
│   - Acceptance criteria

├── plan.md       # How to build it
│   - Architecture
│   - Components
│   - Design decisions

├── tasks.md      # Implementation checklist
│   - Detailed task breakdown
│   - Acceptance criteria per task
│   - Files to create/modify

└── decisions.md  # ADR-style decision log (optional)
    - Context
    - Options considered
    - Decision rationale
```

## See Also

- [Archive](../archive/) - Completed specs for reference
- [WORKFLOW.md](../../WORKFLOW.md) - Spec-driven development process
Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
# Core Pipeline (Archived)

**Status**: ✅ Completed in v0.1.0
**Timeline**: Q1 2025 (8 weeks)
**Issues**: Initial implementation

## What Was Built

The foundational EventKit architecture covering:

1. **Event Schema** - RawEvent (flexible) → TypedEvent (strict)
2. **Validation & Adaptation** - Composable validators, Segment adapter
3. **Stream Routing** - Hash-based sequencer for consistent partitioning
4. **Queue Layer** - AsyncQueue (single-server) + PubSubQueue (distributed)
5. **Storage** - EventStore protocol, GCS implementation
6. **Ring Buffer** - SQLite WAL for durability
7. **API** - Collection endpoints + Segment-compatible convenience endpoints
8. **Observability** - Prometheus metrics, structured logging
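
As an illustration of the hash-based routing in item 3, the sketch below maps a stream key to a stable partition index. It is not EventKit's sequencer: the function name `partition_for` and the choice of SHA-256 are assumptions. A cryptographic digest is used because Python's built-in `hash()` for strings is salted per process and so would not route consistently across workers.

```python
import hashlib


def partition_for(stream_key: str, num_partitions: int) -> int:
    """Return a partition index that is stable across processes and restarts."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(stream_key.encode("utf-8")).digest()
    # Fold the first 8 bytes of the digest into the partition range.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Every event carrying the same key lands on the same partition, which is what keeps per-stream ordering intact when several workers consume in parallel.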

## Spec Documents

- [spec.md](./spec.md) - User stories and requirements
- [plan.md](./plan.md) - Architecture and implementation approach
- [tasks.md](./tasks.md) - 17 tasks with detailed checklists
- [architecture.md](./architecture.md) - System design
- [api.md](./api.md) - API specification
- [data-models.md](./data-models.md) - Schema definitions

## Key Decisions

1. **Flexible ingestion, strict processing** - Accept any JSON at edge, validate downstream
2. **Protocol-based design** - All components use Protocol, not ABC
3. **Async-first** - Full async/await throughout
4. **Pluggable storage** - EventStore protocol enables multiple backends
5. **Ring buffer for durability** - SQLite WAL prevents data loss
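
Decisions 2 to 4 can be pictured with `typing.Protocol`: a backend only has to match the method shape, with no base class to inherit from. The sketch below is hypothetical. `EventStore` is named in this README, but the `store` method signature, `InMemoryStore`, and `persist` are assumptions made for illustration.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class EventStore(Protocol):
    """Structural interface: any object with a matching async store() qualifies."""

    async def store(self, events: list[dict]) -> None: ...


class InMemoryStore:
    """Toy backend; note that it does not inherit from EventStore."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    async def store(self, events: list[dict]) -> None:
        self.events.extend(events)


async def persist(store: EventStore, events: list[dict]) -> None:
    # Depends only on the protocol, so backends can be swapped freely.
    await store.store(events)
```

For example, `asyncio.run(persist(InMemoryStore(), [{"type": "track"}]))` runs without `InMemoryStore` ever referencing `EventStore`, which is the practical difference from an ABC.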

## Outcomes

- **252 unit tests** with >80% coverage
- **10k+ events/sec** validated throughput
- **Sub-millisecond** p50 latency
- **Zero data loss** with ring buffer
- **Production-ready** v0.1.0 release

## Related

- [GCS + BigQuery Storage](../gcs-bigquery-storage/) - Storage backend implementation
- See [ARCHITECTURE.md](../../../ARCHITECTURE.md) for current system design
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
# GCS + BigQuery Storage (Archived)

**Status**: ✅ Completed in v0.1.0
**Timeline**: Q1 2025
**Issues**: Storage implementation

## What Was Built

Production-grade storage backend using Google Cloud Platform:

1. **GCS Event Store** - Write events to Cloud Storage as Parquet files
2. **Hive Partitioning** - Date-based organization (date=YYYY-MM-DD/)
3. **BigQuery Loader** - Background service for batch loading
4. **Warehouse Integration** - Idempotent loads, metadata tracking
5. **EventLoader** - Batching with adaptive flushing (time + size based)
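
The `date=YYYY-MM-DD/` layout from item 2 can be sketched as a small path builder. This is an illustrative assumption rather than the project's code: the function name `partition_path`, the prefix, and the file name are hypothetical, and normalizing to UTC before taking the date is a design choice made here, not something the README states.

```python
from datetime import datetime, timezone


def partition_path(prefix: str, event_time: datetime, file_name: str) -> str:
    """Build a Hive-style object path: <prefix>/date=YYYY-MM-DD/<file_name>."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # Convert to UTC first so one instant always maps to one partition.
    day = event_time.astimezone(timezone.utc).date().isoformat()
    return f"{prefix.rstrip('/')}/date={day}/{file_name}"
```

Because the date appears in the path as a `key=value` segment, query engines that understand Hive partitioning can prune whole folders when a query filters on `date`.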

## Spec Documents

- [spec.md](./spec.md) - Requirements and user stories
- [plan.md](./plan.md) - Implementation approach
- [tasks.md](./tasks.md) - Task breakdown
- [data-model.md](./data-model.md) - Schema and partitioning

## Key Decisions

1. **GCS as event store** - Parquet for compression + columnar format
2. **Batch loading** - Write to GCS, load to BigQuery in batches (cost-optimized)
3. **Hive partitioning** - Date-based folders for efficient queries
4. **Metadata table** - Track loaded files for idempotency
5. **Adaptive batching** - Flush on time OR size threshold
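
Decision 5 (flush on time OR size) is easiest to see in code. The class below is a minimal synchronous sketch under stated assumptions: `AdaptiveBatcher`, its parameters, and the periodic `tick()` call are hypothetical and do not describe the actual EventLoader, which this README says is async.

```python
import time
from typing import Any, Callable, Optional


class AdaptiveBatcher:
    """Buffer events; flush when the batch is full OR its oldest event is too old."""

    def __init__(
        self,
        flush: Callable[[list], None],
        max_size: int = 100,
        max_age_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_size = max_size
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: list = []
        self._oldest: Optional[float] = None  # arrival time of the first buffered event

    def add(self, event: Any) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(event)
        self.tick()

    def tick(self) -> None:
        """Call periodically so a quiet stream still flushes on the time threshold."""
        if not self._buffer:
            return
        too_big = len(self._buffer) >= self._max_size
        too_old = self._clock() - self._oldest >= self._max_age
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._oldest = None
        self._flush(batch)
```

The size threshold bounds memory and file size under heavy traffic, while the time threshold bounds how long an event can sit unwritten when traffic is light.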

## Outcomes

- **Parquet compression** - ~10x smaller than JSON
- **Cost-efficient** - GCS storage 50% cheaper than BigQuery
- **Idempotent loads** - Safe to retry without duplicates
- **Query performance** - Date partitioning enables fast filters
- **Flexible warehouse** - Can swap BigQuery for Snowflake/Redshift
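
The idempotent-load outcome follows from the metadata-table decision: a file is loaded only if its path can be recorded first. The sketch below uses an in-memory SQLite table as a stand-in for the warehouse metadata table. That substitution, the `loaded_files` table, and the helper names are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable


def open_metadata(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (stand-in) metadata store that tracks already-loaded files."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS loaded_files (path TEXT PRIMARY KEY)")
    return conn


def load_once(conn: sqlite3.Connection, file_path: str, load: Callable[[str], None]) -> bool:
    """Run load(file_path) unless that path was already loaded; return True if it ran."""
    with conn:  # commits on success, rolls back if load() raises
        try:
            conn.execute("INSERT INTO loaded_files (path) VALUES (?)", (file_path,))
        except sqlite3.IntegrityError:
            return False  # primary key clash: this file was loaded before
        load(file_path)
    return True
```

Retrying is safe in both directions: a repeated call for a loaded file is a no-op, and a failed load rolls back the marker so the next attempt runs again.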

## Related

- [Core Pipeline](../core-pipeline/) - The foundation this storage backend is built on
- See [ARCHITECTURE.md](../../../ARCHITECTURE.md) for storage design
