Port the Go Smalltalk bytecode interpreter to C++ while maintaining full test compatibility. The C++ version will use tagged pointers for performance and prepare for eventual JIT compilation.
130+ test functions across these categories:
- VM Core: Bytecode handlers, context management, message sending (47 tests)
- Object System: Classes, methods, blocks, arrays, strings, symbols (62 tests)
- Compiler: Parsing, bytecode generation, method building (21 tests)
- Primitives: Integer/float math, boolean operations, memory access
- Integration: Factorial computation, expression evaluation
- Object: Base object with tagged pointer support
- Class: Class definitions with method dictionary
- Immediate Values: Tagged integers, floats, booleans, nil
- Collections: Array, Dictionary, String, Symbol, ByteArray
- Methods & Blocks: Executable code objects
- Context: Execution context with stack management
- Bytecode Handlers: 13 core bytecode operations
- Message Dispatch: Method lookup and invocation
- Primitives: Built-in operations (+, -, *, <, etc.)
- Stop & Copy GC: Two-space (semi-space) copying garbage collector
- Tagged Pointers: Handle immediate values correctly
- Object Allocation: Proper alignment and initialization
- Parser: Smalltalk source to AST
- Bytecode Generator: AST to bytecode compilation
- Method Builder: Method object construction
Goals: Basic object model with tagged pointers working
Tasks:
- Tagged Pointer System
  - Implement `TaggedValue` union type
  - Support for immediate integers, floats, booleans, nil
  - Pointer tagging with 2-bit tags (00=pointer, 01=special, 10=float, 11=int)
- Base Object Types
  - `Object` base class with class pointer
  - `Class` with method dictionary and instance variables
  - Memory layout compatible with GC
- Test Infrastructure
  - C++ test framework (Google Test)
  - Port immediate value tests (14 tests)
  - Port basic object tests (12 tests)
Deliverables:
- `objects/tagged_value.h/cpp`
- `objects/object.h/cpp`
- `objects/class.h/cpp`
- Working immediate value tests
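The tag layout listed in the tasks above can be sketched as follows. This is a minimal illustration, not the port's actual `TaggedValue`: it assumes a 64-bit word, stores floats as a 32-bit `float` in the upper half of the word, and numbers the special values as 0 = nil, 1 = true, 2 = false. The plan calls for a union type; a single integer word is used here for brevity. All member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: one 64-bit word whose low 2 bits select the kind,
// using the plan's tag assignment (00=pointer, 01=special, 10=float, 11=int).
class TaggedValue {
public:
    static constexpr std::uint64_t kTagMask = 0b11;
    static constexpr std::uint64_t kPointerTag = 0b00;
    static constexpr std::uint64_t kSpecialTag = 0b01;
    static constexpr std::uint64_t kFloatTag = 0b10;
    static constexpr std::uint64_t kIntTag = 0b11;

    // Special payloads (assumed numbering): 0 = nil, 1 = true, 2 = false.
    static TaggedValue nil() { return TaggedValue((0u << 2) | kSpecialTag); }
    static TaggedValue boolean(bool b) {
        return TaggedValue(((b ? 1u : 2u) << 2) | kSpecialTag);
    }

    static TaggedValue fromInt(std::int64_t v) {
        // Only 62 bits of payload are available next to the tag.
        assert(v >= -(INT64_C(1) << 61) && v < (INT64_C(1) << 61));
        return TaggedValue((static_cast<std::uint64_t>(v) << 2) | kIntTag);
    }

    static TaggedValue fromFloat(float f) {
        std::uint32_t raw;
        std::memcpy(&raw, &f, sizeof raw);  // bit copy, no aliasing issues
        return TaggedValue((static_cast<std::uint64_t>(raw) << 32) | kFloatTag);
    }

    static TaggedValue fromPointer(void* p) {
        auto raw = reinterpret_cast<std::uintptr_t>(p);
        assert((raw & kTagMask) == 0 && "heap objects must be 4-byte aligned");
        return TaggedValue(static_cast<std::uint64_t>(raw));
    }

    bool isPointer() const { return (bits_ & kTagMask) == kPointerTag; }
    bool isFloat() const { return (bits_ & kTagMask) == kFloatTag; }
    bool isInt() const { return (bits_ & kTagMask) == kIntTag; }
    bool isNil() const { return bits_ == nil().bits_; }

    std::int64_t asInt() const {
        assert(isInt());
        // Arithmetic right shift restores the sign (guaranteed since C++20).
        return static_cast<std::int64_t>(bits_) >> 2;
    }

    float asFloat() const {
        assert(isFloat());
        auto raw = static_cast<std::uint32_t>(bits_ >> 32);
        float f;
        std::memcpy(&f, &raw, sizeof f);
        return f;
    }

    void* asPointer() const {
        assert(isPointer());
        return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits_));
    }

private:
    explicit TaggedValue(std::uint64_t bits) : bits_(bits) {}
    std::uint64_t bits_;
};

static_assert(sizeof(TaggedValue) == 8, "a tagged value is one machine word");
```

Because the pointer tag is 00, an aligned heap address is already a valid tagged value, so dereferencing an object needs no masking.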
Goals: Core collection types and arithmetic working
Tasks:
- Collection Classes
  - Array with bounds checking
  - Dictionary with hash table
  - String with UTF-8 support
  - Symbol table for interned strings
- Primitive Operations
  - Integer arithmetic (+, -, *, /, <, >, =)
  - Float operations with precision handling
  - Boolean logic (and, or, not)
  - Memory access primitives
- Port Collection Tests (35 tests)
  - Array operations and bounds checking
  - Dictionary get/set/remove operations
  - String manipulation and comparison
  - Symbol interning and equality
Deliverables:
- `objects/array.h/cpp`, `objects/dictionary.h/cpp`
- `objects/string.h/cpp`, `objects/symbol.h/cpp`
- `vm/primitives.h/cpp`
- Passing collection and primitive tests
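Symbol interning, listed among the collection tasks above, could look roughly like the sketch below. It assumes symbols are identified by the address of their canonical string, so symbol equality becomes a pointer comparison. `SymbolTable` and `intern` are illustrative names, not taken from the Go code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical symbol table: every distinct spelling is stored exactly once.
class SymbolTable {
public:
    // Returns the canonical string for `name`, inserting it on first use.
    const std::string* intern(std::string_view name) {
        auto it = symbols_.emplace(name).first;
        assert(*it == name);  // the canonical entry spells the same name
        // Element addresses in std::unordered_set stay valid across rehashing,
        // so the returned pointer can serve as the symbol's identity.
        return &*it;
    }

    std::size_t size() const { return symbols_.size(); }

private:
    std::unordered_set<std::string> symbols_;
};
```

Interning `at:put:` twice yields the same pointer, which is what the symbol equality tests would check.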
Goals: Bytecode execution and context management
Tasks:
- Execution Context
  - Stack management with overflow protection
  - Temporary variable storage
  - Method context chaining
- Bytecode Handlers
  - 13 core bytecode operations
  - Efficient dispatch mechanism (computed goto or jump table)
  - Stack manipulation (push, pop, duplicate)
- Message Dispatch
  - Method lookup with inheritance
  - Argument passing and stack management
  - Return value handling
- Port VM Tests (47 tests)
  - Context operations
  - Bytecode handler execution
  - Message sending scenarios
  - Stack management edge cases
Deliverables:
- `vm/context.h/cpp`
- `vm/bytecode_handlers.h/cpp`
- `vm/message_dispatch.h/cpp`
- `vm/vm.h/cpp`
- Passing VM core tests
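A dispatch loop with stack overflow protection might be structured like the sketch below. The six opcodes are invented for illustration; the plan does not enumerate the 13 real bytecodes. A `switch` is used because compilers typically lower it to a jump table, whereas computed goto is a GCC/Clang extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcode set, not the interpreter's real one.
enum class Op : std::uint8_t { PushInt, Add, Mul, Dup, Pop, Return };

constexpr std::uint8_t byteOf(Op op) { return static_cast<std::uint8_t>(op); }

// Executes `code` on a bounded operand stack and returns the value left by Return.
std::int64_t run(const std::vector<std::uint8_t>& code, std::size_t maxStack = 256) {
    assert(maxStack > 0);
    std::vector<std::int64_t> stack;
    std::size_t pc = 0;

    auto push = [&](std::int64_t v) {
        if (stack.size() >= maxStack) throw std::overflow_error("stack overflow");
        stack.push_back(v);
    };
    auto pop = [&]() {
        if (stack.empty()) throw std::underflow_error("stack underflow");
        std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    };

    while (pc < code.size()) {
        switch (static_cast<Op>(code[pc++])) {
        case Op::PushInt: {  // one-byte signed operand follows the opcode
            if (pc >= code.size()) throw std::runtime_error("truncated operand");
            push(static_cast<std::int8_t>(code[pc++]));
            break;
        }
        case Op::Add: { auto b = pop(); auto a = pop(); push(a + b); break; }
        case Op::Mul: { auto b = pop(); auto a = pop(); push(a * b); break; }
        case Op::Dup: { auto a = pop(); push(a); push(a); break; }
        case Op::Pop: { pop(); break; }
        case Op::Return: return pop();
        default: throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("fell off the end of the bytecode");
}
```

Keeping the overflow check inside `push` means every handler gets the protection without repeating it.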
Goals: Stop & copy garbage collection working with tagged pointers
Tasks:
- Two-Space Collector
  - Semi-space allocation and copying
  - Root set scanning (globals, stacks)
  - Object traversal with proper tagged pointer handling
- Tagged Pointer GC Integration
  - Skip scanning immediate values
  - Handle forwarding pointers correctly
  - Preserve tagged values during collection
- Memory Layout
  - Proper object alignment
  - Header word optimization
  - Forwarding pointer mechanics
- Port Memory Tests (8 tests)
  - Allocation and collection cycles
  - Reference updating during GC
  - Memory pressure scenarios
Deliverables:
- `memory/gc.h/cpp`
- `memory/allocator.h/cpp`
- Passing GC and memory tests
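The interaction between copying collection and tagged values can be sketched with a Cheney-style scan. The layout here is an assumption made for the example: each object is a header word followed by its fields, the header stores the field count tagged as an integer (11), and a forwarded object has its header overwritten with the untagged address of its copy (00). That lets one tag test distinguish a live header from a forwarding pointer. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Word = std::uintptr_t;
constexpr Word kTagMask = 0b11;
constexpr Word kIntTag = 0b11;
constexpr Word kNil = 0b01;  // special tag, payload 0 (assumed encoding)

inline Word makeInt(std::intptr_t v) { return (static_cast<Word>(v) << 2) | kIntTag; }
inline Word makeRef(Word* obj) { return reinterpret_cast<Word>(obj); }
inline Word* asObject(Word w) { return reinterpret_cast<Word*>(w); }
inline bool isHeapRef(Word w) { return w != 0 && (w & kTagMask) == 0; }

class SemiSpaceHeap {
public:
    explicit SemiSpaceHeap(std::size_t words) : from_(words), to_(words) {}

    // Bump allocation; returns nullptr when full so the caller can collect.
    Word* allocate(std::size_t fields) {
        if (top_ + fields + 1 > from_.size()) return nullptr;
        Word* obj = &from_[top_];
        obj[0] = (static_cast<Word>(fields) << 2) | kIntTag;  // header
        std::fill(obj + 1, obj + 1 + fields, kNil);
        top_ += fields + 1;
        return obj;
    }

    // Copies everything reachable from `roots` into to-space, then flips.
    void collect(std::vector<Word>& roots) {
        std::size_t scan = 0, next = 0;
        auto forward = [&](Word w) -> Word {
            if (!isHeapRef(w)) return w;  // immediates are preserved as-is
            Word* obj = asObject(w);
            if ((obj[0] & kTagMask) == 0) return obj[0];  // already forwarded
            std::size_t n = obj[0] >> 2;
            assert(next + n + 1 <= to_.size());
            Word* copy = &to_[next];
            std::copy(obj, obj + n + 1, copy);
            next += n + 1;
            obj[0] = makeRef(copy);  // leave a forwarding pointer behind
            return obj[0];
        };
        for (Word& root : roots) root = forward(root);
        while (scan < next) {  // Cheney scan: to-space doubles as the work queue
            std::size_t n = to_[scan] >> 2;
            for (std::size_t i = 1; i <= n; ++i) to_[scan + i] = forward(to_[scan + i]);
            scan += n + 1;
        }
        std::swap(from_, to_);  // element addresses survive the swap
        top_ = next;
    }

    std::size_t used() const { return top_; }

private:
    std::vector<Word> from_, to_;
    std::size_t top_ = 0;
};
```

Cycles need no special handling in this scheme: the second visit to an object finds the forwarding pointer and reuses the existing copy.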
Goals: Block objects and non-local returns
Tasks:
- Block Objects
  - Closures with captured variables
  - Block compilation and execution
  - Outer context references
- Control Flow
  - Jump instructions (conditional/unconditional)
  - Non-local returns from blocks
  - Exception handling framework
- Port Block Tests (18 tests)
  - Block creation and execution
  - Variable capture scenarios
  - Nested block handling
  - Non-local return edge cases
Deliverables:
- `objects/block.h/cpp`
- `vm/control_flow.h/cpp`
- Passing block and control flow tests
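One way to prototype non-local return semantics before wiring them into the context chain is to unwind with a C++ exception that names its home activation. This is only a sketch of the semantics (a `^` inside a block returns from the method that created the block), not necessarily how the Go interpreter or the final C++ VM implements it. `runMethod`, `Ret`, `Body`, and `firstEven` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Thrown by a block to return from its home method activation.
struct NonLocalReturn {
    const void* home;  // identifies the activation that should catch it
    int value;
};

using Ret = std::function<void(int)>;
using Body = std::function<int(const Ret&)>;

// Runs a method body. The body receives `ret`, which any block it creates
// may call to return from this activation, however deeply it is nested.
int runMethod(const Body& body) {
    assert(static_cast<bool>(body));
    int marker = 0;  // its address is unique to this activation
    const void* home = &marker;
    Ret ret = [home](int v) { throw NonLocalReturn{home, v}; };
    try {
        return body(ret);
    } catch (const NonLocalReturn& r) {
        if (r.home != home) throw;  // aimed at an outer method: keep unwinding
        return r.value;
    }
}

// Rough analogue of: xs do: [:x | x even ifTrue: [^x]]. ^-1
int firstEven(const std::vector<int>& xs) {
    return runMethod([&xs](const Ret& ret) {
        std::for_each(xs.begin(), xs.end(), [&ret](int x) {
            if (x % 2 == 0) ret(x);
        });
        return -1;  // reached only if no block returned early
    });
}
```

The home check matters for nested methods: an inner activation that sees a return aimed at an outer one must let it propagate, which is one of the edge cases the block tests are meant to cover.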
Goals: Parse Smalltalk source and generate bytecode
Tasks:
- Parser
  - Smalltalk syntax parsing
  - AST construction
  - Error handling and recovery
- Code Generation
  - AST to bytecode translation
  - Literal table construction
  - Method object assembly
- Port Compiler Tests (21 tests)
  - Expression parsing
  - Method compilation
  - Bytecode generation accuracy
Deliverables:
- `compiler/parser.h/cpp`
- `compiler/codegen.h/cpp`
- Passing compiler tests
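The code generation tasks above (AST to bytecode, literal table construction) can be illustrated with a post-order walk over a tiny AST. The node types, the three opcodes, and the one-byte operand format are assumptions made for this sketch and are not the real instruction set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST: integer literals and binary message sends only.
struct Node {
    enum class Kind { Literal, BinarySend };
    Kind kind = Kind::Literal;
    std::int64_t value = 0;          // used by Literal
    std::string selector;            // used by BinarySend
    std::unique_ptr<Node> lhs, rhs;  // used by BinarySend
};

std::unique_ptr<Node> lit(std::int64_t v) {
    auto n = std::make_unique<Node>();
    n->value = v;
    return n;
}

std::unique_ptr<Node> binary(std::string sel, std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::BinarySend;
    n->selector = std::move(sel);
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

enum Op : std::uint8_t { kPushLiteral = 0, kSend = 1, kReturnTop = 2 };

struct CompiledMethod {
    std::vector<std::uint8_t> code;
    std::vector<std::int64_t> literals;  // literal table
    std::vector<std::string> selectors;  // selector table
};

// Returns the table index of `v`, appending it if it is not present yet.
template <class T>
std::uint8_t indexOf(std::vector<T>& table, const T& v) {
    auto i = static_cast<std::size_t>(
        std::find(table.begin(), table.end(), v) - table.begin());
    if (i == table.size()) table.push_back(v);
    assert(i < 256 && "operands are a single byte in this sketch");
    return static_cast<std::uint8_t>(i);
}

// Post-order walk: receiver, then argument, then the send itself.
void emit(const Node& n, CompiledMethod& m) {
    if (n.kind == Node::Kind::Literal) {
        m.code.push_back(kPushLiteral);
        m.code.push_back(indexOf(m.literals, n.value));
        return;
    }
    emit(*n.lhs, m);
    emit(*n.rhs, m);
    m.code.push_back(kSend);
    m.code.push_back(indexOf(m.selectors, n.selector));
}

CompiledMethod compile(const Node& root) {
    CompiledMethod m;
    emit(root, m);
    m.code.push_back(kReturnTop);
    return m;
}
```

For `3 + 4 * 3`, which Smalltalk evaluates left to right as `(3 + 4) * 3`, the literal `3` is stored once and referenced twice, so the literal table holds only `3` and `4`.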
Goals: Full system integration and performance
Tasks:
- Integration Testing
  - Port remaining integration tests (8 tests)
  - Factorial computation verification
  - End-to-end expression evaluation
- Performance Optimization
  - Bytecode dispatch optimization (threaded code)
  - Inline caching for method dispatch
  - Memory allocation tuning
- Build System
  - CMake configuration
  - Cross-platform compatibility
  - Test automation
Deliverables:
- Complete working C++ interpreter
- All 130+ tests passing
- Performance benchmarks vs Go version
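Inline caching for method dispatch, listed under the performance tasks above, can be sketched as a per-send-site cache in front of the inheritance-aware lookup. This is a monomorphic cache under simplifying assumptions: methods are represented by integer ids, and classes are never modified after the cache is filled. `Class`, `lookup`, and `SendSite` are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using MethodId = int;  // stand-in for a compiled method object

struct Class {
    const Class* superclass = nullptr;
    std::unordered_map<std::string, MethodId> methods;
};

// Full lookup: walk the superclass chain until the selector is found.
const MethodId* lookup(const Class* cls, const std::string& selector) {
    for (; cls != nullptr; cls = cls->superclass) {
        auto it = cls->methods.find(selector);
        if (it != cls->methods.end()) return &it->second;
    }
    return nullptr;  // a real VM would send doesNotUnderstand: here
}

// Monomorphic inline cache: remembers the last receiver class seen at one
// send site and the method that lookup resolved for it.
struct SendSite {
    std::string selector;
    const Class* cachedClass = nullptr;
    const MethodId* cachedMethod = nullptr;
    int hits = 0;
    int misses = 0;

    const MethodId* dispatch(const Class* receiverClass) {
        assert(receiverClass != nullptr);
        if (receiverClass == cachedClass) {
            ++hits;
            return cachedMethod;  // fast path: one pointer comparison
        }
        ++misses;
        cachedClass = receiverClass;
        cachedMethod = lookup(receiverClass, selector);
        return cachedMethod;
    }
};
```

A production version would also need to flush or version the cache whenever a method dictionary changes; that invalidation is left out of this sketch.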
# CMakeLists.txt structure
project(SmalltalkCppVM)
add_subdirectory(objects)
add_subdirectory(vm)
add_subdirectory(memory)
add_subdirectory(compiler)
add_subdirectory(tests)

- 1:1 test mapping: Each Go test gets a C++ equivalent
- Test data preservation: Same test cases and expected results
- Continuous validation: All ported tests pass at the end of each phase
- Startup: ≤ 10ms (vs Go's ~50ms)
- Bytecode dispatch: ≥ 100M ops/sec
- GC pause: ≤ 1ms for small heaps
- Memory overhead: ≤ 2x object size
- Tagged pointer GC bugs: Extensive testing, reference Go implementation
- Memory corruption: Valgrind, AddressSanitizer integration
- Performance regression: Continuous benchmarking vs Go
- Complex GC debugging: Allocate extra time for Phase 4
- Block semantics: Non-local returns are subtle, plan for iteration
- ✅ All 130+ Go tests pass in C++
- ✅ Factorial computation produces identical results
- ✅ Memory management stable under stress
- ✅ Performance equal to or better than the Go version
- ✅ Clean valgrind/sanitizer runs
- ✅ Ready for JIT compiler integration
- JIT Compiler: Template-based code generation
- Inline Caching: Polymorphic method dispatch optimization
- Generational GC: Reduce collection overhead
- LSP Integration: Connect to language server