Skip to content

Latest commit

 

History

History
465 lines (375 loc) · 14.4 KB

File metadata and controls

465 lines (375 loc) · 14.4 KB

C++ Smalltalk VM Simplification & Flexibility Plan

Overview

This plan addresses critical issues in the current C++ Smalltalk VM implementation that hinder both simplicity and long-term flexibility. The current codebase suffers from over-engineering in some areas while lacking extensibility in others.

Goals

Simplicity:

  • Reduce cognitive complexity and maintenance burden
  • Eliminate unnecessary abstractions and inheritance hierarchies
  • Consolidate error handling and resource management
  • Streamline object model and memory layout

Long-term Flexibility:

  • Enable plugin-based extensibility
  • Support multiple execution engines (interpreter → JIT)
  • Configuration-driven behavior
  • Clean separation of concerns for independent evolution

Current Problems Analysis

🚨 High-Priority Issues

  1. Over-Complex Tagged Value System

    • Too many special-case constructors and type checks
    • Brittle bit manipulation scattered throughout codebase
    • Hard to debug and extend with new immediate types
  2. Excessive Object Inheritance Hierarchy

    • Virtual inheritance overhead for simple immediate values
    • Complex memory layout makes GC traversal error-prone
    • Serialization and debugging complexity
  3. Tightly Coupled Memory Manager

    • Hardcoded allocation methods for each object type
    • No separation between allocation policy and object construction
    • Difficult to experiment with different GC strategies
  4. Inflexible Bytecode Dispatch

    • Hardcoded switch statements resist extension
    • No plugin mechanism for custom bytecodes
    • JIT integration will require major refactoring

Implementation Plan

Phase 1: Simplify Core Object Model (Week 1-2)

Goals: Eliminate inheritance complexity, create uniform object representation

Task 1.1: Replace Object Inheritance with Composition

  • Remove SmallInteger, Boolean class hierarchy
  • Create single Object struct with discriminated union
  • Implement type-safe accessors

Before:

class SmallInteger : public Object {
    int32_t value_;
public:
    SmallInteger(int32_t value, Class* integerClass);
    int32_t getValue() const { return value_; }
};

After:

struct Object {
    enum Type { IMMEDIATE, ARRAY, DICTIONARY, CLASS, METHOD };
    
    Type type;
    uint32_t size;
    uint32_t hash;
    void* class_ptr;
    
    union {
        int32_t immediate_value;
        void** array_elements;
        void* custom_data;
    };
    
    // Type-safe accessors
    int32_t as_integer() const { 
        assert(type == IMMEDIATE);
        return immediate_value; 
    }
};

Task 1.2: Simplify Tagged Value System

  • Replace complex bit manipulation with simple discriminated union
  • Single constructor with factory methods
  • Eliminate special-case handling

New Implementation:

class TaggedValue {
public:
    enum class Type : uint8_t { NIL, BOOL, INT, FLOAT, OBJECT };
    
    static TaggedValue nil() { return TaggedValue(Type::NIL, 0); }
    static TaggedValue boolean(bool val) { return TaggedValue(Type::BOOL, val ? 1 : 0); }
    static TaggedValue integer(int32_t val) { return TaggedValue(Type::INT, val); }
    static TaggedValue object(void* ptr) { return TaggedValue(Type::OBJECT, reinterpret_cast<uintptr_t>(ptr)); }
    
    Type type() const { return type_; }
    
    template<typename T>
    T as() const {
        if constexpr (std::is_same_v<T, int32_t>) {
            assert(type_ == Type::INT);
            return static_cast<int32_t>(value_);
        } else if constexpr (std::is_pointer_v<T>) {
            assert(type_ == Type::OBJECT);
            return reinterpret_cast<T>(value_);
        }
    }
    
private:
    TaggedValue(Type t, uintptr_t v) : type_(t), value_(v) {}
    Type type_;
    uintptr_t value_;
};

Deliverables:

  • include/simple_object.h - New unified object model
  • include/simple_tagged_value.h - Simplified tagged values
  • Updated tests demonstrating equivalent functionality

Phase 2: Decouple Memory Management (Week 3)

Goals: Flexible allocation strategies, cleaner GC integration

Task 2.1: Single Allocation Interface

  • Replace multiple allocateXxx() methods with generic allocator
  • Builder pattern for complex object construction
  • Clear separation of allocation from initialization

New Memory Manager:

class MemoryManager {
public:
    // Single allocation primitive
    template<typename T>
    T* allocate(size_t extra_slots = 0) {
        size_t total_size = sizeof(T) + (extra_slots * sizeof(void*));
        return static_cast<T*>(allocate_raw(total_size));
    }
    
    // Builder for complex objects
    ObjectBuilder new_object(Object::Type type) {
        return ObjectBuilder(*this, type);
    }
    
    // Policy-based GC trigger
    void set_collection_policy(std::unique_ptr<CollectionPolicy> policy) {
        collection_policy_ = std::move(policy);
    }
    
private:
    void* allocate_raw(size_t bytes);
    void collect_if_needed(size_t requested);
    std::unique_ptr<CollectionPolicy> collection_policy_;
};

class ObjectBuilder {
public:
    ObjectBuilder& with_size(size_t slots);
    ObjectBuilder& with_class(void* class_ptr);
    Object* build();
};

Task 2.2: Pluggable GC Policies

  • Abstract collection triggering logic
  • Support for different GC strategies
  • Configuration-driven collection behavior

Deliverables:

  • include/flexible_memory.h - New memory management interface
  • include/gc_policies.h - Pluggable collection strategies
  • Performance benchmarks comparing old vs new approaches

Phase 3: Plugin-Based Bytecode System (Week 4-5)

Goals: Extensible bytecode dispatch, JIT preparation

Task 3.1: Plugin Architecture for Bytecode Handlers

  • Replace switch statements with handler registry
  • Enable runtime bytecode extension
  • Prepare for JIT compiler integration

New Dispatcher:

class BytecodeDispatcher {
public:
    using Handler = std::function<void(Interpreter&, const uint8_t*&)>;
    
    void register_handler(uint8_t opcode, Handler func) {
        handlers_[opcode] = std::move(func);
    }
    
    void dispatch(uint8_t opcode, Interpreter& vm, const uint8_t*& ip) {
        if (auto& handler = handlers_[opcode]) {
            handler(vm, ip);
        } else {
            throw VMError(VMError::UNKNOWN_BYTECODE, "Opcode: " + std::to_string(opcode));
        }
    }
    
private:
    std::array<Handler, 256> handlers_;
};

// Standard bytecodes as plugins
class CoreBytecodes : public VMPlugin {
public:
    void register_handlers(BytecodeDispatcher& dispatcher) override {
        dispatcher.register_handler(PUSH_LITERAL, [](Interpreter& vm, const uint8_t*& ip) {
            uint32_t index = read_uint32(ip);
            vm.push(vm.current_method().literal(index));
        });
        // ... other core handlers
    }
};

Task 3.2: Abstract Execution Engine

  • Separate bytecode interpretation from execution policy
  • Enable multiple execution strategies (interpreter, JIT, etc.)
  • Configuration-driven engine selection

Execution Engine Interface:

class ExecutionEngine {
public:
    virtual ~ExecutionEngine() = default;
    virtual TaggedValue execute(const CompiledMethod& method, TaggedValue receiver, std::vector<TaggedValue>& args) = 0;
    virtual void configure(const VMConfig& config) = 0;
};

class InterpreterEngine : public ExecutionEngine {
    BytecodeDispatcher dispatcher_;
public:
    TaggedValue execute(const CompiledMethod& method, TaggedValue receiver, std::vector<TaggedValue>& args) override;
};

class VM {
    std::unique_ptr<ExecutionEngine> engine_;
public:
    void set_engine(std::unique_ptr<ExecutionEngine> eng) { 
        engine_ = std::move(eng); 
    }
    
    TaggedValue execute_method(const CompiledMethod& method, TaggedValue receiver, std::vector<TaggedValue>& args) {
        return engine_->execute(method, receiver, args);
    }
};

Deliverables:

  • include/bytecode_dispatcher.h - Plugin-based dispatch system
  • include/execution_engine.h - Abstract execution interface
  • src/core_bytecodes.cpp - Standard bytecodes as plugins
  • Working interpreter engine with equivalent performance

Phase 4: Configuration & Error Management (Week 6)

Goals: Unified error handling, configuration-driven behavior

Task 4.1: Consolidated Error System

  • Single error hierarchy for all VM errors
  • Structured error information for debugging
  • Exception safety throughout the codebase

Error System:

class VMError : public std::exception {
public:
    enum Type { 
        OUT_OF_MEMORY, STACK_OVERFLOW, TYPE_ERROR, 
        METHOD_NOT_FOUND, UNKNOWN_BYTECODE, INVALID_OBJECT 
    };
    
    VMError(Type t, const std::string& msg, const std::string& context = "")
        : type_(t), message_(msg), context_(context) {}
    
    Type type() const { return type_; }
    const char* what() const noexcept override { return message_.c_str(); }
    const std::string& context() const { return context_; }
    
private:
    Type type_;
    std::string message_;
    std::string context_;
};

// RAII error context for debugging
class ErrorContext {
    static thread_local std::vector<std::string> context_stack_;
public:
    ErrorContext(const std::string& context) { 
        context_stack_.push_back(context); 
    }
    ~ErrorContext() { 
        context_stack_.pop_back(); 
    }
    
    static std::string current_context() {
        std::string result;
        for (const auto& ctx : context_stack_) {
            result += ctx + " -> ";
        }
        return result;
    }
};

#define VM_CONTEXT(msg) ErrorContext _ctx(msg)

Task 4.2: Configuration-Driven Behavior

  • External configuration for VM parameters
  • Runtime behavior modification without recompilation
  • Environment and file-based configuration

Configuration System:

struct VMConfig {
    size_t heap_size = 64 * 1024 * 1024;
    bool enable_jit = false;
    bool debug_mode = false;
    std::string plugin_directory = "./plugins";
    double gc_threshold = 0.8;
    
    static VMConfig from_file(const std::string& path);
    static VMConfig from_environment();
    static VMConfig defaults();
    
    void validate() const;
};

class VM {
    VMConfig config_;
public:
    VM(const VMConfig& config = VMConfig::defaults()) 
        : config_(config) {
        config_.validate();
        initialize_from_config();
    }
    
private:
    void initialize_from_config();
};

Deliverables:

  • include/vm_error.h - Unified error handling system
  • include/vm_config.h - Configuration management
  • JSON/YAML configuration file support
  • Error reporting with full context traces

Phase 5: Integration & Plugin System (Week 7-8)

Goals: Complete plugin architecture, third-party extensibility

Task 5.1: Plugin Loading Infrastructure

  • Dynamic plugin loading and unloading
  • Plugin dependency management
  • Sandboxed plugin execution

Plugin System:

class VMPlugin {
public:
    virtual ~VMPlugin() = default;
    virtual void initialize(VM& vm) = 0;
    virtual void register_primitives(PrimitiveRegistry& reg) {}
    virtual void register_bytecodes(BytecodeDispatcher& dispatcher) {}
    virtual void cleanup() {}
    
    virtual std::string name() const = 0;
    virtual std::string version() const = 0;
    virtual std::vector<std::string> dependencies() const { return {}; }
};

class PluginManager {
public:
    void load_plugin(const std::string& path);
    void load_plugins_from_directory(const std::string& path);
    void unload_plugin(const std::string& name);
    
    std::vector<std::string> loaded_plugins() const;
    VMPlugin* get_plugin(const std::string& name);
    
private:
    std::map<std::string, std::unique_ptr<VMPlugin>> plugins_;
    void resolve_dependencies();
};

Task 5.2: Complete Integration Testing

  • Full system tests with plugins
  • Performance regression testing
  • Memory safety validation (AddressSanitizer, Valgrind)

Deliverables:

  • include/plugin_system.h - Complete plugin infrastructure
  • Example plugins demonstrating extensibility
  • Performance benchmarks vs original implementation
  • Memory safety validation reports

Success Criteria

Simplicity Metrics

  • Reduced LOC: 30% reduction in core VM code
  • Eliminated Virtual Calls: No virtual inheritance for immediate values
  • Single Allocation Path: One allocation method instead of 6+
  • Unified Error Handling: All errors through single exception hierarchy

Flexibility Metrics

  • Plugin Architecture: Load/unload bytecode handlers at runtime
  • Multiple Execution Engines: Swap interpreter/JIT without code changes
  • Configuration Driven: Major behaviors configurable without recompilation
  • Clean Interfaces: Each subsystem testable in isolation

Performance Requirements

  • No Regression: Performance equal or better than original
  • Memory Efficiency: ≤ 10% memory overhead vs original
  • Fast Plugin Loading: Plugin load/unload < 1ms
  • Clean Memory: Pass Valgrind and AddressSanitizer

Maintainability Goals

  • Test Coverage: 90%+ line coverage maintained
  • Documentation: All public APIs documented
  • Examples: Working examples for each extension point
  • Migration Guide: Clear path from old to new architecture

Migration Strategy

Backward Compatibility

  • Maintain existing public APIs during transition
  • Provide deprecation warnings for old interfaces
  • Side-by-side testing to ensure equivalent behavior

Rollout Plan

  1. Phase 1-2: Internal refactoring, no API changes
  2. Phase 3: New execution engine API, old API still works
  3. Phase 4: New configuration system, old hardcoded values as defaults
  4. Phase 5: Plugin system addition, core functionality unchanged

Risk Mitigation

  • Incremental Changes: Each phase produces working system
  • Extensive Testing: Original test suite must pass after each phase
  • Performance Monitoring: Continuous benchmarking throughout
  • Rollback Plan: Git branches for each phase, easy rollback

Next Steps

  1. Review & Approval: Stakeholder review of this plan
  2. Baseline Metrics: Establish current performance/complexity baselines
  3. Test Suite Audit: Ensure comprehensive test coverage before changes
  4. Phase 1 Kickoff: Begin object model simplification

This plan transforms the C++ Smalltalk VM from a complex, tightly-coupled system into a simple, extensible platform ready for future enhancements like JIT compilation while maintaining all existing functionality.