This plan addresses critical issues in the current C++ Smalltalk VM implementation that hinder both simplicity and long-term flexibility. The current codebase suffers from over-engineering in some areas while lacking extensibility in others.
Simplicity:
- Reduce cognitive complexity and maintenance burden
- Eliminate unnecessary abstractions and inheritance hierarchies
- Consolidate error handling and resource management
- Streamline object model and memory layout
Long-term Flexibility:
- Enable plugin-based extensibility
- Support multiple execution engines (interpreter → JIT)
- Configuration-driven behavior
- Clean separation of concerns for independent evolution
-
Over-Complex Tagged Value System
- Too many special-case constructors and type checks
- Brittle bit manipulation scattered throughout codebase
- Hard to debug and extend with new immediate types
-
Excessive Object Inheritance Hierarchy
- Virtual inheritance overhead for simple immediate values
- Complex memory layout makes GC traversal error-prone
- Serialization and debugging complexity
-
Tightly Coupled Memory Manager
- Hardcoded allocation methods for each object type
- No separation between allocation policy and object construction
- Difficult to experiment with different GC strategies
-
Inflexible Bytecode Dispatch
- Hardcoded switch statements resist extension
- No plugin mechanism for custom bytecodes
- JIT integration will require major refactoring
Goals: Eliminate inheritance complexity, create uniform object representation
- Remove
SmallInteger,Booleanclass hierarchy - Create single
Objectstruct with discriminated union - Implement type-safe accessors
Before:
class SmallInteger : public Object {
int32_t value_;
public:
SmallInteger(int32_t value, Class* integerClass);
int32_t getValue() const { return value_; }
};After:
struct Object {
enum Type { IMMEDIATE, ARRAY, DICTIONARY, CLASS, METHOD };
Type type;
uint32_t size;
uint32_t hash;
void* class_ptr;
union {
int32_t immediate_value;
void** array_elements;
void* custom_data;
};
// Type-safe accessors
int32_t as_integer() const {
assert(type == IMMEDIATE);
return immediate_value;
}
};- Replace complex bit manipulation with simple discriminated union
- Single constructor with factory methods
- Eliminate special-case handling
New Implementation:
class TaggedValue {
public:
enum class Type : uint8_t { NIL, BOOL, INT, FLOAT, OBJECT };
static TaggedValue nil() { return TaggedValue(Type::NIL, 0); }
static TaggedValue boolean(bool val) { return TaggedValue(Type::BOOL, val ? 1 : 0); }
static TaggedValue integer(int32_t val) { return TaggedValue(Type::INT, val); }
static TaggedValue object(void* ptr) { return TaggedValue(Type::OBJECT, reinterpret_cast<uintptr_t>(ptr)); }
Type type() const { return type_; }
template<typename T>
T as() const {
if constexpr (std::is_same_v<T, int32_t>) {
assert(type_ == Type::INT);
return static_cast<int32_t>(value_);
} else if constexpr (std::is_pointer_v<T>) {
assert(type_ == Type::OBJECT);
return reinterpret_cast<T>(value_);
}
}
private:
TaggedValue(Type t, uintptr_t v) : type_(t), value_(v) {}
Type type_;
uintptr_t value_;
};Deliverables:
include/simple_object.h- New unified object modelinclude/simple_tagged_value.h- Simplified tagged values- Updated tests demonstrating equivalent functionality
Goals: Flexible allocation strategies, cleaner GC integration
- Replace multiple
allocateXxx()methods with generic allocator - Builder pattern for complex object construction
- Clear separation of allocation from initialization
New Memory Manager:
class MemoryManager {
public:
// Single allocation primitive
template<typename T>
T* allocate(size_t extra_slots = 0) {
size_t total_size = sizeof(T) + (extra_slots * sizeof(void*));
return static_cast<T*>(allocate_raw(total_size));
}
// Builder for complex objects
ObjectBuilder new_object(Object::Type type) {
return ObjectBuilder(*this, type);
}
// Policy-based GC trigger
void set_collection_policy(std::unique_ptr<CollectionPolicy> policy) {
collection_policy_ = std::move(policy);
}
private:
void* allocate_raw(size_t bytes);
void collect_if_needed(size_t requested);
std::unique_ptr<CollectionPolicy> collection_policy_;
};
class ObjectBuilder {
public:
ObjectBuilder& with_size(size_t slots);
ObjectBuilder& with_class(void* class_ptr);
Object* build();
};- Abstract collection triggering logic
- Support for different GC strategies
- Configuration-driven collection behavior
Deliverables:
include/flexible_memory.h- New memory management interfaceinclude/gc_policies.h- Pluggable collection strategies- Performance benchmarks comparing old vs new approaches
Goals: Extensible bytecode dispatch, JIT preparation
- Replace switch statements with handler registry
- Enable runtime bytecode extension
- Prepare for JIT compiler integration
New Dispatcher:
class BytecodeDispatcher {
public:
using Handler = std::function<void(Interpreter&, const uint8_t*&)>;
void register_handler(uint8_t opcode, Handler func) {
handlers_[opcode] = std::move(func);
}
void dispatch(uint8_t opcode, Interpreter& vm, const uint8_t*& ip) {
if (auto& handler = handlers_[opcode]) {
handler(vm, ip);
} else {
throw VMError(VMError::UNKNOWN_BYTECODE, "Opcode: " + std::to_string(opcode));
}
}
private:
std::array<Handler, 256> handlers_;
};
// Standard bytecodes as plugins
class CoreBytecodes : public VMPlugin {
public:
void register_handlers(BytecodeDispatcher& dispatcher) override {
dispatcher.register_handler(PUSH_LITERAL, [](Interpreter& vm, const uint8_t*& ip) {
uint32_t index = read_uint32(ip);
vm.push(vm.current_method().literal(index));
});
// ... other core handlers
}
};- Separate bytecode interpretation from execution policy
- Enable multiple execution strategies (interpreter, JIT, etc.)
- Configuration-driven engine selection
Execution Engine Interface:
class ExecutionEngine {
public:
virtual ~ExecutionEngine() = default;
virtual TaggedValue execute(const CompiledMethod& method, TaggedValue receiver, std::vector<TaggedValue>& args) = 0;
virtual void configure(const VMConfig& config) = 0;
};
class InterpreterEngine : public ExecutionEngine {
BytecodeDispatcher dispatcher_;
public:
TaggedValue execute(const CompiledMethod& method, TaggedValue receiver, std::vector<TaggedValue>& args) override;
};
class VM {
std::unique_ptr<ExecutionEngine> engine_;
public:
void set_engine(std::unique_ptr<ExecutionEngine> eng) {
engine_ = std::move(eng);
}
TaggedValue execute_method(const CompiledMethod& method, TaggedValue receiver, std::vector<TaggedValue>& args) {
return engine_->execute(method, receiver, args);
}
};Deliverables:
include/bytecode_dispatcher.h- Plugin-based dispatch systeminclude/execution_engine.h- Abstract execution interfacesrc/core_bytecodes.cpp- Standard bytecodes as plugins- Working interpreter engine with equivalent performance
Goals: Unified error handling, configuration-driven behavior
- Single error hierarchy for all VM errors
- Structured error information for debugging
- Exception safety throughout the codebase
Error System:
class VMError : public std::exception {
public:
enum Type {
OUT_OF_MEMORY, STACK_OVERFLOW, TYPE_ERROR,
METHOD_NOT_FOUND, UNKNOWN_BYTECODE, INVALID_OBJECT
};
VMError(Type t, const std::string& msg, const std::string& context = "")
: type_(t), message_(msg), context_(context) {}
Type type() const { return type_; }
const char* what() const noexcept override { return message_.c_str(); }
const std::string& context() const { return context_; }
private:
Type type_;
std::string message_;
std::string context_;
};
// RAII error context for debugging
class ErrorContext {
static thread_local std::vector<std::string> context_stack_;
public:
ErrorContext(const std::string& context) {
context_stack_.push_back(context);
}
~ErrorContext() {
context_stack_.pop_back();
}
static std::string current_context() {
std::string result;
for (const auto& ctx : context_stack_) {
result += ctx + " -> ";
}
return result;
}
};
#define VM_CONTEXT(msg) ErrorContext _ctx(msg)- External configuration for VM parameters
- Runtime behavior modification without recompilation
- Environment and file-based configuration
Configuration System:
struct VMConfig {
size_t heap_size = 64 * 1024 * 1024;
bool enable_jit = false;
bool debug_mode = false;
std::string plugin_directory = "./plugins";
double gc_threshold = 0.8;
static VMConfig from_file(const std::string& path);
static VMConfig from_environment();
static VMConfig defaults();
void validate() const;
};
class VM {
VMConfig config_;
public:
VM(const VMConfig& config = VMConfig::defaults())
: config_(config) {
config_.validate();
initialize_from_config();
}
private:
void initialize_from_config();
};Deliverables:
include/vm_error.h- Unified error handling systeminclude/vm_config.h- Configuration management- JSON/YAML configuration file support
- Error reporting with full context traces
Goals: Complete plugin architecture, third-party extensibility
- Dynamic plugin loading and unloading
- Plugin dependency management
- Sandboxed plugin execution
Plugin System:
class VMPlugin {
public:
virtual ~VMPlugin() = default;
virtual void initialize(VM& vm) = 0;
virtual void register_primitives(PrimitiveRegistry& reg) {}
virtual void register_bytecodes(BytecodeDispatcher& dispatcher) {}
virtual void cleanup() {}
virtual std::string name() const = 0;
virtual std::string version() const = 0;
virtual std::vector<std::string> dependencies() const { return {}; }
};
class PluginManager {
public:
void load_plugin(const std::string& path);
void load_plugins_from_directory(const std::string& path);
void unload_plugin(const std::string& name);
std::vector<std::string> loaded_plugins() const;
VMPlugin* get_plugin(const std::string& name);
private:
std::map<std::string, std::unique_ptr<VMPlugin>> plugins_;
void resolve_dependencies();
};- Full system tests with plugins
- Performance regression testing
- Memory safety validation (AddressSanitizer, Valgrind)
Deliverables:
include/plugin_system.h- Complete plugin infrastructure- Example plugins demonstrating extensibility
- Performance benchmarks vs original implementation
- Memory safety validation reports
- Reduced LOC: 30% reduction in core VM code
- Eliminated Virtual Calls: No virtual inheritance for immediate values
- Single Allocation Path: One allocation method instead of 6+
- Unified Error Handling: All errors through single exception hierarchy
- Plugin Architecture: Load/unload bytecode handlers at runtime
- Multiple Execution Engines: Swap interpreter/JIT without code changes
- Configuration Driven: Major behaviors configurable without recompilation
- Clean Interfaces: Each subsystem testable in isolation
- No Regression: Performance equal or better than original
- Memory Efficiency: ≤ 10% memory overhead vs original
- Fast Plugin Loading: Plugin load/unload < 1ms
- Clean Memory: Pass Valgrind and AddressSanitizer
- Test Coverage: 90%+ line coverage maintained
- Documentation: All public APIs documented
- Examples: Working examples for each extension point
- Migration Guide: Clear path from old to new architecture
- Maintain existing public APIs during transition
- Provide deprecation warnings for old interfaces
- Side-by-side testing to ensure equivalent behavior
- Phase 1-2: Internal refactoring, no API changes
- Phase 3: New execution engine API, old API still works
- Phase 4: New configuration system, old hardcoded values as defaults
- Phase 5: Plugin system addition, core functionality unchanged
- Incremental Changes: Each phase produces working system
- Extensive Testing: Original test suite must pass after each phase
- Performance Monitoring: Continuous benchmarking throughout
- Rollback Plan: Git branches for each phase, easy rollback
- Review & Approval: Stakeholder review of this plan
- Baseline Metrics: Establish current performance/complexity baselines
- Test Suite Audit: Ensure comprehensive test coverage before changes
- Phase 1 Kickoff: Begin object model simplification
This plan transforms the C++ Smalltalk VM from a complex, tightly-coupled system into a simple, extensible platform ready for future enhancements like JIT compilation while maintaining all existing functionality.