This document explains how to properly use the garbage collection (GC) system in the c-nan-boxing-2u implementation.
The GC system uses a shadow stack approach where you explicitly register local Value variables with the garbage collector. The GC can only run at safe points (during allocation or explicit collection), ensuring that all registered values remain valid.
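To make the shadow stack idea concrete, here is a minimal, self-contained sketch of how such a stack could work. It is not the actual implementation: ToyValue, toy_push_scope, toy_protect, toy_pop_scope and the fixed capacities are hypothetical names and numbers chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real Value type. */
typedef unsigned long long ToyValue;

#define TOY_MAX_ROOTS  64
#define TOY_MAX_SCOPES 16

static ToyValue *toy_roots[TOY_MAX_ROOTS];         /* addresses of protected locals */
static size_t    toy_root_count = 0;
static size_t    toy_scope_marks[TOY_MAX_SCOPES];  /* root count at each scope entry */
static size_t    toy_scope_depth = 0;

/* Remember how many roots existed when the scope began. */
void toy_push_scope(void) {
    assert(toy_scope_depth < TOY_MAX_SCOPES);
    toy_scope_marks[toy_scope_depth++] = toy_root_count;
}

/* Register the address of a local so a collector can read its current value. */
void toy_protect(ToyValue *slot) {
    assert(toy_root_count < TOY_MAX_ROOTS);
    toy_roots[toy_root_count++] = slot;
}

/* Drop every root registered since the matching toy_push_scope(). */
void toy_pop_scope(void) {
    assert(toy_scope_depth > 0);
    toy_root_count = toy_scope_marks[--toy_scope_depth];
}

/* Registers three locals across two nested scopes, then unwinds.
   Returns the root count observed inside the inner scope. */
size_t toy_demo_nested(void) {
    ToyValue a = 0, b = 0, c = 0;
    size_t inside;
    toy_push_scope();
    toy_protect(&a);
    toy_protect(&b);
    toy_push_scope();
    toy_protect(&c);
    inside = toy_root_count;  /* roots from both scopes are visible here */
    toy_pop_scope();          /* discards c */
    toy_pop_scope();          /* discards a and b */
    return inside;
}
```

Because the stack stores addresses rather than copies, reassigning a protected variable needs no extra bookkeeping: a collector reading through the pointer always sees the current value.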
Every function that creates or manipulates Value objects should follow this pattern:
void my_function() {
GC_PUSH_SCOPE(); // 1. Start GC scope
GC_LOCALS(var1, var2, var3); // 2. Declare and protect locals
var1 = make_string("hello"); // 3. Use the variables normally
var2 = make_list(10);
var3 = string_concat(var1, var2);
// ... rest of function logic
GC_POP_SCOPE(); // 4. End GC scope before return
}
Call gc_init() at the start of your program, and gc_shutdown() on exit:
int main() {
gc_init();
// ... your program logic
gc_shutdown();
return 0;
}
- GC_PUSH_SCOPE(): Call at the beginning of every function that uses Value objects
- GC_POP_SCOPE(): Call before every return statement in the function
Use GC_LOCALS(var1, var2, ...) to declare and automatically protect up to 8 local variables:
GC_LOCALS(str, list, result, item);
str = make_string("test");
list = make_list(5);
// Variables are automatically tracked by GC
For more than 8 variables, you can use GC_LOCALS multiple times. To protect a single variable at a time, use GC_PROTECT:
Value my_var = val_null;
GC_PROTECT(&my_var); // Pass pointer to the variable
#include <stdio.h>
#include "value.h"
#include "nanbox_gc.h"
Value process_words(Value input) {
GC_PUSH_SCOPE();
GC_LOCALS(words, result, word, processed, space, suffix);
// Split input into words
space = make_string(" ");
words = string_split(input, space);
result = make_list(list_count(words));
// Create the suffix once, in a protected local, so later allocations cannot free it
suffix = make_string("!");
// Process each word
for (int i = 0; i < list_count(words); i++) {
word = list_get(words, i);
processed = string_concat(word, suffix);
list_set(result, i, processed);
}
GC_POP_SCOPE();
return result;
}
int main() {
gc_init();
GC_PUSH_SCOPE();
GC_LOCALS(input, output);
input = make_string("hello world test");
output = process_words(input);
printf("Result has %d items\n", list_count(output));
GC_POP_SCOPE();
gc_shutdown();
return 0;
}
Do:
- Always call gc_init() before using any Value objects
- Use GC_PUSH_SCOPE() at the start of every function
- Protect all local Value variables with GC_LOCALS() or GC_PROTECT()
- Declare all local Value variables near the top of your function
- Call GC_POP_SCOPE() before every return
- Clean up with gc_shutdown() at program end
Don't:
- Forget to call gc_init() (will cause assertion failures)
- Skip GC_PUSH_SCOPE() in functions that use Value objects
- Forget to protect local Value variables
- Declare a Value variable inside a loop
- Skip GC_POP_SCOPE() before returning
- Use Value objects after their scope has been popped
void large_function() {
GC_PUSH_SCOPE();
Value var17 = val_null;
Value var18 = val_null;
GC_PROTECT(&var17);
GC_PROTECT(&var18);
// ... use variables
GC_POP_SCOPE();
}
For performance-critical sections where you know GC is not needed:
gc_disable();
// ... critical section (no allocations should trigger GC)
gc_enable();
Force garbage collection explicitly:
gc_collect(); // Only runs if GC is not disabled
Subsystems that manage their own arrays of Value objects (like a VM stack) should register a mark callback rather than using GC_PROTECT on each element. This is more efficient and avoids shadow stack overflow issues.
// Callback type
typedef void (*gc_mark_callback_t)(void* user_data);
// Registration
void gc_register_mark_callback(gc_mark_callback_t callback, void* user_data);
void gc_unregister_mark_callback(gc_mark_callback_t callback, void* user_data);
// For use inside callbacks
void gc_mark_value(Value v);
Example: VM registering its stack as a GC root
// In VM initialization
void VMStorage::InitVM(...) {
// ... other init code ...
gc_register_mark_callback(VMStorage::MarkRoots, this);
}
// Static callback invoked during GC mark phase
void VMStorage::MarkRoots(void* user_data) {
VMStorage* vm = static_cast<VMStorage*>(user_data);
for (int i = 0; i < vm->stack.Count(); i++) {
gc_mark_value(vm->stack[i]);
gc_mark_value(vm->names[i]);
}
}
// In VM destructor - always unregister!
VMStorage::~VMStorage() {
gc_unregister_mark_callback(VMStorage::MarkRoots, this);
}
The GC will invoke all registered callbacks during the mark phase, allowing each subsystem to mark the Values it's responsible for.
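For readers curious how a callback registry of this shape might behave, the following is a rough, self-contained sketch. It does not reproduce the library's internals: every toy_ name, the fixed capacity of 8, and the swap-remove strategy are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_mark_callback_t)(void *user_data);

#define TOY_MAX_CALLBACKS 8   /* hypothetical fixed capacity */

static struct {
    toy_mark_callback_t fn;
    void *user_data;
} toy_callbacks[TOY_MAX_CALLBACKS];
static size_t toy_callback_count = 0;

void toy_register_mark_callback(toy_mark_callback_t fn, void *user_data) {
    assert(toy_callback_count < TOY_MAX_CALLBACKS);
    toy_callbacks[toy_callback_count].fn = fn;
    toy_callbacks[toy_callback_count].user_data = user_data;
    toy_callback_count++;
}

/* Remove the first entry that matches both the function and its user data. */
void toy_unregister_mark_callback(toy_mark_callback_t fn, void *user_data) {
    for (size_t i = 0; i < toy_callback_count; i++) {
        if (toy_callbacks[i].fn == fn && toy_callbacks[i].user_data == user_data) {
            toy_callbacks[i] = toy_callbacks[--toy_callback_count]; /* swap-remove */
            return;
        }
    }
}

/* What a mark phase might do in addition to walking the shadow stack. */
void toy_run_mark_callbacks(void) {
    for (size_t i = 0; i < toy_callback_count; i++) {
        toy_callbacks[i].fn(toy_callbacks[i].user_data);
    }
}

/* Example subsystem: a small stack that counts how many entries it has marked. */
typedef struct {
    int values[4];
    int count;
    int marked;
} ToyStack;

void toy_stack_mark_roots(void *user_data) {
    ToyStack *s = user_data;
    /* A real callback would call gc_mark_value() on each element instead. */
    s->marked += s->count;
}
```

Matching on the (callback, user_data) pair is what lets several instances of the same subsystem share one callback function. It also shows why unregistering matters: an entry left behind after its owner is destroyed would hand the callback a dangling pointer on the next collection.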
Calling GC_PROTECT inside a loop causes the shadow stack to grow unboundedly, eventually causing overflow or memory exhaustion.
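The growth can be reproduced with a toy model of the shadow stack. This is only a sketch: ToyValue, toy_protect_slot and the capacity of 4096 are hypothetical, and the real shadow stack may be sized and checked differently.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long ToyValue;   /* hypothetical stand-in for Value */

#define TOY_CAPACITY 4096
static ToyValue *toy_slots[TOY_CAPACITY];
static size_t toy_used = 0;

void toy_protect_slot(ToyValue *slot) {
    assert(toy_used < TOY_CAPACITY);   /* this is where overflow would be detected */
    toy_slots[toy_used++] = slot;
}

/* Registers inside the loop: one new entry per iteration.
   Returns how many entries the loop added. */
size_t toy_protect_inside_loop(int iterations) {
    size_t before = toy_used;
    for (int i = 0; i < iterations; i++) {
        ToyValue temp = (ToyValue)i;
        toy_protect_slot(&temp);
    }
    size_t grown = toy_used - before;
    toy_used = before;                 /* reset so the demo can be repeated */
    return grown;
}

/* Registers once before the loop and reuses the same slot. */
size_t toy_protect_outside_loop(int iterations) {
    size_t before = toy_used;
    ToyValue temp = 0;
    toy_protect_slot(&temp);
    for (int i = 0; i < iterations; i++) {
        temp = (ToyValue)i;            /* same registered slot, new value */
    }
    size_t grown = toy_used - before;
    toy_used = before;
    return grown;
}
```

In this model, 1000 iterations add 1000 entries when the registration is inside the loop and a single entry when it is outside.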
// WRONG - shadow stack grows with each iteration
for (int i = 0; i < 1000; i++) {
Value temp = make_string("hello");
GC_PROTECT(&temp); // Adds a new shadow stack entry on every iteration!
// ...
}
// CORRECT - declare outside loop, reuse inside
GC_LOCALS(temp);
for (int i = 0; i < 1000; i++) {
temp = make_string("hello"); // Reuses the same protected slot
// ...
}
To ensure GC_POP_SCOPE() is always called, use a single return point at the end of functions:
// WRONG - early return skips GC_POP_SCOPE
Value risky_function(int x) {
GC_PUSH_SCOPE();
GC_LOCALS(result);
if (x < 0) {
return val_null; // GC_POP_SCOPE not called!
}
result = make_string("ok");
GC_POP_SCOPE();
return result;
}
// CORRECT - single return point
Value safe_function(int x) {
GC_PUSH_SCOPE();
GC_LOCALS(result);
if (x < 0) {
result = val_null;
} else {
result = make_string("ok");
}
GC_POP_SCOPE();
return result;
}
- Allocation: Objects are allocated from a managed heap
- Collection: Triggered automatically when memory threshold is exceeded
- Mark Phase: All objects reachable from protected variables are marked
- Sweep Phase: Unmarked objects are freed
- Threshold: Dynamically adjusted based on collection effectiveness
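The phases above can be sketched with a toy collector. This is an illustration under simplifying assumptions, not the library's code: each hypothetical ToyObj holds at most one reference, roots are passed in as a plain array, and the threshold rule (double it when a collection frees less than a quarter of the heap) is an invented policy, since the actual adjustment rule is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy object: one outgoing reference, linked into a list of all objects. */
typedef struct ToyObj {
    struct ToyObj *ref;    /* object this one points to, or NULL */
    struct ToyObj *next;   /* next object in the heap list */
    int marked;
} ToyObj;

static ToyObj *toy_heap = NULL;   /* every allocation not yet freed */
static size_t toy_live = 0;

/* Allocation: every object is recorded so the sweep can find it later. */
ToyObj *toy_alloc(ToyObj *ref) {
    ToyObj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->ref = ref;
    o->marked = 0;
    o->next = toy_heap;
    toy_heap = o;
    toy_live++;
    return o;
}

/* Mark phase: follow references from a root, stopping at already-marked objects. */
void toy_mark(ToyObj *o) {
    while (o != NULL && !o->marked) {
        o->marked = 1;
        o = o->ref;
    }
}

/* Sweep phase: free unmarked objects and clear marks on survivors. */
size_t toy_sweep(void) {
    size_t freed = 0;
    ToyObj **link = &toy_heap;
    while (*link != NULL) {
        ToyObj *o = *link;
        if (o->marked) {
            o->marked = 0;
            link = &o->next;
        } else {
            *link = o->next;
            free(o);
            freed++;
            toy_live--;
        }
    }
    return freed;
}

/* One full collection over an array of roots. Returns the number of objects freed. */
size_t toy_collect(ToyObj **roots, size_t n) {
    for (size_t i = 0; i < n; i++) {
        toy_mark(roots[i]);
    }
    return toy_sweep();
}

/* Invented policy: an ineffective collection (under 25% freed) doubles the threshold. */
size_t toy_next_threshold(size_t threshold, size_t live_before, size_t freed) {
    return (freed * 4 < live_before) ? threshold * 2 : threshold;
}
```

An object survives only if some root reaches it, which is why an unprotected local can be freed by the next allocation that triggers a collection.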
// WRONG - will crash with assertion
int main() {
GC_PUSH_SCOPE(); // Assertion failure!
// ...
}
// CORRECT
int main() {
gc_init(); // Initialize first
GC_PUSH_SCOPE();
// ...
}
// WRONG - variables may be garbage collected
void bad_function() {
GC_PUSH_SCOPE();
Value str = make_string("test"); // Not protected!
// str might be freed during allocation
GC_POP_SCOPE();
}
// CORRECT
void good_function() {
GC_PUSH_SCOPE();
GC_LOCALS(str);
str = make_string("test"); // Protected
GC_POP_SCOPE();
}
// WRONG - no scope management
Value risky_function() {
Value result = make_string("hello");
return result; // result might be garbage collected!
}
// CORRECT
Value safe_function() {
GC_PUSH_SCOPE();
GC_LOCALS(result);
result = make_string("hello");
GC_POP_SCOPE();
return result; // Safe to return
}
Add -DGC_DEBUG to your compile flags to log GC activity and to overwrite freed blocks with 0xDEADBEEF:
gcc -DGC_DEBUG -o myprogram myprogram.c gc.c unicodeUtil.c nanbox_strings.c
Add -DGC_AGGRESSIVE to force collection on every allocation (for testing):
gcc -DGC_AGGRESSIVE -o myprogram myprogram.c gc.c unicodeUtil.c nanbox_strings.c
This helps catch GC-related bugs by making them occur more frequently and predictably.
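As a rough illustration of what such flags typically control, the sketch below poisons a freed block and switches the collection trigger at compile time. The names toy_poison, toy_should_collect and TOY_GC_AGGRESSIVE are hypothetical, and the byte order of the fill pattern is an arbitrary choice. The real GC_DEBUG and GC_AGGRESSIVE code paths may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Fill a freed block with a repeating DE AD BE EF byte pattern so that
   reads through a stale pointer produce an obviously bogus value. */
void toy_poison(void *block, size_t size) {
    static const unsigned char pattern[4] = { 0xDE, 0xAD, 0xBE, 0xEF };
    unsigned char *p = block;
    assert(block != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        p[i] = pattern[i % 4];
    }
}

/* Decide whether an allocation should trigger a collection. */
int toy_should_collect(size_t bytes_allocated, size_t threshold) {
#ifdef TOY_GC_AGGRESSIVE
    (void)bytes_allocated;
    (void)threshold;
    return 1;                           /* collect on every allocation */
#else
    return bytes_allocated > threshold; /* collect only past the threshold */
#endif
}
```

Collecting on every allocation means an unprotected Value is freed at the very next allocation rather than at some later, unpredictable one, and the poison pattern makes the resulting stale read easy to recognize in a debugger.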