Struct for C++

C++ port of the canonical TypeScript implementation.

Status: complete. Full TS-canonical parity: all 40 functions, 15 type bit-flags, 3 mode constants (M_KEYPRE/M_KEYPOST/M_VAL), SKIP/DELETE sentinels (pointer-identity), and the Injection state machine. inject/transform/validate/select all dispatch through the canonical injector machinery: 11 transform commands, 6 validate checkers, 4 select operators.

Passes the full shared corpus (1178/1178). Run locally with make build from cpp/. Per-file pass counts are written to corpus-scoreboard.json after each run; the committed baseline lives at test-baseline.json.

For motivation, language-neutral concepts, and the cross-language parity matrix, see the top-level README and REPORT.md.

Install

In the monorepo:

cd cpp
make build       # smoke + corpus
make smoke       # just the smoke test
make corpus      # just the corpus driver
make sanitize    # build + run with ASan + UBSan

The library is header-only across three files in src/:

  • value.hpp: Value (std::variant), OrderedMap, Sentinel, type bit-flags, predicates.
  • value_io.hpp: JSON parse/serialise via the nlohmann::json bridge.
  • voxgig_struct.hpp: main API: utilities, getpath/setpath/walk/merge/inject/transform/validate/select plus all transform_*/validate_*/select_* injectors.

Namespace voxgig::structlib. Requires C++17 (for std::variant / structured bindings). Depends on nlohmann/json only for the JSON-text parser/serialiser bridge — runtime values use the custom Value type.

#include "voxgig_struct.hpp"
#include <nlohmann/json.hpp>
using nlohmann::json;
using namespace voxgig::structlib;
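To make the std::variant design concrete, here is a minimal sketch of a JSON-like value built on std::variant. It is not the definition in value.hpp: the alternatives, the shared_ptr indirection and the vector-of-pairs OrderedMap are assumptions, chosen to show two properties a port of the TypeScript canonical needs: insertion-ordered keys and shared references to nested nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for the real Value in value.hpp.
struct Value;
using List = std::vector<Value>;
// Insertion-ordered map: a vector of key/value pairs with linear lookup.
using OrderedMap = std::vector<std::pair<std::string, Value>>;

struct Value {
    // shared_ptr gives nested lists/maps reference semantics, like JS objects.
    std::variant<std::nullptr_t, bool, double, std::string,
                 std::shared_ptr<List>, std::shared_ptr<OrderedMap>> v;
};

// Look up a key in a map-valued Value; nullptr if absent or not a map.
Value* find_key(const Value& node, const std::string& key) {
    auto map = std::get_if<std::shared_ptr<OrderedMap>>(&node.v);
    if (!map || !*map) return nullptr;
    for (auto& kv : **map)
        if (kv.first == key) return &kv.second;
    return nullptr;
}

// Insert or overwrite a key, keeping first-insertion order.
void set_key(Value& node, const std::string& key, Value val) {
    auto map = std::get_if<std::shared_ptr<OrderedMap>>(&node.v);
    if (!map || !*map) return;
    for (auto& kv : **map)
        if (kv.first == key) { kv.second = std::move(val); return; }
    (*map)->emplace_back(key, std::move(val));
}
```

Copying a Value in this sketch copies the pointer, so a nested node inserted into two parents is shared and a later mutation is visible through both; a deep clone has to walk the tree explicitly.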

Quick start

#include "voxgig_struct.hpp"
#include <nlohmann/json.hpp>

using nlohmann::json;
using namespace VoxgigStruct;

int main() {
    json store = { {"db", {{"host", "localhost"}}} };

    args_container args = { store, json("db") };
    json db = getprop(std::move(args));
    // db == { "host": "localhost" }

    return 0;
}

Calling convention

Most functions accept args_container&& (a std::vector<json>) rather than typed parameters. This is a porting shortcut that mirrors the variadic shape of the canonical functions; it will be replaced with type-safe signatures.
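The trade-off is easiest to see side by side on a getprop-like lookup. Everything in this sketch is hypothetical: Arg stands in for json, Map for a JSON object, and neither function is the library's code. The vector form can only detect a wrong arity or argument type at run time, while the typed form lets the compiler reject it.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins: Arg plays the role of json, Map of a JSON object.
using Map = std::map<std::string, std::string>;
using Arg = std::variant<std::monostate, std::string, Map>;
using args_container = std::vector<Arg>;  // the port's version holds json

// Vector-of-arguments shape: (val, key, alt?) unpacked and checked at run time.
Arg getprop_vec(args_container&& args) {
    Arg alt = args.size() > 2 ? args[2] : Arg{};
    if (args.size() < 2) return alt;
    const Map* m = std::get_if<Map>(&args[0]);
    const std::string* key = std::get_if<std::string>(&args[1]);
    if (!m || !key) return alt;  // wrong argument types: silently fall back
    auto it = m->find(*key);
    return it != m->end() ? Arg{it->second} : alt;
}

// Typed shape: arity and argument types are enforced at compile time.
std::optional<std::string> getprop_typed(const Map& m, const std::string& key) {
    auto it = m.find(key);
    if (it == m.end()) return std::nullopt;
    return it->second;
}
```

Swapping the first two arguments of getprop_vec still compiles and quietly returns the fallback; the same mistake with getprop_typed is a compile error.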

Function reference (currently implemented)

Source: src/voxgig_struct.hpp. Namespace VoxgigStruct.

20 of the 40 canonical functions are present:

Predicates

bool isnode(args_container&& args);
bool ismap(args_container&& args);
bool islist(args_container&& args);
bool iskey(args_container&& args);
bool isempty(args_container&& args);
bool isfunc(args_container&& args);

Type inspection

std::string typename_of(args_container&& args);
int         typify(args_container&& args);

Property access

json getprop(args_container&& args);
json setprop(args_container&& args);
std::vector<std::string> keysof(args_container&& args);
bool                      haskey(args_container&& args);
std::vector<json>         items(args_container&& args);

Tree operations

json clone(args_container&& args);     // shallow currently — see notes
json walk(args_container&& args);      // see notes (UB issue)
json merge(args_container&& args);     // partial implementation
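The shallow/deep distinction noted for clone() matters whenever nested nodes are shared by reference. A minimal sketch using a hypothetical Node type with shared_ptr children (not the port's value type): a shallow clone copies only the top-level node, so children are still shared, while a deep clone recurses and allocates fresh copies of every descendant.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical tree node: a leaf string plus shared child nodes.
struct Node {
    std::string leaf;
    std::map<std::string, std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

// Shallow: a new top-level node, but child pointers are shared with the source.
NodePtr clone_shallow(const NodePtr& src) {
    return src ? std::make_shared<Node>(*src) : nullptr;
}

// Deep: every descendant is copied, so no node is shared with the source.
NodePtr clone_deep(const NodePtr& src) {
    if (!src) return nullptr;
    auto out = std::make_shared<Node>();
    out->leaf = src->leaf;
    for (const auto& [key, child] : src->children)
        out->children[key] = clone_deep(child);
    return out;
}
```

This recursion assumes a tree; a node reachable along two paths would be duplicated, and a cycle would not terminate.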

Strings

std::string escre(args_container&& args);
std::string escurl(args_container&& args);
std::string joinurl(args_container&& args);
std::string stringify(args_container&& args);
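If escurl follows the TypeScript canonical, it behaves like JavaScript's encodeURIComponent; that mapping is an assumption here, not read from the C++ source. A self-contained sketch of that encoding (escurl_sketch is a hypothetical name): every byte of the UTF-8 input outside the unreserved set A-Z a-z 0-9 - _ . ! ~ * ' ( ) becomes %XX with uppercase hex digits.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// encodeURIComponent-style escaping over a UTF-8 byte string (sketch).
std::string escurl_sketch(const std::string& s) {
    static const char hex[] = "0123456789ABCDEF";
    static constexpr std::string_view marks = "-_.!~*'()";
    std::string out;
    for (unsigned char c : s) {
        // Explicit ranges avoid locale-dependent std::isalnum.
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          marks.find(static_cast<char>(c)) != std::string_view::npos;
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```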

Function reference (not yet implemented)

The following canonical functions are missing. Items marked P0 are foundational for the other missing pieces:

Path operations (P0)

json getpath(...);     // missing
json setpath(...);     // missing

Major subsystems (P0)

json              inject(...);     // missing
json              transform(...);  // missing
json              validate(...);   // missing
std::vector<json> select(...);     // missing

Minor utilities

getdef, getelem, delprop, size, slice, flatten, filter,
pad, replace, join, jsonify, strkey, pathify

Builders

jm, jt

Injection helpers

checkPlacement, injectorArgs, injectChild

Sentinels and mode constants

SKIP, DELETE
M_KEYPRE, M_KEYPOST, M_VAL, MODENAME

(Type bit-flags T_any..T_node are present as constexpr int.)
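The status notes describe SKIP and DELETE as pointer-identity sentinels. A minimal sketch of that idea with hypothetical names (Sentinel, SKIP_S, DELETE_S, is_skip, is_delete): each sentinel is a single static object and a check compares addresses, so no ordinary value, even one with identical contents, can be mistaken for a sentinel.

```cpp
#include <cassert>
#include <string>

// Hypothetical sentinel type: identity, not contents, is what matters.
struct Sentinel {
    std::string name;  // for debugging only; never compared
};

// Exactly one instance per sentinel (C++17 inline variables).
inline const Sentinel SKIP_S{"SKIP"};
inline const Sentinel DELETE_S{"DELETE"};

// Address comparison: only the real sentinel object matches.
bool is_skip(const Sentinel* p) { return p == &SKIP_S; }
bool is_delete(const Sentinel* p) { return p == &DELETE_S; }
```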

Constants

Type bit-flags

constexpr int VoxgigStruct::T_any
constexpr int VoxgigStruct::T_noval
// ... 15 total
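Because each type occupies its own bit, a value's type can be tested against a whole family with a single mask. The sketch below is illustrative only: T_noval, T_node and T_any are names from this README, but every value here, the extra names T_string, T_number, T_map and T_list, and the definition of T_node as T_map | T_list are hypothetical.

```cpp
#include <cassert>

// Hypothetical values; the real 15 constants are defined by the library.
constexpr int T_noval  = 1 << 0;
constexpr int T_string = 1 << 1;
constexpr int T_number = 1 << 2;
constexpr int T_map    = 1 << 3;
constexpr int T_list   = 1 << 4;
constexpr int T_node   = T_map | T_list;  // family mask in this sketch
constexpr int T_any    = T_noval | T_string | T_number | T_map | T_list;

// True when the type bits of t intersect the mask.
constexpr bool is_type(int t, int mask) { return (t & mask) != 0; }

static_assert(is_type(T_map, T_node), "a map is a node");
static_assert(!is_type(T_string, T_node), "a string is not a node");
```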

Notes

Why partial?

The C++ port covers value-shape utilities and basic walk / merge. Major subsystems are not implemented yet; the gaps are tracked as P0/P1/P2 items in ../REPORT.md.

Known issues

  • walk() passes its callback by casting a function pointer to intptr_t and storing it in the JSON value. This is undefined behaviour and needs replacing with a proper callback type.
  • clone() is a shallow copy; canonical is deep.
  • merge() is partially implemented; significant blocks are commented out.
  • All functions use args_container&& (std::vector<json>); types are not yet enforced.
  • Debug std::cout calls remain in the source.
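One way to give walk() a proper callback type is std::function, which also lets callers pass capturing lambdas. A minimal sketch over a hypothetical Tree type; walk_sketch, Tree and the (key, node, path) callback shape are assumptions rather than the port's signature, and the post-order visiting order is a choice made for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree: a value plus key-ordered children.
struct Tree {
    std::string value;
    std::map<std::string, std::shared_ptr<Tree>> children;
};

// A real callable type instead of a function pointer smuggled through intptr_t.
using WalkFn = std::function<void(const std::string& key, Tree& node,
                                  const std::vector<std::string>& path)>;

// Depth-first walk; the callback runs after a node's children (post-order).
void walk_sketch(Tree& node, const WalkFn& fn, const std::string& key = "",
                 std::vector<std::string> path = {}) {
    for (auto& [k, child] : node.children) {
        if (!child) continue;
        path.push_back(k);               // path to the child
        walk_sketch(*child, fn, k, path);
        path.pop_back();
    }
    fn(key, node, path);                 // root is visited with key "" and empty path
}
```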

Object model

The port uses nlohmann::json for the container type. This is reference-stable for nested values, which is the property the canonical algorithm requires.

Path syntax not yet supported

getpath / setpath are missing. Use repeated getprop calls to walk into nested data, or wait for the path API to land.
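Walking in with repeated lookups amounts to splitting a dotted path and stepping down one key per segment. A minimal sketch over a hypothetical Tree type (getpath_sketch is not the port's API); it handles only plain dot-separated keys and returns nullptr as soon as a segment is missing.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical tree: a value plus named children.
struct Tree {
    std::string value;
    std::map<std::string, std::shared_ptr<Tree>> children;
};

// Resolve "a.b.c" one key at a time, like chained getprop calls.
const Tree* getpath_sketch(const Tree& root, const std::string& path) {
    const Tree* cur = &root;
    if (path.empty()) return cur;  // empty path: the root itself
    std::size_t start = 0;
    while (cur) {
        std::size_t dot = path.find('.', start);
        // When dot is npos, the count npos - start clamps to the end.
        std::string key = path.substr(start, dot - start);
        auto it = cur->children.find(key);
        cur = (it == cur->children.end()) ? nullptr : it->second.get();
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return cur;
}
```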

Test status

Catch2 framework with limited test coverage. See the overview directory for current API examples.

Regex

Uniform six-function regex API (see /REGEX_API.md). The C++ port wraps <regex> (C++11), which defaults to the ECMAScript dialect.

API

Function                           Maps to
re_compile(pattern)                std::regex(pattern); throws std::regex_error on a bad pattern
re_test(pattern, input)            std::regex_search, returning bool
re_find(pattern, input)            groups of the first match as std::vector<std::string> (empty if no match)
re_find_all(pattern, input)        all matches as std::vector<std::vector<std::string>>
re_replace(pattern, input, rep)    std::regex_replace(input, re, rep)
re_escape(s)                       escapes regex metacharacters
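A minimal sketch of two of these over <regex>, assuming the return shapes listed above (full match first, then capture groups). The names re_find_all_sketch and re_escape_sketch are hypothetical, and the escaped character set is the ECMAScript syntax-character set; whether the port's re_escape escapes exactly these characters is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Every match as {full match, group 1, group 2, ...}.
std::vector<std::vector<std::string>> re_find_all_sketch(const std::string& pattern,
                                                         const std::string& input) {
    std::regex re(pattern);  // throws std::regex_error on a bad pattern
    std::vector<std::vector<std::string>> out;
    for (auto it = std::sregex_iterator(input.begin(), input.end(), re);
         it != std::sregex_iterator(); ++it) {
        std::vector<std::string> groups;
        for (std::size_t i = 0; i < it->size(); ++i) groups.push_back((*it)[i].str());
        out.push_back(std::move(groups));
    }
    return out;
}

// Backslash-escape metacharacters so the result matches s literally.
std::string re_escape_sketch(const std::string& s) {
    static constexpr std::string_view meta = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string_view::npos) out += '\\';
        out += c;
    }
    return out;
}
```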

Dialect

Patterns must stay inside the RE2 subset documented in /REGEX.md. std::regex's default ECMAScript grammar also accepts backreferences and lookahead assertions (though not lookbehind); patterns that use them will not be portable.

Sharp edges (C++-specific)

  • libstdc++ <regex> has worst-in-class catastrophic backtracking: the discovery panel measures ~1.2 s for ^(a+)+$ on an input of 22 'a' characters followed by '!'. This is well known, and it is the reason many production C++ projects avoid <regex> in favour of RE2 or PCRE2. Stay inside the RE2 subset and avoid nested quantifiers; even then, performance won't match the dedicated engines.
  • Zero-width replace. re_replace("a*", "abc", "X") returns "XXbXcX" — the ECMA convention shared by all PCRE/ECMA/.NET/Java/Onigmo engines plus the in-tree Thompson ports. Go (RE2) returns "XbXcX" instead; see /REGEX_PATHOLOGICAL.md.
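The two zero-width conventions can be reproduced side by side. std::regex_replace yields the ECMA result directly. replace_go_style is a hypothetical helper, not part of the port: it approximates Go's rule of not inserting a replacement for an empty match that starts exactly where the previous match ended. Match positions still come from std::regex, so this is not a full RE2 emulation.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// ECMA convention: every match, including empty ones, is replaced.
std::string replace_ecma(const std::string& pattern, const std::string& input,
                         const std::string& rep) {
    return std::regex_replace(input, std::regex(pattern), rep);
}

// Go-style convention (sketch): skip an empty match adjacent to the previous match.
std::string replace_go_style(const std::string& pattern, const std::string& input,
                             const std::string& rep) {
    std::regex re(pattern);
    std::string out;
    std::size_t last = 0;    // end offset of the previous match
    bool have_prev = false;  // whether any match has been emitted yet
    for (auto it = std::sregex_iterator(input.begin(), input.end(), re);
         it != std::sregex_iterator(); ++it) {
        auto pos = static_cast<std::size_t>(it->position(0));
        auto len = static_cast<std::size_t>(it->length(0));
        if (len == 0 && have_prev && pos == last) continue;
        out.append(input, last, pos - last);  // text between matches
        out += it->format(rep);               // expands $1-style references
        last = pos + len;
        have_prev = true;
    }
    out.append(input, last, std::string::npos);  // trailing text
    return out;
}
```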

See /REGEX_PATHOLOGICAL.md for the cross-port pathological-input panel.

Build and test

cd cpp
make build
make test

The overview / scratch examples in overview/ show the current API in use.