This document outlines the best practices for writing high-performance, accurate, and maintainable JavaScript code in the context of data processing. It is intended for contributors to the TinyFrameJS project, which runs on Node.js and in the browser (V8 engine).
- Keep arrays dense - avoid holes, use `.fill()` to prepopulate.
- Do not delete elements - use `.splice()` instead of `delete`.
- Use sequential indices starting from 0 - avoid `arr[10000] = x` on an empty array.
- Keep types homogeneous - do not mix numbers, strings, and objects.
- Avoid preallocating large sparse arrays - grow them incrementally.
- Use `.push()` to add elements instead of manual indexing.
- Stream large data when possible - avoid loading millions of rows into memory at once.
- Use `for` / `for...of` / `.forEach()` - modern V8 optimizes all of them well.
- In performance-critical code, benchmark `for` vs `.forEach()`.
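As a small illustration of the array guidelines above, the sketch below contrasts a prefilled array with one that has holes. The helper name `makeDenseColumn` is hypothetical and not part of TinyFrameJS.

```javascript
// Hypothetical helper: builds an array of a known length with every slot written.
function makeDenseColumn(length, fillValue = 0) {
  // new Array(length) on its own would leave `length` holes; fill() writes each index.
  return new Array(length).fill(fillValue);
}

const dense = makeDenseColumn(4); // [0, 0, 0, 0]

const holey = [];
holey[3] = 1; // indices 0..2 are holes, not stored values

// The `in` operator reports whether an index actually exists on the array.
const denseHasIndex0 = 0 in dense; // true
const holeyHasIndex0 = 0 in holey; // false
```

Both arrays report `length === 4`, which is why holes are easy to miss in review.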
Objects and Hidden Classes
- Initialize all object properties at creation.
- Do not add properties dynamically later.
- Maintain the same property order across instances.
// Good
function Account(id, balance) {
this.id = id;
this.balance = balance;
}
// Bad (hidden class changes)
const acc = {};
acc.id = 'A123';
acc.balance = 1000;
acc.currency = 'USD';
- Write functions that operate on a single input type.
- Avoid mixing input types like `number` and `string` in the same function.
- Reuse objects inside loops.
- Avoid closures in hot loops.
- Minimize short-lived allocations in performance-sensitive code.
- Use `Array.prototype.sort()`, `Math.sqrt()`, etc. - they are native and fast.
- Avoid bitwise hacks like `x | 0` - use `Math.trunc` or `Math.floor` instead.
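One reason to prefer `Math.trunc` over `x | 0` is that bitwise operators convert their operands to 32-bit integers first. The following sketch shows where the results diverge:

```javascript
const big = 2 ** 31; // 2147483648, just outside the signed 32-bit range

const viaBitwise = big | 0;       // wraps around to -2147483648
const viaTrunc = Math.trunc(big); // stays 2147483648

// Truncation and flooring also differ for negative inputs:
const truncated = Math.trunc(-1.5); // -1 (rounds toward zero)
const floored = Math.floor(-1.5);   // -2 (rounds toward -Infinity)
```

Row counts and timestamps in milliseconds can exceed the 32-bit range, so the wrap-around is a practical concern in data processing code.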
- Avoid `try/catch` in hot loops.
- Wrap risky logic in a separate function.
function process(data) {
// hot path
}
function safeProcess(data) {
try {
process(data);
} catch (e) {
logError(e);
}
}
- Use the appropriate ColumnVector implementation - TypedArrayVector for numeric data, ArrowVector for complex types
- Let VectorFactory choose the optimal storage backend based on data type
- Avoid data copying - use references or in-place operations where possible
- Pre-allocate memory for result arrays in a single call, knowing the size in advance
- Use array pooling for temporary arrays to reduce garbage collector pressure
// Bad
const result = [];
for (let i = 0; i < data.length; i++) {
result.push(data[i] * 2);
}
// Good
const result = new Float64Array(data.length);
for (let i = 0; i < data.length; i++) {
result[i] = data[i] * 2;
}
- Avoid nested loops - aim for O(n) complexity instead of O(n²).
- Use sliding windows instead of recalculating for overlapping data ranges.
- Apply prefix-sum for efficient calculation of sliding statistics on large windows.
- Cache intermediate results to avoid repeated calculations.
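The prefix-sum idea from the list above can be sketched as follows. The function names `buildPrefixSums` and `rangeSum` are illustrative and not part of TinyFrameJS.

```javascript
// Builds prefix[i] = values[0] + ... + values[i - 1], with prefix[0] = 0.
function buildPrefixSums(values) {
  const prefix = new Float64Array(values.length + 1);
  for (let i = 0; i < values.length; i++) {
    prefix[i + 1] = prefix[i] + values[i];
  }
  return prefix;
}

// Sum of values[start..end) in O(1) once the prefix array exists.
function rangeSum(prefix, start, end) {
  return prefix[end] - prefix[start];
}

const prefix = buildPrefixSums([1, 2, 3, 4, 5]);
const windowSum = rangeSum(prefix, 1, 4); // 2 + 3 + 4 = 9
```

With floating-point inputs the running totals accumulate rounding error, so results may differ slightly from summing each window directly; tolerance-based comparisons in tests account for that.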
// Bad (O(n*k))
function rollingSum(values, windowSize) {
const result = new Float64Array(values.length - windowSize + 1);
for (let i = 0; i <= values.length - windowSize; i++) {
let sum = 0;
for (let j = 0; j < windowSize; j++) {
sum += values[i + j];
}
result[i] = sum;
}
return result;
}
// Good (O(n))
function rollingSum(values, windowSize) {
const result = new Float64Array(values.length - windowSize + 1);
let sum = 0;
// Initialize first window
for (let i = 0; i < windowSize; i++) {
sum += values[i];
}
result[0] = sum;
// Sliding window
for (let i = 1; i <= values.length - windowSize; i++) {
sum = sum - values[i - 1] + values[i + windowSize - 1];
result[i] = sum;
}
return result;
}
- Use counters for invalid values instead of repeated `isNaN()` checks.
- Apply validity masks for filtering NaN values in a single pass.
- Avoid checks on each iteration - group checks and perform them in advance.
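A minimal sketch of a validity mask combined with an invalid-value counter is shown below. The names `buildValidityMask` and `sumValid` are assumptions made for illustration.

```javascript
// Returns a Uint8Array mask (1 = valid, 0 = NaN) plus the count of invalid entries.
function buildValidityMask(values) {
  const mask = new Uint8Array(values.length);
  let invalidCount = 0;
  for (let i = 0; i < values.length; i++) {
    if (Number.isNaN(values[i])) {
      invalidCount++;
    } else {
      mask[i] = 1;
    }
  }
  return { mask, invalidCount };
}

// Later passes consult the counter once and skip per-element checks when nothing is invalid.
function sumValid(values, { mask, invalidCount }) {
  let total = 0;
  if (invalidCount === 0) {
    for (let i = 0; i < values.length; i++) total += values[i];
    return total;
  }
  for (let i = 0; i < values.length; i++) {
    if (mask[i] === 1) total += values[i];
  }
  return total;
}

const column = new Float64Array([1, NaN, 3]);
const validity = buildValidityMask(column);
const total = sumValid(column, validity); // 4
```

The mask is computed once per column and can be reused by every later aggregation over the same data.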
// Bad
function hasNaN(array) {
for (let i = 0; i < array.length; i++) {
if (isNaN(array[i])) return true;
}
return false;
}
// Good
function countNaN(array) {
let badCount = 0;
for (let i = 0; i < array.length; i++) {
if (isNaN(array[i])) badCount++;
}
return badCount;
}
- Avoid using `JSON.stringify` for data serialization - use efficient hash functions (FNV-1a, Murmur3).
- Use hash tables with open addressing instead of Map for large datasets.
- Pre-compute hashes for reused values.
// Bad
function findDuplicates(rows, keyColumns) {
const seen = new Set();
return rows.filter((row) => {
const key = JSON.stringify(keyColumns.map((col) => row[col]));
if (seen.has(key)) return true;
seen.add(key);
return false;
});
}
// Good
function hashRow(row, keyColumns) {
let hash = 2166136261; // FNV-1a offset basis
for (const col of keyColumns) {
const val = row[col];
const str = String(val);
for (let i = 0; i < str.length; i++) {
hash ^= str.charCodeAt(i);
hash = Math.imul(hash, 16777619) >>> 0; // FNV prime; Math.imul keeps the 32-bit multiply exact
}
}
return hash;
}
function findDuplicates(rows, keyColumns) {
const seen = new Set();
return rows.filter((row) => {
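const hash = hashRow(row, keyColumns);
// A 32-bit hash can collide, so a hit may be a false positive; compare key values when exactness matters.
if (seen.has(hash)) return true;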
seen.add(hash);
return false;
});
}
- Use block processing for better vectorization in V8.
- Split large tasks into subtasks for parallel processing.
- Consider using Web Workers for CPU-intensive operations.
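Splitting a task into subtasks might look like the sketch below, which partitions an index range into contiguous blocks that could each be handed to a worker. The helpers `splitRange` and `sumBlock` are hypothetical, and the blocks are processed sequentially here to keep the example self-contained.

```javascript
// Hypothetical helper: splits [0, length) into up to `parts` contiguous [start, end) blocks.
function splitRange(length, parts) {
  const blocks = [];
  const size = Math.ceil(length / parts);
  for (let start = 0; start < length; start += size) {
    blocks.push([start, Math.min(start + size, length)]);
  }
  return blocks;
}

// Each block is summed independently, so the partial results can be combined afterwards.
function sumBlock(values, [start, end]) {
  let total = 0;
  for (let i = start; i < end; i++) total += values[i];
  return total;
}

const values = new Float64Array([1, 2, 3, 4, 5, 6, 7]);
const blocks = splitRange(values.length, 3); // [[0, 3], [3, 6], [6, 7]]
const partials = blocks.map((block) => sumBlock(values, block));
const grandTotal = partials.reduce((a, b) => a + b, 0); // 28
```

Because each block only reads its own index range, the same typed array can be shared or transferred without the blocks interfering with each other.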
- Measure before optimizing - use profiling to identify bottlenecks.
- Set performance budgets for critical operations.
- Test on realistic data volumes - optimizations may only show up on large datasets.
- Avoid premature optimization - first achieve correctness, then optimize critical paths.
When working with numeric data, it is essential to clearly define and document how the library handles special values:
- `null` - converted to `0` in numeric columns
- `undefined` - converted to `NaN` in numeric columns
- `NaN` - preserved as `NaN`

- Store "raw" values - keep original data alongside optimized data for calculations
- Use validity masks - track where `undefined` and other special values were
- Separate data and metadata - do not lose information during optimization
// Recommended approach
export function createFrame(data) {
const columns = {}; // optimized data
const rawColumns = {}; // original data
// ...
return { columns, rawColumns, rowCount, columnNames };
}
- Document default behavior - e.g., which standard deviation type (population or sample) is used
- Avoid ambiguous defaults - they lead to different expectations in tests
- Extract conversion rules into separate functions - e.g., `normalizeNumeric(value)`
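A conversion function of this kind could be sketched as below. The `null`, `undefined`, and `NaN` rules follow the ones stated earlier in this guide; the handling of any other input and the helper `toNumericColumn` are assumptions made for illustration.

```javascript
// Sketch of the documented conversion rules; coercion of other inputs is an assumption.
function normalizeNumeric(value) {
  if (value === null) return 0;        // null -> 0
  if (value === undefined) return NaN; // undefined -> NaN
  return Number(value);                // NaN stays NaN; other inputs are coerced (assumption)
}

// Hypothetical helper: converts a raw column and records where undefined occurred.
function toNumericColumn(raw) {
  const data = new Float64Array(raw.length);
  const wasUndefined = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] === undefined) wasUndefined[i] = 1;
    data[i] = normalizeNumeric(raw[i]);
  }
  return { data, wasUndefined };
}

const { data, wasUndefined } = toNumericColumn([1, null, undefined, NaN]);
// data holds 1, 0, NaN, NaN; wasUndefined holds 0, 0, 1, 0
```

Keeping the rules in one function means tests and methods share a single definition of how special values are treated.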
- Test cases should be consistent - they should not contradict each other
- Document expected behavior - especially for handling special values
- Avoid special handling for tests - functions should work universally
// Bad: special handling for a specific test
if (values.length === 6 && values[0] === 1 && Number.isNaN(values[1])) {
return 1.92; // Magic number for the test
}
// Good: universal algorithm that works for all cases
function calculateStandardDeviation(values, population = true) {
// Universal algorithm...
}
- Always use single quotes ('...') for string literals in tests.
- If the string contains an apostrophe (single quote), use the escape sequence `\u0027` instead of the regular `'` character.
- Example:
// Bad:
expect(err.message).toBe("Column 'foo' not found");
// Good:
expect(err.message).toBe('Column \u0027foo\u0027 not found');
- Avoid double passes - do not perform separate validation if types are already checked
- Trust the data structure - if `createFrame` guarantees type homogeneity, do not recheck it
- Minimize data copying - work with original arrays where possible
TinyFrameJS implements a clean two-layer architecture:
DataFrame (API) → Series (columns) → ColumnVector (storage)
- DataFrame - public API for working with data
- Series - columns of data, wrapper over ColumnVector
- ColumnVector - abstraction for storing data, can be:
- TypedArrayVector - fast storage for numeric data
- ArrowVector - optimized storage with support for null, strings and complex types
- SimpleVector - simple storage for small datasets or mixed types
The engine selection is done automatically through VectorFactory based on the data type and operation context.
All methods in TinyFrameJS follow a unified pattern with dependency injection:
/**
 * Method description
 * @param {Object} frame - DataFrame object
 * @param {String} column - Column name
 * @returns {Number|Array|Object} - Description of the returned value
 */
export const methodName =
({ validateColumn, otherDep }) =>
(frame, column, ...otherArgs) => {
// Input data validation
validateColumn(frame, column);
// Implementation
const result = /* ... */;
return result;
};
This pattern provides:
- Dependency injection - dependencies are passed to the method
- Testability - dependencies can be mocked
- Consistency - all methods follow the same structure
- Functional style - pure functions without side effects
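To illustrate the testability point, the sketch below writes a method in the injected-dependency style and calls it with a recording stub in place of the real validator. The `mean` function here is an illustrative stand-in, not the project's implementation.

```javascript
// A method in the curried style: dependencies first, then the frame and arguments.
const mean =
  ({ validateColumn }) =>
  (frame, column) => {
    validateColumn(frame, column);
    const arr = frame.columns[column];
    let total = 0;
    for (let i = 0; i < arr.length; i++) total += arr[i];
    return arr.length === 0 ? NaN : total / arr.length;
  };

// In a test, the dependency can be replaced with a stub that records its calls.
const calls = [];
const stubValidateColumn = (frame, column) => {
  calls.push(column);
};

const meanWithStub = mean({ validateColumn: stubValidateColumn });
const frame = { columns: { price: [10, 20, 30] } };
const result = meanWithStub(frame, 'price'); // 20
```

The stub makes it possible to assert both the computed value and that validation was invoked with the expected column.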
TinyFrameJS differentiates between two types of methods:
- Transform methods - return a new DataFrame:
export const sort =
({ validateColumn }) =>
(frame, column) => {
validateColumn(frame, column);
// Create indices for sorting
const arr = frame.columns[column];
const sortedIndices = [...arr.keys()].sort((a, b) => arr[a] - arr[b]);
// Create new frame with sorted data
const newColumns = {};
for (const col of Object.keys(frame.columns)) {
const originalArray = frame.columns[col];
newColumns[col] = sortedIndices.map(i => originalArray[i]);
}
return { columns: newColumns, rowCount: frame.rowCount };
};
- Aggregation methods - return a scalar value:
export const sum =
({ validateColumn }) =>
(frame, column) => {
validateColumn(frame, column);
const arr = frame.columns[column];
let total = 0;
for (let i = 0; i < arr.length; i++) {
total += arr[i];
}
return total;
};
TinyFrameJS uses a unified utility, `extendDataFrame`, for registering methods. The process consists of three steps:
// src/methods/dataframe/aggregation/sum.js
export const sum = ({ validateColumn }) => (frame, column) => {
validateColumn(frame, column);
// Implementation...
return total;
};
// src/methods/dataframe/aggregation/pool.js
export { sum } from './sum.js';
export { mean } from './mean.js';
export { min } from './min.js';
export { max } from './max.js';
// src/methods/dataframe/aggregation/index.js
import { DataFrame } from '../../../core/DataFrame.js';
import { extendDataFrame } from '../../../core/extendDataFrame.js';
import * as pool from './pool.js';
// Dependencies
import { validateColumn } from '../../../utils/validators.js';
const deps = { validateColumn };
// Register methods
extendDataFrame(DataFrame.prototype, pool);
// Export methods for direct use
export * from './pool.js';
// src/core/extendDataFrame.js
export function extendDataFrame(proto, pool, { namespace, strict = true } = {}) {
const target = namespace ? (proto[namespace] ??= {}) : proto;
for (const [name, fn] of Object.entries(pool)) {
if (strict && name in target) {
throw new Error(`Method conflict: ${namespace ? namespace + '.' : ''}${name}`);
}
target[name] = function (...args) {
return fn(this, ...args); // Transparently pass this as the first argument
};
}
}
Benefits of this approach:
- Clean logic separation - the calculation part of the method is separated from binding to the DataFrame class
- Tree-shaking - unused methods do not enter the final bundle
- Namespaces - methods from different packages do not conflict with each other
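The pool methods have the curried shape `(deps) => (frame, ...args) => result`, while the registered wrapper calls `fn(this, ...args)`. One way the dependencies could be applied before registration is sketched below. The helper `bindDeps` and the `FakeFrame` class are hypothetical, and `extendDataFrame` is reduced to its core loop, so this shows the idea rather than how TinyFrameJS actually wires it.

```javascript
// Hypothetical helper: applies `deps` to every curried method in a pool.
function bindDeps(pool, deps) {
  const bound = {};
  for (const [name, factory] of Object.entries(pool)) {
    bound[name] = factory(deps);
  }
  return bound;
}

// Simplified version of the registration loop, without namespace or strict handling.
function extendDataFrame(proto, pool) {
  for (const [name, fn] of Object.entries(pool)) {
    proto[name] = function (...args) {
      return fn(this, ...args); // `this` becomes the frame argument
    };
  }
}

// Stand-ins used only for this illustration.
class FakeFrame {
  constructor(columns) {
    this.columns = columns;
  }
}
const validateColumn = (frame, column) => {
  if (!(column in frame.columns)) throw new Error(`Column '${column}' not found`);
};
const pool = {
  sum: ({ validateColumn }) => (frame, column) => {
    validateColumn(frame, column);
    return frame.columns[column].reduce((a, b) => a + b, 0);
  },
};

extendDataFrame(FakeFrame.prototype, bindDeps(pool, { validateColumn }));
const total = new FakeFrame({ a: [1, 2, 3] }).sum('a'); // 6
```

Binding in a separate step keeps the method files free of direct utility imports while still giving each registered method its validators.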
When writing tests for DataFrame methods, follow these rules:
- Test file location: `/tests/core/methods/{category}/{methodName}.test.js`
- Test through DataFrame API: Test through the DataFrame interface, not directly through functions
- Test successful and error scenarios: Check both normal execution and error handling
- For transformation methods: Check that the returned DataFrame has the expected structure
- For aggregation methods: Verify the returned value is correct
Example test structure:
import { describe, test, expect } from 'vitest';
import { DataFrame } from '../../../src/core/DataFrame.js';
describe('DataFrame.methodName', () => {
const df = DataFrame.create({
a: [1, 2, 3],
b: [10, 20, 30],
});
test('performs expected operation', () => {
// For transformation method
const result = df.methodName('a');
expect(result).toBeInstanceOf(DataFrame);
expect(result.columns).toContain('a');
// For aggregation method
const value = df.methodName('a');
expect(value).toBe(expectedValue);
});
test('throws on invalid input', () => {
expect(() => df.methodName('nonexistent')).toThrow();
});
});
TinyFrameJS implements a clean two-layer architecture:
DataFrame (API) → Series (columns) → ColumnVector (storage)
- DataFrame - Public API for working with data, provides method chaining
- Series - Column representation, wraps a ColumnVector
- ColumnVector - Abstract storage interface with multiple implementations:
- TypedArrayVector - Fast storage for numeric data using JavaScript TypedArrays
- ArrowVector - Optimized storage with Arrow for complex types and null values
- SimpleVector - Fallback for mixed data types
The appropriate vector implementation is automatically selected by VectorFactory based on data type and operation context.
TinyFrameJS uses dependency injection for all methods:
- Methods are pure functions with dependencies as their first parameter
- No direct imports of utilities in method files
- Easier testing - dependencies can be mocked
// Example method with dependency injection
export const sum =
({ validateColumn }) =>
(frame, column) => {
validateColumn(frame, column);
const arr = frame.columns[column];
let total = 0;
for (let i = 0; i < arr.length; i++) {
total += arr[i];
}
return total;
};
TinyFrameJS uses a modular method registration system via `extendDataFrame`:
- Methods defined as pure functions in individual files
- Exported through barrel files (index.js) for organization
- Registered with DataFrame via the `extendDataFrame` utility
- Support for namespaces to avoid conflicts between packages
// Example method registration
import { DataFrame } from '../core/DataFrame.js';
import { extendDataFrame } from '../core/extendDataFrame.js';
import * as aggregationMethods from './aggregation/index.js';
// Register methods directly on DataFrame.prototype
extendDataFrame(DataFrame.prototype, aggregationMethods);
// Register methods in a namespace
extendDataFrame(DataFrame.prototype, technicalMethods, { namespace: 'ta' });
When implementing a new method, decide whether it is a transformation or an aggregation:
- Transformation methods:
  - Return a new DataFrame
  - Can be chained with other methods
  - Example: `sort()`, `filter()`, `select()`
- Aggregation methods:
  - Return a scalar value or array
  - Typically terminate a method chain
  - Example: `count()`, `mean()`, `sum()`
This distinction is handled automatically by the method implementation:
// Transformation method example
export const filter =
({ validateFunction }) =>
(frame, predicate) => {
validateFunction(predicate);
// Implementation that returns a new DataFrame
return new DataFrame(/* filtered data */);
};
// Aggregation method example
export const sum =
({ validateColumn }) =>
(frame, column) => {
validateColumn(frame, column);
// Implementation that returns a scalar value
return total;
};
- Avoid using `Number` directly for monetary values.
- Represent money in cents: `$1.99` → `199`
- Use formatting functions like `Intl.NumberFormat`, divide by 100 when needed.
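The cents convention can be sketched as follows, with division by 100 confined to the formatting step. The helper name `formatCents` and the choice of the `en-US` locale with USD are assumptions made for illustration.

```javascript
// Amounts are stored as integer cents; formatting is the only place that divides by 100.
const usd = new Intl.NumberFormat('en-US', { style: 'currency', currency: 'USD' });

function formatCents(cents) {
  return usd.format(cents / 100);
}

// Integer arithmetic on cents avoids binary floating-point drift.
const priceCents = 199;                // $1.99
const totalCents = priceCents * 3;     // 597
const label = formatCents(totalCents); // "$5.97"
```

For comparison, `1.99 * 3` evaluated directly on floating-point dollars does not produce exactly `5.97`, which is the kind of drift the integer representation avoids.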
- Use `BigInt` when values exceed 2^53.
- Do not mix `BigInt` and `Number` in operations.
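A short sketch of both points: precision loss past 2^53 when converting to `Number`, and the `TypeError` raised when the two types are mixed in arithmetic.

```javascript
const beyondSafe = 2n ** 53n + 1n;   // 9007199254740993n, exact as a BigInt
const asNumber = Number(beyondSafe); // 9007199254740992, the last digit is lost

// Mixing BigInt and Number in one arithmetic expression throws a TypeError.
let mixError = null;
try {
  beyondSafe + 1;
} catch (e) {
  mixError = e;
}

// Convert explicitly and keep one type for the whole calculation.
const sum = beyondSafe + BigInt(1); // 9007199254740994n
```

The explicit `BigInt(1)` conversion makes the chosen numeric type visible at the point where values enter the calculation.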
- For precise decimal math, use: `decimal.js`, `big.js`, `dinero.js`
- Trade-off: slower but much safer for rates, taxes, percentages.
import Decimal from 'decimal.js';
const total = new Decimal('0.1').plus('0.2'); // total.toString() === '0.3'
- Use `Math.round`, `toFixed`, or proper libraries.
- For bankers' rounding, use custom rounding or appropriate libraries.
- Add tests for rounding errors (`0.1 + 0.2 !== 0.3`).
- Use `Number.EPSILON` or absolute tolerance (`abs(result - expected) < ε`).
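A custom bankers' rounding function and a tolerance check might look like the sketch below. The names `roundHalfEven` and `approxEqual` and the default tolerance of `1e-12` are assumptions made for illustration.

```javascript
// Rounds to the nearest integer; exact halves go to the nearest even integer.
function roundHalfEven(x) {
  const floor = Math.floor(x);
  const diff = x - floor;
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1;
}

// Absolute-tolerance comparison for floating-point results.
function approxEqual(a, b, eps = 1e-12) {
  return Math.abs(a - b) < eps;
}

const exact = 0.1 + 0.2 === 0.3;           // false
const close = approxEqual(0.1 + 0.2, 0.3); // true

const down = roundHalfEven(2.5); // 2
const up = roundHalfEven(3.5);   // 4
```

The tie rule only triggers when the fractional part is exactly 0.5 in binary floating point, so for scaled decimal amounts a decimal library remains the safer choice.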
Although we prefer pure functions, classes are justified in cases like:
- Modeling complex entities with internal state (e.g., `Portfolio`, `StrategySession`, `BacktestRun`).
- Simulating time-dependent state, e.g., a strategy object tracking positions, flags, counters.
- Framework integration, where classes are expected (`class Strategy` with an `onBar` method).
- Inheritance/templates, when structure justifies reuse via class inheritance (use cautiously!).
When using classes:
- Do not add properties dynamically after the `constructor` has run;
- Always initialize all fields in the `constructor`;
- Avoid deep hierarchies - prefer composition over inheritance.
- Use pure functions when no internal state is needed.
- They are easier to test, V8 optimizes them better, and they avoid hidden class churn.
- Classes can be used when necessary, but default to functions for simpler logic.
// Preferred:
function calculatePnL(entryPrice, exitPrice) {
return exitPrice - entryPrice;
}
// Less efficient:
class Trade {
constructor(entry, exit) {
this.entry = entry;
this.exit = exit;
}
getPnL() {
return this.exit - this.entry;
}
}
- One file = one module = one purpose
- Separate strategy logic, formatting, calculations, UI
- Each method in its own file with clear dependency injection
- Follow the project standard (ESM)
- Use barrel files (index.js) for organizing related methods
- Register methods with `extendDataFrame` in a namespace or directly
- Prefer functions < 50 lines
- Extract sub-functions for clarity and testability
- Avoid using `fs`, `path`, `process` in browser-targeted code
- Abstract platform-specific behavior
- Follow ESLint + Prettier rules
- Use `camelCase` for variables/functions, `PascalCase` for classes
- Use JSDoc to document all methods, especially their parameters and return values
- Explain complex calculations with inline comments
- Document namespace methods with their intended usage patterns
- For methods that extend DataFrame, document how they interact with the two-layer architecture
- Cover new logic with unit tests
- Test through the DataFrame API, not internal functions
- Include correctness and boundary conditions
- Test both direct methods and namespace methods
- Validate against known correct values
- Add tolerances (`±1e-12`) for floating-point results
- Test with different ColumnVector implementations
- Verify results are consistent across backends
- Include full backtest runs if applicable
- Benchmark with realistic datasets (100k+ rows)
- Use `console.time` or `performance.now()` for timing
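A minimal timing helper built on `performance.now()` is sketched below. The name `timeBest` is hypothetical, and the sketch assumes a global `performance` object, which browsers and recent Node.js versions provide.

```javascript
// Hypothetical helper: runs `fn` several times and returns the best wall-clock time in ms.
function timeBest(fn, runs = 5) {
  let best = Infinity;
  for (let r = 0; r < runs; r++) {
    const start = performance.now();
    fn();
    const elapsed = performance.now() - start;
    if (elapsed < best) best = elapsed;
  }
  return best;
}

const data = new Float64Array(100_000).fill(1.5);
let checksum = 0;
const ms = timeBest(() => {
  let total = 0;
  for (let i = 0; i < data.length; i++) total += data[i];
  checksum = total; // keep the result observable so the work is not discarded
});
```

Taking the best of several runs reduces noise from JIT warm-up and garbage collection, though a profiler is still needed to see where the time goes.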
- Use flamegraphs, DevTools, or CLI tools
- Only optimize based on real measurements
- Using raw `Number` for money without scaling
- Mixing types in arrays or structures
- Sparse arrays / use of `delete`
- Dynamically adding properties to hot objects
- Allocating temporary objects in loops
- Synchronous blocking on large datasets (e.g., in UI or Node event loop)
- Silent `catch` blocks or unhandled Promise rejections
Before submitting a PR, please verify:
- Followed project code style (Prettier, ESLint)
- Used pure functions with dependency injection
- Properly registered methods using `extendDataFrame`
- Added tests for new logic and edge cases
- Benchmarked performance (if critical path is affected)
- Avoided anti-patterns (e.g., array holes, mixed types, etc.)
- Used conventional commits and described your PR clearly
- Highlighted any code that is precision-sensitive (money, rates)
- Updated documentation if adding to public API
- CI passes
Write code that is:
- Fast - V8-optimized, low-GC, optimized vector storage
- Accurate - financial results must be precise to the cent
- Modular - clear separation of responsibilities with namespaces
- Predictable - pure functions with explicit dependencies
- Extensible - properly registered via `extendDataFrame`
Thank you for keeping TinyFrameJS fast and reliable ⚡