Skip to content

Latest commit

Β 

History

History
769 lines (580 loc) Β· 22.9 KB

File metadata and controls

769 lines (580 loc) Β· 22.9 KB

πŸ“ TinyFrameJS Coding Guidelines

This document outlines the best practices for writing high-performance, accurate, and maintainable JavaScript code in the context of data processing. It is intended for contributors to the TinyFrameJS project, which runs on Node.js and in the browser (V8 engine).

⚑ Performance Recommendations

βœ… Arrays (V8 Optimizations)

  • Keep arrays dense β€” avoid holes, use .fill() to prepopulate.
  • Do not delete elements β€” use .splice() instead of delete.
  • Use sequential indices starting from 0 β€” avoid arr[10000] = x on an empty array.
  • Keep types homogeneous β€” do not mix numbers, strings, and objects.
  • Avoid preallocating large sparse arrays β€” grow them incrementally.
  • Use .push() to add elements instead of manual indexing.
  • Stream large data when possible β€” avoid loading millions of rows into memory at once.

βœ… Loops and Iteration

  • Use for / for...of / .forEach() β€” modern V8 optimizes all of them well.
  • In performance-critical code, benchmark for vs .forEach().

βœ… Objects and Hidden Classes

  • Initialize all object properties at creation.
  • Do not add properties dynamically later.
  • Maintain the same property order across instances.
// Good
function Account(id, balance) {
  this.id = id;
  this.balance = balance;
}

// Bad (hidden class changes)
const acc = {};
acc.id = 'A123';
acc.balance = 1000;
acc.currency = 'USD';

βœ… Type Monomorphism

  • Write functions that operate on a single input type.
  • Avoid mixing input types like number and string in the same function.

βœ… Memory and GC

  • Reuse objects inside loops.
  • Avoid closures in hot loops.
  • Minimize short-lived allocations in performance-sensitive code.

βœ… Built-in Methods

  • Use Array.sort(), Math.sqrt(), etc. β€” they are native and fast.
  • Avoid bitwise hacks like x | 0 β€” use Math.trunc, Math.floor instead.

βœ… Exception Handling

  • Avoid try/catch in hot loops.
  • Wrap risky logic in a separate function.
function process(data) {
  // hot path
}

function safeProcess(data) {
  try {
    process(data);
  } catch (e) {
    logError(e);
  }
}

βœ… Optimizations Based on TinyFrameJS Experience

Efficient Data Storage

  • Use appropriate ColumnVector implementation - TypedArrayVector for numeric data, ArrowVector for complex types
  • Let VectorFactory choose the optimal storage backend based on data type
  • Avoid data copying β€” use references or in-place operations where possible
  • Pre-allocate memory for result arrays in a single call, knowing the size in advance
  • Use array pooling for temporary arrays to reduce garbage collector pressure
// Bad
const result = [];
for (let i = 0; i < data.length; i++) {
  result.push(data[i] * 2);
}

// Good
const result = new Float64Array(data.length);
for (let i = 0; i < data.length; i++) {
  result[i] = data[i] * 2;
}

Algorithmic Optimizations

  • Avoid nested loops β€” aim for O(n) complexity instead of O(nΒ²).
  • Use sliding windows instead of recalculating for overlapping data ranges.
  • Apply prefix-sum for efficient calculation of sliding statistics on large windows.
  • Cache intermediate results to avoid repeated calculations.
// Bad (O(n*k))
function rollingSum(values, windowSize) {
  const result = new Float64Array(values.length - windowSize + 1);
  for (let i = 0; i <= values.length - windowSize; i++) {
    let sum = 0;
    for (let j = 0; j < windowSize; j++) {
      sum += values[i + j];
    }
    result[i] = sum;
  }
  return result;
}

// Good (O(n))
function rollingSum(values, windowSize) {
  const result = new Float64Array(values.length - windowSize + 1);
  let sum = 0;

  // Initialize first window
  for (let i = 0; i < windowSize; i++) {
    sum += values[i];
  }
  result[0] = sum;

  // Sliding window
  for (let i = 1; i <= values.length - windowSize; i++) {
    sum = sum - values[i - 1] + values[i + windowSize - 1];
    result[i] = sum;
  }
  return result;
}

Efficient NaN and Invalid Value Handling

  • Use counters for invalid values instead of repeated isNaN() checks.
  • Apply validity masks for filtering NaN values in a single pass.
  • Avoid checks on each iteration β€” group checks and perform them in advance.
// Bad
function hasNaN(array) {
  for (let i = 0; i < array.length; i++) {
    if (isNaN(array[i])) return true;
  }
  return false;
}

// Good
function countNaN(array) {
  let badCount = 0;
  for (let i = 0; i < array.length; i++) {
    if (isNaN(array[i])) badCount++;
  }
  return badCount;
}

Hashing and Duplicate Detection

  • Avoid using JSON.stringify for data serialization β€” use efficient hash functions (FNV-1a, Murmur3).
  • Use hash tables with open addressing instead of Map for large datasets.
  • Pre-compute hashes for reused values.
// Bad
function findDuplicates(rows, keyColumns) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = JSON.stringify(keyColumns.map((col) => row[col]));
    if (seen.has(key)) return true;
    seen.add(key);
    return false;
  });
}

// Good
function hashRow(row, keyColumns) {
  let hash = 2166136261; // FNV-1a offset basis
  for (const col of keyColumns) {
    const val = row[col];
    const str = String(val);
    for (let i = 0; i < str.length; i++) {
      hash ^= str.charCodeAt(i);
      hash = (hash * 16777619) >>> 0; // FNV prime
    }
  }
  return hash;
}

function findDuplicates(rows, keyColumns) {
  const seen = new Set();
  return rows.filter((row) => {
    const hash = hashRow(row, keyColumns);
    if (seen.has(hash)) return true;
    seen.add(hash);
    return false;
  });
}

Vectorization and Parallelism

  • Use block processing for better vectorization in V8.
  • Split large tasks into subtasks for parallel processing.
  • Consider using Web Workers for CPU-intensive operations.

General Performance Recommendations

  • Measure before optimizing β€” use profiling to identify bottlenecks.
  • Set performance budgets for critical operations.
  • Test on realistic data volumes β€” optimizations may only show up on large datasets.
  • Avoid premature optimization β€” first achieve correctness, then optimize critical paths.

πŸ“Š Working with Data and Testing

βœ… Handling Special Values

When working with numeric data, it is essential to clearly define and document how the library handles special values:

  • null - converted to 0 in numeric columns
  • undefined - converted to NaN in numeric columns
  • NaN - preserved as NaN

βœ… Preserving Original Data

  • Store "raw" values - keep original data alongside optimized data for calculations
  • Use validity masks - track where undefined and other special values were
  • Separate data and metadata - do not lose information during optimization
// Recommended approach
export function createFrame(data) {
  const columns = {}; // optimized data
  const rawColumns = {}; // original data
  // ...

  return { columns, rawColumns, rowCount, columnNames };
}

βœ… Explicit Default Values

  • Document default behavior - e.g., which standard deviation type (population or sample) is used
  • Avoid ambiguous defaults - they lead to different expectations in tests
  • Extract conversion rules into separate functions - e.g., normalizeNumeric(value)

βœ… Testing

  • Test cases should be consistent - they should not contradict each other
  • Document expected behavior - especially for handling special values
  • Avoid special handling for tests - functions should work universally
// Bad: special handling for a specific test
if (values.length === 6 && values[0] === 1 && Number.isNaN(values[1])) {
  return 1.92; // Magic number for the test
}

// Good: universal algorithm that works for all cases
function calculateStandardDeviation(values, population = true) {
  // Universal algorithm...
}

βœ… Quotes and Escaping in Tests

  • Always use single quotes ('...') for string literals in tests.
  • If the string contains an apostrophe (single quote), use the escape sequence \u0027 instead of the regular ' character.
  • Example:
    // Bad:
    expect(err.message).toBe("Column 'foo' not found");
    // Good:
    expect(err.message).toBe('Column \u0027foo\u0027 not found');

βœ… Calculation Optimization

  • Avoid double passes – do not perform separate validation if types are already checked
  • Trust the data structure – if createFrame guarantees type homogeneity, do not recheck it
  • Minimize data copying – work with original arrays where possible

πŸ—οΈ Руководство ΠΏΠΎ Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚ΠΊΠ΅ ΠΌΠ΅Ρ‚ΠΎΠ΄ΠΎΠ²

βœ… Двухслойная Π°Ρ€Ρ…ΠΈΡ‚Π΅ΠΊΡ‚ΡƒΡ€Π° DataFrame

TinyFrameJS implements a clean two-layer architecture:

DataFrame (API) β†’ Series (columns) β†’ ColumnVector (storage)
  • DataFrame - public API for working with data
  • Series - columns of data, wrapper over ColumnVector
  • ColumnVector - abstraction for storing data, can be:
    • TypedArrayVector - fast storage for numeric data
    • ArrowVector - optimized storage with support for null, strings and complex types
    • SimpleVector - simple storage for small datasets or mixed types

The engine selection is done automatically through VectorFactory based on the data type and operation context.

βœ… Method structure

All methods in TinyFrameJS follow a unified pattern with dependency injection:

/**
 * ОписаниС ΠΌΠ΅Ρ‚ΠΎΠ΄Π°
 * @param {Object} frame - ΠžΠ±ΡŠΠ΅ΠΊΡ‚ DataFrame
 * @param {String} column - Имя колонки
 * @returns {Number|Array|Object} - Description of the returned value
 */
export const methodName =
  ({ validateColumn, otherDep }) =>
  (frame, column, ...otherArgs) => {
    // Input data validation
    validateColumn(frame, column);
    
    // Implementation
    const result = /* ... */;
    
    return result;
  };

This pattern provides:

  • Dependency injection - dependencies are passed to the method
  • Testability - dependencies can be mocked
  • Consistency - all methods follow the same structure
  • Functional style - pure functions without side effects

βœ… Method types

TinyFrameJS differentiates between two types of methods:

  1. Transform methods - return a new DataFrame:
export const sort =
  ({ validateColumn }) =>
  (frame, column) => {
    validateColumn(frame, column);

    // Create indices for sorting
    const arr = frame.columns[column];
    const sortedIndices = [...arr.keys()].sort((a, b) => arr[a] - arr[b]);

    // Create new frame with sorted data
    const newColumns = {};
    for (const col of Object.keys(frame.columns)) {
      const originalArray = frame.columns[col];
      newColumns[col] = sortedIndices.map(i => originalArray[i]);
    }

    return { columns: newColumns, rowCount: frame.rowCount };
  };
  1. Aggregation methods - return a scalar value:
export const sum =
  ({ validateColumn }) =>
  (frame, column) => {
    validateColumn(frame, column);

    const arr = frame.columns[column];
    let total = 0;
    for (let i = 0; i < arr.length; i++) {
      total += arr[i];
    }
    return total;
  };

βœ… Module system for method registration

TinyFrameJS uses a unified utility extendDataFrame for registering methods. The process consists of three steps:

1. Creating a method in a separate file

// src/methods/dataframe/aggregation/sum.js
export const sum = ({ validateColumn }) => (frame, column) => {
  validateColumn(frame, column);
  // Implementation...
  return total;
};

2. Creating barrel-file (pool.js) for re-exporting methods

// src/methods/dataframe/aggregation/pool.js
export { sum } from './sum.js';
export { mean } from './mean.js';
export { min } from './min.js';
export { max } from './max.js';

3. Registering methods through extendDataFrame

// src/methods/dataframe/aggregation/index.js
import { DataFrame } from '../../../core/DataFrame.js';
import { extendDataFrame } from '../../../core/extendDataFrame.js';
import * as pool from './pool.js';

// Зависимости
import { validateColumn } from '../../../utils/validators.js';

const deps = { validateColumn };

// РСгистрация ΠΌΠ΅Ρ‚ΠΎΠ΄ΠΎΠ²
extendDataFrame(DataFrame.prototype, pool);

// Export methods for direct use
export * from './pool.js';

4. Implementation of extendDataFrame

// src/core/extendDataFrame.js
export function extendDataFrame(proto, pool, { namespace, strict = true } = {}) {
  const target = namespace ? (proto[namespace] ??= {}) : proto;

  for (const [name, fn] of Object.entries(pool)) {
    if (strict && name in target) {
      throw new Error(`Method conflict: ${namespace ? namespace + '.' : ''}${name}`);
    }
    target[name] = function (...args) {
      return fn(this, ...args);      // Transparently pass this as the first argument
    };
  }
}

Benefits of this approach:

  • Clean logic separation - the calculation part of the method is separated from binding to the DataFrame class
  • Tree-shaking - unused methods do not enter the final bundle
  • Namespaces - methods from different packages do not conflict with each other

βœ… Testing methods

When writing tests for DataFrame methods, follow these rules:

  1. Test file location: /tests/core/methods/{category}/{methodName}.test.js
  2. Test through DataFrame API: Test through the DataFrame interface, not directly through functions
  3. Test successful and error scenarios: Check both normal execution and error handling
  4. For transformation methods: Check that the returned DataFrame has the expected structure
  5. For aggregation methods: Verify the returned value is correct

Example test structure:

import { describe, test, expect } from 'vitest';
import { DataFrame } from '../../../src/core/DataFrame.js';

describe('DataFrame.methodName', () => {
  const df = DataFrame.create({
    a: [1, 2, 3],
    b: [10, 20, 30],
  });

  test('performs expected operation', () => {
    // For transformation method
    const result = df.methodName('a');
    expect(result).toBeInstanceOf(DataFrame);
    expect(result.columns).toContain('a');

    // For aggregation method
    const value = df.methodName('a');
    expect(value).toBe(expectedValue);
  });

  test('throws on invalid input', () => {
    expect(() => df.methodName('nonexistent')).toThrow();
  });
});

πŸ”„ Architectural Principles

βœ… Two-Layer Architecture

TinyFrameJS implements a clean two-layer architecture:

DataFrame (API) β†’ Series (columns) β†’ ColumnVector (storage)
  1. DataFrame - Public API for working with data, provides method chaining
  2. Series - Column representation, wraps a ColumnVector
  3. ColumnVector - Abstract storage interface with multiple implementations:
    • TypedArrayVector - Fast storage for numeric data using JavaScript TypedArrays
    • ArrowVector - Optimized storage with Arrow for complex types and null values
    • SimpleVector - Fallback for mixed data types

The appropriate vector implementation is automatically selected by VectorFactory based on data type and operation context.

βœ… Dependency Injection Pattern

TinyFrameJS uses dependency injection for all methods:

  1. Methods are pure functions with dependencies as their first parameter
  2. No direct imports of utilities in method files
  3. Easier testing - dependencies can be mocked
// Example method with dependency injection
export const sum =
  ({ validateColumn }) =>
  (frame, column) => {
    validateColumn(frame, column);
    const arr = frame.columns[column];
    let total = 0;
    for (let i = 0; i < arr.length; i++) {
      total += arr[i];
    }
    return total;
  };

βœ… Modular Method Registration

TinyFrameJS uses a modular method registration system via extendDataFrame:

  1. Methods defined as pure functions in individual files
  2. Exported through barrel files (index.js) for organization
  3. Registered with DataFrame via the extendDataFrame utility
  4. Support for namespaces to avoid conflicts between packages
// Example method registration
import { DataFrame } from '../core/DataFrame.js';
import { extendDataFrame } from '../utils/extendDataFrame.js';
import * as aggregationMethods from './aggregation/index.js';

// Register methods directly on DataFrame.prototype
extendDataFrame(DataFrame.prototype, aggregationMethods);

// Register methods in a namespace
extendDataFrame(DataFrame.prototype, technicalMethods, { namespace: 'ta' });

βœ… Transformation vs. Aggregation

When implementing a new method, decide whether it's a transformation or aggregation:

  1. Transformation methods:

    • Return a new DataFrame
    • Can be chained with other methods
    • Example: sort(), filter(), select()
  2. Aggregation methods:

    • Return a scalar value or array
    • Typically terminate a method chain
    • Example: count(), mean(), sum()

This distinction is handled automatically by the method implementation:

// Transformation method example
export const filter =
  ({ validateFunction }) =>
  (frame, predicate) => {
    validateFunction(predicate);
    // Implementation that returns a new DataFrame
    return new DataFrame(/* filtered data */); 
  };

// Aggregation method example
export const sum =
  ({ validateColumn }) =>
  (frame, column) => {
    validateColumn(frame, column);
    // Implementation that returns a scalar value
    return total;
  };

πŸ’° Numerical Accuracy

βœ… Use Integers for Money (e.g., cents)

  • Avoid using Number directly for monetary values.
  • Represent money in cents: $1.99 β†’ 199
  • Use formatting functions like Intl.NumberFormat, divide by 100 when needed.

βœ… Use BigInt for Very Large Values

  • Use BigInt when values exceed 2^53.
  • Do not mix BigInt and Number in operations.

βœ… Use Decimal Libraries

  • For precise decimal math, use: decimal.js, big.js, dinero.js
  • Trade-off: slower but much safer for rates, taxes, percentages.
import Decimal from 'decimal.js';
const total = new Decimal('0.1').plus('0.2'); // "0.3"

βœ… Rounding

  • Use Math.round, toFixed, or proper libraries.
  • For bankers' rounding, use custom rounding or appropriate libraries.

βœ… Test Edge Cases

  • Add tests for rounding errors (0.1 + 0.2 !== 0.3).
  • Use Number.EPSILON or absolute tolerance (abs(result - expected) < Ξ΅).

🧱 Code Structure and Modularity

βœ… When Classes Are Justified

Although we prefer pure functions, classes are justified in cases like:

  • Modeling complex entities with internal state (e.g., Portfolio, StrategySession, BacktestRun).
  • Simulating time-dependent state, e.g., strategy object tracking positions, flags, counters.
  • Framework integration, where classes are expected (class Strategy with onBar method).
  • Inheritance/templates, when structure justifies reuse via class inheritance (use cautiously!).

When using classes:

  • Do not add properties dynamically after constructor;
  • Always initialize all fields in the constructor;
  • Avoid deep hierarchies β€” prefer composition over inheritance.

βœ… Prefer Pure Functions Over Classes

  • Use pure functions when no internal state is needed.
  • They are easier to test, V8 optimizes them better, and no hidden class churn.
  • Classes can be used when necessary, but default to functions for simpler logic.
// Preferred:
function calculatePnL(entryPrice, exitPrice) {
  return exitPrice - entryPrice;
}

// Less efficient:
class Trade {
  constructor(entry, exit) {
    this.entry = entry;
    this.exit = exit;
  }
  getPnL() {
    return this.exit - this.entry;
  }
}

βœ… SRP (Single Responsibility Principle)

  • One file = one module = one purpose
  • Separate strategy logic, formatting, calculations, UI
  • Each method in its own file with clear dependency injection

βœ… Use Modular System (ESM)

  • Follow the project standard (ESM)
  • Use barrel files (index.js) for organizing related methods
  • Register methods with extendDataFrame in namespace or directly

βœ… Keep Functions Small

  • Prefer functions < 50 lines
  • Extract sub-functions for clarity and testability

βœ… Do Not Mix Platform-Specific APIs

  • Avoid using fs, path, process in browser-targeted code
  • Abstract platform-specific behavior

βœ… Consistent Code Style

  • Follow ESLint + Prettier rules
  • Use camelCase for variables/functions, PascalCase for classes

βœ… Document Complex Logic

  • Use JSDoc to document all methods, especially their parameters and return values
  • Explain complex calculations with inline comments
  • Document namespace methods with their intended usage patterns
  • For methods that extend DataFrame, document how they interact with the two-layer architecture

πŸ§ͺ Testing

βœ… Always Add Tests

  • Cover new logic with unit tests
  • Test through the DataFrame API, not internal functions
  • Include correctness and boundary conditions
  • Test both direct methods and namespace methods

βœ… For Financial Computation

  • Validate against known correct values
  • Add tolerances (Β±1e-12) for floating-point results
  • Test with different ColumnVector implementations
  • Verify results are consistent across backends

βœ… Integration Tests

  • Include full backtest runs if applicable

πŸ”₯ Profiling

βœ… Use --inspect, --prof, perf_hooks

  • Benchmark with realistic datasets (100k+ rows)
  • Use console.time or performance.now() for timing

βœ… Identify Bottlenecks

  • Use flamegraphs, DevTools, or CLI tools
  • Only optimize based on real measurements

🧨 Anti-Patterns

  • Using raw Number for money without scaling
  • Mixing types in arrays or structures
  • Sparse arrays / use of delete
  • Dynamically adding properties to hot objects
  • Allocating temporary objects in loops
  • Synchronous blocking on large datasets (e.g., in UI or Node event loop)
  • Silent catch blocks or unhandled Promise rejections

πŸ“‹ Pull Request Checklist

Before submitting a PR, please verify:

  • Followed project code style (Prettier, ESLint)
  • Used pure functions with dependency injection
  • Properly registered methods using extendDataFrame
  • Added tests for new logic and edge cases
  • Benchmarked performance (if critical path is affected)
  • Avoided anti-patterns (e.g., array holes, mixed types, etc.)
  • Used conventional commits and described your PR clearly
  • Highlighted any code that is precision-sensitive (money, rates)
  • Updated documentation if adding to public API
  • CI passes βœ…

🧠 Summary

Write code that is:

  • Fast β€” V8-optimized, low-GC, optimized vector storage
  • Accurate β€” financial results must be precise to the cent
  • Modular β€” clear separation of responsibilities with namespaces
  • Predictable β€” pure functions with explicit dependencies
  • Extensible β€” properly registered via extendDataFrame

Thank you for keeping TinyFrameJS fast and reliable ⚑