Add window functions node (rolling, cumulative, rank, tile)#420

Open
Edwardvaneechoud wants to merge 3 commits into
mainfrom
claude/window-functions-flowfile-GiN6i

@Edwardvaneechoud
Owner

Summary

This PR adds comprehensive support for window functions to Flowfile, enabling users to compute rolling averages, cumulative sums, rankings, and tile-based partitioning operations on their data.

Key Changes

Backend Implementation

  • Schema definitions (transform_schema.py): Added WindowFunctionName, RankMethod, RollingEdgeBehavior types and WindowFunctionInput/WindowFunctionsInput models with comprehensive validation

    • Validates that rolling functions have positive window sizes
    • Ensures tile functions have at least 2 groups
    • Requires order_by for rolling and tile operations
    • Prevents duplicate output column names
    • Infers output types based on function (e.g., rolling_mean → Float64, rank → UInt32)
  • Data engine (flow_data_engine.py): Implemented do_window_functions() method and _build_window_expr() helper

    • Supports 11 window functions: rolling_sum/mean/min/max/std, cum_sum/count/min/max, rank, tile
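    • Handles three edge behaviors for rolling windows: require_full (null until the window is full), partial (computes on the rows available so far), fill_zero (computes on partial windows, then replaces any remaining nulls with 0)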
    • Implements SQL NTILE-compatible tile distribution across partitions
    • Supports 5 ranking methods: ordinal, dense, min, max, average
  • Flow graph (flow_graph.py): Added add_window_functions() method with automatic schema inference for output columns

  • Code generator (code_generator.py): Generates executable Polars/FlowFrame code for window operations, including complex tile logic

  • Input schema (input_schema.py): Added NodeWindowFunctions class with human-readable descriptions
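
A minimal sketch of the validation rules listed under "Schema definitions" may help reviewers. It uses a plain dataclass rather than the PR's actual models; `WindowOp`, `validate_ops`, and the field names `window_size` and `n_tiles` are hypothetical stand-ins, not the names used in transform_schema.py.

```python
from dataclasses import dataclass
from typing import List, Optional

ROLLING = {"rolling_sum", "rolling_mean", "rolling_min", "rolling_max", "rolling_std"}


@dataclass
class WindowOp:
    """Hypothetical stand-in for one window operation (not the PR's model)."""
    function: str
    column: str
    output_name: str
    window_size: Optional[int] = None  # assumed parameter name for rolling_*
    n_tiles: Optional[int] = None      # assumed parameter name for tile


def validate_ops(ops: List[WindowOp], order_by: List[str]) -> List[str]:
    """Return a list of error messages; an empty list means the config is valid."""
    errors: List[str] = []
    seen = set()
    for op in ops:
        is_rolling = op.function in ROLLING
        is_tile = op.function == "tile"
        # Rolling functions need a positive window size.
        if is_rolling and (op.window_size is None or op.window_size < 1):
            errors.append(f"{op.output_name}: window size must be positive")
        # Tile needs at least 2 groups.
        if is_tile and (op.n_tiles is None or op.n_tiles < 2):
            errors.append(f"{op.output_name}: tile needs at least 2 groups")
        # Rolling and tile are only meaningful with a defined row order.
        if (is_rolling or is_tile) and not order_by:
            errors.append(f"{op.output_name}: order_by is required")
        # Output column names must be unique.
        if op.output_name in seen:
            errors.append(f"{op.output_name}: duplicate output column name")
        seen.add(op.output_name)
    return errors
```

Collecting messages instead of raising on the first failure is a choice made for this sketch; the PR's models may well raise validation errors directly.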

Frontend Implementation

  • Vue component (WindowFunctions.vue): Full-featured UI with:

    • Multi-select for partition_by columns with optional badge
    • Ordered table for order_by configuration with direction controls
    • Dynamic window function table with function-specific parameters
    • Auto-naming of output columns that updates when function/column changes
    • Comprehensive tooltips explaining each feature
    • Real-time validation with helpful error messages
  • Type definitions (node.types.ts): TypeScript interfaces for all window function types

  • Node configuration (nodes.py): Registered window_functions as a standard node template

  • Icon (window_functions.svg): Added SVG icon with light/dark mode support

Testing

  • Comprehensive test suite (test_window_functions.py): 13 tests covering:

    • Rolling functions with partitioning and edge behaviors
    • Cumulative operations across partitions
    • Tile distribution with SQL NTILE semantics
    • Ranking with various tie-breaking methods
    • Input validation and error cases
  • Code generation tests (test_code_generator.py): 2 tests verifying generated code correctness for rolling/cumulative and tile operations

Notable Implementation Details

  • Auto-naming: Output column names are auto-generated based on function and source column, but switch to manual mode when user edits them (re-typing the auto name re-enables auto-update)
  • Tile algorithm: Uses ceiling division and conditional logic to distribute rows into near-equal groups per partition (sizes differ by at most one, larger groups first), matching SQL NTILE behavior
  • Edge behavior: Rolling windows can require a full window (null until it fills), compute on partial windows, or compute on partial windows and fill any remaining nulls with 0
  • Type inference: Output types are automatically determined from the function and input column type
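
The tile distribution described above can be illustrated in plain Python. This is a sketch of NTILE-style assignment for a single, already ordered partition, not the Polars expression the PR generates; the function name `ntile` is hypothetical.

```python
from typing import List


def ntile(n_rows: int, n_tiles: int) -> List[int]:
    """Assign 1-based tile numbers to n_rows ordered rows, larger tiles first."""
    big = -(-n_rows // n_tiles)   # ceiling division: size of the larger tiles
    small = n_rows // n_tiles     # size of the remaining tiles
    extra = n_rows % n_tiles      # how many tiles get the larger size
    boundary = extra * big        # first row index that falls in a smaller tile
    tiles = []
    for idx in range(n_rows):
        if idx < boundary:
            tiles.append(idx // big + 1)
        else:
            # small is never 0 here: when n_rows < n_tiles, boundary == n_rows.
            tiles.append(extra + (idx - boundary) // small + 1)
    return tiles


print(ntile(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

With 10 rows and 3 tiles the sizes come out as 4, 3, 3. With fewer rows than tiles, each row gets its own tile and the trailing tiles stay empty, which is also how SQL NTILE behaves.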

https://claude.ai/code/session_01KHCMBBsT4TcFtpDJ3tsXTc

claude added 3 commits April 19, 2026 22:09
Introduces a new "Window Functions" transformation node so intermediate users
can compute rolling aggregates, cumulative aggregates, rank and
equal-sized tile groupings (Alteryx-style) without writing code. The
node accepts an optional partition-by, an order-by list, and a list of
per-column window operations producing one new column each.

Backend:
- transform_schema: WindowFunctionInput / WindowFunctionsInput with
  validation for required params per function family.
- input_schema: NodeWindowFunctions wired into NODE_TYPE_TO_SETTINGS_CLASS.
- FlowDataEngine.do_window_functions applies ops via with_columns and
  .over(partition) after pre-sorting by order_by. Tile uses SQL NTILE
  semantics (sizes distributed with larger groups first).
- FlowGraph.add_window_functions registers the node and its schema.
- code_generator emits cross-framework (pl + ff) expressions.
- node_store registers the node under the aggregate category.

Frontend:
- WindowFunctions.vue component with partition-by, order-by, and op
  table reusing the GroupBy/Sort UX patterns.
- Type definitions in node.types.ts; icon added.

Tests:
- Unit tests for rolling, cumulative, rank, tile semantics and schema
  validation in test_window_functions.py.
- End-to-end code-generator tests for both polars and flowframe outputs.

https://claude.ai/code/session_01KHCMBBsT4TcFtpDJ3tsXTc
…led params

Addresses UX feedback from the first pass:

- Output name column is now properly laid out (table uses table-layout:
  fixed with explicit column widths) so the input is always visible and
  editable. Previously the column existed but got squashed.
- Auto-naming now tracks per-row whether the name is auto-generated or
  user-edited. Changing the source column or function updates the output
  name only when the user hasn't customized it; clearing the field
  re-enables auto-update.
- Added help-icon tooltips on each section header explaining what
  partition-by and order-by do and what the parameter column means.
- Parameter column now shows a label under each input ("window size
  (rows)", "number of groups", "tie-breaking method") so "3" in the row
  is no longer mystery meat.
- Empty-params row (cum_sum etc) now renders a clear "No parameters"
  label instead of an empty input-number blob.
- Tile minimum is now 2 (a single group is pointless).
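
The per-row auto-naming rule can be sketched as a small state machine. The real logic lives in the Vue component; this Python version is only an illustration. The `OpRow` class, the `{column}_{function}` naming pattern, and refilling the name immediately when the field is cleared are all assumptions.

```python
from dataclasses import dataclass


def auto_name(column: str, function: str) -> str:
    # Assumed naming pattern; the component may format names differently.
    return f"{column}_{function}"


@dataclass
class OpRow:
    """Hypothetical model of one row in the window function table."""
    column: str
    function: str
    output_name: str = ""
    is_auto: bool = True  # tracks whether the name is still auto-generated

    def __post_init__(self) -> None:
        if self.is_auto:
            self.output_name = auto_name(self.column, self.function)

    def set_source(self, column: str = None, function: str = None) -> None:
        """Change the source column and/or function; rename only in auto mode."""
        if column is not None:
            self.column = column
        if function is not None:
            self.function = function
        if self.is_auto:
            self.output_name = auto_name(self.column, self.function)

    def edit_name(self, text: str) -> None:
        """User typed in the output-name field; clearing it re-enables auto mode."""
        if text == "":
            self.is_auto = True
            self.output_name = auto_name(self.column, self.function)
        else:
            self.is_auto = False
            self.output_name = text
```

A custom name survives later column or function changes, and only clearing the field hands control back to the generator.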

https://claude.ai/code/session_01KHCMBBsT4TcFtpDJ3tsXTc
Adds a per-row control for how rolling functions behave before the
window has filled up:

- "Leave empty (null)" — require a full window (default, matches Polars'
  default behavior)
- "Use partial window" — compute on whatever rows are available
  (min_samples=1)
- "Fill with 0" — compute on partial windows, then coalesce any
  remaining nulls to 0 (useful when the source column itself has nulls)
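
To make the three options concrete, here is a pure-Python approximation of a rolling mean under each behavior. It is not the engine's Polars expression. It assumes nulls in the source are skipped when counting samples, and the function name and the `edge` parameter are hypothetical.

```python
from typing import List, Optional


def rolling_mean(
    values: List[Optional[float]], window_size: int, edge: str = "require_full"
) -> List[Optional[float]]:
    """Rolling mean over the trailing window_size rows with an edge behavior."""
    # require_full needs window_size non-null samples; the others need one.
    min_samples = window_size if edge == "require_full" else 1
    out: List[Optional[float]] = []
    for i in range(len(values)):
        start = max(0, i - window_size + 1)
        window = [v for v in values[start : i + 1] if v is not None]
        result = sum(window) / len(window) if len(window) >= min_samples else None
        # fill_zero: partial-window result, with any remaining null coalesced to 0.
        if result is None and edge == "fill_zero":
            result = 0.0
        out.append(result)
    return out


data = [None, 2.0, 4.0]
print(rolling_mean(data, 2, "require_full"))  # [None, None, 3.0]
print(rolling_mean(data, 2, "partial"))       # [None, 2.0, 3.0]
print(rolling_mean(data, 2, "fill_zero"))     # [0.0, 2.0, 3.0]
```

The first row shows why "Fill with 0" differs from "Use partial window": a null source value leaves nothing to average, so only the fill step produces a non-null result.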

Wired through transform_schema (new RollingEdgeBehavior literal),
FlowDataEngine expression builder, code generator, and the Vue UI
(secondary dropdown labeled "incomplete windows" under the window size
input). Tests cover all three behaviors including the null-source-column
case for fill_zero.

https://claude.ai/code/session_01KHCMBBsT4TcFtpDJ3tsXTc
@netlify

netlify Bot commented Apr 20, 2026

Deploy Preview for flowfile-wasm canceled.

Name Link
🔨 Latest commit 7d71693
🔍 Latest deploy log https://app.netlify.com/projects/flowfile-wasm/deploys/69e5b117a5bb73000949f002
