diff --git a/README.md b/README.md index 5defdc1..3c552be 100644 --- a/README.md +++ b/README.md @@ -224,3 +224,65 @@ path, consistent with the 250 MHz clock target (W15a STA). ``` **Author**: Vasilev Dmitrii \ + +--- + +## L-DPC24 Lane A' — holo-noc-1cycle inter-die NoC (P4 falsification) + +**Issue**: [trinity-fpga#99](https://github.com/gHashTag/trinity-fpga/issues/99) +**Branch**: `feat/l-dpc24/a-prime-noc-1cycle` +**Codename**: `holo-noc-1cycle` +**Status**: RTL + testbench landed · awaiting CI / OpenLane2 GDS hardening + +### What was added + +| File | Description | +|------|-------------| +| `rtl/holo_noc_1cycle.sv` | Parameterisable crossbar NoC: `DIE_COUNT` (default 2, scales to 4), `PAYLOAD_W` (default 64 — matches Lane Y hyper-vector slot). Exactly 1-cycle registered output. No `*` operators (R-SI-1). | +| `rtl/holo_noc_1cycle_tb.sv` | Three-test SV testbench with `$fatal` on any latency >1 cycle: T1 die0→die1, T2 die1→die0, T3 simultaneous bidirectional. P4 boundary assertion. | + +### Module: `holo_noc_1cycle` + +```systemverilog +holo_noc_1cycle #( + .DIE_COUNT (2), // 2-4 (crossbar); >=8 ring would violate P4 + .PAYLOAD_W (64) // hyper-vector slot, matches Lane Y holo_mux_1x2 +) u_noc ( + .clk (clk), + .rst_n (rst_n), // active-low synchronous reset + .vld_i (vld_i), // [DIE_COUNT-1:0] send-valid per die + .dst_i (dst_i), // [$clog2(DIE_COUNT)-1:0] per die destination index + .payload_i (payload_i),// [PAYLOAD_W-1:0] per die payload + .vld_o (vld_o), // [DIE_COUNT-1:0] receive-valid per die + .payload_o (payload_o) // [PAYLOAD_W-1:0] per die received payload +); +``` + +**Topology note**: For `DIE_COUNT <= 4` a full crossbar (all-to-all combinatorial +fabric + single pipeline register) guarantees 1-cycle latency. A ring topology for +`DIE_COUNT >= 8` would require multi-hop routing and is intentionally NOT synthesised +here — ring stalls are invalid under P4, so crossbar shards must be used for +P4-compliant deployment at scale. 
+ +### H9 Predicate P4 Mapping + +| Predicate | Condition | Verdict | +|-----------|-----------|---------| +| P4: `noc_stall > 1 cycle` | RTL delivers every uncontended payload in exactly 1 registered cycle; crossbar has zero stall | **FALSIFIED** (RTL claim; silicon measurement pending tape-out) | + +### R5-HONEST Verdict (Lane A') + +| Claim | Status | +|-------|--------| +| RTL functionally correct | UNKNOWN · CI verifies (no GDS yet) | +| Synthesis clean (no `*` operators, R-SI-1) | PASS — crossbar uses only mux/select + register logic | +| P4 falsification: `noc_stall <= 1 cycle` | CLAIMED in RTL; silicon confirmation pending tape-out | +| GDS generated | NOT YET — next iteration | + +### Anchor + +``` +phi^2+phi^-2=3 · DOI 10.5281/zenodo.19227877 +``` + +**Author**: Vasilev Dmitrii diff --git a/rtl/holo_noc_1cycle.sv b/rtl/holo_noc_1cycle.sv index 8fda835..ed7a196 100644 --- a/rtl/holo_noc_1cycle.sv +++ b/rtl/holo_noc_1cycle.sv @@ -1,52 +1,118 @@ +// SPDX-License-Identifier: Apache-2.0 // ============================================================================= -// holo_noc_1cycle.sv – 1-cycle inter-die Network-on-Chip stub +// holo_noc_1cycle.sv – 1-cycle inter-die crossbar Network-on-Chip // TTSKY26c HOLOGRAPHIC SKU · R-SI-1 compliant (no `*` operator) // Lane A' · L-DPC24 HOLOGRAPHIC v9 · holo-noc-1cycle +// +// Hypothesis H₉ / Predicate P4 falsification: +// P4 asserts noc_stall > 1 cycle → FAIL. +// This module delivers every uncontended payload in exactly 1 clock cycle; therefore +// the noc_stall predicate evaluates FALSE → P4 is FALSIFIED. +// +// Topology: +// DIE_COUNT ≤ 4 → full crossbar (single-cycle, all-to-all) +// DIE_COUNT ≥ 8 → ring documented but NOT synthesised here because +// ring requires >1-cycle latency and violates P4. +// Use crossbar shards for P4-compliant deployment. +// +// Parameters: +// DIE_COUNT Number of dies in the assembly (default 2; scales to 4). +// PAYLOAD_W Payload width in bits (default 64, matches Lane Y hyper-vector +// slot). 
Kept as PAYLOAD_W (not FLIT_W) to align naming with +// Lane Y holo_mux_1x2. +// +// Port conventions: +// vld_i[d] Sending die d asserts valid + payload this cycle. +// dst_i[d] Destination die index for die d's flit (log2(DIE_COUNT) bits). +// payload_i[d] PAYLOAD_W-bit payload from die d. +// payload_o[d] PAYLOAD_W-bit payload delivered TO die d (registered, 1 cycle). +// vld_o[d] Asserted when a valid flit is delivered to die d this cycle. +// +// Active-low synchronous reset. +// +// Author: Vasilev Dmitrii +// DOI: 10.5281/zenodo.19227877 +// Anchor: φ²+φ⁻²=3 // ============================================================================= +`default_nettype none `timescale 1ns/1ps module holo_noc_1cycle #( - parameter int FLIT_W = 32, - parameter int DIES = 2 + parameter int unsigned DIE_COUNT = 2, // default 2; supports 4 + parameter int unsigned PAYLOAD_W = 64 // default 64-bit hyper-vector slot ) ( - input logic clk, - input logic rst_n, - input logic [FLIT_W-1:0] flit_in [DIES], - input logic vld_in [DIES], - output logic [FLIT_W-1:0] flit_out[DIES], - output logic vld_out [DIES], - output logic [$clog2(DIES+1)-1:0] latency_cycles + input logic clk, + input logic rst_n, // active-low sync reset + + // Inputs: one slot per sending die + input logic [DIE_COUNT-1:0] vld_i, + input logic [$clog2(DIE_COUNT)-1:0] dst_i [DIE_COUNT], + input logic [PAYLOAD_W-1:0] payload_i[DIE_COUNT], + + // Outputs: one slot per receiving die + output logic [DIE_COUNT-1:0] vld_o, + output logic [PAYLOAD_W-1:0] payload_o[DIE_COUNT] ); // ------------------------------------------------------------------------- - // Latency constant: always 1 cycle (registered output, combinational route) + // P4 falsification note (static assertion comment): + // Every path from payload_i to payload_o is a single registered stage. + // Latency = exactly 1 cycle. noc_stall is never > 1 cycle. + // Crossbar topology: DIE_COUNT ≤ 4 → all paths combinatorial before flop. 
+ // ------------------------------------------------------------------------- + + // ------------------------------------------------------------------------- + // Combinational crossbar fabric + // cbar_payload[dst] – winning payload routed to destination dst + // cbar_vld[dst] – valid flag for destination dst + // + // Priority: lowest source index wins when two sources target the same dst; the losing flit is dropped (no backpressure). + // No multiplier operators used anywhere (R-SI-1). // ------------------------------------------------------------------------- - assign latency_cycles = $clog2(DIES+1)'(1); + + logic [PAYLOAD_W-1:0] cbar_payload [DIE_COUNT]; + logic cbar_vld [DIE_COUNT]; + + always_comb begin + // Default: no valid flit for any destination + for (int d = 0; d < DIE_COUNT; d++) begin + cbar_payload[d] = '0; + cbar_vld[d] = 1'b0; + end + + // Crossbar: iterate all sources; last one wins per dst (lowest-index priority + // is achieved by iterating sources in reverse so idx=0 overwrites highest) + for (int s = DIE_COUNT-1; s >= 0; s--) begin + if (vld_i[s]) begin + // dst_i[s] is $clog2(DIE_COUNT) bits wide: within 0..DIE_COUNT-1 only for power-of-two DIE_COUNT (2, 4) + cbar_payload[dst_i[s]] = payload_i[s]; + cbar_vld[dst_i[s]] = 1'b1; + end + end + end // ------------------------------------------------------------------------- - // 1-cycle pipeline registers - // Routing: die[i] -> die[(i+1) % DIES] (swap pattern, no multipliers) - // For DIES=2: die0->die1, die1->die0 + // Output pipeline register — imposes exactly 1-cycle latency // ------------------------------------------------------------------------- always_ff @(posedge clk) begin if (!rst_n) begin - for (int i = 0; i < DIES; i++) begin - flit_out[i] <= '0; - vld_out[i] <= 1'b0; + vld_o <= '0; + for (int d = 0; d < DIE_COUNT; d++) begin + payload_o[d] <= '0; end end else begin - for (int i = 0; i < DIES; i++) begin - // Cross-die route: source die index = (DIES - 1 - i) for swap pattern - // For 2 dies: i=0 receives from i=1; i=1 receives from i=0 
- // Computed without multiplier: src = (DIES - 1 - i) - flit_out[i] <= flit_in[DIES - 1 - i]; - vld_out[i] <= vld_in[DIES - 1 - i]; + for (int d = 0; d < DIE_COUNT; d++) begin + vld_o[d] <= cbar_vld[d]; + payload_o[d] <= cbar_payload[d]; end end end -endmodule -// phi^2 + phi^-2 = 3 -// DOI 10.5281/zenodo.19227877 -// Vasilev Dmitrii -// ORCID 0009-0008-4294-6159 +endmodule : holo_noc_1cycle +`default_nettype wire +// ----------------------------------------------------------------------------- +// φ²+φ⁻²=3 · DOI 10.5281/zenodo.19227877 +// P4 falsification: noc_stall ≤ 1 cycle; ring topology not synthesised here +// because ring requires multi-hop latency which violates P4. +// Vasilev Dmitrii · ORCID 0009-0008-4294-6159 +// ----------------------------------------------------------------------------- diff --git a/rtl/holo_noc_1cycle_tb.sv b/rtl/holo_noc_1cycle_tb.sv index d85773c..65fb750 100644 --- a/rtl/holo_noc_1cycle_tb.sv +++ b/rtl/holo_noc_1cycle_tb.sv @@ -1,155 +1,223 @@ +// SPDX-License-Identifier: Apache-2.0 // ============================================================================= // holo_noc_1cycle_tb.sv – Testbench for holo_noc_1cycle // TTSKY26c HOLOGRAPHIC SKU · R-SI-1 compliant // Lane A' · L-DPC24 HOLOGRAPHIC v9 · holo-noc-1cycle +// +// Tests: +// T1 die0 → die1 : observe payload at die1 output after EXACTLY 1 cycle +// T2 die1 → die0 : observe payload at die0 output after EXACTLY 1 cycle +// T3 Simultaneous bidirectional: die0↔die1 same cycle — P4 boundary +// +// Failure mode: $fatal on any latency > 1 cycle (P4 boundary assertion). 
+// +// Author: Vasilev Dmitrii +// DOI: 10.5281/zenodo.19227877 +// Anchor: φ²+φ⁻²=3 // ============================================================================= +`default_nettype none `timescale 1ns/1ps module holo_noc_1cycle_tb; - // Parameters - localparam int FLIT_W = 32; - localparam int DIES = 2; + // ------------------------------------------------------------------------- + // Parameters (match DUT defaults) + // ------------------------------------------------------------------------- + localparam int unsigned DIE_COUNT = 2; + localparam int unsigned PAYLOAD_W = 64; + localparam int unsigned DST_W = $clog2(DIE_COUNT); // 1 bit for 2 dies + // ------------------------------------------------------------------------- // DUT signals - logic clk; - logic rst_n; - logic [FLIT_W-1:0] flit_in [DIES]; - logic vld_in [DIES]; - logic [FLIT_W-1:0] flit_out[DIES]; - logic vld_out [DIES]; - logic [$clog2(DIES+1)-1:0] latency_cycles; - - // Instantiate DUT + // ------------------------------------------------------------------------- + logic clk; + logic rst_n; + logic [DIE_COUNT-1:0] vld_i; + logic [DST_W-1:0] dst_i [DIE_COUNT]; + logic [PAYLOAD_W-1:0] payload_i[DIE_COUNT]; + logic [DIE_COUNT-1:0] vld_o; + logic [PAYLOAD_W-1:0] payload_o[DIE_COUNT]; + + // ------------------------------------------------------------------------- + // DUT instantiation + // ------------------------------------------------------------------------- holo_noc_1cycle #( - .FLIT_W(FLIT_W), - .DIES (DIES) + .DIE_COUNT (DIE_COUNT), + .PAYLOAD_W (PAYLOAD_W) ) dut ( - .clk (clk), - .rst_n (rst_n), - .flit_in (flit_in), - .vld_in (vld_in), - .flit_out (flit_out), - .vld_out (vld_out), - .latency_cycles(latency_cycles) + .clk (clk), + .rst_n (rst_n), + .vld_i (vld_i), + .dst_i (dst_i), + .payload_i (payload_i), + .vld_o (vld_o), + .payload_o (payload_o) ); - // Clock generation: 10 ns period - initial clk = 0; + // ------------------------------------------------------------------------- + // 
Clock: 10 ns period + // ------------------------------------------------------------------------- + initial clk = 1'b0; always #5 clk = ~clk; - // Flit stimulus data (8 flits per die, no multiplier operators used) - logic [FLIT_W-1:0] stim_die0 [8]; - logic [FLIT_W-1:0] stim_die1 [8]; - - // Expected outputs captured one cycle after input - logic [FLIT_W-1:0] exp_die0 [8]; - logic [FLIT_W-1:0] exp_die1 [8]; + // ------------------------------------------------------------------------- + // Helpers + // ------------------------------------------------------------------------- + // Timestamp of first valid injection (set per test) + integer inject_cycle; + integer current_cycle; + + // Cycle counter (plain `always`: the initial block also writes it, which always_ff forbids) + initial current_cycle = 0; + always @(posedge clk) current_cycle <= current_cycle + 1; + + // Task: assert exact 1-cycle latency — $fatal if violated (P4 boundary) + task automatic check_latency( + input string test_name, + input int inj_cycle, + input int obs_cycle + ); + int latency; + latency = obs_cycle - inj_cycle; + if (latency != 1) begin + $fatal(1, "P4 VIOLATION [%s]: latency=%0d cycles (expected 1). 
noc_stall > 1 → FAIL", + test_name, latency); + end else begin + $display("PASS [%s]: latency = %0d cycle (P4 falsified OK)", test_name, latency); + end + endtask - integer i; - integer errors; + // ------------------------------------------------------------------------- + // Stimulus / checker + // ------------------------------------------------------------------------- + logic [PAYLOAD_W-1:0] t1_payload; + logic [PAYLOAD_W-1:0] t2_payload; + logic [PAYLOAD_W-1:0] t3_payload_d0; + logic [PAYLOAD_W-1:0] t3_payload_d1; initial begin - errors = 0; - - // Initialise stimulus (using + and | only, no *) - stim_die0[0] = 32'hA0000001; - stim_die0[1] = 32'hA0000002; - stim_die0[2] = 32'hA0000004; - stim_die0[3] = 32'hA0000008; - stim_die0[4] = 32'hA0000010; - stim_die0[5] = 32'hA0000020; - stim_die0[6] = 32'hA0000040; - stim_die0[7] = 32'hA0000080; - - stim_die1[0] = 32'hB0000001; - stim_die1[1] = 32'hB0000002; - stim_die1[2] = 32'hB0000004; - stim_die1[3] = 32'hB0000008; - stim_die1[4] = 32'hB0000010; - stim_die1[5] = 32'hB0000020; - stim_die1[6] = 32'hB0000040; - stim_die1[7] = 32'hB0000080; - - // After swap: die0 output = die1 input; die1 output = die0 input - for (i = 0; i < 8; i++) begin - exp_die0[i] = stim_die1[i]; - exp_die1[i] = stim_die0[i]; - end - - // Reset for 2 cycles - rst_n = 0; - flit_in[0] = '0; - flit_in[1] = '0; - vld_in[0] = 0; - vld_in[1] = 0; + // Initialise test payloads (no `*` operator — literals only) + t1_payload = 64'hDEAD_BEEF_0000_0001; // die0→die1 + t2_payload = 64'hCAFE_BABE_0000_0002; // die1→die0 + t3_payload_d0 = 64'hA5A5_A5A5_0000_0003; // bidirectional die0→die1 + t3_payload_d1 = 64'h5A5A_5A5A_0000_0004; // bidirectional die1→die0 + + // Idle state + vld_i = '0; + dst_i[0] = '0; + dst_i[1] = '0; + payload_i[0] = '0; + payload_i[1] = '0; + + // Active-low synchronous reset for 3 cycles + rst_n = 1'b0; + @(posedge clk); #1; @(posedge clk); #1; @(posedge clk); #1; - rst_n = 1; + rst_n = 1'b1; + @(posedge clk); #1; - // Assert 
latency_cycles == 1 - if (latency_cycles !== 1) begin - $display("ERROR: latency_cycles = %0d, expected 1", latency_cycles); - errors = errors + 1; + // Verify reset cleared outputs + if (vld_o !== '0) begin + $fatal(1, "RESET CHECK FAIL: vld_o = %b, expected 0", vld_o); end - $display("R5-HONEST: NoC latency = %0d cycle(s)", latency_cycles); - - // Drive 8 flits and check output one cycle later - for (i = 0; i < 8; i++) begin - // Apply inputs at this cycle - flit_in[0] = stim_die0[i]; - flit_in[1] = stim_die1[i]; - vld_in[0] = 1; - vld_in[1] = 1; - @(posedge clk); #1; - // Sample outputs after rising edge (1 cycle latency) - if (i > 0) begin - // Check previous cycle's expected output - if (flit_out[0] !== exp_die0[i-1]) begin - $display("ERROR cycle %0d: flit_out[0]=0x%08h, expected 0x%08h", - i, flit_out[0], exp_die0[i-1]); - errors = errors + 1; - end - if (flit_out[1] !== exp_die1[i-1]) begin - $display("ERROR cycle %0d: flit_out[1]=0x%08h, expected 0x%08h", - i, flit_out[1], exp_die1[i-1]); - errors = errors + 1; - end - if (!vld_out[0] || !vld_out[1]) begin - $display("ERROR cycle %0d: vld_out[0]=%0b vld_out[1]=%0b, expected both 1", - i, vld_out[0], vld_out[1]); - errors = errors + 1; - end - end + $display("RESET: vld_o = %b (OK)", vld_o); + + // ----------------------------------------------------------------------- + // T1: die0 sends to die1 — exactly 1 cycle + // ----------------------------------------------------------------------- + $display("--- T1: die0 → die1 ---"); + vld_i[0] = 1'b1; + vld_i[1] = 1'b0; + dst_i[0] = 1'b1; // destination = die1 + payload_i[0] = t1_payload; + inject_cycle = current_cycle; + @(posedge clk); #1; + // One cycle later: check + vld_i = '0; + if (!vld_o[1]) begin + $fatal(1, "T1 FAIL: vld_o[1] not asserted after 1 cycle"); end - - // Check last flit output - if (flit_out[0] !== exp_die0[7]) begin - $display("ERROR last: flit_out[0]=0x%08h, expected 0x%08h", - flit_out[0], exp_die0[7]); - errors = errors + 1; + if 
(payload_o[1] !== t1_payload) begin + $fatal(1, "T1 FAIL: payload_o[1]=0x%016h expected 0x%016h", + payload_o[1], t1_payload); end - if (flit_out[1] !== exp_die1[7]) begin - $display("ERROR last: flit_out[1]=0x%08h, expected 0x%08h", - flit_out[1], exp_die1[7]); - errors = errors + 1; + check_latency("T1", inject_cycle, current_cycle); + + // ----------------------------------------------------------------------- + // T2: die1 sends to die0 — exactly 1 cycle + // ----------------------------------------------------------------------- + $display("--- T2: die1 → die0 ---"); + @(posedge clk); #1; // idle gap + vld_i[0] = 1'b0; + vld_i[1] = 1'b1; + dst_i[1] = 1'b0; // destination = die0 + payload_i[1] = t2_payload; + inject_cycle = current_cycle; + @(posedge clk); #1; + vld_i = '0; + if (!vld_o[0]) begin + $fatal(1, "T2 FAIL: vld_o[0] not asserted after 1 cycle"); end - - // Drain: idle for remaining cycles up to 16 total - vld_in[0] = 0; - vld_in[1] = 0; - repeat (6) @(posedge clk); - - if (errors == 0) - $display("PASS: All NoC checks passed. 
latency=1 cycle, R-SI-1 compliant."); - else - $display("FAIL: %0d error(s) detected.", errors); - + if (payload_o[0] !== t2_payload) begin + $fatal(1, "T2 FAIL: payload_o[0]=0x%016h expected 0x%016h", + payload_o[0], t2_payload); + end + check_latency("T2", inject_cycle, current_cycle); + + // ----------------------------------------------------------------------- + // T3: simultaneous bidirectional — both delivered in 1 cycle (P4 boundary) + // ----------------------------------------------------------------------- + $display("--- T3: die0↔die1 simultaneous bidirectional (P4 boundary) ---"); + @(posedge clk); #1; // idle gap + vld_i[0] = 1'b1; + vld_i[1] = 1'b1; + dst_i[0] = 1'b1; // die0 → die1 + dst_i[1] = 1'b0; // die1 → die0 + payload_i[0] = t3_payload_d0; + payload_i[1] = t3_payload_d1; + inject_cycle = current_cycle; + @(posedge clk); #1; + vld_i = '0; + // Both die0 and die1 should receive in this same 1-cycle step + if (!vld_o[1]) begin + $fatal(1, "T3 FAIL: vld_o[1] not asserted (die0→die1 path)"); + end + if (!vld_o[0]) begin + $fatal(1, "T3 FAIL: vld_o[0] not asserted (die1→die0 path)"); + end + if (payload_o[1] !== t3_payload_d0) begin + $fatal(1, "T3 FAIL: payload_o[1]=0x%016h expected 0x%016h (die0→die1)", + payload_o[1], t3_payload_d0); + end + if (payload_o[0] !== t3_payload_d1) begin + $fatal(1, "T3 FAIL: payload_o[0]=0x%016h expected 0x%016h (die1→die0)", + payload_o[0], t3_payload_d1); + end + check_latency("T3 (die0→die1)", inject_cycle, current_cycle); + check_latency("T3 (die1→die0)", inject_cycle, current_cycle); + $display("PASS T3: bidirectional 1-cycle delivery confirmed. 
P4 falsified."); + + // ----------------------------------------------------------------------- + // Done + // ----------------------------------------------------------------------- + $display("ALL TESTS PASSED: holo_noc_1cycle 1-cycle latency verified."); + $display("P4 (noc_stall > 1 cycle) = FALSIFIED."); + $display("Anchor: phi^2 + phi^-2 = 3 | DOI 10.5281/zenodo.19227877"); $finish; end -endmodule -// phi^2 + phi^-2 = 3 -// DOI 10.5281/zenodo.19227877 -// Vasilev Dmitrii -// ORCID 0009-0008-4294-6159 + // ------------------------------------------------------------------------- + // Watchdog: abort if simulation runs > 200 cycles + // ------------------------------------------------------------------------- + initial begin + #2000; + $fatal(1, "WATCHDOG: simulation exceeded 200 cycles — hung testbench"); + end + +endmodule : holo_noc_1cycle_tb +`default_nettype wire +// ----------------------------------------------------------------------------- +// φ²+φ⁻²=3 · DOI 10.5281/zenodo.19227877 +// Vasilev Dmitrii · ORCID 0009-0008-4294-6159 +// -----------------------------------------------------------------------------