62 changes: 62 additions & 0 deletions README.md
@@ -224,3 +224,65 @@ path, consistent with the 250 MHz clock target (W15a STA).
```

**Author**: Vasilev Dmitrii \<admin@t27.ai\>

---

## L-DPC24 Lane A' — holo-noc-1cycle inter-die NoC (P4 falsification)

**Issue**: [trinity-fpga#99](https://github.com/gHashTag/trinity-fpga/issues/99)
**Branch**: `feat/l-dpc24/a-prime-noc-1cycle`
**Codename**: `holo-noc-1cycle`
**Status**: RTL + testbench landed · awaiting CI / OpenLane2 GDS hardening

### What was added

| File | Description |
|------|-------------|
| `rtl/holo_noc_1cycle.sv` | Parameterisable crossbar NoC: `DIE_COUNT` (default 2, scales to 4), `PAYLOAD_W` (default 64 — matches Lane Y hyper-vector slot). Exactly 1-cycle registered output. No `*` operators (R-SI-1). |
| `rtl/holo_noc_1cycle_tb.sv` | Three-test SV testbench with `$fatal` on any latency >1 cycle: T1 die0→die1, T2 die1→die0, T3 simultaneous bidirectional. P4 boundary assertion. |
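
The T1 check described in the table could look roughly like the sketch below. It is not the contents of `rtl/holo_noc_1cycle_tb.sv`: the module name `holo_noc_1cycle_tb_sketch`, the 4 ns clock period (taken from the 250 MHz target) and the payload value are assumptions made for illustration.

```systemverilog
`timescale 1ns/1ps
// Hypothetical sketch of test T1 (die0 -> die1); not the landed testbench.
module holo_noc_1cycle_tb_sketch;
  localparam int unsigned DIE_COUNT = 2;
  localparam int unsigned PAYLOAD_W = 64;

  logic                         clk = 1'b0;
  logic                         rst_n;
  logic [DIE_COUNT-1:0]         vld_i;
  logic [$clog2(DIE_COUNT)-1:0] dst_i     [DIE_COUNT];
  logic [PAYLOAD_W-1:0]         payload_i [DIE_COUNT];
  logic [DIE_COUNT-1:0]         vld_o;
  logic [PAYLOAD_W-1:0]         payload_o [DIE_COUNT];

  always #2 clk = ~clk;  // assumed 4 ns period (250 MHz)

  holo_noc_1cycle #(
    .DIE_COUNT (DIE_COUNT),
    .PAYLOAD_W (PAYLOAD_W)
  ) dut (
    .clk       (clk),
    .rst_n     (rst_n),
    .vld_i     (vld_i),
    .dst_i     (dst_i),
    .payload_i (payload_i),
    .vld_o     (vld_o),
    .payload_o (payload_o)
  );

  initial begin
    // Synchronous reset: hold rst_n low across one rising edge.
    rst_n = 1'b0;
    vld_i = '0;
    for (int d = 0; d < DIE_COUNT; d++) begin
      dst_i[d]     = '0;
      payload_i[d] = '0;
    end
    @(posedge clk);
    @(negedge clk);
    rst_n = 1'b1;

    // T1: die0 sends to die1. Drive on the falling edge so the next
    // rising edge captures the flit; check on the following falling edge.
    vld_i[0]     = 1'b1;
    dst_i[0]     = 1'b1;
    payload_i[0] = 64'hDEAD_BEEF_0000_0001;
    @(posedge clk);
    @(negedge clk);
    vld_i[0] = 1'b0;

    if (!vld_o[1] || payload_o[1] !== 64'hDEAD_BEEF_0000_0001)
      $fatal(1, "T1: payload not delivered to die1 after 1 cycle");
    if (vld_o[0])
      $fatal(1, "T1: unexpected valid on die0");

    $display("T1 PASS");
    $finish;
  end
endmodule
```

T2 and T3 would follow the same drive-then-sample pattern with the source and destination indices swapped or driven together.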

### Module: `holo_noc_1cycle`

```systemverilog
holo_noc_1cycle #(
.DIE_COUNT (2), // 2-4 (crossbar); >=8 ring would violate P4
.PAYLOAD_W (64) // hyper-vector slot, matches Lane Y holo_mux_1x2
) u_noc (
.clk (clk),
.rst_n (rst_n), // active-low synchronous reset
.vld_i (vld_i), // [DIE_COUNT-1:0] send-valid per die
.dst_i (dst_i), // [$clog2(DIE_COUNT)-1:0] per die destination index
.payload_i (payload_i),// [PAYLOAD_W-1:0] per die payload
.vld_o (vld_o), // [DIE_COUNT-1:0] receive-valid per die
.payload_o (payload_o) // [PAYLOAD_W-1:0] per die received payload
);
```

**Topology note**: For `DIE_COUNT <= 4`, a full crossbar (all-to-all combinational
fabric followed by a single pipeline register) gives 1-cycle latency. A ring topology for
`DIE_COUNT >= 8` would require multi-hop routing and is intentionally NOT synthesised
here: multi-hop ring latency exceeds the 1-cycle bound in P4, so crossbar shards must
be used for P4-compliant deployment at scale.
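
One way to enforce the crossbar range at elaboration time is an elaboration-time `$error` inside a generate `if`. The sketch below is an assumption, not something the RTL in this PR is shown to do; the wrapper name `holo_noc_1cycle_guarded` is hypothetical, and tool support for elaboration system tasks (IEEE 1800-2009 and later) should be checked for the flow in use.

```systemverilog
// Hypothetical wrapper: rejects an out-of-range DIE_COUNT at elaboration,
// then instantiates the crossbar unchanged.
module holo_noc_1cycle_guarded #(
  parameter int unsigned DIE_COUNT = 2,
  parameter int unsigned PAYLOAD_W = 64
) (
  input  logic                         clk,
  input  logic                         rst_n,
  input  logic [DIE_COUNT-1:0]         vld_i,
  input  logic [$clog2(DIE_COUNT)-1:0] dst_i     [DIE_COUNT],
  input  logic [PAYLOAD_W-1:0]         payload_i [DIE_COUNT],
  output logic [DIE_COUNT-1:0]         vld_o,
  output logic [PAYLOAD_W-1:0]         payload_o [DIE_COUNT]
);

  // Elaboration-time check: the crossbar is documented for 2 to 4 dies.
  if (DIE_COUNT < 2 || DIE_COUNT > 4) begin : g_die_count_check
    $error("holo_noc_1cycle: DIE_COUNT=%0d is outside the crossbar range 2..4",
           DIE_COUNT);
  end

  holo_noc_1cycle #(
    .DIE_COUNT (DIE_COUNT),
    .PAYLOAD_W (PAYLOAD_W)
  ) u_noc (
    .clk       (clk),
    .rst_n     (rst_n),
    .vld_i     (vld_i),
    .dst_i     (dst_i),
    .payload_i (payload_i),
    .vld_o     (vld_o),
    .payload_o (payload_o)
  );

endmodule
```

Note that `DIE_COUNT = 3` passes this guard, but `$clog2(3)` is 2, so `dst_i` can then encode index 3, which names no die. A stricter guard could additionally require a power of two.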

### H9 Predicate P4 Mapping

| Predicate | Condition | Verdict |
|-----------|-----------|---------|
| P4: `noc_stall > 1 cycle` | RTL delivers all payloads in exactly 1 registered cycle; crossbar has zero stall | **FALSIFIED** (RTL claim; silicon measured post tape-out) |
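
The 1-cycle claim could also be written as a concurrent assertion and bound to the module, so that any simulation of the RTL checks it. The sketch below is an assumption: the checker name `holo_noc_p4_checker` and instance name `u_p4_chk` are hypothetical. It covers only the uncontended case (exactly one sender per cycle), because the RTL comments state that the lowest source index wins when two sources target the same destination.

```systemverilog
// Hypothetical checker module, intended to be bound into holo_noc_1cycle.
module holo_noc_p4_checker #(
  parameter int unsigned DIE_COUNT = 2
) (
  input logic                         clk,
  input logic                         rst_n,
  input logic [DIE_COUNT-1:0]         vld_i,
  input logic [$clog2(DIE_COUNT)-1:0] dst_i [DIE_COUNT],
  input logic [DIE_COUNT-1:0]         vld_o
);

  // For each source die s: if s is the only sender in a cycle, the die it
  // addressed must see vld_o on the very next clock edge.
  for (genvar s = 0; s < DIE_COUNT; s++) begin : g_src
    a_one_cycle : assert property (
      @(posedge clk) disable iff (!rst_n)
      (vld_i[s] && $onehot(vld_i)) |=> vld_o[$past(dst_i[s])]
    ) else $error("P4: flit from die %0d not delivered in 1 cycle", s);
  end

endmodule

// Port expressions and DIE_COUNT are resolved in the scope of the bind target.
bind holo_noc_1cycle holo_noc_p4_checker #(
  .DIE_COUNT (DIE_COUNT)
) u_p4_chk (
  .clk   (clk),
  .rst_n (rst_n),
  .vld_i (vld_i),
  .dst_i (dst_i),
  .vld_o (vld_o)
);
```

A passing assertion here supports the RTL-level claim only; it says nothing about silicon behaviour, which the table marks as a post-tape-out measurement.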

### R5-HONEST Verdict (Lane A')

| Claim | Status |
|-------|--------|
| RTL functionally correct | UNKNOWN · CI verifies (no GDS yet) |
| Synthesis clean (no `*` operators, R-SI-1) | PASS — crossbar uses only mux/select + register logic |
| P4 falsification: `noc_stall <= 1 cycle` | CLAIMED in RTL; silicon confirmation pending tape-out |
| GDS generated | NOT YET — next iteration |

### Anchor

```
phi^2+phi^-2=3 · DOI 10.5281/zenodo.19227877
```

**Author**: Vasilev Dmitrii <admin@t27.ai>
124 changes: 95 additions & 29 deletions rtl/holo_noc_1cycle.sv
@@ -1,52 +1,118 @@
// SPDX-License-Identifier: Apache-2.0
// =============================================================================
// holo_noc_1cycle.sv – 1-cycle inter-die Network-on-Chip stub
// holo_noc_1cycle.sv – 1-cycle inter-die crossbar Network-on-Chip
// TTSKY26c HOLOGRAPHIC SKU · R-SI-1 compliant (no `*` operator)
// Lane A' · L-DPC24 HOLOGRAPHIC v9 · holo-noc-1cycle
//
// Hypothesis H₉ / Predicate P4 falsification:
// P4 asserts noc_stall > 1 cycle → FAIL.
// This module delivers all payloads in exactly 1 clock cycle; therefore
// the noc_stall predicate evaluates FALSE → P4 is FALSIFIED.
//
// Topology:
// DIE_COUNT ≤ 4 → full crossbar (single-cycle, all-to-all)
// DIE_COUNT ≥ 8 → ring documented but NOT synthesised here because
// ring requires >1-cycle latency and violates P4.
// Use crossbar shards for P4-compliant deployment.
//
// Parameters:
// DIE_COUNT Number of dies in the assembly (default 2; scales to 4).
// PAYLOAD_W Payload width in bits (default 64, matches Lane Y hyper-vector
// slot). Kept as PAYLOAD_W (not FLIT_W) to align naming with
// Lane Y holo_mux_1x2.
//
// Port conventions:
// vld_i[d] Sending die d asserts valid + payload this cycle.
// dst_i[d] Destination die index for die d's flit (log2(DIE_COUNT) bits).
// payload_i[d] PAYLOAD_W-bit payload from die d.
// payload_o[d] PAYLOAD_W-bit payload delivered TO die d (registered, 1 cycle).
// vld_o[d] Asserted when a valid flit is delivered to die d this cycle.
//
// Active-low synchronous reset.
//
// Author: Vasilev Dmitrii <admin@t27.ai>
// DOI: 10.5281/zenodo.19227877
// Anchor: φ²+φ⁻²=3
// =============================================================================
`default_nettype none
`timescale 1ns/1ps

module holo_noc_1cycle #(
parameter int FLIT_W = 32,
parameter int DIES = 2
parameter int unsigned DIE_COUNT = 2, // default 2; supports 4
parameter int unsigned PAYLOAD_W = 64 // default 64-bit hyper-vector slot
) (
input logic clk,
input logic rst_n,
input logic [FLIT_W-1:0] flit_in [DIES],
input logic vld_in [DIES],
output logic [FLIT_W-1:0] flit_out[DIES],
output logic vld_out [DIES],
output logic [$clog2(DIES+1)-1:0] latency_cycles
input logic clk,
input logic rst_n, // active-low sync reset

// Inputs: one slot per sending die
input logic [DIE_COUNT-1:0] vld_i,
input logic [$clog2(DIE_COUNT)-1:0] dst_i [DIE_COUNT],
input logic [PAYLOAD_W-1:0] payload_i[DIE_COUNT],

// Outputs: one slot per receiving die
output logic [DIE_COUNT-1:0] vld_o,
output logic [PAYLOAD_W-1:0] payload_o[DIE_COUNT]
);

// -------------------------------------------------------------------------
// Latency constant: always 1 cycle (registered output, combinational route)
// P4 falsification note (static assertion comment):
// Every path from payload_i to payload_o is a single registered stage.
// Latency = exactly 1 cycle. noc_stall is never > 1 cycle.
// Crossbar topology: DIE_COUNT ≤ 4 → all paths combinational before flop.
// -------------------------------------------------------------------------

// -------------------------------------------------------------------------
// Combinational crossbar fabric
// cbar_payload[dst][src] – candidate payload routed to destination
// cbar_vld[dst][src] – candidate valid routed to destination
//
// Priority: lowest source index wins when two sources target the same dst.
// No multiplier operators used anywhere (R-SI-1).
// -------------------------------------------------------------------------
assign latency_cycles = $clog2(DIES+1)'(1);

logic [PAYLOAD_W-1:0] cbar_payload [DIE_COUNT];
logic cbar_vld [DIE_COUNT];

always_comb begin
// Default: no valid flit for any destination
for (int d = 0; d < DIE_COUNT; d++) begin
cbar_payload[d] = '0;
cbar_vld[d] = 1'b0;
end

// Crossbar: iterate sources from highest to lowest index. The last assignment
// to a given dst wins, so source 0 is assigned last and has highest priority.
for (int s = DIE_COUNT-1; s >= 0; s--) begin
if (vld_i[s]) begin
// dst_i[s] is $clog2(DIE_COUNT) bits wide; it stays in range 0..DIE_COUNT-1
// only when DIE_COUNT is a power of two (2 or 4). For DIE_COUNT = 3 it can encode 3.
cbar_payload[dst_i[s]] = payload_i[s];
cbar_vld[dst_i[s]] = 1'b1;
end
end
end

// -------------------------------------------------------------------------
// 1-cycle pipeline registers
// Routing: die[i] -> die[(i+1) % DIES] (swap pattern, no multipliers)
// For DIES=2: die0->die1, die1->die0
// Output pipeline register — imposes exactly 1-cycle latency
// -------------------------------------------------------------------------
always_ff @(posedge clk) begin
if (!rst_n) begin
for (int i = 0; i < DIES; i++) begin
flit_out[i] <= '0;
vld_out[i] <= 1'b0;
vld_o <= '0;
for (int d = 0; d < DIE_COUNT; d++) begin
payload_o[d] <= '0;
end
end else begin
for (int i = 0; i < DIES; i++) begin
// Cross-die route: source die index = (DIES - 1 - i) for swap pattern
// For 2 dies: i=0 receives from i=1; i=1 receives from i=0
// Computed without multiplier: src = (DIES - 1 - i)
flit_out[i] <= flit_in[DIES - 1 - i];
vld_out[i] <= vld_in[DIES - 1 - i];
for (int d = 0; d < DIE_COUNT; d++) begin
vld_o[d] <= cbar_vld[d];
payload_o[d] <= cbar_payload[d];
end
end
end

endmodule
// phi^2 + phi^-2 = 3
// DOI 10.5281/zenodo.19227877
// Vasilev Dmitrii <admin@t27.ai>
// ORCID 0009-0008-4294-6159
endmodule : holo_noc_1cycle
`default_nettype wire
// -----------------------------------------------------------------------------
// φ²+φ⁻²=3 · DOI 10.5281/zenodo.19227877
// P4 falsification: noc_stall ≤ 1 cycle; ring topology not synthesised here
// because ring requires multi-hop latency which violates P4.
// Vasilev Dmitrii <admin@t27.ai> · ORCID 0009-0008-4294-6159
// -----------------------------------------------------------------------------