@fjtrujy (Member) commented Dec 16, 2025

Summary

Fix incorrect inline assembly register constraints in calculate_vertices and calculate_normals functions that cause crashes when Link-Time Optimization (LTO) is enabled.

Problem

The calculate_vertices and calculate_normals functions in ee/math3d/src/math3d.c use inline assembly loops that modify three of their operands: the output pointer, the count, and the vertices/normals pointer:

"addi		%0, 0x10	\n"   // output += 16
"addi		%2, 0x10	\n"   // vertices += 16
"addi		%1, -1		\n"   // count -= 1
"bne		$0, %1, 1b	\n"

However, these operands were declared as input-only ("r"):

: : "r" (output), "r" (count), "r" (vertices), "r" (local_screen)

This tells GCC that the register values are not modified by the assembly block, which is incorrect.
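
To make the effect of the loop concrete, here is a minimal plain-C model of the four instructions quoted above. It is only a sketch: loop_regs and run_loop_model are hypothetical names that do not appear in math3d.c, the actual per-element vector math is omitted, and the registers are modelled as plain integers. It illustrates that when the loop exits, the output and vertices registers have advanced by 16 bytes per element and the count register is zero, so none of them still hold the values that were passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the three registers the asm loop modifies. */
typedef struct {
    uintptr_t output;   /* operand %0 */
    int       count;    /* operand %1 */
    uintptr_t vertices; /* operand %2 */
} loop_regs;

/* Mirrors the quoted addi/addi/addi/bne sequence. The counter is
   decremented and tested after the body, so count must be at least 1. */
static loop_regs run_loop_model(loop_regs r)
{
    assert(r.count > 0);
    do {
        r.output   += 0x10; /* addi %0, 0x10 */
        r.vertices += 0x10; /* addi %2, 0x10 */
        r.count    -= 1;    /* addi %1, -1   */
    } while (r.count != 0); /* bne $0, %1, 1b */
    return r;
}
```

For example, starting from output = 0x1000, vertices = 0x2000 and count = 3, the model ends with output = 0x1030, vertices = 0x2030 and count = 0. With "r" constraints, GCC is entitled to assume all three registers still contain their starting values.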

Impact

Without LTO, this bug is usually hidden: the functions are compiled separately from their callers, so the compiler has little opportunity to reuse the modified register values after the asm block.

With LTO enabled, GCC optimizes across function boundaries and may:

  • Inline the function
  • Assume the register values are unchanged after the asm block
  • Reuse the (now corrupted) pointer values for subsequent operations

This causes TLB Miss errors and crashes when programs using math3d (like gsKit's cube and hires examples) are compiled with LTO.

Example crash:

TLB Miss, pc=0x102338 addr=0x2000000 [store]

Solution

Change the constraints from input-only ("r") to read-write ("+r") for operands that are modified:

// Before:
: : "r" (output), "r" (count), "r" (vertices), "r" (local_screen)

// After:
: "+r" (output), "+r" (count), "+r" (vertices) : "r" (local_screen)

The "+r" constraint correctly tells GCC that these operands are both read and written.
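
The constraint syntax can be tried on a development machine with a much smaller example. The sketch below is not code from math3d.c: advance16 is a hypothetical helper, and the x86-64 and AArch64 instruction strings are stand-ins chosen only so the snippet builds on a typical host, with a plain C fallback elsewhere. What carries over to the real fix is the operand list: the pointer is changed by the asm, so it is declared as a "+r" output rather than an "r" input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: advance a pointer by 16 bytes inside inline asm.
   "+r" tells the compiler that p is both read and written by the asm. */
static inline const uint8_t *advance16(const uint8_t *p)
{
#if defined(__x86_64__)
    __asm__ volatile("add $16, %0" : "+r"(p) : : "cc");
#elif defined(__aarch64__)
    __asm__ volatile("add %0, %0, #16" : "+r"(p));
#else
    p += 16; /* fallback for targets without an asm variant here */
#endif
    return p;
}
```

Because p is a by-value parameter, the caller's own pointer is untouched; the constraint only makes the compiler aware that the local copy changes, so it reads the updated value when returning it instead of assuming the original one is still in the register.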

Changes

ee/math3d/src/math3d.c

calculate_normals (line 519):

-   : : "r" (output), "r" (count), "r" (normals), "r" (local_light)
+   : "+r" (output), "+r" (count), "+r" (normals) : "r" (local_light)

calculate_vertices (line 649):

-   : : "r" (output), "r" (count), "r" (vertices), "r" (local_screen)
+   : "+r" (output), "+r" (count), "+r" (vertices) : "r" (local_screen)

Testing

Tested with:

  • GCC compiled with LTO support
  • gsKit examples (cube, hires) that use math3d functions
  • Full optimization flags: -O3 -flto -ftree-vectorize -ftree-slp-vectorize

Before fix: TLB Miss crashes
After fix: All examples run correctly

@uyjulian (Member)

Nice job

@uyjulian uyjulian merged commit 185cd58 into ps2dev:master Dec 17, 2025
3 checks passed