DanielEScherzer
left a comment
Please have this target the master branch of PHP. Patch-specific branches like 8.5.4 are for release management, and older version branches like 8.5 are for bugfixes or security fixes; new features are added to the master branch.
|
Hi @php-generics 👋 May I ask two things:
|
Won't this break existing serialized strings in case of an upgrade? I'm thinking of existing serialized strings stored in a database, which an upgrade would break. Also, isn't there usually an RFC for this kind of thing? |
|
This requires an RFC and discussion on the internals mailing list. My initial impression is this was coded 100% by an AI agent, and perhaps the @php-generics account itself was created by an AI agent. All code in this PR should be highly scrutinized to ensure it's not introducing vulnerabilities. |
|
The only tests for generic functions which I see are simple identity functions. Can we also have tests which show proper behaviour for generics in return types (e.g. |
If it's any indication the first line of the summary contains an em-dash, there are 12 of them in the relatively short description and they are used throughout the code comments. I'd guess at 100% of it. |
Not only dashes: the whole structure of the PR description, the benchmark results, the comments in the code, the test structure. A lot of indicators of AI. I'm not against AI, but it's sad that the author didn't even invest time to learn how such changes should be performed in PHP - through an RFC. It's just one more prompt to learn it. |
It is 100% AI, and I would not trust this unless there is a human who rewrites it while diligently considering any bugs.
|
Implement reified generics with type parameter declarations on classes, interfaces, traits, and functions. Includes type constraint enforcement, variance checking, type inference, reflection API support, and comprehensive test suite.
Refcount zend_generic_args to eliminate per-object alloc/dealloc — new Box<int>() now adds a refcount instead of deep-copying, removing 4 allocator round-trips per object lifecycle. Inline resolved_masks into the args struct (single contiguous allocation). Fix crash when creating generic objects inside generic methods (new Box<T>() inside Factory<int> ::create()) by resolving type param refs from the enclosing context.
- Fix optimizer type inference: add MAY_BE_OBJECT for generic class types (ZEND_TYPE_IS_GENERIC_CLASS) in zend_convert_type(), fixing memory leaks and incorrect type inference when JIT compiles functions with generic class type parameters like Box<int>
- Fix JIT trace recording: handle ZEND_INIT_STATIC_METHOD_CALL generic args in trace entry and properly skip ZEND_GENERIC_ASSIGN_TYPE_ARG opcodes during tracing
- Fix JIT IR: handle generic_args in INIT_STATIC_METHOD_CALL compilation
- Update reflection tests for generic class entries (ZendTestGenericClass)
- Fix optimizer: handle ZEND_GENERIC_ASSIGN_TYPE_ARG in SSA/optimizer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
d311edd to
c1c2132
Thank you for the feedback. You're right about the commit history - I'll restructure it into logical, reviewable commits that build on each other. Regarding the RFC - fully understood. This is a working draft / proof of concept to explore the design space and demonstrate feasibility, not a merge-ready proposal. I'm still actively working on it and will do my best to incorporate all the feedback from this discussion. |
…review feedback
- Fix opcache SHM persistence for interface_bound_generic_args and trait_bound_generic_args HashTables (zend_persist.c, zend_persist_calc.c, zend_file_cache.c)
- Fix object type fast-path bug in zend_check_type that skipped instanceof
- Optimize generic type checking with inline mask checks to avoid zend_check_type_slow() calls on the hot path
- Optimize ZEND_NEW handler to use inlined i_zend_get_current_generic_args() and skip context resolution when not in a generic context
- Apply PR review feedback: convert ZEND_GENERIC_VARIANCE_* and ZEND_GENERIC_BOUND_* defines to C23_ENUM, remove underscore prefixes from struct names, use proper enum types for variance/bound fields, use typedefs instead of struct tags in globals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Well yes, there's just one object, whatever variables you assign that object to. That line of reasoning is possibly valid for arrays, but not objects.
The type safety model of PHP is fundamentally opt-in - the maximally wide type is the default. If I use a class which happens to use mixed right now (because the type is not flexible enough), and it migrates to generics, it is necessarily a break. If you don't have to specify generics manually, code which was supposed to work previously should just continue to work. Not specifying a type simply means "I don't need this explicitly typechecked here".
Types are also much less valuable in local contexts than in global contexts. E.g. the collection: I just want to create it here - and then maybe pass it to a library, which has constraints on the collection. I don't care about my specific generic type of collection - I care about the fact that the callee accepts my collection. PHP does not necessarily have to require precise types on function boundaries. I, for instance, hate it that every small helper function, in a very local context, requires naming the full type including generics in Rust.
You certainly can opt in to having generics explicitly specified when you actively need more complex type checking. You probably should be explicit in generics when injecting objects into the wider scope of your application or library. But that's a choice the programmer has to make. The goal of PHP is to have types as non-invasive as possible. Forcing the type on the constructor runs counter to that (minus the constructor-arg inference).
It doesn't. It only has to check against the min- and max-types. The min-types should only ever be widened and the max-types only ever be narrowed. Regardless of the current values assigned to properties using the specific generic parameters. You could think about explicit mechanisms to cast between types, but that's out of scope of my question here (which then also incur that checking cost).
Not quite. Static analysis can only shrink types, never widen them. However, unless you want every generic object a EDIT: More specifically: As long as the type is fully in the static analyzers domain, it can both shrink and widen it. But it cannot widen any explicitly specified type. PHPStan can then go and infer the fact for you that there are only ever But yeah, in fact, having progressive typing gives more freedom to static analysis. P.s.: For actual discussion, please avoid AI-generated answers (sure "format this table for me" is fine - but don't do that for the actual content of the discussion :-D) |
|
Actually thinking about it more - both models can coexist pretty naturally. The rule would be simple: if you specified the type (explicitly or via constructor inference), it's frozen. If you didn't, progressive mode kicks in.
// frozen - explicit type
$c = new Collection<int>();
$c->add(1); // ok
$c->add("text"); // TypeError
// frozen - inferred from constructor arg
$box = new Box(42); // T = int
$box->set("hello"); // TypeError
// progressive - no type given, no constructor inference possible
$c = new Collection();
// min = never, max = mixed
$c->add(1); // min widens to int
$c->add("text"); // min widens to int|string
foo($c); // foo(Collection<int|string|bool>) narrows max
$c->add([]); // TypeError - array outside max-type
Implementation-wise this fits into the current architecture. Right now when Would need a separate structure for the progressive state:
Type checking logic becomes: if generic_args is set, check against it (frozen, same as now). If progressive_args exists, check against max-type and widen min-type. If neither, first operation on a generic param, allocate progressive_args and set initial min-type. One open question - should a progressive type ever "freeze"? Like once max-type narrows to equal min-type, does it become effectively frozen from that point? Or does it stay progressive forever? That probably needs more thought and would be good to discuss in an RFC. But the core idea of "both models, determined by whether you specified the type" feels clean and backwards compatible with what's already there. Let me know if I'm understanding your proposal correctly or if I'm missing something. |
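The widen/narrow rules described above can be imitated in userland on current PHP. This is a minimal sketch, not the PR's implementation: `ProgressiveParam`, `accept()`, `narrowTo()` and `isFrozen()` are hypothetical names, types are compared by their `get_debug_type()` name only, and no subtyping is modelled.

```php
<?php
// Hypothetical userland model of one progressive type parameter.
// Types are represented by get_debug_type() names; subtyping is ignored.
final class ProgressiveParam
{
    /** @var array<string, true> min-type: union of types seen so far (empty = never) */
    private array $min = [];
    /** @var array<string, true>|null max-type: allowed union (null = mixed) */
    private ?array $max = null;

    // A value flows into the parameter, e.g. $c->add($value).
    public function accept(mixed $value): void
    {
        $type = get_debug_type($value);
        if ($this->max !== null && !isset($this->max[$type])) {
            throw new TypeError("$type is outside the max-type");
        }
        $this->min[$type] = true; // widen the min-type
    }

    // The object is passed where a parameter of the given union is required.
    public function narrowTo(string ...$types): void
    {
        $allowed = array_fill_keys($types, true);
        if (array_diff_key($this->min, $allowed) !== []) {
            throw new TypeError('min-type does not fit the required type');
        }
        // narrow the max-type
        $this->max = $this->max === null
            ? $allowed
            : array_intersect_key($this->max, $allowed);
    }

    // Effectively frozen once the max-type has narrowed down to the min-type.
    public function isFrozen(): bool
    {
        return $this->max !== null
            && array_diff_key($this->max, $this->min) === [];
    }
}

$t = new ProgressiveParam();
$t->accept(1);                          // min = int
$t->accept('text');                     // min = int|string
$t->narrowTo('int', 'string', 'bool');  // max = int|string|bool
try {
    $t->accept([]);
} catch (TypeError $e) {
    echo $e->getMessage(), "\n";        // array is outside the max-type
}
```

In this model the question of when a progressive type freezes reduces to a set comparison. Whether the engine could afford the equivalent check on every call is a separate question that the sketch does not address.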
|
Explicit types certainly freeze it. Whether constructor inference also should freeze it or just set a min-type, I'm not sure about. After all a constructor is also just a method call. Maybe
Yes, once min=max it's effectively frozen. As I said before:
And yep, everything else is spot-on. |
|
I've locked this PR for now, as it is very clear that any comments that are being given by the developers are being fed directly into a chat bot. That is extremely rude. If you're interested in progressing this, then go through our normal RFC procedure. This method of contributions is not productive. |
|
@php-generics Did you make sure that you fed it the article that Arnaud produced: https://thephp.foundation/blog/2024/08/19/state-of-generics-and-collections/ ? As was noted, there was also an experimental PR (which is what was evaluated in that article): arnaud-lb#4, so you could build on that. What I'm asking is whether this PR has any answer to the issues mentioned there. If there's no solution for them, then I think there's really not much point in discussing other details... |
|
It would also be great if you could fix the compile warning: Which breaks |
…ak, compile warning
- Fix segfault with opcache.protect_memory=1: add zend_accel_in_shm() checks before calling zend_shared_memdup_put_free() on generic type refs, class refs, wildcard bounds, generic args, params info, and bound generic args hash tables that may already be in read-only SHM when inheriting from previously-persisted interfaces.
- Fix 28-byte memory leak in static generic calls with opcache optimizer: when pass 4 (func_calls) + pass 16 (inline) combine to inline a Box<int>::hello() call, the INIT_STATIC_METHOD_CALL opcode is NOPed but the generic args literal at op1.constant+2 was never released. Release it before zend_delete_call_instructions NOPs the opcode.
- Fix -Werror compile failure: remove unused 'key' variable in zend_verify_generic_variance() by switching to ZEND_HASH_MAP_FOREACH_PTR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… cache
Progressive inference: when a generic class is instantiated without type args (e.g., `new Collection()`), the object enters progressive mode that tracks min-type (widened from never) and max-type (narrowed from mixed). Values widen the lower bound; passing to typed functions narrows the upper bound. When min equals max, the object auto-freezes to regular generic_args.
Generic args interning: deduplicate identical generic_args via a global HashTable keyed by type content hash, reducing memory for monomorphic usage patterns.
Inline cache for generic class type hints: cache CE lookup and last-seen generic_args in run_time_cache slots, turning O(n*m) compatibility checks into O(1) for the common monomorphic case.
Also fixes subtype compatibility so Box<int> is accepted where Box<int|string> is expected (subset check instead of exact equality).
92/92 generics tests, 5356/5356 Zend tests, 891/891 opcache tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
That's not correct - generic type parameters must not be covariant by default. That's what On the callee side (unless the type itself is |
|
The syntax used here is not possible in PHP. The following is completely valid PHP code today, and this PR (assuming it works) makes it an error.
<?php
const A = 1;
const B = 2;
const C = 3;
$a = [A<B, B>(C)];
If the author did not bother to verify that the syntax is even possible, I don't trust the logic to be sound. |
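To make the ambiguity concrete, here is a small sketch of how the expression quoted above is read by stock PHP 8, without this patch. The constant values are the ones from the comment; the `var_dump` check is added only for illustration.

```php
<?php
const A = 1;
const B = 2;
const C = 3;

// Current PHP reads this as an array holding two comparisons:
//   A < B    -> 1 < 2 -> true
//   B > (C)  -> 2 > 3 -> false
// A generics-aware parser could instead read it as a call A<B, B>(C).
$a = [A<B, B>(C)];

var_dump($a === [true, false]); // bool(true)
```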
|
@azjezz To be fair, Arnaud's PR also did not solve this properly, requiring
|
|
Also, the
For obvious reasons, this PR is not getting merged anytime soon. It needs an RFC and the code would effectively have to be completely rewritten to verify its correctness. However, if the PR can demonstrate that a workable solution for generics exists, that's useful in its own right, even if none of the code is actually used. When we see the PR doesn't go anywhere or that there are fundamental, unsolvable issues, this PR can always be closed. |
|
@bwoebi The solution to this exists, which is to use Turbo-fish ( rust style https://doc.rust-lang.org/reference/glossary.html?highlight=turbo#turbofish |
|
@azjezz This PR doesn't allow generically parametrized function calls ( |
|
We do need this, especially if we are talking about reified generics. Here is a use case:
function is_instance_of<T: object>(object $value): bool { return $value instanceof T; }
if (is_instance_of::<Foo>($value)) {
}
This already works in Hack: https://docs.hhvm.com/hack/reified-generics/reified-generics/ / https://docs.hhvm.com/hack/reified-generics/reified-generics/#type-testing-and-assertion-with-is-and-as-expressions (Hack broke this syntax when it was first released, so BC was not a concern) |
|
Yeah, you're right, it's useful for freestanding generics (and return type assertions, when T is return type only). |
|
Another reason is that some libraries offer a way to construct objects using functions (or static methods, which are just as ambiguous as function calls):
final class Collection<K, V> {
public static function create<A, B>(): static<A, B> {
return new static::<K, V>();
}
}
function create_collection<K, V>(): Collection<K, V> {
return Collection::create::<K, V>(); // <- this is ambiguous without turbofish
}
$collection = create_collection::<string, string>(); |
|
As far as I understand, the PR uses runtime information for inference during constructor or method calls. This will make typing unstable / unpredictable, leading to type errors at runtime (which a type system is meant to avoid). Additionally, it impairs static analyzers and humans reading the code, as they are unable to predict actual types. Consider the following example:
class A {}
class B extends A {}
function test(A $value) {
// Is this a Box<A>, Box<B>, or something else?
// We don't know, as $value is allowed to be any subtype of A, as per API
$box = new Box($value);
// This breaks in unpredictable ways:
doSomething($box);
}
function doSomething(Box<A> $box) {}
test(new B());
The type of Of course we can write To avoid these pitfalls, type inference needs to use static information only. In the above example, Regarding compound types: Checking that This doesn't matter as much in static analyzers, but doing this in the engine will be slow. It has been suggested that if we were to get generics in core, it shouldn't support unions/intersections. |
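The instability described above can be imitated on current PHP. This is a minimal sketch under stated assumptions: the plain `Box` class below is a hypothetical stand-in that records T as the runtime class of its constructor argument, and `doSomething()` emulates an invariant `Box<A>` parameter with a manual check. Neither is taken from the PR's code.

```php
<?php
class A {}
class B extends A {}

// Hypothetical stand-in for Box<T>: T is taken from the runtime class of the
// constructor argument, which is what runtime inference amounts to.
final class Box
{
    public string $t;

    public function __construct(public object $value)
    {
        $this->t = $value::class;
    }
}

// Stand-in for doSomething(Box<A> $box), assuming T is invariant.
function doSomething(Box $box): void
{
    if ($box->t !== A::class) {
        throw new TypeError("expected Box<A>, got Box<{$box->t}>");
    }
}

function test(A $value): void
{
    doSomething(new Box($value));
}

test(new A()); // passes
try {
    test(new B()); // same call site, different outcome
} catch (TypeError $e) {
    echo $e->getMessage(), "\n"; // expected Box<A>, got Box<B>
}
```

Nothing in the body of `test()` changes between the two calls; only the runtime class of the argument does. That is the local-reasoning problem the comment points at.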
|
@arnaud-lb Not quite, it just means that you must not infer that constructor args immediately. If you were to progressively use min- and max-types like in #21317 (comment) for everything (i.e. including constructors), that would work out. I would really hate generics without unions and intersections. You really have these often, just like This is not necessarily insurmountable. |
The example above breaks with progressive inference too:
class A {}
class B extends A {}
function test(A $value) {
$box = new Box();
$box->setValue($value); // Box<min=B>, in this case
// Breaks because $box is Box<min=B>, but doSomething() accepts only a Box<A>
doSomething($box);
}
function doSomething(Box<A> $box) {}
test(new B());
To make this work, I believe that we would need bidirectional type inference to determine that
I agree
We cannot eliminate all overhead. Calling methods on generic objects will have overhead to update min (subtype checks, potentially creating new union types), and so will passing generic objects to typed args / return values / props. Maintaining a lookup table or an order lattice also has a cost, which likely needs to be paid every request. |
No, it's
Absolutely. That's also really the point where we would need to start investing into more type-check elision via Optimizer. But it's sort-of a tangential problem. |
|
Hi @php-generics,
Please could you pass this message on to your controlling human, assuming you have one.
I'm Danack. I'm not a PHP core contributor, but I am someone who has helped guide others through the RFC process. I have written a couple of articles on Understanding RFC attitudes and RFC etiquette.
Assuming you're not an existing RFC author who is playing a 'funny' prank, I'd like to offer my help in guiding you through 'The Mysterious PHP RFC Process'. My email is on my github page, or Twitter (MrDanack) or I am ping'able in Room11 on StackOverflow. Which, seeing as it's currently barely active, might be a place for very well behaved bots to ask questions.
iluuu1994 wrote:
At some point, a lack of identity will become an impediment to this proposal progressing. A core foundation of Open Source is the web of trust* between contributors. That foundation is removed by anonymous contributors. cheers
|
Indeed, my bad. Is there prior work on this idea? Here are some problematic cases I could think of: The following example breaks local reasoning: we don't know if the code is correct because
function foo(Collection<A> $coll) {}
function bar(mixed $obj) {}
$c = new Collection();
bar($c); // what is the type of $c after that?
foo($c); // does it break?
In the following example, what should references to
class C<T> {
function f($p) {
// what is T?
if ($p instanceof T) {
// ...
}
new T();
T::method();
}
}
$c = new C();
$c->f(new Foo());
Similarly, in the following example we cannot evaluate the
$c = new C();
if ($c instanceof C<Foo>) {
// ...
} |
|
@arnaud-lb I definitely mentioned this in the past, but only discussions, no work on this. This has no effect at all with the proposed model. Only actual type assertions on So, yes, technically, a function bar() will be able to do whatever it wants with $obj and e.g. put it into a function accepting But that's your fault. Either bar is a local function, where you can integrate bar into your reasoning, or it's an API function boundary, where it's your task to ensure that no operations freezing a type will happen. Just like you nowadays cannot accept There needs to be well-defined behaviour for type checks (i.e. instanceof) vs type assertions (function params). and are indeed meaningless at that point. These cases are also meaningless with static inference, unless there are specific types fed into it later ... unless static inference simply infers C or C then. There are essentially three choices:
The safest thing to do is comparing against the min-type of the parameter here. (Meaning Note that These forcibly have to error out when not enough info is present. (as in, we inferred never and I also would recommend to add a keyword on generic params |
If the code just assumes that a
// This is invalid, and any static analyzer will error:
function bar(mixed $c) {
$c->add('something');
}
However, this code would be valid, for example:
function serialize(mixed $c) {
if ($c instanceof Collection) {
return $this->collectionSerializer->serialize($c);
} else if ...
}
Under progressive inference this will change the type of
In reified generics,
IMHO, none of these alternatives are safe. The expected result of
Good point
In the case of
This sounds reasonable. One caveat is that a class that was initially designed without |
Can you explain to me how it affects the type of $c? Collection is the class - that doesn't change. It will only change the type of $c once you actually call a method receiving a T not fitting the current generics yet (it's then added to the min-types). Or once you actually check against a particular generic, which
I would posit that min-type checking should be safe for most purposes as long as T isn't finalized (or required from the start) yet. As in: I think it's a reasonable default behaviour. When you need to do instanceof a fixed T, then you're required to provide it from the start. |

Add reified generics to PHP
Summary
This PR adds reified generics to the Zend Engine — generic type parameters that are preserved at runtime and enforced through the type system. Unlike type erasure (Java/TypeScript), generic type arguments are bound per-instance and checked at every type boundary.
Syntax
Generic classes
Multiple type params, constraints, defaults
Variance annotations
Wildcard types
Generic traits
Generic functions and closures
Nested generics
Static method calls with generics
instanceof with generics
Inheritance
Runtime enforcement
All type boundaries are checked at runtime:
- new Box<int>("x") throws TypeError
- $box->set("x") throws TypeError when T = int
- return "x" from a method declared (): T with T = int throws TypeError
- $box->value = "x" throws TypeError: Cannot assign string to property Box<int>::$value of type int
Ecosystem integration
- ReflectionClass::isGeneric(), ::getGenericParameters(), ReflectionObject::getGenericArguments(), ReflectionGenericParameter (name, constraint, default, variance)
- serialize(new Box<int>(42)) produces O:8:"Box<int>":1:{...}, unserialize() restores generic args
- var_dump shows object(Box<int>)#1, stack traces show Box<int>->method()
Edge cases covered
- new class extends Box<int> {})
- Collection<int> triggers autoload for Collection)
- class_alias inherits generic params
- new Box<T>() inside generic methods/factories resolves T from context)
- void/never as type args
Performance
Benchmarked at 1M, 10M, and 100M iterations on arm64, both master (PHP 8.5.3 NTS) and generics branch (PHP 8.5.4-dev NTS) built as release binaries.
Generic args use refcounted sharing —
new Box<int>() adds a refcount instead of deep-copying, eliminating 4 allocator round-trips per object lifecycle. Pre-computed scalar bitmasks are stored inline (no separate allocation).
Generic vs non-generic overhead (same binary)
Results stabilize at higher iteration counts as warmup and noise are amortized.
Absolute throughput (generics branch, JIT, ops/sec)
[Table: throughput rows for new GenericBox<int>(42), GenericBox<int>->set+get, GenericBox<int>->value = N, new GenericBox<GenericBox<int>>; values not preserved]
Non-generic regression check
Non-generic classes (PlainBox, UntypedBox) were benchmarked on both master and the generics branch under identical conditions (release NTS, same image). Results at 100M iterations (most stable) confirm zero performance regression on existing code paths:
The +8 bytes per object is the generic_args pointer field added to zend_object (NULL for non-generic objects). Throughput deltas are within cross-build variance (±5%); no systematic code path slowdown was observed.
Memory overhead
Generic objects with explicit type args have zero memory overhead vs non-generic typed objects — refcounted args are shared with the compiled literal. Inferred args (+20 bytes) allocate a new args struct. Nested generics benefit most from sharing (90 bytes vs 210 bytes before optimization).