feat(python): add bfloat16 and bfloat16_array support #3329
asadjan4611 wants to merge 24 commits into apache:main from
- Add BFloat16 Cython type with IEEE 754 compliant conversions
- Add BFloat16Array class backed by array.array('H')
- Implement serializers for scalar and array types
- Register types in type resolver (TypeId.BFLOAT16 = 18, TypeId.BFLOAT16_ARRAY = 54)
- Add buffer read/write methods for bfloat16
- Add codegen support for bfloat16
- Add row format support (with temporary float16 mapping until C++ support)
- Add comprehensive test suite with 11 test cases covering all edge cases
- Follow existing float16 implementation patterns
Fixes apache#3289
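The "IEEE 754 compliant conversions" bullet can be illustrated in pure Python. This is a minimal sketch, not pyfory's implementation: the function names are hypothetical, and it assumes the common scheme of narrowing to binary32 first, then keeping the upper 16 bits with round-to-nearest, ties-to-even, and mapping NaN to a quiet NaN.

```python
import math
import struct


def float_to_bfloat16_bits(value: float) -> int:
    """Hypothetical helper: convert a float to 16 raw bfloat16 bits.

    The value is first narrowed to IEEE 754 binary32 (like a C float cast),
    then the upper 16 bits are kept using round-to-nearest, ties-to-even.
    """
    if math.isnan(value):
        # Return a quiet NaN with the original sign; rounding the raw bits
        # of a NaN whose payload sits in the low 16 bits could yield infinity.
        return 0xFFC0 if math.copysign(1.0, value) < 0 else 0x7FC0
    try:
        packed = struct.pack("<f", value)
    except OverflowError:
        # Finite but outside binary32 range, hence above bfloat16's overflow
        # threshold: the correctly rounded result is a signed infinity.
        return 0xFF80 if value < 0 else 0x7F80
    (bits,) = struct.unpack("<I", packed)
    # Ties-to-even: add 0x7FFF plus the lowest bit that will be kept, so an
    # exact halfway case rounds up only when the kept part is odd.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def bfloat16_bits_to_float(bits: int) -> float:
    """Widen bfloat16 bits to a float; exact, since the low 16 bits are zero."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]
```

Edge cases such as signed zero, infinities, NaN and exact ties are exactly where a bit-level approach like this needs tests.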
- Change single quotes to double quotes (ruff format requirement)
- Remove trailing whitespace
- Add blank lines after imports (PEP 8)
- Remove unused import (pyfory)
- Fix closing parenthesis alignment
- Remove invalid Cython type casts (<BFloat16>) in serialization.pyx and primitive.pxi
- Use an isinstance() check instead of type casting for Python classes
- Make bfloat16() use float16() as a temporary workaround until C++ support is added
- Comment out the bfloat16() declaration in libformat.pxd, with a TODO for the future C++ implementation
Replace unsafe pointer casts with memcpy for portability across operating systems (Windows, Linux, macOS) and architectures (x86_64, ARM). This removes strict-aliasing violations, which caused compilation failures on ARM and with newer compilers.
Replace sizeof(float) with the explicit constant 4 in memcpy calls to ensure cross-platform compatibility, especially on ARM architectures where sizeof() may cause compilation issues. This matches the project's pattern of using explicit size constants (as seen in types.py). Fixes build failures on:
- ubuntu-24.04-arm (aarch64)
- macos-arm64 (Apple Silicon)
- ubuntu-24.04-arm with Python 3.13
@chaokunyang please review my PR. This is a very interesting project, and I've learned a lot from this issue.
@komamitsu @chaokunyang
python/pyfory/registry.py (outdated)
)
register(float, type_id=TypeId.FLOAT64, serializer=Float64Serializer)
# BFloat16 is optional if the extension module is unavailable.
try:
This should always be available. Could you remove the try/except clause?
python/pyfory/registry.py (outdated)
serializer=PyArraySerializer(self.fory, ftype, typeid),
)
# BFloat16Array is optional if the extension module is unavailable.
try:
python/pyfory/serialization.pyx (outdated)
cpdef inline read_nullable_bfloat16(Buffer buffer):
    if buffer.read_int8() == NOT_NULL_VALUE_FLAG:
        from pyfory.bfloat16 import BFloat16
        return BFloat16.from_bits(buffer.read_bfloat16())
Could you create a bfloat.pxd so that we can import it in buffer.pyx and make buffer.read_bfloat16() return BFloat16 directly?
And could you rename BFloat16 to bfloat16? This is a primitive type, so a lowercase name makes it look like a builtin.
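The read_nullable_bfloat16 snippet in this thread uses a flag-then-value layout: one flag byte, followed by the value only when it is not null. A minimal sketch of both directions in plain Python, assuming the flag values below (placeholders for illustration; pyfory's own constants are authoritative) and a 2-byte little-endian payload. The function names are hypothetical.

```python
import struct

# Assumed flag values for illustration only; use pyfory's constants in real code.
NULL_FLAG = -3
NOT_NULL_VALUE_FLAG = -1


def write_nullable_bits(out: bytearray, bits) -> None:
    """Write a 1-byte flag, then the 16 raw bits only when a value is present."""
    if bits is None:
        out.extend(struct.pack("<b", NULL_FLAG))
    else:
        out.extend(struct.pack("<bH", NOT_NULL_VALUE_FLAG, bits & 0xFFFF))


def read_nullable_bits(data: bytes, offset: int):
    """Read the flag, then the value if present; return (bits or None, new_offset)."""
    (flag,) = struct.unpack_from("<b", data, offset)
    offset += 1
    if flag != NOT_NULL_VALUE_FLAG:
        return None, offset
    (bits,) = struct.unpack_from("<H", data, offset)
    return bits, offset + 2
```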
python/pyfory/serialization.pyx (outdated)
return False


cdef class XlangCompatibleSerializer(Serializer):
Could you merge the main branch? We've removed the xwrite/xread API and unified the API in #3348.
This XlangCompatibleSerializer is not needed anymore
python/pyfory/serializer.py (outdated)
self.type_id = type_id
self.itemsize = 2

def xwrite(self, buffer, value):
Ditto for xwrite/xread; we don't have that API anymore.
python/pyfory/_serializer.py (outdated)
return False


class XlangCompatibleSerializer(Serializer):
…llection.pxi
Cython cpdef functions do not support keyword arguments when called from C code. Changed all read_no_ref(buffer, serializer=...) calls to use positional arguments, read_no_ref(buffer, serializer), instead.
…ruct.py and collection.py
Cython cpdef functions do not support keyword arguments when called from C code. Changed all xwrite_ref, xread_ref, write_no_ref, and read_no_ref calls to use positional arguments instead of keyword arguments (serializer=...).
Why?
This PR implements bfloat16 (Brain Float 16) and bfloat16_array support for the Fory Python runtime and codegen, addressing issue #3289. This lets FDL schemas use bfloat16 to cut payload size (2 bytes per value, half of float32) while keeping float32's 8-bit exponent and therefore its dynamic range, a trade-off that is common in ML/AI workflows.
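The size-versus-range trade-off can be checked with the standard library alone. struct's 'e' format is IEEE binary16 (float16), while keeping the top half of a binary32 bit pattern yields a bfloat16 value. Truncation is used here for brevity instead of proper rounding, and the helper names are hypothetical.

```python
import struct


def bfloat16_truncate(value: float) -> float:
    """Hypothetical helper: keep the top 16 bits of the binary32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def float16_round_trip(value: float):
    """Round-trip through binary16; returns None if the value overflows it."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return None


large = 1.0e30
print(float16_round_trip(large))  # None: float16 cannot exceed 65504
print(bfloat16_truncate(large))   # finite, within 2**-7 (about 0.8%) of 1e30
print(len(struct.pack("<e", 1.5)), len(struct.pack("<f", 1.5)))  # 2 4
```

Both 16-bit formats take 2 bytes, but only bfloat16 keeps the exponent range of float32, at the cost of mantissa precision (7 stored bits versus float16's 10).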
What does this PR do?
This PR adds comprehensive bfloat16 support to Fory Python:
Core Implementation
- BFloat16Array backed by array.array('H') for packed contiguous storage
- Scalar (BFloat16Serializer) and array (BFloat16ArraySerializer) serializers
Integration Points
- Buffer write_bfloat16() and read_bfloat16() methods
- Row format bfloat16() factory function (temporarily maps to float16 until the C++ row format supports it)
Testing
Code Quality
- Follows existing float16 implementation patterns
Related issues
Does this PR introduce any user-facing change?
- BFloat16 and BFloat16Array types and a bfloat16() factory function
Implementation Details
Wire Format
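A bfloat16 scalar needs only its 16 raw bits on the wire. A minimal sketch of write_bfloat16()/read_bfloat16()-style helpers, assuming little-endian byte order and raw bits (rather than a float) as the interface; the names and signatures are hypothetical, not pyfory's Buffer API.

```python
import struct


def write_bfloat16_bits(out: bytearray, bits: int) -> None:
    """Append one bfloat16 as 2 bytes, little-endian (assumed byte order)."""
    out.extend(struct.pack("<H", bits & 0xFFFF))


def read_bfloat16_bits(data: bytes, offset: int):
    """Read 2 little-endian bytes at offset; return (bits, new_offset)."""
    (bits,) = struct.unpack_from("<H", data, offset)
    return bits, offset + 2


buf = bytearray()
write_bfloat16_bits(buf, 0x3F80)  # 1.0 in bfloat16
print(buf.hex())  # 803f
```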
Type System
Performance
array.array('H')
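The performance note rests on array.array('H') holding the raw 16-bit patterns contiguously, 2 bytes each, so a whole array can be serialized with one bulk copy. A minimal sketch, assuming little-endian wire order; BFloat16ArraySketch is a hypothetical stand-in, not the PR's BFloat16Array.

```python
import sys
from array import array


class BFloat16ArraySketch:
    """Hypothetical stand-in for BFloat16Array: raw bfloat16 bits in array('H')."""

    def __init__(self, bits=()):
        # 'H' is an unsigned 16-bit code on mainstream platforms: 2 bytes per item.
        self._data = array("H", bits)

    def __len__(self):
        return len(self._data)

    def bits(self):
        return list(self._data)

    def to_bytes(self) -> bytes:
        """Serialize every element at once, little-endian (assumed wire order)."""
        if sys.byteorder == "little":
            return self._data.tobytes()
        swapped = self._data[:]  # copy so the original stays in native order
        swapped.byteswap()
        return swapped.tobytes()

    @classmethod
    def from_bytes(cls, payload: bytes) -> "BFloat16ArraySketch":
        """Rebuild from little-endian bytes with one bulk copy."""
        result = cls()
        result._data.frombytes(payload)
        if sys.byteorder != "little":
            result._data.byteswap()
        return result
```

Storing raw bits instead of Python float objects is what makes the bulk tobytes()/frombytes() path possible; per-element conversion only happens when a caller actually reads a value as a float.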