feat(python): add bfloat16 and bfloat16_array support #3329

Open
asadjan4611 wants to merge 24 commits into apache:main from asadjan4611:feat/python-bfloat16-support

@asadjan4611

Why?

This PR implements bfloat16 (Brain Float 16) and bfloat16_array support for Fory Python runtime and codegen, addressing issue #3289. This enables using bfloat16 in FDL to reduce payload size while keeping a wide exponent range, which is common in ML/AI workflows.

What does this PR do?

This PR adds comprehensive bfloat16 support to Fory Python:

Core Implementation

  • BFloat16 Type: Cython implementation of float32↔bfloat16 conversions using IEEE 754 round-to-nearest, ties-to-even rounding
  • BFloat16Array: Python-visible array type backed by array.array('H') for packed contiguous storage
  • Serializers: Both scalar (BFloat16Serializer) and array (BFloat16ArraySerializer) serializers
  • Type Registration: Registered with TypeId.BFLOAT16 (18) and TypeId.BFLOAT16_ARRAY (54)
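
The round-to-nearest, ties-to-even conversion named above can be illustrated in pure Python. This is a minimal sketch, not the PR's Cython code: the function names are hypothetical, and it reinterprets bits with `struct` instead of `memcpy`.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern nearest to value (hypothetical helper)."""
    # struct first rounds the Python float (a double) to float32; finite values
    # beyond the float32 range raise OverflowError here.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Set the top mantissa bit so truncation cannot turn a NaN into Inf.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Adding 0x7FFF plus the lowest kept bit rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(bits: int) -> float:
    """Widen a bfloat16 pattern to float32 exactly by appending 16 zero bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value
```

With these helpers, `float32_to_bfloat16_bits(1.0)` yields `0x3F80`, and the exact tie `1.00390625` (1 + 2^-8) rounds to the even pattern `0x3F80` rather than `0x3F81`.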

Integration Points

  • Buffer Operations: Added write_bfloat16() and read_bfloat16() methods
  • Codegen Support: Added bfloat16 to codegen type mapping
  • Row Format: Added bfloat16() factory function (temporarily maps to float16 until C++ row format supports it)
  • Type System: Fully integrated into Fory type resolver

Testing

  • 11 comprehensive test cases covering:
    • Basic operations and conversions
    • Special values (NaN, ±Inf, ±0)
    • Serialization round-trips
    • Array operations
    • Integration with dataclasses, lists, and maps
    • Type registration verification
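
A test for the special values listed above might look like the following sketch. It checks raw bit patterns in pure Python and does not use the PR's BFloat16 class; the helper name and the chosen patterns are illustrative assumptions.

```python
import math
import struct


def widen_bfloat16_bits(bits: int) -> float:
    """Hypothetical helper: treat 16 bfloat16 bits as the top half of a float32."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


def test_special_values() -> None:
    assert math.isnan(widen_bfloat16_bits(0x7FC0))    # quiet NaN
    assert widen_bfloat16_bits(0x7F80) == math.inf    # +Inf
    assert widen_bfloat16_bits(0xFF80) == -math.inf   # -Inf
    # +0 and -0 compare equal, so the sign is checked explicitly.
    assert math.copysign(1.0, widen_bfloat16_bits(0x0000)) == 1.0
    assert math.copysign(1.0, widen_bfloat16_bits(0x8000)) == -1.0
    # Smallest positive subnormal: exponent field 0, mantissa 1, i.e. 2**-133.
    assert widen_bfloat16_bits(0x0001) == 2.0 ** -133


test_special_values()
```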

Code Quality

  • Follows existing float16 implementation patterns
  • Matches project code standards and style
  • All files include proper Apache 2.0 license headers
  • No linter errors

Related issues

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
    • Yes: Adds BFloat16, BFloat16Array types and bfloat16() factory function
  • Does this PR introduce any binary protocol compatibility change?
    • No: Uses existing TypeId.BFLOAT16 (18) already defined in protocol spec

Implementation Details

Wire Format

  • Encodes bfloat16 as 2 bytes representing raw IEEE 754 bfloat16 bit pattern
  • Little endian byte order (matches existing float32/float64 behavior)
  • NaN/Inf/±0/subnormal values round-trip correctly at bit level
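
The two-byte little-endian layout described above can be sketched with the standard `struct` module. The function names here are hypothetical stand-ins and are not the PR's `write_bfloat16()` / `read_bfloat16()` Buffer methods.

```python
import struct


def write_bfloat16_bits(buf: bytearray, bits: int) -> None:
    """Append a raw bfloat16 bit pattern as 2 little-endian bytes."""
    buf.extend(struct.pack("<H", bits & 0xFFFF))


def read_bfloat16_bits(data: bytes, offset: int = 0) -> int:
    """Read 2 little-endian bytes at offset and return the raw bit pattern."""
    (bits,) = struct.unpack_from("<H", data, offset)
    return bits


buf = bytearray()
write_bfloat16_bits(buf, 0x3F80)  # bit pattern of 1.0
write_bfloat16_bits(buf, 0x7FC0)  # bit pattern of a quiet NaN
```

Because only the raw bits are written, a NaN payload or a signed zero comes back unchanged: `read_bfloat16_bits(buf, 2)` returns `0x7FC0` exactly.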

Type System

  • Type ID: 18 (BFLOAT16) - already defined in xlang serialization spec
  • Array Type ID: 54 (BFLOAT16_ARRAY)
  • Protocol compliant with existing xlang serialization format

Performance

  • Uses Cython for performance-critical conversion operations
  • Zero-copy array operations using array.array('H')
  • Follows same optimization patterns as existing float16 implementation
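
As a rough illustration of packed storage in `array.array('H')`, the sketch below exposes the underlying bytes through a `memoryview` without copying and produces a little-endian payload. The function names are hypothetical, and the PR's BFloat16ArraySerializer may handle byte order differently.

```python
import sys
from array import array


def pack_bfloat16_bits(bits: array) -> bytes:
    """Return the little-endian byte payload for an array('H') of bit patterns."""
    if bits.typecode != "H" or bits.itemsize != 2:
        raise ValueError("expected array('H') with 2-byte items")
    if sys.byteorder == "little":
        return bits.tobytes()
    swapped = array("H", bits)  # copy so the caller's array is not modified
    swapped.byteswap()
    return swapped.tobytes()


def unpack_bfloat16_bits(payload: bytes) -> array:
    """Rebuild an array('H') of bit patterns from a little-endian payload."""
    out = array("H")
    out.frombytes(payload)
    if sys.byteorder != "little":
        out.byteswap()
    return out


values = array("H", [0x3F80, 0x4049, 0x7FC0])
view = memoryview(values)  # zero-copy view over the packed storage
payload = pack_bfloat16_bits(values)
```

The `memoryview` shares memory with the array, while `tobytes()` makes one copy of the packed data for the payload.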

- Add BFloat16 Cython type with IEEE 754 compliant conversions
- Add BFloat16Array class backed by array.array('H')
- Implement serializers for scalar and array types
- Register types in type resolver (TypeId.BFLOAT16 = 18, TypeId.BFLOAT16_ARRAY = 54)
- Add buffer read/write methods for bfloat16
- Add codegen support for bfloat16
- Add row format support (with temporary float16 mapping until C++ support)
- Add comprehensive test suite with 11 test cases covering all edge cases
- Follow existing float16 implementation patterns

Fixes apache#3289

- Change single quotes to double quotes (ruff format requirement)
- Remove trailing whitespace
- Add blank lines after imports (PEP 8)
- Remove unused import (pyfory)
- Fix closing parenthesis alignment
- Remove invalid Cython type casts (<BFloat16>) in serialization.pyx and primitive.pxi
- Use isinstance() check instead of type casting for Python classes
- Fix bfloat16() function to use float16() as temporary workaround until C++ support is added
- Comment out bfloat16() declaration in libformat.pxd with TODO for future C++ implementation

Replace unsafe pointer casts with memcpy to ensure cross-platform
compatibility across all OS versions (Windows, Linux, macOS) and
architectures (x86_64, ARM). This fixes strict aliasing violations
that cause compilation failures on ARM and newer compilers.

Replace sizeof(float) with explicit constant 4 in memcpy calls to ensure
cross-platform compatibility, especially on ARM architectures where
sizeof() may cause compilation issues. This matches the project's
pattern of using explicit size constants (as seen in types.py).

Fixes build failures on:
- ubuntu-24.04-arm (aarch64)
- macos-arm64 (Apple Silicon)
- ubuntu-24.04-arm with Python 3.13
@asadjan4611
Author

@chaokunyang please review my PR. This is a very interesting project, and I learned a lot from working on this issue.

@asadjan4611
Author

@komamitsu @chaokunyang
Please review my PR. I would like to work on another issue, but I am still waiting for a reviewer to check this one.

)
register(float, type_id=TypeId.FLOAT64, serializer=Float64Serializer)
# BFloat16 is optional if the extension module is unavailable.
try:
Collaborator

This should always be available. Could you remove the try/except clause?

serializer=PyArraySerializer(self.fory, ftype, typeid),
)
# BFloat16Array is optional if the extension module is unavailable.
try:
Collaborator

ditto

cpdef inline read_nullable_bfloat16(Buffer buffer):
    if buffer.read_int8() == NOT_NULL_VALUE_FLAG:
        from pyfory.bfloat16 import BFloat16
        return BFloat16.from_bits(buffer.read_bfloat16())
Collaborator

Could you create a bfloat.pxd so that we can import it in buffer.pyx and make buffer.read_bfloat16() return BFloat16 directly?

Collaborator

Also, could you rename BFloat16 to bfloat16? This is a primitive type, so a lowercase name makes it look like a builtin.

return False


cdef class XlangCompatibleSerializer(Serializer):
Collaborator

Could you merge the main branch? We removed the xwrite/xread API and unified the API in #3348.

Collaborator

This XlangCompatibleSerializer is not needed anymore

self.type_id = type_id
self.itemsize = 2

def xwrite(self, buffer, value):
Collaborator

Ditto for xwrite/xread; we don't have that API anymore.

return False


class XlangCompatibleSerializer(Serializer):
Collaborator

Please remove this; we removed it in #3348.

…llection.pxi

Cython cpdef functions do not support keyword arguments when called
from C code. Changed all read_no_ref(buffer, serializer=...) calls
to use positional arguments read_no_ref(buffer, serializer) instead.
…ruct.py and collection.py

Cython cpdef functions do not support keyword arguments when called
from C code. Changed all xwrite_ref, xread_ref, write_no_ref, and
read_no_ref calls to use positional arguments instead of keyword
arguments (serializer=...).

Development

Successfully merging this pull request may close these issues.

[Python] add bfloat16 and bfloat16_array (Cython, no numpy)

2 participants