Conversation
|
CI failing due to the release of snowballstemmer: https://pypi.org/project/snowballstemmer/#history |
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! Thanks for the fix. Added a few notes to the review for my future self and other reviewers :)
| with fo.create(overwrite=True) as fos: | ||
| with pq.ParquetWriter(fos, schema=arrow_table.schema, **parquet_writer_kwargs) as writer: | ||
| with pq.ParquetWriter( | ||
| fos, schema=arrow_table.schema, store_decimal_as_integer=True, **parquet_writer_kwargs |
There was a problem hiding this comment.
"""
By default, this is DISABLED and all decimal types annotate fixed_len_byte_array. When enabled, the writer will use the following physical types to store decimals:
- int32: for 1 <= precision <= 9.
- int64: for 10 <= precision <= 18.
- fixed_len_byte_array: for precision > 18.
"""
from https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html

There was a problem hiding this comment.
this matches the parquet data type mapping for decimal
https://iceberg.apache.org/spec/#parquet
Looks like arrow doesnt have decimal32 support yet |
|
@kevinjqliu Good call, looking at the code, it seems like it will automatically map it to INT32/INT64: https://github.com/apache/arrow/blob/598938711a8376cbfdceaf5c77ab0fd5057e6c02/cpp/src/parquet/arrow/schema.cc#L380-L392 |
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! Thanks for adding the comments.
I double checked that decimal128 also errors for precision >38
>>> pyarrow.decimal128(38+1, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/types.pxi", line 4718, in pyarrow.lib.decimal128
File "pyarrow/types.pxi", line 4768, in pyarrow.lib.decimal128
ValueError: precision should be between 1 and 38
# Rationale for this change Resolves apache#1979 # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. -->

Rationale for this change
Resolves #1979
Are these changes tested?
Are there any user-facing changes?