fix: encode non ascii characters #62
base: main
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
genlayer_py/contracts/actions.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
genlayer_py/contracts/actions.py (1)
genlayer_py/provider/provider.py (1)
make_request(19-46)
🪛 Ruff (0.14.2)
genlayer_py/contracts/actions.py
47-47: Do not use bare `except` (E722)
🔇 Additional comments (1)
genlayer_py/contracts/actions.py (1)
45-52: Replace bare `except:` with specific exception handling and verify backend RPC requirements.

The bare `except:` clause (line 47) silently catches every exception, which Python code should avoid. Without an exception type, the handler hides the error and also discards the information about what went wrong.

Since `eth_utils.encode_hex()` raises `TypeError` for unsupported input types, catch that specifically:

    try:
        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_code)
    except TypeError:
        encoded_contract_code = contract_code

Additionally, verify that the RPC method `gen_getContractSchemaForCode` actually accepts raw (non-encoded) contract code as a fallback. The standard Ethereum JSON-RPC pattern typically expects hex-encoded data. If raw code is never valid, remove the fallback and let the encoding error propagate.
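The difference between a bare `except:` and a typed handler can be checked with a short, self-contained sketch. The helper names `swallow_everything`, `swallow_type_error`, and `request_exit` are hypothetical and are not part of genlayer_py; they only illustrate the behavior Ruff's E722 warns about.

```python
import sys


def swallow_everything(action):
    """Run action inside a bare except, as in the code under review."""
    try:
        action()
    except:  # noqa: E722 - deliberately bare to show the problem
        return "swallowed"
    return "ok"


def swallow_type_error(action):
    """Run action but handle only TypeError; everything else propagates."""
    try:
        action()
    except TypeError:
        return "swallowed"
    return "ok"


def request_exit():
    # sys.exit raises SystemExit, which derives from BaseException,
    # not from Exception.
    sys.exit(1)


# The bare except hides the exit request entirely.
bare_result = swallow_everything(request_exit)

# The typed handler lets SystemExit reach the caller.
try:
    swallow_type_error(request_exit)
    typed_result = "swallowed"
except SystemExit:
    typed_result = "propagated"
```

With the bare handler the interpreter never sees the exit request, while the typed handler leaves it untouched, which is usually the behavior a caller expects.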
    try:
        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_code)
    except:
        encoded_contract_code = contract_code
Replace bare except with specific exception types and fix the fallback logic.
The current implementation has critical issues:
- Bare `except` is dangerous: it catches all exceptions, including `SystemExit` and `KeyboardInterrupt`, which can mask serious problems and make the application harder to debug.
- The fallback doesn't solve encoding problems: if `encode_hex` fails (e.g., because `contract_code` is a string instead of bytes), simply passing the original input won't fix the encoding issue. This just defers the error to the RPC layer or causes unexpected behavior.
- Missing type validation: `eth_utils.hexadecimal.encode_hex` expects `bytes`. If `contract_code` is a `str` containing non-ASCII characters, it will raise a `TypeError`, and the fallback won't properly encode those characters.
Apply this diff to properly handle both string and bytes inputs with specific exception handling:
    -    try:
    -        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_code)
    -    except:
    -        encoded_contract_code = contract_code
    +    try:
    +        if isinstance(contract_code, str):
    +            # Encode string to bytes using UTF-8 to handle non-ASCII characters
    +            contract_bytes = contract_code.encode('utf-8')
    +        else:
    +            contract_bytes = contract_code
    +        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_bytes)
    +    except (TypeError, AttributeError, UnicodeEncodeError) as e:
    +        raise GenLayerError(f"Failed to encode contract code: {str(e)}") from e

Alternatively, if the backend accepts both hex-encoded and non-encoded inputs, document this behavior and handle it explicitly:
    -    try:
    -        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_code)
    -    except:
    -        encoded_contract_code = contract_code
    +    # Handle both string and bytes inputs
    +    if isinstance(contract_code, str):
    +        try:
    +            # Try to encode string to bytes with UTF-8 for non-ASCII support
    +            contract_bytes = contract_code.encode('utf-8')
    +            encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_bytes)
    +        except (UnicodeEncodeError, TypeError) as e:
    +            raise GenLayerError(f"Failed to encode contract code string: {str(e)}") from e
    +    elif isinstance(contract_code, bytes):
    +        try:
    +            encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_code)
    +        except TypeError as e:
    +            raise GenLayerError(f"Failed to hex-encode contract code bytes: {str(e)}") from e
    +    else:
    +        raise GenLayerError(f"Invalid contract_code type: expected str or bytes, got {type(contract_code)}")

Committable suggestion skipped: line range outside the PR's diff.
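The explicit str/bytes handling proposed above can be tried without installing eth_utils. The following is a minimal sketch under stated assumptions: `ContractCodeError` is a hypothetical stand-in for `GenLayerError`, `to_hex` is a hypothetical helper name, and `"0x" + bytes.hex()` is assumed to match the 0x-prefixed hex string the backend expects.

```python
class ContractCodeError(Exception):
    """Hypothetical stand-in for GenLayerError."""


def to_hex(contract_code):
    """Return a 0x-prefixed hex string for str or bytes contract code."""
    if isinstance(contract_code, str):
        try:
            # UTF-8 can represent any valid Unicode text, including non-ASCII
            contract_bytes = contract_code.encode("utf-8")
        except UnicodeEncodeError as e:
            # Raised only for malformed text such as lone surrogates
            raise ContractCodeError(
                f"Failed to encode contract code string: {e}"
            ) from e
    elif isinstance(contract_code, (bytes, bytearray)):
        contract_bytes = bytes(contract_code)
    else:
        raise ContractCodeError(
            "Invalid contract_code type: expected str or bytes, "
            f"got {type(contract_code).__name__}"
        )
    # Assumption: the RPC layer wants a 0x-prefixed lowercase hex string
    return "0x" + contract_bytes.hex()
```

For instance, `to_hex("é")` yields `"0xc3a9"` and `to_hex(b"\x01\x02")` yields `"0x0102"`, while an integer argument raises `ContractCodeError` instead of being passed through silently.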
🤖 Prompt for AI Agents
In genlayer_py/contracts/actions.py around lines 45-48, replace the bare except
and weak fallback with explicit type checks and targeted exception handling:
ensure contract_code is validated and normalized to bytes (if it's a str, encode
it with UTF-8 or detect a hex string starting with "0x" and treat it
accordingly), call eth_utils.hexadecimal.encode_hex only on bytes, catch and
handle specific exceptions (TypeError, ValueError) and surface a clear error (or
raise) if encoding fails instead of silently returning the original input; if
the backend legitimately accepts already-hex strings, explicitly detect that
case and pass-through the validated hex string, otherwise raise a descriptive
exception explaining the invalid type/encoding.
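The pass-through case described in the prompt above (an input that is already a "0x" hex string) could be detected roughly as follows. This is only a sketch: `looks_like_0x_hex` and `normalize_contract_code` are hypothetical names, and whether the backend really accepts pre-encoded hex is an open question in this thread.

```python
import string

_HEX_DIGITS = frozenset(string.hexdigits)


def looks_like_0x_hex(value):
    """True for a str with a 0x prefix and an even number of hex digits."""
    if not isinstance(value, str) or not value.startswith(("0x", "0X")):
        return False
    digits = value[2:]
    return len(digits) % 2 == 0 and all(ch in _HEX_DIGITS for ch in digits)


def normalize_contract_code(value):
    """Pass validated hex through; hex-encode everything else."""
    if looks_like_0x_hex(value):
        return value  # already hex: do not encode a second time
    if isinstance(value, str):
        return "0x" + value.encode("utf-8").hex()
    if isinstance(value, bytes):
        return "0x" + value.hex()
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")
```

One caveat of this design is ambiguity: source text that happens to look like a hex string is passed through rather than encoded, so a caller-supplied flag may be a safer way to signal pre-encoded input.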
@CodeRabbit `eth_utils.hexadecimal.encode_hex` expects a string and returns a string, so `encoded_contract_code` is always a string.
🧩 Analysis chain
🌐 Web query:
eth_utils encode_hex function signature parameters
💡 Result:
Signature: encode_hex(value) -> str
Parameters:
- value: bytes-like (a bytes or bytearray) or str containing raw bytes — the binary data to encode as hex.
Returns:
- str: hex string with "0x" prefix (e.g. '0x010203').
Sources: [1], [2].
@danielrc888, thank you for the clarification! You're right that encode_hex returns a string. However, there's an important nuance about its input:
According to the documentation, eth_utils.hexadecimal.encode_hex accepts bytes-like objects (bytes/bytearray) or str containing raw bytes as input, not Unicode strings. In Python 3, if contract_code is a regular string (Unicode) containing non-ASCII characters (which seems to be what this PR is fixing based on "DXP-707-encoding-error-for-non-ascii-characters"), calling encode_hex directly may fail or produce unexpected results.
Key questions:
- What type is `contract_code` typically in your use cases: is it a `str` (Unicode string) or `bytes`?
- When the function fails (which triggers the except block), what is the actual exception being raised?
The core issue remains: The bare except masks the real problem. If encode_hex fails with non-ASCII characters, simply returning the original contract_code doesn't solve the encoding issue - it just defers the error.
Suggested approach for handling non-ASCII characters:
    # If contract_code is a Unicode string, encode it to bytes first
    if isinstance(contract_code, str):
        contract_bytes = contract_code.encode('utf-8')
        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_bytes)
    elif isinstance(contract_code, bytes):
        encoded_contract_code = eth_utils.hexadecimal.encode_hex(contract_code)
    else:
        raise GenLayerError(f"Invalid contract_code type: {type(contract_code)}")

Could you share what the actual error was that prompted this fix? That would help us determine the best solution.
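One way to answer the question about the actual exception is to reproduce the encoding step in isolation. The sketch below uses only the standard library and assumes, without confirmation from this thread, that the failure comes from an ASCII-only encoding step applied to a `str`; the sample `source` value is made up for illustration.

```python
source = "name = 'café'"  # hypothetical contract snippet with a non-ASCII character

# An ASCII-only encoding step rejects the text outright.
try:
    source.encode("ascii")
    ascii_ok = True
    error_type = None
except UnicodeEncodeError as e:
    ascii_ok = False
    error_type = type(e)

# UTF-8 handles the same text, and the hex form round-trips losslessly.
utf8_hex = "0x" + source.encode("utf-8").hex()
round_trip = bytes.fromhex(utf8_hex[2:]).decode("utf-8")
```

Note that `UnicodeEncodeError` is a subclass of `ValueError`, not `TypeError`, so a handler that catches only `TypeError` would not intercept this particular failure.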
🧠 Learnings used
Learnt from: danielrc888
Repo: genlayerlabs/genlayer-py PR: 39
File: genlayer_py/consensus/consensus_main/decoder.py:14-16
Timestamp: 2025-06-13T17:01:47.303Z
Learning: `Web3.to_bytes(hexstr=…)` accepts hex strings without the `0x` prefix because it ultimately calls `eth_utils.decode_hex`, which strips the prefix internally.
Fixes DXP-707
Note: Merge the studio PR before this one
genlayerlabs/genlayer-studio#1344