⚡️ Speed up function unpad_aes by 36% #65

Open
codeflash-ai[bot] wants to merge 1 commit into master from codeflash/optimize-unpad_aes-mkql8tz1

codeflash-ai bot commented Jan 23, 2026

📄 36% (0.36x) speedup for unpad_aes in pdfminer/utils.py

⏱️ Runtime: 110 microseconds → 81.1 microseconds (best of 5 runs)
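The headline figure can be checked from the two runtimes. The sketch below assumes the speedup is computed as original / optimized - 1; that formula is inferred from the reported numbers rather than stated in this PR, and `speedup` is a hypothetical helper name.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup, assuming the formula original / optimized - 1."""
    return original / optimized - 1.0


# Runtimes reported in this PR, in microseconds.
print(f"{speedup(110, 81.1):.0%}")   # overall runtime -> 36%
print(f"{speedup(7.46, 5.18):.1%}")  # existing unit test -> 44.0%
```

Under this assumption the overall runtimes and the existing unit test timing both reproduce the percentages shown in the report.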

📝 Explanation and details

The optimized code achieves a 36% speedup by replacing Python-level iteration with C-implemented bytes operations for padding validation.

Key Optimization

The original code used all(x == padding for x in padded[-padding:]) to validate padding bytes. This involves a Python generator expression that iterates byte-by-byte, incurring significant Python interpreter overhead. The line profiler shows this check consuming 58.5% of total runtime (283.6µs out of 485.1µs).

The optimization replaces this with:

  1. Direct bytes comparison: padded[-padding:] == bytes((padding,)) * padding uses C-level bytes equality checking instead of Python iteration
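  2. Special handling for padding==0: Uses padded.count(b"\x00") instead of the generic check, leveraging an optimized C implementation. The special case is needed because padded[-0:] is the whole input while bytes((0,)) * 0 is empty, so the direct comparison alone would not reproduce the original result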
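To make the two changes concrete, here is a minimal before/after sketch. Both functions are reconstructions from the description and the test comments in this PR, not the actual diff: the names `unpad_aes_original` and `unpad_aes_optimized` are hypothetical, and the exact way `count` is used in the zero-padding branch is an assumption.

```python
def unpad_aes_original(padded: bytes) -> bytes:
    """Assumed shape of the original logic (generator-based validation)."""
    if len(padded) == 0:
        return padded
    padding = padded[-1]  # candidate padding length
    if padding > 16 or padding > len(padded):
        return padded  # cannot be valid padding
    if all(x == padding for x in padded[-padding:]):
        return padded[:-padding]
    return padded


def unpad_aes_optimized(padded: bytes) -> bytes:
    """Assumed shape of the optimized logic (bytes-level validation)."""
    if len(padded) == 0:
        return padded
    padding = padded[-1]
    if padding > 16 or padding > len(padded):
        return padded
    if padding == 0:
        # padded[-0:] is the whole input, so the original check passes only
        # when every byte is zero, and padded[:-0] is then empty.
        if padded.count(b"\x00") == len(padded):
            return padded[:0]
        return padded
    if padded[-padding:] == bytes((padding,)) * padding:
        return padded[:-padding]
    return padded


# Spot-check that both versions agree on a few inputs from the tests below.
samples = [b"", b"payload" + b"\x04" * 4, b"\x00" * 8, b"\x01\x00",
           b"abc\x20", b"ABC\x03\x03\x02"]
assert all(unpad_aes_original(s) == unpad_aes_optimized(s) for s in samples)
```

The zero-padding branch exists only to preserve the original behavior for inputs ending in 0x00; it is not part of standard block padding.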

Why It's Faster

  • C-level operations: Both bytes.__eq__() and bytes.count() are implemented in C and operate on contiguous memory, avoiding Python's per-element overhead
  • Single operation vs iteration: Direct slice comparison executes as one native operation rather than iterating through each byte in Python
  • Reduced branch misprediction: The bytes comparison likely benefits from better CPU pipeline utilization
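A rough way to observe the difference locally is a `timeit` micro-benchmark of the two validation expressions in isolation. This is only a sketch: the input size, the iteration count, and the helper names `check_generator` and `check_bytes_eq` are arbitrary choices, and absolute numbers will vary by machine and Python version.

```python
import timeit

padded = b"A" * 480 + bytes((16,)) * 16  # 16 bytes of 0x10 padding
padding = padded[-1]


def check_generator() -> bool:
    # Byte-by-byte check driven by the Python interpreter.
    return all(x == padding for x in padded[-padding:])


def check_bytes_eq() -> bool:
    # A single bytes equality comparison performed in C.
    return padded[-padding:] == bytes((padding,)) * padding


# Both checks must agree before their timings are worth comparing.
assert check_generator() == check_bytes_eq()

runs = 100_000
for fn in (check_generator, check_bytes_eq):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e9:.0f} ns per call")
```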

Performance Characteristics

The test results show the optimization is particularly effective for:

  • Valid padding removal (roughly 10-59% faster): Cases like test_basic_unpad_various_pad_lengths (51% faster) and test_full_block_padding_on_multiple_of_16 (57.4% faster) benefit most because they hit the optimized validation path; single-byte padding sits at the low end (for example 9.70% in test_unpad_aes_large_scale_just_under_multiple_of_16)
  • Larger padding values (roughly 31-59% faster): Tests with 8-16 byte padding show the largest gains because the per-byte Python iteration cost grows with the padding length
  • Moderate gains for invalid padding (mostly 8-26% faster): Even non-padding cases benefit slightly from reduced overhead in earlier checks; inputs that end in 0x00 without being all zeros are outliers (50.6% and 68.4% faster)

Impact on Real Workloads

Based on function_references, unpad_aes is called during AES decryption operations (decrypt_aes128 and decrypt_aes256) when processing encrypted PDF documents. Since decryption typically occurs for every encrypted object/stream in a PDF:

  • High-frequency execution: Documents with many encrypted objects will call this function repeatedly
  • Cumulative benefit: Even microsecond-level improvements compound across hundreds/thousands of decryption operations
  • Latency-sensitive: PDF parsing is often user-facing, so reducing decryption overhead improves perceived responsiveness

The optimization maintains correctness while providing meaningful speedup in a hot path for encrypted PDF processing.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 12 Passed |
| 🌀 Generated Regression Tests | 65 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_pdfminer_crypto.py::TestAES.test_unpad_aes | 7.46μs | 5.18μs | 44.0% ✅ |

🌀 Generated Regression Tests
import pytest  # used for our unit tests
from pdfminer.utils import unpad_aes

def test_empty_input_returns_same_empty():
    # An empty input should be returned unchanged (early exit)
    data = b''
    codeflash_output = unpad_aes(data) # 630ns -> 620ns (1.61% faster)

@pytest.mark.parametrize("padlen", list(range(1, 17)))
def test_basic_unpad_various_pad_lengths(padlen):
    # For padlen in 1..16, when the last padlen bytes are equal to padlen,
    # the function should remove exactly padlen bytes.
    original = b'payload'  # a short payload
    padded = original + bytes([padlen]) * padlen
    # Should return exactly the original payload
    codeflash_output = unpad_aes(padded) # 29.2μs -> 19.4μs (51.0% faster)

def test_full_block_padding_on_multiple_of_16():
    # When the original message length is a multiple of 16, the pad may be 16
    # bytes of value 0x10 (16). The function should strip those 16 bytes.
    original = b'B' * 16  # 16 bytes long original
    padded = original + bytes([16]) * 16
    codeflash_output = unpad_aes(padded) # 1.92μs -> 1.22μs (57.4% faster)

def test_padding_byte_greater_than_16_returns_original():
    # If the last byte (candidate padding length) is > 16, the function must
    # treat the padding as invalid and return the original bytes unchanged.
    padded = b'abc' + b'\x20'  # last byte 0x20 (32) > 16
    codeflash_output = unpad_aes(padded) # 500ns -> 440ns (13.6% faster)

def test_padding_byte_greater_than_length_returns_original():
    # If the indicated padding length is larger than the total message length,
    # it's invalid and should return the original bytes.
    padded = b'ab' + bytes([5])  # length is 3 but padding byte indicates 5
    codeflash_output = unpad_aes(padded) # 510ns -> 450ns (13.3% faster)

def test_inconsistent_padding_returns_original():
    # If the last 'padding' bytes are not all equal to the padding value,
    # the function must not strip anything and should return the original bytes.
    # Construct a case where last byte = 2 but the last two bytes are not both 2.
    padded = b'ABC' + b'\x03\x03\x02'  # last byte = 2, last two bytes = [3,2]
    codeflash_output = unpad_aes(padded) # 1.40μs -> 1.19μs (17.6% faster)

def test_zero_padding_all_zero_bytes_removed():
    # Special case: if the last byte is 0, the implementation treats padding=0.
    # Note: padded[-0:] == padded[0:] (full sequence) and padded[:-0] == padded[:0] (empty).
    # If all bytes are 0, the function will remove "0" bytes as if it's padding,
    # resulting in an empty byte string. This documents current behavior.
    padded = b'\x00' * 8  # all bytes zero
    # Implementation will consider padding = 0, all(...) True, and return padded[:-0] -> b''
    codeflash_output = unpad_aes(padded) # 1.75μs -> 1.23μs (42.3% faster)

def test_zero_padding_mixed_bytes_returns_original():
    # If the last byte is 0 but not all bytes are zero, the all(...) check fails
    # and the original bytes must be returned unchanged.
    padded = b'\x01\x00'  # last byte 0, but not all bytes equal to 0
    codeflash_output = unpad_aes(padded) # 1.28μs -> 760ns (68.4% faster)

def test_all_equal_padding_value_equal_to_length_results_in_empty():
    # If the padded message consists entirely of N bytes all equal to N,
    # the function will consider that a valid padding and return an empty bytes object.
    padded = bytes([5]) * 5  # b'\x05\x05\x05\x05\x05'
    # padding = 5, padded[-5:] is entire message and all bytes equal 5, so returns padded[:-5] -> b''
    codeflash_output = unpad_aes(padded) # 1.54μs -> 1.17μs (31.6% faster)

def test_large_scale_valid_padding_stripped():
    # Large-scale test under the requested element limits: 500 bytes payload + 8 bytes padding.
    # Ensure the function correctly removes the padding on a larger input.
    payload_len = 500
    payload = bytes([0xAA]) * payload_len  # deterministic large payload
    padlen = 8  # valid pad length
    padded = payload + bytes([padlen]) * padlen
    # Should recover the original payload exactly
    codeflash_output = unpad_aes(padded) # 2.07μs -> 1.40μs (47.9% faster)

def test_large_scale_full_block_padding_on_large_multiple_of_16():
    # Another large-scale case where the original is multiple of 16 and padding is 16 bytes.
    # Use 480 bytes (30 * 16) to stay safely under the 1000 element guidance.
    original = bytes([0x7F]) * 480  # 480 bytes, multiple of 16
    padded = original + bytes([16]) * 16  # 16-byte pad appended
    # Should remove the 16-byte padding and return the original 480-byte message
    codeflash_output = unpad_aes(padded) # 2.27μs -> 1.43μs (58.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from pdfminer.utils import unpad_aes

def test_unpad_aes_basic_valid_padding_one_byte():
    """Test unpadding when padding is 1 byte (0x01)."""
    padded = b'hello world     \x01'  # 16 bytes, 1 byte of 0x01 padding
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.71μs -> 1.41μs (21.3% faster)

def test_unpad_aes_basic_valid_padding_multiple_bytes():
    """Test unpadding when padding is multiple bytes (0x04 repeated 4 times)."""
    padded = b'hello\x04\x04\x04\x04'  # 5 bytes data + 4 padding bytes = 9 bytes
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.73μs -> 1.42μs (21.8% faster)

def test_unpad_aes_basic_full_block_padding():
    """Test unpadding a full block of padding (16 x 0x10)."""
    padded = b'hello world' + b'\x10' * 16  # 11 + 16 = 27 bytes
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.11μs -> 1.47μs (43.5% faster)

def test_unpad_aes_basic_empty_input():
    """Test unpadding empty bytes (edge case where no padding exists)."""
    padded = b''
    codeflash_output = unpad_aes(padded); result = codeflash_output # 420ns -> 360ns (16.7% faster)

def test_unpad_aes_basic_single_padding_byte():
    """Test unpadding with minimal data and one padding byte."""
    padded = b'a\x01'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.67μs -> 1.49μs (12.1% faster)

def test_unpad_aes_basic_two_padding_bytes():
    """Test unpadding with two padding bytes (0x02 repeated)."""
    padded = b'test\x02\x02'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.59μs -> 1.34μs (18.7% faster)

def test_unpad_aes_basic_eight_padding_bytes():
    """Test unpadding with eight padding bytes (0x08 repeated)."""
    padded = b'12345678\x08\x08\x08\x08\x08\x08\x08\x08'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.87μs -> 1.40μs (33.6% faster)

def test_unpad_aes_edge_invalid_padding_value_too_high():
    """Test that padding values > 16 are not removed (invalid padding)."""
    padded = b'hello\x11'  # 0x11 = 17, which is > 16
    codeflash_output = unpad_aes(padded); result = codeflash_output # 500ns -> 440ns (13.6% faster)

def test_unpad_aes_edge_invalid_padding_value_max_byte():
    """Test that padding value of 0xFF (255) is not removed (invalid)."""
    padded = b'test\xff'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 480ns -> 380ns (26.3% faster)

def test_unpad_aes_edge_invalid_padding_exceeds_length():
    """Test when last byte value exceeds the length of the entire message."""
    padded = b'hi\x10'  # Only 3 bytes total, but last byte says 16 bytes of padding
    codeflash_output = unpad_aes(padded); result = codeflash_output # 530ns -> 450ns (17.8% faster)

def test_unpad_aes_edge_invalid_padding_inconsistent_bytes():
    """Test when padding bytes don't match the expected pattern (inconsistent)."""
    padded = b'hello\x03\x03\x02'  # Last byte is 0x02, but previous byte is 0x03
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.38μs -> 1.27μs (8.66% faster)

def test_unpad_aes_edge_invalid_padding_only_some_match():
    """Test when only some padding bytes match the expected value."""
    padded = b'data\x05\x05\x05\x04\x05'  # Last byte is 0x05, but not all last 5 bytes are 0x05
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.56μs -> 1.11μs (40.5% faster)

def test_unpad_aes_edge_no_padding_value_one():
    """Test data ending in 0x01; the function cannot tell this from padding and strips the byte."""
    padded = b'test\x01'  # Ends with 0x01 which could be valid padding
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.52μs -> 1.26μs (20.6% faster)

def test_unpad_aes_edge_no_padding_all_ones():
    """Test data that is all 0x01 bytes with 0x01 as last byte."""
    padded = b'\x01\x01\x01\x01'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.39μs -> 1.25μs (11.2% faster)

def test_unpad_aes_edge_max_valid_padding():
    """Test maximum valid padding value (16 bytes of 0x10)."""
    padded = b'\x10' * 16
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.03μs -> 1.47μs (38.1% faster)

def test_unpad_aes_edge_padding_value_zero():
    """Test when last byte is 0x00 (which is invalid as padding > 0)."""
    padded = b'hello\x00'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.25μs -> 830ns (50.6% faster)

def test_unpad_aes_edge_single_byte_input_invalid_padding():
    """Test single byte input where that byte is invalid padding."""
    padded = b'\x20'  # 0x20 = 32, which is > 16
    codeflash_output = unpad_aes(padded); result = codeflash_output # 461ns -> 400ns (15.2% faster)

def test_unpad_aes_edge_single_byte_input_valid_padding():
    """Test single byte input of 0x01 (valid padding for empty message)."""
    padded = b'\x01'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.43μs -> 1.26μs (13.5% faster)

def test_unpad_aes_edge_16_bytes_no_padding():
    """Test exactly 16 bytes of data that happens to end with valid-looking padding."""
    padded = b'0123456789ABCDE\x01'  # 16 bytes, last is 0x01
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.43μs -> 1.21μs (18.2% faster)

def test_unpad_aes_edge_binary_data_with_padding():
    """Test binary data (non-ASCII) with valid padding."""
    padded = b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x02\x02'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.50μs -> 1.25μs (20.0% faster)

def test_unpad_aes_edge_padding_byte_mismatch_at_start():
    """Test where a byte near the start of the padding doesn't match the expected value."""
    padded = b'message\x04\x03\x04\x04'  # Last byte is 0x04, but 3rd-to-last is 0x03
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.39μs -> 1.11μs (25.2% faster)

def test_unpad_aes_edge_very_short_message_full_padding():
    """Test very short message (1 byte) with 15 bytes of 0x0f padding."""
    padded = b'a\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f\x0f'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.04μs -> 1.43μs (42.7% faster)

def test_unpad_aes_edge_all_zeros_except_last():
    """Test message of all zeros with single valid padding byte."""
    padded = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.41μs -> 1.21μs (16.5% faster)

def test_unpad_aes_edge_16_bytes_with_0x10_at_end():
    """Test 16 bytes where last byte is 0x10 (full block padding indicator)."""
    padded = b'0123456789ABCDE\x10'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.27μs -> 1.18μs (7.63% faster)

def test_unpad_aes_edge_32_bytes_with_full_padding_block():
    """Test 32 bytes (multiple of 16) with full padding block at end."""
    data = b'0123456789ABCDEF' + b'\x10' * 16
    codeflash_output = unpad_aes(data); result = codeflash_output # 2.09μs -> 1.38μs (51.4% faster)

def test_unpad_aes_large_scale_long_message_with_minimal_padding():
    """Test large message with minimal valid padding (single byte 0x01)."""
    message = b'X' * 500  # 500 bytes of data
    padded = message + b'\x01'
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.60μs -> 1.38μs (15.9% faster)

def test_unpad_aes_large_scale_long_message_with_max_padding():
    """Test large message with full block padding (16 bytes of 0x10)."""
    message = b'Y' * 512  # 512 bytes of data (exactly 32 blocks)
    padded = message + b'\x10' * 16
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.17μs -> 1.41μs (53.8% faster)

def test_unpad_aes_large_scale_long_message_with_medium_padding():
    """Test large message with medium-length padding (8 bytes of 0x08)."""
    message = b'Z' * 1000  # 1000 bytes of data
    padded = message + b'\x08' * 8
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.08μs -> 1.58μs (31.6% faster)

def test_unpad_aes_large_scale_many_blocks():
    """Test 100 complete 16-byte blocks with proper padding for last block."""
    message = b'A' * (100 * 16 - 5)  # 1595 bytes (100 blocks - 5 bytes)
    padded = message + b'\x05' * 5  # 5 bytes of 0x05 padding
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.78μs -> 1.43μs (24.5% faster)

def test_unpad_aes_large_scale_mixed_binary_data():
    """Test large binary data with various byte values and valid padding."""
    message = bytes(range(256)) * 2  # 512 bytes of repeating 0x00-0xFF
    padded = message + b'\x04' * 4
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.65μs -> 1.42μs (16.2% faster)

def test_unpad_aes_large_scale_boundary_multiple_of_16():
    """Test message that's multiple of 16 requiring full padding block."""
    message = b'M' * 480  # Exactly 30 blocks of 16 bytes
    padded = message + b'\x10' * 16  # Full padding block
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.07μs -> 1.47μs (40.8% faster)

def test_unpad_aes_large_scale_just_under_multiple_of_16():
    """Test message that's just under multiple of 16 (1 byte short)."""
    message = b'N' * 479  # 479 bytes = 29*16 + 15
    padded = message + b'\x01'  # 1 byte of 0x01 padding
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.47μs -> 1.34μs (9.70% faster)

def test_unpad_aes_large_scale_just_over_multiple_of_16():
    """Test message that's just over multiple of 16 (1 byte over)."""
    message = b'O' * 481  # 481 bytes = 30*16 + 1
    padded = message + b'\x0f' * 15  # 15 bytes of 0x0f padding
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.96μs -> 1.49μs (31.5% faster)

def test_unpad_aes_large_scale_alternating_pattern():
    """Test large message with alternating byte pattern and valid padding."""
    message = b'\xaa\x55' * 256  # 512 bytes of alternating 0xaa and 0x55
    padded = message + b'\x02' * 2
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.63μs -> 1.32μs (23.5% faster)

def test_unpad_aes_large_scale_invalid_padding_in_large_message():
    """Test that invalid padding in large messages is correctly not removed."""
    message = b'P' * 500
    padded = message + b'\x11'  # 0x11 = 17, which is > 16 (invalid)
    codeflash_output = unpad_aes(padded); result = codeflash_output # 490ns -> 450ns (8.89% faster)

def test_unpad_aes_large_scale_inconsistent_padding_in_large_message():
    """Test that inconsistent padding in large messages is correctly not removed."""
    message = b'Q' * 500
    padded = message + b'\x04\x04\x04\x03'  # Inconsistent: 0x03 instead of 0x04
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.36μs -> 1.24μs (9.68% faster)

def test_unpad_aes_large_scale_maximum_valid_padding():
    """Test very long message with maximum valid padding (16 bytes)."""
    message = b'R' * 2000
    padded = message + b'\x10' * 16
    codeflash_output = unpad_aes(padded); result = codeflash_output # 2.48μs -> 1.65μs (50.3% faster)

def test_unpad_aes_large_scale_all_padding_bytes_removed():
    """Test input made entirely of 0x10 bytes; only the last 16 bytes are stripped."""
    padded = b'\x10' * 32  # 32 bytes of 0x10
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.99μs -> 1.28μs (55.5% faster)

def test_unpad_aes_large_scale_repeated_padding_pattern():
    """Test 256-byte message with 8 bytes of 0x08 padding."""
    message = b'S' * 256
    padded = message + b'\x08' * 8
    codeflash_output = unpad_aes(padded); result = codeflash_output # 1.83μs -> 1.37μs (33.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
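The generated tests above store each result in codeflash_output and leave the comparison to the Codeflash harness, so the expected return values are only visible in the comments. As a reading aid, here is a minimal sketch that spells out those values for a handful of the inputs. `reference_unpad_aes` is a hypothetical local stand-in written from the behavior documented in the test comments, not the code in pdfminer/utils.py.

```python
def reference_unpad_aes(padded: bytes) -> bytes:
    """Stand-in for unpad_aes, following the behavior the tests describe."""
    if len(padded) == 0:
        return padded
    padding = padded[-1]
    if padding > 16 or padding > len(padded):
        return padded
    if all(x == padding for x in padded[-padding:]):
        return padded[:-padding]
    return padded


# (input, expected output) pairs taken from the test comments above.
CASES = [
    (b"", b""),                                    # empty input, early exit
    (b"payload" + bytes([4]) * 4, b"payload"),     # valid 4-byte padding
    (b"B" * 16 + bytes([16]) * 16, b"B" * 16),     # full padding block
    (b"abc\x20", b"abc\x20"),                      # last byte > 16
    (b"ab\x05", b"ab\x05"),                        # padding longer than input
    (b"ABC\x03\x03\x02", b"ABC\x03\x03\x02"),      # inconsistent padding
    (b"\x00" * 8, b""),                            # all zeros, padding == 0
    (b"\x01\x00", b"\x01\x00"),                    # ends in 0x00, not all zeros
    (bytes([5]) * 5, b""),                         # input is only padding
    (b"test\x01", b"test"),                        # trailing 0x01 is stripped
]

for padded, expected in CASES:
    assert reference_unpad_aes(padded) == expected, padded
```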

To edit these changes git checkout codeflash/optimize-unpad_aes-mkql8tz1 and push.

codeflash-ai bot requested a review from aseembits93 on January 23, 2026, 07:57
codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Jan 23, 2026