index out of range for tables still exists on version 1.1.4 #70
Open
Labels
bug: Something isn't working
Description
What the bug is: On line 32 of handle_table:
    while used_cells[cell_row][col_offset]:
        col_offset += 1
This loop increments col_offset to skip cells already occupied by an earlier rowspan/colspan merge, but it has no bounds check. If the HTML table has malformed or complex merged cells whose colspan/rowspan claims extend beyond the grid dimensions calculated by get_table_dimensions(), col_offset runs past the last column of the grid and the lookup raises IndexError: list index out of range.
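For illustration, here is a minimal sketch of what a bounds-checked version of that skip loop could look like. It is not the library's code: next_free_column is a hypothetical helper, and used_row stands in for one row of the used_cells grid (a list of booleans, True meaning occupied). A guard like this only avoids the crash; the caller would still have to decide what to do when no free column is left.

```python
def next_free_column(used_row, col_offset=0):
    """Return the first free column at or after col_offset.

    Hypothetical helper, not part of html4docx. `used_row` is one row of a
    boolean occupancy grid. If every remaining column is occupied, the
    function returns len(used_row) instead of raising IndexError.
    """
    # Check the bound first so the index lookup is never out of range.
    while col_offset < len(used_row) and used_row[col_offset]:
        col_offset += 1
    return col_offset


row = [True, True]  # both columns already claimed by earlier merges
col = next_free_column(row)
if col >= len(row):
    print("no free column left; the grid is too narrow for this row")
```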
In other words, get_table_dimensions() reports the table as having N columns, but the rowspan/colspan attributes in the actual HTML imply more columns than that, so the used_cells grid is too small.
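The mismatch can be shown with a small standalone sketch. Both functions below are hypothetical and only approximate the two ways of counting: naive_columns assumes the width is taken from the widest row by summing colspans (which may not be exactly how get_table_dimensions() works), while required_columns lays cells out left to right and skips positions already claimed by a rowspan from an earlier row.

```python
def naive_columns(rows):
    """Widest row measured by summing colspans only (ignores rowspans)."""
    return max(sum(colspan for colspan, _ in row) for row in rows)


def required_columns(rows):
    """Column count implied once rowspans from earlier rows are honoured.

    `rows` is a list of rows; each row is a list of (colspan, rowspan)
    tuples in document order.
    """
    occupied = set()  # (row, col) positions already claimed by some cell
    width = 0
    for r, row in enumerate(rows):
        col = 0
        for colspan, rowspan in row:
            # Skip columns taken by a rowspan that started in an earlier row.
            while (r, col) in occupied:
                col += 1
            # Claim every grid position this cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            col += colspan
            width = max(width, col)
    return width


# The table from this report: row 1 has a rowspan=2 cell plus one plain
# cell, row 2 has two plain cells.
table = [[(1, 2), (1, 1)], [(1, 1), (1, 1)]]
print(naive_columns(table))     # 2
print(required_columns(table))  # 3
```

Under these assumptions the second row needs a third column, which matches the comments in the reproduction script.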
You can reproduce it by running:
uv run python3 -c "
from html4docx import HtmlToDocx
from docx import Document
# Row 1: 2 visible cells, one with rowspan=2 → get_table_dimensions sees max 2 cols
# Row 2: 2 visible cells, BUT col 0 is already occupied by the rowspan
# so it needs 3 columns total, but the grid is only 2 wide
html = '''
<table>
<tr>
<td rowspan=\"2\">spans down</td>
<td>B1</td>
</tr>
<tr>
<td>A2</td>
<td>B2</td>
</tr>
</table>
'''
doc = Document()
parser = HtmlToDocx()
parser.add_html_to_document(html, doc)
"