index out of range for tables still exists on version 1.1.4 #70

@sarah-harveyai

What the bug is: On line 32 of handle_table:
while used_cells[cell_row][col_offset]:
    col_offset += 1

This loop increments col_offset to skip over cells already occupied by a previous rowspan/colspan merge, but it has no bounds check. If the HTML table has
malformed or complex merged cells whose colspan/rowspan claims extend beyond the grid dimensions calculated by get_table_dimensions(), col_offset runs past
the last column and the lookup raises IndexError: list index out of range.
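
One possible guard, sketched as a standalone function and not taken from the library's code: stop skipping once col_offset reaches the row width and return that index, so the caller can decide whether to widen the grid or drop the cell. The helper name next_free_col is hypothetical; used_cells, cell_row, and col_offset are the names from the loop above.

```python
def next_free_col(used_cells, cell_row, col_offset=0):
    """Return the first unoccupied column index at or after col_offset.

    Returns len(row) when every remaining column is occupied,
    instead of raising IndexError.
    """
    row = used_cells[cell_row]
    # The added `col_offset < len(row)` check is the missing bounds check.
    while col_offset < len(row) and row[col_offset]:
        col_offset += 1
    return col_offset


# Grid state for the second row of a 2x2 table whose column 0 is
# already claimed by a rowspan=2 cell from the first row.
used_cells = [[True, True], [True, False]]
col = next_free_col(used_cells, 1)                # first free slot in row 2
used_cells[1][col] = True
overflow = next_free_col(used_cells, 1, col + 1)  # no slot left: equals row width
```

A return value equal to the row width signals that the grid is too narrow for the next cell, which is the condition that currently surfaces as the IndexError.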

In other words: get_table_dimensions() calculates the table as having N columns, but the actual HTML content (with its rowspan/colspan attributes) implies more
columns than that, so the used_cells grid is too small.
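
To illustrate the mismatch, here is a minimal sketch of a dimension calculation that places cells while honoring earlier rowspans. It is an assumption about how such a calculation could work, not the library's get_table_dimensions(); the function name grid_dimensions and the (rowspan, colspan) tuple input are hypothetical.

```python
def grid_dimensions(rows):
    """Compute (n_rows, n_cols) for a table from per-row cell spans.

    rows: list of rows, each a list of (rowspan, colspan) tuples in
    document order. Cells are placed left to right, skipping slots
    already claimed by a rowspan from an earlier row.
    """
    occupied = set()  # (row, col) slots already claimed by some cell
    n_rows = len(rows)
    n_cols = 0
    for r, cells in enumerate(rows):
        col = 0
        for rowspan, colspan in cells:
            # Skip slots taken by cells spanning down from earlier rows.
            while (r, col) in occupied:
                col += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            n_rows = max(n_rows, r + rowspan)
            n_cols = max(n_cols, col + colspan)
            col += colspan
    return n_rows, n_cols


# Spans for the table in this report: the first row has a rowspan=2 cell
# plus one normal cell; the second row has two normal cells.
dims = grid_dimensions([[(2, 1), (1, 1)], [(1, 1), (1, 1)]])
```

Counting visible cells per row gives 2 columns for this table, while placing the cells this way gives 3, which is the gap described above.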

You can reproduce it by running:

uv run python3 -c "
from html4docx import HtmlToDocx
from docx import Document

# Row 1: 2 visible cells, one with rowspan=2 → get_table_dimensions sees max 2 cols
# Row 2: 2 visible cells, BUT col 0 is already occupied by the rowspan
#         so it needs 3 columns total, but the grid is only 2 wide
html = '''
<table>
  <tr>
    <td rowspan=\"2\">spans down</td>
    <td>B1</td>
  </tr>
  <tr>
    <td>A2</td>
    <td>B2</td>
  </tr>
</table>
'''

doc = Document()
parser = HtmlToDocx()
parser.add_html_to_document(html, doc)
"

Labels: bug