index out of range for tables still exists on version 1.1.4 #70

What the bug is: on line 32 of `handle_table`:

    while used_cells[cell_row][col_offset]:
        col_offset += 1

This loop increments `col_offset` to skip over cells already occupied by a previous rowspan/colspan merge, but there is no bounds check. If the HTML table has malformed or complex merged cells whose rowspan/colspan claims extend beyond the grid dimensions calculated by `get_table_dimensions()`, `col_offset` goes past `cols` and you get `IndexError: list index out of range`.

In other words, `get_table_dimensions()` calculates the table as having N columns, but the actual HTML content (with its rowspan/colspan attributes) implies more columns than that, so the `used_cells` grid is too small.
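
To illustrate the mismatch, here is a minimal sketch of how a dimension pre-pass could compute the real column count by simulating cell placement (skip any slot already claimed by an earlier rowspan, as the HTML table model does) instead of counting cells per row. `count_grid_columns` and its `(rowspan, colspan)` input format are hypothetical and not part of html4docx; I'm also assuming the current pre-pass effectively takes the largest number of cells in any row, which is what the comment in the reproduction below suggests.

```python
def count_grid_columns(rows):
    """Return how many grid columns a table needs (hypothetical helper).

    `rows` is a list of rows, each a list of (rowspan, colspan) pairs in
    document order. Each cell is placed in the first slot of its row that
    is not already claimed by a rowspan/colspan from an earlier cell.
    """
    occupied = set()  # (row, col) slots claimed so far
    width = 0
    for r, row in enumerate(rows):
        col = 0
        for rowspan, colspan in row:
            # Skip slots claimed by a rowspan/colspan from an earlier cell.
            while (r, col) in occupied:
                col += 1
            # Claim every slot this cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, col + dc))
            col += colspan
            width = max(width, col)
    return width


# The reproduction table: "spans down" (rowspan=2), B1 / A2, B2.
example = [[(2, 1), (1, 1)], [(1, 1), (1, 1)]]
print(max(len(row) for row in example))  # naive per-row count: 2
print(count_grid_columns(example))       # slots actually needed: 3
```

For this table the naive count gives 2 columns while the placement simulation gives 3, which is exactly the gap that lets `col_offset` run off the end of `used_cells`.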

You can reproduce it by running:
```
uv run python3 -c "
from html4docx import HtmlToDocx
from docx import Document

# Row 1: 2 visible cells, one with rowspan=2 → get_table_dimensions sees max 2 cols
# Row 2: 2 visible cells, BUT col 0 is already occupied by the rowspan,
#        so it needs 3 columns total, but the grid is only 2 wide
html = '''
<table>
  <tr>
    <td rowspan=\"2\">spans down</td>
    <td>B1</td>
  </tr>
  <tr>
    <td>A2</td>
    <td>B2</td>
  </tr>
</table>
'''

doc = Document()
parser = HtmlToDocx()
parser.add_html_to_document(html, doc)
"
```
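
For comparison, here is a minimal sketch of a bounds-safe version of the placement loop quoted above. It is not html4docx's real code: `place_cells`, `_ensure`, and the `(rowspan, colspan)` input format are hypothetical and only mirror the `used_cells`/`col_offset` logic. Instead of indexing past the end of a fixed-size grid, it widens a row whenever `col_offset` reaches its current length, and it returns the final width so a caller could size the Word table from it.

```python
def _ensure(used_cells, row, col):
    """Grow the hypothetical used_cells grid so used_cells[row][col] exists."""
    while len(used_cells) <= row:
        used_cells.append([])
    cells = used_cells[row]
    if len(cells) <= col:
        cells.extend([False] * (col + 1 - len(cells)))


def place_cells(rows, n_rows, n_cols):
    """Return (placements, width) for rows of (rowspan, colspan) pairs.

    n_rows and n_cols are the (possibly too small) dimensions from a
    pre-pass such as get_table_dimensions(); the grid grows past them
    instead of raising IndexError.
    """
    used_cells = [[False] * n_cols for _ in range(n_rows)]
    placements = []  # (row, col) origin of each cell, in document order
    for cell_row, row in enumerate(rows):
        col_offset = 0
        for rowspan, colspan in row:
            # Same skip loop as in handle_table, but with the bounds guarded.
            _ensure(used_cells, cell_row, col_offset)
            while used_cells[cell_row][col_offset]:
                col_offset += 1
                _ensure(used_cells, cell_row, col_offset)
            # Mark every slot covered by this cell's rowspan/colspan.
            for dr in range(rowspan):
                for dc in range(colspan):
                    _ensure(used_cells, cell_row + dr, col_offset + dc)
                    used_cells[cell_row + dr][col_offset + dc] = True
            placements.append((cell_row, col_offset))
            col_offset += colspan
    width = max((len(cells) for cells in used_cells), default=0)
    return placements, width


# The reproduction table: "spans down" (rowspan=2), B1 / A2, B2.
example = [[(2, 1), (1, 1)], [(1, 1), (1, 1)]]
placements, width = place_cells(example, n_rows=2, n_cols=2)
print(placements)  # [(0, 0), (0, 1), (1, 1), (1, 2)]
print(width)       # 3
```

Guarding the loop only avoids the crash; the document table still has to be created with the larger width (or the dimensions computed correctly up front), and the actual fix in html4docx may look different.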