Egor/dev 1155 move deck manipulation to gslides api #61
ZmeiGorynych wants to merge 3 commits into main
Conversation
📝 Walkthrough

Introduces a multi-format presentation adapter architecture with abstract APIs for Google Slides, PowerPoint, and HTML. Adds a CLI tool implementing a four-step workflow (copy, inspect, name, templatize).

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as prepare.py<br/>(CLI)
    participant Adapter as Adapter<br/>(Google/PPTX)
    participant Presentation as Presentation<br/>Object
    participant Storage as Cloud/Disk<br/>Storage
    User->>CLI: copy --source <id>
    CLI->>Adapter: resolve_api_client()
    Adapter->>Presentation: from_id(api_client)
    Presentation->>Storage: load metadata
    CLI->>Adapter: copy_via_drive()
    Adapter->>Storage: duplicate presentation
    Storage-->>CLI: new_presentation_id
    CLI-->>User: print copy URL
    User->>CLI: inspect --slide 0
    CLI->>Presentation: slides[0]
    Presentation->>Adapter: slide.thumbnail(api_client)
    Adapter->>Storage: fetch + save thumbnail.png
    CLI->>Presentation: slide.markdown()
    Presentation-->>CLI: metadata + content_markdown
    CLI-->>User: print markdown
    User->>CLI: name --mapping mapping.json
    CLI->>CLI: load_mapping()
    CLI->>Presentation: iterate slides
    Presentation->>Adapter: element.set_alt_text()
    Adapter->>Storage: update element metadata
    Presentation->>Adapter: save()
    Storage-->>CLI: ✓ saved
    CLI-->>User: print success
    User->>CLI: templatize
    CLI->>Presentation: iterate slides + elements
    Presentation->>Adapter: replace_image(placeholder.png)
    Presentation->>Adapter: write_text(example_text)
    Presentation->>Adapter: update_table(template)
    Adapter->>Storage: update elements
    Presentation->>Adapter: save()
    Storage-->>CLI: ✓ saved
    CLI-->>User: print success
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
gslides_api/element/image.py (1)
131-147: ⚠️ Potential issue | 🟠 Major

Use the source element's alt text in the recreate path.

When `recreate_element=True`, `image` is the fresh object returned by `create_image_element_like()`. That object is instantiated without `title`/`description` in `gslides_api/element/base.py` lines 167-174, so lines 146-147 pass `None` and `replace_image_from_id()` skips the restore request. Recreated images still lose alt text after `replaceImage`.

🔧 Suggested fix
```diff
 ImageElement.replace_image_from_id(
     image.objectId,
     image.presentation_id,
     url=url,
     file=file,
     method=method,
     api_client=api_client,
-    title=image.title,
-    description=image.description,
+    title=self.title,
+    description=self.description,
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/element/image.py` around lines 131-147: when `recreate_element` is True the newly created image lacks alt text, so change the arguments passed to `ImageElement.replace_image_from_id` to use the source element's alt text (`self.title` and `self.description`) in the recreate path instead of `image.title`/`image.description`; keep using `image.title`/`image.description` when not recreating. Locate the recreate branch around `create_image_element_like` and adjust the title/description passed to `replace_image_from_id` (or compute `title, description = (self.title, self.description) if recreate_element else (image.title, image.description)`) so alt text is preserved after `replace_image_from_id`.

gslides_api/element/base.py (1)
293-311: ⚠️ Potential issue | 🟠 Major

Guard the new `None` contract before calling `element_properties()`.

Lines 310 and 328 still route through `element_properties()`, and that helper unconditionally dereferences `self.transform`. If an element reaches this layer without a transform, these methods still raise instead of returning `None` as the new signature/docstring promise. Short-circuit here first, or make `transform` optional end-to-end.

🔧 Suggested fix
```diff
 def absolute_size(self, units: OutputUnit) -> Optional[Tuple[float, float]]:
     """Calculate the absolute size of the element in the specified units.
@@
     Raises:
         ValueError: If units is not "cm" or "in".
     """
+    if self.size is None or self.transform is None:
+        return None
     element_props = self.element_properties()
     return element_props.absolute_size(units)
@@
 def absolute_position(self, units: OutputUnit = OutputUnit.CM) -> Optional[Tuple[float, float]]:
     """Calculate the absolute position of the element on the page in the specified units.
@@
     Returns:
         A tuple of (x, y) representing the position in the specified units,
         where x is the horizontal distance from the left edge and y is the
         vertical distance from the top edge of the slide, or None if
         transform is not available.
     """
+    if self.transform is None:
+        return None
     element_props = self.element_properties()
     return element_props.absolute_position(units)
```

Also applies to: 313-329
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gslides_api/element/base.py` around lines 293 - 311, The method absolute_size currently calls element_properties() unconditionally which dereferences self.transform and can raise; before calling element_properties() (and in any other methods that call it in this file), short-circuit when transform or size data is missing by checking self.transform (and any transform.size) and return None if absent, then call element_properties() only when transform is present; alternatively, make transform optional end-to-end by propagating Optional types from element_properties() to absolute_size and related methods (absolute_size, any other methods invoking element_properties()) so they honor the new None contract instead of raising.
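The short-circuit proposed above can be sketched in isolation. `Element` and `element_properties()` below are simplified stand-ins for the library's real types, not its actual API; the point is only the ordering of the guard:

```python
from typing import Optional, Tuple


class Element:
    """Minimal stand-in: transform may legitimately be None (e.g. group containers)."""

    def __init__(self, transform=None, size=None):
        self.transform = transform
        self.size = size

    def element_properties(self):
        # Mirrors the helper that unconditionally dereferences self.transform.
        return {"scale": self.transform["scaleX"], "size": self.size}

    def absolute_size(self) -> Optional[Tuple[float, float]]:
        # Short-circuit BEFORE touching element_properties(), honouring the
        # "returns None" contract instead of raising on transform-less elements.
        if self.size is None or self.transform is None:
            return None
        props = self.element_properties()
        return (props["size"][0] * props["scale"], props["size"][1] * props["scale"])


print(Element().absolute_size())  # None: no crash for transform-less elements
print(Element(transform={"scaleX": 2.0}, size=(3.0, 4.0)).absolute_size())
```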
🟡 Minor comments (11)
gslides_api/common/download.py-38-41 (1)
38-41: ⚠️ Potential issue | 🟡 Minor

Remove the redundant `httpx.TimeoutException` from the exceptions tuple.

Since `httpx.TimeoutException` is a subclass of `httpx.RequestError`, catching `RequestError` already handles timeout exceptions. Keeping only `RequestError` simplifies the code without losing any error handling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/common/download.py` around lines 38-41: in the exceptions tuple used for retrying requests (the "exceptions" key that currently lists `httpx.RequestError` and `httpx.TimeoutException`), remove `httpx.TimeoutException` so the tuple contains only `httpx.RequestError`, since `TimeoutException` is a subclass and already covered; update the tuple at the same location and run tests to confirm no behavioral change.

gslides_api/domain/domain.py-734-736 (1)
734-736: ⚠️ Potential issue | 🟡 Minor

Docstring inconsistency: remove obsolete `Raises: ValueError` section.

The `absolute_size` method no longer raises `ValueError` when size or transform is unavailable; it now returns `None`. The docstring should be updated to remove the `Raises` section.

📝 Suggested docstring fix
```diff
     Returns:
         A tuple of (width, height) representing the element's dimensions in the
         specified units, or None if size or transform is not available
         (e.g. for group container elements).
-
-    Raises:
-        ValueError: If units is not "cm" or "in".
     """
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/domain/domain.py` around lines 734-736: the docstring for `absolute_size` is out of date; remove the obsolete "Raises: ValueError" section and update the return description to reflect that the method may return `None` (when size or transform is unavailable) instead of raising, while keeping existing parameter descriptions intact.

tests/test_pptx/test_pptx_integration.py-448-490 (1)
448-490: ⚠️ Potential issue | 🟡 Minor

Guard cleanup against an unset `pptx_path`.

If anything fails before line 448, the `finally` block raises `UnboundLocalError` and hides the real test failure.

Suggested fix
```diff
-    try:
+    pptx_path = None
+    try:
         # Create test images using PIL
         from PIL import Image
@@
     finally:
         # Cleanup
         for path in [test_image_path, replacement_image_path, pptx_path]:
-            if os.path.exists(path):
+            if path and os.path.exists(path):
                 os.unlink(path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/test_pptx/test_pptx_integration.py` around lines 448-490: the finally block can raise `UnboundLocalError` if `pptx_path` wasn't created; initialize `pptx_path` (and other temp path vars like `replacement_image_path`, `test_image_path`) to None at the top of the test, and change the cleanup loop to check `if path and os.path.exists(path)` before removal, so cleanup only runs for set paths.

tests/test_pptx/test_markdown_to_pptx.py-229-233 (1)
229-233: ⚠️ Potential issue | 🟡 Minor

Use explicit optionals in this helper signature.

These defaults are currently implicit Optionals, which Ruff flags as RUF013. Use `| None` style to match recent patterns in the codebase.

Suggested fix
```diff
     bold: bool = False,
     italic: bool = False,
-    font_name: str = None,
-    font_size_pt: float = None,
-    color_rgb: tuple = None,
+    font_name: str | None = None,
+    font_size_pt: float | None = None,
+    color_rgb: tuple[int, int, int] | None = None,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/test_pptx/test_markdown_to_pptx.py` around lines 229-233: update the helper signature to use explicit optional types for parameters that default to None: `font_name: str | None = None`, `font_size_pt: float | None = None`, and `color_rgb: tuple | None = None`; leave `bold: bool = False` and `italic: bool = False` as-is. This makes the annotations explicit Optionals in the `| None` style.

gslides_api/pptx/shape_copier.py-91-91 (1)
91-91: ⚠️ Potential issue | 🟡 Minor

`copied_shapes` list is never populated.

The `copied_shapes` list is initialized at line 91 but no copy method ever appends to it. This makes `get_copy_stats()` always return `{'shapes_copied': 0}`.

Proposed fix - track successful copies

```diff
 def _copy_text_shape(
     ...
 ) -> Optional[BaseShape]:
     ...
     logger.debug("Successfully copied text shape")
+    self.copied_shapes.append({
+        'type': 'text_shape',
+        'source_id': source_shape.shape_id if hasattr(source_shape, 'shape_id') else None,
+    })
     return text_box
```

Apply similar tracking to the other `_copy_*` methods, or remove the unused tracking infrastructure.

Also applies to: 1058-1071
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/pptx/shape_copier.py` at line 91: the `copied_shapes` list is initialized but never updated, so `get_copy_stats()` always reports zero; update each shape-copying method (all `_copy_*` helpers that perform successful copy operations) to append a small record (e.g. shape type and any id or size info) into `self.copied_shapes` when a copy succeeds, or alternatively remove the `copied_shapes` field and `get_copy_stats()` if tracking is not needed; ensure `get_copy_stats()` computes counts from `self.copied_shapes` (or remove its usage) and keep `copied_shapes`, `get_copy_stats`, and the `_copy_*` methods in sync.

.claude/skills/prepare-presentation/prepare.py-140-141 (1)
140-141: ⚠️ Potential issue | 🟡 Minor

Add error handling for invalid JSON in the `--mapping` argument.

If the user provides malformed JSON, `json.loads` will raise a `JSONDecodeError` with no user-friendly message.

Proposed fix
```diff
     api_client, presentation = _load_presentation(args.source)
-    mapping = json.loads(args.mapping)
+    try:
+        mapping = json.loads(args.mapping)
+    except json.JSONDecodeError as e:
+        print(f"ERROR: Invalid JSON mapping: {e}")
+        return
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `.claude/skills/prepare-presentation/prepare.py` around lines 140-141: the code calls `json.loads(args.mapping)` without handling malformed JSON; wrap that call in a try/except catching `json.JSONDecodeError` and on failure print or log a clear user-facing error that the `--mapping` JSON is invalid (include the decode error message and the provided string for context), then exit with a non-zero status; the exception handling should surround `json.loads` right after `_load_presentation(...)` and return/exit early on error.

gslides_api/adapters/add_names.py-54-56 (1)
54-56: ⚠️ Potential issue | 🟡 Minor

Remove orphaned docstring.

Line 55 contains a duplicate docstring that appears to be leftover from refactoring.

Proposed fix
```diff
     if api_client is None:
         raise ValueError("API client is required")
-    """Name slides in a presentation based on their speaker notes., enforcing unique names"""
-    # api_client = api_client or AbstractSlidesAPIClient.get_default_api_client()
     if llm is None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/adapters/add_names.py` around lines 54-56: remove the stray/orphaned docstring string literal ("Name slides in a presentation based on their speaker notes., enforcing unique names") that appears after the ValueError raise; delete that line so the function no longer contains a duplicated docstring, and if a descriptive docstring is needed, move or consolidate the text into the function's proper top-level triple-quoted docstring instead of leaving the orphaned string literal.

tests/test_pptx/test_pptx_converters.py-346-349 (1)
346-349: ⚠️ Potential issue | 🟡 Minor

Incomplete assertion - expression has no effect.

Line 348 accesses `mock_run.font.color.rgb` but doesn't assert anything. This appears to be a placeholder that should verify the color was set correctly.

Proposed fix
```diff
     apply_rich_style_to_pptx_run(rich, mock_run)
     # Color should be set via rgb property
-    mock_run.font.color.rgb  # Just verify access doesn't fail
+    # Verify rgb was assigned (MagicMock records the assignment)
+    assert mock_run.font.color.rgb is not None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/test_pptx/test_pptx_converters.py` around lines 346-349: the test currently only accesses `mock_run.font.color.rgb` without asserting anything; replace that line with an assertion that the rgb value matches the expected color used in the test (e.g. `assert mock_run.font.color.rgb == expected_rgb`), where `expected_rgb` is the RGB value set earlier in the test case; construct the same `RGBColor` representation used by the code under test and assert equality (or at minimum `assert mock_run.font.color.rgb is not None`) to ensure the color was actually applied on `mock_run`.

gslides_api/adapters/pptx_adapter.py-1594-1623 (1)
1594-1623: ⚠️ Potential issue | 🟡 Minor

Temporary file cleanup on success path is missing.
The thumbnail temp file is created but never cleaned up on the success path. This can lead to temp file accumulation over time. Consider documenting this as intentional (if the file URL is meant to be consumed later) or implementing cleanup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/adapters/pptx_adapter.py` around lines 1594-1623: the temp file created in the `NamedTemporaryFile` block (`temp_path`) is never removed on the success path; decide whether the on-disk file must persist for consumers: if not, unlink `temp_path` before returning and still populate `AbstractThumbnail.content` with `png_bytes` when `include_data` is True (using `temp_path` only transiently); if it must persist, document that behavior and implement a cleanup strategy (e.g. register `temp_path` for removal via atexit or a background cleanup job) so files do not accumulate. Ensure the changes cover `temp_path`, `png_bytes`, `include_data`, `NamedTemporaryFile`, and the returned `AbstractThumbnail`.

gslides_api/adapters/pptx_adapter.py-1741-1745 (1)
1741-1745: ⚠️ Potential issue | 🟡 Minor

Use exception chaining with `raise ... from`.

When re-raising a wrapped exception, use `from e` to preserve the exception chain for better debugging.

🐛 Proposed fix
```diff
 try:
     pptx_presentation = Presentation(presentation_id)
     return cls(pptx_presentation, presentation_id, uploaded_url=uploaded_url)
 except Exception as e:
-    raise ValueError(f"Could not load presentation from {presentation_id}: {e}")
+    raise ValueError(f"Could not load presentation from {presentation_id}: {e}") from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `gslides_api/adapters/pptx_adapter.py` around lines 1741-1745: the except block that catches `Exception as e` in the classmethod constructing a `Presentation` should re-raise using exception chaining to preserve the original traceback; change the raise to `raise ValueError(f"Could not load presentation from {presentation_id}: {e}") from e` so the original exception is linked to the new `ValueError`.

gslides_api/adapters/pptx_adapter.py-690-696 (1)
690-696: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefixes.

These f-strings have no placeholders and should be regular strings.

🐛 Proposed fix
```diff
         parent.remove(self.pptx_element._element)
-        logger.debug(f"create_image_element_like: removed element from parent")
+        logger.debug("create_image_element_like: removed element from parent")
     else:
-        logger.warning(f"create_image_element_like: element has no parent")
+        logger.warning("create_image_element_like: element has no parent")
 else:
-    logger.warning(f"create_image_element_like: element has no _element or getparent")
+    logger.warning("create_image_element_like: element has no _element or getparent")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gslides_api/adapters/pptx_adapter.py` around lines 690 - 696, The log calls inside create_image_element_like use f-strings without any placeholders (logger.debug(f"create_image_element_like: removed element from parent"), logger.warning(f"create_image_element_like: element has no parent"), logger.warning(f"create_image_element_like: element has no _element or getparent")), change them to plain string literals by removing the unnecessary f-prefix on those messages so they are simple quoted strings; update the three logger calls in create_image_element_like accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6dba22a4-a508-45d3-8706-53126d8f6b4f
⛔ Files ignored due to path filters (3)
- `.claude/skills/prepare-presentation/placeholder.png` is excluded by `!**/*.png`
- `poetry.lock` is excluded by `!**/*.lock`
- `tests/test_pptx/Samplead Master Deck Template.pptx` is excluded by `!**/*.pptx`
📒 Files selected for processing (62)
- .claude/skills/prepare-presentation/SKILL.md
- .claude/skills/prepare-presentation/prepare.py
- gslides_api/adapters/__init__.py
- gslides_api/adapters/abstract_slides.py
- gslides_api/adapters/add_names.py
- gslides_api/adapters/gslides_adapter.py
- gslides_api/adapters/html_adapter.py
- gslides_api/adapters/markdown_to_html.py
- gslides_api/adapters/pptx_adapter.py
- gslides_api/agnostic/element_size.py
- gslides_api/common/__init__.py
- gslides_api/common/download.py
- gslides_api/common/google_errors.py
- gslides_api/common/log_time.py
- gslides_api/common/presentation_id.py
- gslides_api/common/retry.py
- gslides_api/domain/domain.py
- gslides_api/element/base.py
- gslides_api/element/image.py
- gslides_api/mcp/__init__.py
- gslides_api/mcp/models.py
- gslides_api/mcp/server.py
- gslides_api/mcp/utils.py
- gslides_api/pptx/__init__.py
- gslides_api/pptx/chart_renderer.py
- gslides_api/pptx/converters.py
- gslides_api/pptx/id_manager.py
- gslides_api/pptx/markdown_to_pptx.py
- gslides_api/pptx/relationship_copier.py
- gslides_api/pptx/shape_copier.py
- gslides_api/pptx/slide_copier.py
- gslides_api/pptx/slide_deleter.py
- gslides_api/pptx/xml_utils.py
- playground/table_games.py
- pyproject.toml
- tests/mcp_tests/test_add_element_names.py
- tests/mcp_tests/test_copy_presentation.py
- tests/mcp_tests/test_models.py
- tests/mcp_tests/test_new_tools.py
- tests/test_absolute_size_position.py
- tests/test_adapters/__init__.py
- tests/test_adapters/test_abstract_slide_markdown.py
- tests/test_adapters/test_gslides_adapter_discriminated_union.py
- tests/test_adapters/test_gslides_adapter_write_text.py
- tests/test_page_element.py
- tests/test_pptx/__init__.py
- tests/test_pptx/test_markdown_to_pptx.py
- tests/test_pptx/test_pptx_adapter.py
- tests/test_pptx/test_pptx_adapter_slides.py
- tests/test_pptx/test_pptx_autoscale.py
- tests/test_pptx/test_pptx_bullets.py
- tests/test_pptx/test_pptx_converters.py
- tests/test_pptx/test_pptx_integration.py
- tests/test_pptx/test_pptx_roundtrip.py
- tests/test_pptx/test_pptx_shape_styling.py
- tests/test_pptx/test_pptx_slide_copying.py
- tests/test_pptx/test_pptx_table_roundtrip.py
- tests/test_pptx/test_pptx_text_extraction.py
- tests/test_pptx/test_pptx_text_frame_markdown.py
- tests/test_pptx/test_pptx_write_markdown.py
- tests/test_pptx/test_table_adapter_interface.py
- tests/test_replace_image_preserves_alt_text.py
💤 Files with no reviewable changes (1)
- gslides_api/mcp/__init__.py
```python
# Convert pixels to requested units (assuming 96 DPI for px to inches)
if units == OutputUnit.IN or units == "in":
    return (width_px / 96.0, height_px / 96.0)
elif units == OutputUnit.PX or units == "px":
    return (width_px, height_px)
else:
    return (width_px / 96.0, height_px / 96.0)
```
`absolute_size()` breaks for non-inch callers.

`OutputUnit` only defines `EMU`, `IN`, `CM`, and `PT`, so evaluating `OutputUnit.PX` raises. Even after that, the fallback returns inches while labeling them as the requested unit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gslides_api/adapters/html_adapter.py` around lines 467 - 473, absolute_size()
currently assumes OutputUnit contains PX and falls back to inches, causing
AttributeError and incorrect unit conversions; update the absolute_size function
to handle all OutputUnit members (EMU, IN, CM, PT) and accept string aliases
("in","cm","pt","px") and optional "px" conversion using 96 DPI. Implement
conversions: EMU -> inches (1 inch = 914400 EMU) or compute directly to
requested unit, CM -> inches (1 in = 2.54 cm), PT -> inches (1 in = 72 pt), PX
-> inches (96 px = 1 in); return sizes in the true requested unit rather than
defaulting to inches, and raise a clear ValueError for unsupported units.
Reference: function absolute_size and enum OutputUnit.
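The conversion table the prompt describes can be sketched as a standalone helper. The `OutputUnit` enum below is an illustrative stand-in (the library's real enum lacks a PX member, which is the bug being flagged), and the constants are the standard ones: 914400 EMU, 2.54 cm, 72 pt, and 96 px per inch:

```python
from enum import Enum


class OutputUnit(str, Enum):
    """Illustrative stand-in for the library's enum (EMU, IN, CM, PT only)."""
    EMU = "emu"
    IN = "in"
    CM = "cm"
    PT = "pt"


# Units per inch; "px" is accepted only as a string alias, at 96 DPI.
PER_INCH = {"emu": 914400.0, "in": 1.0, "cm": 2.54, "pt": 72.0, "px": 96.0}


def absolute_size_px(width_px: float, height_px: float, units) -> tuple:
    """Convert a pixel size to the requested unit via inches (96 px = 1 in)."""
    key = units.value if isinstance(units, OutputUnit) else str(units)
    if key not in PER_INCH:
        raise ValueError(f"Unsupported unit: {units!r}")
    factor = PER_INCH[key] / 96.0  # px -> inches -> target unit
    return (width_px * factor, height_px * factor)


print(absolute_size_px(96, 192, OutputUnit.IN))  # (1.0, 2.0)
print(absolute_size_px(96, 192, "cm"))           # (2.54, 5.08)
```

Unlike the flagged fallback, every supported unit is converted explicitly and unknown units fail loudly instead of silently returning inches.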
```python
# Save the HTML
with open(html_file, "w", encoding="utf-8") as f:
    f.write(self.html_soup.prettify())
```
🧩 Analysis chain
🏁 Script executed:

```shell
find . -name "html_adapter.py" -type f | head -20
```

Repository: MotleyAI/gslides-api
Length of output: 102

🏁 Script executed:

```shell
wc -l gslides_api/adapters/html_adapter.py
```

Repository: MotleyAI/gslides-api
Length of output: 105

🏁 Script executed:

```shell
sed -n '1360,1375p' gslides_api/adapters/html_adapter.py | cat -n
```

Repository: MotleyAI/gslides-api
Length of output: 641

🏁 Script executed:

```shell
sed -n '1340,1380p' gslides_api/adapters/html_adapter.py | cat -n
```

Repository: MotleyAI/gslides-api
Length of output: 1838

🏁 Script executed:

```shell
rg "prettify\(\)" --type py gslides_api/
```

Repository: MotleyAI/gslides-api
Length of output: 147

🏁 Script executed:

```shell
rg "html_soup" --type py gslides_api/adapters/html_adapter.py | head -30
```

Repository: MotleyAI/gslides-api
Length of output: 1535
🏁 Script executed:

```shell
python3 << 'EOF'
from bs4 import BeautifulSoup

# Test 1: Simple HTML
html = '<html><head><title>Test</title></head><body><p>Hello</p></body></html>'
soup = BeautifulSoup(html, 'lxml')
print("=== Test 1: Basic HTML ===")
print("str() output length:", len(str(soup)))
print("prettify() output length:", len(soup.prettify()))
print("Are they identical?", str(soup) == soup.prettify())
print()

# Test 2: Edited soup (similar to what happens in the app)
soup2 = BeautifulSoup(html, 'lxml')
# Simulate editing - adding a section
new_section = soup2.new_tag('section')
new_section.string = 'New content'
soup2.body.append(new_section)
str_version = str(soup2)
pretty_version = soup2.prettify()
print("=== Test 2: After editing (adding section) ===")
print("str() length:", len(str_version))
print("prettify() length:", len(pretty_version))
print("Identical?", str_version == pretty_version)
print()

# Test 3: Check if re-parsing gives same DOM
print("=== Test 3: Re-parsing both versions ===")
soup_from_str = BeautifulSoup(str_version, 'lxml')
soup_from_pretty = BeautifulSoup(pretty_version, 'lxml')
print("DOM structure identical after re-parsing?", str(soup_from_str) == str(soup_from_pretty))
print()

# Test 4: Check actual diffs
print("=== Test 4: First 300 chars of each ===")
print("str():")
print(str_version[:300])
print("\nprettify():")
print(pretty_version[:300])
EOF
```

Repository: MotleyAI/gslides-api
Length of output: 392
🌐 Web query:
BeautifulSoup prettify whitespace semantic HTML meaning
💡 Result:
BeautifulSoup.prettify() re-serializes the parse tree and inserts formatting whitespace (newlines + indentation) between and inside tags to make the markup readable. That whitespace is not just “cosmetic”: it becomes real text nodes in the output, so it can change the document’s meaning/rendering (especially around inline elements, or inside whitespace-sensitive contexts like <pre>/<textarea>, and sometimes code-like inline content). Beautiful Soup’s own docs explicitly warn that because prettify() adds whitespace, it can change an HTML document’s meaning and shouldn’t be used as a safe “HTML reformatter.” [1][2]
If you only want to protect certain tags from being reformatted, Beautiful Soup supports preserve_whitespace_tags (tags in this list are output “as-is,” i.e., immune from pretty-printing), but there isn’t a general “indent-only without adding newlines” mode. [1]
🏁 Script executed:
grep -r "round.trip\|re.*pars\|preserve.*html" gslides_api/ --include="*.py" | head -20Repository: MotleyAI/gslides-api
Length of output: 2011
Use `str(self.html_soup)` instead of `prettify()` when saving HTML.

`BeautifulSoup.prettify()` adds formatting whitespace (newlines and indentation) that becomes part of the serialized output. This can change the document's meaning, particularly for inline elements, whitespace-sensitive content (like `<pre>` tags), and code-like inline text. When saving HTML for re-parsing, use `str()` to serialize without introducing formatting changes.
Suggested fix

```diff
 # Save the HTML
 with open(html_file, "w", encoding="utf-8") as f:
-    f.write(self.html_soup.prettify())
+    f.write(str(self.html_soup))
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# Save the HTML
with open(html_file, "w", encoding="utf-8") as f:
    f.write(str(self.html_soup))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gslides_api/adapters/html_adapter.py` around lines 1366 - 1368, Replace the
use of BeautifulSoup.prettify() when writing out the saved HTML to avoid
injecting formatting whitespace: in the block that opens html_file and writes
the document, use str(self.html_soup) instead of self.html_soup.prettify() so
the serialized output preserves original whitespace-sensitive content (refer to
self.html_soup and html_file in the write/save code path).
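The hazard is not specific to BeautifulSoup: any pretty-printer that injects indentation creates real text nodes. A stdlib demonstration with `xml.dom.minidom` (used here only to illustrate the effect, not as a replacement for the adapter's bs4 code):

```python
from xml.dom import minidom


def direct_text(node) -> str:
    """Concatenate the immediate text children of a node."""
    return "".join(n.data for n in node.childNodes if n.nodeType == n.TEXT_NODE)


doc = minidom.parseString("<p><b>bold</b>text</p>")
original = direct_text(doc.getElementsByTagName("p")[0])

# Pretty-print, then re-parse: the indentation has become part of the text.
reparsed = minidom.parseString(doc.toprettyxml())
pretty = direct_text(reparsed.getElementsByTagName("p")[0])

print(repr(original))           # 'text'
print(original == pretty)       # False: whitespace was injected
```

This is exactly why round-tripping through `prettify()` is unsafe: the re-parsed DOM's text content no longer matches the original.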
```python
# Deep copy the section
new_section = copy.deepcopy(source_slide.html_section)

# Generate a unique ID for the copied section to avoid ID collisions
unique_id = f"slide-{uuid.uuid4().hex[:8]}"
new_section["id"] = unique_id
```
Cloned slides need fresh ids all the way down.

`insert_copy()` only changes the `<section>` id, and `duplicate_slide()` doesn't even do that. Reusing copied `id`/`data-element-id` attributes makes the DOM invalid and can send later selectors/element lookups to the wrong slide.
Also applies to: 1471-1476
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gslides_api/adapters/html_adapter.py` around lines 1380 - 1385, insert_copy()
currently only replaces the top-level section id (and duplicate_slide()
doesn't), so you must traverse the deep-copied new_section (from
source_slide.html_section) and replace any DOM identifiers inside it with fresh
unique values: update "id" attributes and "data-element-id" attributes on the
section and all descendant nodes (use uuid.uuid4().hex[:8] or the existing
unique_id pattern to generate new tokens), and also update any attributes that
reference those ids (e.g., "for", "aria-labelledby", "aria-describedby") to
point to the new ids; apply the same recursive/id-rewriting logic in
duplicate_slide() so cloned slides never retain original ids or
data-element-ids.
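The recursive rewrite the prompt describes can be sketched with stdlib `ElementTree` standing in for the BeautifulSoup section; the attribute names follow the review comment, and the real adapter code may differ:

```python
import uuid
import xml.etree.ElementTree as ET


def freshen_ids(section: ET.Element) -> dict:
    """Give every id/data-element-id in the subtree a fresh unique value,
    then remap attributes that reference the old ids."""
    id_map = {}
    # Pass 1: rewrite identifier attributes on the section and all descendants.
    for node in section.iter():
        for attr in ("id", "data-element-id"):
            old = node.get(attr)
            if old is not None:
                new = id_map.setdefault(old, f"slide-{uuid.uuid4().hex[:8]}")
                node.set(attr, new)
    # Pass 2: fix attributes that reference the old ids.
    for node in section.iter():
        for attr in ("for", "aria-labelledby", "aria-describedby"):
            ref = node.get(attr)
            if ref in id_map:
                node.set(attr, id_map[ref])
    return id_map


clone = ET.fromstring(
    '<section id="s1"><div id="e1" data-element-id="e1"/>'
    '<label for="e1">caption</label></section>'
)
mapping = freshen_ids(clone)
assert clone.get("id") != "s1"
assert clone.find("label").get("for") == clone.find("div").get("id")
```

The two-pass structure matters: references can only be remapped once every old id has been assigned its replacement.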
```python
# TODO: Implement actual XML-based slide reordering

# For now, just update our local slides list
self.slides.remove(slide)
self.slides.insert(insertion_index, slide)
```
`move_slide` implementation is incomplete.

The method updates the local slides list but the TODO indicates actual XML-based slide reordering is not implemented. This could lead to inconsistency between in-memory state and the saved file.
Do you want me to help implement the XML-based slide reordering, or should this be tracked as a separate issue?
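At the XML level, reordering ultimately comes down to moving one child element among its siblings (for a .pptx, a common approach is to reorder the `<p:sldId>` children of `<p:sldIdLst>` in presentation.xml, since python-pptx exposes no public reorder API; the element names here are simplified for illustration). A stdlib sketch of the core move, mirroring the list update above:

```python
import xml.etree.ElementTree as ET


def move_child(parent: ET.Element, old_index: int, new_index: int) -> None:
    """Reorder one child among its siblings, mirroring the in-memory list update."""
    child = list(parent)[old_index]
    parent.remove(child)
    parent.insert(new_index, child)


lst = ET.fromstring('<sldIdLst><sldId id="a"/><sldId id="b"/><sldId id="c"/></sldIdLst>')
move_child(lst, 2, 0)  # move slide "c" to the front
print([e.get("id") for e in lst])  # ['c', 'a', 'b']
```

Keeping the XML move and the `self.slides` list update in the same method is what closes the consistency gap the TODO leaves open.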
```python
class ElementSizeMeta(BaseModel):
    """Presentation element size metadata, captured during layout ingestion."""

    box_width_inches: float
    box_height_inches: float
    font_size_pt: float

    @computed_field
    @property
    def approx_char_capacity(self) -> int:
        """Estimate how many characters fit in this textbox."""
        char_width_in = self.font_size_pt * 0.5 / 72
        line_height_in = self.font_size_pt * 1.2 / 72
        if char_width_in <= 0 or line_height_in <= 0:
            return 0
        chars_per_line = self.box_width_inches / char_width_in
        num_lines = self.box_height_inches / line_height_in
        return int(chars_per_line * num_lines * 0.85)
```
🛠️ Refactor suggestion | 🟠 Major
Add `to_api_format()` for model interoperability.

This model is missing the standard API conversion method expected for domain-model structures.

Suggested addition
```diff
 class ElementSizeMeta(BaseModel):
@@
     def approx_char_capacity(self) -> int:
@@
         return int(chars_per_line * num_lines * 0.85)
+
+    def to_api_format(self) -> dict[str, float]:
+        return {
+            "boxWidthInches": self.box_width_inches,
+            "boxHeightInches": self.box_height_inches,
+            "fontSizePt": self.font_size_pt,
+            "approxCharCapacity": float(self.approx_char_capacity),
+        }
```

As per coding guidelines: "Provide to_api_format() method on all domain models for Google API compatibility".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
class ElementSizeMeta(BaseModel):
    """Presentation element size metadata, captured during layout ingestion."""

    box_width_inches: float
    box_height_inches: float
    font_size_pt: float

    @computed_field
    @property
    def approx_char_capacity(self) -> int:
        """Estimate how many characters fit in this textbox."""
        char_width_in = self.font_size_pt * 0.5 / 72
        line_height_in = self.font_size_pt * 1.2 / 72
        if char_width_in <= 0 or line_height_in <= 0:
            return 0
        chars_per_line = self.box_width_inches / char_width_in
        num_lines = self.box_height_inches / line_height_in
        return int(chars_per_line * num_lines * 0.85)

    def to_api_format(self) -> dict[str, float]:
        return {
            "boxWidthInches": self.box_width_inches,
            "boxHeightInches": self.box_height_inches,
            "fontSizePt": self.font_size_pt,
            "approxCharCapacity": float(self.approx_char_capacity),
        }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gslides_api/agnostic/element_size.py` around lines 4 - 21, Add a
to_api_format() instance method to ElementSizeMeta that returns a plain dict
suitable for the Google API (convert fields to the expected API keys and include
the computed approx_char_capacity); locate the ElementSizeMeta class and
implement to_api_format() to map box_width_inches, box_height_inches,
font_size_pt, and approx_char_capacity into the API-friendly names (e.g.,
camelCase) and return them as primitives so other code can call
element.to_api_format() for interoperability.
```python
# Update ID in cNvPr element if this is a shape and new_id is provided
if new_id is not None:
    cnv_pr_elements = new_element.xpath('.//p:cNvPr', namespaces=cls.NAMESPACES)
    for cnv_pr in cnv_pr_elements:
        cnv_pr.set('id', str(new_id))
```
Don't assign the same cNvPr.id to every descendant.
The .//p:cNvPr XPath also hits child shapes inside groups, so copying a grouped element collapses multiple shape ids to one new_id. That's exactly the kind of duplication that triggers PowerPoint repair.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gslides_api/pptx/xml_utils.py` around lines 73 - 77, The bug is that
new_element.xpath('.//p:cNvPr', namespaces=cls.NAMESPACES) updates cNvPr on all
descendant shapes (collapsing IDs); restrict the XPath to only the immediate
cNvPr of the shape being copied (e.g., use './p:nvSpPr/p:cNvPr' with the same
cls.NAMESPACES) and set its 'id' to str(new_id) for that single element (refer
to new_element, cnv_pr elements, and new_id).
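The collapse can be demonstrated with stdlib ElementTree — `.//` descends into grouped child shapes, while a path scoped to the shape's own non-visual properties touches only one `cNvPr`. The XML below is illustrative OOXML, not a project fixture:

```python
import xml.etree.ElementTree as ET

P = "http://schemas.openxmlformats.org/presentationml/2006/main"
NS = {"p": P}

# A group shape containing one child shape; both carry a cNvPr id.
group_xml = (
    f'<p:grpSp xmlns:p="{P}">'
    '<p:nvGrpSpPr><p:cNvPr id="10" name="group"/></p:nvGrpSpPr>'
    '<p:sp><p:nvSpPr><p:cNvPr id="11" name="child"/></p:nvSpPr></p:sp>'
    '</p:grpSp>'
)
group = ET.fromstring(group_xml)

# Descendant search hits both ids -- renumbering them all collapses them to one value.
all_cnv_pr = group.findall(".//p:cNvPr", NS)

# Scoped search touches only the group's own cNvPr, leaving child shape ids intact.
own_cnv_pr = group.findall("./p:nvGrpSpPr/p:cNvPr", NS)
```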
```python
# Find and clean relationship references
rel_id_xpath = './/@r:id'
rel_id_attrs = element.xpath(rel_id_xpath, namespaces=cls.NAMESPACES)

for attr in rel_id_attrs:
    # For now, we'll clear the relationship ID
    # The RelationshipCopier will handle rebuilding them
    if hasattr(attr, 'getparent'):
        parent = attr.getparent()
        if parent is not None:
            # Remove the r:id attribute temporarily
            parent.attrib.pop(f"{{{cls.NAMESPACES['r']}}}id", None)
```
The fallback cleanup leaves image/chart relationships behind.
When relationship_mapping is absent, copy_shape_element() falls back to clean_element_relationships(), but that helper only removes r:id. Most media links live on r:embed or r:link, so the cloned XML can still point at nonexistent relationships.
Suggested fix

```diff
-        # Find and clean relationship references
-        rel_id_xpath = './/@r:id'
-        rel_id_attrs = element.xpath(rel_id_xpath, namespaces=cls.NAMESPACES)
-
-        for attr in rel_id_attrs:
-            # For now, we'll clear the relationship ID
-            # The RelationshipCopier will handle rebuilding them
-            if hasattr(attr, 'getparent'):
-                parent = attr.getparent()
-                if parent is not None:
-                    # Remove the r:id attribute temporarily
-                    parent.attrib.pop(f"{{{cls.NAMESPACES['r']}}}id", None)
+        rel_attrs = (
+            f"{{{cls.NAMESPACES['r']}}}id",
+            f"{{{cls.NAMESPACES['r']}}}embed",
+            f"{{{cls.NAMESPACES['r']}}}link",
+        )
+        for elem in element.iter():
+            for attr_name in rel_attrs:
+                elem.attrib.pop(attr_name, None)
```

Also applies to: 351-358
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gslides_api/pptx/xml_utils.py` around lines 250 - 262,
clean_element_relationships currently only strips r:id, leaving media
relationship attributes like r:embed and r:link on cloned shapes so copied XML
still references missing relationships; update clean_element_relationships (used
by copy_shape_element when relationship_mapping is absent) to also remove
attributes for r:embed and r:link (using cls.NAMESPACES['r']) from the attribute
parent elements, and make the same change in the other analogous cleanup block
referenced in the diff (the second cleanup around the other code region) so
RelationshipCopier can rebuild relationships cleanly.
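A minimal stdlib sketch of that broader cleanup — stripping `r:id`, `r:embed`, and `r:link` from an element and all descendants. The helper name and sample XML are illustrative, independent of the project's lxml-based utilities:

```python
import xml.etree.ElementTree as ET

R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

def strip_relationship_attrs(element: ET.Element) -> int:
    """Remove r:id, r:embed, and r:link from an element and all descendants."""
    rel_attrs = (f"{{{R}}}id", f"{{{R}}}embed", f"{{{R}}}link")
    removed = 0
    for elem in element.iter():
        for name in rel_attrs:
            if elem.attrib.pop(name, None) is not None:
                removed += 1
    return removed

pic = ET.fromstring(
    f'<pic xmlns:r="{R}">'
    '<blipFill><blip r:embed="rId5" r:link="rId6"/></blipFill>'
    '<hlinkClick r:id="rId7"/>'
    '</pic>'
)
count = strip_relationship_attrs(pic)  # clears all three relationship references
```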
```python
try:
    table_element.resize(api_client, rows=4, cols=4)
except Exception as e:
    # Table resizing might fail due to python-pptx limitations
    print(f"Table resize failed (expected): {e}")

# Test content update (if we have MarkdownTableElement support)
try:
    # Create markdown table data
    table_data = [
        ["Product", "Price", "Stock", "Category"],
        ["Laptop", "$999", "50", "Electronics"],
        ["Phone", "$599", "100", "Electronics"],
        ["Book", "$25", "200", "Education"]
    ]

    # This would need MarkdownTableElement implementation
    # For now, just verify the table element exists
    assert table_element.pptx_element is not None

except Exception as e:
    print(f"Table content update test skipped: {e}")
```
Exercise update_content() here instead of silently skipping it.
This block never calls table_element.update_content(...); it only builds table_data, asserts that the raw PPTX element exists, and swallows failures. A regression in table markdown application will still leave this integration test green.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 195-195: Do not catch blind exception: Exception
(BLE001)
[warning] 213-213: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pptx/test_pptx_integration.py` around lines 193 - 214, Replace the
silent "skipped" block with an actual invocation of table_element.update_content
using the constructed table_data (or the appropriate MarkdownTableElement API)
and assert the content was applied (e.g., verify table_element.pptx_element cell
text or a returned success value), rather than only asserting pptx_element is
not None; remove the broad except that swallows failures so test failures
surface, and if the implementation truly isn't available guard with a clear
skip/assertRaises for update_content instead of printing and continuing.
```python
try:
    duplicated_slide = first_slide.duplicate(api_client)
    assert isinstance(duplicated_slide, PowerPointSlide)

    # Note: The duplicated slide might not automatically be added to the presentation
    # depending on the implementation, so we might not see an increase in slide count
    print(f"Original slides: {original_slide_count}")
    print(f"Duplicated slide created: {duplicated_slide.objectId}")

except Exception as e:
    # Slide duplication might have limitations in python-pptx
    print(f"Slide duplication test failed (might be expected): {e}")
```
Don't turn slide-duplication failures into a passing test.
If first_slide.duplicate() breaks, this test still passes because the exception is printed and discarded. Either assert the happy path or explicitly xfail a documented unsupported case.
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 287-287: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pptx/test_pptx_integration.py` around lines 278 - 289, The test
currently swallows exceptions from first_slide.duplicate(api_client) making
failures pass; update the block so the failure becomes explicit: either assert
the happy path by removing the try/except and using assert
isinstance(duplicated_slide, PowerPointSlide) (and optionally assert slide
count/inclusion), or catch exceptions and call pytest.xfail("slide duplication
not supported: <error>") with the caught exception message; ensure you reference
first_slide.duplicate and PowerPointSlide and import pytest if using xfail, and
do not simply print and discard the exception.
```python
# Find table elements
table_elements = [elem for elem in table_slide.elements if elem.type == "table"]

if table_elements:
    table_element = table_elements[0]
    # Basic verification that table structure is accessible
    assert table_element is not None
    assert hasattr(table_element, 'pptx_element')
```
This "roundtrip" test never checks table data.
It only proves that some table-like element exists. A broken parser or empty table would still pass; assert the extracted cell values or to_markdown_element().content after reload.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pptx/test_pptx_roundtrip.py` around lines 232 - 240, The test
currently only checks that a table element exists (table_elements /
table_element) but never verifies cell data; update the assertions to validate
the parsed table content by extracting the table's cell values or markdown
content from table_element (for example call
table_element.to_markdown_element().content and/or inspect
table_element.pptx_element cell/text properties) and assert they match the
expected cell strings from the source slide (e.g., check a few known cells or
that the markdown content contains expected headers/rows) so the roundtrip
actually verifies table data integrity.
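As an illustration of the stronger assertion, the reloaded table's markdown can be compared cell-by-cell against the source rows. The helper below is a self-contained stand-in; in the real test the markdown would come from `table_element.to_markdown_element().content` (a name taken from this review's suggestion, not verified against the codebase):

```python
def rows_to_markdown(rows: list[list[str]]) -> str:
    """Render rows as a GitHub-style markdown table (header + separator + body)."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

expected = [
    ["Product", "Price"],
    ["Laptop", "$999"],
]
# In the real roundtrip test, compare this against the content extracted
# from the reloaded presentation instead of merely asserting existence.
markdown = rows_to_markdown(expected)
```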
Summary by CodeRabbit

Release Notes

New Features
- `prepare-presentation` CLI tool for presentation workflow management

Improvements