Fix UnicodeMapper and OpenXmlRegex bugs regarding lastRenderedPageBreak #178
Some DOCX files contain <w:lastRenderedPageBreak/> elements, which seem to be Word's way of indicating that "the last time I calculated the pagination for this document, a page break was here." While this might be useful for some applications, the element does NOT represent any actual visible or editable content in the document.
UnicodeMapper does not recognize the element and renders it as a U+0001 control character, which in turn breaks the behavior of OpenXmlRegex. The same UnicodeMapper issue also breaks DocumentAssembler when the <w:lastRenderedPageBreak/> happens to fall within the contents of a field: the control character becomes part of the XML that DocumentAssembler is trying to parse, and it throws an exception. Fixing UnicodeMapper resolves that DocumentAssembler exception but exposes a related issue in OpenXmlRegex (which DocumentAssembler also uses), so both must be fixed at the same time.
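To illustrate the failure mode outside the library, here is a minimal Python sketch. It is not the UnicodeMapper implementation: `run_to_text`, `skip_layout_hints`, and the sample field text are hypothetical names and data made up for this example. It assumes a mapper that falls back to U+0001 for any run child it does not recognize, and shows why that fallback makes field contents unparseable as XML.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Placeholder emitted for run children the mapper does not recognize
# (mirrors the U+0001 behavior described above).
UNKNOWN = "\u0001"


def run_to_text(run, skip_layout_hints=True):
    """Hypothetical stand-in for a run-to-string mapper (not library code)."""
    parts = []
    for child in run:
        if child.tag == f"{{{W_NS}}}t":
            parts.append(child.text or "")
        elif child.tag == f"{{{W_NS}}}rPr":
            continue  # run properties carry formatting, not text
        elif child.tag == f"{{{W_NS}}}lastRenderedPageBreak" and skip_layout_hints:
            continue  # pagination hint only; no visible or editable content
        else:
            parts.append(UNKNOWN)
    return "".join(parts)


# A run whose text is an XML snippet (as in a template field), with a
# pagination hint in the middle. The field text is invented sample data.
RUN_XML = (
    f'<w:r xmlns:w="{W_NS}">'
    '<w:t>&lt;Content Select="./</w:t>'
    "<w:lastRenderedPageBreak/>"
    '<w:t>Name"/&gt;</w:t>'
    "</w:r>"
)

run = ET.fromstring(RUN_XML)

fixed = run_to_text(run)  # hint skipped: clean field text
broken = run_to_text(run, skip_layout_hints=False)  # hint becomes U+0001

ET.fromstring(fixed)  # parses as XML
try:
    ET.fromstring(broken)
except ET.ParseError as exc:
    print("field text is not parseable:", exc)
```

Skipping the element entirely, rather than mapping it to some other placeholder, keeps the extracted string identical to what the user sees in the document, which is also what any pattern matching over that string would need.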
I have added test cases that highlight specific failure cases and then fixed the bugs so the test cases pass.