
Streaming multi-byte UTF8 characters not being parsed correctly #13

@jlank

When streaming data that contains multi-byte UTF-8 characters into jsonparse, if a chunk boundary splits a multi-byte character, jsonparse does not properly reassemble the character across data events. I wrote a quick demo repo to show this behavior and started writing a blog post to explain the issue in more detail (not finished). In the meantime, check out the demo repo; it has the current implementation and the proposed patch working. For more context on this issue, see this thread with @mikeal discussing where the "proper" place to reconcile / parse multi-byte UTF-8 characters is. I already have a proposed fix written up for jsonparse with test cases, but wanted to open an issue first and get your feedback before making a PR.
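For readers without the demo repo at hand, here is a minimal sketch of the underlying problem using only Node core modules. It is not the proposed jsonparse patch and does not call jsonparse at all. The sample payload, the split position, and the helper name `decodeChunks` are assumptions made for illustration. It decodes the same two chunks twice: once chunk by chunk, and once through `StringDecoder`, which holds back an incomplete byte sequence until the next chunk arrives.

```javascript
const { StringDecoder } = require('string_decoder');

// "é" is two bytes in UTF-8 (0xC3 0xA9). Cut the payload between those
// two bytes to imitate a chunk boundary landing inside the character.
const whole = Buffer.from('{"name":"é"}', 'utf8');
const splitAt = whole.indexOf(0xc3) + 1;
const chunks = [whole.subarray(0, splitAt), whole.subarray(splitAt)];

// Naive approach: decode every chunk on its own. Each half of the split
// character is invalid by itself and becomes U+FFFD.
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

// Buffered approach (hypothetical helper): StringDecoder keeps the
// trailing incomplete bytes and prepends them to the next write.
function decodeChunks(parts) {
  const decoder = new StringDecoder('utf8');
  let out = '';
  for (const part of parts) {
    out += decoder.write(part);
  }
  return out + decoder.end();
}

const buffered = decodeChunks(chunks);

console.log('naive matches original:   ', naive === whole.toString('utf8'));
console.log('buffered matches original:', buffered === whole.toString('utf8'));
```

Whether this buffering belongs inside the parser or in the stream feeding it is exactly the question raised in the thread mentioned above; the sketch only shows that the bytes have to be carried over somewhere.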

Thanks!
