Modalities expects JSONL to have a trailing \n for the last JSON document (best practice but not guaranteed by JSONL file format) #395

@le1nux

System Info

current master version (cb096c2) and probably all previous versions.

🐛 Describe the bug

Modalities currently expects that each JSONL file ends with a trailing \n.

While a trailing \n after every JSON document is best practice, the JSONL format does not require the last JSON document to end with \n.

https://jsonlines.org/

Including a line terminator after the last JSON value in a file is strongly recommended but not required.
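
As a stopgap on the data side, a file can be normalized before it is handed to Modalities. The following is a minimal sketch, not part of Modalities; the helper name `ensure_trailing_newline` is hypothetical, and it assumes the file is a regular file using \n as its line terminator.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the non-empty file does not already end with one.

    Hypothetical helper for illustration. Returns True if a newline was appended.
    """
    if os.path.getsize(path) == 0:
        # An empty file contains no documents, so there is nothing to terminate.
        return False
    with open(path, "rb+") as f:
        # Inspect only the final byte instead of reading the whole file.
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False
        # Explicitly move to the end before switching from reading to writing.
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True
```

Calling the helper a second time on the same file is a no-op, so it is safe to run over a whole dataset directory.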

In Modalities, this can lead to, e.g., two documents being concatenated when shuffling (the last document plus some non-last document). For large files the effect is probably negligible, but for consistency we should fix this issue.
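
The concatenation can be reproduced outside of Modalities with a few lines of Python. This is only a sketch of the failure mode and of one possible fix: the functions `reorder_naive` and `reorder_robust` are hypothetical and do not mirror the actual Modalities shuffling code, and a fixed permutation stands in for the shuffle so the result is deterministic.

```python
import io
from typing import List


def reorder_naive(raw: str, order: List[int]) -> str:
    """Reorder lines while trusting every line to carry its own newline."""
    # Lines are split on "\n" only and keep their terminator, if any.
    lines = io.StringIO(raw).readlines()
    return "".join(lines[i] for i in order)


def reorder_robust(raw: str, order: List[int]) -> str:
    """Strip terminators first, then re-add exactly one newline per document."""
    docs = [line.rstrip("\r\n") for line in io.StringIO(raw) if line.strip()]
    return "".join(docs[i] + "\n" for i in order)


# The last document has no trailing newline, which JSONL permits.
raw = '{"id": 0}\n{"id": 1}\n{"id": 2}'
order = [2, 0, 1]  # the last document moves to the front

print(repr(reorder_naive(raw, order)))
# '{"id": 2}{"id": 0}\n{"id": 1}\n'   (two documents fused on one line)
print(repr(reorder_robust(raw, order)))
# '{"id": 2}\n{"id": 0}\n{"id": 1}\n'
```

Normalizing terminators at read time, as in `reorder_robust`, makes the result independent of whether the input file ends with \n.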

Labels: bug (Something isn't working)
