Describe the bug
Importing a JSON file that contains non-ASCII Unicode characters with import_util.json fails.
To Reproduce
- Add an example JSON file containing ë.
example.json
[
{
"id": 7902,
"labels": [
"Foo"
],
"type": "node",
"properties": {
"name": "Categorieën"
}
}
]
- Import it
CALL import_util.json("/data/example.json")
- See error
import_util.json: Traceback (most recent call last):
File "/usr/lib/memgraph/query_modules/import_util.py", line 335, in json
graph_objects = js.load(file)
^^^^^^^^^^^^^
File "/usr/lib/python3.12/json/__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^
File "/usr/lib/python3.12/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 163: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/memgraph/query_modules/import_util.py", line 337, in json
raise OSError("Could not open/read file.")
OSError: Could not open/read file.
Expected behavior
It should be able to import files containing Unicode characters.
Additional context
The issue is caused by https://github.com/memgraph/mage/blob/c72865104f09b8b71b03f7b14136f72376332fa3/python/import_util.py#L334
try:
with open(path, "r") as file: # <- no encoding is given, so the locale default is used (ASCII in this environment), and any non-ASCII byte fails to decode
graph_objects = js.load(file)
except Exception:
raise OSError("Could not open/read file.")
This can probably be fixed by opening the file with an explicit encoding: with open(path, "r", encoding="utf-8") as file:. UTF-8 covers all Unicode characters and is backwards compatible with ASCII (every valid ASCII file is also valid UTF-8).
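A minimal, standalone sketch of the proposed change, for illustration only. The helper name load_json_utf8 and the demo function are hypothetical and not part of import_util.py. The demo forces encoding="ascii" to reproduce the decode error regardless of the local locale. Chaining the original exception with "from exc" is an extra suggestion, so the real cause is not hidden behind the generic message.

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Hypothetical helper: read a JSON file, decoding it explicitly as UTF-8."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    except (OSError, ValueError) as exc:
        # UnicodeDecodeError and json.JSONDecodeError are both ValueError subclasses.
        # Chain the original exception so the real cause stays visible.
        raise OSError(f"Could not open/read file: {path}") from exc


def demo():
    """Write a small UTF-8 file, show the ASCII failure, then load it as UTF-8."""
    payload = [
        {
            "id": 7902,
            "labels": ["Foo"],
            "type": "node",
            "properties": {"name": "Categorie\u00ebn"},
        }
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.json")
        with open(path, "w", encoding="utf-8") as file:
            json.dump(payload, file, ensure_ascii=False)

        # Forcing ASCII reproduces the decode error from the traceback.
        ascii_failed = False
        try:
            with open(path, "r", encoding="ascii") as file:
                json.load(file)
        except UnicodeDecodeError:
            ascii_failed = True

        return ascii_failed, load_json_utf8(path)
```

Calling demo() returns a flag showing whether the ASCII read failed, together with the objects loaded through the UTF-8 path.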