[GCC 8.5.0 20210514 (Red Hat 8.5.0-24)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyreadstat as prs
>>> d,m=prs.read_dta("test.dta")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyreadstat/pyreadstat.pyx", line 296, in pyreadstat.pyreadstat.read_dta
File "pyreadstat/_readstat_parser.pyx", line 1282, in pyreadstat._readstat_parser.run_conversion
File "pyreadstat/_readstat_parser.pyx", line 955, in pyreadstat._readstat_parser.run_readstat_parser
File "pyreadstat/_readstat_parser.pyx", line 877, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to allocate memory
>>>
From my investigation, the failure originates at L451 of readstat_dta_read.c, inside the dta_read_strls() function, which allocates memory for each string separately in a while loop. Later, at L445, the code cannot allocate a large contiguous chunk of memory because the heap is heavily fragmented.
With the reproducible example https://www.dropbox.com/scl/fi/sx9cz7vjekvud3ail9ph3/test.dta?rlkey=7e5qmwl9tbuoa0967kq3uq65f&st=g3wxulnc&dl=0,
L451 (the per-string malloc) was executed approximately 1.6 million times. After that, L445 failed to allocate 26 MB of contiguous heap memory.
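To illustrate one possible way around the one-allocation-per-string pattern, here is a minimal sketch of a chunked string pool. It is written in Python only to model the idea and is not readstat's code: `StringPool`, `add`, `get`, and `chunk_size` are hypothetical names. The assumption is that packing strings into a few fixed-size chunks would both cut the number of allocations and avoid the need for one very large contiguous block.

```python
class StringPool:
    """Hypothetical chunked pool: packs many byte strings into a few chunks."""

    def __init__(self, chunk_size=1 << 20):
        self.chunk_size = chunk_size
        self.chunks = []  # list of bytearray, each holding many strings
        self.index = []   # one (chunk number, offset, length) entry per string

    def add(self, data: bytes) -> int:
        """Store data and return a handle that can be passed to get()."""
        last = self.chunks[-1] if self.chunks else None
        # Open a new chunk if none exists, or if the current chunk already
        # holds data and cannot fit this string as well.
        if last is None or (len(last) > 0 and len(last) + len(data) > self.chunk_size):
            last = bytearray()
            self.chunks.append(last)
        self.index.append((len(self.chunks) - 1, len(last), len(data)))
        last.extend(data)
        return len(self.index) - 1

    def get(self, handle: int) -> bytes:
        """Return a copy of the string stored under the given handle."""
        chunk_no, offset, length = self.index[handle]
        return bytes(self.chunks[chunk_no][offset:offset + length])


# Tiny chunk size so the example spills into a second chunk.
pool = StringPool(chunk_size=16)
handles = [pool.add(s) for s in (b"alpha", b"beta", b"gamma", b"delta-epsilon")]
print(len(pool.chunks), pool.get(handles[3]))
```

Whether this maps cleanly onto dta_read_strls() depends on how the parser looks strings up later, so treat it as a starting point for discussion rather than a proposed patch.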