…ecially for big files), so we include specialized bindings to libsodium and remove the pynacl dependency. The build must now be adapted so that it finds libsodium. Not yet decided whether to document the LDFLAGS/CFLAGS approach or to bundle a version of libsodium in crypt4gh
…on. No need for yaml, and less error-prone
Updating github action
Using libsodium as submodule. We use branch "stable", which already contains ./configure, so we won't need autotools.
|
I was wondering about the timing difference between encryption and decryption.
python -m cProfile -o profile.encrypt -m crypt4gh encrypt --recipient_pk pubkey < bigfile > bigfile.profile.c4gh
python -m cProfile -o profile.decrypt -m crypt4gh decrypt --sk seckey < bigfile.profile.c4gh > bigfile.profile.decrypted
>>> import pstats
>>> p = pstats.Stats('profile.encrypt')
>>> p.strip_dirs().sort_stats('time').print_stats()
543649 function calls (541893 primitive calls) in 18.162 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
163841 13.510 0.000 13.510 0.000 {built-in method crypt4gh.sodium.chacha20poly1305_encrypt}
163841 2.591 0.000 2.591 0.000 {method 'readinto' of '_io.BufferedReader' objects}
163841 1.441 0.000 1.441 0.000 {method 'write' of '_io.BufferedWriter' objects}
1 0.585 0.585 18.126 18.126 lib.py:33(encrypt)
...
>>> d = pstats.Stats('profile.decrypt')
>>> d.strip_dirs().sort_stats('time').print_stats()
2680795 function calls (2678855 primitive calls) in 28.156 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
163841 13.957 0.000 13.957 0.000 {built-in method crypt4gh.sodium.chacha20poly1305_decrypt}
163854 5.195 0.000 5.195 0.000 {method 'hex' of 'bytes' objects}
163841 5.111 0.000 5.111 0.000 {method 'hex' of 'bytearray' objects}
163918 1.302 0.000 1.302 0.000 {method 'read' of '_io.BufferedReader' objects}
163840 1.189 0.000 1.189 0.000 {method 'write' of '_io.BufferedWriter' objects}
1 0.772 0.772 27.967 27.967 lib.py:181(body_decrypt)
163842 0.283 0.000 1.481 0.000 lib.py:141(limited_output)
...
huh? About 5 seconds for each of the two .hex() calls, in:
for ciphersegment in cipher_chunker(infile, CIPHER_SEGMENT_SIZE):
    LOG.debug("Ciphersegment [%d]: %s", len(ciphersegment), ciphersegment.hex())
    plen = decrypt_block(segment, ciphersegment, session_keys)
    LOG.debug("Segment [%d]: %s", plen, segment[:plen].hex())
After commenting out the two LOG.debug lines, we got back to similar timings between encryption and decryption.
>>> d2 = pstats.Stats('profile.decrypt.2')
>>> d2.strip_dirs().sort_stats('time').print_stats()
1534265 function calls (1532325 primitive calls) in 16.561 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
163841 13.580 0.000 13.580 0.000 {built-in method crypt4gh.sodium.chacha20poly1305_decrypt}
163840 1.130 0.000 1.130 0.000 {method 'write' of '_io.BufferedWriter' objects}
163918 0.913 0.000 0.913 0.000 {method 'read' of '_io.BufferedReader' objects}
163842 0.270 0.000 1.408 0.000 lib.py:141(limited_output)
1 0.234 0.234 16.210 16.210 lib.py:181(body_decrypt)
18/17 0.204 0.011 0.205 0.012 {built-in method _imp.create_dynamic}
1 0.099 0.099 0.099 0.099 {built-in method bcrypt._bcrypt.kdf}
163841 0.031 0.000 0.946 0.000 lib.py:116(cipher_chunker)
163840 0.022 0.000 13.602 0.000 lib.py:126(decrypt_block)
163840 0.020 0.000 1.427 0.000 {method 'send' of 'generator' objects}
331213/331015 0.014 0.000 0.014 0.000 {built-in method builtins.len}
...
$ time crypt4gh decrypt --sk seckey < bigfile.old.c4gh > bigfile.c.decrypted
real 0m18.269s
user 0m14.412s
sys 0m1.836s
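A note on why those two LOG.debug lines were so expensive: the logging module defers the string formatting, but the arguments are still evaluated at the call site, so .hex() runs on every segment even when DEBUG output is switched off. Below is a minimal sketch of one way to keep such log lines without paying for them. The logger name and the debug_hex helper are hypothetical and not part of crypt4gh.

```python
import logging

LOG = logging.getLogger("c4gh_sketch")  # hypothetical logger name


def debug_hex(label, data):
    """Log a hex dump of data, but only build it when DEBUG is enabled."""
    # A plain LOG.debug("...", data.hex()) evaluates data.hex() before
    # LOG.debug is even entered; the guard below skips that work entirely
    # when the record would be discarded anyway.
    if LOG.isEnabledFor(logging.DEBUG):
        LOG.debug("%s [%d]: %s", label, len(data), data.hex())
```

In the decryption loop this would be called as debug_hex("Ciphersegment", ciphersegment), which costs one level check per segment when debugging is off.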
omllobet
left a comment
I installed the extension using
pip install git+https://github.com/EGA-archive/crypt4gh.git@c-extension
The install worked perfectly. I tried to encrypt a 10 GB file and saw an improvement:
1.7:
real 0m23,390s
user 0m7,793s
sys 0m6,604s
1.8:
real 0m19,468s
user 0m6,099s
sys 0m6,354s
Oscar
|
oh... Moreover, are you sure about the numbers? v1.7 took only 23 seconds? In my tests, it's 2m36s! (and I have a powerful machine!). Can you double-check the numbers, please? |
|
I'll send you some details of the install log and some other details in the log, but it did a |
|
Thanks for the install logs, @omllobet! |
We bump the version to 1.8 (which would also include the change to docopt-ng).
In this PR, we remove the dependency on pynacl, which uses cffi and forces the use of bytes, rather than bytearray. That makes it difficult to use for big files, since every segment triggers lots of memory allocations instead of re-using the same buffer (see pynacl issue #707). Instead, we implemented a Python C extension that binds to libsodium (and tries to retain only the used functions in the final shared object).
We update the testsuite to run on macOS and Ubuntu, for Python versions 3.9 to 3.14. We adapt the compilation to a bundled libsodium, and to an already installed system-wide version.
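To illustrate the buffer re-use mentioned above, here is a rough sketch in pure Python. It does not use the new C extension: the handle callback is a hypothetical stand-in for whatever processes a segment, and the 65536-byte segment size is an assumption made for this example. One bytearray is allocated up front and refilled with readinto, and a memoryview slice passes the filled part along without copying.

```python
import hashlib
import io

SEGMENT_SIZE = 65536  # assumed segment size for this sketch


def process_segments(infile, handle):
    """Feed infile to handle() segment by segment, re-using one buffer.

    Returns the total number of bytes read.
    """
    buf = bytearray(SEGMENT_SIZE)  # allocated once, refilled on every read
    view = memoryview(buf)         # slicing a memoryview does not copy
    total = 0
    while True:
        n = infile.readinto(buf)   # fills buf in place, no new bytes object
        if not n:                  # 0 (or None) means nothing more to read
            break
        handle(view[:n])           # only the filled part of the buffer
        total += n
    return total


if __name__ == "__main__":
    # Stand-in workload: hash the stream instead of encrypting it.
    digest = hashlib.sha256()
    size = process_segments(io.BytesIO(b"x" * 150000), digest.update)
    print(size, digest.hexdigest())
```

A bytes-only API cannot follow this pattern, because each call has to return a freshly allocated object for its output.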
We update the docs (showing on readthedocs.org) to reflect the installation changes.
We have not pushed to PyPI yet (we will do so when the PR is merged, solving #52 at the same time).
Efficiency:
Here is a small benchmark: we encrypt a bigfile (10 GB) using version 1.7 and using this new C extension. We then decrypt each encrypted file with the other version, i.e. not the one that encrypted it. We are using a MacBook Air 2025 (M4, 32 GB, with SSD).
We encrypt using version 1.7:
We encrypt using the C extension:
We decrypt using version 1.7:
We decrypt using the C extension:
Using diff, we get that bigfile == bigfile.c.decrypted == bigfile.old.decrypted
Conclusion: about 6 to 8 times faster ⇒ Not too bad
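The round-trip check done with diff above can also be scripted, for instance in a test. This is a small sketch using the standard library's filecmp; the all_identical helper is a hypothetical name, not something provided by crypt4gh.

```python
import filecmp


def all_identical(reference, *others):
    """Return True if every file in others has the same content as reference."""
    # shallow=False forces a comparison of the file contents instead of
    # trusting the os.stat() signatures (type, size, modification time).
    return all(filecmp.cmp(reference, other, shallow=False) for other in others)
```

With the file names from the benchmark, the call would be all_identical("bigfile", "bigfile.c.decrypted", "bigfile.old.decrypted").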