osamasrour/bpe_compressor

Implementation of Byte-Pair Encoding in pure C


bpe_compressor is a simple compressor for .txt files that uses the Byte-Pair Encoding (BPE) algorithm.

Warning

It's not meant to be used in production.
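
For readers new to BPE, one compression step can be sketched as follows: find the most frequent adjacent pair of symbols, then replace each occurrence with a new symbol that does not yet appear in the data. This is a minimal illustration of the general technique, not the code in this repository; the function names `most_frequent_pair` and `merge_pair` are hypothetical, and the quadratic pair search is chosen for brevity rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the most frequent adjacent pair in tokens[0..n).
 * Returns how often it occurs (non-overlapping, scanning left to right) and
 * stores the pair in *left and *right. Returns 0 and leaves the outputs
 * untouched when n < 2. O(n^2), for illustration only. */
size_t most_frequent_pair(const uint32_t *tokens, size_t n,
                          uint32_t *left, uint32_t *right)
{
    size_t best = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j + 1 < n; j++) {
            if (tokens[j] == tokens[i] && tokens[j + 1] == tokens[i + 1]) {
                count++;
                j++; /* skip one symbol so runs like "aaa" are not counted twice */
            }
        }
        if (count > best) { /* strict '>' keeps the earliest pair on ties */
            best = count;
            *left = tokens[i];
            *right = tokens[i + 1];
        }
    }
    return best;
}

/* Hypothetical helper: replace every non-overlapping (left, right) pair
 * with new_token, in place. Returns the new length of the sequence. */
size_t merge_pair(uint32_t *tokens, size_t n,
                  uint32_t left, uint32_t right, uint32_t new_token)
{
    size_t out = 0;
    size_t i = 0;
    while (i < n) {
        if (i + 1 < n && tokens[i] == left && tokens[i + 1] == right) {
            tokens[out++] = new_token;
            i += 2;
        } else {
            tokens[out++] = tokens[i++];
        }
    }
    return out;
}
```

Applied to the bytes of "aaabdaaabac", `most_frequent_pair` selects ("a", "a"), which occurs twice, and `merge_pair` with a new symbol such as 256 shrinks the sequence from 11 symbols to 9. A compressor repeats this step until no pair occurs more than once, recording each (pair, new symbol) mapping so the process can be reversed.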


Build

.\build.bat

Quick Start

$ .\main.exe zip input.txt output.bpe

or, to decompress:

$ .\main.exe unzip input.bpe output.txt

Format of the BPE container

_______________________________________
|BPE(version)(LE|BE)                   |
|                                      |
|(compressed data length as uint32_t)  |
|____________________________          |
|                            |         |
|                            |         |
|     (compressed data       |         |
|      as linked list)       |         |
|                            |         |
|____________________________|         |
|   (highest element value in the      |
|      compressed data as uint32_t)    |
|                                      |
|   (pairs array length as uint32_t)   |
|                                      |
|____________________________          |
|                            |         |
|                            |         |
|     (pairs array           |         |
|      as uint32_t[])        |         |
|                            |         |
|____________________________|         |
|                                      |
|                EOF                   |
|--------------------------------------|
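
Because the header records a byte order (LE or BE), a reader has to decode each uint32_t field accordingly. The sketch below shows one portable way to do that from a raw byte buffer. It rests on assumptions: the names `read_u32` and `has_bpe_magic` are hypothetical, the magic is taken to be the three ASCII bytes "BPE" as drawn in the diagram, and how the version and byte-order marker are actually encoded is not specified here, so the caller is expected to determine `big_endian` separately.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a 32-bit unsigned value from 4 bytes.
 * big_endian != 0 means the most significant byte comes first.
 * Works the same regardless of the host machine's own byte order. */
uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian) {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Hypothetical helper: returns 1 if the buffer starts with the ASCII
 * bytes "BPE" (an assumption based on the diagram), 0 otherwise. */
int has_bpe_magic(const unsigned char *buf, size_t len)
{
    return len >= 3 && memcmp(buf, "BPE", 3) == 0;
}
```

With helpers like these, the fixed-size fields in the diagram (compressed data length, highest element value, pairs array length) could each be read with one `read_u32` call once the file contents are in memory.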

References

Byte-Pair Encoding (Medium)

Byte-pair encoding (Wikipedia)

Serialize and Deserialize Binary in C

Implementing Serialization and Deserialization in C
