A miniature reimplementation of openssl dgst. It exposes two cryptographic
hash commands — md5 and sha256 — driven through the same CLI
shape and output format as the reference tool.
This document walks the program from the entry point down to each algorithm,
so you can read it once and understand exactly how a byte typed on the
command line becomes a hex digest on stdout.
Security note. MD5 is cryptographically broken (collision attacks have been practical since 2004). It is included here because the 42 subject requires it, and because it is still useful as a non-security checksum. Do not use MD5 for anything that needs collision resistance.
- Build & usage
- Project layout
- Architecture: the pipeline from root to algorithm
- The driver layer
- The algorithm layer
- Adding a new hash algorithm
- References
Build & usage
make # produces ./ft_ssl
make re # full rebuild
make clean # remove .o files
make fclean # remove .o files and the binary
# Hash a string
./ft_ssl md5 -s "hello"
# MD5 ("hello") = 5d41402abc4b2a76b9719d911017c592
# Hash a file
./ft_ssl sha256 Makefile
# Hash stdin (silent)
echo -n "hello" | ./ft_ssl md5
# (stdin)= 5d41402abc4b2a76b9719d911017c592
# Echo stdin and hash it (-p)
echo "abc" | ./ft_ssl sha256 -p
# Quiet (just the digest), reverse format
./ft_ssl md5 -q -s "hello"
./ft_ssl md5 -r -s "hello"
Supported flags: -p (echo stdin), -q (quiet), -r (reverse format), -s STRING.
Project layout
ft_ssl_md5/
├── Makefile
├── includes/
│ ├── ft_ssl.h # shared types, dispatch table, prototypes
│ ├── md5.h # MD5 streaming + one-shot API
│ └── sha256.h # SHA-256 streaming + one-shot API
└── srcs/
├── main.c # entry point, argv validation
├── dispatch.c # command lookup table
├── parsing.c # flag / input parser
├── io.c # read, hash, format, write
├── utils.c # ft_strlen, ft_xmalloc, hex encoder, ...
├── md5/
│ └── md5.c # MD5 implementation (RFC 1321)
└── sha256/
└── sha256.c # SHA-256 implementation (FIPS 180-4)
Architecture: the pipeline from root to algorithm
Every invocation flows through the same five-stage pipeline. The driver
layer is algorithm-agnostic — it never mentions MD5 or SHA-256. It
selects an algorithm by looking up a t_command in the dispatch table and
calling its fn pointer.
argv[]
│
▼
┌─────────────────┐
│ main.c │ validate argc, look up command
└────────┬────────┘
│ cmd = dispatch_find(argv[1])
▼
┌─────────────────┐
│ dispatch.c │ { name, label, digest_len, fn }
└────────┬────────┘
│ cmd->fn (e.g. md5_hash)
▼
┌─────────────────┐
│ parsing.c │ argv → t_flags + t_input[]
└────────┬────────┘
│
▼
┌─────────────────┐
│ io.c │ for each input:
│ │ read bytes
│ │ cmd->fn(bytes, len, digest)
│ │ ft_hex_encode + print_result
└────────┬────────┘
│ digest bytes
▼
┌─────────────────┐
│ md5.c / sha256.c│ init → update* → final
└─────────────────┘
The contract that ties the two layers together lives in ft_ssl.h:
typedef void (*t_hash_fn)(const uint8_t *data, size_t len, uint8_t *digest);
typedef struct s_command {
const char *name; // "md5"
const char *label; // "MD5"
size_t digest_len; // 16 or 32
t_hash_fn fn; // md5_hash, sha256_hash, ...
} t_command;
Anything that satisfies t_hash_fn plugs in.
The driver layer
int main(int argc, char **argv) does five things, and nothing else:
- Validate argv. If argc < 2, print usage and exit 1.
- Resolve the command. dispatch_find(argv[1]) returns a pointer into the static command table or NULL. NULL ⇒ print_usage_error, exit 1.
- Allocate input storage. Worst case: every remaining argv entry is an input, plus one possible implicit INPUT_STDIN. That is at most argc - 1 entries, so argc slots are always enough.
- Parse. parse_args(argc - 2, argv + 2, &flags, inputs, &input_count) splits the remaining argv into flag bits and an ordered input list.
- Process. process_inputs(cmd, &flags, inputs, input_count) does the actual hashing and printing.
main is intentionally tiny — every responsibility lives in its own file.
A single const-qualified, NULL-terminated array:
const t_command g_commands[] = {
{ "md5", "MD5", 16, md5_hash },
{ "sha256", "SHA256", 32, sha256_hash },
{ NULL, NULL, 0, NULL }
};
dispatch_find is a linear scan, which is fine for a tiny table and far more
readable than a hash map. The sentinel terminator means callers don't need
to track the length.
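The lookup described above can be sketched as follows. This is a minimal, self-contained illustration rather than the project's actual dispatch.c: stub_hash is a hypothetical placeholder standing in for md5_hash and sha256_hash, and the body of ft_strcmp is assumed to follow the usual strcmp contract.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*t_hash_fn)(const uint8_t *data, size_t len, uint8_t *digest);

typedef struct s_command {
    const char *name;
    const char *label;
    size_t      digest_len;
    t_hash_fn   fn;
} t_command;

/* Hypothetical placeholder so the sketch builds on its own; the real
   table points at md5_hash and sha256_hash. */
static void stub_hash(const uint8_t *data, size_t len, uint8_t *digest)
{
    (void)data;
    (void)len;
    (void)digest;
}

static const t_command g_commands[] = {
    { "md5",    "MD5",    16, stub_hash },
    { "sha256", "SHA256", 32, stub_hash },
    { NULL,     NULL,     0,  NULL }
};

/* Assumed to behave like strcmp: returns 0 when both strings are equal. */
static int ft_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b)
    {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Linear scan that stops at the NULL-named sentinel row. */
const t_command *dispatch_find(const char *name)
{
    for (size_t i = 0; g_commands[i].name != NULL; i++)
        if (ft_strcmp(g_commands[i].name, name) == 0)
            return &g_commands[i];
    return NULL;
}
```

A caller only checks the return value for NULL and then uses cmd->fn and cmd->digest_len, without knowing which algorithm it picked.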
This table is the only place that knows which algorithms exist. To add
SHA-512 you would add one row here, include the header, and add the source
file to the Makefile. Nothing in main.c, parsing.c, or io.c changes.
The slice of argv after the command name (e.g. -q -s hello file.txt) is
turned into:
- a t_flags struct (p, q, r bits), and
- an ordered array of t_input { type, value }.
t_input_type is one of:
| type | meaning |
|---|---|
| INPUT_STDIN | read from fd 0 |
| INPUT_STRING | hash value (NUL-terminated) |
| INPUT_FILE | open value and hash its contents |
The parsing rule mirrors OpenSSL exactly:
- Walk argv left to right in flag mode.
- An argument starting with - is parsed character by character, so bundled flags like -qr work.
- -s is special: it consumes the next argv as an INPUT_STRING.
- The first bare positional argument (anything not starting with -, or any unknown flag) ends flag mode. Every remaining argv is treated as an INPUT_FILE, even if it looks like a flag.
- After the loop:
  - if -p was set, prepend an INPUT_STDIN entry (so the -p echo happens first);
  - if no inputs were collected at all, append a single silent INPUT_STDIN entry.
That single function captures all of openssl's quirky CLI behaviour.
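One possible shape for these rules is sketched below. The struct layouts, the void return type, and the handling of edge cases (a lone "-" is treated as a file name, and -s without a following argument ends flag mode) are assumptions made for illustration; the real parsing.c may decide them differently.

```c
#include <stddef.h>

typedef enum { INPUT_STDIN, INPUT_STRING, INPUT_FILE } t_input_type;
typedef struct { t_input_type type; const char *value; } t_input;
typedef struct { unsigned p : 1, q : 1, r : 1; } t_flags;

/* inputs must have room for argc + 1 entries. */
void parse_args(int argc, char **argv, t_flags *flags,
                t_input *inputs, size_t *count)
{
    size_t n = 0;
    int    i = 0;

    flags->p = 0;
    flags->q = 0;
    flags->r = 0;
    /* Flag mode: consume "-xyz" bundles until something else shows up. */
    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0')
    {
        int known = 1;
        int next = i + 1;           /* argv index after this bundle */

        for (const char *c = argv[i] + 1; *c && known; c++)
        {
            if (*c == 'p')
                flags->p = 1;
            else if (*c == 'q')
                flags->q = 1;
            else if (*c == 'r')
                flags->r = 1;
            else if (*c == 's' && next < argc)
                inputs[n++] = (t_input){ INPUT_STRING, argv[next++] };
            else
                known = 0;          /* unknown flag ends flag mode */
        }
        if (!known)
            break;                  /* argv[i] becomes the first file */
        i = next;
    }
    /* File mode: everything left is a file name, even if it looks like a flag. */
    for (; i < argc; i++)
        inputs[n++] = (t_input){ INPUT_FILE, argv[i] };
    if (flags->p)
    {
        /* Shift right by one so stdin is processed first. */
        for (size_t j = n; j > 0; j--)
            inputs[j] = inputs[j - 1];
        inputs[0] = (t_input){ INPUT_STDIN, NULL };
        n++;
    }
    else if (n == 0)
        inputs[n++] = (t_input){ INPUT_STDIN, NULL };
    *count = n;
}
```

With the argv slice -qr -s hello file.txt -p, this sketch sets q and r, records one string and two files (the trailing -p is a file name because flag mode has already ended), and leaves p unset.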
process_inputs iterates the input array and dispatches each entry:
- INPUT_STRING → ft_strlen(value) bytes are passed straight to cmd->fn.
- INPUT_STDIN → read_all(0, ...) slurps the whole stream into a doubling-allocation buffer, then hashes it. With -p set, the buffer is also written to stdout before the hash line.
- INPUT_FILE → open + read_all + close, then hash. File errors are reported in OpenSSL format (ft_ssl: md5: foo: No such file or directory) and processing continues with the next input.
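The doubling-allocation read mentioned for read_all might look like the sketch below. The two-argument signature, the 4096-byte starting capacity, and the use of memcpy are assumptions for illustration; the project's own version may differ, for example by copying with a hand-written loop.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read fd until EOF into a heap buffer that doubles whenever it fills up.
   Returns the buffer (the caller frees it) and stores the byte count in
   *out_len; returns NULL on a read or allocation failure. */
uint8_t *read_all(int fd, size_t *out_len)
{
    size_t   cap = 4096;
    size_t   len = 0;
    uint8_t *buf = malloc(cap);
    ssize_t  n;

    if (buf == NULL)
        return NULL;
    while ((n = read(fd, buf + len, cap - len)) > 0)
    {
        len += (size_t)n;
        if (len == cap)
        {
            /* Buffer is full: allocate twice the size and move the data. */
            uint8_t *bigger = malloc(cap * 2);

            if (bigger == NULL)
            {
                free(buf);
                return NULL;
            }
            memcpy(bigger, buf, len);
            free(buf);
            buf = bigger;
            cap *= 2;
        }
    }
    if (n < 0)
    {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Doubling keeps the total copying cost linear in the input size, at the price of holding the whole input in memory before hashing starts.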
After hashing, ft_hex_encode turns the raw digest bytes into a lowercase
hex string and print_result writes one of these formats:
| Source / flags | Output |
|---|---|
| stdin (silent) | (stdin)= <hex> |
| stdin + -p | echoed bytes, then ("...")= <hex> |
| string -s X | MD5 ("X") = <hex> |
| string -s X + -r | <hex> "X" |
| file F | MD5 (F) = <hex> |
| file F + -r | <hex> F |
| any input + -q | <hex> (just the digest) |
All output goes through write(2) directly — no printf, no stdio buffer,
to satisfy the project's allowed-functions list.
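The hex-encoding step that runs before print_result can be sketched in a few lines. Whether the real ft_hex_encode writes a terminating NUL is an assumption here; the nibble arithmetic is the standard technique.

```c
#include <stddef.h>
#include <stdint.h>

/* out must have room for 2 * len characters plus the terminating NUL
   (the NUL is an assumption of this sketch). */
void ft_hex_encode(const uint8_t *bytes, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++)
    {
        out[2 * i] = digits[bytes[i] >> 4];        /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0f];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```

For a 16-byte MD5 digest the caller would pass a 33-byte buffer; for SHA-256, 65 bytes.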
A handful of self-contained helpers that the rest of the program builds on:
- ft_strlen, ft_strcmp: local replacements for strlen and strcmp, the two libc functions we are not allowed to use.
- ft_xmalloc(size): malloc + zero-init + exit on failure.
- ft_putstr_fd, ft_putendl_fd: thin wrappers around write.
- ft_hex_encode(bytes, len, out): converts each byte to two lowercase hex chars using a static lookup table; the high nibble is byte >> 4, the low nibble is byte & 0x0f.
The algorithm layer
Both MD5 and SHA-256 are Merkle–Damgård hashes. Their high-level shape is the same — only the constants, mixing functions, and endianness differ.
┌─────────┐
IV ─────│ │───► state
│compress │
block₀ ────────────│ │
└─────────┘
▼
┌─────────┐
state ──►│ │───► state
│compress │
block₁ ────────────│ │
└─────────┘
▼
...
▼
┌─────────┐
state ──►│ │───► final state
│compress │
padded last ───────│ │
└─────────┘
▼
digest
The structure has three parts:
- Initial value (IV). A fixed set of state words baked into the spec.
- Block compression. A function state' = compress(state, block) that eats one fixed-size block of input.
- Padding. The message is padded so its total length is a multiple of the block size, with the original bit-length encoded in the trailing bytes. Encoding the length (Merkle–Damgård strengthening) is what lets the collision resistance of the compression function carry over to the whole hash. It does not prevent length extension: both MD5 and SHA-256 are vulnerable to it, and HMAC is the usual fix.
The padding scheme used by both:
[ original message ][ 0x80 ][ 0x00 × k ][ 64-bit length ]
^
MD5: little-endian
SHA-256: big-endian
k is chosen so the total padded length is a multiple of 64 bytes (with
exactly 8 bytes left at the end for the length field).
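The choice of k can be worked out with a little modular arithmetic. The helper names below are hypothetical and exist only to illustrate the rule; they are not part of the project.

```c
#include <stddef.h>

/* Number of 0x00 bytes (k) between the 0x80 marker and the length field. */
size_t pad_zero_count(size_t msg_len)
{
    /* Bytes occupied in the last block once 0x80 has been appended. */
    size_t used = (msg_len + 1) % 64;

    /* Fill up to offset 56 of this block, or of the next one if the
       length field no longer fits. */
    return (used <= 56) ? 56 - used : 120 - used;
}

/* Total length after padding: message + 0x80 + k zeros + 8 length bytes. */
size_t padded_len(size_t msg_len)
{
    return msg_len + 1 + pad_zero_count(msg_len) + 8;
}
```

A 55-byte message still fits in one block (55 + 1 + 0 + 8 = 64), while a 56-byte message spills into a second block (56 + 1 + 63 + 8 = 128).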
Both algorithms expose an identical three-call API plus a one-shot wrapper:
md5_init(&ctx); // load IV into ctx->state
md5_update(&ctx, data, len); // feed bytes (any number of times)
md5_final(&ctx, digest); // pad, compress, serialise
md5_hash(data, len, digest); // = init + update + final
The shared context layout:
typedef struct {
uint32_t state[N]; // accumulator (4 for MD5, 8 for SHA-256)
uint64_t bit_count; // total bits fed so far
uint8_t buffer[BLOCK_SIZE]; // partial block awaiting a flush
size_t buf_used; // valid bytes in buffer
} t_xxx_ctx;
update is a tiny state machine:
while (bytes left in input):
    if input cannot fill the buffer:
        copy into buffer; return
    fill buffer to 64 bytes
    compress(state, buffer)
    buf_used = 0
final builds the padding into a local 128-byte scratch array, feeds it
through update (so the same compression path runs), then serialises the
state words into the output digest.
md5_hash / sha256_hash is the convenience wrapper used by the
dispatch table — one call, no streaming.
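The update state machine described above can be written out in C roughly as follows. The demo_ names are hypothetical, and demo_compress is a stand-in that only counts blocks so the buffering logic can be followed in isolation; the real md5_update and sha256_update call their own compression functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64

typedef struct {
    uint32_t state[4];
    uint64_t bit_count;
    uint8_t  buffer[BLOCK_SIZE];
    size_t   buf_used;
} t_demo_ctx;

/* Stand-in for md5_compress / sha256_compress: it only counts blocks. */
static size_t g_blocks_seen;

static void demo_compress(uint32_t *state, const uint8_t *block)
{
    (void)state;
    (void)block;
    g_blocks_seen++;
}

void demo_update(t_demo_ctx *ctx, const uint8_t *data, size_t len)
{
    ctx->bit_count += (uint64_t)len * 8;
    while (len > 0)
    {
        size_t room = BLOCK_SIZE - ctx->buf_used;
        size_t take = (len < room) ? len : room;

        /* Top up the partial block with as much input as fits. */
        memcpy(ctx->buffer + ctx->buf_used, data, take);
        ctx->buf_used += take;
        data += take;
        len -= take;
        /* A full block is compressed immediately and the buffer reset. */
        if (ctx->buf_used == BLOCK_SIZE)
        {
            demo_compress(ctx->state, ctx->buffer);
            ctx->buf_used = 0;
        }
    }
}
```

Feeding 100 bytes and then 30 bytes compresses two blocks and leaves 2 bytes buffered. A final built on top of this sketch would need to read bit_count before pushing the padding through update, since update keeps counting every byte it sees.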
Reference: RFC 1321 — The MD5 Message-Digest Algorithm, R. Rivest, 1992.
At a glance
| property | value |
|---|---|
| block size | 64 bytes (512 bits) |
| state | 4 × 32-bit words (A, B, C, D) |
| digest size | 16 bytes (128 bits) |
| rounds / block | 64 |
| endianness | little-endian |
Initial state (md5_init)
A = 0x67452301
B = 0xefcdab89
C = 0x98badcfe
D = 0x10325476
These are nothing-up-my-sleeve numbers from Rivest's original spec: written out as little-endian bytes they are simply 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10.
Round constants K[64]
K[i] = floor(|sin(i + 1)| × 2^32) for i = 0..63
Precomputed in md5.c so we don't drag in <math.h>. Each constant is the
integer part of 2^32 × |sin(i + 1)|, with the argument in radians (RFC 1321 §3.4).
Per-round shift table S[64]
The same four rotations repeat in each group of 16:
F-rounds (i = 0-15): 7, 12, 17, 22
G-rounds (i = 16-31): 5, 9, 14, 20
H-rounds (i = 32-47): 4, 11, 16, 23
I-rounds (i = 48-63): 6, 10, 15, 21
The four nonlinear functions (each takes three 32-bit words, returns one):
F(b, c, d) = (b & c) | (~b & d) // "bit-select": if b then c else d
G(b, c, d) = (b & d) | (c & ~d) // bit-select with d as the selector: if d then b else c
H(b, c, d) = b ^ c ^ d // XOR, purely linear
I(b, c, d) = c ^ (b | ~d)
The block compression function md5_compress
For each 64-byte block:
- Load the block as 16 little-endian 32-bit words m[0..15].
- Copy state into working variables a, b, c, d.
- Run 64 rounds. Each round picks an (f, g_idx) pair based on the round group, where f is that group's function applied to (b, c, d), then does:
        temp = f + a + K[i] + m[g_idx];
        a = d;
        d = c;
        c = b;
        b = b + ROTL32(temp, S[i]);
  The message-word index g_idx permutes through m[] differently in each group:
  | rounds | function | g_idx formula |
  |---|---|---|
  | 0–15 | F | i |
  | 16–31 | G | (5·i + 1) mod 16 |
  | 32–47 | H | (3·i + 5) mod 16 |
  | 48–63 | I | (7·i) mod 16 |
- Add the working variables back into the state:
        state[0] += a;
        state[1] += b;
        state[2] += c;
        state[3] += d;
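The per-round selection and the register rotation from the steps above can be sketched as two small helpers. The names md5_round_select and md5_step are hypothetical, and the round constant and shift are passed in as parameters so the sketch does not need the full K and S tables.

```c
#include <stdint.h>

/* For round i (0..63): compute the nonlinear value f from (b, c, d)
   and the message-word index g for that round's group. */
void md5_round_select(unsigned i, uint32_t b, uint32_t c, uint32_t d,
                      uint32_t *f, unsigned *g)
{
    if (i < 16)
    {
        *f = (b & c) | (~b & d);
        *g = i;
    }
    else if (i < 32)
    {
        *f = (b & d) | (c & ~d);
        *g = (5 * i + 1) % 16;
    }
    else if (i < 48)
    {
        *f = b ^ c ^ d;
        *g = (3 * i + 5) % 16;
    }
    else
    {
        *f = c ^ (b | ~d);
        *g = (7 * i) % 16;
    }
}

/* Rotate left; valid for 0 < n < 32, which covers every MD5 shift. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* One round on v = {a, b, c, d}, given this round's constant k,
   message word m and shift s. */
void md5_step(uint32_t v[4], uint32_t f, uint32_t k, uint32_t m, unsigned s)
{
    uint32_t temp = f + v[0] + k + m;

    v[0] = v[3];                    /* a = d */
    v[3] = v[2];                    /* d = c */
    v[2] = v[1];                    /* c = b */
    v[1] = v[1] + rotl32(temp, s);  /* b = b + ROTL32(temp, s) */
}
```

A full compression function would call these 64 times per block with K[i], m[g] and S[i], then add the four working variables back into the state.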
Finalisation (md5_final)
- Append 0x80.
- Append zero bytes until the length is ≡ 56 (mod 64).
- Append the message length in bits as a 64-bit little-endian integer; these are the last 8 bytes of the padded message.
- Run md5_compress on the final block(s).
- Serialise state[0..3] to digest[0..15] in little-endian order.
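The two little-endian steps in the list above (the length field and the digest serialisation) could be written as below. The helper names are hypothetical; only the byte order is taken from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit word least-significant byte first. */
static void store32_le(uint8_t *out, uint32_t w)
{
    out[0] = (uint8_t)(w);
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}

/* Write the 64-bit bit count least-significant byte first; this is the
   8-byte tail of the MD5 padding. */
void md5_store_length(uint8_t out[8], uint64_t bit_count)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(bit_count >> (8 * i));
}

/* Turn the four state words into the 16-byte digest. */
void md5_serialise(const uint32_t state[4], uint8_t digest[16])
{
    for (size_t i = 0; i < 4; i++)
        store32_le(digest + 4 * i, state[i]);
}
```

Serialising the initial state this way yields the bytes 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10, which is a quick way to check the byte order.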
Worked example:
$ ./ft_ssl md5 -s ""
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
$ ./ft_ssl md5 -s "abc"
MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72
The empty-string digest d41d8cd98f00b204e9800998ecf8427e is the canonical
MD5 sanity check.
Reference: FIPS PUB 180-4 — Secure Hash Standard, NIST, 2015.
At a glance
| property | value |
|---|---|
| block size | 64 bytes (512 bits) |
| state | 8 × 32-bit words (H0..H7) |
| digest size | 32 bytes (256 bits) |
| rounds / block | 64 |
| endianness | big-endian |
Initial state (sha256_init)
The first 32 bits of the fractional parts of √p for the first 8 primes
(2, 3, 5, 7, 11, 13, 17, 19):
H0 = 0x6a09e667 H1 = 0xbb67ae85
H2 = 0x3c6ef372 H3 = 0xa54ff53a
H4 = 0x510e527f H5 = 0x9b05688c
H6 = 0x1f83d9ab H7 = 0x5be0cd19
Round constants K[64]
The first 32 bits of the fractional parts of ∛p for the first 64 primes.
Hard-coded in sha256.c.
Bitwise functions (FIPS 180-4 §3.2 / §4.1.2):
ROTR32(x, n) = (x >> n) | (x << (32 - n))
SHR(x, n) = x >> n
CH (x, y, z) = (x & y) ^ (~x & z) // "choose": pick y where x is 1, z elsewhere
MAJ(x, y, z) = (x & y) ^ (x & z) ^ (y & z) // bitwise majority
BSIG0(x) = ROTR32(x, 2) ^ ROTR32(x, 13) ^ ROTR32(x, 22) // big-sigma 0
BSIG1(x) = ROTR32(x, 6) ^ ROTR32(x, 11) ^ ROTR32(x, 25) // big-sigma 1
SSIG0(x) = ROTR32(x, 7) ^ ROTR32(x, 18) ^ SHR(x, 3) // small-sigma 0
SSIG1(x) = ROTR32(x, 17) ^ ROTR32(x, 19) ^ SHR(x, 10) // small-sigma 1
BSIGx are used in the per-round state mixing; SSIGx are used in the
message schedule that precedes mixing.
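These definitions translate almost directly into C. The lowercase function names are a choice made for this sketch; the project may well use macros with the uppercase names shown above.

```c
#include <stdint.h>

/* Rotate right; valid for 0 < n < 32, which covers every SHA-256 use. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* "Choose": take the bit from y where x is 1, from z where x is 0. */
static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* Bitwise majority of the three inputs. */
static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Big sigmas: used in the per-round state mixing. */
static inline uint32_t bsig0(uint32_t x)
{
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

static inline uint32_t bsig1(uint32_t x)
{
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}

/* Small sigmas: used in the message schedule. */
static inline uint32_t ssig0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static inline uint32_t ssig1(uint32_t x)
{
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}
```

Using uint32_t throughout means additions wrap modulo 2^32 automatically, which is the arithmetic the standard specifies.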
The block compression function sha256_compress
For each 64-byte block:
- Build the message schedule w[0..63] (FIPS 180-4 §6.2.2, step 1).
        for i in 0..15:
            w[i] = big-endian word from block[i*4 .. i*4+3]
        for i in 16..63:
            w[i] = SSIG1(w[i-2]) + w[i-7] + SSIG0(w[i-15]) + w[i-16]
  The first 16 words come straight from the input. The remaining 48 are derived from earlier ones, so every message word influences many later rounds. This expansion is one of the design features that make the differential attacks used against MD5 much harder to apply to SHA-256.
- Initialise working variables. a..h ← H0..H7.
- Run 64 rounds.
        for i in 0..63:
            T1 = h + BSIG1(e) + CH(e, f, g) + K[i] + w[i];
            T2 = BSIG0(a) + MAJ(a, b, c);
            h = g; g = f; f = e; e = d + T1;
            d = c; c = b; b = a; a = T1 + T2;
- Add back into the state.
        H0 += a; H1 += b; H2 += c; H3 += d;
        H4 += e; H5 += f; H6 += g; H7 += h;
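The message-schedule step at the top of this list is small enough to show in full. The function name sha256_schedule is hypothetical, and the helpers are repeated here only so the block compiles on its own.

```c
#include <stdint.h>

static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

static inline uint32_t ssig0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static inline uint32_t ssig1(uint32_t x)
{
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Expand one 64-byte block into the 64-word schedule w[0..63]. */
void sha256_schedule(const uint8_t block[64], uint32_t w[64])
{
    /* Words 0..15: the block itself, read as big-endian 32-bit words. */
    for (int i = 0; i < 16; i++)
        w[i] = ((uint32_t)block[4 * i] << 24)
             | ((uint32_t)block[4 * i + 1] << 16)
             | ((uint32_t)block[4 * i + 2] << 8)
             | (uint32_t)block[4 * i + 3];
    /* Words 16..63: each mixes four earlier words. */
    for (int i = 16; i < 64; i++)
        w[i] = ssig1(w[i - 2]) + w[i - 7] + ssig0(w[i - 15]) + w[i - 16];
}
```

For the padded one-block message "abc", w[0] is 0x61626380 and w[15] is 0x00000018 (the bit length), and the expansion gives w[16] = 0x61626380 and w[17] = 0x000f0000.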
Finalisation (sha256_final)
Same structure as MD5, with one critical difference: the bit-length is appended in big-endian order, and the eight state words are serialised in big-endian order to produce the 32-byte digest.
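The big-endian counterpart of the MD5 finalisation helpers might look like this. As before, the helper names are hypothetical and only the byte order comes from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit word most-significant byte first. */
static void store32_be(uint8_t *out, uint32_t w)
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)(w);
}

/* Write the 64-bit bit count most-significant byte first; this is the
   8-byte tail of the SHA-256 padding. */
void sha256_store_length(uint8_t out[8], uint64_t bit_count)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(bit_count >> (56 - 8 * i));
}

/* Turn the eight state words into the 32-byte digest. */
void sha256_serialise(const uint32_t state[8], uint8_t digest[32])
{
    for (size_t i = 0; i < 8; i++)
        store32_be(digest + 4 * i, state[i]);
}
```

With big-endian output, the hex digest reads as the state words printed one after another, which makes intermediate values easy to compare against published test vectors.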
Worked example:
$ ./ft_ssl sha256 -s ""
SHA256 ("") = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
$ ./ft_ssl sha256 -s "abc"
SHA256 ("abc") = ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
The "abc" digest matches the standard NIST one-block example vector for SHA-256.
| | MD5 | SHA-256 |
|---|---|---|
| Block size | 64 bytes | 64 bytes |
| State words | 4 × 32-bit | 8 × 32-bit |
| Digest size | 16 bytes | 32 bytes |
| Rounds per block | 64 | 64 |
| Endianness | little-endian | big-endian |
| Round constants | `floor(\|sin(i+1)\| × 2^32)` | fractional bits of ∛p for first 64 primes |
| Initial state | fixed magic words | fractional bits of √p for first 8 primes |
| Message schedule | reuses m[0..15] with index permutation | expands to w[0..63] via SSIG0/SSIG1 |
| Mixing functions | F, G, H, I | CH, MAJ, BSIG0/1 |
| Cryptographic security | broken (collisions) | considered secure |
The structural differences explain the security gap. MD5's rounds always operate on the same 16 raw message words (just with different orderings), so an attacker has more leverage to craft colliding inputs. SHA-256's expanded 64-word schedule, larger state, and more nonlinear mixing make the same kind of attack vastly harder.
Adding a new hash algorithm
The plug-in shape was designed to make this almost mechanical. Suppose you want to add SHA-224:
- Header. Create includes/sha224.h with sha224_hash(...) matching the t_hash_fn signature.
- Implementation. Create srcs/sha224/sha224.c. Often you can reuse the SHA-256 compression function with different IVs and a truncated output.
- Register. In srcs/dispatch.c:
        #include "sha224.h"
        ...
        { "sha224", "SHA224", 28, sha224_hash },
- Build. Add srcs/sha224/sha224.c to SRCS in the Makefile.
That's it. main.c, parsing.c, io.c, and utils.c do not need to
know the new algorithm exists — they receive bytes, hand them to
cmd->fn, and format cmd->digest_len bytes of output.
References
- RFC 1321 — The MD5 Message-Digest Algorithm, R. Rivest, April 1992. https://www.rfc-editor.org/rfc/rfc1321
- FIPS PUB 180-4 — Secure Hash Standard (SHS), NIST, August 2015. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf
- Wang, Yu (2005) — How to Break MD5 and Other Hash Functions. The original MD5 collision paper.
- OpenSSL: openssl dgst -md5, openssl dgst -sha256. The reference output format this project emulates.