42-Course/ft_ssl_md5

ft_ssl

A miniature reimplementation of openssl dgst. It exposes two cryptographic hash commands — md5 and sha256 — driven through the same CLI shape and output format as the reference tool.

This document walks the program from the entry point down to each algorithm, so you can read it once and understand exactly how a byte typed on the command line becomes a hex digest on stdout.

Security note. MD5 is cryptographically broken (collision attacks have been practical since 2004). It is included here because the 42 subject requires it, and because it is still useful as a non-security checksum. Do not use MD5 for anything that needs collision resistance.


Table of contents

  1. Build & usage
  2. Project layout
  3. Architecture: the pipeline from root to algorithm
  4. The driver layer
  5. The algorithm layer
  6. Adding a new hash algorithm
  7. References

Build & usage

make            # produces ./ft_ssl
make re         # full rebuild
make clean      # remove .o files
make fclean     # remove .o files and the binary

# Hash a string
./ft_ssl md5 -s "hello"
# MD5 ("hello") = 5d41402abc4b2a76b9719d911017c592

# Hash a file
./ft_ssl sha256 Makefile

# Hash stdin (silent)
echo -n "hello" | ./ft_ssl md5
# (stdin)= 5d41402abc4b2a76b9719d911017c592

# Echo stdin and hash it (-p)
echo "abc" | ./ft_ssl sha256 -p

# Quiet (just the digest), reverse format
./ft_ssl md5 -q -s "hello"
./ft_ssl md5 -r -s "hello"

Supported flags: -p (echo stdin), -q (quiet), -r (reverse format), -s STRING.


Project layout

ft_ssl_md5/
├── Makefile
├── includes/
│   ├── ft_ssl.h          # shared types, dispatch table, prototypes
│   ├── md5.h             # MD5 streaming + one-shot API
│   └── sha256.h          # SHA-256 streaming + one-shot API
└── srcs/
    ├── main.c            # entry point, argv validation
    ├── dispatch.c        # command lookup table
    ├── parsing.c         # flag / input parser
    ├── io.c              # read, hash, format, write
    ├── utils.c           # ft_strlen, ft_xmalloc, hex encoder, ...
    ├── md5/
    │   └── md5.c         # MD5 implementation (RFC 1321)
    └── sha256/
        └── sha256.c      # SHA-256 implementation (FIPS 180-4)

Architecture: the pipeline from root to algorithm

Every invocation flows through the same five-stage pipeline. The driver layer is algorithm-agnostic — it never mentions MD5 or SHA-256. It selects an algorithm by looking up a t_command in the dispatch table and calling its fn pointer.

                         argv[]
                           │
                           ▼
                  ┌─────────────────┐
                  │     main.c      │   validate argc, look up command
                  └────────┬────────┘
                           │ cmd = dispatch_find(argv[1])
                           ▼
                  ┌─────────────────┐
                  │   dispatch.c    │   { name, label, digest_len, fn }
                  └────────┬────────┘
                           │ cmd->fn (e.g. md5_hash)
                           ▼
                  ┌─────────────────┐
                  │   parsing.c     │   argv  →  t_flags + t_input[]
                  └────────┬────────┘
                           │
                           ▼
                  ┌─────────────────┐
                  │     io.c        │   for each input:
                  │                 │     read bytes
                  │                 │     cmd->fn(bytes, len, digest)
                  │                 │     ft_hex_encode + print_result
                  └────────┬────────┘
                           │ digest bytes
                           ▼
                  ┌─────────────────┐
                  │ md5.c / sha256.c│   init → update* → final
                  └─────────────────┘

The contract that ties the two layers together lives in ft_ssl.h:

typedef void (*t_hash_fn)(const uint8_t *data, size_t len, uint8_t *digest);

typedef struct s_command {
    const char *name;        // "md5"
    const char *label;       // "MD5"
    size_t      digest_len;  // 16 or 32
    t_hash_fn   fn;          // md5_hash, sha256_hash, ...
} t_command;

Anything that satisfies t_hash_fn plugs in.


The driver layer

main.c — entry point

int main(int argc, char **argv) does five things, and nothing else:

  1. Validate argv. If argc < 2, print usage and exit 1.
  2. Resolve the command. dispatch_find(argv[1]) returns a pointer into the static command table or NULL. On NULL, call print_usage_error and exit 1.
  3. Allocate input storage. Worst case: every remaining argv entry is an input (argc - 2 of them), plus one possible implicit INPUT_STDIN, so argc slots are always enough.
  4. Parse. parse_args(argc - 2, argv + 2, &flags, inputs, &input_count) splits the remaining argv into flag bits and an ordered input list.
  5. Process. process_inputs(cmd, &flags, inputs, input_count) does the actual hashing and printing.

main is intentionally tiny — every responsibility lives in its own file.

dispatch.c — command table

A single const-qualified, NULL-terminated array:

const t_command g_commands[] = {
    { "md5",    "MD5",    16, md5_hash    },
    { "sha256", "SHA256", 32, sha256_hash },
    { NULL,     NULL,      0, NULL        }
};

dispatch_find is a linear scan — fine for a tiny table, and far more readable than a hash map. The sentinel terminator means callers don't need to track the length.
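A minimal sketch of that lookup, assuming the table layout shown above. The hash functions here are empty placeholders so the block compiles on its own, and the `ft_strcmp` body is an illustrative stand-in rather than the project's actual helper.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*t_hash_fn)(const uint8_t *data, size_t len, uint8_t *digest);

typedef struct s_command {
    const char *name;
    const char *label;
    size_t      digest_len;
    t_hash_fn   fn;
} t_command;

/* Placeholders only: the real bodies live in md5.c and sha256.c. */
static void md5_hash(const uint8_t *d, size_t n, uint8_t *out)    { (void)d; (void)n; (void)out; }
static void sha256_hash(const uint8_t *d, size_t n, uint8_t *out) { (void)d; (void)n; (void)out; }

static const t_command g_commands[] = {
    { "md5",    "MD5",    16, md5_hash    },
    { "sha256", "SHA256", 32, sha256_hash },
    { NULL,     NULL,      0, NULL        }
};

/* Illustrative strcmp replacement. */
static int ft_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Linear scan that stops at the NULL sentinel row. */
const t_command *dispatch_find(const char *name)
{
    for (size_t i = 0; g_commands[i].name != NULL; i++)
        if (ft_strcmp(g_commands[i].name, name) == 0)
            return &g_commands[i];
    return NULL;
}
```

Because the loop condition checks the sentinel, adding a row never requires touching the lookup code.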

This table is the only place that knows which algorithms exist. To add SHA-512 you would add one row here, include the header, and add the source file to the Makefile. Nothing in main.c, parsing.c, or io.c changes.

parsing.c — argv → flags + inputs

The slice of argv after the command name (e.g. -q -s hello file.txt) is turned into:

  • a t_flags struct (p, q, r bits), and
  • an ordered array of t_input { type, value }.

t_input_type is one of:

| type | meaning |
|---|---|
| INPUT_STDIN | read from fd 0 |
| INPUT_STRING | hash value (NUL-terminated) |
| INPUT_FILE | open value and hash its contents |

The parsing rule mirrors OpenSSL exactly:

  1. Walk argv left to right in flag mode.
  2. An argument starting with - is parsed character by character — bundled flags like -qr work.
  3. -s is special: it consumes the next argv as an INPUT_STRING.
  4. The first bare positional argument (anything not starting with -, or any unknown flag) ends flag mode. Every remaining argv is treated as an INPUT_FILE, even if it looks like a flag.
  5. After the loop:
    • if -p was set, prepend an INPUT_STDIN entry (so the -p echo happens first);
    • if no inputs were collected at all, append a single silent INPUT_STDIN entry.

That single function captures all of openssl's quirky CLI behaviour.
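The rules above could be sketched roughly as follows. This is a simplified illustration, not the project's parsing.c: the struct layouts, the -1 return for a missing -s argument, and treating a lone "-" as a file name are all assumptions. The caller must provide room for argc + 1 entries in `inputs`.

```c
#include <stddef.h>

typedef enum { INPUT_STDIN, INPUT_STRING, INPUT_FILE } t_input_type;
typedef struct { t_input_type type; const char *value; } t_input;
typedef struct { int p, q, r; } t_flags;

/* Returns 0 on success, -1 if -s is the last argument (assumed behaviour). */
int parse_args(int argc, char **argv, t_flags *flags,
               t_input *inputs, size_t *count)
{
    size_t n = 0;
    int    i = 0;

    flags->p = flags->q = flags->r = 0;
    /* Flag mode: runs until the first bare argument or unknown flag. */
    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0') {
        const char *c;

        for (c = argv[i] + 1; *c; c++)
            if (*c != 'p' && *c != 'q' && *c != 'r' && *c != 's')
                break;
        if (*c != '\0')
            break;                          /* unknown flag: leave flag mode */
        for (c = argv[i] + 1; *c; c++) {
            if (*c == 'p')      flags->p = 1;
            else if (*c == 'q') flags->q = 1;
            else if (*c == 'r') flags->r = 1;
            else {                          /* 's': next argv is the string */
                if (++i >= argc)
                    return -1;
                inputs[n++] = (t_input){ INPUT_STRING, argv[i] };
            }
        }
        i++;
    }
    /* Everything left is a file name, even if it looks like a flag. */
    for (; i < argc; i++)
        inputs[n++] = (t_input){ INPUT_FILE, argv[i] };
    if (flags->p) {
        /* Shift right by one so the stdin entry comes first. */
        for (size_t k = n; k > 0; k--)
            inputs[k] = inputs[k - 1];
        inputs[0] = (t_input){ INPUT_STDIN, NULL };
        n++;
    } else if (n == 0) {
        inputs[n++] = (t_input){ INPUT_STDIN, NULL };
    }
    *count = n;
    return 0;
}
```

With `-qr -s hello file.txt -p`, this yields q and r set, one string input, and two file inputs: the trailing `-p` arrives after flag mode has ended, so it is taken as a file name.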

io.c — read, hash, format

process_inputs iterates the input array and dispatches each entry:

  • INPUT_STRING: ft_strlen(value) bytes are passed straight to cmd->fn.
  • INPUT_STDIN: read_all(0, ...) slurps the whole stream into a doubling-allocation buffer, then hashes it. With -p set, the buffer is also written to stdout before the hash line.
  • INPUT_FILE: open + read_all + close, then hash. File errors are reported in OpenSSL format (ft_ssl: md5: foo: No such file or directory) and processing continues with the next input.
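A doubling-allocation reader along the lines described for read_all might look like this. The signature (returning the buffer and writing the length through a pointer), the 4096-byte starting capacity, and the use of memcpy are assumptions for illustration; the project's own version may differ, for example to stay within its allowed-functions list.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Reads fd until EOF. Returns a malloc'd buffer (caller frees) and stores
 * the byte count in *out_len, or returns NULL on read/allocation failure. */
uint8_t *read_all(int fd, size_t *out_len)
{
    size_t   cap = 4096;
    size_t   len = 0;
    uint8_t *buf = malloc(cap);
    ssize_t  n;

    if (buf == NULL)
        return NULL;
    while ((n = read(fd, buf + len, cap - len)) > 0) {
        len += (size_t)n;
        if (len == cap) {
            /* Buffer is full: allocate twice the size and copy over. */
            uint8_t *bigger = malloc(cap * 2);

            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            memcpy(bigger, buf, len);
            free(buf);
            buf = bigger;
            cap *= 2;
        }
    }
    if (n < 0) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Doubling keeps the total copy cost linear in the input size, which matters when stdin is a large pipe of unknown length.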

After hashing, ft_hex_encode turns the raw digest bytes into a lowercase hex string and print_result writes one of these formats:

| Source / flags | Output |
|---|---|
| stdin (silent) | (stdin)= <hex> |
| stdin + -p | echoed bytes, then ("...")= <hex> |
| string -s X | MD5 ("X") = <hex> |
| string -s X + -r | <hex> "X" |
| file F | MD5 (F) = <hex> |
| file F + -r | <hex> F |
| any input + -q | <hex> (just the digest) |

All output goes through write(2) directly — no printf, no stdio buffer, to satisfy the project's allowed-functions list.

utils.c — small helpers

A handful of self-contained helpers that the rest of the program builds on:

  • ft_strlen, ft_strcmp: replacements for the two libc functions we are not allowed to call.
  • ft_xmalloc(size): malloc + zero-init + exit on failure.
  • ft_putstr_fd, ft_putendl_fd: thin wrappers around write.
  • ft_hex_encode(bytes, len, out): converts each byte to two lowercase hex chars using a static lookup table; the high nibble is byte >> 4, the low nibble is byte & 0x0f.
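As an illustration of the nibble lookup, here is one way ft_hex_encode could be written. Whether the real helper NUL-terminates its output is an assumption; this sketch does, so `out` needs room for 2 * len + 1 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes 2 * len lowercase hex characters plus a terminating NUL to out. */
void ft_hex_encode(const uint8_t *bytes, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[bytes[i] & 0x0f];  /* low nibble  */
    }
    out[2 * len] = '\0';
}
```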

The algorithm layer

Shared shape: Merkle–Damgård

Both MD5 and SHA-256 are Merkle–Damgård hashes. Their high-level shape is the same — only the constants, mixing functions, and endianness differ.

                           ┌─────────┐
                  IV  ─────│         │───► state
                           │compress │
        block₀ ────────────│         │
                           └─────────┘
                                ▼
                           ┌─────────┐
                  state ──►│         │───► state
                           │compress │
        block₁ ────────────│         │
                           └─────────┘
                                ▼
                              ...
                                ▼
                           ┌─────────┐
                  state ──►│         │───► final state
                           │compress │
        padded last ───────│         │
                           └─────────┘
                                ▼
                            digest

The structure has three parts:

  1. Initial value (IV). A fixed set of state words baked into the spec.
  2. Block compression. A function state' = compress(state, block) that eats one fixed-size block of input.
  3. Padding. The message is padded so its total length is a multiple of the block size, with the original bit-length encoded in the trailing bytes. Encoding the length (Merkle–Damgård strengthening) lets the collision resistance of the compression function carry over to the whole hash. It does not stop length-extension attacks: both MD5 and SHA-256 are vulnerable to them, and HMAC is the usual remedy.

The padding scheme used by both:

[ original message ][ 0x80 ][ 0x00 × k ][ 64-bit length ]
                                              ^
                                              MD5: little-endian
                                              SHA-256: big-endian

k is chosen so the total padded length is a multiple of 64 bytes (with exactly 8 bytes left at the end for the length field).
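To make the layout concrete, the sketch below pads a whole message in one go. The function name `md_pad` and its `big_endian` switch are hypothetical; the project itself pads incrementally inside its final functions rather than on a full copy of the message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies msg into out followed by 0x80, k zero bytes and the 64-bit bit
 * length. Returns the padded length (always a multiple of 64).
 * out must hold at least len + 72 bytes. */
size_t md_pad(const uint8_t *msg, size_t len, int big_endian, uint8_t *out)
{
    size_t   k = (64 + 55 - (len % 64)) % 64;  /* zero bytes needed */
    uint64_t bits = (uint64_t)len * 8;
    size_t   pos = len;

    memcpy(out, msg, len);
    out[pos++] = 0x80;
    memset(out + pos, 0, k);
    pos += k;
    for (int i = 0; i < 8; i++) {
        /* Big-endian emits the most significant byte first. */
        int shift = big_endian ? 8 * (7 - i) : 8 * i;

        out[pos++] = (uint8_t)(bits >> shift);
    }
    return pos;
}
```

For the 3-byte message "abc" the result is a single 64-byte block with 0x80 at offset 3 and the value 24 (0x18) at offset 56 for MD5 or offset 63 for SHA-256. A 56-byte message no longer leaves room for the length field, so it spills into a second block.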

The streaming API: init / update / final

Both algorithms expose an identical three-call API plus a one-shot wrapper:

md5_init(&ctx);                           // load IV into ctx->state
md5_update(&ctx, data, len);              // feed bytes (any number of times)
md5_final(&ctx, digest);                  // pad, compress, serialise
md5_hash(data, len, digest);              // = init + update + final

The shared context layout:

typedef struct {
    uint32_t state[N];                    // accumulator (4 for MD5, 8 for SHA-256)
    uint64_t bit_count;                   // total bits fed so far
    uint8_t  buffer[BLOCK_SIZE];          // partial block awaiting a flush
    size_t   buf_used;                    // valid bytes in buffer
} t_xxx_ctx;

update is a tiny state machine:

while (bytes left in input):
    if input cannot fill the buffer:
        copy into buffer; return
    fill buffer to 64 bytes
    compress(state, buffer)
    buf_used = 0

final builds the padding into a local 128-byte scratch array, feeds it through update (so the same compression path runs), then serialises the state words into the output digest.

md5_hash / sha256_hash is the convenience wrapper used by the dispatch table — one call, no streaming.
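The buffering logic of update can be shown in isolation. In this sketch the context and function names (`t_demo_ctx`, `demo_update`, `demo_compress`) are hypothetical, and the compression function is replaced by a block counter so the flow is easy to observe. Updating `bit_count` inside update is also an assumption about where the real code does it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64

typedef struct {
    uint64_t bit_count;            /* total bits fed so far */
    uint8_t  buffer[BLOCK_SIZE];   /* partial block awaiting a flush */
    size_t   buf_used;             /* valid bytes in buffer */
    size_t   blocks_done;          /* demo only: stands in for the state */
} t_demo_ctx;

/* Stand-in for md5_compress / sha256_compress: just counts blocks. */
static void demo_compress(t_demo_ctx *ctx, const uint8_t *block)
{
    (void)block;
    ctx->blocks_done++;
}

void demo_update(t_demo_ctx *ctx, const uint8_t *data, size_t len)
{
    ctx->bit_count += (uint64_t)len * 8;
    while (len > 0) {
        size_t room = BLOCK_SIZE - ctx->buf_used;
        size_t take = len < room ? len : room;

        memcpy(ctx->buffer + ctx->buf_used, data, take);
        ctx->buf_used += take;
        data += take;
        len -= take;
        if (ctx->buf_used == BLOCK_SIZE) {
            /* A full block is ready: compress it and start over. */
            demo_compress(ctx, ctx->buffer);
            ctx->buf_used = 0;
        }
    }
}
```

Feeding 30, 30 and then 40 bytes triggers exactly one compression and leaves 36 bytes buffered, which is the behaviour that lets callers split input at arbitrary boundaries.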


MD5 in depth

Reference: RFC 1321, The MD5 Message-Digest Algorithm, R. Rivest, 1992.

At a glance

| property | value |
|---|---|
| block size | 64 bytes (512 bits) |
| state | 4 × 32-bit words (A, B, C, D) |
| digest size | 16 bytes (128 bits) |
| rounds / block | 64 |
| endianness | little-endian |

Initial state (md5_init)

A = 0x67452301
B = 0xefcdab89
C = 0x98badcfe
D = 0x10325476

These are nothing-up-my-sleeve numbers from RFC 1321: stored little-endian, the four words are the byte sequence 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10.

Round constants K[64]

K[i] = floor(|sin(i + 1)| × 2^32)   for i = 0..63

Precomputed in md5.c so we don't drag in <math.h>. Each constant is the absolute value of the sine of an integer (1 to 64, in radians), scaled by 2^32 and truncated to a 32-bit integer.

Per-round shift table S[64]

Each group of 16 rounds cycles through its own four rotation amounts, four times over (round indices are 0-based, matching K[i] and S[i]):

F-rounds (0-15):  7, 12, 17, 22
G-rounds (16-31): 5,  9, 14, 20
H-rounds (32-47): 4, 11, 16, 23
I-rounds (48-63): 6, 10, 15, 21

The four nonlinear functions (each takes three 32-bit words, returns one):

F(b, c, d) = (b & c) | (~b & d)        // "bit-select": if b then c else d
G(b, c, d) = (b & d) | (c & ~d)        // similar but with d as the selector
H(b, c, d) = b ^ c ^ d                 // XOR — purely linear
I(b, c, d) = c ^ (b | ~d)
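In C these translate directly into small inline helpers. The lowercase `md5_`-prefixed names are illustrative (the project may well use macros called F, G, H, I); `rotl32` assumes a rotation count strictly between 0 and 32, which holds for every entry of the shift table.

```c
#include <stdint.h>

/* Rotate left; valid for 0 < n < 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Bit-select: take c where b is 1, d where b is 0. */
static inline uint32_t md5_f(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (~b & d);
}

/* Bit-select with d as the selector: b where d is 1, c where d is 0. */
static inline uint32_t md5_g(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & d) | (c & ~d);
}

/* Parity of the three inputs, bit by bit. */
static inline uint32_t md5_h(uint32_t b, uint32_t c, uint32_t d)
{
    return b ^ c ^ d;
}

static inline uint32_t md5_i(uint32_t b, uint32_t c, uint32_t d)
{
    return c ^ (b | ~d);
}
```

Setting the selector to all ones or all zeros is a quick way to sanity-check F and G: `md5_f(0xFFFFFFFF, c, d)` returns c and `md5_f(0, c, d)` returns d.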

The block compression function md5_compress

For each 64-byte block:

  1. Load the block as 16 little-endian 32-bit words m[0..15].

  2. Copy state into working variables a, b, c, d.

  3. Run 64 rounds. Each round picks an (f, g_idx) pair based on the round group, then does:

    temp = f + a + K[i] + m[g_idx];
    a = d;  d = c;  c = b;
    b = b + ROTL32(temp, S[i]);

    The message-word index g_idx permutes through m[] differently in each group:

    | rounds | function | g_idx formula |
    |---|---|---|
    | 0–15 | F | i |
    | 16–31 | G | (5·i + 1) mod 16 |
    | 32–47 | H | (3·i + 5) mod 16 |
    | 48–63 | I | (7·i) mod 16 |

  4. Add the working variables back into the state:

    state[0] += a;  state[1] += b;
    state[2] += c;  state[3] += d;
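Two pieces of the steps above are easy to get wrong: the little-endian word load and the message-word index. A small sketch of both follows; the helper names `load_le32` and `md5_word_index` are hypothetical, not taken from md5.c.

```c
#include <stdint.h>

/* Assemble a 32-bit word from 4 bytes, least significant byte first. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Which message word m[g_idx] round i (0..63) consumes. */
unsigned md5_word_index(unsigned i)
{
    if (i < 16)
        return i;
    if (i < 32)
        return (5 * i + 1) % 16;
    if (i < 48)
        return (3 * i + 5) % 16;
    return (7 * i) % 16;
}
```

Because 5, 3 and 7 are all coprime to 16, each group of 16 rounds visits every message word exactly once, just in a different order.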

Finalisation (md5_final)

  1. Append 0x80.
  2. Append zero bytes until the length is ≡ 56 (mod 64).
  3. Append the 64-bit message length in little-endian order as the last 8 bytes of the padded message.
  4. Run md5_compress on the final block(s).
  5. Serialise state[0..3] to digest[0..15] in little-endian order.
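The last step, little-endian serialisation, can be sketched as below. The function name `md5_serialise` is hypothetical; in the project this logic presumably sits inside md5_final.

```c
#include <stdint.h>

/* Write each state word least significant byte first. */
void md5_serialise(const uint32_t state[4], uint8_t digest[16])
{
    for (int i = 0; i < 4; i++) {
        digest[4 * i]     = (uint8_t)(state[i]);
        digest[4 * i + 1] = (uint8_t)(state[i] >> 8);
        digest[4 * i + 2] = (uint8_t)(state[i] >> 16);
        digest[4 * i + 3] = (uint8_t)(state[i] >> 24);
    }
}
```

Applied to the initial state, this produces the bytes 01 23 45 67 89 ab cd ef fe dc ba 98 76 54 32 10, which is a handy check that the byte order is right before debugging the rounds themselves.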

Worked example:

$ ./ft_ssl md5 -s ""
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
$ ./ft_ssl md5 -s "abc"
MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72

The empty-string digest d41d8cd98f00b204e9800998ecf8427e is the canonical MD5 sanity check.


SHA-256 in depth

Reference: FIPS PUB 180-4, Secure Hash Standard, NIST, 2015.

At a glance

| property | value |
|---|---|
| block size | 64 bytes (512 bits) |
| state | 8 × 32-bit words (H0..H7) |
| digest size | 32 bytes (256 bits) |
| rounds / block | 64 |
| endianness | big-endian |

Initial state (sha256_init)

The first 32 bits of the fractional parts of √p for the first 8 primes (2, 3, 5, 7, 11, 13, 17, 19):

H0 = 0x6a09e667    H1 = 0xbb67ae85
H2 = 0x3c6ef372    H3 = 0xa54ff53a
H4 = 0x510e527f    H5 = 0x9b05688c
H6 = 0x1f83d9ab    H7 = 0x5be0cd19

Round constants K[64]

The first 32 bits of the fractional parts of ∛p for the first 64 primes. Hard-coded in sha256.c.

Bitwise functions (FIPS 180-4 §3.2 / §4.1.2):

ROTR32(x, n)  = (x >> n) | (x << (32 - n))
SHR(x, n)     = x >> n

CH (x, y, z)  = (x & y) ^ (~x & z)               // "choose": pick y where x is 1, z elsewhere
MAJ(x, y, z)  = (x & y) ^ (x & z) ^ (y & z)      // bitwise majority

BSIG0(x)      = ROTR32(x, 2)  ^ ROTR32(x, 13) ^ ROTR32(x, 22)   // big-sigma 0
BSIG1(x)      = ROTR32(x, 6)  ^ ROTR32(x, 11) ^ ROTR32(x, 25)   // big-sigma 1

SSIG0(x)      = ROTR32(x, 7)  ^ ROTR32(x, 18) ^ SHR(x, 3)       // small-sigma 0
SSIG1(x)      = ROTR32(x, 17) ^ ROTR32(x, 19) ^ SHR(x, 10)      // small-sigma 1

BSIGx are used in the per-round state mixing; SSIGx are used in the message schedule that precedes mixing.

The block compression function sha256_compress

For each 64-byte block:

  1. Build the message schedule w[0..63] (FIPS §6.2.2 step 1).

    for i in 0..15:   w[i] = big-endian word from block[i*4 .. i*4+3]
    for i in 16..63:  w[i] = SSIG1(w[i-2]) + w[i-7] + SSIG0(w[i-15]) + w[i-16]

    The first 16 words come straight from the input. The remaining 48 are derived from earlier ones, so every round sees a different mix of message bits. This expansion is one of the features that make SHA-256 harder to attack than MD5, whose rounds only reorder the same 16 raw words.

  2. Initialise working variables. a..h ← H0..H7.

  3. Run 64 rounds.

    for i in 0..63:
        T1 = h + BSIG1(e) + CH(e, f, g) + K[i] + w[i];
        T2 = BSIG0(a) + MAJ(a, b, c);
        h = g;  g = f;  f = e;  e = d + T1;
        d = c;  c = b;  b = a;  a = T1 + T2;
  4. Add back into the state.

    H0 += a;  H1 += b;  H2 += c;  H3 += d;
    H4 += e;  H5 += f;  H6 += g;  H7 += h;
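Step 1, the message schedule, is a good piece to test on its own. The sketch below uses lowercase helper names (`rotr32`, `ssig0`, `ssig1`, `sha256_schedule`) that are illustrative rather than taken from sha256.c.

```c
#include <stdint.h>

/* Rotate right; valid for 0 < n < 32. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

static uint32_t ssig0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static uint32_t ssig1(uint32_t x)
{
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Expand one 64-byte block into the 64-word schedule w[0..63]. */
void sha256_schedule(const uint8_t block[64], uint32_t w[64])
{
    /* First 16 words: big-endian loads straight from the block. */
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)block[4 * i]     << 24
             | (uint32_t)block[4 * i + 1] << 16
             | (uint32_t)block[4 * i + 2] << 8
             | (uint32_t)block[4 * i + 3];
    /* Remaining 48 words: derived from earlier ones (mod 2^32 sums). */
    for (int i = 16; i < 64; i++)
        w[i] = ssig1(w[i - 2]) + w[i - 7] + ssig0(w[i - 15]) + w[i - 16];
}
```

For the padded single block of "abc" (bytes 61 62 63 80, zeros, then 0x18 as the last byte), w[0] is 0x61626380, w[15] is 0x18, and the first derived words are w[16] = 0x61626380 and w[17] = 0x000f0000.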

Finalisation (sha256_final)

Same structure as MD5, but big-endian throughout: the bit-length is appended in big-endian order, and the eight state words are serialised in big-endian order to produce the 32-byte digest.

Worked example:

$ ./ft_ssl sha256 -s ""
SHA256 ("") = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
$ ./ft_ssl sha256 -s "abc"
SHA256 ("abc") = ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

The "abc" digest is the official FIPS 180-4 test vector.


MD5 vs SHA-256 at a glance

| | MD5 | SHA-256 |
|---|---|---|
| Block size | 64 bytes | 64 bytes |
| State words | 4 × 32-bit | 8 × 32-bit |
| Digest size | 16 bytes | 32 bytes |
| Rounds per block | 64 | 64 |
| Endianness | little-endian | big-endian |
| Round constants | floor(abs(sin(i+1)) × 2^32) | fractional bits of ∛p for first 64 primes |
| Initial state | fixed magic words | fractional bits of √p for first 8 primes |
| Message schedule | reuses m[0..15] with index permutation | expands to w[0..63] via SSIG0/SSIG1 |
| Mixing functions | F, G, H, I | CH, MAJ, BSIG0/1 |
| Cryptographic security | broken (collisions) | considered secure |

The structural differences explain the security gap. MD5's rounds always operate on the same 16 raw message words (just with different orderings), so an attacker has more leverage to craft colliding inputs. SHA-256's expanded 64-word schedule, larger state, and more nonlinear mixing make the same kind of attack vastly harder.


Adding a new hash algorithm

The plug-in shape was designed to make this almost mechanical. Suppose you want to add SHA-224:

  1. Header. Create includes/sha224.h with sha224_hash(...) matching the t_hash_fn signature.
  2. Implementation. Create srcs/sha224/sha224.c. FIPS 180-4 defines SHA-224 as SHA-256 with a different set of initial values and the digest truncated to the first 28 bytes, so the SHA-256 compression function can be reused unchanged.
  3. Register. In srcs/dispatch.c:
    #include "sha224.h"
    ...
    { "sha224", "SHA224", 28, sha224_hash },
  4. Build. Add srcs/sha224/sha224.c to SRCS in the Makefile.

That's it. main.c, parsing.c, io.c, and utils.c do not need to know the new algorithm exists — they receive bytes, hand them to cmd->fn, and format cmd->digest_len bytes of output.


References
