Protocol Reverse Engineering

A protocol-agnostic reverse engineering pipeline that analyzes binary protocol traffic from PCAP files and automatically infers protocol structure, message types, field boundaries, and semantic roles.

Key Features

Protocol-Agnostic Analysis - Works with any binary protocol without prior knowledge
Automatic Message Clustering - Discovers message families using advanced clustering
Field Boundary Detection - Infers field boundaries with enhanced anti-fragmentation
Semantic Labeling - Assigns semantic roles to inferred fields
Request/Response Pairing - Discovers protocol interactions and relations
LLM-Assisted Refinement - Uses LLM prompts in selected stages to refine the analysis
Comprehensive Reports - Generates Markdown and interactive HTML specifications
Ground Truth Evaluation - Validates results against known protocol specifications
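
To give a feel for what field boundary inference involves, here is a minimal sketch of one common heuristic: compare the Shannon entropy of byte values at adjacent offsets and propose a boundary where it jumps. This is an illustration only, not the pipeline's actual method; the function names `offset_entropy` and `candidate_boundaries` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def offset_entropy(messages, offset):
    """Shannon entropy (bits) of the byte values seen at one offset."""
    values = [m[offset] for m in messages if len(m) > offset]
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def candidate_boundaries(messages, threshold=1.0):
    """Offsets where entropy changes by at least `threshold` bits.

    Hypothetical heuristic for illustration; only the prefix shared by
    all messages is examined.
    """
    min_len = min(len(m) for m in messages)
    entropies = [offset_entropy(messages, i) for i in range(min_len)]
    return [
        i for i in range(1, min_len)
        if abs(entropies[i] - entropies[i - 1]) >= threshold
    ]


# Toy messages: two constant bytes, one varying byte, one constant byte.
messages = [bytes([0x00, 0x01, i, 0xFF]) for i in range(4)]
boundaries = candidate_boundaries(messages)
print(boundaries)
```

On real traffic a single threshold tends to over-segment, so a heuristic like this would usually be one signal among several.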

Documentation

Getting Started - Installation, first analysis, and basic usage
Architecture - System design and technical details
Documentation Guide - How to build and contribute to docs

Requirements

Python 3.10+
TShark (Wireshark CLI)
Dependencies: numpy, scikit-learn, hdbscan, scapy, torch
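
The requirements above can be checked before a first run with a short script. This is a sketch and not part of the repository; `check_environment` is a hypothetical helper, and it assumes the packages use their usual import names (`sklearn` for scikit-learn).

```python
import importlib.util
import shutil
import sys

# Import names for the listed dependencies (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "sklearn", "hdbscan", "scapy", "torch"]


def check_environment():
    """Return a list of human-readable problems; empty if all is well."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if shutil.which("tshark") is None:
        problems.append("tshark not found on PATH")
    for name in REQUIRED_MODULES:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("Environment looks OK")
```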

Project Structure

protocol_re/
├── src/protocol_re/          # Core library
├── scripts/                  # Pipeline stages (01-24)
├── docs/                     # Documentation
├── data/                     # Intermediate artifacts
├── output/                   # Final reports
├── pre_trained/              # Trained neural models
├── prompts/                  # Prompts used in LLM-assisted stages
├── schema/                   # Protocol model and evaluation schema
├── tests/                    # Test modules
├── truth-files/              # Real protocol specifications, used for evaluation
└── main.py                   # Pipeline runner

Supported Protocols

The pipeline is protocol-agnostic and has been tested with:

Modbus TCP

Performance

Typical runtime for 200K Modbus messages: ~6 minutes

Accuracy on Modbus TCP:

Message type detection: 90%+ precision/recall
Field boundary recall: 88%+
Field boundary precision: 65%+
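
To make the boundary metrics concrete, here is a minimal sketch of how precision and recall can be computed for a single message by treating boundaries as sets of byte offsets. The project's evaluation may define the metrics differently (for example with an offset tolerance or aggregated over message types); `boundary_scores` and the predicted offsets are hypothetical. The ground truth uses the Modbus TCP layout: transaction ID (2 bytes), protocol ID (2), length (2), unit ID (1), function code (1), then data.

```python
def boundary_scores(predicted, truth):
    """Exact-match precision and recall over sets of boundary offsets."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Boundaries for a 12-byte Modbus TCP request, as offsets where a new
# field starts: protocol ID, length, unit ID, function code, data.
TRUTH = {2, 4, 6, 7, 8}

# Hypothetical inference result that over-segments at offsets 1 and 10.
predicted = {1, 2, 4, 6, 7, 8, 10}

precision, recall = boundary_scores(predicted, TRUTH)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Recall above precision, as in the figures reported here, is the pattern over-segmentation produces: every true boundary is found, but extra boundaries are proposed inside real fields.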

License

MIT License - See LICENSE file for details.

Contact

For questions or issues, please open an issue on GitHub.
