A collection of IDAPython scripts for exporting structured data from IDA Pro + Hex-Rays for use in AI/ML pipelines, malware analysis, and reverse engineering workflows.
```
ida_export/
├── meta.json          # Binary metadata (hashes, architecture, segments)
├── index.json         # Symbol index (functions, names, imports, exports)
├── functions.jsonl    # Detailed function data (disasm, call graph, basic blocks)
├── strings.jsonl      # String data with cross-references
├── data.jsonl         # Global variables and data structures
├── decomp/            # Decompiled C code (requires Hex-Rays)
│   └── *.c
└── sample.bin         # Original binary (optional)
```
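As a quick sanity check, an export with this layout can be verified from plain Python outside IDA. The sketch below is hypothetical: `check_export` is not part of these scripts, and it assumes the file names shown in the tree and that `meta.json` stores the SHA-256 as a hex string under `file.hashes.sha256`, as in the `meta.json` example in this README.

```python
import hashlib
import json
from pathlib import Path

# File names taken from the layout above; decomp/ and sample.bin are optional.
REQUIRED = ["meta.json", "index.json", "functions.jsonl", "strings.jsonl", "data.jsonl"]


def check_export(export_dir):
    """Return a list of problems found in an export directory (empty if none)."""
    root = Path(export_dir)
    problems = [f"missing {name}" for name in REQUIRED if not (root / name).is_file()]

    meta_path = root / "meta.json"
    sample_path = root / "sample.bin"
    if meta_path.is_file() and sample_path.is_file():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        # Assumption: the hash is a hex string stored under file.hashes.sha256.
        expected = meta.get("file", {}).get("hashes", {}).get("sha256")
        actual = hashlib.sha256(sample_path.read_bytes()).hexdigest()
        if expected and expected.lower() != actual:
            problems.append("sample.bin does not match the sha256 in meta.json")
    return problems
```

Calling `check_export("ida_export")` returns an empty list when all required files are present and the optional `sample.bin` matches the recorded hash.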
- IDA Pro 7.0+ with IDAPython
- Hex-Rays Decompiler (for `export_decomp.py`)
- Python 3.x
| Script | Output | Description |
|---|---|---|
| `export_all.py` | `project/*` | One-click export of all data |
| `export_meta.py` | `meta.json` | Binary metadata, segments, entry points, imports/exports summary |
| `export_index.py` | `index.json` | Function list, named addresses, globals, imports, exports |
| `export_functions.py` | `functions.jsonl` | Detailed function info with disassembly and call graph |
| `export_strings.py` | `strings.jsonl` | Strings with encoding and cross-references |
| `export_data.py` | `data.jsonl` | Data segment items, global variables, structures |
| `export_decomp.py` | `decomp/*.c` | Decompiled C code for each function |
- Open your binary in IDA Pro
- Wait for auto-analysis to complete
- File > Script file > select `export_all.py`
That's it! The script auto-executes and exports everything to ./project/ (relative to your IDB file).
Alternatively, run it from the Python console:

```python
exec(open(r"C:\path\to\scripts\export_all.py").read())
```

```python
# Run individual scripts
exec(open("scripts/export_meta.py").read())
exec(open("scripts/export_index.py").read())
exec(open("scripts/export_functions.py").read())
exec(open("scripts/export_strings.py").read())
exec(open("scripts/export_data.py").read())
exec(open("scripts/export_decomp.py").read())
```

```python
# Import the export functions and call them directly
# (the scripts directory must be on sys.path)
from export_meta import export_meta
from export_functions import export_functions

export_meta("output/meta.json")
export_functions("output/functions.jsonl", include_disasm=True)
```

Create a script to run all exports:
```python
import os

output_dir = "ida_export"
os.makedirs(output_dir, exist_ok=True)

# Load each script into the current namespace, then call its export function
exec(open("scripts/export_meta.py").read())
export_meta(os.path.join(output_dir, "meta.json"))

exec(open("scripts/export_index.py").read())
export_index(os.path.join(output_dir, "index.json"))

exec(open("scripts/export_functions.py").read())
export_functions(os.path.join(output_dir, "functions.jsonl"))

exec(open("scripts/export_strings.py").read())
export_strings(os.path.join(output_dir, "strings.jsonl"))

exec(open("scripts/export_data.py").read())
export_data(os.path.join(output_dir, "data.jsonl"))

exec(open("scripts/export_decomp.py").read())
export_decomp(os.path.join(output_dir, "decomp"))
```

Example `meta.json`:

```json
{
  "file": {
    "name": "sample.exe",
    "size": 123456,
    "hashes": {
      "md5": "...",
      "sha256": "..."
    }
  },
  "architecture": {
    "processor": "metapc",
    "bitness": 64,
    "endianness": "little"
  },
  "segments": [...],
  "entry_points": [...],
  "imports": {...},
  "exports": [...]
}
```

In `functions.jsonl`, each line is a JSON object:
```json
{
  "address": 4198400,
  "name": "main",
  "size": 256,
  "type_info": {
    "return_type": "int",
    "parameters": [...]
  },
  "callers": [...],
  "callees": [...],
  "basic_blocks": [...],
  "disassembly": [...]
}
```

Example `strings.jsonl` record:

```json
{
  "address": 4206592,
  "content": "Hello, World!",
  "length": 13,
  "encoding": "ascii",
  "xrefs": [...]
}
```

```python
import json

import idautils
import idc

from export_functions import export_function

# Export only non-library functions
with open("user_functions.jsonl", "w") as f:
    for func_ea in idautils.Functions():
        flags = idc.get_func_attr(func_ea, idc.FUNCATTR_FLAGS)
        if not (flags & idc.FUNC_LIB):
            data = export_function(func_ea)
            f.write(json.dumps(data) + "\n")
```

```python
from export_strings import find_strings_with_pattern

# Find URLs
urls = find_strings_with_pattern(r"https?://")

# Find registry keys
regkeys = find_strings_with_pattern(r"HKEY_")
```

```python
import idc

from export_decomp import export_decomp_single

# Export main function (skipped if the binary has no symbol named "main")
main_ea = idc.get_name_ea_simple("main")
if main_ea != idc.BADADDR:
    export_decomp_single(main_ea, "main.c")
```

- Run scripts after IDA's auto-analysis is complete for best results
- `export_decomp.py` requires a Hex-Rays Decompiler license
- Large binaries may take significant time to process
- Always analyze samples in a safe, isolated environment
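Pattern searches like the `find_strings_with_pattern` examples can also be run against an existing `strings.jsonl`, without reopening the sample in IDA. This is a minimal sketch, assuming one JSON object per line with the `content` field shown in the strings example; `grep_strings` is a hypothetical helper and not part of `export_strings.py`.

```python
import json
import re


def grep_strings(jsonl_path, pattern):
    """Yield records from a strings.jsonl file whose content matches pattern."""
    regex = re.compile(pattern)
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Treat a missing or null content field as an empty string.
            if regex.search(record.get("content") or ""):
                yield record
```

For example, `list(grep_strings("ida_export/strings.jsonl", r"https?://"))` collects the URL-like strings. Addresses in the records are decimal integers, so format them with `hex()` when cross-checking against IDA.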
For large binaries, consider:
- Using `include_disasm=False` in `export_functions()` to reduce size
- Filtering by segment or function type
- Exporting only specific functions of interest
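If a full export already exists, similar reductions can be applied after the fact by streaming `functions.jsonl` one record at a time. The sketch below is an illustration only: `slim_functions` is a hypothetical helper, and it assumes the `size` and `disassembly` fields shown in the function record example.

```python
import json


def slim_functions(src_path, dst_path, min_size=0, drop_fields=("disassembly",)):
    """Copy records whose size field is at least min_size, omitting drop_fields.

    Returns the number of records written.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:  # one record at a time, so memory use stays flat
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("size", 0) < min_size:
                continue
            for field in drop_fields:
                record.pop(field, None)
            dst.write(json.dumps(record) + "\n")
            kept += 1
    return kept
```

Calling `slim_functions("ida_export/functions.jsonl", "functions_slim.jsonl", min_size=64)` would write a smaller file that skips tiny functions and omits disassembly listings.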
MIT License - See LICENSE file
Contributions welcome! Please submit issues and pull requests on GitHub.