0xKatze/llm_ida_tool

IDA Exporter

A collection of IDAPython scripts for exporting structured data from IDA Pro + Hex-Rays for use in AI/ML pipelines, malware analysis, and reverse engineering workflows.

Output Structure

ida_export/
├── meta.json           # Binary metadata (hashes, architecture, segments)
├── index.json          # Symbol index (functions, names, imports, exports)
├── functions.jsonl     # Detailed function data (disasm, call graph, basic blocks)
├── strings.jsonl       # String data with cross-references
├── data.jsonl          # Global variables and data structures
├── decomp/             # Decompiled C code (requires Hex-Rays)
│   └── *.c
└── sample.bin          # Original binary (optional)
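As a quick sanity check after an export, the layout above can be verified with a few lines of Python outside IDA. This is a minimal sketch: `missing_outputs` and `EXPECTED_FILES` are hypothetical names, not part of the repository, and the optional `decomp/` and `sample.bin` entries are deliberately not checked.

```python
import os

# File names taken from the output tree above; decomp/ and sample.bin
# are optional and therefore not treated as required.
EXPECTED_FILES = (
    "meta.json",
    "index.json",
    "functions.jsonl",
    "strings.jsonl",
    "data.jsonl",
)


def missing_outputs(export_dir):
    """Return the expected export files that are absent from export_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(export_dir, name))
    ]
```

An empty list from `missing_outputs("ida_export")` means every required file was written.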

Requirements

  • IDA Pro 7.0+ with IDAPython
  • Hex-Rays Decompiler (for export_decomp.py)
  • Python 3.x (IDAPython supports Python 3 from IDA 7.4 onward; earlier 7.x releases ship Python 2)

Scripts

Script               Output           Description
export_all.py        project/*        One-click export of all data
export_meta.py       meta.json        Binary metadata, segments, entry points, imports/exports summary
export_index.py      index.json       Function list, named addresses, globals, imports, exports
export_functions.py  functions.jsonl  Detailed function info with disassembly and call graph
export_strings.py    strings.jsonl    Strings with encoding and cross-references
export_data.py       data.jsonl       Data segment items, global variables, structures
export_decomp.py     decomp/*.c       Decompiled C code for each function

Usage

Quick Start (One-Click Export)

  1. Open your binary in IDA Pro
  2. Wait for auto-analysis to complete
  3. File > Script file > select export_all.py

That's it! The script auto-executes and exports everything to ./project/ (relative to your IDB file).

Alternatively, run from Python console:

exec(open(r"C:\path\to\scripts\export_all.py").read())

Individual Scripts

# Run individual scripts
exec(open("scripts/export_meta.py").read())
exec(open("scripts/export_index.py").read())
exec(open("scripts/export_functions.py").read())
exec(open("scripts/export_strings.py").read())
exec(open("scripts/export_data.py").read())
exec(open("scripts/export_decomp.py").read())
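The `exec(open(...).read())` idiom never closes the file explicitly and reports errors against `<string>` rather than the script path. A small wrapper can address both. The sketch below assumes the scripts need nothing beyond `__name__` and `__file__` in their globals; `run_script` is a hypothetical helper name, not something the repository provides.

```python
import os


def run_script(path, namespace=None):
    """Execute a Python script file and return the namespace it ran in.

    Hypothetical helper, not part of the exporter scripts.
    """
    path = os.path.abspath(path)
    if namespace is None:
        namespace = {}
    # Provide the globals a script normally sees when run directly.
    namespace.setdefault("__name__", "__main__")
    namespace["__file__"] = path
    with open(path, "r", encoding="utf-8") as handle:
        source = handle.read()
    # Compiling with the real path makes tracebacks point at the file.
    exec(compile(source, path, "exec"), namespace)
    return namespace
```

Passing `globals()` as `namespace` from the IDA Python console reproduces the behavior of the one-liners above, where the functions a script defines become available in the console. Without it, they are found in the returned dictionary instead.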

Custom Output Paths

import sys
sys.path.insert(0, "scripts")  # assumed location of the exporter modules

from export_meta import export_meta
from export_functions import export_functions

export_meta("output/meta.json")
export_functions("output/functions.jsonl", include_disasm=True)

Export All at Once

Create a script to run all exports:

import os

output_dir = "ida_export"
os.makedirs(output_dir, exist_ok=True)

exec(open("scripts/export_meta.py").read())
export_meta(os.path.join(output_dir, "meta.json"))

exec(open("scripts/export_index.py").read())
export_index(os.path.join(output_dir, "index.json"))

exec(open("scripts/export_functions.py").read())
export_functions(os.path.join(output_dir, "functions.jsonl"))

exec(open("scripts/export_strings.py").read())
export_strings(os.path.join(output_dir, "strings.jsonl"))

exec(open("scripts/export_data.py").read())
export_data(os.path.join(output_dir, "data.jsonl"))

exec(open("scripts/export_decomp.py").read())
export_decomp(os.path.join(output_dir, "decomp"))

Output Formats

meta.json

{
  "file": {
    "name": "sample.exe",
    "size": 123456,
    "hashes": {
      "md5": "...",
      "sha256": "..."
    }
  },
  "architecture": {
    "processor": "metapc",
    "bitness": 64,
    "endianness": "little"
  },
  "segments": [...],
  "entry_points": [...],
  "imports": {...},
  "exports": [...]
}
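Downstream tooling usually only needs a handful of these fields. The following sketch reads `meta.json` outside IDA and builds a one-line summary from the keys shown above. `load_meta` and `summarize_meta` are hypothetical helpers, and any key that is absent falls back to the placeholder `"unknown"` rather than raising.

```python
import json


def load_meta(path):
    """Parse a meta.json file into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_meta(meta):
    """Build a one-line description from the documented meta.json keys."""
    file_info = meta.get("file", {})
    arch = meta.get("architecture", {})
    sha256 = file_info.get("hashes", {}).get("sha256", "unknown")
    return "{} ({} bytes, {} {}-bit {}-endian, sha256={})".format(
        file_info.get("name", "unknown"),
        file_info.get("size", "unknown"),
        arch.get("processor", "unknown"),
        arch.get("bitness", "unknown"),
        arch.get("endianness", "unknown"),
        sha256,
    )
```

For example, `print(summarize_meta(load_meta("ida_export/meta.json")))` gives a quick identity line for a batch of exported samples.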

functions.jsonl

Each line is a single JSON object (pretty-printed here for readability). Addresses are decimal integers; 4198400 is 0x401000:

{
  "address": 4198400,
  "name": "main",
  "size": 256,
  "type_info": {
    "return_type": "int",
    "parameters": [...]
  },
  "callers": [...],
  "callees": [...],
  "basic_blocks": [...],
  "disassembly": [...]
}
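Because the file is JSON Lines, it can be streamed one record at a time instead of being loaded whole, which matters for large binaries. The sketch below uses only the `address`, `name`, and `size` fields shown above; `iter_jsonl` and `largest_functions` are hypothetical helper names, not part of the repository.

```python
import heapq
import json


def iter_jsonl(path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def largest_functions(records, count=10):
    """Return (hex address, name, size) tuples for the biggest functions."""
    top = heapq.nlargest(count, records, key=lambda rec: rec.get("size", 0))
    return [(hex(rec["address"]), rec["name"], rec.get("size", 0)) for rec in top]
```

Calling `largest_functions(iter_jsonl("ida_export/functions.jsonl"), 5)` keeps at most five records in memory at once, since `heapq.nlargest` consumes the generator lazily.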

strings.jsonl

{
  "address": 4206592,
  "content": "Hello, World!",
  "length": 13,
  "encoding": "ascii",
  "xrefs": [...]
}
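Once exported, the strings can be searched without reopening IDA. This is a minimal sketch that filters records by a regular expression on the `content` field. `grep_strings` is a hypothetical helper that works on the exported file, and it is not the repository's `find_strings_with_pattern`, whose implementation is not shown here.

```python
import json
import re


def grep_strings(lines, pattern):
    """Yield string records whose "content" matches the regex pattern.

    `lines` is any iterable of JSON Lines text, such as an open file.
    """
    regex = re.compile(pattern)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if regex.search(record.get("content", "")):
            yield record
```

A typical use is `with open("ida_export/strings.jsonl", encoding="utf-8") as f:` followed by a loop over `grep_strings(f, r"https?://")`, printing `hex(rec["address"])` and `rec["content"]` for each hit.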

Advanced Usage

Filter Functions

from export_functions import export_function
import idautils
import idc
import json

# Export only non-library functions
with open("user_functions.jsonl", "w") as f:
    for func_ea in idautils.Functions():
        flags = idc.get_func_attr(func_ea, idc.FUNCATTR_FLAGS)
        if not (flags & idc.FUNC_LIB):
            data = export_function(func_ea)
            f.write(json.dumps(data) + "\n")

Search Strings

from export_strings import find_strings_with_pattern

# Find URLs
urls = find_strings_with_pattern(r"https?://")

# Find registry keys
regkeys = find_strings_with_pattern(r"HKEY_")

Export Specific Functions

from export_decomp import export_decomp_single
import idc

# Export main function (skipped if the binary has no symbol named "main")
main_ea = idc.get_name_ea_simple("main")
if main_ea != idc.BADADDR:
    export_decomp_single(main_ea, "main.c")

Notes

  • Run scripts after IDA's auto-analysis is complete for best results
  • export_decomp.py requires a Hex-Rays Decompiler license
  • Large binaries may take significant time to process
  • Always analyze samples in a safe, isolated environment

Output Size Considerations

For large binaries, consider:

  • Using include_disasm=False in export_functions() to reduce size
  • Filtering by segment or function type
  • Exporting only specific functions of interest
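If a full export already exists, it can also be slimmed down after the fact instead of re-running IDA. The sketch below copies a JSON Lines file while dropping selected keys, by default the `disassembly` field shown in the functions.jsonl format. `strip_fields` is a hypothetical helper, not part of the repository.

```python
import json


def strip_fields(src_path, dst_path, fields=("disassembly",)):
    """Copy a JSON Lines file, dropping the named keys from every record.

    Returns the number of records written.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for field in fields:
                record.pop(field, None)
            dst.write(json.dumps(record) + "\n")
            written += 1
    return written
```

For example, `strip_fields("ida_export/functions.jsonl", "functions_slim.jsonl")` produces a copy without disassembly, and passing `fields=("disassembly", "basic_blocks")` trims it further.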

License

MIT License - See LICENSE file

Contributing

Contributions welcome! Please submit issues and pull requests on GitHub.
