Skip to content

Idsl-group/SpecMAS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking

This repository is the official implementation of SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking as accepted in 39th Conference on Neural Information Processing Systems(NeurIPS 2025).

alt text

Code Files

Following is a one liner description of the code files used in to build our system.

  • agent_templates - Prompt templates for the General Purpose Agentic Graph Node.
  • api_server - This code implements a FastAPI-based AI agent system with HuggingFace and LangChain integrations, featuring:

LLM-powered reasoning & code execution: Uses HuggingFace models (via Llama.cpp API) for multi-step reasoning and code generation Advanced workflows: Supports ReWOO and Plan-and-Execute strategies for complex problem-solving RAG/CRAG capabilities: Implements Retrieval-Augmented Generation with document search and corrective multi-hop reasoning Model checking integration: Includes NuSMV planner and tools for Kripke structure extraction and property verification Streaming interface: Provides real-time SSE streaming for agent thought processes and results Conversation memory: Maintains chat history with buffer memory and thread management The system combines LLM inference, code execution, knowledge retrieval, and formal verification in a unified API for interactive AI-powered workflows.

  • crag_agent - Information Retriever Agent - Implements a CRAG workflow using Arxiv/Local PDFs with Chroma vectorstore and Tavily web search. Features: Hybrid RAG: Combines Arxiv papers & local PDFs (using PyPDFLoader) Relevance grading: Filters documents with LLM-based binary relevance scoring Query rewriting: Optimizes questions for better retrieval Graph-based pipeline: StateGraph workflow with conditional edges for document grading and web search HuggingFace embeddings: Uses gte-large-en-v1.5 for document vectorization Automatically switches between knowledge retrieval, web search, and answer generation for comprehensive question-answering

  • knowledge_search_templates - Prompt Templates for the Knowledge Search tool.

  • Exp_Utils - Processes AI model evaluations for system specification tasks, computes weighted scores, and generates performance visualizations. Features:

Score aggregation: Calculates composite metrics (structural alignment, property/semantic fidelity) Count normalization: Normalizes error counts across models (counterexamples, minor issues) Compliance merging: Combines quality and execution scores into final weighted metrics Visual analytics: Debugging steps comparison bar charts Dual-axis spec-quality vs compliance slope graphs Radar charts for domain-specific model performance Domain categorization: Evaluates models across 6 technical domains (protocols, concurrency, memory mgmt, etc.) Exports CSV tables and PNG visualizations for detailed model comparison and compliance analysis.

  • Extraction_templates - Prompt Templates for the Kripke, Properties and NuSMV Agent.
  • mrag_templates - Prompt Templates for Multi-Hop RAG. Use in Knowledge Search Tool.
  • node_definitions - Implements NuSMV/Kripke/Property verification agents with LangChain-based orchestration and LLM-powered planning. Features:

Modular planners: Kripke structure extraction, property specification, and NuSMV execution pipelines StateGraph orchestration: Conditional execution of planning/steps with ReWOO and Plan-and-Execute strategies Tool integration: Executes NuSMV, reads/writes to filesystem, and uses RAG (Chroma + HuggingFace embeddings) Comprehensive verification: Tracks structural alignment, semantic fidelity, and execution compliance Visualization tools: Generates bar charts, slope graphs, and radar plots for model performance comparison Manages formal verification workflows by coordinating LLM planning, tool execution, and compliance scoring with support for SOP-based specification extraction and model validation.

  • tool_templates - Prompt templates for tools used by the Kripke, Properties and NuSMV Agent.
  • verification_templates - Includes templates for general question answering and domain-specific checks with configurable response formats and evaluation criteria.

Results

Detailed Results are described in the supplementary section document.

Benchmark

All the benchmark files, including the Standard Operations Procedure documents, Expert-written SMV files, are in the Mini SpecBench folder.

About

Repository containing all the code for the paper "SpecMAS: A Multi-Agent System for Self-Verifying System Generation via Formal Model Checking", accepted at the 39th Annual Conference on Neural Information Processing Systems(NeurIPS 2025).

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages