A production-ready knowledge graph system on Snowflake that combines graph data modeling, natural language querying via Cortex Analyst, advanced graph algorithms via custom NetworkX tools, and intelligent orchestration using Cortex Agent. For complete content, see Blog
This repository contains a complete implementation of a knowledge graph solution for soccer analytics, demonstrating how to:
- Model graph data using node-edge architecture on Snowflake
- Enable natural language queries via Cortex Analyst with semantic models
- Deploy custom graph algorithms (NetworkX) via Snowpark Container Services (SPCS)
- Orchestrate intelligently using Cortex Agent to route queries to optimal tools
The architecture handles both simple SQL queries (80% of use cases) and complex graph algorithms (20% requiring advanced analytics), all accessible through natural language.
repo_knowledge_graph/
βββ 1_setup_sql/ # Phase 1: Database setup and knowledge graph schema
βββ 2_cortex_analyst/ # Phase 2: Semantic model for natural language queries
βββ 3_mcp_server/ # Phase 3: Custom NetworkX tools deployed to SPCS
βββ 4_cortex_agent/ # Phase 4: Agent orchestration and instructions
βββ README.md # This file
- Snowflake account with access to:
- Cortex Analyst (for semantic models)
- Cortex Agent (for orchestration)
- Snowpark Container Services (SPCS) (for custom tools)
- Python 3.10+ (for MCP server development)
- Docker (for containerizing MCP server)
Before deploying, you'll need to replace the placeholder (XXXXXX) with your actual Snowflake account identifier:
Files that need configuration:
3_mcp_server/deploy_spcs.sh- ReplaceXXXXXXwith your Snowflake account identifier (e.g.,abc12345.us-east-1)3_mcp_server/service.yaml- ReplaceXXXXXXwith your Snowflake account identifier (e.g.,abc12345.us-east-1)
Sensitive files (not included in repo):
- None required - The MCP server uses only static JSON files from
graph_data/directory
Note: The graph_data/ folder is included in the repo and will be copied into the Docker image during build. No Snowflake connection or authentication is needed for the MCP server to function.
Follow these phases in order:
π Location: 1_setup_sql/soccer_knowledge_graph.sql
Execute the SQL script to create:
- Database
KNOWLEDGE_GRAPH_DBand schemaSOCCER_KG - Core tables:
KG_NODE(entities) andKG_EDGE(relationships) - Convenience views:
V_PLAYER,V_CLUB,V_MATCH,V_PLAYS_FOR, etc. - Sample soccer data (players, clubs, matches, contracts)
π Location: 2_cortex_analyst/soccer_semantic_model.yml
Upload the semantic model to Snowflake to enable Cortex Analyst natural language queries:
- Defines entity views (players, clubs, matches, coaches)
- Defines relationship views (plays_for, coaches, played_in)
- Specifies verified queries for common patterns
- Enables text-to-SQL translation for graph queries
π Location: 3_mcp_server/
Deploy custom NetworkX graph algorithms as an SPCS service. Follow the complete guide:
π Deployment Guide: 3_mcp_server/SPCS_DEPLOYMENT_GUIDE.md
Quick Summary:
- Build and push Docker image (
./deploy_spcs.sh) - Create Snowflake infrastructure (
deploy_to_spcs.sql) - Create Service Functions (
spcs_service_functions.sql)
Provides 5 Graph Analytics Tools:
shortest_path_tool- Find shortest paths between entitiescentrality_tool- Identify most influential nodescommunity_detection_tool- Detect natural groupingstransfer_analysis_tool- Analyze transfer patternstemporal_analysis_tool- Examine network evolution
Key Feature: Uses static JSON files from graph_data/ (no database connection required)
π Location: 4_cortex_agent/
Create a Cortex Agent that orchestrates between Cortex Analyst and custom tools.
Configuration Files:
agent_description.md- Agent capabilities and limitationstool_descriptions.md- When to use each toolorchestration_instructions.md- Decision logic for tool selectionresponse_instructions.md- Response formatting standards
Agent Capabilities:
- Routes simple queries to Cortex Analyst (SQL)
- Routes complex graph queries to NetworkX tools
- Executes multiple tools in parallel when independent
- Synthesizes results into unified responses
π Integration: See 3_mcp_server/spcs_service_functions.sql (Step 5) for adding custom tools to Cortex Agent
"What is Lionel Messi's complete career trajectory?"
Result: Returns all clubs, dates, and contract details for Messi's career.
"What's the shortest connection between Messi and Haaland?"
Result: Finds shortest path through shared teammates/clubs.
"Find all clubs within 3 degrees of Real Madrid AND show which ones
belong to the same transfer market community?"
Result: Uses Cortex Analyst for recursive CTE (3-degree network) and community detection tool for transfer market groupings.
- SQL Speed: 1-3 seconds for direct relationships via Cortex Analyst
- Graph Power: 5-10 seconds for complex algorithms via NetworkX
- Intelligent Routing: Agent automatically selects optimal tools
- Parallel Execution: Independent tools run simultaneously (2-3x faster)
- Natural Language: No SQL or algorithm knowledge required
- Scalable: Handles billions of edges with horizontal scaling via SPCS
- KG_NODE: All entities (players, clubs, matches, coaches)
- KG_EDGE: All relationships (contracts, appearances, matches)
- VARIANT columns: Flexible properties per entity/relationship type
- Temporal support: Start/end dates for time-bounded relationships
Domain-specific views that abstract the underlying node-edge structure:
V_PLAYER,V_COACH,V_CLUB,V_MATCHV_PLAYS_FOR,V_COACHES,V_PLAYED_INV_HOME_TEAM,V_AWAY_TEAM
YAML-based definition that enables Cortex Analyst to:
- Understand entity relationships
- Generate optimized SQL queries
- Handle verified query patterns
- Support natural language variations
NetworkX-based graph algorithms deployed as SPCS service:
- Preloads graph data for performance
- Exposes HTTP endpoints
- Wrapped by Snowflake UDFs
- Accessible via Cortex Agent
Intelligent orchestration layer that:
- Analyzes query intent
- Selects appropriate tools
- Executes in parallel when possible
- Synthesizes multi-tool results
- Formats professional responses
| Component | File | Description |
|---|---|---|
| Database Setup | 1_setup_sql/soccer_knowledge_graph.sql |
Complete schema and data |
| Semantic Model | 2_cortex_analyst/soccer_semantic_model.yml |
Natural language query model |
| MCP Deployment | 3_mcp_server/SPCS_DEPLOYMENT_GUIDE.md |
Step-by-step SPCS guide |
| Service Functions | 3_mcp_server/spcs_service_functions.sql |
SQL functions and Cortex Agent setup |
| Agent Configuration | 4_cortex_agent/*.md |
Agent orchestration instructions |
This architecture applies to any domain where relationships matter:
- Supply Chain Networks: Track suppliers, shipments, and contracts
- Fraud Detection: Connect accounts, devices, and transactions
- Customer 360: Map customer journeys and interactions
- Cybersecurity: Trace attack paths and analyze network traffic
- Recommendation Systems: Find connections between users and products
To extend this implementation:
- New Entity Types: Add to
KG_NODE(Phase 1), create views, update semantic model (Phase 2) - New Relationships: Add to
KG_EDGE(Phase 1), create views, update semantic model (Phase 2) - New Graph Algorithms: Implement in
3_mcp_server/soccer_mcp_server.py, register as MCP tool, update agent configuration (Phase 4)
See individual files for detailed implementation patterns.
This repository contains example code and configurations for Snowflake knowledge graph implementation. Adapt as needed for your use case.
This is a reference implementation. Feel free to:
- Customize for your domain
- Extend with additional tools
- Optimize for your data volume
- Share improvements
For questions about:
- Snowflake features: Refer to Snowflake Documentation
- Cortex Analyst: See Cortex Analyst Guide
- Cortex Agent: See Cortex Agent Guide
- SPCS: See SPCS Documentation
Built with: Snowflake Intelligence, Cortex Analyst, Cortex Agent, Snowpark Container Services, NetworkX