Summary
YAML files (including Helm chart templates) are indexed by flat key extraction in v0.6.1: one Variable node per YAML leaf key, no semantic edges, and no Helm-specific understanding. The result is a node flood with zero graph traversal value.
Scale
| Repo | .yaml/.yml files | Primary content |
| --- | --- | --- |
| 1 | 3,147 | App config, K8s manifests |
| 2 | 429 | K8s, Helm values, CI |
| 3 | 301 | K8s, CI |
| 4 | 161 | K8s, Helm values |
| 5 | 56 | Helm chart templates (source of truth) |
YAML in these repos is currently excluded from indexing (except in helm-charts) because flat extraction produces noise without signal.
Current behavior — helm-charts repo (56 files, kept indexed)
MATCH (n) RETURN DISTINCT labels(n) AS type, count(n) AS cnt ORDER BY cnt DESC
// → Variable (278), File (54), Module (54), Folder (23)
MATCH ()-[r]->() RETURN DISTINCT type(r), count(r) ORDER BY count(r) DESC
// → DEFINES (332), CONTAINS_FILE (44), CONTAINS_FOLDER (20), FILE_CHANGES_WITH (16)
// → ZERO CALLS, IMPORTS, REFERENCES, or any semantic edges
MATCH (n) WHERE n.file_path ENDS WITH '.yaml'
RETURN n.name, n.type, n.language, n.content LIMIT 3
// → name="appVersion", type="", language="", content=""
// → name="version", type="", language="", content=""
// → name="name", type="", language="", content=""
values.yaml produces 86 duplicate nodes (one per leaf key), and values.schema.json produces 124 nodes. The type, language, and content fields are all empty. A DEFINES edge only means "this file has a key named X", which is not useful for traversal.
Expected behavior — Helm templates
Named template definitions → Function nodes
_helpers.tpl:
{{- define "chart.fullname" -}}
→ Function {name: "chart.fullname", language: "helm", file_path: "templates/_helpers.tpl"}
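To make the mapping concrete, here is a minimal sketch of how `define` blocks could be turned into Function nodes. It is not CBM's extractor: the `FunctionNode` class and `extract_defines` helper are hypothetical names, and a regex scan is assumed to be good enough for illustration (a real implementation would more likely use a template parser).

```python
import re
from dataclasses import dataclass

# Matches {{ define "name" }}, with or without the whitespace-trim markers {{- and -}}.
DEFINE_RE = re.compile(r'\{\{-?\s*define\s+"([^"]+)"\s*-?\}\}')


@dataclass(frozen=True)
class FunctionNode:  # hypothetical node shape, mirroring the fields requested above
    name: str
    language: str
    file_path: str


def extract_defines(source: str, file_path: str) -> list[FunctionNode]:
    """Return one Function node per named template defined in `source`."""
    return [
        FunctionNode(name=m.group(1), language="helm", file_path=file_path)
        for m in DEFINE_RE.finditer(source)
    ]


helpers = '''
{{- define "chart.name" -}}
{{- .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- define "chart.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}
'''

nodes = extract_defines(helpers, "templates/_helpers.tpl")
for node in nodes:
    print(node)
```

With this shape, a 56-file chart repo would yield one node per named template instead of one per YAML leaf key.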
include / template calls → CALLS edges
{{ include "chart.fullname" . }}
→ CALLS edge: calling template → chart.fullname Function node
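A rough sketch of how CALLS edges might be derived, assuming the caller is the innermost enclosing `define` block, or the file itself when the call sits outside any `define`. The `extract_calls` helper and the edge tuple shape are hypothetical. The scan ignores edge cases such as calls mentioned inside template comments.

```python
import re

# One template action: {{ ... }}, with optional whitespace-trim markers.
ACTION_RE = re.compile(r'\{\{-?\s*(.*?)\s*-?\}\}', re.DOTALL)
# include "name" or template "name" anywhere inside an action (pipelines included).
CALL_RE = re.compile(r'\b(?:include|template)\s+"([^"]+)"')
DEFINE_RE = re.compile(r'define\s+"([^"]+)"')
# Actions that open a block closed by {{ end }}.
OPENERS = ("if", "range", "with", "block")


def extract_calls(source: str, file_path: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs; caller is a template name or the file path."""
    edges = []
    stack = []  # open blocks; define blocks store their name, other blocks store None
    for match in ACTION_RE.finditer(source):
        action = match.group(1)
        words = action.split()
        keyword = words[0] if words else ""
        if keyword == "define":
            name = DEFINE_RE.match(action)
            stack.append(name.group(1) if name else None)
        elif keyword in OPENERS:
            stack.append(None)
        elif keyword == "end" and stack:
            stack.pop()
        # Innermost enclosing named template, falling back to the file.
        caller = next((s for s in reversed(stack) if s), file_path)
        for callee in CALL_RE.findall(action):
            edges.append((caller, callee))
    return edges


source = '''
{{- define "chart.labels" -}}
app.kubernetes.io/name: {{ include "chart.name" . }}
{{- if .Values.extraLabels }}
app.kubernetes.io/instance: {{ include "chart.fullname" . }}
{{- end }}
{{- end }}
metadata:
  name: {{ include "chart.fullname" . }}
  labels: {{- include "chart.labels" . | nindent 4 }}
'''

edges = extract_calls(source, "templates/deployment.yaml")
for caller, callee in edges:
    print(f"{caller} -CALLS-> {callee}")
```

Tracking the block stack is what lets a call inside a nested `if` still be attributed to its enclosing named template.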
Chart.yaml dependencies → DEPENDS_ON edges
dependencies:
  - name: postgresql
    repository: https://charts.bitnami.com/bitnami
→ DEPENDS_ON edge: Chart node → dependency Chart node
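As an illustration, DEPENDS_ON edges could be produced from an already-parsed Chart.yaml along these lines. The sketch assumes the file has been loaded into a plain dict by a YAML parser; `chart_dependency_edges`, the edge tuple shape, and the chart name `my-app` are hypothetical.

```python
def chart_dependency_edges(chart: dict) -> list[tuple[str, str, dict]]:
    """Return (source chart, dependency chart, edge properties) triples."""
    source = chart["name"]
    edges = []
    # `dependencies` is optional in Chart.yaml and may also be present but empty.
    for dep in chart.get("dependencies") or []:
        props = {key: dep[key] for key in ("version", "repository") if key in dep}
        edges.append((source, dep["name"], props))
    return edges


# Parsed form of a Chart.yaml; "my-app" is a made-up chart name.
chart = {
    "name": "my-app",
    "version": "1.2.0",
    "dependencies": [
        {"name": "postgresql", "repository": "https://charts.bitnami.com/bitnami"},
    ],
}

dep_edges = chart_dependency_edges(chart)
for src, dst, props in dep_edges:
    print(f"{src} -DEPENDS_ON-> {dst} {props}")
```

Keeping the repository URL on the edge would let a query distinguish two dependencies that share a name but come from different chart repositories.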
values.yaml → structured Variable nodes
Top-level keys only, each with an inferred type, rather than one node per leaf key.
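A small sketch of what "top-level keys with inferred type" could look like, assuming values.yaml has already been parsed into a dict. The `infer_type` and `values_to_variables` helpers, the type names, and the sample values are all hypothetical.

```python
def infer_type(value) -> str:
    """Map a parsed YAML value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    return type(value).__name__


def values_to_variables(values: dict, file_path: str = "values.yaml") -> list[dict]:
    """Emit one Variable node per top-level key; nested keys stay inside `default`."""
    return [
        {"name": key, "type": infer_type(val), "default": val, "file_path": file_path}
        for key, val in values.items()
    ]


# Made-up values.yaml content, already parsed.
values = {
    "replicaCount": 2,
    "image": {"repository": "nginx", "tag": "1.27", "pullPolicy": "IfNotPresent"},
    "ingress": {"enabled": False, "hosts": []},
    "nameOverride": "",
}

variables = values_to_variables(values)
for var in variables:
    print(var["name"], var["type"])
```

Here four nodes replace the seven leaf-level nodes that per-leaf extraction would emit for the same input, and each node carries a populated type.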
Expected behavior — generic YAML (K8s manifests, CI config)
At minimum:
- One node per file (not per key)
- language: "yaml" populated
- kind + apiVersion fields for K8s resources (e.g. kind: Deployment)
- REFERENCES edges where one manifest references another by name (e.g. Service → Deployment selector)
If full semantic extraction is not feasible for generic YAML, a single File node per .yaml file (no key explosion) is strongly preferred over the current Variable flood — it produces less noise and the same traversal value (zero).
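For the K8s case, the following is a hedged sketch of per-manifest nodes and Service → Deployment REFERENCES edges. It assumes manifests are already parsed into dicts and that a Service references a Deployment when every key/value in the Service's `spec.selector` appears in the Deployment's pod template labels. Namespaces are ignored for brevity, and `resource_node` and `service_references` are hypothetical names.

```python
def resource_node(manifest: dict, file_path: str) -> dict:
    """Build one node per manifest instead of one node per key."""
    return {
        "kind": manifest.get("kind", ""),
        "api_version": manifest.get("apiVersion", ""),
        "name": manifest.get("metadata", {}).get("name", ""),
        "language": "yaml",
        "file_path": file_path,
    }


def service_references(manifests: list[dict]) -> list[tuple[str, str]]:
    """Return (service name, deployment name) pairs where the selector matches."""
    deployments = [m for m in manifests if m.get("kind") == "Deployment"]
    edges = []
    for svc in manifests:
        if svc.get("kind") != "Service":
            continue
        selector = svc.get("spec", {}).get("selector") or {}
        if not selector:
            continue  # a Service without a selector targets no pods by label
        for dep in deployments:
            template = dep.get("spec", {}).get("template", {})
            labels = template.get("metadata", {}).get("labels") or {}
            # Every selector pair must be present in the pod template labels.
            if all(labels.get(key) == val for key, val in selector.items()):
                edges.append((svc["metadata"]["name"], dep["metadata"]["name"]))
    return edges


# Made-up manifests, already parsed.
manifests = [
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "web-svc"},
     "spec": {"selector": {"app": "web"}}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"},
     "spec": {"template": {"metadata": {"labels": {"app": "web", "tier": "frontend"}}}}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "worker"},
     "spec": {"template": {"metadata": {"labels": {"app": "worker"}}}}},
]

ref_edges = service_references(manifests)
print(resource_node(manifests[0], "k8s/web.yaml"))
print(ref_edges)
```

Even without the REFERENCES pass, `resource_node` alone would satisfy the minimum bar listed above: one node per file with kind and apiVersion populated.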
Proposed node types
| Helm construct | CBM node type | Key fields |
| --- | --- | --- |
| {{- define "name" -}} | Function | name, language: "helm" |
| Chart.yaml | Module | name, version, app_version |
| values.yaml top-level key | Variable | name, type, default |
| K8s manifest | Resource | kind, api_version, name |
Proposed edges
| Edge | Trigger |
| --- | --- |
| CALLS | {{ include "name" }} → named template |
| DEPENDS_ON | Chart.yaml dependency → dependency chart |
| REFERENCES | K8s Service selector → Deployment labels |
Related
Companion to #337 (HCL/Terraform semantic extraction). Together these two cover the full IaC surface of cloud-native repos where application code, Terraform infra, and Helm deployment config coexist.
Environment
- CBM version: 0.6.1
- Platform: macOS arm64
- Primary use case: cloud-native monorepos (Python/Go app + Terraform + Helm)