Skip to content

Possible root cause bug related to firewall read functions #288

@synergiator

Description

@synergiator

Hello everybody,

while experimenting with agentic coding in infrastructure context, the agent raised concern about a possible root cause which has been addressed in some 0.6.0 fixes but as it seems is itself still there.

Please have a look - does this make sense for those with deeper understanding of this provider? (Opus 4.6R is known to have quite comprehensive context understanding).


Firewall-family Read functions destroy state via SetId("") — root cause behind #115, #194, #241, #278

Summary

The Read functions in cloudstack_firewall, cloudstack_egress_firewall, and
cloudstack_port_forward contain a structural defect: when the CloudStack API
listing returns no matching rules, the Read function calls d.SetId("")
destroying the resource from Terraform state. This is the root cause behind
four separate issues (#115, #194, #241, #278), each of which was diagnosed and
patched as a project-context problem rather than the underlying logic error.

The defect

All three resources share this pattern in their Read function:

// resource_cloudstack_firewall.go:388-392
if rules.Len() > 0 {
    d.Set("rule", rules)
} else if !managed {
    d.SetId("")        // ← destroys the resource from state
}

The logic is: "if I found zero matching rules, the resource no longer exists."
This conflates two distinct situations:

Situation Correct action
The IP address / network was deleted d.SetId("") — resource is gone
The rules exist but the API listing didn't return them Preserve state — resource is fine

The second case happens whenever listFirewallRules (or equivalent) can't see
the rules. Every reported instance of this bug was caused by an API visibility
problem, not an actually-deleted resource.

How the secondary issues trace back to this root cause

#115 / #194 — Firewall rules in projects (closed, PR #198)

Trigger: Rules created in a project context. The Read function called
listFirewallRules without passing projectid, so the API returned zero
rules (they're scoped to the project).

Fix applied (commit edb0901): Added a project schema field and
setProjectid() call to the Read function so the listing includes the project
context.

Why it's a symptom fix: PR #198 fixed one specific reason the API returns
zero rules. The SetId("") fallback remained untouched. Any other reason the
listing returns empty (see below) still triggers the same destruction.

#241 — Same error persists in v0.6.0-rc2 (open)

Trigger: Egress firewall without an explicit project attribute. Even
after PR #198 added the project field, it's Optional — if the user doesn't
set it, setProjectid() is a no-op, and the listing still returns zero rules
for project-scoped networks.

Why it proves the root cause is unfixed: The PR #198 fix requires the user
to explicitly specify project on every firewall resource. When they don't, the
same empty-listing → SetId("") path fires. The root cause — treating "empty
listing" as "resource deleted" — was never addressed.

#278 — Port forward "always returns error" (open, PR #280 in progress)

Trigger: Port forward rules in a project. Same mechanism: Read calls
listPortForwardingRules without project context, gets zero results, calls
SetId("").

Fix in progress (PR #280): Implements automatic project inheritance (inherit
from IP address) and a fallback lookup with projectid=-1. PR #280 also
refactors the SetId("") logic in port_forward, but the same pattern remains
in cloudstack_firewall and cloudstack_egress_firewall.

Pattern: Each new trigger produces a new issue. Each fix addresses that
specific trigger. The SetId("") bomb stays armed.

The chain of events on every occurrence

Create           succeeds → rule exists in CloudStack
  ↓
Read             calls listFirewallRules(ipaddressid=X)
  ↓
API returns 0    (missing project, source NAT filter, eventual consistency, …)
  ↓
rules.Len() == 0
  ↓
d.SetId("")      resource destroyed from state
  ↓
Terraform        "Root object was present, but now absent"
  ↓
Next apply       tries to re-create → "conflicts with existing rule"

Locations

Resource File Line Pattern
cloudstack_firewall resource_cloudstack_firewall.go 388-392 rules.Len() == 0 && !managed → SetId("")
cloudstack_egress_firewall resource_cloudstack_egress_firewall.go 425-430 Same
cloudstack_port_forward resource_cloudstack_port_forward.go 355-359 forwards.Len() == 0 && !managed → SetId("")

Note: cloudstack_port_forward already validates the IP address exists at the
top of its Read (lines 237-250). It confirms the IP is present, then proceeds
to destroy the resource at line 358 because the forwards listing is empty.
This is self-contradictory.

Proposed fix

Replace the "empty set → destroy" logic with a parent-object existence check:

For cloudstack_firewall — before calling SetId(""), verify the IP
address still exists:

} else if !managed {
    _, count, err := cs.Address.GetPublicIpAddressByID(d.Id(),
        cloudstack.WithProject(d.Get("project").(string)))
    if err != nil && count == 0 {
        log.Printf("[DEBUG] IP %s no longer exists, removing from state", d.Id())
        d.SetId("")
    } else {
        log.Printf("[WARN] No firewall rules found for IP %s but IP exists", d.Id())
        d.Set("rule", rules) // preserve empty set in state
    }
}

For cloudstack_egress_firewall — verify the network still exists.

For cloudstack_port_forward — remove the SetId("") at line 358
entirely; the IP existence check at lines 237-250 already handles the
"parent deleted" case.

This eliminates the class of bug rather than patching individual triggers.
Future API visibility issues (new project scoping, RBAC changes, API
versioning) would be handled correctly without new code.

Proof

The structural defect is demonstrable via unit test without a CloudStack
instance. See cloudstack/gotcha_issues_test.go:

  • TestFirewallRead_clearsIdWhenRulesEmpty — proves the logic path exists
  • TestFirewallRead_ruleMatchRequiresUuidInMap — proves unmatched UUIDs drop rules
  • TestEgressFirewallRead_sameSetIdBugPattern — proves egress has same defect

Environment

  • Provider: main branch (includes v0.6.0 release code)
  • Terraform: 1.11+
  • CloudStack: 4.19+

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions