
[Feature] P2S VPNGateway support for flexnode#68

Open
wenxuan0923 wants to merge 6 commits into main from wenx/vpn-gateway


@wenxuan0923
Collaborator

@wenxuan0923 wenxuan0923 commented Feb 6, 2026

Pod networking needs IP reachability between nodes. This PR adds a VPN Gateway component that enables secure pod-to-pod communication between AKS clusters and AKS Flex nodes over Point-to-Site (P2S) VPN connections.

Details

  • VPN Gateway provisioning: Automated Azure VPN Gateway creation with P2S configuration
  • Certificate management: Self-generated root CA and client certificates for authentication
  • OpenVPN integration: Automatic client configuration and systemd service management
  • Network setup: Route and iptables configuration for seamless pod connectivity
  • Graceful cleanup: Robust uninstaller with file, network, and Azure resource cleanup
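To illustrate the certificate management item, here is a minimal sketch of generating a self-signed root CA and a client certificate signed by it, using only the Go standard library. It is not the PR's implementation: the names `certPair`, `newCert`, and `verifyClient`, the RSA-2048 key size, the one-year validity, and the common names are all assumptions made for this example. The base64-encoded DER of the root certificate is, as far as I know, the form in which P2S certificate authentication expects the root public data, but check the Azure docs before relying on that.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/base64"
	"fmt"
	"math/big"
	"time"
)

// certPair is a hypothetical helper type: a parsed certificate and its key.
type certPair struct {
	cert *x509.Certificate
	key  *rsa.PrivateKey
}

// newCert creates a certificate for commonName. With a nil parent the
// certificate is self-signed and marked as a CA; otherwise it is a client
// authentication certificate signed by parent.
func newCert(commonName string, serial int64, parent *certPair) (*certPair, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048) // key size is an assumption
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour), // validity is an assumption
		KeyUsage:     x509.KeyUsageDigitalSignature,
	}
	signerCert, signerKey := tmpl, key // self-signed by default
	if parent == nil {
		tmpl.IsCA = true
		tmpl.BasicConstraintsValid = true
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	} else {
		tmpl.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}
		signerCert, signerKey = parent.cert, parent.key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, signerCert, &key.PublicKey, signerKey)
	if err != nil {
		return nil, err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return nil, err
	}
	return &certPair{cert: cert, key: key}, nil
}

// verifyClient reports whether client chains to root for client auth.
func verifyClient(root, client *certPair) error {
	pool := x509.NewCertPool()
	pool.AddCert(root.cert)
	_, err := client.cert.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err
}

func main() {
	root, err := newCert("example-p2s-root", 1, nil)
	if err != nil {
		panic(err)
	}
	client, err := newCert("example-p2s-client", 2, root)
	if err != nil {
		panic(err)
	}
	fmt.Println("client verifies against root:", verifyClient(root, client) == nil)
	// Base64 of the root certificate's DER bytes (no PEM header or footer).
	rootData := base64.StdEncoding.EncodeToString(root.cert.Raw)
	fmt.Println("root public data length:", len(rootData))
}
```

The client key and certificate would then be embedded in the OpenVPN client profile; that step is omitted here because it depends on the profile the gateway generates.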

New sample config for enabling VPN gateway for flex node -> AKS node communication:

{
  "azure": {
    "subscriptionId": "xxxxxxxxxxxxxx",
    "tenantId": "xxxxxxxxxxxxxx",
    "cloud": "AzurePublicCloud",
    "vpnGateway": {
      "enabled": true,
      "p2sGatewayCIDR": "192.168.100.0/24",
      "podCIDR": "172.16.0.0/16",
      "vnetID": "/subscriptions/xxxxxxxxxxxxxx/resourceGroups/MC_wenx-rg_wenx-edge-cluster_eastus/providers/Microsoft.Network/virtualNetworks/aks-vnet-xxxxx"
    },
    "arc": {
      "enabled": true,
      "machineName": "edge-node",
      "tags": {
        "node-type": "edge"
      },
      "resourceGroup": "wenx-rg",
      "location": "eastus"
    },
    "targetCluster": {
      "resourceId": "/subscriptions/xxxxxxxxxxxxxx/resourceGroups/wenx-rg/providers/Microsoft.ContainerService/managedClusters/wenx-edge-cluster",
      "location": "eastus"
    }
  },
  "kubernetes": {
    "version": "1.32.7"
  },
  "agent": {
    "logLevel": "info",
    "logDir": "/var/log/aks-flex-node"
  }
}
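As a rough sketch of how the `vpnGateway` block above could be loaded and sanity-checked, the following parses the relevant fields and validates the two CIDRs. The struct and function names (`vpnGatewayConfig`, `parseConfig`, `validateVPNGateway`) are hypothetical, and the rule that `p2sGatewayCIDR` and `podCIDR` must not overlap is my assumption rather than something this PR states.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// vpnGatewayConfig mirrors the "vpnGateway" block of the sample config.
// The Go type itself is hypothetical; only the JSON keys come from the sample.
type vpnGatewayConfig struct {
	Enabled        bool   `json:"enabled"`
	P2SGatewayCIDR string `json:"p2sGatewayCIDR"`
	PodCIDR        string `json:"podCIDR"`
	VNetID         string `json:"vnetID"`
}

type config struct {
	Azure struct {
		VPNGateway vpnGatewayConfig `json:"vpnGateway"`
	} `json:"azure"`
}

// parseConfig decodes the JSON document, ignoring fields not modeled here.
func parseConfig(data []byte) (config, error) {
	var c config
	err := json.Unmarshal(data, &c)
	return c, err
}

// validateVPNGateway checks that both CIDRs parse, that they do not overlap
// (an assumed rule), and that a VNet ID is present.
func validateVPNGateway(c vpnGatewayConfig) error {
	if !c.Enabled {
		return nil // nothing to validate when the feature is off
	}
	p2s, err := netip.ParsePrefix(c.P2SGatewayCIDR)
	if err != nil {
		return fmt.Errorf("p2sGatewayCIDR: %w", err)
	}
	pod, err := netip.ParsePrefix(c.PodCIDR)
	if err != nil {
		return fmt.Errorf("podCIDR: %w", err)
	}
	if p2s.Overlaps(pod) {
		return fmt.Errorf("p2sGatewayCIDR %s overlaps podCIDR %s", p2s, pod)
	}
	if c.VNetID == "" {
		return fmt.Errorf("vnetID must be set when vpnGateway is enabled")
	}
	return nil
}

func main() {
	sample := []byte(`{"azure": {"vpnGateway": {
		"enabled": true,
		"p2sGatewayCIDR": "192.168.100.0/24",
		"podCIDR": "172.16.0.0/16",
		"vnetID": "/subscriptions/xxxxxxxxxxxxxx/example"
	}}}`)
	cfg, err := parseConfig(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("validation error:", validateVPNGateway(cfg.Azure.VPNGateway))
}
```

Failing early on a malformed or overlapping CIDR is cheap compared with discovering it after the gateway has been provisioned.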

With the VPN gateway enabled, users can install Cilium for pod networking:

helm install cilium cilium/cilium \
  --version 1.16.5 \
  --namespace kube-system \
  --set operator.replicas=1 \
  --set routingMode=tunnel \
  --set tunnelProtocol=vxlan \
  --set mtu=1350 \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDRList="{172.16.0.0/16}" \
  --set bpf.masquerade=true \
  --set bpf.hostLegacyRouting=false \
  --set kubeProxyReplacement=true \
  --wait

}

// Create the subnet - this is a long-running operation
poller, err := i.subnetsClient.BeginCreateOrUpdate(ctx, vnetInfo.resourceGroupName, to.String(vnetInfo.vnet.Name), gatewaySubnetName, gatewaySubnetParams, nil)
Member


Should we do a network permission check on the service principal or MSI that is used to run this agent? It needs to create the gateway subnet in the BYO subnet or managed subnet.
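One possible shape for such a pre-flight check, sketched with the standard library only: given the permissions already fetched for the identity (how they are fetched is out of scope here), evaluate whether each required action is granted. The types and helpers (`permission`, `matchAction`, `missingActions`) are hypothetical, and the list of required actions is illustrative rather than taken from this PR. The matching follows the usual Azure RBAC rules as I understand them: `*` is a wildcard, comparison is case-insensitive, and a role's `NotActions` are subtracted from its own `Actions`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// permission is a hypothetical stand-in for one role definition's
// Actions and NotActions lists.
type permission struct {
	Actions    []string
	NotActions []string
}

// matchAction reports whether action matches pattern, where "*" in the
// pattern matches any run of characters and case is ignored.
func matchAction(pattern, action string) bool {
	expr := "(?i)^" + strings.ReplaceAll(regexp.QuoteMeta(pattern), `\*`, ".*") + "$"
	ok, err := regexp.MatchString(expr, action)
	return err == nil && ok
}

func matchesAny(patterns []string, action string) bool {
	for _, p := range patterns {
		if matchAction(p, action) {
			return true
		}
	}
	return false
}

// allowed is true when at least one permission grants the action and does
// not exclude it through its own NotActions.
func allowed(perms []permission, action string) bool {
	for _, p := range perms {
		if matchesAny(p.Actions, action) && !matchesAny(p.NotActions, action) {
			return true
		}
	}
	return false
}

// missingActions returns the required actions that perms do not grant.
func missingActions(perms []permission, required []string) []string {
	var missing []string
	for _, a := range required {
		if !allowed(perms, a) {
			missing = append(missing, a)
		}
	}
	return missing
}

func main() {
	// Illustrative list only; the real set depends on what the agent creates.
	required := []string{
		"Microsoft.Network/virtualNetworks/subnets/write",
		"Microsoft.Network/virtualNetworkGateways/write",
		"Microsoft.Network/publicIPAddresses/write",
	}
	perms := []permission{{Actions: []string{"*/read"}}} // a read-only identity
	fmt.Println("missing actions:", missingActions(perms, required))
}
```

Running this check before `BeginCreateOrUpdate` would let the agent fail with a clear message instead of a mid-provisioning authorization error.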

Collaborator

@weiliu2dev weiliu2dev Feb 22, 2026


Comparing this PR to my previous PR, #55:

Both PRs solve the same problem (joining nodes to private AKS clusters) but take different approaches:
  • #55 uses a self-managed Gateway VM with WireGuard: lightweight, low cost, fast provisioning (~2-5 min)
  • #68 uses the Azure-native P2S VPN Gateway (OpenVPN): managed service, higher cost, longer provisioning (~30-40 min)

From my understanding, the two approaches can coexist and serve different use cases. Is that right?
  • Azure VPN Gateway: for complex production/enterprise scenarios that need managed HA; provisioning takes about 30-40 minutes.
  • WireGuard Gateway VM: for cost-sensitive deployments; provisioning takes about 2-5 minutes.
