
HNS OutBoundNAT conntrack fails to correlate responses when upstream device performs DNAT #625

@doctorpangloss

Description


Describe the bug

When an HNS endpoint has an OutBoundNAT policy applied, TCP connections fail if an upstream network device (router/firewall) performs DNAT on the destination address, as happens with hairpin NAT, for example. HNS conntrack appears to strictly require the response to come from the original destination IP, rather than correlating it with the SNATed source tuple.

This differs from the Linux iptables behavior, where conntrack handles responses correctly even when an upstream device modifies the destination after the local SNAT is applied. I don't understand why the two implementations differ here.

ICMP (ping) works, presumably because it is handled statelessly, but TCP connections time out.

To Reproduce

  1. Create a Windows container with an HNS network that has the OutBoundNAT policy enabled (I am not sure this is essential; my network has it because I am using Calico in Kubernetes)
  2. Configure an upstream router with a DNAT rule that redirects traffic from an external IP to an internal IP (hairpin NAT / NAT reflection scenario)
  3. From the container, attempt a TCP connection to the external IP: curl https://192.184.0.1
  4. Observe: TCP connection times out
  5. From the container, ping the same IP: ping 192.184.0.1
  6. Observe: ICMP succeeds
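
The following is a minimal sketch of steps 3-6, run from inside the container, using Python instead of curl/ping. The address, port, and ping syntax are the ones from the steps above; adjust them for your network.

import socket
import subprocess

HAIRPIN_IP = "192.184.0.1"   # external IP that the upstream router DNATs
PORT = 443                   # port of the published service

# Steps 3-4: TCP connect to the hairpin address -- times out in the failing case
try:
    with socket.create_connection((HAIRPIN_IP, PORT), timeout=10):
        print("TCP connect succeeded")
except OSError as exc:
    print(f"TCP connect failed: {exc}")

# Steps 5-6: ICMP ping to the same address -- succeeds in the same environment
result = subprocess.run(["ping", "-n", "2", HAIRPIN_IP],  # Windows ping flags
                        capture_output=True, text=True)
print("ping succeeded" if result.returncode == 0 else "ping failed")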

Packet flow demonstrating the issue:

  1. Container (10.3.50.100) sends SYN to 192.184.0.1:443
  2. HNS OutBoundNAT translates the source to the host IP; the packet is now 10.2.0.15 → 192.184.0.1:443
  3. HNS conntrack records the connection, expecting the response to come from 192.184.0.1
  4. Upstream router DNATs destination: packet becomes 10.2.0.15 → 10.152.184.99:443
  5. Backend server responds: 10.152.184.99 → 10.2.0.15
  6. Response arrives at Windows host from 10.152.184.99
  7. HNS conntrack fails to match - expected response from 192.184.0.1, got 10.152.184.99
  8. TCP connection times out
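
To make the matching difference concrete, here is a toy model of the two correlation policies described in this flow. It is purely illustrative (not actual HNS or Linux code); the addresses are the ones from the steps above, and the ephemeral source port is made up.

from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

# Outbound flow after OutBoundNAT (step 2): SNATed source and original destination
snat_src = ("10.2.0.15", 54321)      # host IP and an illustrative ephemeral port
orig_dst = ("192.184.0.1", 443)      # destination the container dialled

# The reply as it actually arrives (step 6), coming from the DNATed backend
reply = Packet("10.152.184.99", 443, "10.2.0.15", 54321)

def strict_match(p: Packet) -> bool:
    # Observed HNS behaviour: the reply must also come from the original destination
    return (p.dst_ip, p.dst_port) == snat_src and (p.src_ip, p.src_port) == orig_dst

def source_tuple_match(p: Packet) -> bool:
    # Requested behaviour: correlate on the SNATed source tuple alone
    return (p.dst_ip, p.dst_port) == snat_src

print("strict match:      ", strict_match(reply))        # False -> reply dropped, step 8
print("source-tuple match:", source_tuple_match(reply))  # True  -> reply would be delivered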

Expected behavior

HNS conntrack should correlate the response based on the SNATed source IP and port tuple, regardless of whether the response source IP matches the original destination. This is how Linux iptables/conntrack behaves.

Alternatively, provide a configuration option to control this behavior.

Configuration:

  • Edition: Windows Server 2022 Datacenter (ltsc2022)
  • Base Image being used: Windows Server Core
  • Container engine: containerd
  • Container engine version: 1.7.29
  • HNS Network type: L2Bridge (Calico BGP mode)

Additional context

This issue affects Kubernetes clusters running Calico CNI on Windows nodes in BGP mode. The OutBoundNAT policy is applied via the CNI configuration:

{
  "Name": "EndpointPolicy",
  "Value": {
    "Type": "OutBoundNAT",
    "ExceptionList": ["10.152.184.0/24"]
  }
}
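
As a quick sanity check of which destinations this policy SNATs (anything outside the ExceptionList gets the host IP as its source), here is a small sketch using the two addresses from this report:

import ipaddress

exception_list = [ipaddress.ip_network("10.152.184.0/24")]

for dest in ("192.184.0.1", "10.152.184.99"):
    ip = ipaddress.ip_address(dest)
    exempt = any(ip in net for net in exception_list)
    print(f"{dest}: {'exempt (no SNAT)' if exempt else 'SNATed by OutBoundNAT'}")

The hairpin destination 192.184.0.1 is not in the ExceptionList, so it is SNATed and hits the conntrack behavior described above; the internal backend 10.152.184.99 is exempt.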

The hairpin NAT / NAT reflection pattern is common when internal clients need to access services via their public DNS names. Linux containers in the same cluster handle this correctly because iptables conntrack is more permissive with response source matching.

Workarounds:

  • Use split-horizon DNS so internal clients resolve to internal IPs
  • Add SNAT on the upstream router for hairpin traffic (forces responses back through the router)

Neither workaround is ideal. The expected behavior is for HNS to match Linux iptables semantics when OutBoundNAT is enabled.
