Two SYNs, One Connection: The TCP Simultaneous Open Path Through NAT
Source: lobsters
Most people who have worked with peer-to-peer networking know UDP hole punching well enough: both peers contact a rendezvous server, learn each other’s public addresses, fire UDP packets at each other simultaneously, and the NAT mappings open up in both directions. It works, it is well understood, and the canonical treatment is still the 2005 USENIX paper by Bryan Ford, Pyda Srisuresh, and Dan Kegel. TCP, though, gets treated as the hard sibling. The state machine is stricter, the failure modes are less obvious, and most tutorials either skip it or recommend UDP instead.
Matthew Roberts’s TCP hole punching writeup cuts through that reputation by centering the approach around something that has been in the TCP specification since 1981: simultaneous open. Once you understand that mechanism, the algorithm stops looking complicated and starts looking inevitable.
What NAT Actually Blocks
A NAT maintains a state table. When an internal host opens an outbound TCP connection, the NAT creates an entry: (internal_ip, internal_port) ↔ (public_ip, public_port). Inbound packets are only forwarded if a matching entry exists. An incoming SYN with no matching entry gets dropped.
For two hosts behind separate NATs to connect directly, each NAT needs to see an outbound packet from its own side before it will allow the corresponding inbound packet from the other side. With UDP this is trivial because any packet creates a NAT mapping. With TCP, the packet that creates a NAT mapping is the SYN, and a SYN also triggers a specific state machine transition. That transition is what most people get wrong.
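A toy model makes the rule concrete. The sketch below is a hypothetical NAT, not any real device's logic. It assumes three behaviors:

- endpoint-independent mapping;
- filtering of inbound segments by the remote endpoints a mapping has already sent to (address-and-port-dependent filtering);
- only an outbound SYN may create a TCP mapping.

The class name `ToyNat` and all addresses are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class ToyNat:
    """Hypothetical TCP-only NAT: endpoint-independent mapping,
    address-and-port-dependent filtering."""
    public_ip: str
    next_port: int = 40000
    mappings: Dict[Endpoint, int] = field(default_factory=dict)  # internal -> public port
    reverse: Dict[int, Endpoint] = field(default_factory=dict)   # public port -> internal
    contacted: Dict[int, Set[Endpoint]] = field(default_factory=dict)  # remotes sent to

    def outbound(self, src: Endpoint, dst: Endpoint, syn: bool) -> Optional[Endpoint]:
        """Translate an outbound segment; only a SYN may create a mapping."""
        if src not in self.mappings:
            if not syn:
                return None  # no state for this flow and not a SYN: drop
            port = self.next_port
            self.next_port += 1
            self.mappings[src] = port
            self.reverse[port] = src
            self.contacted[port] = set()
        port = self.mappings[src]
        self.contacted[port].add(dst)  # remember who this mapping has talked to
        return (self.public_ip, port)

    def inbound(self, src: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Forward an inbound segment only if it matches existing state."""
        if public_port not in self.reverse:
            return None  # no mapping at all: an unsolicited SYN dies here
        if src not in self.contacted[public_port]:
            return None  # mapping exists, but it never sent to this remote
        return self.reverse[public_port]


nat = ToyNat("203.0.113.7")
peer = ("198.51.100.9", 51000)
before = nat.inbound(peer, 40000)                          # dropped: no mapping yet
public = nat.outbound(("10.0.0.2", 6000), peer, syn=True)  # outbound SYN opens it
after = nat.inbound(peer, public[1])                       # now forwarded inside
```

The ordering is the whole point: the same inbound segment is dropped before the outbound SYN and forwarded after it.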
RFC 793 Has Had the Answer Since 1981
The TCP specification in RFC 793, Section 3.4 describes a scenario where two hosts simultaneously initiate a connection to each other. Both send SYN. Both receive the other’s SYN while in SYN-SENT state. Both transition to SYN-RECEIVED. Both send SYN-ACK. The connection reaches ESTABLISHED without either side ever having been in LISTEN state.
```
Host A                             Host B
  │                                   │
  ├──SYN─────────────────────────────►│
  │◄─────────────────────────────SYN──┤
  │   (both transition:               │
  │    SYN-SENT → SYN-RECEIVED)       │
  ├──SYN-ACK─────────────────────────►│
  │◄─────────────────────────SYN-ACK──┤
  │   (both: ESTABLISHED)             │
```
This is the simultaneous open path. It is fully standardized, required by RFC 5382 for any compliant NAT, and present in every mainstream TCP stack. What it requires is that both sides send their SYN close enough in time that each NAT sees the outbound SYN before the inbound SYN arrives. The first SYN creates the NAT mapping; the second SYN from the peer finds that mapping and gets forwarded.
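The state transitions in the diagram can be replayed with a small simulation. This is a sketch under heavy simplification:

- segments are plain strings;
- there are no sequence numbers;
- only the handshake states are modeled.

`TcpEndpoint` and `simultaneous_open` are hypothetical names.

```python
from typing import List, Tuple

CLOSED, SYN_SENT, SYN_RECEIVED, ESTABLISHED = (
    "CLOSED", "SYN-SENT", "SYN-RECEIVED", "ESTABLISHED")


class TcpEndpoint:
    """Handshake-only TCP endpoint; no sequence numbers, no LISTEN state."""

    def __init__(self, name: str):
        self.name = name
        self.history: List[str] = [CLOSED]

    @property
    def state(self) -> str:
        return self.history[-1]

    def active_open(self) -> str:
        self.history.append(SYN_SENT)
        return "SYN"

    def receive(self, segment: str) -> List[str]:
        """Apply one incoming segment and return the segments sent in reply."""
        if self.state == SYN_SENT and segment == "SYN":
            # Peer's SYN arrived while ours is outstanding: simultaneous open.
            self.history.append(SYN_RECEIVED)
            return ["SYN-ACK"]  # repeats our SYN, now acknowledging theirs
        if self.state == SYN_SENT and segment == "SYN-ACK":
            # Ordinary three-way handshake, seen from the active side.
            self.history.append(ESTABLISHED)
            return ["ACK"]
        if self.state == SYN_RECEIVED and segment in ("SYN-ACK", "ACK"):
            self.history.append(ESTABLISHED)  # our SYN has been acknowledged
            return []
        return []  # anything else is ignored in this sketch


def simultaneous_open() -> Tuple[TcpEndpoint, TcpEndpoint]:
    a, b = TcpEndpoint("A"), TcpEndpoint("B")
    # Both connect() calls happen before either SYN is delivered.
    in_flight = [(b, a.active_open()), (a, b.active_open())]
    while in_flight:
        dest, segment = in_flight.pop(0)
        src = a if dest is b else b
        in_flight.extend((src, reply) for reply in dest.receive(segment))
    return a, b


a, b = simultaneous_open()
print(a.history)  # ['CLOSED', 'SYN-SENT', 'SYN-RECEIVED', 'ESTABLISHED']
```

Neither endpoint's history contains LISTEN. Both reach ESTABLISHED purely by reacting to the other's SYN.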
The Socket Reuse Problem
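The non-obvious implementation constraint is that one local port has to play several roles at once. It is the source port of the outbound SYN and the destination port of the inbound SYN. It is also the port the rendezvous server observed, so the socket that registered with the server is usually still bound to it. Many implementations, including the one Ford et al. describe, additionally keep a listening socket on that port in case the peer’s SYN arrives when no connect() is waiting for it. The result is that a single (local_ip, local_port) tuple must be bound by more than one socket.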
By default, operating systems reject a second bind() call on an already-bound port. The solution is SO_REUSEPORT, which Linux added in kernel 3.9. BSD systems have had similar semantics under the same name for longer. Windows uses SO_REUSEADDR with broader semantics that can cause security issues if used carelessly.
```python
import socket

def punch(local_port, peer_public_ip, peer_public_port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # absent on Windows
    sock.bind(('0.0.0.0', local_port))  # same port as used for rendezvous
    sock.connect((peer_public_ip, peer_public_port))  # SYN sent; blocks until ESTABLISHED or raises
    return sock
```
The local_port here must match the port the peer learned from the rendezvous server. If you open a fresh socket and let the OS assign a port, you get a different NAT mapping with a different public port, and the peer is aiming at the wrong target.
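A quick way to see the socket options at work is to bind two TCP sockets to the same port on one machine. This sketch makes a few assumptions:

- It needs Linux 3.9+ or a BSD-derived system where `socket.SO_REUSEPORT` exists. It raises `AttributeError` on Windows.
- On Linux, every socket sharing the port must set the option and belong to the same effective user.

The helper name `shared_port_socket` is hypothetical.

```python
import socket


def shared_port_socket(port: int) -> socket.socket:
    """Hypothetical helper: a TCP socket on `port` that others may share."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Must be set before bind() on every socket sharing the port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("0.0.0.0", port))
    return s


# Stands in for the socket that registered with the rendezvous server.
rendezvous_sock = shared_port_socket(0)  # port 0: let the OS choose
port = rendezvous_sock.getsockname()[1]

# A second socket on the same port, as the hole-punching connect() needs.
punch_sock = shared_port_socket(port)
punch_port = punch_sock.getsockname()[1]

# A socket without the options cannot join them.
plain = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    plain.bind(("0.0.0.0", port))
    plain_bound = True
except OSError:
    plain_bound = False  # EADDRINUSE: the port is already taken
finally:
    plain.close()

for s in (rendezvous_sock, punch_sock):
    s.close()
print(port == punch_port, plain_bound)
```

The second shared socket lands on the same port, while the plain one is refused. That refusal is what happens to a naive implementation that forgets the options.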
Coordination and Timing
The rendezvous server serves two functions. First, it is the mechanism by which each peer discovers its own public (IP, port) as seen from the outside, which is the standard STUN function. Second, it coordinates a synchronized moment for both peers to send their SYNs. The server sends each peer the other’s public address along with a timestamp T, and both call connect() at time T.
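The server side of that coordination is mostly bookkeeping. The sketch below models it with the transport left out. `Rendezvous`, `Introduction`, `wait_until`, and the 2-second lead time are illustrative assumptions. The shared deadline is only as good as the peers’ clock synchronization.

```python
import time
from dataclasses import dataclass
from typing import Dict, Tuple

Addr = Tuple[str, int]  # public (IP, port)


@dataclass(frozen=True)
class Introduction:
    peer_addr: Addr    # the other peer's public endpoint, as the server saw it
    connect_at: float  # shared deadline T, in seconds since the epoch


class Rendezvous:
    """Hypothetical rendezvous-server logic with the network I/O left out."""

    def __init__(self, lead_time: float = 2.0):
        self.lead_time = lead_time  # how far in the future T is set
        self.observed: Dict[str, Addr] = {}

    def register(self, peer_id: str, source_addr: Addr) -> Addr:
        # STUN-like step: record and echo back the source address the
        # peer's packet arrived from, i.e. its public (IP, port).
        self.observed[peer_id] = source_addr
        return source_addr

    def introduce(self, a: str, b: str, now: float) -> Dict[str, Introduction]:
        t = now + self.lead_time
        return {a: Introduction(self.observed[b], t),
                b: Introduction(self.observed[a], t)}


def wait_until(deadline: float) -> None:
    """Peer side: sleep until T; the caller then calls connect()."""
    delay = deadline - time.time()
    if delay > 0:
        time.sleep(delay)


server = Rendezvous(lead_time=2.0)
server.register("alice", ("203.0.113.7", 40000))
server.register("bob", ("198.51.100.9", 51000))
intro = server.introduce("alice", "bob", now=1000.0)
print(intro["alice"])
```

Each peer gets the other’s address and the same T. The lead time has to cover the delivery of both introduction messages.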
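The timing window is not as narrow as it sounds. Under Linux defaults, the initial retransmission timeout is 1 second and doubles after each attempt. An unanswered SYN is therefore retransmitted roughly 1, 3, 7, and 15 seconds after the first one, and connect() keeps trying for about two minutes before giving up.

Suppose the first SYN from peer A reaches peer B’s NAT before B has sent its own SYN: B’s NAT drops it. When B’s SYN does go out, it creates B’s mapping and arrives at A’s NAT, which already holds a mapping from A’s SYN. A’s NAT forwards it to A, which is still in SYN-SENT, and the simultaneous open completes. If that SYN is lost, A’s next retransmission finds B’s new mapping and gets through instead.

Either way, the retry behavior keeps A’s attempt alive long enough to extend the timing tolerance to several seconds or more. That is achievable with NTP-synchronized clocks or a server-coordinated countdown. The assumption underneath all of this is that B’s NAT drops the early SYN silently. A NAT that answers it with a RST or an ICMP error can abort A’s connect() on the spot.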
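The retransmission argument can be checked with a small timing model. It rests on these assumptions:

- A Linux-like schedule: 1-second initial RTO, doubling, four retransmissions, matching the 1, 3, 7, 15-second schedule above. Linux’s default of six retries runs longer.
- A fixed one-way delay.
- NATs that open a mapping at their host’s first SYN and silently drop earlier inbound SYNs.
- A connect() that expires one doubled RTO after the last retransmission.

`syn_send_times`, `gives_up_at`, and `first_syn_through` are hypothetical helpers, with times in milliseconds.

```python
from typing import List, Optional, Tuple


def syn_send_times(start_ms: int, rto_ms: int = 1000, retries: int = 4) -> List[int]:
    """First SYN at start_ms, then retransmissions with the RTO doubling."""
    times, t, rto = [], start_ms, rto_ms
    for _ in range(retries + 1):
        times.append(t)
        t += rto
        rto *= 2
    return times


def gives_up_at(start_ms: int, rto_ms: int = 1000, retries: int = 4) -> int:
    """connect() fails one doubled RTO after the last retransmission."""
    return start_ms + rto_ms * (2 ** (retries + 1) - 1)


def first_syn_through(a_start: int, b_start: int, one_way_ms: int) -> Optional[Tuple[int, str]]:
    """Earliest (arrival_ms, sender) of a SYN that passes the far NAT.

    A SYN gets through if the receiving side has already sent its own
    SYN (so its NAT holds a mapping) and its connect() is still alive.
    """
    passing = []
    for sender, start, other in (("A", a_start, b_start), ("B", b_start, a_start)):
        for sent in syn_send_times(start):
            arrival = sent + one_way_ms
            if other <= arrival < gives_up_at(other):
                passing.append((arrival, sender))
    return min(passing) if passing else None


print(first_syn_through(0, 2500, 50))    # (2550, 'B')
print(first_syn_through(0, 40000, 50))   # None
```

With B starting 2.5 seconds late, the earliest SYN to get through is B’s first one, which arrives at A’s already-open NAT. With B starting 40 seconds late, A’s attempt has expired in this model and nothing connects.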
What RFC 5382 Requires and What NATs Actually Do
RFC 5382 specifies NAT behavioral requirements for TCP. The relevant requirements:
- REQ-1: NAT MUST use endpoint-independent mapping. The same internal (IP, port) always maps to the same public (IP, port) regardless of the remote destination. This is essential. Without it, the public port the rendezvous server observed for peer A is useless, because A’s connection to peer B goes to a different destination and gets a different public port.
- REQ-2: NAT MUST support all valid TCP packet sequences, which explicitly includes simultaneous open.
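- REQ-4: NAT MUST NOT respond to an unsolicited inbound SYN for at least 6 seconds. If it translates an outbound SYN for the same connection during that window, it MUST silently drop the original inbound SYN. That is exactly the case of a hole-punching SYN that arrives slightly early.
- REQ-5: The transitory connection idle-timeout, which covers connections that are still being set up or torn down (such as one that has only seen a SYN), MUST NOT be less than 4 minutes.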
The problem is compliance. Consumer NAT devices, particularly those running older firmware or custom ISP builds, frequently violate REQ-1 by using address- and port-dependent mapping, known as symmetric NAT behavior. Behind a symmetric NAT, the public port is different for every distinct (remote_ip, remote_port) pair. There is therefore no reliable way to predict which port the peer’s NAT will use for the actual hole-punching attempt.

Measured success rates for simultaneous open between two symmetric NATs are near zero without a relay. Carrier-grade NAT (CGN) compounds the problem by adding a second layer of translation at the ISP. RFC 6888 sets common requirements for CGNs, including the same TCP behavioral requirements, but every translation layer on the path has to cooperate.
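Whether a NAT is symmetric can be read off from what a STUN-like server reports when one internal port probes several destinations. This is roughly the idea behind the behavior-discovery tests in RFC 5780. The classifier below is a sketch that assumes those observations have already been collected; `classify_mapping` and the addresses are hypothetical.

```python
from typing import Dict, Set, Tuple

Addr = Tuple[str, int]


def classify_mapping(observed: Dict[Addr, Addr]) -> str:
    """Classify NAT mapping behavior from probes sent from ONE internal port.

    `observed` maps each probe destination (IP, port) to the public
    (IP, port) that destination reported seeing, as a STUN server would.
    """
    if len(observed) < 2:
        raise ValueError("need probes to at least two destinations")
    if len(set(observed.values())) == 1:
        return "endpoint-independent"  # what RFC 5382 REQ-1 asks for
    publics_by_ip: Dict[str, Set[Addr]] = {}
    probes_by_ip: Dict[str, int] = {}
    for (dst_ip, _dst_port), public in observed.items():
        publics_by_ip.setdefault(dst_ip, set()).add(public)
        probes_by_ip[dst_ip] = probes_by_ip.get(dst_ip, 0) + 1
    if any(len(p) > 1 for p in publics_by_ip.values()):
        # Even the same remote IP saw different ports: symmetric NAT.
        return "address-and-port-dependent"
    if any(n > 1 for n in probes_by_ip.values()):
        # One port per remote IP, a new port for each new remote IP.
        return "address-dependent"
    return "endpoint-dependent (probe two ports on one IP to tell which kind)"


stun_a, stun_a2, stun_b = ("192.0.2.1", 3478), ("192.0.2.1", 3479), ("198.51.100.2", 3478)
pub = "203.0.113.7"
print(classify_mapping({stun_a: (pub, 40000), stun_b: (pub, 40000)}))
# endpoint-independent
print(classify_mapping({stun_a: (pub, 40000), stun_a2: (pub, 40001), stun_b: (pub, 40002)}))
# address-and-port-dependent
```

Only the endpoint-independent result means the port learned from the rendezvous server is the port the peer should aim at.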
Empirical measurements collected in the mid-2000s and summarized in RFC 5128 found that TCP hole punching worked with roughly 64% of the NATs tested, compared to around 82% for UDP. The gap mostly reflects stricter filtering behavior on TCP and inconsistent simultaneous open support in NAT firmware.
How libp2p Solved the Timing Problem
The difficulty of synchronizing timing led libp2p to take a different approach in its DCUtR (Direct Connection Upgrade through Relay) protocol, documented as /libp2p/dcutr/1.0.0. Instead of relying on a rendezvous server and a shared clock for timing coordination, DCUtR uses an existing relay connection between the peers.
Both peers are already connected through a circuit relay. Through that relay, they exchange observed external addresses and then negotiate a simultaneous dial. The relay connection doubles as a signaling channel whose round-trip time can be measured directly.

Per the DCUtR specification, the peer that starts the exchange times its Connect message and the reply. It then sends a Sync message and waits half that round-trip time before dialing. The other peer dials as soon as the Sync arrives. If the direct connection succeeds, the relay connection is dropped.

This sidesteps clock synchronization entirely and uses the relay only for setup, not for sustained traffic, which is the right cost model.
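One way an RTT measured over the relay can line the two dials up: the peer that measured it waits half the RTT after sending a final message, and the other peer dials the moment that message arrives. This timing sketch is illustrative only. The roles, the message timing, and the function `dial_times` are assumptions, not the DCUtR wire protocol. It also assumes the relay path’s latency is a fair stand-in for the direct path’s.

```python
from typing import Tuple


def dial_times(t0: float, a_to_b: float, b_to_a: float) -> Tuple[float, float]:
    """Return (a_dial, b_dial) for a hypothetical RTT-based sync over a relay.

    t0:                B sends a message to A over the relay and starts a timer.
    t0 + b_to_a:       A receives it and replies immediately.
    t0 + rtt:          B gets the reply, sends a final sync message, and
                       waits rtt / 2 before dialing.
    t0 + rtt + b_to_a: A receives the sync message and dials at once.
    """
    rtt = b_to_a + a_to_b
    sync_sent = t0 + rtt
    a_dial = sync_sent + b_to_a
    b_dial = sync_sent + rtt / 2
    return a_dial, b_dial


print(dial_times(0, a_to_b=40, b_to_a=40))  # (120, 120.0)
print(dial_times(0, a_to_b=10, b_to_a=70))  # (150, 120.0)
```

With symmetric 40 ms relay delays, both peers dial at t = 120 ms. With 10 ms one way and 70 ms the other, B dials 30 ms early, a skew well inside the tolerance discussed earlier.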
WebRTC’s ICE framework generalizes this further by having both agents run STUN connectivity checks across all candidate address pairs at the same time, which accomplishes hole punching as a side effect of the probing process. Each check opens NAT mappings while it tests reachability. The elegant part is that one round of messages both discovers which path works and opens it.
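The checklist ICE builds is the cross product of local and remote candidates, ordered by a pair priority that both agents compute identically. This sketch uses the pair-priority formula from RFC 8445. It omits the pruning, component, and address-family rules a real agent applies. `checklist`, `pair_priority`, and the example candidate priorities are illustrative.

```python
from itertools import product
from typing import List, Tuple

Candidate = Tuple[str, int, int]  # (ip, port, candidate priority)


def pair_priority(g: int, d: int) -> int:
    """Candidate-pair priority from RFC 8445, Section 6.1.2.3.

    g is the controlling agent's candidate priority, d the controlled
    agent's, so both agents compute the same number for a pair.
    """
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def checklist(local: List[Candidate], remote: List[Candidate],
              controlling: bool) -> List[Tuple[int, Tuple[str, int], Tuple[str, int]]]:
    """Pair every local candidate with every remote one, best first."""
    pairs = []
    for loc, rem in product(local, remote):
        g, d = (loc[2], rem[2]) if controlling else (rem[2], loc[2])
        pairs.append((pair_priority(g, d), loc[:2], rem[:2]))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs


# Example priorities for a host and a server-reflexive candidate.
alice = [("10.0.0.2", 6000, 2130706431), ("203.0.113.7", 40000, 1694498815)]
bob = [("198.51.100.9", 51000, 1694498815)]
for prio, loc, rem in checklist(alice, bob, controlling=True):
    print(prio, loc, "->", rem)
```

Because the formula depends on roles rather than on which side is computing, both agents arrive at the same pair priorities. That lets them work through the same pairs in the same order, so their checks toward each other overlap in time.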
QUIC Changes the Calculus
TCP hole punching will remain necessary as long as significant numbers of applications prefer TCP’s guarantees. But QUIC, which underlies HTTP/3, runs over UDP and adds connection identifiers that are independent of the underlying (IP, port) tuple. A QUIC connection therefore survives NAT rebinding events that would break a TCP connection. Because its handshake rides on UDP, QUIC can also be hole-punched with the simpler UDP technique, and libp2p reports substantially higher hole-punching success rates with QUIC than with TCP. WebRTC data channels use SCTP over DTLS over ICE over UDP, avoiding TCP hole punching entirely for application data.
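The rebinding difference comes down to how each protocol finds the connection a packet belongs to. The sketch contrasts a lookup keyed by the 4-tuple with one keyed by a connection ID. The classes are hypothetical. A real QUIC stack would also validate a new path before fully trusting it (path validation in RFC 9000).

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]


class TcpTable:
    """TCP-style lookup: a connection is identified by its 4-tuple."""

    def __init__(self) -> None:
        self.conns: Dict[Tuple[Addr, Addr], str] = {}

    def lookup(self, src: Addr, dst: Addr) -> Optional[str]:
        return self.conns.get((src, dst))


class QuicTable:
    """QUIC-style lookup: a connection is identified by a connection ID."""

    def __init__(self) -> None:
        self.conns: Dict[bytes, str] = {}

    def lookup(self, src: Addr, dst: Addr, dcid: bytes) -> Optional[str]:
        # The source address is not part of the key. A real stack would
        # still validate the new path before trusting it fully.
        return self.conns.get(dcid)


server = ("192.0.2.10", 443)
before = ("203.0.113.7", 40000)   # client's public endpoint at setup
after = ("203.0.113.7", 40017)    # same client after the NAT rebinds

tcp = TcpTable()
tcp.conns[(before, server)] = "conn-1"
quic = QuicTable()
quic.conns[b"\x8a\x11\x42\x07"] = "conn-1"

print(tcp.lookup(after, server))                         # None
print(quic.lookup(after, server, b"\x8a\x11\x42\x07"))   # conn-1
```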
IPv6 removes the address-translation problem. Every host with a global unicast address has a publicly routable endpoint, so there is no NAT mapping to discover or predict. Stateful firewalls in home routers typically still drop unsolicited inbound connections, but simultaneous open gets through them the same way. Adoption has been slower than the IETF hoped, but as of 2025 roughly 45% of users reach Google over IPv6. The TCP simultaneous open technique is a bridge over an infrastructure gap that is, slowly, narrowing.
The Elegance
The appeal of a clean TCP hole punching algorithm is that it uses nothing beyond what RFC 793 already specifies. Simultaneous open has been in every conforming TCP stack for over four decades. RFC 5382 requires NATs to support it. The socket reuse semantics needed to implement it on a single port are available on every modern OS. The coordination problem is solvable with a timer and a shared clock reference.
When the pieces fit together, the implementation is short, uses no special protocols, requires no server-side relay bandwidth, and produces a direct TCP connection that behaves like any other. The cases where it fails, symmetric NAT and CGN, are infrastructure problems, not protocol problems. For the significant fraction of networks where the infrastructure cooperates, sending two SYNs that collide in flight is the whole trick.