Reaching beyond 1Gbps: How we achieved NAT traversal with vanilla WireGuard
Nord Security engineers have been hard at work developing Meshnet, a mesh networking solution that employs the WireGuard tunneling protocol. Here are the technical details on how we tackled the challenge of optimizing Meshnet’s speed.
Meshnet is powered by NordLynx, a protocol based on WireGuard. WireGuard is an excellent tunneling protocol: open, secure, lightweight, lean, and – thanks to in-kernel implementations such as those in the Linux kernel and the Windows NT kernel – really, really fast.
At the heart of it is “cryptokey routing,” which makes creating a tunnel about as cheap as tracking a few hundred bytes of state. Having hundreds or even thousands of tunnels on a single machine is therefore feasible.
These properties make WireGuard a very appealing building block for peer-to-peer mesh networks. But before getting there, a challenge or two must still be overcome. So let’s dig into them!
Ground rules
A few ground rules will help us weigh the tradeoffs. First, privacy and security are a priority, so any tradeoff that compromises end-to-end encryption or exposes too much information is automatically off the table. Second, speed and stability are among the most important qualities of Meshnet. Finally, to cover all major operating systems (Windows, Android, iOS, macOS, and Linux), any idea or solution must be implementable on all of those platforms.
So here are the ground rules:
Rule #1
Everything will be end-to-end encrypted. Any user data passing between devices must be inaccessible to anyone else – even to Nord Security itself.
Rule #2
No mixing of the data plane (i.e., the code that processes packets) and the control plane (i.e., the code that configures the network), if possible. Any additional logic (e.g., NAT traversal, packet filtering/processing) added to WireGuard itself will slow it down.
Rule #3
No solutions that target a single WireGuard implementation. Remember those fast in-kernel implementations? In order to reach high throughput everywhere, we must be able to adapt to the intricacies of every platform.
Great! Now let’s get cracking!
NAT traversal 101
Every peer-to-peer application (including Meshnet) has a NAT traversal implementation at its heart. While this is a rather wide topic (just look at the number of related RFCs: RFC3261, RFC4787, RFC5128, RFC8489, RFC8445, RFC8656…), the core principle is quite simple: NATs are generally designed to support outgoing connections really well.
They achieve this by forwarding any outgoing packets while remembering just enough information to figure out where and how to forward the response packets whenever they arrive. The exact nature of this information, and how it is used, determines the type of NAT and its specific behavior. For example, Linux NATs are based on the conntrack kernel module, and you can check the state of this information at any moment using the conntrack -L command:
$ sudo conntrack -L
tcp 6 382155 ESTABLISHED src=192.168.3.140 dst=172.217.18.3 sport=60278 dport=443 src=172.217.18.3 dst=192.168.3.140 sport=443 dport=60278 [ASSURED] mark=0 use=1
tcp 6 348377 ESTABLISHED src=192.168.228.204 dst=35.85.173.255 sport=38758 dport=443 src=35.85.173.255 dst=192.168.228.204 sport=443 dport=38758 [ASSURED] mark=0 use=1
......
RFC4787 goes into great detail about NAT behavior in general.
While outgoing connections are handled transparently, incoming connections can be trouble. Without an outgoing packet being forwarded first (and consequently without the conntrack information), a NAT simply has no clue where to forward the packets of an incoming connection, and the only choice left is to drop them. This brings us to the core of any peer-to-peer connection establishment:
Suppose you shoot a packet from both sides of the peer-to-peer connection at each other roughly at the same time. In this case, the connection will appear to be “outgoing” from the perspective of both NATs, allowing hosts to communicate.
Let’s unpack it a bit:
- “Shoot a packet” – send a UDP packet. There are techniques for other protocols, but only UDP matters here, as WireGuard is UDP-based. The packet’s payload does not matter (it can even be empty), but it’s important to get the headers right.
- “at each other” – the source and destination addresses and ports of the packets, sent from the two sides of the connection, must mirror each other as seen just after the first NAT has translated them but before the second NAT translates them. Whatever source address and port a NAT uses for one side’s outgoing packets, the other side must send its packets to exactly that address and port, and vice versa. Unfortunately, some NATs make it very difficult to figure out the translations they are making, which is why NAT traversal is never 100% reliable.
- “roughly at the same time” – the data about outgoing connections within a NAT isn’t stored forever, so the packet from the other side must reach the NAT before this data disappears. The storage time greatly depends on the NAT – it varies from half a minute to a few minutes.
This technique is surprisingly general. Only small bits and pieces differ between the various cases a typical peer-to-peer application needs to support.
A few things need to be done right, but all of this is possible with vanilla WireGuard and within the established ground rules. Take two packets and send them from the right source to the right destination at roughly the same time, without even worrying about what’s inside them. How hard can it be? #FamousLastWords.
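To make the technique tangible, here is a minimal UDP hole-punching sketch in Go. It is not Meshnet code: the addresses and ports in the comments are made up, and each side is assumed to have already learned the other side’s NAT-translated endpoint out of band.

package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"time"
)

// punch binds the local port our NAT mapping is created from and keeps
// sending small UDP packets to the remote (already translated) endpoint
// until something comes back or we give up.
func punch(localPort int, remote string) error {
	raddr, err := net.ResolveUDPAddr("udp4", remote)
	if err != nil {
		return err
	}
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{Port: localPort})
	if err != nil {
		return err
	}
	defer conn.Close()

	buf := make([]byte, 1500)
	for start := time.Now(); time.Since(start) < 30*time.Second; {
		// The payload is irrelevant; the headers are what punch the hole.
		conn.WriteToUDP([]byte("punch"), raddr)
		conn.SetReadDeadline(time.Now().Add(2 * time.Second))
		if n, from, err := conn.ReadFromUDP(buf); err == nil {
			fmt.Printf("got %d bytes from %v: the path is open\n", n, from)
			return nil
		}
	}
	return fmt.Errorf("no response from %s", remote)
}

func main() {
	// Peer A: go run punch.go 50000 203.0.113.7:60001
	// Peer B: go run punch.go 60001 198.51.100.4:50000 (at roughly the same time)
	port, _ := strconv.Atoi(os.Args[1])
	if err := punch(port, os.Args[2]); err != nil {
		fmt.Println(err)
	}
}

Run from both sides at roughly the same time, each side’s first packet creates the mapping in its own NAT, and whichever packet arrives later passes straight through.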
WG-STUN
The key part of any NAT traversal implementation is figuring out which translations the NAT will perform. In some cases there is no NAT at all (e.g., a host on the open internet), or it is possible to simply ask the NAT to perform specific translations (e.g., via UPnP (RFC6970) or NAT-PMP (RFC6886)). Sometimes, though, the translation has to be observed in action. Luckily, a standardized protocol, STUN (RFC8489), does just that.
While the STUN protocol has some intricacies of its own, the so-called STUN binding request is at its core. A binding request is usually sent by a client behind a NAT and processed by a server hosted on the open internet. Upon receiving the request, the server looks at the source IP address and port of the request packet and writes them into the payload of the response packet.
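For illustration, here is a bare-bones STUN binding request in Go. It hand-rolls the 20-byte header and parses the XOR-MAPPED-ADDRESS attribute from the response; stun.example.com:3478 is a placeholder for any public STUN server, and only IPv4 is handled.

package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"net"
)

const magicCookie = 0x2112A442

func main() {
	// Placeholder server address; any RFC 8489-compliant server will do.
	conn, err := net.Dial("udp4", "stun.example.com:3478")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// 20-byte header: type 0x0001 (Binding Request), length 0,
	// the magic cookie, and a random 96-bit transaction ID.
	req := make([]byte, 20)
	binary.BigEndian.PutUint16(req[0:2], 0x0001)
	binary.BigEndian.PutUint32(req[4:8], magicCookie)
	rand.Read(req[8:20])
	conn.Write(req)

	resp := make([]byte, 1500)
	n, err := conn.Read(resp)
	if err != nil {
		panic(err)
	}

	// Walk the attributes looking for XOR-MAPPED-ADDRESS (0x0020, IPv4).
	for off := 20; off+4 <= n; {
		attrType := binary.BigEndian.Uint16(resp[off : off+2])
		attrLen := int(binary.BigEndian.Uint16(resp[off+2 : off+4]))
		if off+4+attrLen > n {
			break
		}
		value := resp[off+4 : off+4+attrLen]
		if attrType == 0x0020 && attrLen >= 8 && value[1] == 0x01 {
			port := binary.BigEndian.Uint16(value[2:4]) ^ (magicCookie >> 16)
			ip := make(net.IP, 4)
			binary.BigEndian.PutUint32(ip, binary.BigEndian.Uint32(value[4:8])^magicCookie)
			fmt.Printf("the NAT maps this socket to %s:%d\n", ip, port)
		}
		off += 4 + (attrLen+3)/4*4 // attribute values are padded to 4 bytes
	}
}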
Some NATs use the same translation of the source IP address and port regardless of the destination (let’s call them “friendly NATs”): the same translated source IP address and port will be used for packets going to the STUN server and to any Meshnet peer. But there is a catch! The NAT will apply the same translation only as long as the originating host uses the same source IP and port for all of those destinations.
Here’s the first challenge. Vanilla WireGuard is not capable of performing STUN requests on its own. Moreover, once WireGuard reserves a source port for communication with its peers, other programs generally cannot use it anymore.
While it is technically possible to add STUN functionality to WireGuard, doing so would violate ground rule #2 and seriously complicate compliance with rule #3. The search continues.
The WireGuard protocol is designed to create IP tunnels. Maybe it’s possible to transmit STUN requests inside the tunnel? That way, the STUN request would get encapsulated, resulting in two IP packets: an inner one (STUN) and an outer one (WireGuard). Luckily, according to the WireGuard whitepaper, all outer packets destined for any peer should reuse the same source IP and port – and this has been the behavior of all WireGuard implementations tested for this blog post.
Using this property, we can assume that packets destined for distinct WireGuard peers will get the same translations when going through friendly NATs. That’s precisely what we need when using an external service (like STUN) to determine which translations the NAT will use when communicating with Meshnet peers.
But no standard STUN server can communicate with WireGuard directly. Even if we hosted a STUN server at the other end of the tunnel, after decapsulation the server would respond with the inner packet’s source IP and port – but we need the outer packet’s source IP and port.
Say hello to WG-STUN, a small service that maintains WireGuard tunnels with clients and waits for STUN requests inside those tunnels. When a binding request arrives, instead of looking at the request packet itself, the server takes the address from the WireGuard peer state and writes it into the STUN binding response. It then encapsulates the packet according to the WireGuard protocol and sends it back to the client. On the client side, to figure out which translations the NAT will perform for WireGuard connections, we just need to add the WG-STUN peer and transmit a standard STUN request inside the tunnel.
In the picture above, you can see a standard WG-STUN request. In this case, a STUN request was sent to 100.64.0.4, which is a reserved IP for an in-tunnel STUN service. The request got encapsulated and transmitted by WireGuard to one of the WG-STUN servers hosted by Nord Security. This WG-STUN server is just a standard WireGuard peer with the allowed IP set to 100.64.0.4/32, and the endpoint pointed to the server itself.
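In wg-quick terms, the client side of this setup would look roughly like the peer entry below. The key, hostname, and port are placeholders; only the AllowedIPs value comes from the setup described above.

[Peer]
# WG-STUN server peer: only the reserved in-tunnel STUN address is routed to it
PublicKey = <WG-STUN server public key>
AllowedIPs = 100.64.0.4/32
Endpoint = wg-stun.example.com:51820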
Note that the WG-STUN service is, by design, a small service that is functionally incapable of doing anything other than responding to STUN requests (and to ICMP for reachability testing). This way, we confine the service to the control plane only and adhere to rule #2. Because the WG-STUN service is just a standard peer, WireGuard’s cross-platform interface is more than enough to control the WG-STUN peer in any of the WireGuard implementations (rule #3). Most importantly, thanks to WireGuard’s encryption, we get privacy and security by default (rule #1).
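To make the server-side trick concrete, here is a sketch in Go of how such a binding response could be built. This is not the actual WG-STUN code: it assumes the caller already knows the peer’s outer endpoint (the address and port WireGuard itself tracks for that peer) and leaves the WireGuard encapsulation to the tunnel.

package wgstun

import (
	"encoding/binary"
	"net"
)

const magicCookie = 0x2112A442

// buildBindingResponse echoes the request's transaction ID and encodes the
// peer's *outer* IPv4 endpoint – not the inner packet's source – as the
// XOR-MAPPED-ADDRESS attribute (0x0020) of a Binding Success Response.
func buildBindingResponse(txID [12]byte, outer *net.UDPAddr) []byte {
	msg := make([]byte, 20+12)
	binary.BigEndian.PutUint16(msg[0:2], 0x0101) // Binding Success Response
	binary.BigEndian.PutUint16(msg[2:4], 12)     // one 12-byte attribute follows
	binary.BigEndian.PutUint32(msg[4:8], magicCookie)
	copy(msg[8:20], txID[:])

	binary.BigEndian.PutUint16(msg[20:22], 0x0020) // XOR-MAPPED-ADDRESS
	binary.BigEndian.PutUint16(msg[22:24], 8)      // value length for IPv4
	msg[25] = 0x01                                 // address family: IPv4
	binary.BigEndian.PutUint16(msg[26:28], uint16(outer.Port)^(magicCookie>>16))
	binary.BigEndian.PutUint32(msg[28:32], binary.BigEndian.Uint32(outer.IP.To4())^magicCookie)
	return msg
}

The response is then written back into the tunnel as an ordinary inner packet, and WireGuard takes care of encrypting and encapsulating it.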
Path selection
Now we can perform STUN with vanilla WireGuard and figure out which translations the NAT will perform – provided that our NAT is a friendly one. Unfortunately, that’s not enough to ensure good connectivity with Meshnet peers. What if there is no NAT at all? What if two NATs are chained, and our Meshnet peer sits between them? What if a Meshnet peer is running in a VM on the local machine? What if a Meshnet peer managed to “ask” its NAT for specific translations via UPnP? There are quite a few possible configurations here. We sometimes call these configurations “paths,” since they describe how one Meshnet peer can reach another. In the real world, the list of potential paths is a lot longer than the list of paths that can sustain a peer-to-peer connection.
For example, one Meshnet peer may reach the other directly if both are within the same local area network. What’s more, if the NAT supports hairpinning, the same peer may also be reachable via the router’s WAN IP address. Additionally, it is common for a single host to participate in multiple networks at the same time (e.g., via virtualized networks, multiple physical interfaces, DNAT, etc.). But it is impossible to know in advance which paths are valid and which are not.
For this reason, peer-to-peer applications usually implement connectivity checks to determine which paths allow peers to reach one another (e.g., the checks standardized in ICE (RFC8445)), and when multiple paths pass the checks, they select the best one. These checks are usually performed in the background, separately from the data channel, to avoid interfering with the path currently in use. For example, if two peers are connected via some relay service (e.g., TURN (RFC8656)), an attempt to upgrade to a seemingly better but unvalidated path (e.g., direct LAN) may interrupt the connection until a timeout passes – and that would be deeply undesirable.
While WireGuard implementations do indicate the reachability of the currently configured peers used for the data plane, the lightweight nature of the WireGuard protocol leaves the evaluation of alternative paths out of scope. The question is: how can we separate connectivity checks from the data plane?
Considering how cheap WireGuard tunnels are, the most straightforward solution would be to configure two pairs of peers on each Meshnet node – one for the data plane, the other for connectivity checks. But this is not feasible in practice. WireGuard peers are identified by their identity (public key), and each interface has only one identity; otherwise, cryptokey routing and the roaming functionality, in their current form, would break. Moreover, mobile platforms can have at most one tunnel interface open at any moment, restricting Meshnet nodes to a single identity at a given time.
So let’s look for solutions elsewhere. Here’s how we came to the observation which is now the core principle for performing connectivity checks out of the data plane:
Given that a connection can be established using a pair of endpoints, it is highly likely that performing the same steps from a different source endpoint will also succeed.
It is possible to force this observation to be false, but it wouldn’t happen naturally. NATs have the same mapping and filtering behavior for any pair of distinct outgoing connections – RFC4787 considers NAT determinism a desirable property. UPnP (RFC6970), NAT-PMP (RFC6886), and similar protocols behave the same way for distinct requests. And LAN traffic is almost never filtered on a per-source-port basis for outgoing connections.
On the other hand, making this assumption allows us to completely separate connectivity checks from the data plane. After performing a connectivity check out-of-band, a path upgrade can be attempted with a high degree of confidence that it will succeed.
Therefore, in our Meshnet implementation, nodes gather endpoints (as per the ICE (RFC8445) standard) for two distinct purposes: to perform connectivity checks, and to upgrade the WireGuard connection if those checks succeed. Once the list of endpoints is known, it is exchanged between the participating Meshnet nodes via relay servers. For privacy and security, the endpoint exchange messages are encrypted and authenticated using X25519 ECDH and the ChaCha20-Poly1305 AEAD. Afterward, the connectivity checks are performed separately from WireGuard using plain old UDP sockets. If multiple endpoint candidates pass the connectivity check, the candidate with the lowest round-trip time is preferred.
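As an illustration of that last step, here is a much-simplified sketch in Go (not the Meshnet implementation): each candidate endpoint is probed over a plain UDP socket, candidates that fail the check are skipped, and the lowest-RTT survivor wins. Real connectivity checks add ICE-style pairing, retransmissions, and authentication of the probes; the candidate addresses below are made up, and the remote side is assumed to echo the probe back.

package main

import (
	"fmt"
	"net"
	"time"
)

// probe sends a single packet to the candidate endpoint and waits briefly
// for the peer to echo it back, returning the measured round-trip time.
func probe(candidate string) (time.Duration, error) {
	conn, err := net.Dial("udp4", candidate)
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	start := time.Now()
	if _, err := conn.Write([]byte("probe")); err != nil {
		return 0, err
	}
	conn.SetReadDeadline(start.Add(2 * time.Second))
	buf := make([]byte, 64)
	if _, err := conn.Read(buf); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	// Made-up candidates: a LAN address and a NAT-translated WAN address.
	candidates := []string{"192.168.3.141:41641", "203.0.113.7:60001"}
	best, bestRTT := "", time.Duration(0)
	for _, c := range candidates {
		rtt, err := probe(c)
		if err != nil {
			continue // this candidate failed the connectivity check
		}
		if best == "" || rtt < bestRTT {
			best, bestRTT = c, rtt
		}
	}
	fmt.Printf("selected endpoint %s (rtt %v)\n", best, bestRTT)
}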
Once a path has been validated using some pair of endpoints, the corresponding data-plane endpoints are selected and a path upgrade is attempted. If the upgrade fails to establish a connection, that path is banned for a period of time; if it succeeds, we have established a peer-to-peer connection using vanilla WireGuard.
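The upgrade itself needs nothing beyond what WireGuard already offers: the existing peer is simply pointed at the validated endpoint through the cross-platform interface. With the wg tool it boils down to a single command (the interface name, key, and address here are illustrative):

$ sudo wg set nordlynx peer <peer-public-key> endpoint 203.0.113.7:60001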
And now we can fire up iperf3 and measure what all of this means. As you may have realized, we are now measuring vanilla WireGuard itself. For example, with two Meshnet nodes running in Docker containers on a single, rather average laptop with an Intel i5-8265U, and without any additional tweaking or tuning, we can easily surpass the 2Gbps mark in a single-TCP-connection iperf3 test.
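For reference, the measurement is the stock single-stream iperf3 test run over the Meshnet addresses of the two nodes (the peer address below is a placeholder):

$ iperf3 -s                    # on the first Meshnet node
$ iperf3 -c <peer-meshnet-ip>  # on the second node; a single TCP stream by default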
At the time of writing, the default WireGuard implementation used by Meshnet is the in-kernel implementation on Linux, WireGuard-NT or wireguard-go on Windows, and boringtun on other platforms.
Conclusion
By solving a few challenges, we managed to build Meshnet on top of WireGuard with peer-to-peer capabilities, using only the cross-platform interface and keeping the benefits of the in-kernel WireGuard implementations – and it surpassed the 1Gbps throughput mark. The implementation is currently being rolled out, so stay tuned for a big speed upgrade!