In a previous blog post, my colleague Jes Nielsen explained how to protect your network traffic during uncertain times. He mentioned using IPsec as a network protocol to secure existing traffic and highlighted the need for high performance. In this blog post, we focus on a single IPsec tunnel and the challenge of getting more bandwidth out of it.
If you have ever studied a network appliance datasheet, you will have noticed very high throughput figures for each type of processing, such as IP forwarding, firewalling or IPsec VPN tunnels. As any engineer will tell you, the devil is in the details. In the case of IPsec VPNs, general-purpose CPUs have a well-known limitation: the performance of a single IPsec tunnel is capped and does not scale, no matter how many CPU cores you throw at it. This is sometimes called the “big fat pipe” syndrome. Such details do not typically appear on datasheets, where IPsec performance is quoted as the aggregate throughput of multiple tunnels. The issue mainly affects site-to-site use cases, where a single IPsec tunnel connects the two sites.
The limitation explained
To explain the single IPsec tunnel performance limitation, we first have to introduce how Ethernet NICs load-balance incoming traffic, using a mechanism called RSS (Receive Side Scaling). RSS relies on multiple reception queues, each of which a user can tie to a CPU core, so that several cores can poll different queues of the same NIC. By default, RSS computes a hash from the 5-tuple of each incoming packet (source and destination IP addresses, source and destination ports, and protocol) and uses it to distribute packets among the reception queues.
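To make RSS more concrete, here is a minimal Python sketch of the Toeplitz hash that many NICs use for RSS, followed by the indirection-table lookup that turns the hash into a reception queue. It is an illustration under stated assumptions, not any particular NIC's behavior: IPv4 only, the 40-byte sample key published in Microsoft's RSS documentation, a 128-entry indirection table filled round-robin, and hypothetical helper names (`toeplitz_hash`, `rss_input`, `rx_queue`). Real NICs differ in key, table size and in which header fields they hash for each protocol.

```python
import ipaddress
import struct
from typing import Optional

# 40-byte sample key from Microsoft's RSS documentation; drivers may
# program a different key into the NIC.
RSS_KEY = bytes([
    0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2,
    0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0,
    0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4,
    0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C,
    0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
])


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """For every set bit of the input (MSB first), XOR in the 32-bit
    window of the key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("input too long for this key")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input(src: str, dst: str,
              sport: Optional[int] = None, dport: Optional[int] = None) -> bytes:
    """Hash input: IPv4 source and destination addresses, followed by the
    L4 ports for TCP/UDP. ESP carries no ports, so here only the outer
    addresses are hashed for it (a common default, not universal)."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


def rx_queue(rss_hash: int, n_queues: int, table_size: int = 128) -> int:
    """The low bits of the hash index an indirection table (size must be a
    power of two) whose entries name RX queues, filled round-robin here."""
    table = [i % n_queues for i in range(table_size)]
    return table[rss_hash & (table_size - 1)]


# A TCP flow in the clear vs. an ESP tunnel between two gateways
tcp = rx_queue(toeplitz_hash(rss_input("192.0.2.10", "198.51.100.20", 40000, 443)), 4)
esp = rx_queue(toeplitz_hash(rss_input("203.0.113.1", "203.0.113.2")), 4)
print("TCP flow -> queue", tcp, "| ESP tunnel -> queue", esp)
```

Because the hash is a pure function of the header fields, every packet that carries the same fields lands in the same queue. That is what keeps a flow on one core, and it is also what pins an entire tunnel to one core.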
As you can imagine, the 5-tuple of an encrypted packet is taken from the outer header of the IPsec tunnel. This information does not change for a given IPsec tunnel, so RSS sends every encrypted packet to the same reception queue, where it is processed by the same CPU core.
This means that regardless of how many different flows we encapsulate within an IPsec tunnel, decryption is always performed by a single CPU core, while every other core in the system sits idle (see the example in Figure 1).
Figure 1: Example data plane (fast path) CPU usage for a VPN endpoint receiving encrypted traffic – Only 1 CPU core receiving the traffic
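The effect shown in Figure 1 can be reproduced with a toy model. Assumptions: `zlib.crc32` stands in for the NIC's RSS hash (real NICs typically use a Toeplitz hash), there are four reception queues, and the gateway addresses and the 1,000 inner TCP flows are made up. The same flows are hashed once as clear traffic and once as if carried by a single tunnel, where the NIC only sees the outer header.

```python
import random
import zlib
from collections import Counter

N_QUEUES = 4
TCP, ESP = 6, 50  # IP protocol numbers


def rss_queue(src: str, dst: str, proto: int, sport: int = 0, dport: int = 0) -> int:
    # Stand-in for the NIC's RSS hash: only the outermost headers are visible.
    fields = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return zlib.crc32(fields) % N_QUEUES


rng = random.Random(1)
inner_flows = [(f"10.1.0.{rng.randint(1, 254)}", f"10.2.0.{rng.randint(1, 254)}",
                rng.randint(1024, 65535), 443) for _ in range(1000)]

# Clear traffic: the NIC hashes each flow's own 5-tuple.
clear = Counter(rss_queue(s, d, TCP, sp, dp) for s, d, sp, dp in inner_flows)

# The same flows inside one tunnel: every packet shows the NIC the same outer
# header (gateway A -> gateway B, ESP, no ports), so one queue gets everything.
tunnel = Counter(rss_queue("203.0.113.1", "203.0.113.2", ESP) for _ in inner_flows)

print("clear traffic, packets per queue:", dict(sorted(clear.items())))
print("single tunnel, packets per queue:", dict(sorted(tunnel.items())))
```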
6WIND’s solution explained
For lack of a better solution, many users resort to configuring more tunnels so that the encrypted packets are spread across cores, or to buying an additional IPsec crypto offload card. A pipelined approach, done at the IP level, would increase the performance of a single IPsec tunnel: the single core receiving all of the tunnel's traffic would act as a load balancer and dispatch the crypto operations to idle cores in the system, instead of performing all of them itself.
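The pipeline described above can be sketched with a thread pool standing in for the worker cores. This is a minimal illustration under loud assumptions, not 6WIND's implementation: threads play the role of data-plane cores, `toy_crypt` is a hypothetical XOR stand-in for ESP decryption (not real cryptography), and because of Python's GIL the sketch shows the structure rather than an actual speedup. Results are collected in submission order, because spreading one tunnel's packets across cores can otherwise reorder them, which a real design has to address.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

TOY_KEY = 0x5A  # hypothetical one-byte key for the toy cipher below


def toy_crypt(data: bytes) -> bytes:
    """Toy XOR 'cipher' (its own inverse) standing in for ESP crypto.
    NOT real cryptography: it only gives the workers something to do."""
    return bytes(b ^ TOY_KEY for b in data)


def decrypt_job(seq: int, packet: bytes) -> Tuple[int, bytes, str]:
    # Runs on a worker "core"; records which worker handled the packet.
    return seq, toy_crypt(packet), threading.current_thread().name


def rx_core(packets: List[bytes], n_workers: int) -> List[Tuple[int, bytes, str]]:
    """The receiving core polls the tunnel's single RX queue and only
    dispatches the crypto work to a pool of otherwise idle workers."""
    with ThreadPoolExecutor(max_workers=n_workers,
                            thread_name_prefix="crypto") as pool:
        futures = [pool.submit(decrypt_job, seq, pkt)
                   for seq, pkt in enumerate(packets)]
        # Collecting results in submission order restores the original
        # packet order even when workers finish out of order.
        return [f.result() for f in futures]


encrypted = [toy_crypt(f"payload-{i}".encode()) for i in range(6)]
for seq, clear, worker in rx_core(encrypted, n_workers=3):
    print(seq, clear.decode(), "decrypted by", worker)
```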
6WIND products, including 6WINDGate foundation and the vRouter software solutions, provide such a mechanism, which we call Crypto Offloading. It is enabled by default for the encrypted-to-clear path (decryption) and optional for the clear-to-encrypted path (encryption). It boosts single-tunnel performance by offloading crypto processing to any available core in the data plane (fast path).
Concretely, this spreads the crypto work across multiple data-plane cores and frees valuable CPU cycles on the receiving core, which can then receive more packets and thus increase the performance of the single IPsec tunnel (see the example in Figure 2).
Figure 2: Example data plane (fast path) CPU usage for a VPN endpoint receiving encrypted traffic with 6WINDGate Crypto offload – Traffic is load-balanced among cores and Core #1 can process more traffic
In this example, the traffic of a single IPsec tunnel is received by CPU0 only; CPU1 and CPU2 receive no traffic and would otherwise sit idle. Assume that we receive 5 packets over the tunnel, numbered 1 to 5. Packet 1 is the first one to be offloaded, to CPU1. Packet 2 is then received by CPU0 and sent to the next available idle core, CPU2. With this crypto offload feature, users can see the performance of their single IPsec tunnel multiplied by two to six times.
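The dispatch rule in this example can be sketched as a small event-driven model. Everything here is an illustrative assumption rather than 6WIND's implementation: CPU1 and CPU2 are the only worker cores, packets reach CPU0 one time unit apart, each decryption takes two time units, and `Worker` and `dispatch` are hypothetical names. Each packet goes to the first idle worker or, if every worker is busy, to the one that becomes idle first.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Worker:
    name: str
    busy_until: float = 0.0  # time at which the worker becomes idle again


def dispatch(arrivals: List[Tuple[int, float]], workers: List[Worker],
             crypto_time: float) -> List[Tuple[int, str, float, float]]:
    """Assign each packet to the first idle worker at its arrival time or,
    if every worker is busy, to the one that becomes idle first."""
    schedule = []
    for pkt, t in arrivals:
        idle = [w for w in workers if w.busy_until <= t]
        chosen = idle[0] if idle else min(workers, key=lambda w: w.busy_until)
        start = max(t, chosen.busy_until)
        chosen.busy_until = start + crypto_time
        schedule.append((pkt, chosen.name, start, chosen.busy_until))
    return schedule


# Assumed timings: one packet reaches CPU0 per time unit, decryption takes 2.
arrivals = [(pkt, float(pkt - 1)) for pkt in range(1, 6)]
for pkt, core, start, end in dispatch(arrivals, [Worker("CPU1"), Worker("CPU2")], 2.0):
    print(f"packet {pkt}: CPU0 -> {core}, decrypt from t={start} to t={end}")
```

Under these assumed timings the workers alternate (CPU1, CPU2, CPU1, CPU2, CPU1) and no packet waits: two workers that each need two time units per packet keep up with one arrival per time unit, whereas a single core doing both reception and decryption could not.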