TCP connection backlog - a struggling server

Servers can’t accept incoming connections at an infinite rate, so let’s explore what happens when you try to establish too many connections to an overwhelmed server in a Linux environment.

Connection Establishment

A new connection is initiated when the server’s host receives a SYN packet from a client - this is the first part of the TCP 3-way handshake. Upon receipt of the SYN, the connection is initialized in SYN_RECV state on the server side.

Under normal conditions, the server side then responds with a SYN-ACK packet to complete the second part of the handshake. At this point, the connection is still in SYN_RECV state on the server side. Finally, when the server side receives an ACK in response to the SYN-ACK it just sent, the connection transitions to ESTABLISHED state and is ready for use.

Kernel Handoff

When the server listens on a TCP address, it relies on the kernel to get new inbound connections to the ESTABLISHED state before the application can accept the connection. This means the entire 3-way handshake occurs before the application even becomes aware the connection is available. The handoff happens when the application calls the accept(2) system call, which returns a file descriptor for a new socket that the application can use to read from and write to the TCP stream.

Note that accept(2) only yields new connected sockets for connections that have already gone through the SYN_RECV -> ESTABLISHED transition.
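The handoff can be seen from user space with a small experiment: a client's connect() returns successfully even though the server never calls accept(). This is a minimal sketch, not part of the original walkthrough; the function name connect_without_accept is made up for illustration, and it assumes a loopback interface is available.

```python
import socket


def connect_without_accept(backlog: int = 1) -> bool:
    """Return True if connect() completes although the server never accepts."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(backlog)         # from here on the kernel answers SYNs itself
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(2.0)
    try:
        # connect() returns once the client side of the handshake is done.
        client.connect(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print("handshake completed without accept():", connect_without_accept())
```

The server socket here plays the same role as the paused netcat process later on: it listens but never accepts, so anything that completes is the kernel's doing.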

The SYN Backlog and Socket Backlog Queues

The important bit here is that connections get queued up, waiting for the application to accept(2) them. The queue that holds established connections is (of course) of finite length. In fact, there are two separate queues:

- The “SYN Backlog” queue, which holds connections in SYN_RECV state. These inbound connections have been responded to by the kernel with a SYN-ACK packet, but the kernel hasn’t yet heard the final ACK back from the active opener. Such connections cannot yet be accepted by the application. This queue is specific to each listening socket, but its length is defined by the global setting, net.ipv4.tcp_max_syn_backlog.
- The listening socket’s “Backlog”. This queue holds connections that have transitioned to ESTABLISHED state before they’ve been accepted by the application. Connections are transferred to this queue from the SYN backlog upon receipt of the final ACK from the active opener. The length of this queue is set by the backlog argument to listen(2), and is silently capped at the net.core.somaxconn kernel parameter.
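To see what these limits are on a given machine, the two settings can be read from /proc/sys. The sketch below is illustrative only: read_sysctl and effective_backlog are hypothetical helper names, and the fallback values are placeholders used when the files cannot be read (for example on a non-Linux host), not claims about any kernel's defaults.

```python
from pathlib import Path


def read_sysctl(name: str, default: int) -> int:
    """Read an integer sysctl such as 'net.core.somaxconn' from /proc/sys.

    Falls back to `default` if the file is missing or unreadable.
    """
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return default


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Model the cap applied to the backlog argument of listen(2)."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    # The fallback numbers below are placeholders, not real defaults.
    somaxconn = read_sysctl("net.core.somaxconn", default=128)
    syn_backlog = read_sysctl("net.ipv4.tcp_max_syn_backlog", default=128)
    print("net.core.somaxconn           =", somaxconn)
    print("net.ipv4.tcp_max_syn_backlog =", syn_backlog)
    print("listen(fd, 1000) is capped to", effective_backlog(1000, somaxconn))
```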

The Queues in Action

Now imagine that many clients are attempting to connect to a server, and there’s an influx of SYNs at a rate greater than the application can accept them. We can create this scenario rather easily to see what happens.

In short, inbound connections will enter ESTABLISHED state so long as there’s space in the socket backlog. Once the backlog fills up, the server will begin ignoring the client’s ACK in hopes of deferring the connection setup until a subsequent retry.

Interestingly, the server will actually send multiple SYN-ACK packets back to the client to nudge the connection along in hopes that an ACK will come back at a point when there’s space in the socket backlog. At least, this is the behavior with Linux 4.9.16.

Part I: Creating an Unresponsive Server

netcat(1) provides everything we need to simulate an overloaded TCP server. We’ll start up a netcat server, then send it a SIGSTOP signal to effectively keep the process off the CPU and prevent it from accepting new connections.

# start netcat listening on 127.0.0.1:44444
❯ nc -l 127.0.0.1 -p 44444
# SIGSTOP the new server so it cannot accept new inbound connections
❯ ps ux | grep nc
root      4826  0.0  0.3   6328  1680 pts/0    S+   05:19   0:00 nc -l 127.0.0.1 -p 44444
❯ kill -SIGSTOP 4826
[1]+  Stopped                 nc -l 127.0.0.1 -p 44444

# note that nc is now in "T" (stopped) state
❯ ps ux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      4826  0.0  0.3   6328  1680 pts/0    T    05:19   0:00 nc -l 127.0.0.1 -p 44444
...

Part II: Create a Connection to the Paused Server

Next, we can again use netcat, this time in client mode, to connect to the (paused) server:

❯ nc 127.0.0.1 44444

If we use tcpdump to capture packets to/from port 44444 during the previous command, we can see the three-way handshake ([S], [S.], [.]) take place even though the server application itself is doing nothing.

❯ tcpdump -i lo -n port 44444
...
05:20:29.411715 IP 127.0.0.1.46596 > 127.0.0.1.44444: Flags [S], seq 4143248557, win 43690, options [mss 65495,sackOK,TS val 1193633764 ecr 0,nop,wscale 6], length 0
05:20:29.411726 IP 127.0.0.1.44444 > 127.0.0.1.46596: Flags [S.], seq 2061299443, ack 4143248558, win 43690, options [mss 65495,sackOK,TS val 1193633764 ecr 1193633764,nop,wscale 6], length 0
05:20:29.411739 IP 127.0.0.1.46596 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 1193633764 ecr 1193633764], length 0

We can also see the new connection between our client and port 44444 is in fully ESTABLISHED state.

# show file descriptors belonging to network sockets for port 44444
# we see the listening socket and the new connection
❯ lsof -i :44444
COMMAND  PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
nc      4826 root    3u  IPv4 5573653      0t0  TCP *:44444 (LISTEN)
nc      4834 root    3u  IPv4 5573818      0t0  TCP localhost:46596->localhost:44444 (ESTABLISHED)

Part III: Filling the Socket Backlog

Next, let’s fill the listening socket’s backlog (i.e., the queue that holds ESTABLISHED connections while they wait to be accepted by the netcat server). Specifically, I’d like to see what happens when a new client connection is attempted after the socket backlog is already full.

We know that this backlog’s maximum size is established during listen(2), and we know the backlog must be no larger than net.core.somaxconn, but we don’t know what value netcat uses when it calls listen(2).

strace to the rescue!

# run the netcat server command and capture traces of any calls
# to the listen() syscall. Put the results in /tmp/trace.out
❯ strace -e trace=listen -o /tmp/trace.out nc -l 127.0.0.1 -p 44444

❯ cat /tmp/trace.out
listen(3, 1)                            = 0
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
+++ exited with 1 +++

Great news! Netcat specifies its listen backlog with length 1, which means our above experiment should have filled the backlog.

Part IV: Attempting to Overflow the Backlog Queue

Now that the backlog is expected to be full, let’s see what happens when a second connection is attempted:

❯ nc 127.0.0.1 44444

Let’s look at the connections:

# show IPv4 connections, printing only those lines where the
# fourth column (local address) is the port owned by the nc
# server.
❯ netstat -4 | awk '$4 == "localhost:44444" { print $0 }'
tcp        0      0 localhost:44444         localhost:53938         ESTABLISHED
tcp        0      0 localhost:44444         localhost:53936         ESTABLISHED

Oh weird, the connection established just fine. That’s unexpected.

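This happens because Linux treats the socket backlog as full only when the number of queued connections exceeds the backlog value (the kernel’s check is a strict greater-than comparison rather than greater-than-or-equal). So even though the specified backlog value was 1, the queue actually admits 2 connections.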
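The off-by-one can be modeled in a few lines. This sketch assumes the kernel considers the accept queue full only when its length is strictly greater than the backlog, as in the kernel's sk_acceptq_is_full helper; the Python function names are made up for illustration and this is a model of the check, not kernel code.

```python
def accept_queue_is_full(queued: int, backlog: int) -> bool:
    """Model of the kernel check: full only when queued EXCEEDS backlog."""
    return queued > backlog


def admitted_connections(backlog: int, attempts: int) -> int:
    """Count how many of `attempts` connections reach the accept queue
    when the application never calls accept()."""
    queued = 0
    for _ in range(attempts):
        if accept_queue_is_full(queued, backlog):
            break  # this and later attempts stay in SYN_RECV
        queued += 1
    return queued


if __name__ == "__main__":
    # netcat calls listen(3, 1); three clients try to connect.
    print(admitted_connections(backlog=1, attempts=3))
```

Under this model a backlog of 1 admits two connections and leaves the third waiting, which matches the netstat output in this experiment.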

Part V: Actually Overflowing the Backlog Queue

Okay, now that the queue is actually full, a third connection attempt behaves differently:

❯ nc 127.0.0.1 44444

Immediately thereafter, netstat reveals the new connection is in SYN_RECV state, and hasn’t achieved ESTABLISHED state:

❯ netstat -4 | awk '$4 == "localhost:44444" { print $0 }'
tcp        0      0 localhost:44444         localhost:53952         SYN_RECV
tcp        0      0 localhost:44444         localhost:53938         ESTABLISHED
tcp        0      0 localhost:44444         localhost:53936         ESTABLISHED

In the packet capture, we also see the kernel retransmit the SYN-ACK several times, each one answered by another ACK from the client trying to complete the handshake:

22:59:23.203407 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [S], seq 3515778601, win 43690, options [mss 65495,sackOK,TS val 20018301 ecr 0,nop,wscale 6], length 0
22:59:23.203430 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20018301 ecr 20018301,nop,wscale 6], length 0
22:59:23.203444 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20018301 ecr 20018301], length 0
22:59:24.230237 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20019328 ecr 20018301,nop,wscale 6], length 0
22:59:24.230250 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20019328 ecr 20018301], length 0
22:59:26.278225 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20021376 ecr 20019328,nop,wscale 6], length 0
22:59:26.278239 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20021376 ecr 20018301], length 0
22:59:30.310225 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20025408 ecr 20021376,nop,wscale 6], length 0
22:59:30.310241 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20025408 ecr 20018301], length 0
22:59:38.438251 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20033536 ecr 20025408,nop,wscale 6], length 0
22:59:38.438266 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20033536 ecr 20018301], length 0
22:59:54.822223 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20049920 ecr 20033536,nop,wscale 6], length 0
22:59:54.822234 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20049920 ecr 20018301], length 0

Now that the socket backlog is full, the kernel knows it cannot create any more ESTABLISHED connections until the server accepts an existing connection and frees up space in the backlog. However, there’s still room in the SYN Backlog (where pre-ESTABLISHED connections can be queued up), so the kernel replies to the connection attempt with SYN-ACK as usual.

The client dutifully replies with ACK, expecting the connection to proceed as usual. However, the kernel simply drops the ACK upon receipt since it cannot move the connection to ESTABLISHED.

At this point, the connection is “half-open”, since the client sees the connection as ESTABLISHED but the server still holds it in SYN_RECV state. As a way of notifying the client of this half-open status, and in hopes that the socket backlog will soon have some space for a new connection, the kernel retries SYN-ACK a few times to trigger additional ACKs from the client.

If, in the meantime, some space opened up in the socket backlog, the connection would become fully open and we’d be in business. The number of SYN-ACK retries is controlled by the net.ipv4.tcp_synack_retries kernel parameter.
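In the capture above, the retransmitted SYN-ACKs arrive roughly 1, 3, 7, 15 and 31 seconds after the first one, which is consistent with a timeout that starts at about one second and doubles on every retry. The sketch below reproduces that schedule. The one-second initial timeout and the value of 5 retries are assumptions inferred from this capture, and synack_retransmit_offsets is a hypothetical name.

```python
from typing import List


def synack_retransmit_offsets(retries: int = 5,
                              initial_timeout: float = 1.0) -> List[float]:
    """Seconds after the first SYN-ACK at which each retransmission is sent,
    assuming the timeout doubles after every retry."""
    offsets = []
    elapsed = 0.0
    timeout = initial_timeout
    for _ in range(retries):
        elapsed += timeout
        offsets.append(elapsed)
        timeout *= 2  # exponential backoff
    return offsets


if __name__ == "__main__":
    for n, offset in enumerate(synack_retransmit_offsets(), start=1):
        print(f"retry {n}: +{offset:.0f}s")
```

Comparing the computed offsets with the tcpdump timestamps is a quick way to check whether a host uses a different tcp_synack_retries value.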

Half Open

To wrap things up, let’s look at the connection status from the server’s perspective:

# After a few unsuccessful minutes, the SYN_RECV connection is closed
# and removed from the server's perspective
❯ netstat -4 | awk '$4 == "localhost:44444" { print $0 }'
tcp        0      0 localhost:44444         localhost:53938         ESTABLISHED
tcp        0      0 localhost:44444         localhost:53936         ESTABLISHED

In contrast, the client sees an ESTABLISHED connection, since it successfully sent ACKs to complete the 3-way handshake. This mixed state is called “half-open”, and will end up getting reset as soon as the client tries to send anything to the server:

❯ netstat -4 | awk '$5 == "localhost:44444" { print $0 }'
tcp        0      0 localhost:53952         localhost:44444         ESTABLISHED
tcp        0      0 localhost:53938         localhost:44444         ESTABLISHED
tcp        0      0 localhost:53936         localhost:44444         ESTABLISHED

This connection was able to reach the half-open state because there was sufficient space in the SYN Backlog, where connections sit on the server while they’re in the process of being established. If the SYN Backlog also fills up, new connection attempts will not receive a SYN-ACK at all.
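Outside of a lab setup, one way to tell whether a server is hitting these limits is to look at the kernel's ListenOverflows and ListenDrops counters, which Linux exposes in /proc/net/netstat as a header line of names followed by a line of values. The parser below is a rough sketch under that assumption; the function names are hypothetical and the SAMPLE text is made-up data used only to show the file layout.

```python
def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into {'Prefix.Name': value}."""
    counters = {}
    lines = text.splitlines()
    # Lines come in pairs: "TcpExt: Name1 Name2 ..." then "TcpExt: 1 2 ..."
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def read_listen_counters(path: str = "/proc/net/netstat") -> dict:
    """Return the listen-queue counters, or {} if the file is unavailable."""
    wanted = ("TcpExt.ListenOverflows", "TcpExt.ListenDrops")
    try:
        with open(path) as f:
            counters = parse_netstat(f.read())
    except OSError:
        return {}
    return {key: counters[key] for key in wanted if key in counters}


# Made-up sample showing the layout; real files have many more fields.
SAMPLE = "TcpExt: SyncookiesSent ListenOverflows ListenDrops\nTcpExt: 0 3 3\n"

if __name__ == "__main__":
    print("sample:", parse_netstat(SAMPLE))
    print("this host:", read_listen_counters())
```

If these counters keep climbing on a production host, connections are likely being held back the same way the third netcat client was here.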