TCP connection backlog - a struggling server
Tue, Mar 28, 2017

Servers can’t accept incoming connections at an infinite rate, so let’s explore what happens when you try to establish too many connections to an overwhelmed server in a Linux environment.
Connection Establishment
A new connection is initiated when the server’s host receives a SYN packet from a client - this is the first part of the TCP 3-way handshake. Upon receipt of the SYN, the connection is initialized in SYN_RECV state on the server side.
Under normal conditions, the server side then responds with a SYN-ACK packet to complete the second part of the handshake. At this point, the connection is still in SYN_RECV state on the server side. Finally, when the server side receives an ACK in response to the SYN(-ACK) it just sent, the connection is transitioned to ESTABLISHED state and can be considered ready for use.
Kernel Handoff
When the server listens on a TCP address, it relies on the kernel to get new inbound connections to the ESTABLISHED state before the application can accept them. This means the entire 3-way handshake occurs before the application even becomes aware the connection is available. The handoff happens when the application calls the accept(2) system call, which returns a file descriptor for a new socket that the application can use to read data from and write data to the TCP stream.
Note that accept(2) only yields new connected sockets for connections that have already gone through the SYN_RECV –> ESTABLISHED transition.
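To make the handoff concrete, here’s a minimal sketch of the server side in C (error handling trimmed; the port and backlog value are arbitrary choices for illustration). The kernel completes the 3-way handshake on its own; accept(2) simply dequeues a connection that is already ESTABLISHED and hands back a file descriptor for it.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(44444);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));

    /* The second argument is the socket backlog: how many ESTABLISHED,
     * not-yet-accepted connections the kernel will queue on our behalf. */
    listen(lfd, 16);

    for (;;) {
        /* Blocks until the kernel has a fully ESTABLISHED connection
         * queued, then returns a new fd for that TCP stream. */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0)
            continue;
        write(cfd, "hello\n", 6);
        close(cfd);
    }
}

Everything interesting in this post happens between listen(2) and accept(2): connections that the kernel has finished establishing but the application hasn’t picked up yet.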
The SYN Backlog and Socket Backlog Queues
The important bit here is that connections get queued up, waiting for the application to accept(2) them. The queue that holds established connections is (of course) of finite length. In fact, there are two separate queues:
- The “SYN Backlog” queue, which holds connections in SYN_RECV state. These inbound connections have been responded to by the kernel with a SYN-ACK packet, but the kernel hasn’t yet heard the final ACK back from the active opener. Such connections cannot yet be accepted by the application. This queue is specific to each listening socket, but its maximum length is defined by the global setting net.ipv4.tcp_max_syn_backlog.
- The listening socket’s “Backlog”. This queue holds connections that have transitioned to ESTABLISHED state before they’ve been accepted by the application. Connections are transferred to this queue from the SYN backlog upon receipt of the final ACK from the active opener. The length of this queue is set by the backlog argument to listen(2), but is silently capped at the net.core.somaxconn kernel parameter. (Both limits are easy to check, as in the sketch below.)
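Both limits are just kernel parameters, so they can be read with sysctl(8) or straight out of /proc. Here’s a tiny C sketch that prints them; the values will naturally differ from machine to machine:

#include <stdio.h>

/* Print the current value stored in a /proc/sys entry. */
static void show(const char *path) {
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f && fgets(buf, sizeof(buf), f))
        printf("%s = %s", path, buf);
    if (f)
        fclose(f);
}

int main(void) {
    /* Global cap on the per-socket SYN backlog queues. */
    show("/proc/sys/net/ipv4/tcp_max_syn_backlog");
    /* Upper bound on the backlog argument to listen(2). */
    show("/proc/sys/net/core/somaxconn");
    return 0;
}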
The Queues in Action
Now imagine that many clients are attempting to connect to a server, and there’s an influx of SYNs at a rate greater than the application can accept them. We can create this scenario rather easily to see what happens.
In short, inbound connections will enter ESTABLISHED state so long as there’s space in the socket backlog. Once the backlog fills up, the server will begin ignoring the client’s ACK in hopes of deferring the connection setup until a subsequent retry.
Interestingly, the server will actually send multiple SYN-ACK packets back to the client to nudge the connection along, in hopes that an ACK will come back at a point when there’s space in the socket backlog. At least, this is the behavior with Linux 4.9.16.
Part I: Creating an Unresponsive Server
netcat(1) provides everything we need to simulate an overloaded TCP server. We’ll start up a netcat server, then send it a SIGSTOP signal to effectively keep the process off the CPU and prevent it from accepting new connections.
# start netcat listening on 127.0.0.1:44444
❯ nc -l 127.0.0.1 -p 44444
# SIGSTOP the new server so it cannot accept new inbound connections
❯ ps ux | grep nc
root 4826 0.0 0.3 6328 1680 pts/0 S+ 05:19 0:00 nc -l 127.0.0.1 -p 44444
❯ kill -SIGSTOP 4826
[1]+ Stopped nc -l 127.0.0.1 -p 44444
# note that nc is now in "T" (stopped) state
❯ ps ux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4826 0.0 0.3 6328 1680 pts/0 T 05:19 0:00 nc -l 127.0.0.1 -p 44444
...
Part II: Creating a Connection to the Paused Server
Next, we can again use netcat, this time in client mode, to connect to the (paused) server:
❯ nc 127.0.0.1 44444
If we use tcpdump to capture packets to/from port 44444 during the previous command, we can see the three-way handshake ([S], [S.], [.]) take place even though the server application itself is doing nothing.
❯ tcpdump -i lo -n port 44444
...
05:20:29.411715 IP 127.0.0.1.46596 > 127.0.0.1.44444: Flags [S], seq 4143248557, win 43690, options [mss 65495,sackOK,TS val 1193633764 ecr 0,nop,wscale 6], length 0
05:20:29.411726 IP 127.0.0.1.44444 > 127.0.0.1.46596: Flags [S.], seq 2061299443, ack 4143248558, win 43690, options [mss 65495,sackOK,TS val 1193633764 ecr 1193633764,nop,wscale 6], length 0
05:20:29.411739 IP 127.0.0.1.46596 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 1193633764 ecr 1193633764], length 0
We can also see that the new connection between our client and port 44444 is fully ESTABLISHED.
# show file descriptors belonging to network sockets for port 44444
# we see the listening socket and the new connection
❯ lsof -i :44444
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nc 4826 root 3u IPv4 5573653 0t0 TCP *:44444 (LISTEN)
nc 4834 root 3u IPv4 5573818 0t0 TCP localhost:46596->localhost:44444 (ESTABLISHED)
Part III: Filling the Socket Backlog
Next, let’s fill the listening socket’s backlog (i.e., the queue that holds ESTABLISHED connections while they wait to be accepted by the netcat server). Specifically, I’d like to see what happens when a new client connection is attempted after the socket backlog is already full.
We know that this backlog’s maximum size is established during listen(2), and we know the backlog must be no larger than net.core.somaxconn, but we don’t know what value netcat uses when it calls listen(2).
strace to the rescue!
# run the netcat server command and capture traces of any calls
# to the listen() syscall. Put the results in /tmp/trace.out
❯ strace -e trace=listen -o /tmp/trace.out nc -l 127.0.0.1 -p 44444
❯ cat /tmp/trace.out
listen(3, 1) = 0
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
+++ exited with 1 +++
Great news! Netcat calls listen(2) with a backlog of 1, which means our experiment above should have filled the backlog.
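As an aside: if you’d rather not depend on netcat’s choice of backlog, a small C program can reproduce the same setup directly. The sketch below mirrors the server from earlier, but asks listen(2) for a backlog of exactly 1 and then deliberately never calls accept(2). The port matches the one used throughout; everything else is an arbitrary choice for illustration.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(44444);

    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    /* Ask for a socket backlog of exactly 1, just like netcat does. */
    listen(lfd, 1);

    /* Never call accept(2): established connections pile up in the
     * socket backlog until the kernel starts holding new ones back. */
    for (;;)
        pause();
}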
Part IV: Attempting to Overflow the Backlog Queue
Now that the backlog is expected to be full, let’s see what happens when a second connection is attempted:
❯ nc 127.0.0.1 44444
Let’s look at the connections:
# show IPV4 connections, printing only those lines where the
# fourth column (local address) is the port owned by the nc
# server.
❯ netstat -4 | awk '$4 == "localhost:44444" { print $0 }'
tcp 0 0 localhost:44444 localhost:53938 ESTABLISHED
tcp 0 0 localhost:44444 localhost:53936 ESTABLISHED
Oh weird, the connection established just fine. That’s unexpected.
I suspect this is because Linux actually allows one more connection than the specified backlog - the kernel’s queue-full check uses a strict greater-than comparison - so even though the specified backlog value was 1, two connections fit in the queue. I haven’t confirmed this against the kernel source yet.
Part V: Actually Overflowing the Backlog Queue
Okay, now that the queue is actually full, a third connection attempt behaves differently:
❯ nc 127.0.0.1 44444
Immediately thereafter, netstat reveals that the new connection is sitting in SYN_RECV state and hasn’t reached ESTABLISHED:
❯ netstat -4 | awk '$4 == "localhost:44444" { print $0 }'
tcp 0 0 localhost:44444 localhost:53952 SYN_RECV
tcp 0 0 localhost:44444 localhost:53938 ESTABLISHED
tcp 0 0 localhost:44444 localhost:53936 ESTABLISHED
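netstat and awk do the job here, but if you want to watch the server-side queues programmatically, the same information is exposed in /proc/net/tcp, where ports appear in hex (44444 is 0xAD9C) and the state column is a hex code: 01 for ESTABLISHED, 03 for SYN_RECV, 0A for LISTEN. A rough sketch (IPv4 only) that counts the server-side connections by state:

#include <stdio.h>

/* Count connections whose local port is 44444, by TCP state,
 * straight from /proc/net/tcp. */
int main(void) {
    FILE *f = fopen("/proc/net/tcp", "r");
    if (!f) {
        perror("/proc/net/tcp");
        return 1;
    }

    char line[512];
    int established = 0, syn_recv = 0;

    fgets(line, sizeof(line), f); /* skip the header row */
    while (fgets(line, sizeof(line), f)) {
        unsigned int lport = 0, rport = 0, state = 0;
        char laddr[16], raddr[16];
        if (sscanf(line, " %*d: %15[0-9A-Fa-f]:%x %15[0-9A-Fa-f]:%x %x",
                   laddr, &lport, raddr, &rport, &state) != 5)
            continue;
        if (lport != 44444)
            continue;
        if (state == 0x01)
            established++;
        else if (state == 0x03)
            syn_recv++;
    }
    fclose(f);

    printf("ESTABLISHED: %d\nSYN_RECV: %d\n", established, syn_recv);
    return 0;
}

At this point in the experiment it should report two ESTABLISHED connections and one in SYN_RECV, matching the netstat output above.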
We also see a series of SYN-ACKs back to the client, each followed by an ACK from the client to complete the handshake:
22:59:23.203407 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [S], seq 3515778601, win 43690, options [mss 65495,sackOK,TS val 20018301 ecr 0,nop,wscale 6], length 0
22:59:23.203430 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20018301 ecr 20018301,nop,wscale 6], length 0
22:59:23.203444 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20018301 ecr 20018301], length 0
22:59:24.230237 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20019328 ecr 20018301,nop,wscale 6], length 0
22:59:24.230250 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20019328 ecr 20018301], length 0
22:59:26.278225 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20021376 ecr 20019328,nop,wscale 6], length 0
22:59:26.278239 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20021376 ecr 20018301], length 0
22:59:30.310225 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20025408 ecr 20021376,nop,wscale 6], length 0
22:59:30.310241 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20025408 ecr 20018301], length 0
22:59:38.438251 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20033536 ecr 20025408,nop,wscale 6], length 0
22:59:38.438266 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20033536 ecr 20018301], length 0
22:59:54.822223 IP 127.0.0.1.44444 > 127.0.0.1.53952: Flags [S.], seq 2326745480, ack 3515778602, win 43690, options [mss 65495,sackOK,TS val 20049920 ecr 20033536,nop,wscale 6], length 0
22:59:54.822234 IP 127.0.0.1.53952 > 127.0.0.1.44444: Flags [.], ack 1, win 683, options [nop,nop,TS val 20049920 ecr 20018301], length 0
Now that the socket backlog is full, the kernel knows it cannot create any more ESTABLISHED connections until the server accepts an existing connection and frees up space in the backlog. However, there’s still room in the SYN backlog (where pre-ESTABLISHED connections can be queued up), so the kernel replies to the connection attempt with a SYN-ACK as usual.
The client dutifully replies with an ACK, expecting the connection to proceed as usual. However, the kernel simply drops the ACK upon receipt, since it cannot move the connection to ESTABLISHED.
At this point, the connection is “half-open”: the client sees the connection as ESTABLISHED, but the server still holds it in SYN_RECV state. Because the handshake never completed from the server’s perspective, and in hopes that the socket backlog will soon have space for the connection, the kernel retransmits the SYN-ACK a few times to trigger additional ACKs from the client - you can see these retransmissions backing off exponentially in the capture above, roughly 1, 2, 4, 8, and 16 seconds apart.
If, in the meantime, some space opens up in the socket backlog, the connection becomes fully open and we’re in business. The number of retransmissions is controlled by the net.ipv4.tcp_synack_retries kernel parameter.
Half-Open
To wrap things up, let’s look at the connection status from the server’s perspective:
# After a few unsuccessful minutes, the SYN_RECV connection is closed
# and removed from the server's perspective
❯ netstat -4 | awk '$4 == "localhost:44444" { print $0 }'
tcp 0 0 localhost:44444 localhost:53938 ESTABLISHED
tcp 0 0 localhost:44444 localhost:53936 ESTABLISHED
In contrast, the client sees an ESTABLISHED connection, since it successfully sent ACKs to complete the 3-way handshake. This mixed state is called “half-open”, and the connection will end up getting reset as soon as the client tries to send anything to the server:
❯ netstat -4 | awk '$5 == "localhost:44444" { print $0 }'
tcp 0 0 localhost:53952 localhost:44444 ESTABLISHED
tcp 0 0 localhost:53938 localhost:44444 ESTABLISHED
tcp 0 0 localhost:53936 localhost:44444 ESTABLISHED
This connection was allowed to reach the half-open state because there was sufficient space in the SYN backlog, where connections sit on the server side while they’re in the process of being established. If the SYN backlog also fills up, new connection attempts won’t even receive a SYN-ACK at all.