From: Marek Majkowski <marek@cloudflare.com>
To: linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	jbaron@akamai.com
Subject: Resurrecting EPOLLROUNDROBIN
Date: Mon, 25 Mar 2019 12:38:24 +0100
Message-ID: <CAJPywTLRxP4P6J8c4pzpwtZ1NhYwiRQ_P1dbCX00UYrBK7hg2Q@mail.gmail.com>

Hi,

Recently we noticed that epoll does not help with load balancing when
used on a listening TCP socket. I described this in a blog post:

https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/

The short explanation: new connections arriving on a listening socket
are not evenly distributed across the processes waiting for EPOLLIN on
it. In practice, the process that called epoll_wait() most recently
gets the new connection. Here is a trivial program to reproduce this:

https://github.com/cloudflare/cloudflare-blog/blob/master/2017-10-accept-balancing/epoll-and-accept.py

   $ ./epoll-and-accept.py &
   $ for i in `seq 6`; do echo | nc localhost 1024; done
   worker 0
   worker 0
   worker 0
   worker 0
   worker 0
   worker 0

Worker #0 did all the accept() calls. This is because the listen
socket's wait queue is LIFO (not FIFO!). With the current behaviour,
the process that called epoll_wait() most recently is woken up first.
This is usually the busiest process, which leads to uneven load
distribution across worker processes.
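
To illustrate the wake-up order described above, here is a toy model in
Python. It is only a sketch, not kernel code: the function name
simulate() and the deque standing in for the wait queue are made up for
this example, and it assumes each woken worker finishes its accept()
and goes back to waiting before the next connection arrives.

```python
from collections import Counter, deque


def simulate(num_workers, num_connections, lifo):
    """Toy model: each connection wakes the waiter at the head of the queue.

    The woken worker handles the connection and immediately waits again.
    With lifo=True it re-enters at the head (most recent waiter is woken
    first); with lifo=False it re-enters at the tail (round-robin).
    """
    waiters = deque(range(num_workers))  # worker 0 starts at the head
    handled = Counter()
    for _ in range(num_connections):
        worker = waiters.popleft()       # wake the head of the queue
        handled[worker] += 1
        if lifo:
            waiters.appendleft(worker)   # back to the head
        else:
            waiters.append(worker)       # back to the tail
    return [handled[w] for w in range(num_workers)]


if __name__ == "__main__":
    print(simulate(4, 6, lifo=True))   # [6, 0, 0, 0]
    print(simulate(4, 6, lifo=False))  # [2, 2, 1, 1]
```

In the LIFO case a single worker handles every connection, which
matches the "worker 0" output above; in the FIFO case the same six
connections are spread across all four workers.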

Note that the described problem is different from what EPOLLEXCLUSIVE
tries to solve. The exclusive flag is about waking up exactly one
process, as opposed to the default behaviour of waking up all the
subscribers (the thundering herd problem). Without EPOLLEXCLUSIVE the
described load-balancing problem is less prominent, since there is an
inherent race when all the woken-up processes fight for the new
connection. In that case the other workers have some chance of getting
the new connection. The core problem is still there: accept() calls
are not well balanced across waiting processes.

On a loaded server, avoiding EPOLLEXCLUSIVE is wasteful. With a high
number of new connections and dozens of worker processes, waking up
everybody on every new connection is suboptimal.

Note that multiple threads doing blocking accept() get proper FIFO
behaviour. In other words: you can achieve round-robin load balancing
by having multiple workers block in accept(), but you can't get that
behaviour when waiting in epoll_wait().
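
A minimal sketch of the blocking accept() variant, using threads in
Python. The function name run_blocking_accept() and its parameters are
made up for this example. The sketch only counts which worker accepted
each connection; on Linux one would expect the counts to come out
roughly even, but the code itself does not guarantee any particular
distribution.

```python
import socket
import threading
import time
from collections import Counter


def run_blocking_accept(num_workers=3, num_connections=6):
    """Start workers blocked in accept() and count connections per worker."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick one
    listener.listen(128)
    port = listener.getsockname()[1]

    handled = Counter()
    lock = threading.Lock()
    accepted = threading.Semaphore(0)    # released once per accepted connection

    def worker(idx):
        while True:
            conn, _addr = listener.accept()   # blocks until a connection arrives
            conn.close()
            with lock:
                handled[idx] += 1
            accepted.release()

    # Daemon threads stay blocked in accept() and die with the process.
    for i in range(num_workers):
        threading.Thread(target=worker, args=(i,), daemon=True).start()
    time.sleep(0.1)                      # give the workers time to block

    for _ in range(num_connections):
        with socket.create_connection(("127.0.0.1", port)):
            pass
        accepted.acquire(timeout=5)      # wait for a worker to accept it
    with lock:
        return dict(handled)


if __name__ == "__main__":
    print(run_blocking_accept())
```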

We are using EPOLLEXCLUSIVE, and as a solution to the load-balancing
problem we backported the EPOLLROUNDROBIN patch submitted by Jason
Baron in 2015. We have been running this patch for the last 6 months,
and it has helped us flatten the load across workers (and reduce tail
latency).

https://lists.openwall.net/linux-kernel/2015/02/17/723

(PS. generally speaking EPOLLROUNDROBIN makes no sense in conjunction
with SO_REUSEPORT sockets)

Jason, would you mind resubmitting it?

Cheers,
    Marek
