linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Wong <e@80x24.org>
To: Roman Penyaev <rpenyaev@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention
Date: Thu, 6 Dec 2018 20:35:52 +0000	[thread overview]
Message-ID: <20181206203552.GA20162@dcvr> (raw)
In-Reply-To: <39192b9caf1114c95cd23e786a9c3e60@suse.de>

Roman Penyaev <rpenyaev@suse.de> wrote:
> On 2018-12-06 00:46, Eric Wong wrote:
> > Roman Penyaev <rpenyaev@suse.de> wrote:
> > > Hi all,
> > > 
> > > The goal of this patch is to reduce contention of ep_poll_callback()
> > > which
> > > can be called concurrently from different CPUs in case of high events
> > > rates and many fds per epoll.  Problem can be very well reproduced by
> > > generating events (write to pipe or eventfd) from many threads, while
> > > consumer thread does polling.  In other words this patch increases the
> > > bandwidth of events which can be delivered from sources to the
> > > poller by
> > > adding poll items in a lockless way to the list.
> > 
> > Hi Roman,
> > 
> > I also tried to solve this problem many years ago with help of
> > the well-tested-in-userspace wfcqueue from Mathieu's URCU.
> > 
> > I was also looking to solve contention with parallel epoll_wait
> > callers with this.  AFAIK, it worked well; but needed the
> > userspace tests from wfcqueue ported over to the kernel and more
> > review.
> > 
> > I didn't have enough computing power to show the real-world
> > benefits or funding to continue:
> > 
> > 	https://lore.kernel.org/lkml/?q=wfcqueue+d:..20130501
> 
> Hi Eric,
> 
> Nice work.  That was a huge change by itself and by dependency
> on wfcqueue.  I could not find any valuable discussion on this,
> what was the reaction of the community?

Hi Roman, AFAIK there wasn't much reaction.  Mathieu was VERY
helpful with wfcqueue but there wasn't much else.  Honestly, I'm
surprised wfcqueue hasn't made it into more places; I love it :)

(More recently, I started an effort to get glibc malloc to use wfcqueue:
https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/ )

> > It might not be too much trouble for you to brush up the wait-free
> > patches and test them against the rwlock implementation.
> 
> Ha :)  I may try to cherry-pick these patches, let's see how many
> conflicts I have to resolve, eventpoll.c has been changed a lot
> since that (6 years passed, right?)

AFAIK not, epoll remains a queue with a key-value mapping.
I'm not a regular/experienced kernel hacker and I had no trouble
understanding eventpoll.c years ago.

> But reading your work description I can assume that epoll_wait() calls
> should be faster, because they do not content with ep_poll_callback(),
> and I did not try to solve this, only contention between producers,
> which make my change tiny.

Yes, I recall that was it.  My real-world programs[1], even without
slow HDD access, didn't show it, though.

> I also found your https://yhbt.net/eponeshotmt.c , where you count
> number of bare epoll_wait() calls, which IMO is not correct, because
> we need to count how many events are delivered, but not how fast
> you've returned from epoll_wait().  But as I said no doubts that
> getting rid of contention between consumer and producers will show
> even better results.

"epoll_wait calls" == "events delivered" in my case since I (ab)use
epoll_wait with max_events=1 as a work-distribution mechanism
between threads.  Not a common use-case, I admit.

My design was terrible from a syscall overhead POV, but my
bottleneck for real-world use for cmogstored[1] was dozens of
rotational HDDs in JBOD configuration; so I favored elimination
of head-of-line blocking over throughput of epoll itself.

My motivation for hacking on epoll back then was only to look
better on synthetic benchmarks that didn't hit slow HDDs :)



[1] git clone https://bogomips.org/cmogstored.git/
    the Ragel-generated HTTP parser was also a bottleneck in
    synthetic benchmarks, as we

      reply	other threads:[~2018-12-06 20:35 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-03 11:02 [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
2018-12-03 17:34 ` Linus Torvalds
2018-12-04 11:50   ` Roman Penyaev
2018-12-04 23:59     ` Andrea Parri
2018-12-05 11:25       ` Roman Penyaev
2018-12-04 17:23 ` Jason Baron
2018-12-04 19:02   ` Paul E. McKenney
2018-12-05 11:22     ` Roman Penyaev
2018-12-05 11:16   ` Roman Penyaev
2018-12-05 16:38     ` Jason Baron
2018-12-05 20:11       ` Roman Penyaev
2018-12-06  1:54   ` Davidlohr Bueso
2018-12-06  3:08   ` Davidlohr Bueso
2018-12-06 10:27     ` Roman Penyaev
2018-12-06  4:04   ` Davidlohr Bueso
2018-12-06 10:25     ` Roman Penyaev
2018-12-05 23:46 ` Eric Wong
2018-12-06 10:52   ` Roman Penyaev
2018-12-06 20:35     ` Eric Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181206203552.GA20162@dcvr \
    --to=e@80x24.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rpenyaev@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).