From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>,
Eric Dumazet <edumazet@google.com>,
netdev <netdev@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Jiri Pirko <jiri@mellanox.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@plumgrid.com>,
Alexander Duyck <aduyck@mirantis.com>,
Tom Herbert <tom@herbertland.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop
Date: Wed, 11 May 2016 16:38:36 +0200
Message-ID: <1462977516.4444.73.camel@redhat.com>
In-Reply-To: <1462972112.23934.139.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, 2016-05-11 at 06:08 -0700, Eric Dumazet wrote:
> On Wed, 2016-05-11 at 11:48 +0200, Paolo Abeni wrote:
> > Hi Eric,
> > On Tue, 2016-05-10 at 15:51 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-05-11 at 00:32 +0200, Hannes Frederic Sowa wrote:
> > >
> > > > Not only did we want to present this solely as a bugfix but also as a
> > > > performance enhancement in the case of virtio (as you can see in the
> > > > cover letter). Given that a long time ago there was a tendency to
> > > > remove softirqs completely, we thought it very interesting that a
> > > > threaded napi in general seems to be absolutely viable nowadays and
> > > > might offer new features.
> > >
> > > Well, you did not fix the bug; you worked around it by adding yet another
> > > layer, with another sysctl that admins or programs have to manage.
> > >
> > > If you have a special need for virtio, do not hide it behind a 'bug fix'
> > > but add it as a features request.
> > >
> > > This ksoftirqd issue is real and a fix looks very reasonable.
> > >
> > > Please try this patch, as I had very good success with it.
> >
> > Thank you for your time and your effort.
> >
> > I tested your patch on the bare metal "single core" scenario, disabling
> > the unneeded cores with:
> > CPUS=`nproc`
> > for I in `seq 1 $((CPUS - 1))`; do echo 0 > /sys/devices/system/node/node0/cpu$I/online; done
> >
> > And I got a:
> >
> > [ 86.925249] Broke affinity for irq <num>
> >
>
> Was it fatal, or simply a warning that you are removing the cpu that was
> the only allowed cpu in an affinity_mask ?
The above message is emitted with pr_notice() by the x86 version of
fixup_irqs(). It's not fatal: the host is alive and well after that. The
un-patched kernel does not emit it when CPUs are disabled.
I'll try to look into this later.
> Looks like another bug to fix then ? We disabled CPU hotplug here at Google
> for our production, as it was notoriously buggy. No time to fix dozens
> of issues added by a crowd of developers that do not even know a cpu can
> be unplugged.
>
> Maybe some caller of local_bh_disable()/local_bh_enable() expected that
> current softirq would be processed. Obviously flaky even before the
> patches.
>
> > for each irq number generated by a network device.
> >
> > In this scenario, your patch solves the ksoftirqd issue, performing
> > comparably to the napi threaded patches (with a negative delta in the
> > noise range) and introducing a minor regression with a single flow, in
> > the noise range (3%).
> >
> > As said in a previous mail, we actually experimented something similar,
> > but it felt quite hackish.
>
> Right, we are networking guys, and we feel that messing with such core
> infra is not for us. So we feel comfortable adding a pure networking
> patch.
>
> >
> > AFAICS this patch adds three more tests in the fast path and affects all
> > other softirq use cases. I'm not sure how to check for regressions there.
>
> It is obvious to me that the ksoftirqd mechanism is not working as intended.
>
> Fixing it might uncover bugs from parts of the kernel relying on the
> bug, indirectly or directly. Is it a good thing ?
>
> I can not tell before trying.
>
> Just by looking at /proc/{ksoftirqd_pid}/sched you can see the problem,
> as we normally schedule ksoftirqd under stress but, most of the time,
> the softirq items were processed by other tasks, as you found out.
>
>
> >
> > The napi thread patches are actually a new feature, that also fixes the
> > ksoftirqd issue: hunting the ksoftirqd issue was the initial
> > trigger for this work. I'm sorry for not being clear enough in the cover
> > letter.
> >
> > The napi thread patches offer additional benefits, i.e. a further
> > relevant gain in the described test scenario, and they do not impact
> > other subsystems/kernel entities.
> >
> > I still think they are worthwhile, and I bet you disagree, but could
> > you please articulate more which parts concern you most and/or are more
> > bloated ?
>
> Just look at the added code. napi_threaded_poll() is very buggy, but
> honestly I do not want to fix the bugs you added there. If you have only
> one vcpu, how can jiffies ever change since you block BH ?
Uh, we likely have the same issue in the net_rx_action() function, which
also executes with BH disabled and checks for jiffies changes even on
single-core hosts ?!?
Aren't jiffies updated by the timer interrupt, and thus advancing even
with BH disabled ?!?
> I was planning to remove cond_resched_softirq(), which we no longer use
> after my recent changes to the TCP stack,
> and you call it again (while it is obviously buggy, since it does not
> check if a BH is pending, only if a thread needs the cpu).
I missed that, thank you for pointing out.
> I prefer fixing the existing code, really. It took us years to
> understand it and maybe fix it.
>
> Just think of what will happen if you have 10 devices (10 new threads in
> your model) and one cpu.
>
> Instead of the nice existing netif_rx() doing 64 packets per device
> rounds, you'll now rely on process scheduler behavior that has no such
> granularity.
>
> Adding more threads is the natural answer of userland programmers, but
> in the kernel it is not the right answer. We already have mechanism,
> just use them and fix them if they are broken.
>
> Sorry, I really do not think your patches are the way to go.
> But this thread is definitely interesting.
Oh, this is a far better comment than I would have expected ;-)
Cheers,
Paolo