From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Eric Dumazet <edumazet@google.com>,
	netdev <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jiri Pirko <jiri@mellanox.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Alexander Duyck <aduyck@mirantis.com>,
	Tom Herbert <tom@herbertland.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop
Date: Wed, 11 May 2016 16:38:36 +0200
Message-ID: <1462977516.4444.73.camel@redhat.com>
In-Reply-To: <1462972112.23934.139.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 2016-05-11 at 06:08 -0700, Eric Dumazet wrote:
> On Wed, 2016-05-11 at 11:48 +0200, Paolo Abeni wrote:
> > Hi Eric,
> > On Tue, 2016-05-10 at 15:51 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-05-11 at 00:32 +0200, Hannes Frederic Sowa wrote:
> > > 
> > > > We wanted to present this not only as a bugfix but also as a
> > > > performance enhancement in the case of virtio (as you can see in the
> > > > cover letter). Given that a long time ago there was a tendency to
> > > > remove softirqs completely, we thought it might be very interesting
> > > > that a threaded napi in general seems to be absolutely viable
> > > > nowadays and might offer new features.
> > > 
> > > Well, you did not fix the bug; you worked around it by adding yet
> > > another layer, with another sysctl that admins or programs have to
> > > manage.
> > > 
> > > If you have a special need for virtio, do not hide it behind a 'bug
> > > fix' but add it as a feature request.
> > > 
> > > This ksoftirqd issue is real and a fix looks very reasonable.
> > > 
> > > Please try this patch, as I had very good success with it.
> > 
> > Thank you for your time and your effort.
> > 
> > I tested your patch in the bare metal "single core" scenario, disabling
> > the unneeded cores (cpu1 .. cpuN-1, keeping cpu0 online) with:
> > CPUS=`nproc`
> > for I in `seq 1 $((CPUS - 1))`; do echo 0 > /sys/devices/system/node/node0/cpu$I/online; done
> > 
> > And I got a:
> > 
> > [   86.925249] Broke affinity for irq <num>
> > 
> 
> Was it fatal, or simply a warning that you are removing the cpu that was
> the only allowed cpu in an affinity_mask ?

The above message is emitted with pr_notice() by the x86 version of
fixup_irqs(). It's not fatal; the host is alive and well after that. The
unpatched kernel does not emit it when the cpus are disabled.

I'll try to look into this later.
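
FWIW, the affinity rewrite can also be observed directly from userspace
while offlining the cpus; something like the following should show it
(the irq number and the "eth" interface name below are placeholders for
whatever the log above reports on a given setup):

  grep eth /proc/interrupts                     # map NIC queues to irq numbers
  IRQ=30                                        # placeholder, use the irq from the log
  cat /proc/irq/$IRQ/smp_affinity               # allowed cpu mask before offlining
  echo 0 > /sys/devices/system/cpu/cpu1/online  # take one cpu down
  cat /proc/irq/$IRQ/smp_affinity               # mask after fixup_irqs() rewrote it
  dmesg | grep 'Broke affinity'                 # the pr_notice() above, if it fired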

> Looks like another bug to fix then ? We disabled CPU hotplug here at
> Google for our production, as it was notoriously buggy. No time to fix
> dozens of issues added by a crowd of developers who do not even know a
> cpu can be unplugged.
> 
> Maybe some caller of local_bh_disable()/local_bh_enable() expected that
> the current softirq would be processed. Obviously flaky even before the
> patches.
> 
> > for each irq number generated by a network device.
> > 
> > In this scenario, your patch solves the ksoftirqd issue, performing
> > comparably to the napi threaded patches (with a negative delta in the
> > noise range) and introducing a minor regression with a single flow,
> > also in the noise range (3%).
> > 
> > As said in a previous mail, we actually experimented with something
> > similar, but it felt quite hackish.
> 
> Right, we are networking guys, and we feel that messing with such core
> infra is not for us. So we feel comfortable adding a pure networking
> patch.
> 
> > 
> > AFAICS this patch adds three more tests in the fast path and affects
> > all other softirq use cases. I'm not sure how to check for regressions
> > there.
> 
> It is obvious to me that the ksoftirqd mechanism is not working as
> intended.
> 
> Fixing it might uncover bugs in parts of the kernel relying on the bug,
> indirectly or directly. Is it a good thing ?
> 
> I can not tell before trying.
> 
> Just by looking at /proc/{ksoftirqd_pid}/sched you can see the problem:
> we normally schedule ksoftirqd under stress, but most of the time the
> softirq items were processed by other tasks, as you found out.
> 
> 
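
For completeness, this is easy to check from userspace too; on a
single-core host something like the following should show how little
the ksoftirqd thread actually runs (assuming CONFIG_SCHED_DEBUG, so
that /proc/<pid>/sched is populated):

  PID=`pgrep -x 'ksoftirqd/0'`             # per-cpu ksoftirqd kernel thread
  grep nr_switches /proc/$PID/sched        # how many times it was scheduled
  grep sum_exec_runtime /proc/$PID/sched   # total cpu time it actually got
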
> > 
> > The napi thread patches are actually a new feature that also fixes the
> > ksoftirqd issue: hunting the ksoftirqd issue was the initial trigger
> > for this work. I'm sorry for not being clear enough in the cover
> > letter.
> > 
> > The napi thread patches offer additional benefits, i.e. a further
> > relevant gain in the described test scenario, and do not impact other
> > subsystems/kernel entities.
> > 
> > I still think they are worthwhile, and I bet you would disagree, but
> > could you please say more about which parts concern you most and/or
> > seem most bloated ?
> 
> Just look at the added code. napi_threaded_poll() is very buggy, but
> honestly I do not want to fix the bugs you added there. If you have only
> one vcpu, how can jiffies ever change since you block BH ?

Uh, we likely have the same issue in the net_rx_action() function, which
also executes with BH disabled and checks for jiffies changes, even on
single core hosts ?!?

Aren't jiffies updated by the timer interrupt, and thus even with BH
disabled ?!?
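
Incidentally, if I read the code correctly, whenever net_rx_action()
gives up because of that jiffies/budget check it bumps the per-cpu
time_squeeze counter, which is exported as the third (hex) field of
/proc/net/softnet_stat, so the effect should be visible from userspace
under load with something like:

  cat /proc/net/softnet_stat              # one line per cpu, hex counters
  cut -d' ' -f3 /proc/net/softnet_stat    # third field: time_squeeze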

> I was planning to remove cond_resched_softirq(), which we no longer use
> after my recent changes to the TCP stack, and you call it again (while
> it is obviously buggy, since it does not check whether a BH is pending,
> only whether a thread needs the cpu).

I missed that, thank you for pointing it out.

> I prefer fixing the existing code, really. It took us years to
> understand it and maybe fix it.
> 
> Just think of what will happen if you have 10 devices (10 new threads in
> your model) and one cpu.
> 
> Instead of the nice existing netif_rx() doing rounds of 64 packets per
> device, you'll now rely on process scheduler behavior that has no such
> granularity.
> 
> Adding more threads is the natural answer for userland programmers, but
> in the kernel it is not the right answer. We already have mechanisms;
> just use them and fix them if they are broken.
> 
> Sorry, I really do not think your patches are the way to go.
> But this thread is definitely interesting.

Oh, this is a far better comment than I would have expected ;-)

Cheers,

Paolo

Thread overview: 48+ messages
2016-05-10 14:11 [RFC PATCH 0/2] net: threadable napi poll loop Paolo Abeni
2016-05-10 14:11 ` [RFC PATCH 1/2] net: implement threaded-able napi poll loop support Paolo Abeni
2016-05-10 14:11 ` [RFC PATCH 2/2] net: add sysfs attribute to control napi threaded mode Paolo Abeni
2016-05-10 14:29 ` [RFC PATCH 0/2] net: threadable napi poll loop Eric Dumazet
2016-05-10 15:51   ` David Miller
2016-05-10 16:03   ` Paolo Abeni
2016-05-10 16:08     ` Eric Dumazet
2016-05-10 20:22       ` Paolo Abeni
2016-05-10 20:45         ` David Miller
2016-05-10 20:50           ` Rik van Riel
2016-05-10 20:52             ` David Miller
2016-05-10 21:01               ` Rik van Riel
2016-05-10 20:46   ` Hannes Frederic Sowa
2016-05-10 21:09     ` Eric Dumazet
2016-05-10 21:31       ` Eric Dumazet
2016-05-10 21:35         ` Rik van Riel
2016-05-10 21:53           ` Eric Dumazet
2016-05-10 22:02             ` Eric Dumazet
2016-05-10 22:44               ` Eric Dumazet
2016-05-10 22:02             ` Rik van Riel
2016-05-11 17:55             ` Eric Dumazet
2016-05-10 22:32       ` Hannes Frederic Sowa
2016-05-10 22:51         ` Eric Dumazet
2016-05-11  6:55           ` Peter Zijlstra
2016-05-11 13:13             ` Hannes Frederic Sowa
2016-05-11 14:40               ` Eric Dumazet
2016-05-11 15:01                 ` Rik van Riel
2016-05-11 15:50                 ` Eric Dumazet
2016-05-11 21:56             ` Eric Dumazet
2016-05-12 20:07               ` Paolo Abeni
2016-05-12 20:49                 ` Eric Dumazet
2016-05-12 20:58                   ` Paolo Abeni
2016-05-12 21:05                     ` Eric Dumazet
2016-05-13 16:50               ` Paolo Abeni
2016-05-13 17:03                 ` Eric Dumazet
2016-05-13 17:19                   ` Paolo Abeni
2016-05-13 17:36                     ` Eric Dumazet
2016-05-16 13:10                       ` Paolo Abeni
2016-05-16 13:38                         ` Eric Dumazet
2016-05-11  9:48           ` Paolo Abeni
2016-05-11 13:08             ` Eric Dumazet
2016-05-11 13:39               ` Hannes Frederic Sowa
2016-05-11 13:47                 ` Hannes Frederic Sowa
2016-05-11 14:38               ` Paolo Abeni [this message]
2016-05-11 14:45                 ` Eric Dumazet
2016-05-11 22:47                   ` Hannes Frederic Sowa
2016-05-10 15:57 ` Thomas Gleixner
2016-05-10 20:41   ` Paolo Abeni
