From: Jakub Kicinski <kuba@kernel.org>
To: Wei Wang <weiwan@google.com>
Cc: Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Felix Fietkau <nbd@nbd.name>
Subject: Re: [RFC PATCH net-next 0/6] implement kthread based napi poll
Date: Tue, 29 Sep 2020 14:48:47 -0700
Message-ID: <20200929144847.05f3dcf7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
In-Reply-To: <CAEA6p_BPT591fqFRqsM=k4urVXQ1sqL-31rMWjhvKQZm9-Lksg@mail.gmail.com>

On Tue, 29 Sep 2020 13:16:59 -0700 Wei Wang wrote:
> On Tue, Sep 29, 2020 at 12:19 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 28 Sep 2020 19:43:36 +0200 Eric Dumazet wrote:  
> > > Wei, this is a very nice work.
> > >
> > > Please re-send it without the RFC tag, so that we can hopefully merge it ASAP.  
> >
> > The problem is that the application I'm testing with is significantly
> > slower (in terms of RPS) with this implementation than with Felix's
> > code:
> >
> >               |        L  A  T  E  N  C  Y       |  App   |     C P U     |
> >        |  RPS |   AVG  |  P50  |   P99  |   P999 | Overld |  busy |  PSI  |
> > thread | 1.1% | -15.6% | -0.3% | -42.5% |  -8.1% | -83.4% | -2.3% | 60.6% |
> > work q | 4.3% | -13.1% |  0.1% | -44.4% |  -1.1% |   2.3% | -1.2% | 90.1% |
> > TAPI   | 4.4% | -17.1% | -1.4% | -43.8% | -11.0% | -60.2% | -2.3% | 46.7% |
> >
> > "thread" is this code, "work q" is Felix's code, TAPI is my hacks.
> >
> > The numbers compare performance against normal NAPI.
> >
> > In all cases (but not the baseline) I configured timer-based polling
> > (defer_hard_irqs) with around a 100us timeout. Without deferring hard
> > IRQs, threaded NAPI is actually slower for this app. Also, I'm not
> > modifying niceness; doing so again causes an application performance
> > regression here.
> >  
> 
> If I remember correctly, Felix's workqueue code uses the WQ_HIGHPRI
> flag, which by default gives the workqueue threads a nice value of -20.
> But the kthread implementation leaves the nice level at the kthread
> default (0). This could be one difference.
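
(For context, the difference is roughly the following; a minimal
sketch, not the actual patches, and the poll function and thread
naming here are illustrative only:)

#include <linux/netdevice.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>

/* A WQ_HIGHPRI workqueue runs its worker threads at MIN_NICE (-20)
 * out of the box: */
static struct workqueue_struct *napi_wq;

static int __init napi_wq_init(void)
{
        napi_wq = alloc_workqueue("napi_wq", WQ_UNBOUND | WQ_HIGHPRI, 0);
        return napi_wq ? 0 : -ENOMEM;
}

/* ...while a kthread starts at nice 0 and stays there unless the
 * creator bumps it explicitly: */
static int start_threaded_napi(struct napi_struct *napi)
{
        struct task_struct *t = kthread_run(napi_threaded_poll, napi,
                                            "napi/%s", napi->dev->name);
        if (IS_ERR(t))
                return PTR_ERR(t);
        set_user_nice(t, -20);  /* what WQ_HIGHPRI gets for free */
        return 0;
}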

FWIW, this is the data based on which I concluded that nice -20
actually makes things worse here:

      threaded: -1.50%
 threaded p-20: -5.67%
      thr poll:  2.93%
 thr poll p-20:  2.22%

Annoyingly, the relative performance change varies day to day, and this
test was run a while back (over the weekend I was getting < 2%
improvement with this set).
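
(For reference, the defer_hard_irqs mode quoted above is the
napi_defer_hard_irqs / gro_flush_timeout pair of per-device sysfs
knobs under /sys/class/net/<dev>/; the timeout is in nanoseconds, so
100us means gro_flush_timeout=100000. A simplified sketch of what the
deferral does in napi_complete_done(), not the exact mainline code:)

bool napi_complete_done(struct napi_struct *n, int work_done)
{
        unsigned long timeout = 0;

        if (work_done)
                n->defer_hard_irqs_count =
                        READ_ONCE(n->dev->napi_defer_hard_irqs);

        if (n->defer_hard_irqs_count > 0) {
                n->defer_hard_irqs_count--;
                timeout = READ_ONCE(n->dev->gro_flush_timeout);
        }

        if (timeout) {
                /* keep the device IRQ masked, come back via hrtimer */
                hrtimer_start(&n->timer, ns_to_ktime(timeout),
                              HRTIMER_MODE_REL_PINNED);
                return false;
        }
        /* deferral budget exhausted: let the driver re-enable the IRQ */
        return true;
}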

> I am not sure what the benchmark is doing

Not a benchmark, real workload :)

> but one thing to try is to limit the CPUs that run the kthreads to a
> smaller number of CPUs. This could bring the kernel CPU usage on those
> CPUs up to a higher %, e.g. > 80%, so the scheduler is less likely to
> schedule user threads on them, thus providing isolation between the
> kthreads and the user threads, and reducing the scheduling overhead.

Yeah... if I do pinning or isolation I can get a 15% RPS improvement
for this application... no threaded NAPI needed. The point for me is to
not have to do such tuning per app x platform x workload of the day.
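
(By pinning I mean something along these lines; a hypothetical
userspace helper, where the pid and cpu values come from ps and the
platform by hand, which is exactly the per-platform tuning I'd rather
avoid:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Usage: ./pin <kthread-pid> <cpu> */
int main(int argc, char **argv)
{
        cpu_set_t set;

        if (argc < 3)
                return 1;

        CPU_ZERO(&set);
        CPU_SET(atoi(argv[2]), &set);   /* CPU reserved for NAPI */
        if (sched_setaffinity(atoi(argv[1]), sizeof(set), &set)) {
                perror("sched_setaffinity");
                return 1;
        }
        return 0;
}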

> This could help if the throughput drop is caused by higher scheduling
> latency for the user threads. Another thing to try is to raise the
> scheduling class of the kthread from SCHED_OTHER to SCHED_FIFO. This
> could help if the throughput drop is caused by the kthreads
> experiencing higher scheduling latency.

Isn't the fundamental problem that the scheduler works at ms scale,
while here we're talking about 100us at most? And AFAICT the scheduler
doesn't have a knob to adjust migration cost per process (only the
system-wide sched_migration_cost_ns)? :(

I just reached out to the kernel experts @FB for their input.

Also let me re-run with a normal prio WQ.
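
(If I do try the SCHED_FIFO suggestion, presumably something like
this; a hypothetical helper, roughly equivalent to
"chrt -f -p 1 <pid>":)

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Usage: ./fifo <kthread-pid> */
int main(int argc, char **argv)
{
        struct sched_param sp = { .sched_priority = 1 };

        if (argc < 2)
                return 1;

        /* move the NAPI kthread from SCHED_OTHER to SCHED_FIFO */
        if (sched_setscheduler(atoi(argv[1]), SCHED_FIFO, &sp)) {
                perror("sched_setscheduler");
                return 1;
        }
        return 0;
}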

> > 1 NUMA node. 18 NAPI instances, each using around 25% of a single CPU.
> >
> > I was initially hoping that TAPI would fit nicely as an extension
> > of this code, but I don't think that will be the case.
> >
> > Are there any assumptions you're making about the configuration that
> > I should try to replicate?  

Thread overview: 30+ messages
2020-09-14 17:24 [RFC PATCH net-next 0/6] implement kthread based napi poll Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 1/6] net: implement threaded-able napi poll loop support Wei Wang
2020-09-25 19:45   ` Hannes Frederic Sowa
2020-09-25 23:50     ` Wei Wang
2020-09-26 14:22       ` Hannes Frederic Sowa
2020-09-28  8:45         ` Paolo Abeni
2020-09-28 18:13           ` Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 2/6] net: add sysfs attribute to control napi threaded mode Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 3/6] net: extract napi poll functionality to __napi_poll() Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 4/6] net: modify kthread handler to use __napi_poll() Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 5/6] net: process RPS/RFS work in kthread context Wei Wang
2020-09-18 22:44   ` Wei Wang
2020-09-21  8:11     ` Eric Dumazet
2020-09-14 17:24 ` [RFC PATCH net-next 6/6] net: improve napi threaded config Wei Wang
2020-09-25 13:48 ` [RFC PATCH net-next 0/6] implement kthread based napi poll Magnus Karlsson
2020-09-25 17:15   ` Wei Wang
2020-09-25 17:30     ` Eric Dumazet
2020-09-25 18:16     ` Stephen Hemminger
2020-09-25 18:23       ` Eric Dumazet
2020-09-25 19:00         ` Stephen Hemminger
2020-09-25 19:06   ` Jakub Kicinski
2020-09-28 14:07     ` Magnus Karlsson
2020-09-28 17:43 ` Eric Dumazet
2020-09-28 18:15   ` Wei Wang
2020-09-29 19:19   ` Jakub Kicinski
2020-09-29 20:16     ` Wei Wang
2020-09-29 21:48       ` Jakub Kicinski [this message]
2020-09-30  8:23         ` David Laight
2020-09-30  8:58         ` Paolo Abeni
2020-09-30 15:58           ` Jakub Kicinski
