From: Eric Dumazet <edumazet@google.com>
To: Julian Wiedmann <jwi@linux.ibm.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Luigi Rizzo <lrizzo@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH net-next 1/3] net: napi: add hard irqs deferral feature
Date: Sat, 2 May 2020 08:40:58 -0700
Message-ID: <CANn89iKid-JWYs6esRYo25NqVdLkLvn6uwiB7wLz_PXuREQQKA@mail.gmail.com>
In-Reply-To: <a8f1fbf8-b25f-d3aa-27fe-11b1f0fdae3f@linux.ibm.com>

On Sat, May 2, 2020 at 7:56 AM Julian Wiedmann <jwi@linux.ibm.com> wrote:
>
> On 22.04.20 18:13, Eric Dumazet wrote:
> > Back in commit 3b47d30396ba ("net: gro: add a per device gro flush timer")
> > we added the ability to arm one high resolution timer, which we used
> > to keep incomplete packets in the GRO engine a bit longer, hoping that further
> > frames might be added to them.
> >
> > Since then, we added the napi_complete_done() interface, and commit
> > 364b6055738b ("net: busy-poll: return busypolling status to drivers")
> > allowed drivers to avoid re-arming NIC interrupts if we made a promise
> > that their NAPI poll() handler would be called in the near future.
> >
> > This infrastructure can be leveraged, thanks to a new device parameter,
> > which allows arming the napi hrtimer instead of re-arming the device
> > hard IRQ.
> >
> > We have noticed that on some servers with 32 RX queues or more, the chit-chat
> > between the NIC and the host caused by IRQ delivery and re-arming could hurt
> > throughput by ~20% on a 100Gbit NIC.
> >
> > In contrast, hrtimers use local (per-cpu) resources and might have a lower
> > cost.
> >
> > The new tunable, named napi_defer_hard_irqs, is placed in the same hierarchy
> > as gro_flush_timeout (/sys/class/net/ethX/).
> >
>
> Hi Eric,
> could you please add some Documentation for this new sysfs tunable? Thanks!
> Looks like gro_flush_timeout is missing the same :).


Yes. I was planning to add this to
Documentation/networking/scaling.rst, once our fires are extinguished.

>
>
> > By default, both gro_flush_timeout and napi_defer_hard_irqs are zero.
> >
> > This patch does not change the prior behavior of gro_flush_timeout
> > if used alone: NIC hard IRQs should be re-armed as before.
> >
> > One concrete usage could be:
> >
> > echo 20000 >/sys/class/net/eth1/gro_flush_timeout
> > echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs
> >
> > If at least one packet is retired, then we will reset the napi counter
> > to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans
> > of the queue.
> >
> > On busy queues, this should avoid NIC hard IRQs, whereas before this patch IRQ
> > avoidance was only possible if napi->poll() exhausted its budget
> > and did not call napi_complete_done().
> >
>
> I was confused here for a second, so let me just clarify how this is intended
> to look for pure TX completion IRQs:
>
> napi->poll() calls napi_complete_done() with an accurate work_done value, but
> then still returns 0 because TX completion work doesn't consume NAPI budget.


If the napi budget was consumed, the driver does _not_ call
napi_complete() or napi_complete_done() anyway.

If the budget is not consumed, then napi_complete_done(napi, X>0) allows
napi_complete_done() to return 0 (false) if napi_defer_hard_irqs is not 0.

This means that the NIC hard irq will stay disabled for at least one more round.
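The driver side of this contract can be sketched in user-space C. Everything here is a hypothetical stand-in (`struct fake_queue`, `fake_napi_complete_done()`, `example_poll()`), not the kernel API; the stub simply returns a preset answer so the driver's branching can be exercised. When the budget is consumed, poll() returns without completing; otherwise the hard IRQ is re-armed only when the completion call returns true.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins, not kernel APIs. fake_napi_complete_done()
 * only mimics the contract of napi_complete_done(): true means
 * "re-arm the device IRQ", false means "the core will call poll()
 * again on its own (hrtimer or busy polling)".
 */
struct fake_queue {
	int rx_pending;          /* packets waiting in the RX ring */
	bool irq_enabled;        /* NIC interrupt for this queue */
	bool core_allows_rearm;  /* preset answer of the stubbed core */
	int complete_calls;      /* how often poll() tried to complete */
};

static bool fake_napi_complete_done(struct fake_queue *q, int work_done)
{
	(void)work_done;  /* the real core uses this to reload its defer counter */
	q->complete_calls++;
	return q->core_allows_rearm;
}

/* Sketch of a driver NAPI poll() handler following the usual pattern. */
static int example_poll(struct fake_queue *q, int budget)
{
	int work_done = 0;

	assert(budget > 0);

	/* Retire up to budget packets. */
	while (work_done < budget && q->rx_pending > 0) {
		q->rx_pending--;
		work_done++;
	}

	/* Budget exhausted: stay in polling mode, do not complete. */
	if (work_done == budget)
		return budget;

	/* Re-enable the hard IRQ only when the core says so. */
	if (fake_napi_complete_done(q, work_done))
		q->irq_enabled = true;

	return work_done;
}
```

The design point is that the driver never decides on its own to unmask the interrupt after a partial poll; it defers to the return value of the completion call, which is what lets napi_defer_hard_irqs keep the IRQ masked.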


>
>
> > This feature also can be used to work around some non-optimal NIC irq
> > coalescing strategies.
> >
> > Having the ability to insert XX usec delays between each napi->poll()
> > can increase cache efficiency, since we increase batch sizes.
> >
> > It also prevents the serving CPUs from sitting idle for too long, reducing tail latencies.
> >
> > Co-developed-by: Luigi Rizzo <lrizzo@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>


Thread overview: 12+ messages
2020-04-22 16:13 [PATCH net-next 0/3] net: napi: addition of napi_defer_hard_irqs Eric Dumazet
2020-04-22 16:13 ` [PATCH net-next 1/3] net: napi: add hard irqs deferral feature Eric Dumazet
2020-05-02 14:56   ` Julian Wiedmann
2020-05-02 15:40     ` Eric Dumazet [this message]
2020-05-02 16:10       ` Julian Wiedmann
2020-05-02 16:24         ` Eric Dumazet
2020-05-02 23:45           ` David Miller
2020-05-04 15:25           ` Julian Wiedmann
2020-05-04 15:33             ` Eric Dumazet
2020-04-22 16:13 ` [PATCH net-next 2/3] net: napi: use READ_ONCE()/WRITE_ONCE() Eric Dumazet
2020-04-22 16:13 ` [PATCH net-next 3/3] net/mlx4_en: use napi_complete_done() in TX completion Eric Dumazet
2020-04-23 19:43 ` [PATCH net-next 0/3] net: napi: addition of napi_defer_hard_irqs David Miller
