netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Magnus Karlsson <magnus.karlsson@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Wei Wang" <weiwan@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Network Development" <netdev@vger.kernel.org>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Hannes Frederic Sowa" <hannes@stressinduktion.org>,
	"Felix Fietkau" <nbd@nbd.name>,
	"Björn Töpel" <bjorn.topel@intel.com>
Subject: Re: [RFC PATCH net-next 0/6] implement kthread based napi poll
Date: Mon, 28 Sep 2020 16:07:03 +0200	[thread overview]
Message-ID: <CAJ8uoz0ZMdFkFooipvJphFKH9XP9qEc7vApfjkGu6hC0usHDRQ@mail.gmail.com> (raw)
In-Reply-To: <20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Fri, Sep 25, 2020 at 9:06 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 25 Sep 2020 15:48:35 +0200 Magnus Karlsson wrote:
> > I really like this RFC and would encourage you to submit it as a
> > patch. Would love to see it make it into the kernel.
> >
> > I see the same positive effects as you when trying it out with AF_XDP
> > sockets. Made some simple experiments where I sent 64-byte packets to
> > a single AF_XDP socket. Have not managed to figure out how to do
> > percentiles on my load generator, so this is going to be min, avg and
> > max only. The application using the AF_XDP socket just performs a mac
> > swap on the packet and sends it back to the load generator that then
> > measures the round trip latency. The kthread is taskset to the same
> > core as ksoftirqd would run on. So in each experiment, they always run
> > on the same core id (which is not the same as the application).
> >
> > Rate 12 Mpps with 0% loss.
> >               Latencies (us)         Delay Variation between packets
> >           min    avg    max      avg   max
> > sofirq  11.0  17.1   78.4      0.116  63.0
> > kthread 11.2  17.1   35.0     0.116  20.9
> >
> > Rate ~58 Mpps (Line rate at 40 Gbit/s) with substantial loss
> >               Latencies (us)         Delay Variation between packets
> >           min    avg    max      avg   max
> > softirq  87.6  194.9  282.6    0.062  25.9
> > kthread  86.5  185.2  271.8    0.061  22.5
> >
> > For the last experiment, I also get 1.5% to 2% higher throughput with
> > your kthread approach. Moreover, just from the per-second throughput
> > printouts from my application, I can see that the kthread numbers are
> > more stable. The softirq numbers can vary quite a lot between each
> > second, around +-3%. But for the kthread approach, they are nice and
> > stable. Have not examined why.
>
> Sure, it's better than status quo for AF_XDP but it's going to be far
> inferior to well implemented busy polling.

Agree completely. Björn is looking into this at the moment, so I will
let him comment on it and post some patches.

> We already discussed the potential scheme with Bjorn, since you prompted
> me again, let me shoot some code from the hip at ya:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 74ce8b253ed6..8dbdfaeb0183 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6668,6 +6668,7 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
>
>  static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
>  {
> +       unsigned long to;
>         int rc;
>
>         /* Busy polling means there is a high chance device driver hard irq
> @@ -6682,6 +6683,13 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
>         clear_bit(NAPI_STATE_MISSED, &napi->state);
>         clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
>
> +       if (READ_ONCE(napi->dev->napi_defer_hard_irqs)) {
> +               netpoll_poll_unlock(have_poll_lock);
> +               to = ns_to_ktime(READ_ONCE(napi->dev->gro_flush_timeout));
> +               hrtimer_start(&n->timer, to, HRTIMER_MODE_REL_PINNED);
> +               return;
> +       }
> +
>         local_bh_disable();
>
>         /* All we really want here is to re-enable device interrupts.
>
>
> With basic busy polling implemented for AF_XDP this is all** you need
> to make busy polling work very well.
>
> ** once bugs are fixed :D I haven't even compiled this
>
> Eric & co. already implemented hard IRQ deferral. All we need to do is
> push the timer away when application picks up frames. I think.
>
> Please, no loose threads for AF_XDP apps (or other busy polling apps).
> Let the application burn 100% of the core :(

  reply	other threads:[~2020-09-28 14:07 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-14 17:24 [RFC PATCH net-next 0/6] implement kthread based napi poll Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 1/6] net: implement threaded-able napi poll loop support Wei Wang
2020-09-25 19:45   ` Hannes Frederic Sowa
2020-09-25 23:50     ` Wei Wang
2020-09-26 14:22       ` Hannes Frederic Sowa
2020-09-28  8:45         ` Paolo Abeni
2020-09-28 18:13           ` Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 2/6] net: add sysfs attribute to control napi threaded mode Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 3/6] net: extract napi poll functionality to __napi_poll() Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 4/6] net: modify kthread handler to use __napi_poll() Wei Wang
2020-09-14 17:24 ` [RFC PATCH net-next 5/6] net: process RPS/RFS work in kthread context Wei Wang
2020-09-18 22:44   ` Wei Wang
2020-09-21  8:11     ` Eric Dumazet
2020-09-14 17:24 ` [RFC PATCH net-next 6/6] net: improve napi threaded config Wei Wang
2020-09-25 13:48 ` [RFC PATCH net-next 0/6] implement kthread based napi poll Magnus Karlsson
2020-09-25 17:15   ` Wei Wang
2020-09-25 17:30     ` Eric Dumazet
2020-09-25 18:16     ` Stephen Hemminger
2020-09-25 18:23       ` Eric Dumazet
2020-09-25 19:00         ` Stephen Hemminger
2020-09-25 19:06   ` Jakub Kicinski
2020-09-28 14:07     ` Magnus Karlsson [this message]
2020-09-28 17:43 ` Eric Dumazet
2020-09-28 18:15   ` Wei Wang
2020-09-29 19:19   ` Jakub Kicinski
2020-09-29 20:16     ` Wei Wang
2020-09-29 21:48       ` Jakub Kicinski
2020-09-30  8:23         ` David Laight
2020-09-30  8:58         ` Paolo Abeni
2020-09-30 15:58           ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJ8uoz0ZMdFkFooipvJphFKH9XP9qEc7vApfjkGu6hC0usHDRQ@mail.gmail.com \
    --to=magnus.karlsson@gmail.com \
    --cc=bjorn.topel@intel.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hannes@stressinduktion.org \
    --cc=kuba@kernel.org \
    --cc=nbd@nbd.name \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=weiwan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).