From: Wei Wang <weiwan@google.com>
To: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Cc: Eric Dumazet <edumazet@google.com>, Felix Fietkau <nbd@nbd.name>,
	Paolo Abeni <pabeni@redhat.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Hillf Danton <hdanton@sina.com>
Subject: Re: [PATCH net-next v3 0/5] implement kthread based napi poll
Date: Wed, 18 Nov 2020 12:14:09 -0800
Message-ID: <CAEA6p_CKXMzqqWK0Mo5ppA4vV7bKqV=2toDxmumCJwFeWtq4gQ@mail.gmail.com>
In-Reply-To: <20201118191009.3406652-1-weiwan@google.com>

On Wed, Nov 18, 2020 at 12:07 PM Wei Wang <weiwan@google.com> wrote:
>
> The idea of moving the napi poll process out of softirq context into a
> kernel thread based context is not new.
> Paolo Abeni and Hannes Frederic Sowa proposed patches to move napi
> poll to kthreads back in 2016, and Felix Fietkau proposed patches
> with a similar idea, using a workqueue to process napi poll, just a
> few weeks ago.
>
> The main reason we'd like to push forward with this idea is that the
> scheduler has poor visibility into cpu cycles spent in softirq context,
> and is not able to make optimal scheduling decisions for user threads.
> For example, in one of our application benchmarks where network load is
> high, the CPUs handling network softirqs run at ~80% cpu utilization,
> yet user threads are still scheduled on those CPUs, despite more idle
> CPUs being available in the system, and we see very high tail latencies.
> In this case, we have to explicitly pin user threads away from the CPUs
> handling network softirqs to ensure good performance.
> With napi poll moved to kthreads, the scheduler is in charge of scheduling
> both the kthreads handling network load and the user threads, and is able
> to make better decisions. In the previous benchmark, if we do this and
> pin the kthreads processing napi poll to specific CPUs, the scheduler
> schedules user threads away from these CPUs automatically.
>
> The reason we prefer 1 kthread per napi instance, instead of 1 workqueue
> entity per host, is that a kthread is more configurable than a workqueue:
> we can leverage existing tuning tools for threads, such as taskset and
> chrt, to adjust the scheduling class, cpu set, and so on. Another reason
> is that if we eventually want to provide a busy poll feature using kernel
> threads for napi poll, kthreads seem more suitable than workqueues.
> Furthermore, on large platforms with 2 NICs attached to 2 sockets,
> kthreads can be pinned to different sets of CPUs more flexibly.
>
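> For illustration, a minimal tuning sketch (this assumes the
> napi/<dev>-<napi_id> kthread naming from patch 5; the interface name,
> cpu set, and priority below are made-up example values):
>
>     # pin all napi kthreads of eth0 to CPUs 2-5 and make them SCHED_FIFO
>     # (plain pgrep matches the kthread comm; -f would not, since
>     # kthreads have an empty cmdline)
>     for pid in $(pgrep 'napi/eth0'); do
>         taskset -pc 2-5 "$pid"   # set cpu affinity
>         chrt -f -p 50 "$pid"     # set scheduling class and priority
>     done
>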
> In this patch series, I revived Paolo and Hannes's patches from 2016 and
> kept them as the first 2 patches. On top of those are changes proposed
> by Felix, Jakub, Paolo, and myself, with suggestions from Eric Dumazet.
>
> In terms of performance, I ran tcp_rr tests with 1000 flows and various
> request/response sizes, with RFS/RPS disabled, and compared performance
> among the softirq, kthread, and workqueue (the patchset proposed by
> Felix Fietkau) implementations.
> The host has 56 hyperthreads and a 100Gbps nic with 8 rx queues, on a
> single numa node. All threads are unpinned.
>
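> For completeness, RPS/RFS were disabled through the standard
> sysfs/procfs knobs; a sketch, with eth0 as a placeholder interface name:
>
>     for q in /sys/class/net/eth0/queues/rx-*; do
>         echo 0 > "$q/rps_cpus"      # empty cpu mask: RPS off
>         echo 0 > "$q/rps_flow_cnt"  # per-queue RFS table off
>     done
>     echo 0 > /proc/sys/net/core/rps_sock_flow_entries  # global RFS off
>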
>          req/resp    QPS    50%tile   90%tile   99%tile   99.9%tile
> softirq    1B/1B    2.75M    337us     376us    1.04ms     3.69ms
> kthread    1B/1B    2.67M    371us     408us     455us      550us
> workq      1B/1B    2.56M    384us     435us     673us      822us
>
> softirq  5KB/5KB    1.46M    678us     750us     969us     2.78ms
> kthread  5KB/5KB    1.44M    695us     789us     891us     1.06ms
> workq    5KB/5KB    1.34M    720us     905us    1.06ms     1.57ms
>
> softirq  1MB/1MB    11.0K     79ms     166ms     306ms      630ms
> kthread  1MB/1MB    11.0K     75ms     177ms     303ms      596ms
> workq    1MB/1MB    11.0K     79ms     180ms     303ms      587ms
>
> When running the workqueue implementation, I found that the number of
> threads used is usually twice that of the kthread implementation. This
> probably introduces higher scheduling cost, which results in higher
> tail latencies in most cases.
>
> I also ran an application benchmark, which performs remote SSD
> read/write operations at a fixed qps, with various op sizes, again
> with RFS/RPS disabled.
> The result is as follows:
>          op_size    QPS    50%tile   95%tile   99%tile   99.9%tile
> softirq      4K    572.6K    385us    1.5ms     3.16ms     6.41ms
> kthread      4K    572.6K    390us    803us     2.21ms     6.83ms
> workq        4K    572.6K    384us    763us     3.12ms     6.87ms
>
> softirq     64K    157.9K    736us    1.17ms    3.40ms    13.75ms
> kthread     64K    157.9K    745us    1.23ms    2.76ms     9.87ms
> workq       64K    157.9K    746us    1.23ms    2.76ms     9.96ms
>
> softirq      1M    10.98K    2.03ms   3.10ms    3.7ms     11.56ms
> kthread      1M    10.98K    2.13ms   3.21ms    4.02ms    13.3ms
> workq       1M    10.98K    2.13ms   3.20ms    3.99ms    14.12ms
>
> In this set of tests, the latency is dominated by the SSD operations.
> Also, the user threads are much busier than in the tcp_rr tests. We had
> to pin the kthreads/workqueue threads to a few CPUs, so as not to
> disturb the user threads and to provide some isolation.
>
>
> Changes since v2:
> Corrected a typo in patch 1, and updated the cover letter with more
> detailed and up-to-date test results.
>

Hi everyone,

We thought it was a good time to re-push this patch series for another
round of evaluation, now that several weeks have passed since the last
version. The patch series itself has not changed much, but I updated
the cover letter to include updated and more detailed test results,
hoping to give more context.

Thanks for reviewing!
Wei

> Changes since v1:
> Replaced kthread_create() with kthread_run() in patch 5 as suggested by
> Felix Fietkau.
>
> Changes since RFC:
> Renamed the kthreads to be napi/<dev>-<napi_id> in patch 5 as suggested
> by Hannes Frederic Sowa.
>
> Paolo Abeni (2):
>   net: implement threaded-able napi poll loop support
>   net: add sysfs attribute to control napi threaded mode
> Felix Fietkau (1):
>   net: extract napi poll functionality to __napi_poll()
> Jakub Kicinski (1):
>   net: modify kthread handler to use __napi_poll()
> Wei Wang (1):
>   net: improve napi threaded config
>
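> As a usage note, with the series applied, threaded mode is toggled via
> the per-device sysfs attribute added in patch 2. A sketch of what that
> looks like (the exact attribute name and path are assumptions here, and
> eth0 is a placeholder):
>
>     echo 1 > /sys/class/net/eth0/threaded   # kthread based napi poll
>     echo 0 > /sys/class/net/eth0/threaded   # back to softirq based poll
>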
>  include/linux/netdevice.h |   5 ++
>  net/core/dev.c            | 143 +++++++++++++++++++++++++++++++++++---
>  net/core/net-sysfs.c      | 100 ++++++++++++++++++++++++++
>  3 files changed, 239 insertions(+), 9 deletions(-)
>
> --
> 2.29.2.454.gaff20da3a2-goog
>
