linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Joe Damato <jdamato@fastly.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	 chuck.lever@oracle.com, jlayton@kernel.org,
	linux-api@vger.kernel.org,  brauner@kernel.org,
	davem@davemloft.net, alexander.duyck@gmail.com,
	 sridhar.samudrala@intel.com, kuba@kernel.org,
	Wei Wang <weiwan@google.com>
Subject: Re: [net-next 0/3] Per epoll context busy poll support
Date: Wed, 24 Jan 2024 09:20:09 +0100	[thread overview]
Message-ID: <CANn89i+YKwrgpt8VnHrw4eeVpqRamLkTSr4u+g1mRDMZa6b+7Q@mail.gmail.com> (raw)
In-Reply-To: <20240124025359.11419-1-jdamato@fastly.com>

On Wed, Jan 24, 2024 at 3:54 AM Joe Damato <jdamato@fastly.com> wrote:
>
> Greetings:
>
> TL;DR This builds on commit bf3b9f6372c4 ("epoll: Add busy poll support to
> epoll with socket fds.") by allowing user applications to enable
> epoll-based busy polling and set a busy poll packet budget on a per epoll
> context basis.
>
> To allow for this, two ioctls have been added for epoll contexts for
> getting and setting a new struct, struct epoll_params.
>
> This makes epoll-based busy polling much more usable for user
> applications than the current system-wide sysctl and hardcoded budget.
>
> Longer explanation:
>
> Presently epoll has support for a very useful form of busy poll based on
> the incoming NAPI ID (see also: SO_INCOMING_NAPI_ID [1]).
>
> This form of busy poll allows epoll_wait to drive NAPI packet processing
> which allows for a few interesting user application designs which can
> reduce latency and also potentially improve L2/L3 cache hit rates by
> deferring NAPI until userland has finished its work.
>
> The documentation available on this is, IMHO, a bit confusing so please
> allow me to explain how one might use this:
>
> 1. Ensure each application thread has its own epoll instance mapping
> 1-to-1 with NIC RX queues. An n-tuple filter would likely be used to
> direct connections with specific dest ports to these queues.
>
> 2. Optionally: Setup IRQ coalescing for the NIC RX queues where busy
> polling will occur. This can help avoid the userland app from being
> pre-empted by a hard IRQ while userland is running. Note this means that
> userland must take care to call epoll_wait and not take too long in
> userland since it now drives NAPI via epoll_wait.
>
> 3. Ensure that all incoming connections added to an epoll instance
> have the same NAPI ID. This can be done with a BPF filter when
> SO_REUSEPORT is used or getsockopt + SO_INCOMING_NAPI_ID when a single
> accept thread is used which dispatches incoming connections to threads.
>
> 4. Lastly, busy poll must be enabled via a sysctl
> (/proc/sys/net/core/busy_poll).
>
> The unfortunate part about step 4 above is that this enables busy poll
> system-wide which affects all user applications on the system,
> including epoll-based network applications which were not intended to
> be used this way or applications where increased CPU usage for lower
> latency network processing is unnecessary or not desirable.
>
> If the user wants to run one low latency epoll-based server application
> with epoll-based busy poll, but would like to run the rest of the
> applications on the system (which may also use epoll) without busy poll,
> this system-wide sysctl presents a significant problem.
>
> This change preserves the system-wide sysctl, but adds a mechanism (via
> ioctl) to enable or disable busy poll for epoll contexts as needed by
> individual applications, making epoll-based busy poll more usable.
>

I think this description missed the napi_defer_hard_irqs and
gro_flush_timeout settings ?

I would think that if an application really wants to make sure its
thread is the only one
eventually calling napi->poll(), we must make sure NIC interrupts stay masked.

Current implementations of busy poll always release NAPI_STATE_SCHED bit when
returning to user space.

It seems you want to make sure the application and only the
application calls the napi->poll()
at chosen times.

Some kind of contract is needed, and the presence of the hrtimer
(currently only driven from dev->@gro_flush_timeout)
would allow to do that correctly.

Whenever we 'trust' user space to perform the napi->poll shortly, we
also want to arm the hrtimer to eventually detect
the application took too long, to restart the other mechanisms (NIC irq based)

Note that we added the kthread based napi polling, and we are working
to add a busy polling feature to these kthreads.
allowing to completely mask NIC interrupts and further reduce latencies.

Thank you

> Thanks,
> Joe
>
> [1]: https://lore.kernel.org/lkml/20170324170836.15226.87178.stgit@localhost.localdomain/
>
> Joe Damato (3):
>   eventpoll: support busy poll per epoll instance
>   eventpoll: Add per-epoll busy poll packet budget
>   eventpoll: Add epoll ioctl for epoll_params
>
>  .../userspace-api/ioctl/ioctl-number.rst      |  1 +
>  fs/eventpoll.c                                | 99 ++++++++++++++++++-
>  include/uapi/linux/eventpoll.h                | 12 +++
>  3 files changed, 107 insertions(+), 5 deletions(-)
>
> --
> 2.25.1
>

  parent reply	other threads:[~2024-01-24  8:20 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-24  2:53 [net-next 0/3] Per epoll context busy poll support Joe Damato
2024-01-24  2:53 ` [net-next 1/3] eventpoll: support busy poll per epoll instance Joe Damato
2024-01-24  2:53 ` [net-next 2/3] eventpoll: Add per-epoll busy poll packet budget Joe Damato
2024-01-24  2:53 ` [net-next 3/3] eventpoll: Add epoll ioctl for epoll_params Joe Damato
2024-01-24 15:37   ` Joe Damato
2024-01-24  8:20 ` Eric Dumazet [this message]
2024-01-24 14:20   ` [net-next 0/3] Per epoll context busy poll support Joe Damato
2024-01-24 14:38     ` Eric Dumazet
2024-01-24 15:19       ` Joe Damato
2024-01-30 18:54   ` Samudrala, Sridhar
2024-02-02  3:28     ` Joe Damato
2024-02-02 17:23       ` Samudrala, Sridhar
2024-02-02 18:22         ` Jakub Kicinski
2024-02-02 19:33           ` Joe Damato
2024-02-02 19:58             ` Jakub Kicinski
2024-02-02 20:23               ` Joe Damato
2024-02-02 20:50                 ` Samudrala, Sridhar
2024-02-02 20:55                   ` Joe Damato
2024-02-03  1:15                 ` Jakub Kicinski
2024-02-05 18:17                   ` Stanislav Fomichev
2024-02-05 18:52                     ` Joe Damato
2024-02-05 19:48                       ` Jakub Kicinski
2024-02-05 18:51                   ` Joe Damato
2024-02-05 19:59                     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANn89i+YKwrgpt8VnHrw4eeVpqRamLkTSr4u+g1mRDMZa6b+7Q@mail.gmail.com \
    --to=edumazet@google.com \
    --cc=alexander.duyck@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=davem@davemloft.net \
    --cc=jdamato@fastly.com \
    --cc=jlayton@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sridhar.samudrala@intel.com \
    --cc=weiwan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).