All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Maxim Mikityanskiy <maximmi@nvidia.com>
Cc: "bjorn@kernel.org" <bjorn@kernel.org>,
	"magnus.karlsson@intel.com" <magnus.karlsson@intel.com>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
	"ast@kernel.org" <ast@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ
Date: Tue, 23 Aug 2022 13:24:06 +0200	[thread overview]
Message-ID: <YwS41lA+mz0uUZVP@boxer> (raw)
In-Reply-To: <93b8740b39267bc550a8f6e0077fb4772535c65e.camel@nvidia.com>

On Tue, Aug 23, 2022 at 09:49:43AM +0000, Maxim Mikityanskiy wrote:
> Anyone from Intel? Maciej, Björn, Magnus?

Hey Maxim,

how about keeping it simple and going with option 1? This behavior was
even proposed in the v1 submission of the patch set we're talking about.

> 
> Does anyone else have anything to say? IMO, calling XDP for the same
> packet multiple times is a bug, we should agree on some sane solution.
> 
> On Thu, 2022-08-18 at 14:26 +0300, Maxim Mikityanskiy wrote:
> > Hi Maciej,
> > 
> > On Wed, 2022-04-13 at 17:30 +0200, Maciej Fijalkowski wrote:
> > > v2:
> > > - add likely for internal redirect return codes in ice and ixgbe
> > >   (Jesper)
> > > - do not drop the buffer that head pointed to at full XSK RQ
> > > (Maxim)
> > 
> > I found an issue with this approach. If you don't drop that packet,
> > you'll need to run the XDP program for the same packet again. If the
> > XDP program is anything more complex than "redirect-everything-to-
> > XSK",
> > it will get confused, for example, if it tracks any state or counts
> > anything.
> > 
> > We can't check whether there is enough space in the XSK RX ring
> > before
> > running the XDP program, as we don't know in advance which XSK socket
> > will be selected.
> > 
> > We can't store bpf_redirect_info across NAPI calls to avoid running
> > the
> > XDP program second time, as bpf_redirect_info is protected by RCU and
> > the assumption that the whole XDP_REDIRECT operation happens within
> > one
> > NAPI cycle.
> > 
> > I see the following options:
> > 
> > 1. Drop the packet when an overflow happens. The problem is that it
> > will happen systematically, but it's still better than the old
> > behavior
> > (drop mulitple packets when an overflow happens and hog the CPU).
> > 
> > 2. Make this feature opt-in. If the application opts in, it
> > guarantees
> > to take measures to handle duplicate packets in XDP properly.
> > 
> > 3. Require the XDP program to indicate it supports being called
> > multiple times for the same packet and pass a flag to it if it's a
> > repeated run. Drop the packet in the driver if the XDP program
> > doesn't
> > indicate this support. The indication can be done similar to how it's
> > implemented for XDP multi buffer.
> > 
> > Thoughts? Other options?
> > 
> > Thanks,
> > Max
> > 
> > > - terminate Rx loop only when need_wakeup feature is enabled
> > > (Maxim)
> > > - reword from 'stop softirq processing' to 'stop NAPI Rx
> > > processing'
> > > - s/ENXIO/EINVAL in mlx5 and stmmac's ndo_xsk_wakeup to keep it
> > >   consitent with Intel's drivers (Maxim)
> > > - include Jesper's Acks
> > > 
> > > Hi!
> > > 
> > > This is a revival of Bjorn's idea [0] to break NAPI loop when XSK
> > > Rx
> > > queue gets full which in turn makes it impossible to keep on
> > > successfully producing descriptors to XSK Rx ring. By breaking out
> > > of
> > > the driver side immediately we will give the user space opportunity
> > > for
> > > consuming descriptors from XSK Rx ring and therefore provide room
> > > in the
> > > ring so that HW Rx -> XSK Rx redirection can be done.
> > > 
> > > Maxim asked and Jesper agreed on simplifying Bjorn's original API
> > > used
> > > for detecting the event of interest, so let's just simply check for
> > > -ENOBUFS within Intel's ZC drivers after an attempt to redirect a
> > > buffer
> > > to XSK Rx. No real need for redirect API extension.
> > > 
> > > One might ask why it is still relevant even after having proper
> > > busy
> > > poll support in place - here is the justification.
> > > 
> > > For xdpsock that was:
> > > - run for l2fwd scenario,
> > > - app/driver processing took place on the same core in busy poll
> > >   with 2048 budget,
> > > - HW ring sizes Tx 256, Rx 2048,
> > > 
> > > this work improved throughput by 78% and reduced Rx queue full
> > > statistic
> > > bump by 99%.
> > > 
> > > For testing ice, make sure that you have [1] present on your side.
> > > 
> > > This set, besides the work described above, carries also
> > > improvements
> > > around return codes in various XSK paths and lastly a minor
> > > optimization
> > > for xskq_cons_has_entries(), a helper that might be used when XSK
> > > Rx
> > > batching would make it to the kernel.
> > > 
> > > Link to v1 and discussion around it is at [2].
> > > 
> > > Thanks!
> > > MF
> > > 
> > > [0]:
> > > https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
> > > [1]:
> > > https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/
> > > [2]:
> > > https://lore.kernel.org/bpf/5864171b-1e08-1b51-026e-5f09b8945076@nvidia.com/T/
> > > 
> > > Björn Töpel (1):
> > >   xsk: improve xdp_do_redirect() error codes
> > > 
> > > Maciej Fijalkowski (13):
> > >   xsk: diversify return codes in xsk_rcv_check()
> > >   ice: xsk: decorate ICE_XDP_REDIR with likely()
> > >   ixgbe: xsk: decorate IXGBE_XDP_REDIR with likely()
> > >   ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> > >   i40e: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> > >   ixgbe: xsk: terminate Rx side of NAPI when XSK Rx queue gets full
> > >   ice: xsk: diversify return values from xsk_wakeup call paths
> > >   i40e: xsk: diversify return values from xsk_wakeup call paths
> > >   ixgbe: xsk: diversify return values from xsk_wakeup call paths
> > >   mlx5: xsk: diversify return values from xsk_wakeup call paths
> > >   stmmac: xsk: diversify return values from xsk_wakeup call paths
> > >   ice: xsk: avoid refilling single Rx descriptors
> > >   xsk: drop ternary operator from xskq_cons_has_entries
> > > 
> > >  .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
> > >  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 38 ++++++++-----
> > >  drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
> > >  drivers/net/ethernet/intel/ice/ice_xsk.c      | 53 ++++++++++++---
> > > ----
> > >  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
> > >  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 52 ++++++++++-----
> > > ---
> > >  .../ethernet/mellanox/mlx5/core/en/xsk/tx.c   |  2 +-
> > >  .../net/ethernet/stmicro/stmmac/stmmac_main.c |  4 +-
> > >  net/xdp/xsk.c                                 |  4 +-
> > >  net/xdp/xsk_queue.h                           |  4 +-
> > >  10 files changed, 99 insertions(+), 61 deletions(-)
> > > 
> > 
> 

  reply	other threads:[~2022-08-23 14:35 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 15:30 [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 01/14] xsk: improve xdp_do_redirect() error codes Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 02/14] xsk: diversify return codes in xsk_rcv_check() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 03/14] ice: xsk: decorate ICE_XDP_REDIR with likely() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 04/14] ixgbe: xsk: decorate IXGBE_XDP_REDIR " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 05/14] ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 06/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 07/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 08/14] ice: xsk: diversify return values from xsk_wakeup call paths Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 09/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 10/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 11/14] mlx5: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 12/14] stmmac: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 13/14] ice: xsk: avoid refilling single Rx descriptors Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 14/14] xsk: drop ternary operator from xskq_cons_has_entries Maciej Fijalkowski
2022-04-15 19:20 ` [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ patchwork-bot+netdevbpf
2022-08-19  8:35 ` Maxim Mikityanskiy
     [not found] ` <f1eea2e9ca337e0c4e072bdd94a89859a4539c09.camel@nvidia.com>
2022-08-23  9:49   ` Maxim Mikityanskiy
2022-08-23 11:24     ` Maciej Fijalkowski [this message]
2022-08-23 13:46       ` Maxim Mikityanskiy
2022-08-24  5:18         ` John Fastabend
2022-08-25 14:42           ` Maxim Mikityanskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YwS41lA+mz0uUZVP@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kuba@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=maximmi@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.