From: Maxim Mikityanskiy <maximmi@nvidia.com>
To: "maciej.fijalkowski@intel.com" <maciej.fijalkowski@intel.com>,
	"john.fastabend@gmail.com" <john.fastabend@gmail.com>
Cc: "magnus.karlsson@intel.com" <magnus.karlsson@intel.com>,
	"alexandr.lobakin@intel.com" <alexandr.lobakin@intel.com>,
	"bjorn@kernel.org" <bjorn@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"ast@kernel.org" <ast@kernel.org>
Subject: Re: [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ
Date: Thu, 25 Aug 2022 14:42:25 +0000	[thread overview]
Message-ID: <4e65c78fd58f1bea9ccccb4f056b5d8e83b75c3f.camel@nvidia.com> (raw)
In-Reply-To: <6305b48ebc40b_6d4fc2083@john.notmuch>

On Tue, 2022-08-23 at 22:18 -0700, John Fastabend wrote:
> Maxim Mikityanskiy wrote:
> > On Tue, 2022-08-23 at 13:24 +0200, Maciej Fijalkowski wrote:
> > > On Tue, Aug 23, 2022 at 09:49:43AM +0000, Maxim Mikityanskiy wrote:
> > > > Anyone from Intel? Maciej, Björn, Magnus?
> > > 
> > > Hey Maxim,
> > > 
> > > how about keeping it simple and going with option 1? This behavior was
> > > even proposed in the v1 submission of the patch set we're talking about.
> > 
> > Yeah, I know it was the behavior in v1. It was me who suggested not
> > dropping that packet, and I didn't realize back then that it had this
> > undesired side effect - sorry for that!
> 
> Just want to reiterate what was said originally: you'll definitely confuse
> our XDP programs if they ever see the same packet twice. It would confuse
> metrics and any "tap" and so on.
> 
> > 
> > Option 1 sounds good to me as the first remedy, we can start with that.
> > 
> > However, it's not perfect: when NAPI and the application are pinned to
> > the same core, and the fill ring is bigger than the RX ring (which makes
> > sense in case of multiple sockets on the same UMEM), the driver will
> > constantly get into this condition: drop one packet, yield to
> > userspace; the application will of course clean up the RX ring, but
> > then the process will repeat.
> 
> Maybe a dumb question; I haven't followed the entire thread, or at least
> don't recall it. Could you yield when you hit a high-water mark at some
> point before the packet drop?

That would be an ideal solution, but it doesn't sound feasible to me.
There may be multiple XSK sockets on the same RX queue, and each socket
has its own RX ring, which can be full or have free space. We don't
know in advance which RX ring will be chosen by the XDP program (if any
at all; the XDP program can still drop, pass, or retransmit the packet),
so we can't check the watermark on a specific ring before running XDP.

It may be possible to iterate over all sockets and stop the processing
if any socket's RX ring is full, but that would be a heavy thing to do
per packet. It should be possible to optimize it to run once per NAPI
cycle, checking that each RX ring has enough space to fit the whole
NAPI budget, but I'm still not sure of the performance implications,
and it sounds overly protective.

> 
> > 
> > That means, we'll always have a small percentage of packets dropped,
> > which may trigger the congestion control algorithms on the other side,
> > slowing down the TX to unacceptable speeds (because packet drops won't
> > disappear after slowing down just a little).
> > 
> > Given the above, we may need a more complex solution for the long term.
> > What do you think?
> > 
> > Also, if the application uses poll(), this whole logic (either v1 or
> > v2) seems unnecessary, because poll() returns to the application when
> > something becomes available in the RX ring, but I guess the reason for
> > adding it was that fantastic 78% performance improvement mentioned in
> > the cover letter?



Thread overview: 22+ messages
2022-04-13 15:30 [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 01/14] xsk: improve xdp_do_redirect() error codes Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 02/14] xsk: diversify return codes in xsk_rcv_check() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 03/14] ice: xsk: decorate ICE_XDP_REDIR with likely() Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 04/14] ixgbe: xsk: decorate IXGBE_XDP_REDIR " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 05/14] ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 06/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 07/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 08/14] ice: xsk: diversify return values from xsk_wakeup call paths Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 09/14] i40e: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 10/14] ixgbe: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 11/14] mlx5: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 12/14] stmmac: " Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 13/14] ice: xsk: avoid refilling single Rx descriptors Maciej Fijalkowski
2022-04-13 15:30 ` [PATCH v2 bpf-next 14/14] xsk: drop ternary operator from xskq_cons_has_entries Maciej Fijalkowski
2022-04-15 19:20 ` [PATCH v2 bpf-next 00/14] xsk: stop NAPI Rx processing on full XSK RQ patchwork-bot+netdevbpf
2022-08-19  8:35 ` Maxim Mikityanskiy
     [not found] ` <f1eea2e9ca337e0c4e072bdd94a89859a4539c09.camel@nvidia.com>
2022-08-23  9:49   ` Maxim Mikityanskiy
2022-08-23 11:24     ` Maciej Fijalkowski
2022-08-23 13:46       ` Maxim Mikityanskiy
2022-08-24  5:18         ` John Fastabend
2022-08-25 14:42           ` Maxim Mikityanskiy [this message]
