netdev.vger.kernel.org archive mirror
* [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
@ 2021-07-07  8:16 Íñigo Huguet
  2021-07-07  8:16 ` [PATCH 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-07  8:16 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, linux-kernel, ihuguet

Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues")

The buggy commit intended to allocate fewer channels for XDP in order to
make it less likely to reach the driver's limit of 32 channels.

The idea was to use each IRQ/event queue for more XDP TX queues than
before, calculating the maximum number of TX queues that one event queue
can handle. For example, in EF10 each event queue can handle up to 8
queues, up from the 4 handled before the change. This way, only half as
many channels as before would have to be allocated for XDP TX.

The problem is that the TX queues are also contained inside the channel
structs, and there are only 4 queues per channel. Reducing the number of
channels therefore also reduces the number of queues, so the desired one
queue per CPU is no longer achieved.

This leads to errors on XDP_TX and XDP_REDIRECT if they're executed
from a high-numbered CPU, because queues only exist for the lower half
of the CPUs. If XDP_TX/REDIRECT is executed on a low-numbered CPU, the
error doesn't happen. This is the error in the logs (repeated many
times, even rate limited):
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)

This error happens in efx_xdp_tx_buffers(), which expects to have a
dedicated XDP TX queue per CPU.

Reverting the change again makes it more likely to reach the limit of
32 channels on machines with many CPUs. If this happens, no
XDP_TX/REDIRECT will be possible at all, and these error messages will
be logged:

At interface probe:
sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32)

At every subsequent XDP_TX/REDIRECT failure, rate limited:
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)

However, without the revert, the user is led to think that everything
is OK at probe time, only for it to fail later in an unpredictable way,
depending on the CPU that handles the packet.

It is better to restore the predictable behaviour. If the user sees the
error message at probe time, they can try to configure the interface in
the way that best fits their needs. At least, they will have two options:
- Accept that XDP_TX/REDIRECT is not available (they may not need it)
- Load the sfc module with the modparam 'rss_cpus' set to a lower
  number, thus creating fewer normal RX queues/channels and leaving more
  resources free for XDP, at some performance cost.
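
As a concrete illustration of the rss_cpus option mentioned above, the
parameter could be set persistently via modprobe configuration. The value 8
here is only an example; a suitable number depends on how many MSI-X vectors
and channel slots must remain free for XDP on the machine in question:

```shell
# /etc/modprobe.d/sfc.conf
# Limit the number of normal RX channels so that MSI-X vectors and
# channel slots remain available for the XDP TX channels.
options sfc rss_cpus=8
```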

Related mailing list thread:
https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a3ca406a3561..4fa5d675b6d4 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -17,7 +17,6 @@
 #include "rx_common.h"
 #include "nic.h"
 #include "sriov.h"
-#include "workarounds.h"
 
 /* This is the first interrupt mode to try out of:
  * 0 => MSI-X
@@ -138,7 +137,6 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 {
 	unsigned int n_channels = parallelism;
 	int vec_count;
-	int tx_per_ev;
 	int n_xdp_tx;
 	int n_xdp_ev;
 
@@ -151,9 +149,9 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 	 * multiple tx queues, assuming tx and ev queues are both
 	 * maximum size.
 	 */
-	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
+
 	n_xdp_tx = num_possible_cpus();
-	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
+	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, EFX_MAX_TXQ_PER_CHANNEL);
 
 	vec_count = pci_msix_vec_count(efx->pci_dev);
 	if (vec_count < 0)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues"
  2021-07-07  8:16 [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Íñigo Huguet
@ 2021-07-07  8:16 ` Íñigo Huguet
  2021-07-07  8:16 ` [PATCH 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-07  8:16 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, linux-kernel, ihuguet

This reverts commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count
with the real number of initialized queues"), which intended to fix a
problem caused by a round-up when calculating the number of XDP channels
and queues.

However, that was not the real problem. The real problem was that the
number of XDP TX queues created had been reduced to half in
commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"),
but the variable xdp_tx_queue_count had remained the same.

After reverting that commit in the previous patch of this series, this
one can also be reverted, since the problem it addressed no longer
exists.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index 4fa5d675b6d4..a4a626e9cd9a 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -912,8 +912,6 @@ int efx_set_channels(struct efx_nic *efx)
 			}
 		}
 	}
-	if (xdp_queue_number)
-		efx->xdp_tx_queue_count = xdp_queue_number;
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)
-- 
2.31.1



* [PATCH 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available
  2021-07-07  8:16 [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Íñigo Huguet
  2021-07-07  8:16 ` [PATCH 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
@ 2021-07-07  8:16 ` Íñigo Huguet
  2021-07-07 11:23 ` [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Edward Cree
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-07  8:16 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, linux-kernel, ihuguet

If it's not possible to allocate enough channels for XDP, XDP_TX and
XDP_REDIRECT don't work. However, only a message saying that not enough
channels were available was logged, without explaining the consequences.
The user didn't know whether XDP could be used at all, whether
performance would be reduced, or anything else.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a4a626e9cd9a..3c732bbe1589 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -167,6 +167,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		netif_err(efx, drv, efx->net_dev,
 			  "Insufficient resources for %d XDP event queues (%d other channels, max %d)\n",
 			  n_xdp_ev, n_channels, max_channels);
+		netif_err(efx, drv, efx->net_dev,
+			  "XDP_TX and XDP_REDIRECT will not work on this interface");
 		efx->n_xdp_channels = 0;
 		efx->xdp_tx_per_channel = 0;
 		efx->xdp_tx_queue_count = 0;
@@ -174,6 +176,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		netif_err(efx, drv, efx->net_dev,
 			  "Insufficient resources for %d XDP TX queues (%d other channels, max VIs %d)\n",
 			  n_xdp_tx, n_channels, efx->max_vis);
+		netif_err(efx, drv, efx->net_dev,
+			  "XDP_TX and XDP_REDIRECT will not work on this interface");
 		efx->n_xdp_channels = 0;
 		efx->xdp_tx_per_channel = 0;
 		efx->xdp_tx_queue_count = 0;
-- 
2.31.1



* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-07  8:16 [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Íñigo Huguet
  2021-07-07  8:16 ` [PATCH 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
  2021-07-07  8:16 ` [PATCH 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
@ 2021-07-07 11:23 ` Edward Cree
  2021-07-07 11:49   ` Íñigo Huguet
  2021-07-09 12:55 ` [PATCH v2 0/3] Fix lack of XDP TX queues Íñigo Huguet
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
  4 siblings, 1 reply; 23+ messages in thread
From: Edward Cree @ 2021-07-07 11:23 UTC (permalink / raw)
  To: Íñigo Huguet, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, linux-kernel

On 07/07/2021 09:16, Íñigo Huguet wrote:
> The problem is that the TX queues are also contained inside the channel
> structs, and there are only 4 queues per channel. Reducing the number of
> channels means also reducing the number of queues, resulting in not
> having the desired number of 1 queue per CPU.
> 
> This leads to getting errors on XDP_TX and XDP_REDIRECT if they're
> executed from a high numbered CPU, because there only exist queues for
> the low half of CPUs, actually.

Should we then be using min(tx_per_ev, EFX_MAX_TXQ_PER_CHANNEL) in the
 DIV_ROUND_UP?
And on line 184 probably we need to set efx->xdp_tx_per_channel to the
 same thing, rather than blindly to EFX_MAX_TXQ_PER_CHANNEL as at
 present — I suspect the issue you mention in patch #2 stemmed from
 that.
Note that if we are in fact hitting this limitation (i.e. if
 tx_per_ev > EFX_MAX_TXQ_PER_CHANNEL), we could readily increase
 EFX_MAX_TXQ_PER_CHANNEL at the cost of a little host memory, enabling
 us to make more efficient use of our EVQs and thus retain XDP TX
 support up to a higher number of CPUs.

-ed


* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-07 11:23 ` [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Edward Cree
@ 2021-07-07 11:49   ` Íñigo Huguet
  2021-07-07 13:01     ` Martin Habets
  0 siblings, 1 reply; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-07 11:49 UTC (permalink / raw)
  To: Edward Cree
  Cc: habetsm.xilinx, David S. Miller, Jakub Kicinski, ivan, ast,
	daniel, hawk, john.fastabend, netdev, linux-kernel

On Wed, Jul 7, 2021 at 1:23 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
> Should we then be using min(tx_per_ev, EFX_MAX_TXQ_PER_CHANNEL) in the
>  DIV_ROUND_UP?

That could be another possibility, but currently it will always result
in EFX_MAX_TXQ_PER_CHANNEL, because tx_per_ev will be 4 or 8 depending
on the model. Anyway, I will add this change in v2, just in case any
constant is changed in the future.

> And on line 184 probably we need to set efx->xdp_tx_per_channel to the
>  same thing, rather than blindly to EFX_MAX_TXQ_PER_CHANNEL as at
>  present — I suspect the issue you mention in patch #2 stemmed from
>  that.
> Note that if we are in fact hitting this limitation (i.e. if
>  tx_per_ev > EFX_MAX_TXQ_PER_CHANNEL), we could readily increase
>  EFX_MAX_TXQ_PER_CHANNEL at the cost of a little host memory, enabling
>  us to make more efficient use of our EVQs and thus retain XDP TX
>  support up to a higher number of CPUs.

Yes, that was a possibility I was thinking of as a long-term solution,
or even allocating the queues dynamically. Would this be a problem?
What's the reason for them being statically allocated? Also, what's
the reason for the channels being limited to 32? The hardware can be
configured to provide more than that, but the driver has this constant
limit.

Another question I have, thinking about the long-term solution: would
it be a problem to use the standard TX queues for XDP_TX/REDIRECT? At
least in the case that we're hitting the resource limits, I think the
packets could be enqueued to those queues. Just taking netif_tx_lock,
or a per-queue lock, would avoid race conditions.

In any case, these are 2 different things: one is fixing this bug as
soon as possible, and the other is designing and implementing the
long-term solution to the shortage-of-resources problem.

Regards
-- 
Íñigo Huguet



* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-07 11:49   ` Íñigo Huguet
@ 2021-07-07 13:01     ` Martin Habets
  2021-07-08 12:14       ` Íñigo Huguet
  0 siblings, 1 reply; 23+ messages in thread
From: Martin Habets @ 2021-07-07 13:01 UTC (permalink / raw)
  To: Íñigo Huguet
  Cc: Edward Cree, David S. Miller, Jakub Kicinski, ivan, ast, daniel,
	hawk, john.fastabend, netdev, linux-kernel

On Wed, Jul 07, 2021 at 01:49:40PM +0200, Íñigo Huguet wrote:
> > And on line 184 probably we need to set efx->xdp_tx_per_channel to the
> >  same thing, rather than blindly to EFX_MAX_TXQ_PER_CHANNEL as at
> >  present — I suspect the issue you mention in patch #2 stemmed from
> >  that.
> > Note that if we are in fact hitting this limitation (i.e. if
> >  tx_per_ev > EFX_MAX_TXQ_PER_CHANNEL), we could readily increase
> >  EFX_MAX_TXQ_PER_CHANNEL at the cost of a little host memory, enabling
> >  us to make more efficient use of our EVQs and thus retain XDP TX
> >  support up to a higher number of CPUs.
> 
> Yes, that was a possibility I was thinking of as long term solution,
> or even allocate the queues dynamically. Would this be a problem?
> What's the reason for them being statically allocated? Also, what's
> the reason for the channels being limited to 32? The hardware can be
> configured to provide more than that, but the driver has this constant
> limit.

The static defines in this area are historic only. We have wanted to
remove them for a number of years. With newer hardware the reasons to
do so are ever increasing, so we are more actively working on this now.

> Another question I have, thinking about the long term solution: would
> it be a problem to use the standard TX queues for XDP_TX/REDIRECT? At
> least in the case that we're hitting the resources limits, I think
> that they could be enqueued to these queues. I think that just taking
> netif_tx_lock would avoid race conditions, or a per-queue lock.

We considered this but did not want normal traffic to be delayed by
XDP traffic. The perceived performance drop on a normal queue would
be tricky to diagnose, and the only way to prevent it would be to
disable XDP on the interface altogether. There is no way to do the
latter per interface, and we felt the "solution" of disabling XDP
was not a good way forward.
Of course, our design of this was all done several years ago.

Regards,
Martin Habets


* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-07 13:01     ` Martin Habets
@ 2021-07-08 12:14       ` Íñigo Huguet
  2021-07-09 14:07         ` Edward Cree
  0 siblings, 1 reply; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-08 12:14 UTC (permalink / raw)
  To: Íñigo Huguet, Edward Cree, David S. Miller,
	Jakub Kicinski, ivan, ast, daniel, hawk, john.fastabend, netdev,
	linux-kernel

On Wed, Jul 7, 2021 at 3:01 PM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> > Another question I have, thinking about the long term solution: would
> > it be a problem to use the standard TX queues for XDP_TX/REDIRECT? At
> > least in the case that we're hitting the resources limits, I think
> > that they could be enqueued to these queues. I think that just taking
> > netif_tx_lock would avoid race conditions, or a per-queue lock.
>
> We considered this but did not want normal traffic to get delayed for
> XDP traffic. The perceived performance drop on a normal queue would
> be tricky to diagnose, and the only way to prevent it would be to
> disable XDP on the interface all together. There is no way to do the
> latter per interface, and we felt the "solution" of disabling XDP
> was not a good way forward.
> Off course our design of this was all done several years ago.

In my opinion, there is no reason to make that distinction between
normal traffic and XDP traffic. Traffic redirected with XDP_TX or
XDP_REDIRECT is traffic that the user has chosen to redirect that way,
pushing the work down the stack. Without XDP, this traffic would have
gone up the stack to userspace, or at least to the firewall, and then
been redirected, passed again to the network stack and added to the
normal TX queues.

If the user wants to prevent XDP from mixing with normal traffic,
simply not attaching an XDP program to the interface, or not using
XDP_TX/REDIRECT in it, would be enough. Probably I don't understand
what you mean here.

Anyway, if you think that keeping the XDP TX queues separated is the
way to go, that's OK, but my proposal is to share the normal TX queues
at least in the cases where dedicated queues cannot be allocated. As
you say, the performance drop would be tricky to measure, if there's
any, but in any case, even with separated queues, they're competing
for CPU, PCI bandwidth and network bandwidth...

The fact is that the situation right now is this one:
- Many times (or almost always with modern server processors),
XDP_TX/REDIRECT doesn't work at all
- The only workaround is reducing the number of normal channels to free
up resources for XDP, but this is a much higher performance drop for
normal traffic than sharing queues with XDP, IMHO.

Increasing the maximum number of channels and queues, or even making
them virtually unlimited, would be very good, I think, because people
who know how to configure the hardware would take advantage of it, but
there will always be situations of running short of resources:
- Who knows how many cores we will be using five years from now?
- VFs normally have fewer resources available: 8 MSI-X vectors by default

With some time, I can try to prepare some patches with these changes,
if you agree.

Regards
-- 
Íñigo Huguet



* [PATCH v2 0/3] Fix lack of XDP TX queues
  2021-07-07  8:16 [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Íñigo Huguet
                   ` (2 preceding siblings ...)
  2021-07-07 11:23 ` [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Edward Cree
@ 2021-07-09 12:55 ` Íñigo Huguet
  2021-07-09 12:55   ` [PATCH v2 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
                     ` (2 more replies)
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
  4 siblings, 3 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-09 12:55 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, ihuguet

A change introduced in commit e26ca4b53582 ("sfc: reduce the number of
requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT
because it unintentionally reduced the number of XDP TX queues, leaving
too few queues to have one per CPU. This led to errors if XDP
TX/REDIRECT was done from a high-numbered CPU.

This patch series makes the following changes:
- Fix the bug mentioned above
- Revert commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with
  the real number of initialized queues"), which intended to fix a
  related problem created by the mentioned bug but is no longer
  necessary
- Add a new error log message if there are not enough resources to make
  XDP_TX/REDIRECT work

V1 -> V2: keep the calculation of how many TX queues a single event
queue can handle, but apply the "max. TX queues per channel" upper
limit.

Íñigo Huguet (3):
  sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
  sfc: revert "adjust efx->xdp_tx_queue_count with the real number of
    initialized queues"
  sfc: add logs explaining XDP_TX/REDIRECT is not available

 drivers/net/ethernet/sfc/efx_channels.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-- 
2.31.1



* [PATCH v2 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
  2021-07-09 12:55 ` [PATCH v2 0/3] Fix lack of XDP TX queues Íñigo Huguet
@ 2021-07-09 12:55   ` Íñigo Huguet
  2021-07-09 12:55   ` [PATCH v2 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
  2021-07-09 12:55   ` [PATCH v2 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
  2 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-09 12:55 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, ihuguet

Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues")

The buggy commit intended to allocate fewer channels for XDP in order to
make it less likely to reach the driver's limit of 32 channels.

The idea was to use each IRQ/event queue for more XDP TX queues than
before, calculating the maximum number of TX queues that one event queue
can handle. For example, in EF10 each event queue can handle up to 8
queues, up from the 4 handled before the change. This way, only half as
many channels as before would have to be allocated for XDP TX.

The problem is that the TX queues are also contained inside the channel
structs, and there are only 4 queues per channel. Reducing the number of
channels therefore also reduces the number of queues, so the desired one
queue per CPU is no longer achieved.

This leads to errors on XDP_TX and XDP_REDIRECT if they're executed
from a high-numbered CPU, because queues only exist for the lower half
of the CPUs. If XDP_TX/REDIRECT is executed on a low-numbered CPU, the
error doesn't happen. This is the error in the logs (repeated many
times, even rate limited):
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)

This error happens in efx_xdp_tx_buffers(), which expects to have a
dedicated XDP TX queue per CPU.

Reverting the change again makes it more likely to reach the limit of
32 channels on machines with many CPUs. If this happens, no
XDP_TX/REDIRECT will be possible at all, and these error messages will
be logged:

At interface probe:
sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32)

At every subsequent XDP_TX/REDIRECT failure, rate limited:
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)

However, without the revert, the user is led to think that everything
is OK at probe time, only for it to fail later in an unpredictable way,
depending on the CPU that handles the packet.

It is better to restore the predictable behaviour. If the user sees the
error message at probe time, they can try to configure the interface in
the way that best fits their needs. At least, they will have two options:
- Accept that XDP_TX/REDIRECT is not available (they may not need it)
- Load the sfc module with the modparam 'rss_cpus' set to a lower
  number, thus creating fewer normal RX queues/channels and leaving more
  resources free for XDP, at some performance cost.

Anyway, keep the calculation of the maximum number of TX queues that a
single event queue can handle, and use it only if it's less than the
number of TX queues per channel. This doesn't happen in practice, but
could happen if some constant values are tweaked in the future, such as
EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE.

Related mailing list thread:
https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a3ca406a3561..5b71f8a03a6d 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -152,6 +152,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 	 * maximum size.
 	 */
 	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
+	tx_per_ev = min(tx_per_ev, EFX_MAX_TXQ_PER_CHANNEL);
 	n_xdp_tx = num_possible_cpus();
 	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
 
@@ -181,7 +182,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		efx->xdp_tx_queue_count = 0;
 	} else {
 		efx->n_xdp_channels = n_xdp_ev;
-		efx->xdp_tx_per_channel = EFX_MAX_TXQ_PER_CHANNEL;
+		efx->xdp_tx_per_channel = tx_per_ev;
 		efx->xdp_tx_queue_count = n_xdp_tx;
 		n_channels += n_xdp_ev;
 		netif_dbg(efx, drv, efx->net_dev,
-- 
2.31.1



* [PATCH v2 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues"
  2021-07-09 12:55 ` [PATCH v2 0/3] Fix lack of XDP TX queues Íñigo Huguet
  2021-07-09 12:55   ` [PATCH v2 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
@ 2021-07-09 12:55   ` Íñigo Huguet
  2021-07-09 13:53     ` Edward Cree
  2021-07-09 12:55   ` [PATCH v2 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
  2 siblings, 1 reply; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-09 12:55 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, ihuguet

This reverts commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count
with the real number of initialized queues"), which intended to fix a
problem caused by a round-up when calculating the number of XDP channels
and queues.

However, that was not the real problem. The real problem was that the
number of XDP TX queues created had been reduced to half in
commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"),
but the variable xdp_tx_queue_count had remained the same.

After reverting that commit in the previous patch of this series, this
one can also be reverted, since the problem it addressed no longer
exists.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index 5b71f8a03a6d..e25c8f9d9ff4 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -915,8 +915,6 @@ int efx_set_channels(struct efx_nic *efx)
 			}
 		}
 	}
-	if (xdp_queue_number)
-		efx->xdp_tx_queue_count = xdp_queue_number;
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)
-- 
2.31.1



* [PATCH v2 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available
  2021-07-09 12:55 ` [PATCH v2 0/3] Fix lack of XDP TX queues Íñigo Huguet
  2021-07-09 12:55   ` [PATCH v2 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
  2021-07-09 12:55   ` [PATCH v2 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
@ 2021-07-09 12:55   ` Íñigo Huguet
  2 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-09 12:55 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, ihuguet

If it's not possible to allocate enough channels for XDP, XDP_TX and
XDP_REDIRECT don't work. However, only a message saying that not enough
channels were available was logged, without explaining the consequences.
The user didn't know whether XDP could be used at all, whether
performance would be reduced, or anything else.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index e25c8f9d9ff4..b51957e1a7de 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -170,6 +170,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		netif_err(efx, drv, efx->net_dev,
 			  "Insufficient resources for %d XDP event queues (%d other channels, max %d)\n",
 			  n_xdp_ev, n_channels, max_channels);
+		netif_err(efx, drv, efx->net_dev,
+			  "XDP_TX and XDP_REDIRECT will not work on this interface");
 		efx->n_xdp_channels = 0;
 		efx->xdp_tx_per_channel = 0;
 		efx->xdp_tx_queue_count = 0;
@@ -177,6 +179,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		netif_err(efx, drv, efx->net_dev,
 			  "Insufficient resources for %d XDP TX queues (%d other channels, max VIs %d)\n",
 			  n_xdp_tx, n_channels, efx->max_vis);
+		netif_err(efx, drv, efx->net_dev,
+			  "XDP_TX and XDP_REDIRECT will not work on this interface");
 		efx->n_xdp_channels = 0;
 		efx->xdp_tx_per_channel = 0;
 		efx->xdp_tx_queue_count = 0;
-- 
2.31.1



* Re: [PATCH v2 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues"
  2021-07-09 12:55   ` [PATCH v2 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
@ 2021-07-09 13:53     ` Edward Cree
  0 siblings, 0 replies; 23+ messages in thread
From: Edward Cree @ 2021-07-09 13:53 UTC (permalink / raw)
  To: Íñigo Huguet, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev

On 09/07/2021 13:55, Íñigo Huguet wrote:
> This reverts commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count
> with the real number of initialized queues"). It intended to fix a
> problem caused by a round up when calculating the number of XDP channels
> and queues.
> 
> However, that was not the real problem. The real problem was that the
> number of XDP TX queues created had been reduced to half in
> commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"),
> but the variable xdp_tx_queue_count had remained the same.
> 
> After reverting that commit in the previous patch of this series, this
> also can be reverted since the error doesn't actually exist.
> 
> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
> ---
>  drivers/net/ethernet/sfc/efx_channels.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index 5b71f8a03a6d..e25c8f9d9ff4 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -915,8 +915,6 @@ int efx_set_channels(struct efx_nic *efx)
>  			}
>  		}
>  	}
> -	if (xdp_queue_number)
> -		efx->xdp_tx_queue_count = xdp_queue_number;
>  

Probably best to add a WARN_ON[_ONCE] here to catch if these ever
 aren't already equal.  Or at least a netif_warn().

-ed


* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-08 12:14       ` Íñigo Huguet
@ 2021-07-09 14:07         ` Edward Cree
  2021-07-09 15:06           ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 23+ messages in thread
From: Edward Cree @ 2021-07-09 14:07 UTC (permalink / raw)
  To: Íñigo Huguet, David S. Miller, Jakub Kicinski, ivan,
	ast, daniel, hawk, john.fastabend, netdev, linux-kernel

On 08/07/2021 13:14, Íñigo Huguet wrote:
> In my opinion, there is no reason to make that distinction between
> normal traffic and XDP traffic.
> [...]
> If the user wants to prevent XDP from mixing with normal traffic, just
> not attaching an XDP program to the interface, or not using
> XDP_TX/REDIRECT in it would be enough. Probably I don't understand
> what you want to say here.

I think it's less about that and more about avoiding lock contention.
If two sources (XDP and the regular stack) are both trying to use a TXQ,
 and contending for a lock, it's possible that the resulting total
 throughput could be far less than either source alone would get if it
 had exclusive use of a queue.
There don't really seem to be any good answers to this; any CPU in the
 system can initiate an XDP_REDIRECT at any time and if they can't each
 get a queue to themselves then I don't see how the arbitration can be
 performant.  (There is the middle-ground possibility of TXQs shared by
 multiple XDP CPUs but not shared with the regular stack, in which case
 if only a subset of CPUs are actually handling RX on the device(s) with
 an XDP_REDIRECTing program it may be possible to avoid contention if
 the core-to-XDP-TXQ mapping can be carefully configured.)

-ed

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-09 14:07         ` Edward Cree
@ 2021-07-09 15:06           ` Jesper Dangaard Brouer
  2021-07-12 13:40             ` Íñigo Huguet
  0 siblings, 1 reply; 23+ messages in thread
From: Jesper Dangaard Brouer @ 2021-07-09 15:06 UTC (permalink / raw)
  To: Edward Cree, Íñigo Huguet, David S. Miller,
	Jakub Kicinski, ivan, ast, daniel, hawk, john.fastabend, netdev,
	linux-kernel
  Cc: brouer



On 09/07/2021 16.07, Edward Cree wrote:
> On 08/07/2021 13:14, Íñigo Huguet wrote:
>> In my opinion, there is no reason to make that distinction between
>> normal traffic and XDP traffic.
>> [...]
>> If the user wants to prevent XDP from mixing with normal traffic, just
>> not attaching an XDP program to the interface, or not using
>> XDP_TX/REDIRECT in it would be enough. Probably I don't understand
>> what you want to say here.
> 
> I think it's less about that and more about avoiding lock contention.
> If two sources (XDP and the regular stack) are both trying to use a TXQ,
>   and contending for a lock, it's possible that the resulting total
>   throughput could be far less than either source alone would get if it
>   had exclusive use of a queue.
> There don't really seem to be any good answers to this; any CPU in the
>   system can initiate an XDP_REDIRECT at any time and if they can't each
>   get a queue to themselves then I don't see how the arbitration can be
>   performant.  (There is the middle-ground possibility of TXQs shared by
>   multiple XDP CPUs but not shared with the regular stack, in which case
>   if only a subset of CPUs are actually handling RX on the device(s) with
>   an XDP_REDIRECTing program it may be possible to avoid contention if
>   the core-to-XDP-TXQ mapping can be carefully configured.)

Yes, I prefer the 'middle-ground' fallback you describe.  XDP gets its
own set of TXQs, and when the driver detects that there are fewer TXQs
than CPUs that can redirect packets, it uses an ndo_xdp_xmit function
that takes a (hashed) lock (this happens once per packet burst, max 16
packets).
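A minimal sketch of this fallback (all names, the stand-in "lock" counter,
and the plain modulo hash are illustrative, not sfc driver code): each CPU
is mapped onto one of the shared XDP TXQs, and the queue's lock is taken
once per burst rather than once per frame:

```c
#include <assert.h>

/* Illustrative only: a counter stands in for a real spinlock. */
struct xdp_txq {
	int lock_acquisitions;	/* how many times the queue lock was taken */
	int queued;		/* frames placed on the queue */
};

#define XDP_XMIT_MAX_BURST 16

static int pick_xdp_txq(int cpu, int n_txq)
{
	return cpu % n_txq;	/* the "(hashed)" CPU -> shared-queue mapping */
}

/* Transmit up to one burst of n frames from 'cpu'; returns frames queued.
 * The lock cost is paid once per burst, not once per frame. */
static int xdp_xmit_burst(struct xdp_txq *txqs, int n_txq, int cpu, int n)
{
	struct xdp_txq *q = &txqs[pick_xdp_txq(cpu, n_txq)];
	int sent = n < XDP_XMIT_MAX_BURST ? n : XDP_XMIT_MAX_BURST;

	q->lock_acquisitions++;	/* spin_lock(&q->lock) in a real driver */
	q->queued += sent;
	/* spin_unlock(&q->lock) */
	return sent;
}
```

With, say, 8 CPUs and 2 such queues, the even CPUs contend on queue 0 and
the odd CPUs on queue 1, but per-burst locking bounds the contention to
once per 16 frames.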

--Jesper


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-09 15:06           ` Jesper Dangaard Brouer
@ 2021-07-12 13:40             ` Íñigo Huguet
  2021-07-12 14:52               ` Edward Cree
  0 siblings, 1 reply; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-12 13:40 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Edward Cree, David S. Miller, Jakub Kicinski, ivan, ast, daniel,
	hawk, john.fastabend, netdev, linux-kernel, brouer

On Fri, Jul 9, 2021 at 5:07 PM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
> > I think it's less about that and more about avoiding lock contention.
> > If two sources (XDP and the regular stack) are both trying to use a TXQ,
> >   and contending for a lock, it's possible that the resulting total
> >   throughput could be far less than either source alone would get if it
> >   had exclusive use of a queue.
> > There don't really seem to be any good answers to this; any CPU in the
> >   system can initiate an XDP_REDIRECT at any time and if they can't each
> >   get a queue to themselves then I don't see how the arbitration can be
> >   performant.  (There is the middle-ground possibility of TXQs shared by
> >   multiple XDP CPUs but not shared with the regular stack, in which case
> >   if only a subset of CPUs are actually handling RX on the device(s) with
> >   an XDP_REDIRECTing program it may be possible to avoid contention if
> >   the core-to-XDP-TXQ mapping can be carefully configured.)
>
> Yes, I prefer the 'middle-ground' fallback you describe.  XDP gets its
> own set of TXQs, and when the driver detects that there are fewer TXQs
> than CPUs that can redirect packets, it uses an ndo_xdp_xmit function
> that takes a (hashed) lock (this happens once per packet burst, max 16
> packets).

That's a good idea, which in fact I had already considered, but had
(almost) discarded because I still see two problems with it:
1. If there are no free MSI-X vectors remaining at all,
XDP_TX/REDIRECT will still be disabled.
2. If the number of free MSI-X vectors is small, many CPUs will be
contending for very few queues/locks, not for normal traffic but for
XDP traffic. Anyone who wants to use XDP_TX/REDIRECT intensively will
get very poor performance, with no option to get a better tradeoff
between normal and XDP traffic.

We have to consider that both scenarios are very feasible, because this
problem appears on machines with a high number of CPUs. Even if support
for more channels and more queues per channel is added, who knows what
crazy numbers of CPU cores we will be using in a few years? And we also
have to consider VFs, which usually have far fewer MSI-X vectors
available, and can be assigned to many different configurations of
virtual machines.

So I think that we still need a last resort fallback of sharing TXQs
with the network stack:
1. If there are enough resources: one queue per CPU for XDP
2. If there are not enough resources, but still a fair amount: several
queues dedicated only to XDP, with (hashed) locking contention
3. If there are no free resources, or very few: TXQs shared between
the network core and XDP
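The three tiers above could be expressed as a small decision helper.
This is only a sketch under the assumption that free TXQ resources can
be counted up front; all names are hypothetical, and the "fair amount"
boundary is collapsed to "any at all" for brevity:

```c
#include <assert.h>

/* Hypothetical modes matching the three-tier fallback described above. */
enum xdp_txq_mode {
	XDP_TXQ_PER_CPU,	/* 1: one dedicated XDP queue per CPU */
	XDP_TXQ_SHARED_XDP,	/* 2: fewer XDP-only queues, hashed locking */
	XDP_TXQ_SHARED_STACK,	/* 3: borrow the regular stack's TXQs */
};

static enum xdp_txq_mode choose_xdp_txq_mode(int free_txqs, int n_cpus)
{
	if (free_txqs >= n_cpus)
		return XDP_TXQ_PER_CPU;
	if (free_txqs > 0)
		return XDP_TXQ_SHARED_XDP;
	return XDP_TXQ_SHARED_STACK;	/* last resort: contend with the stack */
}
```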

Of course, there is always the option of tweaking driver and hardware
parameters to, for example, increase the amount of available resources.
But if the user doesn't do that, I think we should give them a good
enough tradeoff by default. Users who don't use XDP won't notice
anything at all; users who use it intensively and don't get the desired
performance will have to tweak parameters.

Jesper has a workshop tomorrow at netdevconf where this topic will be
discussed. Please let us know if you come up with any good new ideas.

Regards
-- 
Íñigo Huguet


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-12 13:40             ` Íñigo Huguet
@ 2021-07-12 14:52               ` Edward Cree
  2021-07-13  6:20                 ` Íñigo Huguet
  0 siblings, 1 reply; 23+ messages in thread
From: Edward Cree @ 2021-07-12 14:52 UTC (permalink / raw)
  To: Íñigo Huguet, Jesper Dangaard Brouer
  Cc: David S. Miller, Jakub Kicinski, ivan, ast, daniel, hawk,
	john.fastabend, netdev, linux-kernel, brouer

On 12/07/2021 14:40, Íñigo Huguet wrote:
> That's a good idea, which in fact I had already considered, but had
> (almost) discarded because I still see two problems with it:
> 1. If there are no free MSI-X vectors remaining at all,
> XDP_TX/REDIRECT will still be disabled.
> 2. If the number of free MSI-X vectors is small, many CPUs will be
> contending for very few queues/locks, not for normal traffic but for
> XDP traffic. Anyone who wants to use XDP_TX/REDIRECT intensively will
> get very poor performance, with no option to get a better tradeoff
> between normal and XDP traffic.
[snip]
> So I think that we still need a last resort fallback of sharing TXQs
> with the network stack:
> 1. If there are enough resources: one queue per CPU for XDP
> 2. If there are not enough resources, but still a fair amount: several
> queues dedicated only to XDP, with (hashed) locking contention
> 3. If there are no free resources, or very few: TXQs shared between
> the network core and XDP

I think the proper solution to this is to get this policy decision out
 of the kernel, and make the allocation of queues be under the control
 of userspace.  I recall some discussion a couple of years ago about
 "making queues a first-class citizen" for the sake of AF_XDP; this
 seems to be part and parcel of that.
But I don't know what such an interface would/should look like.

-ed

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues"
  2021-07-12 14:52               ` Edward Cree
@ 2021-07-13  6:20                 ` Íñigo Huguet
  0 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-13  6:20 UTC (permalink / raw)
  To: Edward Cree
  Cc: Jesper Dangaard Brouer, David S. Miller, Jakub Kicinski, ivan,
	ast, daniel, hawk, john.fastabend, netdev, linux-kernel,
	Jesper Brouer

On Mon, Jul 12, 2021 at 4:53 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
> I think the proper solution to this is to get this policy decision out
>  of the kernel, and make the allocation of queues be under the control
>  of userspace.  I recall some discussion a couple of years ago about
>  "making queues a first-class citizen" for the sake of AF_XDP; this
>  seems to be part and parcel of that.
> But I don't know what such an interface would/should look like.

I absolutely agree, or at least it should be standardized for all
drivers via XDP APIs. Let's see if any good ideas arise at netdevconf.

Anyway, in the meantime we should do something to have it working.
-- 
Íñigo Huguet


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 0/3] Fix lack of XDP TX queues
  2021-07-07  8:16 [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Íñigo Huguet
                   ` (3 preceding siblings ...)
  2021-07-09 12:55 ` [PATCH v2 0/3] Fix lack of XDP TX queues Íñigo Huguet
@ 2021-07-13 14:21 ` Íñigo Huguet
  2021-07-13 14:21   ` [PATCH v3 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
                     ` (4 more replies)
  4 siblings, 5 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-13 14:21 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, Íñigo Huguet

A change introduced in commit e26ca4b53582 ("sfc: reduce the number of
requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT
because it unintentionally reduced the number of XDP TX queues, leaving
too few queues to have one per CPU, which led to errors if XDP
TX/REDIRECT was done from a high numbered CPU.

This patch series makes the following changes:
- Fix the bug mentioned above
- Revert commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with
  the real number of initialized queues"), which intended to fix a
  related problem created by the mentioned bug but is no longer necessary
- Add a new error log message if there are not enough resources to make
  XDP_TX/REDIRECT work

V1 -> V2: keep the calculation of how many TX queues a single event
queue can handle, but apply the "max. TX queues per channel" upper limit.
V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the
expected number.

Íñigo Huguet (3):
  sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
  sfc: ensure correct number of XDP queues
  sfc: add logs explaining XDP_TX/REDIRECT is not available

 drivers/net/ethernet/sfc/efx_channels.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
@ 2021-07-13 14:21   ` Íñigo Huguet
  2021-07-13 14:21   ` [PATCH v3 2/3] sfc: ensure correct number of XDP queues Íñigo Huguet
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-13 14:21 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, Íñigo Huguet

Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues")

The buggy commit intended to allocate fewer channels for XDP in order
to make it less likely to reach the driver's limit of 32 channels.

The idea was to use each IRQ/event queue for more XDP TX queues than
before, calculating the maximum number of TX queues that one event
queue can handle. For example, on EF10 each event queue could handle
up to 8 queues, up from the 4 it was handling before the change. This
way, only half as many channels as before would have to be allocated
for XDP TX.

The problem is that the TX queues are also contained inside the channel
structs, and there are only 4 queues per channel. Reducing the number of
channels therefore also reduces the number of queues, resulting in not
having the desired number of one queue per CPU.

This leads to errors on XDP_TX and XDP_REDIRECT if they're executed
from a high numbered CPU, because queues actually only exist for the
lower half of CPUs. If XDP_TX/REDIRECT is executed on a low numbered
CPU, the error doesn't happen. This is the error in the logs (repeated
many times, even rate limited):
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)

This error happens in the function efx_xdp_tx_buffers, which expects to
have a dedicated XDP TX queue per CPU.

Reverting the change again makes it more likely to reach the limit of
32 channels on machines with many CPUs. If this happens, no
XDP_TX/REDIRECT will be possible at all, and we will see these error
messages in the logs:

At interface probe:
sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32)

At every subsequent XDP_TX/REDIRECT failure, rate limited:
sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)

However, without reverting the change, the user is led to think that
everything is OK at probe time, but later it fails in an unpredictable
way, depending on the CPU that handles the packet.

It is better to restore the predictable behaviour. If users see the
error message at probe time, they can try to configure things to best
fit their needs. At least, they will have two options:
- Accept that XDP_TX/REDIRECT is not available (they may not need it)
- Load the sfc module with the modparam 'rss_cpus' set to a lower
  number, thus creating fewer normal RX queues/channels and leaving
  more free resources for XDP, at some performance penalty

Anyway, keep the calculation of the maximum number of TX queues that a
single event queue can handle, and use it only if it's less than the
number of TX queues per channel. This doesn't happen in practice, but
could happen if some constant values are tweaked in the future, such as
EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE.
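The capped calculation can be sketched numerically. The constants are
plain parameters here with example values in the test, not the real sfc
header values, and the function name is illustrative:

```c
#include <assert.h>

/* How many XDP event queues/channels are needed to get one TX queue per
 * CPU, once tx_per_ev is capped at the per-channel TXQ limit. */
static int n_xdp_ev_needed(int max_evq_size, int txq_max_ent,
			   int max_txq_per_channel, int n_cpus)
{
	int tx_per_ev = max_evq_size / txq_max_ent;

	if (tx_per_ev > max_txq_per_channel)
		tx_per_ev = max_txq_per_channel;	/* the added cap */
	return (n_cpus + tx_per_ev - 1) / tx_per_ev;	/* DIV_ROUND_UP */
}
```

For example, if an event queue could serve 8 TXQs but a channel only
holds 4, then 12 CPUs need 3 XDP channels (12 usable queues), not the 2
channels (8 queues) that the uncapped calculation would request.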

Related mailing list thread:
https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a3ca406a3561..5b71f8a03a6d 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -152,6 +152,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 	 * maximum size.
 	 */
 	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
+	tx_per_ev = min(tx_per_ev, EFX_MAX_TXQ_PER_CHANNEL);
 	n_xdp_tx = num_possible_cpus();
 	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
 
@@ -181,7 +182,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		efx->xdp_tx_queue_count = 0;
 	} else {
 		efx->n_xdp_channels = n_xdp_ev;
-		efx->xdp_tx_per_channel = EFX_MAX_TXQ_PER_CHANNEL;
+		efx->xdp_tx_per_channel = tx_per_ev;
 		efx->xdp_tx_queue_count = n_xdp_tx;
 		n_channels += n_xdp_ev;
 		netif_dbg(efx, drv, efx->net_dev,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/3] sfc: ensure correct number of XDP queues
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
  2021-07-13 14:21   ` [PATCH v3 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
@ 2021-07-13 14:21   ` Íñigo Huguet
  2021-07-13 14:21   ` [PATCH v3 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-13 14:21 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, Íñigo Huguet

Commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with the real
number of initialized queues") intended to fix a problem caused by a
round up when calculating the number of XDP channels and queues.
However, this was not the real problem. The real problem was that the
number of XDP TX queues had been reduced to half in
commit e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues"),
but the variable xdp_tx_queue_count had remained the same.

Once the correct number of XDP TX queues is created again in the
previous patch of this series, this also can be reverted since the error
doesn't actually exist.

Only if there is a bug in the code can xdp_queue_number and
efx->xdp_tx_queue_count have different values. Because of this, and per
Edward Cree's suggestion, I instead add a WARN_ON to catch it if this
happens again in the future.

Note that the number of allocated queues can be higher than the number
of used ones due to the round up, as explained in the existing comment
in the code. That's why we also have to stop increasing xdp_queue_number
beyond efx->xdp_tx_queue_count.
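The left-over queues come from that round up; a small sketch of the
arithmetic (illustrative, not driver code):

```c
#include <assert.h>

/* Queues are allocated per channel in groups of tx_per_ev, so the total
 * allocated can exceed xdp_tx_queue_count; the excess queues are probed
 * but never used, and xdp_queue_number must stop at the count. */
static int leftover_xdp_txqs(int xdp_tx_queue_count, int tx_per_ev)
{
	int n_ev = (xdp_tx_queue_count + tx_per_ev - 1) / tx_per_ev;

	return n_ev * tx_per_ev - xdp_tx_queue_count;
}
```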

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index 5b71f8a03a6d..bb48a139dd15 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -892,18 +892,20 @@ int efx_set_channels(struct efx_nic *efx)
 			if (efx_channel_is_xdp_tx(channel)) {
 				efx_for_each_channel_tx_queue(tx_queue, channel) {
 					tx_queue->queue = next_queue++;
-					netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
-						  channel->channel, tx_queue->label,
-						  xdp_queue_number, tx_queue->queue);
+
 					/* We may have a few left-over XDP TX
 					 * queues owing to xdp_tx_queue_count
 					 * not dividing evenly by EFX_MAX_TXQ_PER_CHANNEL.
 					 * We still allocate and probe those
 					 * TXQs, but never use them.
 					 */
-					if (xdp_queue_number < efx->xdp_tx_queue_count)
+					if (xdp_queue_number < efx->xdp_tx_queue_count) {
+						netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n",
+							  channel->channel, tx_queue->label,
+							  xdp_queue_number, tx_queue->queue);
 						efx->xdp_tx_queues[xdp_queue_number] = tx_queue;
-					xdp_queue_number++;
+						xdp_queue_number++;
+					}
 				}
 			} else {
 				efx_for_each_channel_tx_queue(tx_queue, channel) {
@@ -915,8 +917,7 @@ int efx_set_channels(struct efx_nic *efx)
 			}
 		}
 	}
-	if (xdp_queue_number)
-		efx->xdp_tx_queue_count = xdp_queue_number;
+	WARN_ON(xdp_queue_number != efx->xdp_tx_queue_count);
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
  2021-07-13 14:21   ` [PATCH v3 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
  2021-07-13 14:21   ` [PATCH v3 2/3] sfc: ensure correct number of XDP queues Íñigo Huguet
@ 2021-07-13 14:21   ` Íñigo Huguet
  2021-07-13 14:41   ` [PATCH v3 0/3] Fix lack of XDP TX queues Edward Cree
  2021-07-13 20:10   ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 23+ messages in thread
From: Íñigo Huguet @ 2021-07-13 14:21 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev, Íñigo Huguet

If it's not possible to allocate enough channels for XDP, XDP_TX and
XDP_REDIRECT don't work. However, only a message saying that not enough
channels were available was shown, without explaining the consequences.
The user couldn't tell whether XDP could still be used, whether
performance would be reduced, or anything else.

Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index bb48a139dd15..e5b0d795c301 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -170,6 +170,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		netif_err(efx, drv, efx->net_dev,
 			  "Insufficient resources for %d XDP event queues (%d other channels, max %d)\n",
 			  n_xdp_ev, n_channels, max_channels);
+		netif_err(efx, drv, efx->net_dev,
+			  "XDP_TX and XDP_REDIRECT will not work on this interface");
 		efx->n_xdp_channels = 0;
 		efx->xdp_tx_per_channel = 0;
 		efx->xdp_tx_queue_count = 0;
@@ -177,6 +179,8 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 		netif_err(efx, drv, efx->net_dev,
 			  "Insufficient resources for %d XDP TX queues (%d other channels, max VIs %d)\n",
 			  n_xdp_tx, n_channels, efx->max_vis);
+		netif_err(efx, drv, efx->net_dev,
+			  "XDP_TX and XDP_REDIRECT will not work on this interface");
 		efx->n_xdp_channels = 0;
 		efx->xdp_tx_per_channel = 0;
 		efx->xdp_tx_queue_count = 0;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/3] Fix lack of XDP TX queues
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
                     ` (2 preceding siblings ...)
  2021-07-13 14:21   ` [PATCH v3 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
@ 2021-07-13 14:41   ` Edward Cree
  2021-07-13 20:10   ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 23+ messages in thread
From: Edward Cree @ 2021-07-13 14:41 UTC (permalink / raw)
  To: Íñigo Huguet, habetsm.xilinx, davem, kuba, ivan
  Cc: ast, daniel, hawk, john.fastabend, netdev

On 13/07/2021 15:21, Íñigo Huguet wrote:
> A change introduced in commit e26ca4b53582 ("sfc: reduce the number of
> requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT
> because it unintentionally reduced the number of XDP TX queues, leaving
> too few queues to have one per CPU, which led to errors if XDP
> TX/REDIRECT was done from a high numbered CPU.
> 
> This patch series makes the following changes:
> - Fix the bug mentioned above
> - Revert commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with
>   the real number of initialized queues"), which intended to fix a
>   related problem created by the mentioned bug but is no longer necessary
> - Add a new error log message if there are not enough resources to make
>   XDP_TX/REDIRECT work
> 
> V1 -> V2: keep the calculation of how many TX queues a single event
> queue can handle, but apply the "max. TX queues per channel" upper limit.
> V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the
> expected number.
> 
> Íñigo Huguet (3):
>   sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
>   sfc: ensure correct number of XDP queues
>   sfc: add logs explaining XDP_TX/REDIRECT is not available
> 
>  drivers/net/ethernet/sfc/efx_channels.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 

For the series:
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/3] Fix lack of XDP TX queues
  2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
                     ` (3 preceding siblings ...)
  2021-07-13 14:41   ` [PATCH v3 0/3] Fix lack of XDP TX queues Edward Cree
@ 2021-07-13 20:10   ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 23+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-07-13 20:10 UTC (permalink / raw)
  To: Íñigo Huguet <ihuguet@redhat.com>
  Cc: ecree.xilinx, habetsm.xilinx, davem, kuba, ivan, ast, daniel,
	hawk, john.fastabend, netdev

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Tue, 13 Jul 2021 16:21:26 +0200 you wrote:
> A change introduced in commit e26ca4b53582 ("sfc: reduce the number of
> requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT
> because it unintentionally reduced the number of XDP TX queues, leaving
> too few queues to have one per CPU, which led to errors if XDP
> TX/REDIRECT was done from a high numbered CPU.
> 
> This patch series makes the following changes:
> - Fix the bug mentioned above
> - Revert commit 99ba0ea616aa ("sfc: adjust efx->xdp_tx_queue_count with
>   the real number of initialized queues"), which intended to fix a
>   related problem created by the mentioned bug but is no longer necessary
> - Add a new error log message if there are not enough resources to make
>   XDP_TX/REDIRECT work
> 
> [...]

Here is the summary with links:
  - [v3,1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
    https://git.kernel.org/netdev/net/c/f28100cb9c96
  - [v3,2/3] sfc: ensure correct number of XDP queues
    https://git.kernel.org/netdev/net/c/788bc000d4c2
  - [v3,3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available
    https://git.kernel.org/netdev/net/c/d2a16bde7732

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-07-13 20:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-07  8:16 [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Íñigo Huguet
2021-07-07  8:16 ` [PATCH 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
2021-07-07  8:16 ` [PATCH 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
2021-07-07 11:23 ` [PATCH 1/3] sfc: revert "reduce the number of requested xdp ev queues" Edward Cree
2021-07-07 11:49   ` Íñigo Huguet
2021-07-07 13:01     ` Martin Habets
2021-07-08 12:14       ` Íñigo Huguet
2021-07-09 14:07         ` Edward Cree
2021-07-09 15:06           ` Jesper Dangaard Brouer
2021-07-12 13:40             ` Íñigo Huguet
2021-07-12 14:52               ` Edward Cree
2021-07-13  6:20                 ` Íñigo Huguet
2021-07-09 12:55 ` [PATCH v2 0/3] Fix lack of XDP TX queues Íñigo Huguet
2021-07-09 12:55   ` [PATCH v2 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
2021-07-09 12:55   ` [PATCH v2 2/3] sfc: revert "adjust efx->xdp_tx_queue_count with the real number of initialized queues" Íñigo Huguet
2021-07-09 13:53     ` Edward Cree
2021-07-09 12:55   ` [PATCH v2 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
2021-07-13 14:21 ` [PATCH v3 0/3] Fix lack of XDP TX queues Íñigo Huguet
2021-07-13 14:21   ` [PATCH v3 1/3] sfc: fix lack of XDP TX queues - error XDP TX failed (-22) Íñigo Huguet
2021-07-13 14:21   ` [PATCH v3 2/3] sfc: ensure correct number of XDP queues Íñigo Huguet
2021-07-13 14:21   ` [PATCH v3 3/3] sfc: add logs explaining XDP_TX/REDIRECT is not available Íñigo Huguet
2021-07-13 14:41   ` [PATCH v3 0/3] Fix lack of XDP TX queues Edward Cree
2021-07-13 20:10   ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).