All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0
@ 2023-05-12  2:57 Jakub Kicinski
  2023-05-15  6:10 ` Tariq Toukan
  2023-05-15 16:39 ` Eric Dumazet
  0 siblings, 2 replies; 4+ messages in thread
From: Jakub Kicinski @ 2023-05-12  2:57 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, Jakub Kicinski, saeedm, leon, brouer, tariqt

NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.

Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.

In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.

The immediate stack trace I was seeing is:

    __do_softirq+0xd1/0x2c0
    __local_bh_enable_ip+0xc7/0x120
    </IRQ>
    <TASK>
    page_pool_put_defragged_page+0x267/0x320
    mlx5e_free_xdpsq_desc+0x99/0xd0
    mlx5e_poll_xdpsq_cq+0x138/0x3b0
    mlx5e_napi_poll+0xc3/0x8b0
    netpoll_poll_dev+0xce/0x150

AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.

Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
I'm pointing Fixes at where page_pool was added, although that's
probably not 100% fair.

CC: saeedm@nvidia.com
CC: leon@kernel.org
CC: brouer@redhat.com
CC: tariqt@mellanox.com
---
 .../net/ethernet/mellanox/mlx5/core/en_txrx.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index a50bfda18e96..bd4294dd72da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -161,20 +161,25 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
 		}
 	}
 
+	/* budget=0 means we may be in IRQ context, do as little as possible */
+	if (unlikely(!budget)) {
+		/* no work done, can't be asked to re-enable IRQs */
+		WARN_ON_ONCE(napi_complete_done(napi, work_done));
+		goto out;
+	}
+
 	busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
 
 	if (c->xdp)
 		busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
 
-	if (likely(budget)) { /* budget=0 means: don't poll rx rings */
-		if (xsk_open)
-			work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
+	if (xsk_open)
+		work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
 
-		if (likely(budget - work_done))
-			work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
+	if (likely(budget - work_done))
+		work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
 
-		busy |= work_done == budget;
-	}
+	busy |= work_done == budget;
 
 	mlx5e_poll_ico_cq(&c->icosq.cq);
 	if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0
  2023-05-12  2:57 [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0 Jakub Kicinski
@ 2023-05-15  6:10 ` Tariq Toukan
  2023-05-15 16:39 ` Eric Dumazet
  1 sibling, 0 replies; 4+ messages in thread
From: Tariq Toukan @ 2023-05-15  6:10 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, saeedm, leon, brouer, tariqt, Tariq Toukan



On 12/05/2023 5:57, Jakub Kicinski wrote:
> NAPI gets called with budget of 0 from netpoll, which has interrupts
> disabled. We should try to free some space on Tx rings and nothing
> else.
> 
> Specifically do not try to handle XDP TX or try to refill Rx buffers -
> we can't use the page pool from IRQ context. Don't check if IRQs moved,
> either, that makes no sense in netpoll. Netpoll calls _all_ the rings
> from whatever CPU it happens to be invoked on.
> 
> In general do as little as possible, the work quickly adds up when
> there's tens of rings to poll.
> 
> The immediate stack trace I was seeing is:
> 
>      __do_softirq+0xd1/0x2c0
>      __local_bh_enable_ip+0xc7/0x120
>      </IRQ>
>      <TASK>
>      page_pool_put_defragged_page+0x267/0x320
>      mlx5e_free_xdpsq_desc+0x99/0xd0
>      mlx5e_poll_xdpsq_cq+0x138/0x3b0
>      mlx5e_napi_poll+0xc3/0x8b0
>      netpoll_poll_dev+0xce/0x150
> 
> AFAIU page pool takes a BH lock, releases it and since BH is now
> enabled tries to run softirqs.
> 
> Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I'm pointing Fixes at where page_pool was added, although that's
> probably not 100% fair.
> 
> CC: saeedm@nvidia.com
> CC: leon@kernel.org
> CC: brouer@redhat.com
> CC: tariqt@mellanox.com
> ---

Thanks for your patch.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0
  2023-05-12  2:57 [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0 Jakub Kicinski
  2023-05-15  6:10 ` Tariq Toukan
@ 2023-05-15 16:39 ` Eric Dumazet
  2023-05-17  2:02   ` Jakub Kicinski
  1 sibling, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2023-05-15 16:39 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, pabeni, saeedm, leon, brouer, tariqt

On Fri, May 12, 2023 at 4:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> NAPI gets called with budget of 0 from netpoll, which has interrupts
> disabled. We should try to free some space on Tx rings and nothing
> else.
>
> Specifically do not try to handle XDP TX or try to refill Rx buffers -
> we can't use the page pool from IRQ context. Don't check if IRQs moved,
> either, that makes no sense in netpoll. Netpoll calls _all_ the rings
> from whatever CPU it happens to be invoked on.
>
> In general do as little as possible, the work quickly adds up when
> there's tens of rings to poll.
>
> The immediate stack trace I was seeing is:
>
>     __do_softirq+0xd1/0x2c0
>     __local_bh_enable_ip+0xc7/0x120
>     </IRQ>
>     <TASK>
>     page_pool_put_defragged_page+0x267/0x320
>     mlx5e_free_xdpsq_desc+0x99/0xd0
>     mlx5e_poll_xdpsq_cq+0x138/0x3b0
>     mlx5e_napi_poll+0xc3/0x8b0
>     netpoll_poll_dev+0xce/0x150
>
> AFAIU page pool takes a BH lock, releases it and since BH is now
> enabled tries to run softirqs.
>
> Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> I'm pointing Fixes at where page_pool was added, although that's
> probably not 100% fair.
>
> CC: saeedm@nvidia.com
> CC: leon@kernel.org
> CC: brouer@redhat.com
> CC: tariqt@mellanox.com
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_txrx.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> index a50bfda18e96..bd4294dd72da 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
> @@ -161,20 +161,25 @@ int mlx5e_napi_poll(struct napi_struct *napi, int budget)
>                 }
>         }
>
> +       /* budget=0 means we may be in IRQ context, do as little as possible */
> +       if (unlikely(!budget)) {
> +               /* no work done, can't be asked to re-enable IRQs */
> +               WARN_ON_ONCE(napi_complete_done(napi, work_done));

This is not clear why you call napi_complete_done() here ?

Note the fine doc  ( https://www.kernel.org/doc/html/next/networking/napi.html )
says:

<quote>If the budget is 0 napi_complete_done() should never be called.</quote>



> +               goto out;
> +       }
> +
>         busy |= mlx5e_poll_xdpsq_cq(&c->xdpsq.cq);
>
>         if (c->xdp)
>                 busy |= mlx5e_poll_xdpsq_cq(&c->rq_xdpsq.cq);
>
> -       if (likely(budget)) { /* budget=0 means: don't poll rx rings */
> -               if (xsk_open)
> -                       work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
> +       if (xsk_open)
> +               work_done = mlx5e_poll_rx_cq(&xskrq->cq, budget);
>
> -               if (likely(budget - work_done))
> -                       work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
> +       if (likely(budget - work_done))
> +               work_done += mlx5e_poll_rx_cq(&rq->cq, budget - work_done);
>
> -               busy |= work_done == budget;
> -       }
> +       busy |= work_done == budget;
>
>         mlx5e_poll_ico_cq(&c->icosq.cq);
>         if (mlx5e_poll_ico_cq(&c->async_icosq.cq))
> --
> 2.40.1
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0
  2023-05-15 16:39 ` Eric Dumazet
@ 2023-05-17  2:02   ` Jakub Kicinski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2023-05-17  2:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, pabeni, saeedm, leon, brouer, tariqt

On Mon, 15 May 2023 18:39:20 +0200 Eric Dumazet wrote:
> > +       /* budget=0 means we may be in IRQ context, do as little as possible */
> > +       if (unlikely(!budget)) {
> > +               /* no work done, can't be asked to re-enable IRQs */
> > +               WARN_ON_ONCE(napi_complete_done(napi, work_done));  
> 
> This is not clear why you call napi_complete_done() here ?
> 
> Note the fine doc  ( https://www.kernel.org/doc/html/next/networking/napi.html )
> says:
> 
> <quote>If the budget is 0 napi_complete_done() should never be called.</quote>

fair point, I was trying to be conservative because I think it would
have gotten called before

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-05-17  2:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-12  2:57 [PATCH net] net/mlx5e: do as little as possible in napi poll when budget is 0 Jakub Kicinski
2023-05-15  6:10 ` Tariq Toukan
2023-05-15 16:39 ` Eric Dumazet
2023-05-17  2:02   ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.