* [PATCH net] mlx5e: add add missing BH locking around napi_schdule()
From: Jakub Kicinski @ 2021-05-05 20:20 UTC (permalink / raw)
To: saeedm, eric.dumazet; +Cc: netdev, Jakub Kicinski

It's not correct to call napi_schedule() in pure process
context. Because we use __raise_softirq_irqoff() we require
callers to be in a context which will eventually lead to
softirq handling (hardirq, bh disabled, etc.).

With code as is users will see:

 NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
We may want to patch net-next once it opens to switch
from __raise_softirq_irqoff() to raise_softirq_irqoff().
The irq_count() check is probably negligible and we'd need
to split the hardirq / non-hardirq paths completely to
keep the current behaviour. Plus what counts as hardirq is
murky with RT enabled.

Eric WDYT?

drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bca832cdc4cb..11e50f5b3a1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -889,10 +889,13 @@ int mlx5e_open_rq(struct mlx5e_params *params, struct mlx5e_rq_param *param,
 void mlx5e_activate_rq(struct mlx5e_rq *rq)
 {
 	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
-	if (rq->icosq)
+	if (rq->icosq) {
 		mlx5e_trigger_irq(rq->icosq);
-	else
+	} else {
+		local_bh_disable();
 		napi_schedule(rq->cq.napi);
+		local_bh_enable();
+	}
 }
 
 void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
--
2.31.1
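
To make the reasoning in the commit message concrete, here is a minimal userspace sketch of the softirq bookkeeping involved. It is a toy model, not kernel code: every `model_*` and `scenario_*` name is hypothetical, there is no hardirq state, and "BH disabled" stands in for all contexts that eventually process softirqs. It assumes only what the patch states, namely that napi_schedule() ends up setting a pending bit via __raise_softirq_irqoff() and relies on the caller's context to get that bit handled. NET_RX_SOFTIRQ is index 3 in the kernel, so the leftover pending mask is 0x08, which appears to be what the "handler #08" in the NOHZ message refers to.

```c
/* Toy userspace model of per-CPU softirq state; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

#define NET_RX_SOFTIRQ 3	/* same index the kernel uses for NET_RX */

struct cpu_model {
	unsigned int pending;		/* bitmask of raised softirqs */
	unsigned int bh_disabled;	/* nesting depth of BH-disable sections */
	unsigned int handled;		/* times the NET_RX handler ran */
	bool ksoftirqd_woken;		/* was deferred processing requested? */
};

struct cpu_model cpu;

void model_reset(void)
{
	cpu = (struct cpu_model){0};
}

/* Run whatever is pending, as softirq processing would. */
void model_do_softirq(void)
{
	if (cpu.pending & (1u << NET_RX_SOFTIRQ))
		cpu.handled++;
	cpu.pending = 0;
}

/* Like __raise_softirq_irqoff(): only sets the pending bit. */
void model_raise_raw(unsigned int nr)
{
	cpu.pending |= 1u << nr;
}

/* Like raise_softirq_irqoff(): also asks for deferred processing
 * when nothing in the current context will handle the bit. */
void model_raise_and_wake(unsigned int nr)
{
	model_raise_raw(nr);
	if (cpu.bh_disabled == 0)	/* stand-in for !in_interrupt() */
		cpu.ksoftirqd_woken = true;
}

void model_bh_disable(void)
{
	cpu.bh_disabled++;
}

/* Leaving the outermost BH-disabled section runs pending softirqs. */
void model_bh_enable(void)
{
	assert(cpu.bh_disabled > 0);
	if (--cpu.bh_disabled == 0 && cpu.pending)
		model_do_softirq();
}

/* Stand-in for napi_schedule(): raises NET_RX the "raw" way. */
void model_napi_schedule(void)
{
	model_raise_raw(NET_RX_SOFTIRQ);
}

/* Pattern before the patch: schedule from plain process context. */
unsigned int scenario_process_context(void)
{
	model_reset();
	model_napi_schedule();
	return cpu.pending;	/* bit stays set, nothing handles it */
}

/* Pattern after the patch: wrap the call in a BH-disabled section. */
unsigned int scenario_bh_wrapped(void)
{
	model_reset();
	model_bh_disable();
	model_napi_schedule();
	model_bh_enable();	/* pending work runs here */
	return cpu.pending;
}
```

In this model scenario_process_context() leaves the NET_RX bit pending with no handler run, while scenario_bh_wrapped() returns with nothing pending and the handler run once. model_raise_and_wake() corresponds to the raise_softirq_irqoff() alternative floated in the notes below the `---` line: it pays for an extra context check on every call but does not depend on the caller's context.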
* Re: [PATCH net] mlx5e: add add missing BH locking around napi_schdule()
From: Saeed Mahameed @ 2021-05-18 19:23 UTC (permalink / raw)
To: Jakub Kicinski, eric.dumazet; +Cc: netdev
On Wed, 2021-05-05 at 13:20 -0700, Jakub Kicinski wrote:
> It's not correct to call napi_schedule() in pure process
> context. Because we use __raise_softirq_irqoff() we require
> callers to be in a context which will eventually lead to
> softirq handling (hardirq, bh disabled, etc.).
>
> With code as is users will see:
>
> NOHZ tick-stop error: Non-RCU local softirq work is pending, handler
> #08!!!
>
> Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> We may want to patch net-next once it opens to switch
> from __raise_softirq_irqoff() to raise_softirq_irqoff().
> The irq_count() check is probably negligible and we'd need
> to split the hardirq / non-hardirq paths completely to
> keep the current behaviour. Plus what's hardirq is murky
> with RT enabled..
>
> Eric WDYT?
>
I was waiting for Eric to reply. Anyway, I think this patch is correct
as is.

Jakub, do you want me to submit it to net via the net-mlx5 branch?

Another valid solution is for the driver to avoid calling
napi_schedule() from process context altogether. We have the
mlx5e_trigger_irq() mechanism, which can be used here, but it needs
some refactoring to move the icosq object from the main RX RQ to the
containing channel object.

> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index bca832cdc4cb..11e50f5b3a1e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -889,10 +889,13 @@ int mlx5e_open_rq(struct mlx5e_params *params,
> struct mlx5e_rq_param *param,
>  void mlx5e_activate_rq(struct mlx5e_rq *rq)
>  {
>  	set_bit(MLX5E_RQ_STATE_ENABLED, &rq->state);
> -	if (rq->icosq)
> +	if (rq->icosq) {
>  		mlx5e_trigger_irq(rq->icosq);
> -	else
> +	} else {
> +		local_bh_disable();
>  		napi_schedule(rq->cq.napi);
> +		local_bh_enable();
> +	}
>  }
>  
>  void mlx5e_deactivate_rq(struct mlx5e_rq *rq)
* Re: [PATCH net] mlx5e: add add missing BH locking around napi_schdule()
From: Jakub Kicinski @ 2021-05-18 19:33 UTC (permalink / raw)
To: Saeed Mahameed; +Cc: eric.dumazet, netdev
On Tue, 18 May 2021 12:23:54 -0700 Saeed Mahameed wrote:
> On Wed, 2021-05-05 at 13:20 -0700, Jakub Kicinski wrote:
> > It's not correct to call napi_schedule() in pure process
> > context. Because we use __raise_softirq_irqoff() we require
> > callers to be in a context which will eventually lead to
> > softirq handling (hardirq, bh disabled, etc.).
> >
> > With code as is users will see:
> >
> > NOHZ tick-stop error: Non-RCU local softirq work is pending, handler
> > #08!!!
> >
> > Fixes: a8dd7ac12fc3 ("net/mlx5e: Generalize RQ activation")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > We may want to patch net-next once it opens to switch
> > from __raise_softirq_irqoff() to raise_softirq_irqoff().
> > The irq_count() check is probably negligible and we'd need
> > to split the hardirq / non-hardirq paths completely to
> > keep the current behaviour. Plus what's hardirq is murky
> > with RT enabled..
> >
> > Eric WDYT?
>
> I was waiting for Eric to reply, Anyway i think this patch is correct
> as is,
>
> Jakub do you want me to submit to net via net-mlx5 branch?

Yes, please. FWIW we had a short exchange with the RT folks last Friday,
and it doesn't seem like RT is an issue, so tglx will likely take
care of just adding a lockdep check and maybe a helper for scheduling
from process context.

> Another valid solution is that driver will avoid calling
> napi_schedule() altogether from process context, we have the
> mechanism of mlx5e_trigger_irq(), which can be utilized here, but needs
> some re-factoring to move the icosq object from the main rx rq to the
> containing channel object.

Yeah, someone on your side would probably need to take care of that
kind of surgery. Apart from that, no preference on which fix gets
applied.
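
For readers wondering what the "lockdep check" and "helper for scheduling from process ctx" mentioned above could look like, here is a rough userspace sketch of the idea. It is purely illustrative: `struct ctx_model`, `ctx_softirq_will_run()`, `checked_raise()` and `schedule_from_process_ctx()` are hypothetical names made up for this example, and nothing here is meant to describe what was or will be merged. The check simply counts a warning when a softirq is raised from a context that will not process it, and the helper wraps the raise in a BH-disabled section so the check passes and the work runs on exit.

```c
/* Toy userspace model; all names here are hypothetical. */
#include <stdbool.h>

struct ctx_model {
	bool in_hardirq;		/* pretend we are in a hardirq handler */
	unsigned int bh_disabled;	/* nesting depth of BH-disable sections */
	unsigned int pending;		/* bitmask of raised softirqs */
	unsigned int handled;		/* softirq passes that found work */
	unsigned int warnings;		/* stand-in for lockdep reports */
};

/* True if the current context will eventually process softirqs. */
bool ctx_softirq_will_run(const struct ctx_model *c)
{
	return c->in_hardirq || c->bh_disabled > 0;
}

/* Raise a softirq, warning if the calling context is unsuitable. */
void checked_raise(struct ctx_model *c, unsigned int nr)
{
	if (!ctx_softirq_will_run(c))
		c->warnings++;		/* a real check would print a splat */
	c->pending |= 1u << nr;
}

/* Helper for process context: open a BH-disabled section, raise,
 * then process pending work when the outermost section closes. */
void schedule_from_process_ctx(struct ctx_model *c, unsigned int nr)
{
	c->bh_disabled++;
	checked_raise(c, nr);
	c->bh_disabled--;
	if (c->bh_disabled == 0 && c->pending) {
		c->handled++;
		c->pending = 0;
	}
}
```

Calling checked_raise() on a zero-initialized `struct ctx_model` records one warning and leaves the bit pending, which is the situation the patch fixes in mlx5e_activate_rq(). Calling schedule_from_process_ctx() on the same starting state records no warning and leaves nothing pending.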