stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
@ 2021-04-27 21:09 Ignat Korchagin
  2021-04-27 22:40 ` patchwork-bot+netdevbpf
  2021-04-29 14:22 ` Edward Cree
  0 siblings, 2 replies; 5+ messages in thread
From: Ignat Korchagin @ 2021-04-27 21:09 UTC (permalink / raw)
  To: ecree.xilinx, habetsm.xilinx, davem, kuba, netdev
  Cc: Ignat Korchagin, kernel-team, stable

efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is
later used to allocate and traverse efx->xdp_tx_queues lookup array. However,
we may end up not initializing all the array slots with real queues during
probing. This results, for example, in a NULL pointer dereference, when running
"# ethtool -S <iface>", similar to below

[2570283.664955][T4126959] BUG: kernel NULL pointer dereference, address: 00000000000000f8
[2570283.681283][T4126959] #PF: supervisor read access in kernel mode
[2570283.695678][T4126959] #PF: error_code(0x0000) - not-present page
[2570283.710013][T4126959] PGD 0 P4D 0
[2570283.721649][T4126959] Oops: 0000 [#1] SMP PTI
[2570283.734108][T4126959] CPU: 23 PID: 4126959 Comm: ethtool Tainted: G           O      5.10.20-cloudflare-2021.3.1 #1
[2570283.752641][T4126959] Hardware name: <redacted>
[2570283.781408][T4126959] RIP: 0010:efx_ethtool_get_stats+0x2ca/0x330 [sfc]
[2570283.796073][T4126959] Code: 00 85 c0 74 39 48 8b 95 a8 0f 00 00 48 85 d2 74 2d 31 c0 eb 07 48 8b 95 a8 0f 00 00 48 63 c8 49 83 c4 08 83 c0 01 48 8b 14 ca <48> 8b 92 f8 00 00 00 49 89 54 24 f8 39 85 a0 0f 00 00 77 d7 48 8b
[2570283.831259][T4126959] RSP: 0018:ffffb79a77657ce8 EFLAGS: 00010202
[2570283.845121][T4126959] RAX: 0000000000000019 RBX: ffffb799cd0c9280 RCX: 0000000000000018
[2570283.860872][T4126959] RDX: 0000000000000000 RSI: ffff96dd970ce000 RDI: 0000000000000005
[2570283.876525][T4126959] RBP: ffff96dd86f0a000 R08: ffff96dd970ce480 R09: 000000000000005f
[2570283.892014][T4126959] R10: ffffb799cd0c9fff R11: ffffb799cd0c9000 R12: ffffb799cd0c94f8
[2570283.907406][T4126959] R13: ffffffffc11b1090 R14: ffff96dd970ce000 R15: ffffffffc11cd66c
[2570283.922705][T4126959] FS:  00007fa7723f8740(0000) GS:ffff96f51fac0000(0000) knlGS:0000000000000000
[2570283.938848][T4126959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2570283.952524][T4126959] CR2: 00000000000000f8 CR3: 0000001a73e6e006 CR4: 00000000007706e0
[2570283.967529][T4126959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2570283.982400][T4126959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2570283.997308][T4126959] PKRU: 55555554
[2570284.007649][T4126959] Call Trace:
[2570284.017598][T4126959]  dev_ethtool+0x1832/0x2830

Fix this by adjusting efx->xdp_tx_queue_count after probing to reflect the true
value of initialized slots in efx->xdp_tx_queues.

Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Fixes: e26ca4b53582 ("sfc: reduce the number of requested xdp ev queues")
Cc: <stable@vger.kernel.org> # 5.12.x
---
 drivers/net/ethernet/sfc/efx_channels.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index 1bfeee283ea9..a3ca406a3561 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -914,6 +914,8 @@ int efx_set_channels(struct efx_nic *efx)
 			}
 		}
 	}
+	if (xdp_queue_number)
+		efx->xdp_tx_queue_count = xdp_queue_number;
 
 	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
 	if (rc)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
  2021-04-27 21:09 [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues Ignat Korchagin
@ 2021-04-27 22:40 ` patchwork-bot+netdevbpf
  2021-04-29 14:22 ` Edward Cree
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-04-27 22:40 UTC (permalink / raw)
  To: Ignat Korchagin
  Cc: ecree.xilinx, habetsm.xilinx, davem, kuba, netdev, kernel-team, stable

Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Tue, 27 Apr 2021 22:09:38 +0100 you wrote:
> efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is
> later used to allocate and traverse efx->xdp_tx_queues lookup array. However,
> we may end up not initializing all the array slots with real queues during
> probing. This results, for example, in a NULL pointer dereference, when running
> "# ethtool -S <iface>", similar to below
> 
> [2570283.664955][T4126959] BUG: kernel NULL pointer dereference, address: 00000000000000f8
> [2570283.681283][T4126959] #PF: supervisor read access in kernel mode
> [2570283.695678][T4126959] #PF: error_code(0x0000) - not-present page
> [2570283.710013][T4126959] PGD 0 P4D 0
> [2570283.721649][T4126959] Oops: 0000 [#1] SMP PTI
> [2570283.734108][T4126959] CPU: 23 PID: 4126959 Comm: ethtool Tainted: G           O      5.10.20-cloudflare-2021.3.1 #1
> [2570283.752641][T4126959] Hardware name: <redacted>
> [2570283.781408][T4126959] RIP: 0010:efx_ethtool_get_stats+0x2ca/0x330 [sfc]
> [2570283.796073][T4126959] Code: 00 85 c0 74 39 48 8b 95 a8 0f 00 00 48 85 d2 74 2d 31 c0 eb 07 48 8b 95 a8 0f 00 00 48 63 c8 49 83 c4 08 83 c0 01 48 8b 14 ca <48> 8b 92 f8 00 00 00 49 89 54 24 f8 39 85 a0 0f 00 00 77 d7 48 8b
> [2570283.831259][T4126959] RSP: 0018:ffffb79a77657ce8 EFLAGS: 00010202
> [2570283.845121][T4126959] RAX: 0000000000000019 RBX: ffffb799cd0c9280 RCX: 0000000000000018
> [2570283.860872][T4126959] RDX: 0000000000000000 RSI: ffff96dd970ce000 RDI: 0000000000000005
> [2570283.876525][T4126959] RBP: ffff96dd86f0a000 R08: ffff96dd970ce480 R09: 000000000000005f
> [2570283.892014][T4126959] R10: ffffb799cd0c9fff R11: ffffb799cd0c9000 R12: ffffb799cd0c94f8
> [2570283.907406][T4126959] R13: ffffffffc11b1090 R14: ffff96dd970ce000 R15: ffffffffc11cd66c
> [2570283.922705][T4126959] FS:  00007fa7723f8740(0000) GS:ffff96f51fac0000(0000) knlGS:0000000000000000
> [2570283.938848][T4126959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2570283.952524][T4126959] CR2: 00000000000000f8 CR3: 0000001a73e6e006 CR4: 00000000007706e0
> [2570283.967529][T4126959] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2570283.982400][T4126959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2570283.997308][T4126959] PKRU: 55555554
> [2570284.007649][T4126959] Call Trace:
> [2570284.017598][T4126959]  dev_ethtool+0x1832/0x2830
> 
> [...]

Here is the summary with links:
  - sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
    https://git.kernel.org/netdev/net-next/c/99ba0ea616aa

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
  2021-04-27 21:09 [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues Ignat Korchagin
  2021-04-27 22:40 ` patchwork-bot+netdevbpf
@ 2021-04-29 14:22 ` Edward Cree
  2021-04-29 14:49   ` Ignat Korchagin
  1 sibling, 1 reply; 5+ messages in thread
From: Edward Cree @ 2021-04-29 14:22 UTC (permalink / raw)
  To: Ignat Korchagin, habetsm.xilinx, davem, kuba, netdev; +Cc: kernel-team, stable

On 27/04/2021 22:09, Ignat Korchagin wrote:
> efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is
> later used to allocate and traverse efx->xdp_tx_queues lookup array. However,
> we may end up not initializing all the array slots with real queues during
> probing. This results, for example, in a NULL pointer dereference, when running
> "# ethtool -S <iface>", similar to below
...
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index 1bfeee283ea9..a3ca406a3561 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -914,6 +914,8 @@ int efx_set_channels(struct efx_nic *efx)
>  			}
>  		}
>  	}
> +	if (xdp_queue_number)
Wait, why is this guard condition needed?
What happens if we had nonzero efx->xdp_tx_queue_count initially, but we end up
 with no TXQs available for XDP at all (so xdp_queue_number == 0)?

-ed
> +		efx->xdp_tx_queue_count = xdp_queue_number;
>  
>  	rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
>  	if (rc)
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
  2021-04-29 14:22 ` Edward Cree
@ 2021-04-29 14:49   ` Ignat Korchagin
  2021-04-29 15:04     ` Edward Cree
  0 siblings, 1 reply; 5+ messages in thread
From: Ignat Korchagin @ 2021-04-29 14:49 UTC (permalink / raw)
  To: Edward Cree
  Cc: habetsm.xilinx, David S. Miller, kuba, netdev, kernel-team, stable

On Thu, Apr 29, 2021 at 3:22 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
>
> On 27/04/2021 22:09, Ignat Korchagin wrote:
> > efx->xdp_tx_queue_count is initially initialized to num_possible_cpus() and is
> > later used to allocate and traverse efx->xdp_tx_queues lookup array. However,
> > we may end up not initializing all the array slots with real queues during
> > probing. This results, for example, in a NULL pointer dereference, when running
> > "# ethtool -S <iface>", similar to below
> ...
> > diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> > index 1bfeee283ea9..a3ca406a3561 100644
> > --- a/drivers/net/ethernet/sfc/efx_channels.c
> > +++ b/drivers/net/ethernet/sfc/efx_channels.c
> > @@ -914,6 +914,8 @@ int efx_set_channels(struct efx_nic *efx)
> >                       }
> >               }
> >       }
> > +     if (xdp_queue_number)
> Wait, why is this guard condition needed?
> What happens if we had nonzero efx->xdp_tx_queue_count initially, but we end up
>  with no TXQs available for XDP at all (so xdp_queue_number == 0)?
>
> -ed

My thoughts were: efx->xdp_tx_queue_count is originally used to
allocate efx->xdp_tx_queues.
So, if xdp_queue_number ends up being 0, we should keep
efx->xdp_tx_queue_count positive not
to forget to release efx->xdp_tx_queues (because most checks are
efx->xdp_tx_queue_count && efx->xdp_tx_queues).

I'm not familiar enough with SFC internals to definitely say if it is
even possible to have
xdp_queue_number == 0 while having efx->xdp_tx_queue_count > 0, but my
understanding is that
it should not be possible due to the checks in the driver init path,
when we actually determine the number
of queues, channels, events per channel etc.

Ignat

> > +             efx->xdp_tx_queue_count = xdp_queue_number;
> >
> >       rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels);
> >       if (rc)
> >
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues
  2021-04-29 14:49   ` Ignat Korchagin
@ 2021-04-29 15:04     ` Edward Cree
  0 siblings, 0 replies; 5+ messages in thread
From: Edward Cree @ 2021-04-29 15:04 UTC (permalink / raw)
  To: Ignat Korchagin
  Cc: habetsm.xilinx, David S. Miller, kuba, netdev, kernel-team, stable

On 29/04/2021 15:49, Ignat Korchagin wrote:
> On Thu, Apr 29, 2021 at 3:22 PM Edward Cree <ecree.xilinx@gmail.com> wrote:
>>
>> On 27/04/2021 22:09, Ignat Korchagin wrote:
>>> +     if (xdp_queue_number)
>> Wait, why is this guard condition needed?
>> What happens if we had nonzero efx->xdp_tx_queue_count initially, but we end up
>>  with no TXQs available for XDP at all (so xdp_queue_number == 0)?
>>
>> -ed
> 
> My thoughts were: efx->xdp_tx_queue_count is originally used to
> allocate efx->xdp_tx_queues.
> So, if xdp_queue_number ends up being 0, we should keep
> efx->xdp_tx_queue_count positive not
> to forget to release efx->xdp_tx_queues (because most checks are
> efx->xdp_tx_queue_count && efx->xdp_tx_queues).
Well, we allocated it in this function, so could we not just free it
 (and NULL it) if we get here with xdp_queue_number == 0?
Assuming it even makes sense for those checks to be that conjunction,
 and not just efx->xdp_tx_queues.

> I'm not familiar enough with SFC internals to definitely say if it is
> even possible to have
> xdp_queue_number == 0 while having efx->xdp_tx_queue_count > 0
If it's possible for us to get xdp_queue_number != efx->xdp_tx_queue_count
 at all (which I can't remember exactly how it happens, but I think it's a
 case of not getting as many VIs back from firmware as we wanted, which
 happens after the initial determination of numbers of queues & channels),
 then it's possible that our number of available TXQs is reduced far
 enough that we don't have any left for XDP.
At least, I think so; this part of the driver confuses me too :S

-ed

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-29 15:04 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-27 21:09 [PATCH] sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues Ignat Korchagin
2021-04-27 22:40 ` patchwork-bot+netdevbpf
2021-04-29 14:22 ` Edward Cree
2021-04-29 14:49   ` Ignat Korchagin
2021-04-29 15:04     ` Edward Cree

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).