bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] i40e: fix the panic when running bpf in xdpdrv mode
@ 2021-04-12  6:57 kerneljasonxing
  2021-04-12 21:52 ` Jesse Brandeburg
  2021-04-13  2:50 ` [PATCH net v2] " kerneljasonxing
  0 siblings, 2 replies; 13+ messages in thread
From: kerneljasonxing @ 2021-04-12  6:57 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, kuba, ast, daniel,
	hawk, john.fastabend, andrii, kafai, songliubraving, yhs,
	kpsingh
  Cc: intel-wired-lan, netdev, linux-kernel, bpf, kerneljasonxing,
	Jason Xing, Shujin Li

From: Jason Xing <xingwanli@kuaishou.com>

Fix this by add more rules to calculate the value of @rss_size_max which
could be used in allocating the queues when bpf is loaded, which, however,
could cause the failure and then triger the NULL pointer of vsi->rx_rings.
Prio to this fix, the machine doesn't care about how many cpus are online
and then allocates 256 queues on the machine with 32 cpus online
actually.

Once the load of bpf begins, the log will go like this "failed to get
tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
failed".

Thus, I attach the key information of the crash-log here.

BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
Call Trace:
[2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
[2160294.717666]  dev_xdp_install+0x4f/0x70
[2160294.718036]  dev_change_xdp_fd+0x11f/0x230
[2160294.718380]  ? dev_disable_lro+0xe0/0xe0
[2160294.718705]  do_setlink+0xac7/0xe70
[2160294.719035]  ? __nla_parse+0xed/0x120
[2160294.719365]  rtnl_newlink+0x73b/0x860

Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 521ea9d..4e9a247 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11867,6 +11867,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
 {
 	int err = 0;
 	int size;
+	u16 pow;
 
 	/* Set default capability flags */
 	pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
@@ -11885,6 +11886,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	pf->rss_table_size = pf->hw.func_caps.rss_table_size;
 	pf->rss_size_max = min_t(int, pf->rss_size_max,
 				 pf->hw.func_caps.num_tx_qp);
+
+	/* find the next higher power-of-2 of num cpus */
+	pow = roundup_pow_of_two(num_online_cpus());
+	pf->rss_size_max = min_t(int, pf->rss_size_max, pow);
+
 	if (pf->hw.func_caps.rss) {
 		pf->flags |= I40E_FLAG_RSS_ENABLED;
 		pf->alloc_rss_size = min_t(int, pf->rss_size_max,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-12  6:57 [PATCH] i40e: fix the panic when running bpf in xdpdrv mode kerneljasonxing
@ 2021-04-12 21:52 ` Jesse Brandeburg
  2021-04-13  2:17   ` Jason Xing
  2021-04-13  2:50 ` [PATCH net v2] " kerneljasonxing
  1 sibling, 1 reply; 13+ messages in thread
From: Jesse Brandeburg @ 2021-04-12 21:52 UTC (permalink / raw)
  To: kerneljasonxing
  Cc: anthony.l.nguyen, davem, kuba, ast, daniel, hawk, john.fastabend,
	andrii, kafai, songliubraving, yhs, kpsingh, intel-wired-lan,
	netdev, linux-kernel, bpf, Jason Xing, Shujin Li

kerneljasonxing@gmail.com wrote:

> From: Jason Xing <xingwanli@kuaishou.com>
> 
> Re: [PATCH] i40e: fix the panic when running bpf in xdpdrv mode

Please use netdev style subject lines when patching net kernel to
indicate which kernel tree this is targeted at, "net" or "net-next"
[PATCH net v2] i40e: ...

> Fix this by add more rules to calculate the value of @rss_size_max which

Fix this panic by adding ...

> could be used in allocating the queues when bpf is loaded, which, however,
> could cause the failure and then triger the NULL pointer of vsi->rx_rings.

trigger

> Prio to this fix, the machine doesn't care about how many cpus are online
> and then allocates 256 queues on the machine with 32 cpus online
> actually.
> 
> Once the load of bpf begins, the log will go like this "failed to get
> tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> failed".
> 
> Thus, I attach the key information of the crash-log here.
> 
> BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000000
> RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> Call Trace:
> [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> [2160294.717666]  dev_xdp_install+0x4f/0x70
> [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> [2160294.718705]  do_setlink+0xac7/0xe70
> [2160294.719035]  ? __nla_parse+0xed/0x120
> [2160294.719365]  rtnl_newlink+0x73b/0x860
> 
> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> Signed-off-by: Shujin Li <lishujin@kuaishou.com>

if you send to "net" - I suspect you should supply a Fixes: line, above
the sign-offs.
In this case however, this bug has been here since the beginning of the
driver, but the patch will easily apply, so please supply

Fixes: 41c445ff0f48 ("i40e: main driver core")

> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 521ea9d..4e9a247 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -11867,6 +11867,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
>  {
>  	int err = 0;
>  	int size;
> +	u16 pow;
>  
>  	/* Set default capability flags */
>  	pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
> @@ -11885,6 +11886,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
>  	pf->rss_table_size = pf->hw.func_caps.rss_table_size;
>  	pf->rss_size_max = min_t(int, pf->rss_size_max,
>  				 pf->hw.func_caps.num_tx_qp);
> +
> +	/* find the next higher power-of-2 of num cpus */
> +	pow = roundup_pow_of_two(num_online_cpus());
> +	pf->rss_size_max = min_t(int, pf->rss_size_max, pow);
> +

The fix itself is fine, and is correct as far as I can tell, thank you
for sending the patch!

>  	if (pf->hw.func_caps.rss) {
>  		pf->flags |= I40E_FLAG_RSS_ENABLED;
>  		pf->alloc_rss_size = min_t(int, pf->rss_size_max,



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-12 21:52 ` Jesse Brandeburg
@ 2021-04-13  2:17   ` Jason Xing
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Xing @ 2021-04-13  2:17 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: anthony.l.nguyen, David Miller, kuba, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh,
	intel-wired-lan, netdev, LKML, bpf, Jason Xing, Shujin Li

On Tue, Apr 13, 2021 at 5:52 AM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> kerneljasonxing@gmail.com wrote:
>
> > From: Jason Xing <xingwanli@kuaishou.com>
> >
> > Re: [PATCH] i40e: fix the panic when running bpf in xdpdrv mode
>
> Please use netdev style subject lines when patching net kernel to
> indicate which kernel tree this is targeted at, "net" or "net-next"
> [PATCH net v2] i40e: ...
>
> > Fix this by add more rules to calculate the value of @rss_size_max which
>
> Fix this panic by adding ...
>
> > could be used in allocating the queues when bpf is loaded, which, however,
> > could cause the failure and then triger the NULL pointer of vsi->rx_rings.
>
> trigger
>
> > Prio to this fix, the machine doesn't care about how many cpus are online
> > and then allocates 256 queues on the machine with 32 cpus online
> > actually.
> >
> > Once the load of bpf begins, the log will go like this "failed to get
> > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> > failed".
> >
> > Thus, I attach the key information of the crash-log here.
> >
> > BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000000
> > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> > Call Trace:
> > [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> > [2160294.717666]  dev_xdp_install+0x4f/0x70
> > [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> > [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> > [2160294.718705]  do_setlink+0xac7/0xe70
> > [2160294.719035]  ? __nla_parse+0xed/0x120
> > [2160294.719365]  rtnl_newlink+0x73b/0x860
> >
> > Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> > Signed-off-by: Shujin Li <lishujin@kuaishou.com>
>
> if you send to "net" - I suspect you should supply a Fixes: line, above
> the sign-offs.
> In this case however, this bug has been here since the beginning of the
> driver, but the patch will easily apply, so please supply
>
> Fixes: 41c445ff0f48 ("i40e: main driver core")
>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > index 521ea9d..4e9a247 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> > @@ -11867,6 +11867,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
> >  {
> >       int err = 0;
> >       int size;
> > +     u16 pow;
> >
> >       /* Set default capability flags */
> >       pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
> > @@ -11885,6 +11886,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
> >       pf->rss_table_size = pf->hw.func_caps.rss_table_size;
> >       pf->rss_size_max = min_t(int, pf->rss_size_max,
> >                                pf->hw.func_caps.num_tx_qp);
> > +
> > +     /* find the next higher power-of-2 of num cpus */
> > +     pow = roundup_pow_of_two(num_online_cpus());
> > +     pf->rss_size_max = min_t(int, pf->rss_size_max, pow);
> > +
>
> The fix itself is fine, and is correct as far as I can tell, thank you
> for sending the patch!
>

Thanks for your advice. I'm going to send the patch v2 :)

Jason

> >       if (pf->hw.func_caps.rss) {
> >               pf->flags |= I40E_FLAG_RSS_ENABLED;
> >               pf->alloc_rss_size = min_t(int, pf->rss_size_max,
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net v2] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-12  6:57 [PATCH] i40e: fix the panic when running bpf in xdpdrv mode kerneljasonxing
  2021-04-12 21:52 ` Jesse Brandeburg
@ 2021-04-13  2:50 ` kerneljasonxing
  2021-04-13 16:18   ` Jesse Brandeburg
  2021-04-14  2:34   ` [PATCH net v3] " kerneljasonxing
  1 sibling, 2 replies; 13+ messages in thread
From: kerneljasonxing @ 2021-04-13  2:50 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, kuba, ast, daniel,
	hawk, john.fastabend, andrii, kafai, songliubraving, yhs,
	kpsingh
  Cc: netdev, linux-kernel, bpf, kerneljasonxing, Jason Xing, Shujin Li

From: Jason Xing <xingwanli@kuaishou.com>

Fix this panic by adding more rules to calculate the value of @rss_size_max
which could be used in allocating the queues when bpf is loaded, which,
however, could cause the failure and then trigger the NULL pointer of
vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
cpus are online and then allocates 256 queues on the machine with 32 cpus
online actually.

Once the load of bpf begins, the log will go like this "failed to get
tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
failed".

Thus, I attach the key information of the crash-log here.

BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
Call Trace:
[2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
[2160294.717666]  dev_xdp_install+0x4f/0x70
[2160294.718036]  dev_change_xdp_fd+0x11f/0x230
[2160294.718380]  ? dev_disable_lro+0xe0/0xe0
[2160294.718705]  do_setlink+0xac7/0xe70
[2160294.719035]  ? __nla_parse+0xed/0x120
[2160294.719365]  rtnl_newlink+0x73b/0x860

Fixes: 41c445ff0f48 ("i40e: main driver core")

Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 521ea9d..4e9a247 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11867,6 +11867,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
 {
 	int err = 0;
 	int size;
+	u16 pow;
 
 	/* Set default capability flags */
 	pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
@@ -11885,6 +11886,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	pf->rss_table_size = pf->hw.func_caps.rss_table_size;
 	pf->rss_size_max = min_t(int, pf->rss_size_max,
 				 pf->hw.func_caps.num_tx_qp);
+
+	/* find the next higher power-of-2 of num cpus */
+	pow = roundup_pow_of_two(num_online_cpus());
+	pf->rss_size_max = min_t(int, pf->rss_size_max, pow);
+
 	if (pf->hw.func_caps.rss) {
 		pf->flags |= I40E_FLAG_RSS_ENABLED;
 		pf->alloc_rss_size = min_t(int, pf->rss_size_max,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net v2] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-13  2:50 ` [PATCH net v2] " kerneljasonxing
@ 2021-04-13 16:18   ` Jesse Brandeburg
  2021-04-14  1:37     ` Jason Xing
  2021-04-15  1:12     ` Jason Xing
  2021-04-14  2:34   ` [PATCH net v3] " kerneljasonxing
  1 sibling, 2 replies; 13+ messages in thread
From: Jesse Brandeburg @ 2021-04-13 16:18 UTC (permalink / raw)
  To: kerneljasonxing
  Cc: anthony.l.nguyen, davem, kuba, ast, daniel, hawk, john.fastabend,
	andrii, kafai, songliubraving, yhs, kpsingh, netdev,
	linux-kernel, bpf, Jason Xing, Shujin Li, intel-wired-lan

kerneljasonxing@gmail.com wrote:

> From: Jason Xing <xingwanli@kuaishou.com>

Hi Jason,

Sorry, I missed this on the first time: Added intel-wired-lan,
please include on any future submissions for Intel drivers.
get-maintainers script might help here?

> 
> Fix this panic by adding more rules to calculate the value of @rss_size_max
> which could be used in allocating the queues when bpf is loaded, which,
> however, could cause the failure and then trigger the NULL pointer of
> vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
> cpus are online and then allocates 256 queues on the machine with 32 cpus
> online actually.
> 
> Once the load of bpf begins, the log will go like this "failed to get
> tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> failed".
> 
> Thus, I attach the key information of the crash-log here.
> 
> BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000000
> RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> Call Trace:
> [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> [2160294.717666]  dev_xdp_install+0x4f/0x70
> [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> [2160294.718705]  do_setlink+0xac7/0xe70
> [2160294.719035]  ? __nla_parse+0xed/0x120
> [2160294.719365]  rtnl_newlink+0x73b/0x860
> 
> Fixes: 41c445ff0f48 ("i40e: main driver core")
> 

This Fixes line should be connected to the Sign offs with
no linefeeds between.

> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> Signed-off-by: Shujin Li <lishujin@kuaishou.com>

Did Shujin contribute to this patch? Why are they signing off? If
they developed this patch with you, it should say:
Co-developed-by: Shujin ....
Signed-off-by: Shujin ...
Signed-off-by: Jason ...

Your signature should be last if you sent the patch. The sign-offs are
like a chain of custody, please review 
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

Thanks,
 Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v2] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-13 16:18   ` Jesse Brandeburg
@ 2021-04-14  1:37     ` Jason Xing
  2021-04-15  1:12     ` Jason Xing
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Xing @ 2021-04-14  1:37 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: anthony.l.nguyen, David Miller, kuba, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh,
	netdev, LKML, bpf, Jason Xing, Shujin Li, intel-wired-lan

On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> kerneljasonxing@gmail.com wrote:
>
> > From: Jason Xing <xingwanli@kuaishou.com>
>
> Hi Jason,
>
> Sorry, I missed this on the first time: Added intel-wired-lan,
> please include on any future submissions for Intel drivers.
> get-maintainers script might help here?
>

Hi, Jesse

In the first patch, I did send to intel-wired-lan but it told me that
it is open for the member only, so
I got rid of it in this patch v2.

> >
> > Fix this panic by adding more rules to calculate the value of @rss_size_max
> > which could be used in allocating the queues when bpf is loaded, which,
> > however, could cause the failure and then trigger the NULL pointer of
> > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
> > cpus are online and then allocates 256 queues on the machine with 32 cpus
> > online actually.
> >
> > Once the load of bpf begins, the log will go like this "failed to get
> > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> > failed".
> >
> > Thus, I attach the key information of the crash-log here.
> >
> > BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000000
> > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> > Call Trace:
> > [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> > [2160294.717666]  dev_xdp_install+0x4f/0x70
> > [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> > [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> > [2160294.718705]  do_setlink+0xac7/0xe70
> > [2160294.719035]  ? __nla_parse+0xed/0x120
> > [2160294.719365]  rtnl_newlink+0x73b/0x860
> >
> > Fixes: 41c445ff0f48 ("i40e: main driver core")
> >
>
> This Fixes line should be connected to the Sign offs with
> no linefeeds between.
>
> > Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> > Signed-off-by: Shujin Li <lishujin@kuaishou.com>
>
> Did Shujin contribute to this patch? Why are they signing off? If
> they developed this patch with you, it should say:
> Co-developed-by: Shujin ....
> Signed-off-by: Shujin ...
> Signed-off-by: Jason ...
>

Well, yes, I will add a Co-developed-by label later.

> Your signature should be last if you sent the patch. The sign-offs are
> like a chain of custody, please review
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
>

Thanks so much for your detailed instructions. I'm about to correct
them all together and then resend the patch v3.

Jason

> Thanks,
>  Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net v3] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-13  2:50 ` [PATCH net v2] " kerneljasonxing
  2021-04-13 16:18   ` Jesse Brandeburg
@ 2021-04-14  2:34   ` kerneljasonxing
  2021-04-15  2:06     ` Jesse Brandeburg
  2021-04-15 21:40     ` patchwork-bot+netdevbpf
  1 sibling, 2 replies; 13+ messages in thread
From: kerneljasonxing @ 2021-04-14  2:34 UTC (permalink / raw)
  To: jesse.brandeburg, anthony.l.nguyen, davem, kuba, ast, daniel,
	hawk, john.fastabend, andrii, kafai, songliubraving, yhs,
	kpsingh
  Cc: intel-wired-lan, netdev, linux-kernel, bpf, Jason Xing, Shujin Li

From: Jason Xing <xingwanli@kuaishou.com>

Fix this panic by adding more rules to calculate the value of @rss_size_max
which could be used in allocating the queues when bpf is loaded, which,
however, could cause the failure and then trigger the NULL pointer of
vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
cpus are online and then allocates 256 queues on the machine with 32 cpus
online actually.

Once the load of bpf begins, the log will go like this "failed to get
tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
failed".

Thus, I attach the key information of the crash-log here.

BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
Call Trace:
[2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
[2160294.717666]  dev_xdp_install+0x4f/0x70
[2160294.718036]  dev_change_xdp_fd+0x11f/0x230
[2160294.718380]  ? dev_disable_lro+0xe0/0xe0
[2160294.718705]  do_setlink+0xac7/0xe70
[2160294.719035]  ? __nla_parse+0xed/0x120
[2160294.719365]  rtnl_newlink+0x73b/0x860

Fixes: 41c445ff0f48 ("i40e: main driver core")
Co-developed-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 521ea9d..4e9a247 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11867,6 +11867,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
 {
 	int err = 0;
 	int size;
+	u16 pow;
 
 	/* Set default capability flags */
 	pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
@@ -11885,6 +11886,11 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	pf->rss_table_size = pf->hw.func_caps.rss_table_size;
 	pf->rss_size_max = min_t(int, pf->rss_size_max,
 				 pf->hw.func_caps.num_tx_qp);
+
+	/* find the next higher power-of-2 of num cpus */
+	pow = roundup_pow_of_two(num_online_cpus());
+	pf->rss_size_max = min_t(int, pf->rss_size_max, pow);
+
 	if (pf->hw.func_caps.rss) {
 		pf->flags |= I40E_FLAG_RSS_ENABLED;
 		pf->alloc_rss_size = min_t(int, pf->rss_size_max,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net v2] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-13 16:18   ` Jesse Brandeburg
  2021-04-14  1:37     ` Jason Xing
@ 2021-04-15  1:12     ` Jason Xing
  2021-04-15  2:08       ` Jesse Brandeburg
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Xing @ 2021-04-15  1:12 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: anthony.l.nguyen, David Miller, kuba, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh,
	netdev, LKML, bpf, Jason Xing, Shujin Li, intel-wired-lan

On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> kerneljasonxing@gmail.com wrote:
>
> > From: Jason Xing <xingwanli@kuaishou.com>
>
> Hi Jason,
>
> Sorry, I missed this on the first time: Added intel-wired-lan,
> please include on any future submissions for Intel drivers.
> get-maintainers script might help here?
>

Probably I got this wrong in the last email. Did you mean that I should add
intel-wired-lan in the title not the cc list? It seems I should put
this together on
the next submission like this:

[Intel-wired-lan] [PATCH net v4]

Am I missing something?

Thanks,
Jason

> >
> > Fix this panic by adding more rules to calculate the value of @rss_size_max
> > which could be used in allocating the queues when bpf is loaded, which,
> > however, could cause the failure and then trigger the NULL pointer of
> > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
> > cpus are online and then allocates 256 queues on the machine with 32 cpus
> > online actually.
> >
> > Once the load of bpf begins, the log will go like this "failed to get
> > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> > failed".
> >
> > Thus, I attach the key information of the crash-log here.
> >
> > BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000000
> > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> > Call Trace:
> > [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> > [2160294.717666]  dev_xdp_install+0x4f/0x70
> > [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> > [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> > [2160294.718705]  do_setlink+0xac7/0xe70
> > [2160294.719035]  ? __nla_parse+0xed/0x120
> > [2160294.719365]  rtnl_newlink+0x73b/0x860
> >
> > Fixes: 41c445ff0f48 ("i40e: main driver core")
> >
>
> This Fixes line should be connected to the Sign offs with
> no linefeeds between.
>
> > Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> > Signed-off-by: Shujin Li <lishujin@kuaishou.com>
>
> Did Shujin contribute to this patch? Why are they signing off? If
> they developed this patch with you, it should say:
> Co-developed-by: Shujin ....
> Signed-off-by: Shujin ...
> Signed-off-by: Jason ...
>
> Your signature should be last if you sent the patch. The sign-offs are
> like a chain of custody, please review
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by
>
> Thanks,
>  Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-14  2:34   ` [PATCH net v3] " kerneljasonxing
@ 2021-04-15  2:06     ` Jesse Brandeburg
  2021-04-15 10:31       ` Jesper Dangaard Brouer
  2021-04-15 21:40     ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 13+ messages in thread
From: Jesse Brandeburg @ 2021-04-15  2:06 UTC (permalink / raw)
  To: kerneljasonxing, davem, kuba
  Cc: anthony.l.nguyen, ast, daniel, hawk, john.fastabend, andrii,
	kafai, songliubraving, yhs, kpsingh, intel-wired-lan, netdev,
	linux-kernel, bpf, Jason Xing, Shujin Li

kerneljasonxing@gmail.com wrote:

> From: Jason Xing <xingwanli@kuaishou.com>
> 
> Fix this panic by adding more rules to calculate the value of @rss_size_max
> which could be used in allocating the queues when bpf is loaded, which,
> however, could cause the failure and then trigger the NULL pointer of
> vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
> cpus are online and then allocates 256 queues on the machine with 32 cpus
> online actually.
> 
> Once the load of bpf begins, the log will go like this "failed to get
> tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> failed".
> 
> Thus, I attach the key information of the crash-log here.
> 
> BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000000
> RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> Call Trace:
> [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> [2160294.717666]  dev_xdp_install+0x4f/0x70
> [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> [2160294.718705]  do_setlink+0xac7/0xe70
> [2160294.719035]  ? __nla_parse+0xed/0x120
> [2160294.719365]  rtnl_newlink+0x73b/0x860
> 
> Fixes: 41c445ff0f48 ("i40e: main driver core")
> Co-developed-by: Shujin Li <lishujin@kuaishou.com>
> Signed-off-by: Shujin Li <lishujin@kuaishou.com>
> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

@Jakub/@DaveM - feel free to apply this directly.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v2] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-15  1:12     ` Jason Xing
@ 2021-04-15  2:08       ` Jesse Brandeburg
  2021-04-15  5:16         ` Jason Xing
  0 siblings, 1 reply; 13+ messages in thread
From: Jesse Brandeburg @ 2021-04-15  2:08 UTC (permalink / raw)
  To: Jason Xing
  Cc: anthony.l.nguyen, David Miller, kuba, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh,
	netdev, LKML, bpf, Jason Xing, Shujin Li, intel-wired-lan

Jason Xing wrote:

> On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg
> <jesse.brandeburg@intel.com> wrote:
> >
> > kerneljasonxing@gmail.com wrote:
> >
> > > From: Jason Xing <xingwanli@kuaishou.com>
> >
> > Hi Jason,
> >
> > Sorry, I missed this on the first time: Added intel-wired-lan,
> > please include on any future submissions for Intel drivers.
> > get-maintainers script might help here?
> >
> 
> Probably I got this wrong in the last email. Did you mean that I should add
> intel-wired-lan in the title not the cc list? It seems I should put
> this together on
> the next submission like this:
> 
> [Intel-wired-lan] [PATCH net v4]

Your v3 submittal was correct. My intent was to make sure
intel-wired-lan was in CC:

If Kuba or Dave wants us to take the fix in via intel-wired-lan trees,
then we can do that, or they can apply it directly. I'll ack it on the
v3.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v2] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-15  2:08       ` Jesse Brandeburg
@ 2021-04-15  5:16         ` Jason Xing
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Xing @ 2021-04-15  5:16 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: anthony.l.nguyen, David Miller, kuba, ast, daniel, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh,
	netdev, LKML, bpf, Jason Xing, Shujin Li, intel-wired-lan

On Thu, Apr 15, 2021 at 10:08 AM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>
> Jason Xing wrote:
>
> > On Wed, Apr 14, 2021 at 12:27 AM Jesse Brandeburg
> > <jesse.brandeburg@intel.com> wrote:
> > >
> > > kerneljasonxing@gmail.com wrote:
> > >
> > > > From: Jason Xing <xingwanli@kuaishou.com>
> > >
> > > Hi Jason,
> > >
> > > Sorry, I missed this on the first time: Added intel-wired-lan,
> > > please include on any future submissions for Intel drivers.
> > > get-maintainers script might help here?
> > >
> >
> > Probably I got this wrong in the last email. Did you mean that I should add
> > intel-wired-lan in the title not the cc list? It seems I should put
> > this together on
> > the next submission like this:
> >
> > [Intel-wired-lan] [PATCH net v4]
>
> Your v3 submittal was correct. My intent was to make sure
> intel-wired-lan was in CC:
>

Well, I get to know more about the whole thing.

> If Kuba or Dave wants us to take the fix in via intel-wired-lan trees,
> then we can do that, or they can apply it directly. I'll ack it on the
> v3.

Thanks, Jesse:)

Jason

>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-15  2:06     ` Jesse Brandeburg
@ 2021-04-15 10:31       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-15 10:31 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: brouer, kerneljasonxing, davem, kuba, anthony.l.nguyen, ast,
	daniel, hawk, john.fastabend, andrii, kafai, songliubraving, yhs,
	kpsingh, intel-wired-lan, netdev, linux-kernel, bpf, Jason Xing,
	Shujin Li

On Wed, 14 Apr 2021 19:06:52 -0700
Jesse Brandeburg <jesse.brandeburg@intel.com> wrote:

> kerneljasonxing@gmail.com wrote:
> 
> > From: Jason Xing <xingwanli@kuaishou.com>
> > 
> > Fix this panic by adding more rules to calculate the value of @rss_size_max
> > which could be used in allocating the queues when bpf is loaded, which,
> > however, could cause the failure and then trigger the NULL pointer of
> > vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
> > cpus are online and then allocates 256 queues on the machine with 32 cpus
> > online actually.
> > 
> > Once the load of bpf begins, the log will go like this "failed to get
> > tracking for 256 queues for VSI 0 err -12" and this "setup of MAIN VSI
> > failed".
> > 
> > Thus, I attach the key information of the crash-log here.
> > 
> > BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000000
> > RIP: 0010:i40e_xdp+0xdd/0x1b0 [i40e]
> > Call Trace:
> > [2160294.717292]  ? i40e_reconfig_rss_queues+0x170/0x170 [i40e]
> > [2160294.717666]  dev_xdp_install+0x4f/0x70
> > [2160294.718036]  dev_change_xdp_fd+0x11f/0x230
> > [2160294.718380]  ? dev_disable_lro+0xe0/0xe0
> > [2160294.718705]  do_setlink+0xac7/0xe70
> > [2160294.719035]  ? __nla_parse+0xed/0x120
> > [2160294.719365]  rtnl_newlink+0x73b/0x860
> > 
> > Fixes: 41c445ff0f48 ("i40e: main driver core")
> > Co-developed-by: Shujin Li <lishujin@kuaishou.com>
> > Signed-off-by: Shujin Li <lishujin@kuaishou.com>
> > Signed-off-by: Jason Xing <xingwanli@kuaishou.com>  
> 
> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> @Jakub/@DaveM - feel free to apply this directly.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

The crash/bug happens in this code:

 static int i40e_xdp_setup(struct i40e_vsi *vsi, struct bpf_prog *prog,
			  struct netlink_ext_ack *extack)
 {
 [...]
	for (i = 0; i < vsi->num_queue_pairs; i++)
		WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);


And this is a side effect of i40e_setup_pf_switch() failing with "setup
of MAIN VSI failed".

LGTM
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net v3] i40e: fix the panic when running bpf in xdpdrv mode
  2021-04-14  2:34   ` [PATCH net v3] " kerneljasonxing
  2021-04-15  2:06     ` Jesse Brandeburg
@ 2021-04-15 21:40     ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-04-15 21:40 UTC (permalink / raw)
  To: Jason Xing
  Cc: jesse.brandeburg, anthony.l.nguyen, davem, kuba, ast, daniel,
	hawk, john.fastabend, andrii, kafai, songliubraving, yhs,
	kpsingh, intel-wired-lan, netdev, linux-kernel, bpf, xingwanli,
	lishujin

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Wed, 14 Apr 2021 10:34:28 +0800 you wrote:
> From: Jason Xing <xingwanli@kuaishou.com>
> 
> Fix this panic by adding more rules to calculate the value of @rss_size_max
> which could be used in allocating the queues when bpf is loaded, which,
> however, could cause the failure and then trigger the NULL pointer of
> vsi->rx_rings. Prio to this fix, the machine doesn't care about how many
> cpus are online and then allocates 256 queues on the machine with 32 cpus
> online actually.
> 
> [...]

Here is the summary with links:
  - [net,v3] i40e: fix the panic when running bpf in xdpdrv mode
    https://git.kernel.org/netdev/net/c/4e39a072a6a0

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-04-15 21:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-12  6:57 [PATCH] i40e: fix the panic when running bpf in xdpdrv mode kerneljasonxing
2021-04-12 21:52 ` Jesse Brandeburg
2021-04-13  2:17   ` Jason Xing
2021-04-13  2:50 ` [PATCH net v2] " kerneljasonxing
2021-04-13 16:18   ` Jesse Brandeburg
2021-04-14  1:37     ` Jason Xing
2021-04-15  1:12     ` Jason Xing
2021-04-15  2:08       ` Jesse Brandeburg
2021-04-15  5:16         ` Jason Xing
2021-04-14  2:34   ` [PATCH net v3] " kerneljasonxing
2021-04-15  2:06     ` Jesse Brandeburg
2021-04-15 10:31       ` Jesper Dangaard Brouer
2021-04-15 21:40     ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).