netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bnx2x: Prevent load reordering in tx completion processing
@ 2019-07-15 21:41 Brian King
  2019-07-17 19:00 ` David Miller
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Brian King @ 2019-07-15 21:41 UTC (permalink / raw)
  To: GR-everest-linux-l2; +Cc: skalluru, aelior, netdev, Brian King

This patch fixes an issue seen on Power systems with bnx2x which results
in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 656ed80..e2be5a6 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -285,6 +285,9 @@ int bnx2x_tx_int(struct bnx2x *bp, struct bnx2x_fp_txdata *txdata)
 	hw_cons = le16_to_cpu(*txdata->tx_cons_sb);
 	sw_cons = txdata->tx_pkt_cons;
 
+	/* Ensure subsequent loads occur after hw_cons */
+	smp_rmb();
+
 	while (sw_cons != hw_cons) {
 		u16 pkt_cons;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] bnx2x: Prevent load reordering in tx completion processing
  2019-07-15 21:41 [PATCH] bnx2x: Prevent load reordering in tx completion processing Brian King
@ 2019-07-17 19:00 ` David Miller
  2019-07-18 10:12 ` [EXT] " Manish Chopra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-07-17 19:00 UTC (permalink / raw)
  To: brking; +Cc: GR-everest-linux-l2, skalluru, aelior, netdev

From: Brian King <brking@linux.vnet.ibm.com>
Date: Mon, 15 Jul 2019 16:41:50 -0500

> This patch fixes an issue seen on Power systems with bnx2x which results
> in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
> pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
> load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>

Marvell folks, please review.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion processing
  2019-07-15 21:41 [PATCH] bnx2x: Prevent load reordering in tx completion processing Brian King
  2019-07-17 19:00 ` David Miller
@ 2019-07-18 10:12 ` Manish Chopra
  2019-07-18 18:26   ` Brian King
  2019-07-18 23:30 ` David Miller
  2019-07-21 19:29 ` David Miller
  3 siblings, 1 reply; 6+ messages in thread
From: Manish Chopra @ 2019-07-18 10:12 UTC (permalink / raw)
  To: Brian King, GR-everest-linux-l2
  Cc: Sudarsana Reddy Kalluru, Ariel Elior, netdev

> -----Original Message-----
> From: Brian King <brking@linux.vnet.ibm.com>
> Sent: Tuesday, July 16, 2019 3:12 AM
> To: GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>
> Cc: Sudarsana Reddy Kalluru <skalluru@marvell.com>; Ariel Elior
> <aelior@marvell.com>; netdev@vger.kernel.org; Brian King
> <brking@linux.vnet.ibm.com>
> Subject: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion
> processing
> 
> External Email
> 
> ----------------------------------------------------------------------
> This patch fixes an issue seen on Power systems with bnx2x which results in
> the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer
> getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in
> bnx2x_tx_int. Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index 656ed80..e2be5a6 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -285,6 +285,9 @@ int bnx2x_tx_int(struct bnx2x *bp, struct
> bnx2x_fp_txdata *txdata)
>  	hw_cons = le16_to_cpu(*txdata->tx_cons_sb);
>  	sw_cons = txdata->tx_pkt_cons;
> 
> +	/* Ensure subsequent loads occur after hw_cons */
> +	smp_rmb();
> +
>  	while (sw_cons != hw_cons) {
>  		u16 pkt_cons;
> 
> --
> 1.8.3.1

Could you please explain a bit in detail what could have caused skb to NULL exactly ?
Curious that if skb would have been NULL for some reason it did not cause NULL pointer dereference in bnx2x_free_tx_pkt() on below call -

prefetch(&skb->end);

Which is prior to the said WARN_ON(!skb) in bnx2x_free_tx_pkt().

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion processing
  2019-07-18 10:12 ` [EXT] " Manish Chopra
@ 2019-07-18 18:26   ` Brian King
  0 siblings, 0 replies; 6+ messages in thread
From: Brian King @ 2019-07-18 18:26 UTC (permalink / raw)
  To: Manish Chopra, GR-everest-linux-l2
  Cc: Sudarsana Reddy Kalluru, Ariel Elior, netdev

On 7/18/19 5:12 AM, Manish Chopra wrote:
>> -----Original Message-----
>> From: Brian King <brking@linux.vnet.ibm.com>
>> Sent: Tuesday, July 16, 2019 3:12 AM
>> To: GR-everest-linux-l2 <GR-everest-linux-l2@marvell.com>
>> Cc: Sudarsana Reddy Kalluru <skalluru@marvell.com>; Ariel Elior
>> <aelior@marvell.com>; netdev@vger.kernel.org; Brian King
>> <brking@linux.vnet.ibm.com>
>> Subject: [EXT] [PATCH] bnx2x: Prevent load reordering in tx completion
>> processing
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> This patch fixes an issue seen on Power systems with bnx2x which results in
>> the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer
>> getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in
>> bnx2x_tx_int. Adding a read memory barrier resolves the issue.
>>
>> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
>> ---
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> index 656ed80..e2be5a6 100644
>> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
>> @@ -285,6 +285,9 @@ int bnx2x_tx_int(struct bnx2x *bp, struct
>> bnx2x_fp_txdata *txdata)
>>  	hw_cons = le16_to_cpu(*txdata->tx_cons_sb);
>>  	sw_cons = txdata->tx_pkt_cons;
>>
>> +	/* Ensure subsequent loads occur after hw_cons */
>> +	smp_rmb();
>> +
>>  	while (sw_cons != hw_cons) {
>>  		u16 pkt_cons;
>>
>> --
>> 1.8.3.1
> 
> Could you please explain a bit in detail what could have caused skb to NULL exactly ?
> Curious that if skb would have been NULL for some reason it did not cause NULL pointer dereference in bnx2x_free_tx_pkt() on below call -
> 
> prefetch(&skb->end);
> 
> Which is prior to the said WARN_ON(!skb) in bnx2x_free_tx_pkt().

Right. In this case, that would end up passing an invalid address to prefetch. On a
Power processor, that turns into a dcbt instruction (data cache block touch), which
is a hint to the process that the subsequent code may access that data cache block.
Passing an invalid address to dcbt causes no harm. I just built a userspace program
to validate that doing something very similar to what is happening here, and the dcbt
executed with no errors, no segfault to the userspace process.

This is the scenario I think is occurring. 

CPU[0]
bnx2x_start_xmit
[1] tx_buf->skb = skb; /* store skb pointer */
[2] ...
[3] wmb(); 
[4] DOORBELL_RELAXED

CPU[1]
bnx2x_tx_int
[5]  hw_cons = le16_to_cpu(*txdata->tx_cons_sb);
[6]  sw_cons = txdata->tx_pkt_cons;
[7]  while (sw_cons != hw_cons) {
[8]  pkt_cons = TX_BD(sw_cons);
[9]  bnx2x_free_tx_pkt
[10]  tx_buf = &txdata->tx_buf_ring[pkt_cons];
[11]  skb = tx_buf->skb;
[12]  prefetch(&skb->end);
[13]  ...
[14]  WARN_ON(!skb);


On CPU0 we are in the process of sending a TX buffer to the adapter. We have a wmb at [3]
to ensure that all stores to cacheable storage are coherent with respect to the
non-cacheable store at [4] to tell the adapter about the new TX buffer.

On CPU1, we have a potential race condition if the processor is aggressively reordering
loads. If we find ourselves in bnx2x_tx_int, while still in bn2x_start_xmit on CPU0,
its possible the processor could begin speculatively executing well into [11]. Since there
is no read barrier of any sort to tell the processor that the load of the skb pointer
cannot happen until the load of hw_cons has occurred, we could end up speculatively
loading the skb pointer before the write in [1] has completed with respect to CPU1. 

Adding the smp_rmb between 6 and 7 ensures that the load at 5 occurs prior to the
load at 11.

This was reproducing consistently on a 16 socket Power 9 system built of four, four socket
nodes, with 10 bnx2x adapters and 26TB of memory. The NUMA effects can get to be exaggerated
a bit on these systems. With this patch, the system runs without issues. We've now been able to
run the workload multiple times with no issues.

Thanks,

Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] bnx2x: Prevent load reordering in tx completion processing
  2019-07-15 21:41 [PATCH] bnx2x: Prevent load reordering in tx completion processing Brian King
  2019-07-17 19:00 ` David Miller
  2019-07-18 10:12 ` [EXT] " Manish Chopra
@ 2019-07-18 23:30 ` David Miller
  2019-07-21 19:29 ` David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-07-18 23:30 UTC (permalink / raw)
  To: brking; +Cc: GR-everest-linux-l2, skalluru, aelior, netdev

From: Brian King <brking@linux.vnet.ibm.com>
Date: Mon, 15 Jul 2019 16:41:50 -0500

> This patch fixes an issue seen on Power systems with bnx2x which results
> in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
> pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
> load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>

I agree with Brian's explanation of how the reordering can happen, and why
the crash doesn't show up in the prefetch() call.

I'll give the Marvell folks one more day to give a proper ACK.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] bnx2x: Prevent load reordering in tx completion processing
  2019-07-15 21:41 [PATCH] bnx2x: Prevent load reordering in tx completion processing Brian King
                   ` (2 preceding siblings ...)
  2019-07-18 23:30 ` David Miller
@ 2019-07-21 19:29 ` David Miller
  3 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2019-07-21 19:29 UTC (permalink / raw)
  To: brking; +Cc: GR-everest-linux-l2, skalluru, aelior, netdev

From: Brian King <brking@linux.vnet.ibm.com>
Date: Mon, 15 Jul 2019 16:41:50 -0500

> This patch fixes an issue seen on Power systems with bnx2x which results
> in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
> pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
> load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
> 
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-07-21 19:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-15 21:41 [PATCH] bnx2x: Prevent load reordering in tx completion processing Brian King
2019-07-17 19:00 ` David Miller
2019-07-18 10:12 ` [EXT] " Manish Chopra
2019-07-18 18:26   ` Brian King
2019-07-18 23:30 ` David Miller
2019-07-21 19:29 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).