* [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating @ 2018-01-12 3:42 Jianchao Wang 2018-01-12 16:32 ` Jason Gunthorpe 0 siblings, 1 reply; 27+ messages in thread
From: Jianchao Wang @ 2018-01-12 3:42 UTC (permalink / raw)
To: tariqt; +Cc: junxiao.bi, netdev, linux-rdma, linux-kernel

A customer reported a memory corruption issue on a previous mlx4_en driver
version, where order-3 pages and multiple page reference counting were
still used.

One of the root causes turned out to be that the HW may see stale rx_descs
when the prod db update reaches the HW before the rx_desc writes do.
Especially when crossing an order-3 page boundary and starting a new page,
the HW may write to pages that have already been freed and allocated again
by others.

To fix this, add a wmb between the rx_desc and prod db updates to ensure
ordering. Even though order-0 pages and page recycling have since been
introduced, reordering between rx_desc and prod db updates could still
corrupt different inbound packets.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 85e28ef..eefa82c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
 			break;
 		ring->prod++;
 	} while (likely(--missing));
-
+	wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
 	mlx4_en_update_rx_prod_db(ring);
 }
--
2.7.4

^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 3:42 [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating Jianchao Wang @ 2018-01-12 16:32 ` Jason Gunthorpe 2018-01-12 16:46 ` Eric Dumazet 0 siblings, 1 reply; 27+ messages in thread From: Jason Gunthorpe @ 2018-01-12 16:32 UTC (permalink / raw) To: Jianchao Wang Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote: > Customer reported memory corruption issue on previous mlx4_en driver > version where the order-3 pages and multiple page reference counting > were still used. > > Finally, find out one of the root causes is that the HW may see stale > rx_descs due to prod db updating reaches HW before rx_desc. Especially > when cross order-3 pages boundary and update a new one, HW may write > on the pages which may has been freed and allocated again by others. > > To fix it, add a wmb between rx_desc and prod db updating to ensure > the order. Even thougth order-0 and page recycling has been introduced, > the disorder between rx_desc and prod db still could lead to corruption > on different inbound packages. > > Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > index 85e28ef..eefa82c 100644 > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv, > break; > ring->prod++; > } while (likely(--missing)); > - > + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */ > mlx4_en_update_rx_prod_db(ring); > } > Does this need to be dma_wmb(), and should it be in mlx4_en_update_rx_prod_db ? Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 16:32 ` Jason Gunthorpe @ 2018-01-12 16:46 ` Eric Dumazet 2018-01-12 19:53 ` Saeed Mahameed 2018-01-14 2:40 ` jianchao.wang 0 siblings, 2 replies; 27+ messages in thread From: Eric Dumazet @ 2018-01-12 16:46 UTC (permalink / raw) To: Jason Gunthorpe, Jianchao Wang Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote: > On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote: > > Customer reported memory corruption issue on previous mlx4_en driver > > version where the order-3 pages and multiple page reference counting > > were still used. > > > > Finally, find out one of the root causes is that the HW may see stale > > rx_descs due to prod db updating reaches HW before rx_desc. Especially > > when cross order-3 pages boundary and update a new one, HW may write > > on the pages which may has been freed and allocated again by others. > > > > To fix it, add a wmb between rx_desc and prod db updating to ensure > > the order. Even thougth order-0 and page recycling has been introduced, > > the disorder between rx_desc and prod db still could lead to corruption > > on different inbound packages. 
> > > > Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> > > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > > index 85e28ef..eefa82c 100644 > > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > > @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv, > > break; > > ring->prod++; > > } while (likely(--missing)); > > - > > + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */ > > mlx4_en_update_rx_prod_db(ring); > > } > > > > Does this need to be dma_wmb(), and should it be in > mlx4_en_update_rx_prod_db ? > +1 on dma_wmb() On what architecture bug was observed ? In any case, the barrier should be moved in mlx4_en_update_rx_prod_db() I think. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 16:46 ` Eric Dumazet @ 2018-01-12 19:53 ` Saeed Mahameed 2018-01-12 20:16 ` Eric Dumazet 2018-01-14 2:40 ` jianchao.wang 1 sibling, 1 reply; 27+ messages in thread From: Saeed Mahameed @ 2018-01-12 19:53 UTC (permalink / raw) To: Eric Dumazet, Jason Gunthorpe, Jianchao Wang Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel On 01/12/2018 08:46 AM, Eric Dumazet wrote: > On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote: >> On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote: >>> Customer reported memory corruption issue on previous mlx4_en driver >>> version where the order-3 pages and multiple page reference counting >>> were still used. >>> >>> Finally, find out one of the root causes is that the HW may see stale >>> rx_descs due to prod db updating reaches HW before rx_desc. Especially >>> when cross order-3 pages boundary and update a new one, HW may write >>> on the pages which may has been freed and allocated again by others. >>> >>> To fix it, add a wmb between rx_desc and prod db updating to ensure >>> the order. Even thougth order-0 and page recycling has been introduced, >>> the disorder between rx_desc and prod db still could lead to corruption >>> on different inbound packages. 
>>> >>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> >>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>> index 85e28ef..eefa82c 100644 >>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>> @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv, >>> break; >>> ring->prod++; >>> } while (likely(--missing)); >>> - >>> + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */ >>> mlx4_en_update_rx_prod_db(ring); >>> } >>> >> >> Does this need to be dma_wmb(), and should it be in >> mlx4_en_update_rx_prod_db ? >> > > +1 on dma_wmb() > > On what architecture bug was observed ? > > In any case, the barrier should be moved in mlx4_en_update_rx_prod_db() > I think. > +1 on dma_wmb(), thanks Eric for reviewing this. The barrier is also needed elsewhere in the code as well, but I wouldn't put it in mlx4_en_update_rx_prod_db(), just to allow batch filling of all rx rings and then hit the barrier only once. As a rule of thumb, mem barriers are the ring API caller responsibility. e.g. in mlx4_en_activate_rx_rings(): between mlx4_en_fill_rx_buffers(priv); and the loop that updates rx prod for all rings ring, the dma_wmb is needed, see below. diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index b4d144e67514..65541721a240 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv) if (err) goto err_buffers; + dma_wmb(); + for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) { ring = priv->rx_ring[ring_ind]; ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 19:53 ` Saeed Mahameed @ 2018-01-12 20:16 ` Eric Dumazet 2018-01-12 21:01 ` Saeed Mahameed 0 siblings, 1 reply; 27+ messages in thread From: Eric Dumazet @ 2018-01-12 20:16 UTC (permalink / raw) To: Saeed Mahameed, Jason Gunthorpe, Jianchao Wang Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel On Fri, 2018-01-12 at 11:53 -0800, Saeed Mahameed wrote: > > On 01/12/2018 08:46 AM, Eric Dumazet wrote: > > On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote: > > > On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote: > > > > Customer reported memory corruption issue on previous mlx4_en driver > > > > version where the order-3 pages and multiple page reference counting > > > > were still used. > > > > > > > > Finally, find out one of the root causes is that the HW may see stale > > > > rx_descs due to prod db updating reaches HW before rx_desc. Especially > > > > when cross order-3 pages boundary and update a new one, HW may write > > > > on the pages which may has been freed and allocated again by others. > > > > > > > > To fix it, add a wmb between rx_desc and prod db updating to ensure > > > > the order. Even thougth order-0 and page recycling has been introduced, > > > > the disorder between rx_desc and prod db still could lead to corruption > > > > on different inbound packages. 
> > > > > > > > Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> > > > > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > > > > index 85e28ef..eefa82c 100644 > > > > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > > > > @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv, > > > > break; > > > > ring->prod++; > > > > } while (likely(--missing)); > > > > - > > > > + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */ > > > > mlx4_en_update_rx_prod_db(ring); > > > > } > > > > > > > > > > Does this need to be dma_wmb(), and should it be in > > > mlx4_en_update_rx_prod_db ? > > > > > > > +1 on dma_wmb() > > > > On what architecture bug was observed ? > > > > In any case, the barrier should be moved in mlx4_en_update_rx_prod_db() > > I think. > > > > +1 on dma_wmb(), thanks Eric for reviewing this. > > The barrier is also needed elsewhere in the code as well, but I wouldn't > put it in mlx4_en_update_rx_prod_db(), just to allow batch filling of > all rx rings and then hit the barrier only once. As a rule of thumb, mem > barriers are the ring API caller responsibility. > > e.g. in mlx4_en_activate_rx_rings(): > between mlx4_en_fill_rx_buffers(priv); and the loop that updates rx prod > for all rings ring, the dma_wmb is needed, see below. 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c > b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > index b4d144e67514..65541721a240 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > @@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv) > if (err) > goto err_buffers; > > + dma_wmb(); > + > for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) { > ring = priv->rx_ring[ring_ind]; Why bother, considering dma_wmb() is a nop on x86, simply a compiler barrier. Putting it in mlx4_en_update_rx_prod_db() and have no obscure bugs... Also we might change the existing wmb() in mlx4_en_process_rx_cq() by dma_wmb(), that would help performance a bit. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 20:16 ` Eric Dumazet @ 2018-01-12 21:01 ` Saeed Mahameed 2018-01-12 21:21 ` Eric Dumazet 2018-01-13 19:15 ` Jason Gunthorpe 0 siblings, 2 replies; 27+ messages in thread From: Saeed Mahameed @ 2018-01-12 21:01 UTC (permalink / raw) To: Eric Dumazet, Jason Gunthorpe, Jianchao Wang Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel On 01/12/2018 12:16 PM, Eric Dumazet wrote: > On Fri, 2018-01-12 at 11:53 -0800, Saeed Mahameed wrote: >> >> On 01/12/2018 08:46 AM, Eric Dumazet wrote: >>> On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote: >>>> On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote: >>>>> Customer reported memory corruption issue on previous mlx4_en driver >>>>> version where the order-3 pages and multiple page reference counting >>>>> were still used. >>>>> >>>>> Finally, find out one of the root causes is that the HW may see stale >>>>> rx_descs due to prod db updating reaches HW before rx_desc. Especially >>>>> when cross order-3 pages boundary and update a new one, HW may write >>>>> on the pages which may has been freed and allocated again by others. >>>>> >>>>> To fix it, add a wmb between rx_desc and prod db updating to ensure >>>>> the order. Even thougth order-0 and page recycling has been introduced, >>>>> the disorder between rx_desc and prod db still could lead to corruption >>>>> on different inbound packages. 
>>>>> >>>>> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> >>>>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>>>> index 85e28ef..eefa82c 100644 >>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>>>> @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv, >>>>> break; >>>>> ring->prod++; >>>>> } while (likely(--missing)); >>>>> - >>>>> + wmb(); /* ensure rx_desc updating reaches HW before prod db updating */ >>>>> mlx4_en_update_rx_prod_db(ring); >>>>> } >>>>> >>>> >>>> Does this need to be dma_wmb(), and should it be in >>>> mlx4_en_update_rx_prod_db ? >>>> >>> >>> +1 on dma_wmb() >>> >>> On what architecture bug was observed ? >>> >>> In any case, the barrier should be moved in mlx4_en_update_rx_prod_db() >>> I think. >>> >> >> +1 on dma_wmb(), thanks Eric for reviewing this. >> >> The barrier is also needed elsewhere in the code as well, but I wouldn't >> put it in mlx4_en_update_rx_prod_db(), just to allow batch filling of >> all rx rings and then hit the barrier only once. As a rule of thumb, mem >> barriers are the ring API caller responsibility. >> >> e.g. in mlx4_en_activate_rx_rings(): >> between mlx4_en_fill_rx_buffers(priv); and the loop that updates rx prod >> for all rings ring, the dma_wmb is needed, see below. 
>> >> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c >> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >> index b4d144e67514..65541721a240 100644 >> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c >> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >> @@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv) >> if (err) >> goto err_buffers; >> >> + dma_wmb(); >> + >> for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) { >> ring = priv->rx_ring[ring_ind]; > > > Why bother, considering dma_wmb() is a nop on x86, > simply a compiler barrier. > > Putting it in mlx4_en_update_rx_prod_db() and have no obscure bugs... > Simply putting a memory barrier on the top or the bottom of a functions, means nothing unless you are looking at the whole picture, of all the callers of that function to understand why is it there. which is better to grasp ?: update_doorbell() { dma_wmb(); ring->db = prod; } or fill buffers(); dma_wmb(); update_doorbell(); I simply like the 2nd one since with one look you can understand what this dma_wmb is protecting. Anyway this is truly a nit, Tariq can decide what is better for him :). > Also we might change the existing wmb() in mlx4_en_process_rx_cq() by > dma_wmb(), that would help performance a bit. > > +1, Tariq will you handle ? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 21:01 ` Saeed Mahameed @ 2018-01-12 21:21 ` Eric Dumazet 0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2018-01-12 21:21 UTC (permalink / raw)
To: Saeed Mahameed, Jason Gunthorpe, Jianchao Wang
Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel

On Fri, 2018-01-12 at 13:01 -0800, Saeed Mahameed wrote:
> which is better to grasp ?:
>
> update_doorbell() {
>     dma_wmb();
>     ring->db = prod;
> }

This one is IMO the most secure one (least surprise).

Considering the time it took to discover this bug, I would really play
safe.

But obviously I do not maintain mlx4.

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 21:01 ` Saeed Mahameed 2018-01-12 21:21 ` Eric Dumazet @ 2018-01-13 19:15 ` Jason Gunthorpe 1 sibling, 0 replies; 27+ messages in thread
From: Jason Gunthorpe @ 2018-01-13 19:15 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Eric Dumazet, Jianchao Wang, tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel

On Fri, Jan 12, 2018 at 01:01:56PM -0800, Saeed Mahameed wrote:
> Simply putting a memory barrier on the top or the bottom of a function
> means nothing unless you are looking at the whole picture, at all the
> callers of that function, to understand why it is there.

When I review code I want to see the memory barrier placed *directly*
before the write which allows the DMA. So yes, this is my preference:

> update_doorbell() {
>     dma_wmb();
>     ring->db = prod;
> }

Conceptually what is happening here is very similar to what
smp_store_release() does for SMP cases. In most cases wmb should always
be strongly connected with a following write.

smp_store_release() is called 'release' because the write it
incorporates allows the other CPU to 'see' what is being protected.

Similarly here, the write to the db allows the device to 'see' the new
ring data.

And this is a bad idea:

> fill buffers();
> dma_wmb();
> update_doorbell();

> I simply like the 2nd one since with one look you can understand
> what this dma_wmb is protecting.

What do you think the wmb is protecting in the above? It isn't the
fill.

Jason

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-12 16:46 ` Eric Dumazet 2018-01-12 19:53 ` Saeed Mahameed @ 2018-01-14 2:40 ` jianchao.wang 2018-01-14 9:47 ` Tariq Toukan 1 sibling, 1 reply; 27+ messages in thread
From: jianchao.wang @ 2018-01-14 2:40 UTC (permalink / raw)
To: Eric Dumazet, Jason Gunthorpe
Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed

Dear all,

Thanks for the kind responses and reviews. That's really appreciated.

On 01/13/2018 12:46 AM, Eric Dumazet wrote:
>> Does this need to be dma_wmb(), and should it be in
>> mlx4_en_update_rx_prod_db ?
>>
> +1 on dma_wmb()
>
> On what architecture bug was observed ?

This issue was observed on x86-64.

I will send a new patch, which replaces wmb() with dma_wmb(), to the
customer to confirm.

Thanks
Jianchao

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-14 2:40 ` jianchao.wang @ 2018-01-14 9:47 ` Tariq Toukan 2018-01-15 5:50 ` jianchao.wang 0 siblings, 1 reply; 27+ messages in thread From: Tariq Toukan @ 2018-01-14 9:47 UTC (permalink / raw) To: jianchao.wang, Eric Dumazet, Jason Gunthorpe Cc: tariqt, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Thanks Jianchao for your patch. And Thank you guys for your reviews, much appreciated. I was off-work on Friday and Saturday. On 14/01/2018 4:40 AM, jianchao.wang wrote: > Dear all > > Thanks for the kindly response and reviewing. That's really appreciated. > > On 01/13/2018 12:46 AM, Eric Dumazet wrote: >>> Does this need to be dma_wmb(), and should it be in >>> mlx4_en_update_rx_prod_db ? >>> >> +1 on dma_wmb() >> >> On what architecture bug was observed ? > This issue was observed on x86-64. > And I will send a new patch, in which replace wmb() with dma_wmb(), to customer > to confirm. +1 on dma_wmb, let us know once customer confirms. Please place it within mlx4_en_update_rx_prod_db as suggested. All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. Thanks, Tariq ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-14 9:47 ` Tariq Toukan @ 2018-01-15 5:50 ` jianchao.wang 2018-01-19 15:16 ` jianchao.wang 0 siblings, 1 reply; 27+ messages in thread From: jianchao.wang @ 2018-01-15 5:50 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Tariq Thanks for your kindly response. On 01/14/2018 05:47 PM, Tariq Toukan wrote: > Thanks Jianchao for your patch. > > And Thank you guys for your reviews, much appreciated. > I was off-work on Friday and Saturday. > > On 14/01/2018 4:40 AM, jianchao.wang wrote: >> Dear all >> >> Thanks for the kindly response and reviewing. That's really appreciated. >> >> On 01/13/2018 12:46 AM, Eric Dumazet wrote: >>>> Does this need to be dma_wmb(), and should it be in >>>> mlx4_en_update_rx_prod_db ? >>>> >>> +1 on dma_wmb() >>> >>> On what architecture bug was observed ? >> This issue was observed on x86-64. >> And I will send a new patch, in which replace wmb() with dma_wmb(), to customer >> to confirm. > > +1 on dma_wmb, let us know once customer confirms. > Please place it within mlx4_en_update_rx_prod_db as suggested. Yes, I have recommended it to customer. Once I get the result, I will share it here. > All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. > > Thanks, > Tariq > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-15 5:50 ` jianchao.wang @ 2018-01-19 15:16 ` jianchao.wang 2018-01-19 15:49 ` Eric Dumazet 0 siblings, 1 reply; 27+ messages in thread From: jianchao.wang @ 2018-01-19 15:16 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Tariq Very sad that the crash was reproduced again after applied the patch. --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) { + dma_wmb(); *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff); } I analyzed the kdump, it should be a memory corruption. Thanks Jianchao On 01/15/2018 01:50 PM, jianchao.wang wrote: > Hi Tariq > > Thanks for your kindly response. > > On 01/14/2018 05:47 PM, Tariq Toukan wrote: >> Thanks Jianchao for your patch. >> >> And Thank you guys for your reviews, much appreciated. >> I was off-work on Friday and Saturday. >> >> On 14/01/2018 4:40 AM, jianchao.wang wrote: >>> Dear all >>> >>> Thanks for the kindly response and reviewing. That's really appreciated. >>> >>> On 01/13/2018 12:46 AM, Eric Dumazet wrote: >>>>> Does this need to be dma_wmb(), and should it be in >>>>> mlx4_en_update_rx_prod_db ? >>>>> >>>> +1 on dma_wmb() >>>> >>>> On what architecture bug was observed ? >>> This issue was observed on x86-64. >>> And I will send a new patch, in which replace wmb() with dma_wmb(), to customer >>> to confirm. >> >> +1 on dma_wmb, let us know once customer confirms. >> Please place it within mlx4_en_update_rx_prod_db as suggested. > Yes, I have recommended it to customer. > Once I get the result, I will share it here. 
>> All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. >> >> Thanks, >> Tariq >> > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-19 15:16 ` jianchao.wang @ 2018-01-19 15:49 ` Eric Dumazet 2018-01-21 9:31 ` Tariq Toukan 0 siblings, 1 reply; 27+ messages in thread From: Eric Dumazet @ 2018-01-19 15:49 UTC (permalink / raw) To: jianchao.wang, Tariq Toukan, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > Hi Tariq > > Very sad that the crash was reproduced again after applied the patch. > > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) > > static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) > { > + dma_wmb(); So... is wmb() here fixing the issue ? > *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff); > } > > I analyzed the kdump, it should be a memory corruption. > > Thanks > Jianchao > On 01/15/2018 01:50 PM, jianchao.wang wrote: > > Hi Tariq > > > > Thanks for your kindly response. > > > > On 01/14/2018 05:47 PM, Tariq Toukan wrote: > > > Thanks Jianchao for your patch. > > > > > > And Thank you guys for your reviews, much appreciated. > > > I was off-work on Friday and Saturday. > > > > > > On 14/01/2018 4:40 AM, jianchao.wang wrote: > > > > Dear all > > > > > > > > Thanks for the kindly response and reviewing. That's really appreciated. > > > > > > > > On 01/13/2018 12:46 AM, Eric Dumazet wrote: > > > > > > Does this need to be dma_wmb(), and should it be in > > > > > > mlx4_en_update_rx_prod_db ? > > > > > > > > > > > > > > > > +1 on dma_wmb() > > > > > > > > > > On what architecture bug was observed ? > > > > > > > > This issue was observed on x86-64. > > > > And I will send a new patch, in which replace wmb() with dma_wmb(), to customer > > > > to confirm. > > > > > > +1 on dma_wmb, let us know once customer confirms. 
> > > Please place it within mlx4_en_update_rx_prod_db as suggested. > > > > Yes, I have recommended it to customer. > > Once I get the result, I will share it here. > > > All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. > > > > > > Thanks, > > > Tariq > > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-19 15:49 ` Eric Dumazet @ 2018-01-21 9:31 ` Tariq Toukan 2018-01-21 16:24 ` Tariq Toukan 2018-01-21 20:40 ` Jason Gunthorpe 0 siblings, 2 replies; 27+ messages in thread From: Tariq Toukan @ 2018-01-21 9:31 UTC (permalink / raw) To: Eric Dumazet, jianchao.wang, Tariq Toukan, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On 19/01/2018 5:49 PM, Eric Dumazet wrote: > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >> Hi Tariq >> >> Very sad that the crash was reproduced again after applied the patch. >> >> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c >> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >> @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) >> >> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) >> { >> + dma_wmb(); > > So... is wmb() here fixing the issue ? > >> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff); >> } >> >> I analyzed the kdump, it should be a memory corruption. >> >> Thanks >> Jianchao Hmm, this is actually consistent with the example below [1]. AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers (the descriptors, went through the dma_map_page() API), but not for the doorbell (a coherent memory, typically allocated via dma_alloc_coherent) that requires using the stronger wmb() barrier. [1] Documentation/memory-barriers.txt (*) dma_wmb(); (*) dma_rmb(); These are for use with consistent memory to guarantee the ordering of writes or reads of shared memory accessible to both the CPU and a DMA capable device. 
For example, consider a device driver that shares memory with a device and uses a descriptor status value to indicate if the descriptor belongs to the device or the CPU, and a doorbell to notify it when new descriptors are available: if (desc->status != DEVICE_OWN) { /* do not read data until we own descriptor */ dma_rmb(); /* read/modify data */ read_data = desc->data; desc->data = write_data; /* flush modifications before status update */ dma_wmb(); /* assign ownership */ desc->status = DEVICE_OWN; /* force memory to sync before notifying device via MMIO */ wmb(); /* notify device of new descriptors */ writel(DESC_NOTIFY, doorbell); } The dma_rmb() allows us guarantee the device has released ownership before we read the data from the descriptor, and the dma_wmb() allows us to guarantee the data is written to the descriptor before the device can see it now has ownership. The wmb() is needed to guarantee that the cache coherent memory writes have completed before attempting a write to the cache incoherent MMIO region. See Documentation/DMA-API.txt for more information on consistent memory. >> On 01/15/2018 01:50 PM, jianchao.wang wrote: >>> Hi Tariq >>> >>> Thanks for your kindly response. >>> >>> On 01/14/2018 05:47 PM, Tariq Toukan wrote: >>>> Thanks Jianchao for your patch. >>>> >>>> And Thank you guys for your reviews, much appreciated. >>>> I was off-work on Friday and Saturday. >>>> >>>> On 14/01/2018 4:40 AM, jianchao.wang wrote: >>>>> Dear all >>>>> >>>>> Thanks for the kindly response and reviewing. That's really appreciated. >>>>> >>>>> On 01/13/2018 12:46 AM, Eric Dumazet wrote: >>>>>>> Does this need to be dma_wmb(), and should it be in >>>>>>> mlx4_en_update_rx_prod_db ? >>>>>>> >>>>>> >>>>>> +1 on dma_wmb() >>>>>> >>>>>> On what architecture bug was observed ? >>>>> >>>>> This issue was observed on x86-64. >>>>> And I will send a new patch, in which replace wmb() with dma_wmb(), to customer >>>>> to confirm. 
>>>> >>>> +1 on dma_wmb, let us know once customer confirms. >>>> Please place it within mlx4_en_update_rx_prod_db as suggested. >>> >>> Yes, I have recommended it to customer. >>> Once I get the result, I will share it here. >>>> All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. >>>> >>>> Thanks, >>>> Tariq >>>> > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-21 9:31 ` Tariq Toukan @ 2018-01-21 16:24 ` Tariq Toukan 2018-01-21 16:43 ` Eric Dumazet 2018-01-22 2:12 ` jianchao.wang 2018-01-21 20:40 ` Jason Gunthorpe 1 sibling, 2 replies; 27+ messages in thread From: Tariq Toukan @ 2018-01-21 16:24 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, jianchao.wang, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On 21/01/2018 11:31 AM, Tariq Toukan wrote: > > > On 19/01/2018 5:49 PM, Eric Dumazet wrote: >> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>> Hi Tariq >>> >>> Very sad that the crash was reproduced again after applied the patch. Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? >>> >>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>> @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct >>> mlx4_en_rx_ring *ring) >>> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring >>> *ring) >>> { >>> + dma_wmb(); >> >> So... is wmb() here fixing the issue ? >> >>> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff); >>> } >>> >>> I analyzed the kdump, it should be a memory corruption. >>> >>> Thanks >>> Jianchao > > Hmm, this is actually consistent with the example below [1]. > > AIU from the example, it seems that the dma_wmb/dma_rmb barriers are > good for synchronizing cpu/device accesses to the "Streaming DMA mapped" > buffers (the descriptors, went through the dma_map_page() API), but not > for the doorbell (a coherent memory, typically allocated via > dma_alloc_coherent) that requires using the stronger wmb() barrier. 
> > > [1] Documentation/memory-barriers.txt > > (*) dma_wmb(); > (*) dma_rmb(); > > These are for use with consistent memory to guarantee the ordering > of writes or reads of shared memory accessible to both the CPU and a > DMA capable device. > > For example, consider a device driver that shares memory with a > device > and uses a descriptor status value to indicate if the descriptor > belongs > to the device or the CPU, and a doorbell to notify it when new > descriptors are available: > > if (desc->status != DEVICE_OWN) { > /* do not read data until we own descriptor */ > dma_rmb(); > > /* read/modify data */ > read_data = desc->data; > desc->data = write_data; > > /* flush modifications before status update */ > dma_wmb(); > > /* assign ownership */ > desc->status = DEVICE_OWN; > > /* force memory to sync before notifying device via MMIO */ > wmb(); > > /* notify device of new descriptors */ > writel(DESC_NOTIFY, doorbell); > } > > The dma_rmb() allows us guarantee the device has released ownership > before we read the data from the descriptor, and the dma_wmb() allows > us to guarantee the data is written to the descriptor before the > device > can see it now has ownership. The wmb() is needed to guarantee > that the > cache coherent memory writes have completed before attempting a > write to > the cache incoherent MMIO region. > > See Documentation/DMA-API.txt for more information on consistent > memory. > > >>> On 01/15/2018 01:50 PM, jianchao.wang wrote: >>>> Hi Tariq >>>> >>>> Thanks for your kindly response. >>>> >>>> On 01/14/2018 05:47 PM, Tariq Toukan wrote: >>>>> Thanks Jianchao for your patch. >>>>> >>>>> And Thank you guys for your reviews, much appreciated. >>>>> I was off-work on Friday and Saturday. >>>>> >>>>> On 14/01/2018 4:40 AM, jianchao.wang wrote: >>>>>> Dear all >>>>>> >>>>>> Thanks for the kindly response and reviewing. That's really >>>>>> appreciated. 
>>>>>> >>>>>> On 01/13/2018 12:46 AM, Eric Dumazet wrote: >>>>>>>> Does this need to be dma_wmb(), and should it be in >>>>>>>> mlx4_en_update_rx_prod_db ? >>>>>>>> >>>>>>> >>>>>>> +1 on dma_wmb() >>>>>>> >>>>>>> On what architecture bug was observed ? >>>>>> >>>>>> This issue was observed on x86-64. >>>>>> And I will send a new patch, in which replace wmb() with >>>>>> dma_wmb(), to customer >>>>>> to confirm. >>>>> >>>>> +1 on dma_wmb, let us know once customer confirms. >>>>> Please place it within mlx4_en_update_rx_prod_db as suggested. >>>> >>>> Yes, I have recommended it to customer. >>>> Once I get the result, I will share it here. >>>>> All other calls to mlx4_en_update_rx_prod_db are in control/slow >>>>> path so I prefer being on the safe side, and care less about >>>>> bulking the barrier. >>>>> >>>>> Thanks, >>>>> Tariq >>>>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-21 16:24 ` Tariq Toukan @ 2018-01-21 16:43 ` Eric Dumazet 2018-01-22 2:40 ` jianchao.wang 2018-01-22 2:12 ` jianchao.wang 1 sibling, 1 reply; 27+ messages in thread From: Eric Dumazet @ 2018-01-21 16:43 UTC (permalink / raw) To: Tariq Toukan, jianchao.wang, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote: > > On 21/01/2018 11:31 AM, Tariq Toukan wrote: > > > > > > On 19/01/2018 5:49 PM, Eric Dumazet wrote: > > > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > > > > Hi Tariq > > > > > > > > Very sad that the crash was reproduced again after applied the patch. > > Memory barriers vary for different Archs, can you please share more > details regarding arch and repro steps? Yeah, mlx4 NICs in Google fleet receive trillions of packets per second, and we never noticed an issue. Although we are using a slightly different driver, using order-0 pages and fast pages recycling. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-21 16:43 ` Eric Dumazet @ 2018-01-22 2:40 ` jianchao.wang 2018-01-22 15:47 ` Jason Gunthorpe 0 siblings, 1 reply; 27+ messages in thread From: jianchao.wang @ 2018-01-22 2:40 UTC (permalink / raw) To: Eric Dumazet, Tariq Toukan, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Eric On 01/22/2018 12:43 AM, Eric Dumazet wrote: > On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote: >> >> On 21/01/2018 11:31 AM, Tariq Toukan wrote: >>> >>> >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>> Hi Tariq >>>>> >>>>> Very sad that the crash was reproduced again after applied the patch. >> >> Memory barriers vary for different Archs, can you please share more >> details regarding arch and repro steps? > > Yeah, mlx4 NICs in Google fleet receive trillions of packets per > second, and we never noticed an issue. > > Although we are using a slightly different driver, using order-0 pages > and fast pages recycling. > > The driver we use will set the page reference count to (size of pages)/stride; the pages will be freed by the networking stack when the reference count becomes zero, and the order-3 pages may be allocated again soon. This gives the NIC device a chance to corrupt pages which have been allocated by others, such as slab. In the current version with order-0 pages and page recycling, the corruption may sometimes occur on inbound packets and just cause some bad and invalid packets which will be dropped. Thanks Jianchao ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-22 2:40 ` jianchao.wang @ 2018-01-22 15:47 ` Jason Gunthorpe 2018-01-23 3:25 ` jianchao.wang 0 siblings, 1 reply; 27+ messages in thread From: Jason Gunthorpe @ 2018-01-22 15:47 UTC (permalink / raw) To: jianchao.wang Cc: Eric Dumazet, Tariq Toukan, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On Mon, Jan 22, 2018 at 10:40:53AM +0800, jianchao.wang wrote: > Hi Eric > > On 01/22/2018 12:43 AM, Eric Dumazet wrote: > > On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote: > >> > >> On 21/01/2018 11:31 AM, Tariq Toukan wrote: > >>> > >>> > >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: > >>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > >>>>> Hi Tariq > >>>>> > >>>>> Very sad that the crash was reproduced again after applied the patch. > >> > >> Memory barriers vary for different Archs, can you please share more > >> details regarding arch and repro steps? > > > > Yeah, mlx4 NICs in Google fleet receive trillions of packets per > > second, and we never noticed an issue. > > > > Although we are using a slightly different driver, using order-0 pages > > and fast pages recycling. > > > > > The driver we use will will set the page reference count to (size of pages)/stride, the > pages will be freed by networking stack when the reference become zero, and the order-3 > pages maybe allocated soon, this give NIC device a chance to corrupt the pages which have > been allocated by others, such as slab. But it looks like the wmb() is placed when stuffing new rx descriptors into the device - how can it prevent corruption of pages where ownership was transferred from device to the host? That sounds more like a rmb() is missing someplace to me... (Granted the missing wmb() is a bug, but it may not be fully solving this issue??) Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-22 15:47 ` Jason Gunthorpe @ 2018-01-23 3:25 ` jianchao.wang 0 siblings, 0 replies; 27+ messages in thread From: jianchao.wang @ 2018-01-23 3:25 UTC (permalink / raw) To: Jason Gunthorpe Cc: Eric Dumazet, Tariq Toukan, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Jason Thanks for your kindly response. On 01/22/2018 11:47 PM, Jason Gunthorpe wrote: >>> Yeah, mlx4 NICs in Google fleet receive trillions of packets per >>> second, and we never noticed an issue. >>> >>> Although we are using a slightly different driver, using order-0 pages >>> and fast pages recycling. >>> >>> >> The driver we use will will set the page reference count to (size of pages)/stride, the >> pages will be freed by networking stack when the reference become zero, and the order-3 >> pages maybe allocated soon, this give NIC device a chance to corrupt the pages which have >> been allocated by others, such as slab. > But it looks like the wmb() is placed when stuffing new rx descriptors > into the device - how can it prevent corruption of pages where > ownership was transfered from device to the host? That sounds more like a > rmb() is missing someplace to me... > The device may see the prod_db updating before rx_desc updating. Then it will get stale rx_descs. These stale rx_descs may contain pages that have been freed to the host. > (Granted the missing wmb() is a bug, but it may not be fully solving this > issue??) The wmb() here fixes one of the customer's test cases. Thanks Jianchao ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-21 16:24 ` Tariq Toukan @ 2018-01-21 16:43 ` Eric Dumazet 2018-01-22 2:40 ` jianchao.wang 2018-01-22 2:12 ` jianchao.wang 1 sibling, 2 replies; 27+ messages in thread From: jianchao.wang @ 2018-01-22 2:12 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Tariq and all Many thanks for your kindly and detailed response and comment. On 01/22/2018 12:24 AM, Tariq Toukan wrote: > > > On 21/01/2018 11:31 AM, Tariq Toukan wrote: >> >> >> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>> Hi Tariq >>>> >>>> Very sad that the crash was reproduced again after applied the patch. > > Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 The xen is installed. The crash occurred in DOM0. Regarding the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. The patch that can fix this issue is as follows: --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -1005,6 +1005,7 @@ out: wmb(); /* ensure HW sees CQ consumer before we post new buffers */ ring->cons = cq->mcq.cons_index; mlx4_en_refill_rx_buffers(priv, ring); + wmb(); mlx4_en_update_rx_prod_db(ring); return polled; } Thanks Jianchao > >>>> >>>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c >>>> @@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring) >>>> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) >>>> { >>>> + dma_wmb(); >>> >>> So... is wmb() here fixing the issue ?
>>> >>>> *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff); >>>> } >>>> >>>> I analyzed the kdump, it should be a memory corruption. >>>> >>>> Thanks >>>> Jianchao >> >> Hmm, this is actually consistent with the example below [1]. >> >> AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers (the descriptors, went through the dma_map_page() API), but not for the doorbell (a coherent memory, typically allocated via dma_alloc_coherent) that requires using the stronger wmb() barrier. >> >> >> [1] Documentation/memory-barriers.txt >> >> (*) dma_wmb(); >> (*) dma_rmb(); >> >> These are for use with consistent memory to guarantee the ordering >> of writes or reads of shared memory accessible to both the CPU and a >> DMA capable device. >> >> For example, consider a device driver that shares memory with a device >> and uses a descriptor status value to indicate if the descriptor belongs >> to the device or the CPU, and a doorbell to notify it when new >> descriptors are available: >> >> if (desc->status != DEVICE_OWN) { >> /* do not read data until we own descriptor */ >> dma_rmb(); >> >> /* read/modify data */ >> read_data = desc->data; >> desc->data = write_data; >> >> /* flush modifications before status update */ >> dma_wmb(); >> >> /* assign ownership */ >> desc->status = DEVICE_OWN; >> >> /* force memory to sync before notifying device via MMIO */ >> wmb(); >> >> /* notify device of new descriptors */ >> writel(DESC_NOTIFY, doorbell); >> } >> >> The dma_rmb() allows us guarantee the device has released ownership >> before we read the data from the descriptor, and the dma_wmb() allows >> us to guarantee the data is written to the descriptor before the device >> can see it now has ownership. The wmb() is needed to guarantee that the >> cache coherent memory writes have completed before attempting a write to >> the cache incoherent MMIO region. 
>> >> See Documentation/DMA-API.txt for more information on consistent memory. >> >> >>>> On 01/15/2018 01:50 PM, jianchao.wang wrote: >>>>> Hi Tariq >>>>> >>>>> Thanks for your kindly response. >>>>> >>>>> On 01/14/2018 05:47 PM, Tariq Toukan wrote: >>>>>> Thanks Jianchao for your patch. >>>>>> >>>>>> And Thank you guys for your reviews, much appreciated. >>>>>> I was off-work on Friday and Saturday. >>>>>> >>>>>> On 14/01/2018 4:40 AM, jianchao.wang wrote: >>>>>>> Dear all >>>>>>> >>>>>>> Thanks for the kindly response and reviewing. That's really appreciated. >>>>>>> >>>>>>> On 01/13/2018 12:46 AM, Eric Dumazet wrote: >>>>>>>>> Does this need to be dma_wmb(), and should it be in >>>>>>>>> mlx4_en_update_rx_prod_db ? >>>>>>>>> >>>>>>>> >>>>>>>> +1 on dma_wmb() >>>>>>>> >>>>>>>> On what architecture bug was observed ? >>>>>>> >>>>>>> This issue was observed on x86-64. >>>>>>> And I will send a new patch, in which replace wmb() with dma_wmb(), to customer >>>>>>> to confirm. >>>>>> >>>>>> +1 on dma_wmb, let us know once customer confirms. >>>>>> Please place it within mlx4_en_update_rx_prod_db as suggested. >>>>> >>>>> Yes, I have recommended it to customer. >>>>> Once I get the result, I will share it here. >>>>>> All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier. >>>>>> >>>>>> Thanks, >>>>>> Tariq >>>>>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at https://urldefense.proofpoint.com/v2/url?u=http-3A__vger.kernel.org_majordomo-2Dinfo.html&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=s8_-sqvK_-1EHwvxh5DBpBIakIb0lpcn0fN6zbFxgpk&s=q3jITeGfYvYPdMo8vqfURwAbUNbSrVi2pkJfmPVGUH8&e= >>> > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-22 2:12 ` jianchao.wang @ 2018-01-25 3:27 ` jianchao.wang 2018-01-25 3:55 ` Eric Dumazet 0 siblings, 1 reply; 27+ messages in thread From: jianchao.wang @ 2018-01-25 3:27 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Tariq On 01/22/2018 10:12 AM, jianchao.wang wrote: >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>> Hi Tariq >>>>> >>>>> Very sad that the crash was reproduced again after applied the patch. >> Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? > The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 > The xen is installed. The crash occurred in DOM0. > Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. > What is the final suggestion on this? If we use wmb() there, is the performance pulled down? Thanks in advance Jianchao ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-25 3:27 ` jianchao.wang @ 2018-01-25 3:55 ` Eric Dumazet 2018-01-25 6:25 ` jianchao.wang 0 siblings, 1 reply; 27+ messages in thread From: Eric Dumazet @ 2018-01-25 3:55 UTC (permalink / raw) To: jianchao.wang, Tariq Toukan, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: > Hi Tariq > > On 01/22/2018 10:12 AM, jianchao.wang wrote: > > > > On 19/01/2018 5:49 PM, Eric Dumazet wrote: > > > > > On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: > > > > > > Hi Tariq > > > > > > > > > > > > Very sad that the crash was reproduced again after applied the patch. > > > > > > Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? > > The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 > > The xen is installed. The crash occurred in DOM0. > > Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. > > > > What is the finial suggestion on this ? > If use wmb there, is the performance pulled down ? Since https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155 we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often. I doubt the additional wmb() will have serious impact there. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-25 3:55 ` Eric Dumazet @ 2018-01-25 6:25 ` jianchao.wang 2018-01-25 9:54 ` Tariq Toukan 0 siblings, 1 reply; 27+ messages in thread From: jianchao.wang @ 2018-01-25 6:25 UTC (permalink / raw) To: Eric Dumazet, Tariq Toukan, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Eric Thanks for your kindly response and suggestion. That's really appreciated. Jianchao On 01/25/2018 11:55 AM, Eric Dumazet wrote: > On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: >> Hi Tariq >> >> On 01/22/2018 10:12 AM, jianchao.wang wrote: >>>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>>>> Hi Tariq >>>>>>> >>>>>>> Very sad that the crash was reproduced again after applied the patch. >>>> >>>> Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? >>> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 >>> The xen is installed. The crash occurred in DOM0. >>> Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. >>> >> >> What is the finial suggestion on this ? >> If use wmb there, is the performance pulled down ? > > Since https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155 > > we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often. > > I doubt the additional wmb() will have serious impact there. > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-25 6:25 ` jianchao.wang @ 2018-01-25 9:54 ` Tariq Toukan 2018-01-27 12:41 ` jianchao.wang 0 siblings, 1 reply; 27+ messages in thread From: Tariq Toukan @ 2018-01-25 9:54 UTC (permalink / raw) To: jianchao.wang, Eric Dumazet, Tariq Toukan, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On 25/01/2018 8:25 AM, jianchao.wang wrote: > Hi Eric > > Thanks for you kindly response and suggestion. > That's really appreciated. > > Jianchao > > On 01/25/2018 11:55 AM, Eric Dumazet wrote: >> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: >>> Hi Tariq >>> >>> On 01/22/2018 10:12 AM, jianchao.wang wrote: >>>>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>>>>> Hi Tariq >>>>>>>> >>>>>>>> Very sad that the crash was reproduced again after applied the patch. >>>>> >>>>> Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? >>>> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 >>>> The xen is installed. The crash occurred in DOM0. >>>> Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. >>>> >>> >>> What is the finial suggestion on this ? >>> If use wmb there, is the performance pulled down ? I want to evaluate this effect. I agree with Eric, expected impact is restricted, especially after batching the allocations. 
>> >> Since https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_davem_net-2Dnext.git_commit_-3Fid-3Ddad42c3038a59d27fced28ee4ec1d4a891b28155&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=c0oI8duFkyFBILMQYDsqRApHQrOlLY_2uGiz_utcd7s&s=E4_XKmSI0B63qB0DLQ1EX_fj1bOP78ZdeYADBf33B-k&e= >> >> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often. >> >> I doubt the additional wmb() will have serious impact there. >> I will test the effect (it'll be beginning of next week). I'll update so we can make a more confident decision. Thanks, Tariq >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-25 9:54 ` Tariq Toukan @ 2018-01-27 12:41 ` jianchao.wang [not found] ` <d9883261-e93e-400a-757c-3a81d8b6aca1@mellanox.com> 0 siblings, 1 reply; 27+ messages in thread From: jianchao.wang @ 2018-01-27 12:41 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed Hi Tariq Thanks for your kindly response. That's really appreciated. On 01/25/2018 05:54 PM, Tariq Toukan wrote: > > > On 25/01/2018 8:25 AM, jianchao.wang wrote: >> Hi Eric >> >> Thanks for you kindly response and suggestion. >> That's really appreciated. >> >> Jianchao >> >> On 01/25/2018 11:55 AM, Eric Dumazet wrote: >>> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: >>>> Hi Tariq >>>> >>>> On 01/22/2018 10:12 AM, jianchao.wang wrote: >>>>>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>>>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>>>>>> Hi Tariq >>>>>>>>> >>>>>>>>> Very sad that the crash was reproduced again after applied the patch. >>>>>> >>>>>> Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? >>>>> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 >>>>> The xen is installed. The crash occurred in DOM0. >>>>> Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. >>>>> >>>> >>>> What is the finial suggestion on this ? >>>> If use wmb there, is the performance pulled down ? > > I want to evaluate this effect. 
> I agree with Eric, expected impact is restricted, especially after batching the allocations.> >>> >>> Since https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_davem_net-2Dnext.git_commit_-3Fid-3Ddad42c3038a59d27fced28ee4ec1d4a891b28155&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=c0oI8duFkyFBILMQYDsqRApHQrOlLY_2uGiz_utcd7s&s=E4_XKmSI0B63qB0DLQ1EX_fj1bOP78ZdeYADBf33B-k&e= >>> >>> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often. >>> >>> I doubt the additional wmb() will have serious impact there. >>> > > I will test the effect (it'll be beginning of next week). > I'll update so we can make a more confident decision. > I have also sent patches with wmb and batching allocations to customer and let them check whether the performance is impacted. And update here asap when get feedback. > Thanks, > Tariq > >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at https://urldefense.proofpoint.com/v2/url?u=http-3A__vger.kernel.org_majordomo-2Dinfo.html&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=7WdAxUBeiTUTCy8v-7zXyr4qk7sx26ATvfo6QSTvZyQ&m=f0myCdBQoRjaklxGau_S9ZtQKSQYALW9p2MIuTMAEYo&s=447fFu-xZoLvmxdaVhijK6cUk4Jcx7GtBCNddQT4GOQ&e= >> > ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <d9883261-e93e-400a-757c-3a81d8b6aca1@mellanox.com>]
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating [not found] ` <d9883261-e93e-400a-757c-3a81d8b6aca1@mellanox.com> @ 2019-01-02 1:43 ` jianchao.wang 0 siblings, 0 replies; 27+ messages in thread From: jianchao.wang @ 2019-01-02 1:43 UTC (permalink / raw) To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe Cc: junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed On 12/31/18 12:27 AM, Tariq Toukan wrote: > > > On 1/27/2018 2:41 PM, jianchao.wang wrote: >> Hi Tariq >> >> Thanks for your kindly response. >> That's really appreciated. >> >> On 01/25/2018 05:54 PM, Tariq Toukan wrote: >>> >>> >>> On 25/01/2018 8:25 AM, jianchao.wang wrote: >>>> Hi Eric >>>> >>>> Thanks for you kindly response and suggestion. >>>> That's really appreciated. >>>> >>>> Jianchao >>>> >>>> On 01/25/2018 11:55 AM, Eric Dumazet wrote: >>>>> On Thu, 2018-01-25 at 11:27 +0800, jianchao.wang wrote: >>>>>> Hi Tariq >>>>>> >>>>>> On 01/22/2018 10:12 AM, jianchao.wang wrote: >>>>>>>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote: >>>>>>>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote: >>>>>>>>>>> Hi Tariq >>>>>>>>>>> >>>>>>>>>>> Very sad that the crash was reproduced again after applied the patch. >>>>>>>> >>>>>>>> Memory barriers vary for different Archs, can you please share more details regarding arch and repro steps? >>>>>>> The hardware is HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015 >>>>>>> The xen is installed. The crash occurred in DOM0. >>>>>>> Regarding to the repro steps, it is a customer's test which does heavy disk I/O over NFS storage without any guest. >>>>>>> >>>>>> >>>>>> What is the finial suggestion on this ? >>>>>> If use wmb there, is the performance pulled down ? >>> >>> I want to evaluate this effect. 
>>> I agree with Eric, expected impact is restricted, especially after batching the allocations. > >>>>> >>>>> Since https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=dad42c3038a59d27fced28ee4ec1d4a891b28155 >>>>> >>>>> we batch allocations, so mlx4_en_refill_rx_buffers() is not called that often. >>>>> >>>>> I doubt the additional wmb() will have serious impact there. >>>>> >>> >>> I will test the effect (it'll be beginning of next week). >>> I'll update so we can make a more confident decision. >>> >> I have also sent patches with wmb and batching allocations to customer and let them check whether the performance is impacted. >> And update here asap when get feedback. >> >>> Thanks, >>> Tariq >>> > > Hi Jianchao, > > I am interested to push this bug fix. > Do you want me to submit, or do it yourself? > Can you elaborate regarding the arch with the repro? > > This is the patch I suggest: > > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c > @@ -161,6 +161,8 @@ static bool mlx4_en_is_ring_empty(const struct > mlx4_en_rx_ring *ring) > > static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring) > { > + /* ensure rx_desc updating reaches HW before prod db updating */ > + wmb(); > *ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff); > } > Hi Tariq Happy new year! The customer provided confusing test results for us. The fix cannot fix their issue. We finally found an upstream fix, 5d70bd5c98d0e655bde2aae2b5251bdd44df5e71 (net/mlx4_en: fix potential use-after-free with dma_unmap_page), which killed the issue in Oct 2018. That's really a long way. Please go ahead with this patch.
Thanks Jianchao ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating 2018-01-21 9:31 ` Tariq Toukan 2018-01-21 16:24 ` Tariq Toukan @ 2018-01-21 20:40 ` Jason Gunthorpe 1 sibling, 0 replies; 27+ messages in thread From: Jason Gunthorpe @ 2018-01-21 20:40 UTC (permalink / raw) To: Tariq Toukan Cc: Eric Dumazet, jianchao.wang, junxiao.bi, netdev, linux-rdma, linux-kernel, Saeed Mahameed > Hmm, this is actually consistent with the example below [1]. > > AIU from the example, it seems that the dma_wmb/dma_rmb barriers are good > for synchronizing cpu/device accesses to the "Streaming DMA mapped" buffers > (the descriptors, went through the dma_map_page() API), but not for the > doorbell (a coherent memory, typically allocated via dma_alloc_coherent) > that requires using the stronger wmb() barrier. If x86 truly requires a wmb() (aka SFENCE) here then the userspace RDMA stuff is broken too, and that has been tested to death at this point.. I looked into this at one point and I thought I concluded that x86 did not require an SFENCE between a posted PCI write and writes to system memory to guarantee order with respect to the PCI device? Well, so long as non-temporal stores and other specialty accesses are not being used.. Is there a chance a fancy sse optimized memcpy or memset, crypto or something is being involved here? However, Documentation/memory-barriers.txt does seem pretty clear that the kernel definition of wmb() makes it required here, even if it might be overkill for x86? Jason ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2019-01-02 1:43 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-12 3:42 [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating Jianchao Wang 2018-01-12 16:32 ` Jason Gunthorpe 2018-01-12 16:46 ` Eric Dumazet 2018-01-12 19:53 ` Saeed Mahameed 2018-01-12 20:16 ` Eric Dumazet 2018-01-12 21:01 ` Saeed Mahameed 2018-01-12 21:21 ` Eric Dumazet 2018-01-13 19:15 ` Jason Gunthorpe 2018-01-14 2:40 ` jianchao.wang 2018-01-14 9:47 ` Tariq Toukan 2018-01-15 5:50 ` jianchao.wang 2018-01-19 15:16 ` jianchao.wang 2018-01-19 15:49 ` Eric Dumazet 2018-01-21 9:31 ` Tariq Toukan 2018-01-21 16:24 ` Tariq Toukan 2018-01-21 16:43 ` Eric Dumazet 2018-01-22 2:40 ` jianchao.wang 2018-01-22 15:47 ` Jason Gunthorpe 2018-01-23 3:25 ` jianchao.wang 2018-01-22 2:12 ` jianchao.wang 2018-01-25 3:27 ` jianchao.wang 2018-01-25 3:55 ` Eric Dumazet 2018-01-25 6:25 ` jianchao.wang 2018-01-25 9:54 ` Tariq Toukan 2018-01-27 12:41 ` jianchao.wang [not found] ` <d9883261-e93e-400a-757c-3a81d8b6aca1@mellanox.com> 2019-01-02 1:43 ` jianchao.wang 2018-01-21 20:40 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).