From mboxrd@z Thu Jan 1 00:00:00 1970
From: "jianchao.wang"
Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before prod db updating
Date: Fri, 19 Jan 2018 23:16:09 +0800
Message-ID: <53b1ac4d-a294-eb98-149e-65d7954243da@oracle.com>
References: <1515728542-3060-1-git-send-email-jianchao.w.wang@oracle.com> <20180112163247.GB15974@ziepe.ca> <1515775567.131759.42.camel@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Content-Language: en-US
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Tariq Toukan, Eric Dumazet, Jason Gunthorpe
Cc: junxiao.bi-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Saeed Mahameed
List-Id: linux-rdma@vger.kernel.org

Hi Tariq

Very sad that the crash was reproduced again after applying the patch.

--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -252,6 +252,7 @@ static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring)
 
 static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
 {
+	dma_wmb();
 	*ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
 }

I analyzed the kdump; it looks like memory corruption.

Thanks
Jianchao

On 01/15/2018 01:50 PM, jianchao.wang wrote:
> Hi Tariq
>
> Thanks for your kindly response.
>
> On 01/14/2018 05:47 PM, Tariq Toukan wrote:
>> Thanks Jianchao for your patch.
>>
>> And Thank you guys for your reviews, much appreciated.
>> I was off-work on Friday and Saturday.
>>
>> On 14/01/2018 4:40 AM, jianchao.wang wrote:
>>> Dear all
>>>
>>> Thanks for the kindly response and reviewing. That's really appreciated.
>>>
>>> On 01/13/2018 12:46 AM, Eric Dumazet wrote:
>>>>> Does this need to be dma_wmb(), and should it be in
>>>>> mlx4_en_update_rx_prod_db ?
>>>>>
>>>> +1 on dma_wmb()
>>>>
>>>> On what architecture bug was observed ?
>>> This issue was observed on x86-64.
>>> And I will send a new patch, in which replace wmb() with dma_wmb(), to customer
>>> to confirm.
>>
>> +1 on dma_wmb, let us know once customer confirms.
>> Please place it within mlx4_en_update_rx_prod_db as suggested.
> Yes, I have recommended it to customer.
> Once I get the result, I will share it here.
>> All other calls to mlx4_en_update_rx_prod_db are in control/slow path so I prefer being on the safe side, and care less about bulking the barrier.
>>
>> Thanks,
>> Tariq
>>
>
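
For readers following the thread, here is a minimal user-space sketch of the ordering the patch is trying to enforce: descriptor stores first, then a write barrier, then the doorbell store. It is not the mlx4 driver code. The types struct rx_desc and struct rx_ring, the functions ring_post_buffer() and ring_update_prod_db(), and RING_SIZE are hypothetical names invented for illustration; atomic_thread_fence(memory_order_release) and htonl() are user-space stand-ins for the kernel's dma_wmb() and cpu_to_be32(), and they do not give the same guarantees towards a real device.

```c
#include <arpa/inet.h>   /* htonl(): user-space stand-in for cpu_to_be32() */
#include <stdatomic.h>   /* atomic_thread_fence(): stand-in for dma_wmb() */
#include <stdint.h>

#define RING_SIZE 8u     /* hypothetical ring size, must be a power of two */

/* Hypothetical descriptor: only a buffer address, for illustration. */
struct rx_desc {
	uint64_t addr;
};

/* Hypothetical ring: descriptors, a free-running producer index, and a
 * big-endian doorbell record that the device would read. */
struct rx_ring {
	struct rx_desc desc[RING_SIZE];
	uint32_t prod;
	volatile uint32_t db;
};

/* Fill the next descriptor. This does not publish it to the device. */
static void ring_post_buffer(struct rx_ring *ring, uint64_t dma_addr)
{
	ring->desc[ring->prod & (RING_SIZE - 1)].addr = dma_addr;
	ring->prod++;
}

/* Publish the producer index. The fence orders all earlier descriptor
 * stores before the doorbell store, mirroring where the patch places
 * dma_wmb() inside mlx4_en_update_rx_prod_db(). */
static void ring_update_prod_db(struct rx_ring *ring)
{
	atomic_thread_fence(memory_order_release);
	ring->db = htonl(ring->prod & 0xffff);
}
```

Putting the barrier inside the doorbell helper means every caller gets the ordering without having to remember it, which matches the placement requested in the review. As reported above, the crash still reproduced with this barrier in place, so the sketch only shows the intended ordering and says nothing about the root cause.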