From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751195AbeAVClB (ORCPT );
	Sun, 21 Jan 2018 21:41:01 -0500
Received: from aserp2120.oracle.com ([141.146.126.78]:36998 "EHLO
	aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751086AbeAVCk7 (ORCPT );
	Sun, 21 Jan 2018 21:40:59 -0500
Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before
 prod db updating
To: Eric Dumazet, Tariq Toukan, Jason Gunthorpe
Cc: junxiao.bi@oracle.com, netdev@vger.kernel.org,
 linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
 Saeed Mahameed
References: <1515728542-3060-1-git-send-email-jianchao.w.wang@oracle.com>
 <339a7156-9ef1-1f3c-30b8-3cc3558d124e@mellanox.com>
 <1516552998.3478.5.camel@gmail.com>
From: "jianchao.wang"
Message-ID: <460fca68-f8a8-e3c4-2e60-e90dc0e2f843@oracle.com>
Date: Mon, 22 Jan 2018 10:40:53 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <1516552998.3478.5.camel@gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=8781
 signatures=668655
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0
 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0
 mlxscore=0 mlxlogscore=748 adultscore=0 classifier=spam adjust=0
 reason=mlx scancount=1 engine=8.0.1-1711220000
 definitions=main-1801220034
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Eric

On 01/22/2018 12:43 AM, Eric Dumazet wrote:
> On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
>>
>> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
>>>
>>>
>>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
>>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
>>>>> Hi Tariq
>>>>>
>>>>> Very sad that the crash was reproduced again after applied the patch.
>>
>> Memory barriers vary for different Archs, can you please share more
>> details regarding arch and repro steps?
>
> Yeah, mlx4 NICs in Google fleet receive trillions of packets per
> second, and we never noticed an issue.
>
> Although we are using a slightly different driver, using order-0 pages
> and fast pages recycling.
>
>

The driver we use sets the page reference count to
(size of pages)/stride. The pages are freed by the networking stack when
the reference count drops to zero, and the order-3 pages may be
allocated again soon afterwards. This gives the NIC a chance to corrupt
pages that have since been allocated by others, such as slab.

In the current version, with order-0 pages and page recycling, the
corruption may sometimes occur on the inbound packets and just cause
some bad, invalid packets, which will be dropped.

Thanks
Jianchao
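
The reference-count arithmetic described above can be sketched in plain
user-space C. This is only an illustration, not code from the mlx4
driver: the 4 KiB base page size, the 2048-byte stride used in the usage
note, and the names rx_block, refs_per_block and put_ref are all
assumptions made for the example.

```c
#include <assert.h>

/* Assumed base page size; the real value depends on the architecture. */
#define BASE_PAGE_SIZE 4096u

/* Hypothetical stand-in for a page block shared by several RX fragments. */
struct rx_block {
	unsigned int refcount;
};

/*
 * Number of references taken on one block of the given order:
 * (size of pages) / stride, as described in the mail.
 */
static unsigned int refs_per_block(unsigned int order, unsigned int stride)
{
	assert(stride != 0);
	return (BASE_PAGE_SIZE << order) / stride;
}

/*
 * Drop one reference, as the networking stack would after consuming a
 * fragment. Returns 1 when the last reference is gone, i.e. the point
 * at which the block would go back to the allocator and could be handed
 * to another user while the device may still hold its DMA address.
 */
static int put_ref(struct rx_block *b)
{
	assert(b->refcount > 0);
	return --b->refcount == 0;
}
```

With these assumed values, an order-3 block (32 KiB) with a 2048-byte
stride carries 16 references, and an order-0 page carries 2. The block
is released only when put_ref() is called for the last fragment.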