From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965272AbeALTxr (ORCPT + 1 other);
	Fri, 12 Jan 2018 14:53:47 -0500
Received: from mail-eopbgr40043.outbound.protection.outlook.com ([40.107.4.43]:7738
	"EHLO EUR03-DB5-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S965190AbeALTxn (ORCPT); Fri, 12 Jan 2018 14:53:43 -0500
Authentication-Results: spf=none (sender IP is )
	smtp.mailfrom=saeedm@mellanox.com;
Subject: Re: [PATCH] net/mlx4_en: ensure rx_desc updating reaches HW before
	prod db updating
To: Eric Dumazet, Jason Gunthorpe, Jianchao Wang
Cc: tariqt@mellanox.com, junxiao.bi@oracle.com, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
References: <1515728542-3060-1-git-send-email-jianchao.w.wang@oracle.com>
	<20180112163247.GB15974@ziepe.ca>
	<1515775567.131759.42.camel@gmail.com>
From: Saeed Mahameed
Message-ID: <85116e56-52b1-944d-6ee2-916ccfc3a7a6@mellanox.com>
Date: Fri, 12 Jan 2018 11:53:22 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <1515775567.131759.42.camel@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/12/2018 08:46 AM, Eric Dumazet wrote:
> On Fri, 2018-01-12 at 09:32 -0700, Jason Gunthorpe wrote:
>> On Fri, Jan 12, 2018 at 11:42:22AM +0800, Jianchao Wang wrote:
>>> Customer reported memory corruption issue on previous mlx4_en driver
>>> version where the order-3 pages and multiple page reference counting
>>> were still used.
>>>
>>> Finally, find out one of the root causes is that the HW may see stale
>>> rx_descs due to prod db updating reaches HW before rx_desc. Especially
>>> when cross order-3 pages boundary and update a new one, HW may write
>>> on the pages which may have been freed and allocated again by others.
>>>
>>> To fix it, add a wmb between rx_desc and prod db updating to ensure
>>> the order. Even though order-0 and page recycling has been introduced,
>>> the disorder between rx_desc and prod db still could lead to corruption
>>> on different inbound packages.
>>>
>>> Signed-off-by: Jianchao Wang
>>>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>>> index 85e28ef..eefa82c 100644
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>>> @@ -555,7 +555,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
>>>  			break;
>>>  		ring->prod++;
>>>  	} while (likely(--missing));
>>> -
>>> +	wmb(); /* ensure rx_desc updating reaches HW before prod db updating */
>>>  	mlx4_en_update_rx_prod_db(ring);
>>>  }
>>>
>>
>> Does this need to be dma_wmb(), and should it be in
>> mlx4_en_update_rx_prod_db ?
>>
>
> +1 on dma_wmb()
>
> On what architecture bug was observed ?
>
> In any case, the barrier should be moved in mlx4_en_update_rx_prod_db()
> I think.
>

+1 on dma_wmb(), thanks Eric for reviewing this.

The barrier is needed elsewhere in the code as well, but I wouldn't put
it in mlx4_en_update_rx_prod_db(), so that the caller can batch-fill all
rx rings and then hit the barrier only once. As a rule of thumb, memory
barriers are the ring API caller's responsibility.

e.g. in mlx4_en_activate_rx_rings(): the dma_wmb() is needed between
mlx4_en_fill_rx_buffers(priv); and the loop that updates the rx prod
for all rings, see below.

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b4d144e67514..65541721a240 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -370,6 +370,8 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
 	if (err)
 		goto err_buffers;
 
+	dma_wmb();
+
 	for (ring_ind = 0; ring_ind < priv->rx_ring_num; ring_ind++) {
 		ring = priv->rx_ring[ring_ind];
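
For anyone following the thread, the "fill all rings, one barrier, then ring
the doorbells" idea can be sketched in plain user-space C. This is only an
illustration under stated assumptions, not the mlx4 code: struct rx_ring,
ring_fill(), ring_update_prod_db() and activate_rings() are hypothetical
names, and a C11 release fence stands in for dma_wmb(), which exists only
in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size, must be a power of two */
#define NUM_RINGS 4u /* hypothetical number of RX rings */

/* Hypothetical stand-ins for the driver's structures, not the mlx4 ones. */
struct rx_desc {
	uint64_t addr;
};

struct rx_ring {
	struct rx_desc desc[RING_SIZE]; /* descriptors the consumer reads */
	uint32_t prod;                  /* software producer index */
	_Atomic uint32_t doorbell;      /* stand-in for the producer doorbell */
};

/* Fill descriptors only: deliberately no barrier and no doorbell here. */
static void ring_fill(struct rx_ring *ring, uint32_t count, uint64_t base)
{
	assert(count <= RING_SIZE);
	for (uint32_t i = 0; i < count; i++) {
		ring->desc[ring->prod & (RING_SIZE - 1)].addr = base + i;
		ring->prod++;
	}
}

/* Publish the producer index. The caller is responsible for the barrier. */
static void ring_update_prod_db(struct rx_ring *ring)
{
	atomic_store_explicit(&ring->doorbell, ring->prod,
			      memory_order_relaxed);
}

/* Caller-side batching: fill every ring, one barrier, then all doorbells. */
void activate_rings(struct rx_ring *rings, uint32_t n, uint32_t count)
{
	for (uint32_t i = 0; i < n; i++)
		ring_fill(&rings[i], count, (uint64_t)i << 32);

	/*
	 * User-space analogue of dma_wmb() (an assumption for illustration):
	 * the descriptor stores above are ordered before the doorbell stores
	 * below for any thread that acquire-loads a doorbell value.
	 */
	atomic_thread_fence(memory_order_release);

	for (uint32_t i = 0; i < n; i++)
		ring_update_prod_db(&rings[i]);
}
```

With the barrier kept in the caller, activate_rings() pays for one fence no
matter how many rings it fills. Putting the fence inside
ring_update_prod_db() would still be correct, but would cost one fence per
ring.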