From mboxrd@z Thu Jan  1 00:00:00 1970
From: Qing Huang <qing.huang@oracle.com>
Subject: Re: [PATCH v2 net] mlx4_core: restore optimal ICM memory allocation
Date: Thu, 31 May 2018 18:51:45 -0700
Message-ID: <d470065c-f45d-7761-b347-729fb49c89c9@oracle.com>
References: <20180531125224.97098-1-edumazet@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: netdev <netdev@vger.kernel.org>,
        Eric Dumazet <eric.dumazet@gmail.com>,
        John Sperbeck <jsperbeck@google.com>,
        Tarick Bedeir <tarick@google.com>,
        Daniel Jurgens <danielj@mellanox.com>,
        Zhu Yanjun <yanjun.zhu@oracle.com>
To: Eric Dumazet <edumazet@google.com>,
        "David S . Miller" <davem@davemloft.net>
Return-path: <netdev-owner@vger.kernel.org>
Received: from userp2120.oracle.com ([156.151.31.85]:51174 "EHLO
        userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750757AbeFABvv (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 31 May 2018 21:51:51 -0400
In-Reply-To: <20180531125224.97098-1-edumazet@google.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>


On 5/31/2018 5:52 AM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought two regressions caught in our regression suite.
>
> The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
> or 6.25 % which is unacceptable since ICM can be pretty large.
>
> This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
> per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
> (instead of prior 256KB)

It would be great if you could share the test case that triggered the 
KASAN report in your
environment. Our QA has been running intensive tests using 8KB or 4KB 
chunk size configuration
for some time, no one has reported memory corruption issues so far, 
given that we are not
running the latest upstream kernel.

IMO, it's worthwhile to find out the root cause and fix the problem.


>
> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fallback to low-order allocations under high memory pressure.
>
> Most of these allocations happen right after boot time, when we get
> plenty of non fragmented memory, there is really no point being so
> pessimistic and break huge pages into order-0 ones just for fun.
>
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.

Just FYI, out of memory wasn't our original concern. We didn't encounter 
OOM killings.


>
> Second regression is an KASAN fault, that will need further investigations.
>
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Tariq Toukan <tariqt@mellanox.com>
> Cc: John Sperbeck <jsperbeck@google.com>
> Cc: Tarick Bedeir <tarick@google.com>
> Cc: Qing Huang <qing.huang@oracle.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>
> Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 18 ++++++++++++------
>   1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..5342bd8a3d0bfaa9e76bb9b6943790606c97b181 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> -	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   	struct mlx4_icm *icm;
>   	struct mlx4_icm_chunk *chunk = NULL;
>   	int cur_order;
> +	gfp_t mask;
>   	int ret;
>   
>   	/* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,17 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   		while (1 << cur_order > npages)
>   			--cur_order;
>   
> +		mask = gfp_mask;
> +		if (cur_order)
> +			mask &= ~__GFP_DIRECT_RECLAIM;
> +
>   		if (coherent)
>   			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
>   						      &chunk->mem[chunk->npages],
> -						      cur_order, gfp_mask);
> +						      cur_order, mask);
>   		else
>   			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> -						   cur_order, gfp_mask,
> +						   cur_order, mask,
>   						   dev->numa_node);
>   
>   		if (ret) {