Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification
To: Jason Gunthorpe
From: Gal Pressman
Date: Wed, 3 Feb 2021 16:47:20 +0200
References: <20210202171327.GN4718@ziepe.ca> <20210203124358.59017-1-galpress@amazon.com> <20210203140015.GP4718@ziepe.ca>
In-Reply-To: <20210203140015.GP4718@ziepe.ca>
Sender: owner-linux-mm@kvack.org

On 03/02/2021 16:00, Jason Gunthorpe wrote:
> On Wed, Feb 03, 2021 at 02:43:58PM +0200, Gal Pressman wrote:
>>> On Tue, Feb 02, 2021 at 12:05:36PM -0500, Peter Xu wrote:
>>>
>>>>> Gal, you could also MADV_DONTFORK this range if you are explicitly
>>>>> allocating them via special mmap.
>>>>
>>>> Yeah I wanted to mention this one too but I just forgot when reply: the issue
>>>> thread previously pasted smells like some people would like to drop
>>>> MADV_DONTFORK, but if it's able to still be applied I don't know why
>>>> not..
>>>
>>> I want to drop the MADV_DONTFORK for dynamic data memory allocated by
>>> the application layers (eg with malloc) without knowledge of how they
>>> will be used.
>>>
>>> This case is a buffer internal to the communication system that we
>>> know at allocation time how it will be used; so an explicit,
>>> deliberate, MADV_DONTFORK is fine
>>
>> We are referring to libfabric's bounce buffers, correct?
>> Libfabric could be considered as the "app" here, it's not clear why these
>> buffers should be DONTFORK'd before ibv_reg_mr() but others don't.
>
> I assumed they were internal to the EFA code itself.

The hugepages allocation is part of libfabric generic bufpool implementation:
https://github.com/ofiwg/libfabric/blob/cde8665ca5ec2fb957260490d0c8700d8ac69863/include/linux/osd.h#L64

I guess we could madvise them at the libfabric provider's layer.

>> Anyway, it should be simple enough to madvise them after allocation, although I
>> think it's part of libfabric's generic code (which isn't necessarily used on
>> top of rdma-core).
>
> Ah, so that is a reasonable justification for wanting to fix this in
> the kernel..
>
> Lets give Peter some time first.
>
> The other direction to validate this approach is to remove the
> MAP_HUGETLB flags and rely on THP instead, and/or mark them as
> MAP_SHARED.
>
> I'm not sure generic code should be use using MAP_HUGETLB..

It's using MAP_HUGETLB but has a fallback in case it fails:

		ret = ofi_alloc_hugepage_buf((void **) &buf_region->alloc_region,
					     pool->alloc_size);
		/* If we can't allocate huge pages, fall back to normal
		 * allocations for all future attempts.
		 */
		if (ret) {
			pool->attr.flags &= ~OFI_BUFPOOL_HUGEPAGES;
			goto retry;
		}
		buf_region->flags = OFI_BUFPOOL_HUGEPAGES;

> This would be enough to confirm that everything else is working as
> expected

Agree.
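
For reference, madvising the buffers right after allocation could look roughly like the sketch below. This is not libfabric code: alloc_pool_region() and free_pool_region() are hypothetical names, and the sketch assumes an anonymous private mapping with the same kind of fallback from MAP_HUGETLB to normal pages described above. Only mmap(), madvise() and munmap() are real interfaces here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_HUGETLB and MADV_DONTFORK */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of libfabric): map a pool region, preferring
 * huge pages, and mark it MADV_DONTFORK so a forked child does not inherit it.
 * len is assumed to be a multiple of the huge page size.
 * *used_huge is set to 1 if the huge page mapping succeeded, 0 otherwise.
 */
static void *alloc_pool_region(size_t len, int *used_huge)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *buf;

	assert(len > 0);

	/* Try huge pages first. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
	*used_huge = (buf != MAP_FAILED);
	if (buf == MAP_FAILED) {
		/* No huge pages available: fall back to a normal mapping. */
		buf = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
		if (buf == MAP_FAILED)
			return NULL;
	}

	/* Exclude the region from fork() before it gets registered. */
	if (madvise(buf, len, MADV_DONTFORK)) {
		munmap(buf, len);
		return NULL;
	}

	return buf;
}

/* Hypothetical counterpart that releases a region from alloc_pool_region(). */
static void free_pool_region(void *buf, size_t len)
{
	munmap(buf, len);
}
```

The madvise() call sits after the fallback so both the huge page and the normal page paths end up DONTFORK'd; whether this belongs in the generic bufpool or in the provider is the open question from the thread.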