Subject: Re: [External] Re: [RFC PATCH 00/24] mm/hugetlb: Free some vmemmap pages of hugetlb page
From: Muchun Song
Date: Fri, 9 Oct 2020 12:13:44 +0800
To: Mike Kravetz
Cc: Jonathan Corbet, Thomas Gleixner, mingo@redhat.com, bp@alien8.de,
 x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton, paulmck@kernel.org,
 mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, Matthew Wilcox,
 Randy Dunlap, oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de,
 Mina Almasry, David Rientjes, linux-doc@vger.kernel.org, LKML,
 Linux Memory Management List, linux-fsdevel@vger.kernel.org, Xiongchun duan
References: <20200915125947.26204-1-songmuchun@bytedance.com>
 <31eac1d8-69ba-ed2f-8e47-d957d6bb908c@oracle.com>
 <9d220de0-f06d-cb5b-363f-6ae97d5b4146@oracle.com>
In-Reply-To: <9d220de0-f06d-cb5b-363f-6ae97d5b4146@oracle.com>

On Thu, Oct 8, 2020 at 5:15 AM Mike Kravetz wrote:
>
> On 9/29/20 2:58 PM, Mike Kravetz wrote:
> > On 9/15/20 5:59 AM, Muchun Song wrote:
> >> Hi all,
> >>
> >> This patch series will free some vmemmap pages (struct page structures)
> >> associated with each hugetlb page when preallocated, to save memory.
> > ...
> >> The mapping of the first page (index 0) and the second page (index 1) is
> >> unchanged. The remaining 6 pages are all mapped to the same page (index
> >> 1). So we only need 2 pages for the vmemmap area and can free 6 pages to
> >> the buddy system to save memory. Why can we do this? Because the content
> >> of the remaining 7 pages is usually the same, except for the first page.
> >>
> >> When a hugetlb page is freed to the buddy system, we should allocate 6
> >> pages for vmemmap pages and restore the previous mapping relationship.
> >>
> >> If we use 1GB hugetlb pages, we can save 4095 pages. This is a very
> >> substantial gain. On our server, we run some SPDK applications which
> >> use 300GB of hugetlb pages. With this feature enabled, we can save
> >> 4797MB of memory.
>
> I had a hard time going through the patch series as it is currently
> structured, and instead examined all the code together. Muchun put in
> much effort and the code does reduce memory usage.
> - For 2MB hugetlb pages, we save 5 pages of struct pages
> - For 1GB hugetlb pages, we save 4086 pages of struct pages
>
> Code is even in place to handle poisoned pages, although I have not looked
> at this closely. The code survives the libhugetlbfs and ltp huge page tests.
>
> To date, nobody has asked the important question "Is the added complexity
> worth the memory savings?". I suppose it all depends on one's use case.
> Obviously, the savings are more significant when one uses 1GB huge pages,
> but that may not be the common case today.
>
> > At a high level this seems like a reasonable optimization for hugetlb
> > pages. It is possible because hugetlb pages are 'special' and mostly
> > handled differently than pages in normal mm paths.
>
> Such an optimization only makes sense for something like hugetlb pages. One
> reason is the 'special' nature of hugetlbfs as stated above. The other is
> that this optimization mostly makes sense for huge pages that are created
> once and stick around for a long time. hugetlb pool pages are a perfect
> example. This is because manipulation of struct page mappings is done when
> a huge page is created or destroyed.

Yeah, on our cloud servers we have several application scenarios (e.g.
SPDK, DPDK, QEMU, and jemalloc). These applications may use a lot of
hugetlb pages.
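To make the arithmetic above concrete, here is a back-of-the-envelope sketch. It assumes 4KB base pages and a 64-byte struct page (the usual x86_64 layout); the exact number of vmemmap pages that must stay mapped is an implementation detail, which is why the thread mentions slightly different savings (4095 in the cover letter vs 4086 in Mike's count).

```python
PAGE_SIZE = 4096      # base page size in bytes (assumed 4KB)
STRUCT_PAGE = 64      # sizeof(struct page); typically 64 bytes on x86_64

def vmemmap_pages(huge_page_size):
    """Base pages of vmemmap (the struct page array) backing one huge page."""
    return (huge_page_size // PAGE_SIZE) * STRUCT_PAGE // PAGE_SIZE

# A 2MB huge page is described by 512 struct pages = 32KB = 8 vmemmap pages.
print(vmemmap_pages(2 << 20))    # 8

# Keep the head page (index 0) and one tail page (index 1), and remap the
# remaining 6 slots so they all alias page 1: only 2 distinct pages remain.
mapping = [0, 1] + [1] * 6       # illustrative model of the remapped vmemmap
freed = vmemmap_pages(2 << 20) - len(set(mapping))
print(freed)                     # 6 pages returned to the buddy allocator

# A 1GB huge page needs 262144 struct pages = 16MB = 4096 vmemmap pages,
# so nearly all of them can be freed.
print(vmemmap_pages(1 << 30))    # 4096
```

Note the model keeps two distinct pages per 2MB huge page; which tail pages an actual implementation keeps mapped determines whether the per-1GB saving comes out to 4095, 4086, or similar.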
>
> > The majority of the new code is hugetlb specific, so it should not be
> > of too much concern for the general mm code paths.
>
> It is true that much of the code in this series was put in hugetlb.c.
> However, I would argue that there is a bunch of code that only deals with
> remapping the memmap, which should be more generic and added to
> sparse-vmemmap.c. This would at least allow for easier reuse.

I agree with you.

> Before Muchun and I put more effort into this series, I would really
> like to get feedback on whether or not this should move forward.
> Specifically, is the memory savings worth the added complexity? Is the
> removal of struct pages going to come back and cause issues for future
> features?

Some users do need this optimization to save memory. But users who do
not need it can also disable it with the kernel boot parameter
'hugetlb_free_vmemmap=off', or by not configuring
CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.

I have no idea about "cause issues for future features". Is there any
feature ongoing or planned?

--
Yours,
Muchun
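For reference, a sketch of how the two opt-out knobs mentioned above would be used. Both names (the Kconfig symbol and the boot parameter) are as proposed in this series, not yet part of a released kernel:

```shell
# Build time: leave the feature out entirely by not setting the proposed
# Kconfig symbol in .config:
#   # CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is not set

# Boot time: keep the feature compiled in but disabled, by adding the
# proposed parameter to the kernel command line (e.g. GRUB_CMDLINE_LINUX):
#   hugetlb_free_vmemmap=off

# On a running system, the active command line can be inspected with:
cat /proc/cmdline
```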