From: Muchun Song
Date: Fri, 20 Nov 2020 23:44:26 +0800
Subject: Re: [External] Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page
To: Michal Hocko
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton, paulmck@kernel.org, mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes, Matthew Wilcox, Oscar Salvador, "Song Bao Hua (Barry Song)", Xiongchun duan, linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <20201120131129.GO3200@dhcp22.suse.cz>
References: <20201120064325.34492-1-songmuchun@bytedance.com> <20201120084202.GJ3200@dhcp22.suse.cz> <20201120131129.GO3200@dhcp22.suse.cz>

On Fri, Nov 20, 2020 at 9:11 PM Michal Hocko wrote:
>
> On Fri 20-11-20 20:40:46, Muchun Song wrote:
> > On Fri, Nov 20, 2020 at 4:42 PM Michal Hocko wrote:
> > >
> > > On Fri 20-11-20 14:43:04, Muchun Song wrote:
> > > [...]
> > >
> > > Thanks for improving the cover letter and providing some numbers. I have
> > > only glanced through the patchset because I didn't really have more time
> > > to dive deeply into them.
> > >
> > > Overall it looks promising. To summarize: I would prefer not to have
> > > the feature enablement controlled by a compile-time option, and the kernel
> > > command line option should be opt-in. I also do not like that freeing
> > > the pool can trigger the OOM killer or even shut the system down if no
> > > OOM victim is eligible.
> >
> > Hi Michal,
> >
> > I have replied to those questions in the other mail thread.
> >
> > Thanks.
> >
> > >
> > > One thing that I didn't really get to think hard about is what the
> > > effect of vmemmap manipulation is wrt pfn walkers. pfn_to_page can be
> > > invalid when racing with the split. How do we enforce that this won't
> > > blow up?
> >
> > This feature depends on CONFIG_SPARSEMEM_VMEMMAP, and in that case
> > pfn_to_page works. The return value of pfn_to_page is simply the
> > address of the corresponding struct page. I cannot figure out where
> > the problem is. Can you describe the problem in detail, please? Thanks.
>
> The struct page returned by pfn_to_page might become invalid right when
> it is returned, because the vmemmap could get freed up and the respective
> memory released to the page allocator and reused for something else. See?

If the HugeTLB page has already been allocated from the buddy allocator,
can the struct pages of that HugeTLB page be freed? Does that happen?
If so, how would the HugeTLB page be freed back to the buddy allocator
when its struct page cannot be accessed?

>
> > > I have also asked in a previous version whether the vmemmap manipulation
> > > should really be unconditional, e.g. for short-lived hugetlb pages
> > > allocated from the buddy allocator directly rather than for a pool. Maybe
> > > it should be restricted to pool allocations, as those are considered
> > > long term, so the overhead will be amortized and the freeing path
> > > restrictions are easier to understand.
> >
> > Yeah, I agree with you. This can be an optimization, and we can
> > add it to the TODO list and implement it in the future. The patch
> > series is already huge.
>
> Yes, the patchset is large and the primary aim should be reducing
> functionality to make it smaller in the first incarnation, especially
> when it is tricky to implement. Releasing vmemmap for sparse hugepages is
> one of those things. Do you really need it for your use case?
>
> --
> Michal Hocko
> SUSE Labs

--
Yours,
Muchun