From: Muchun Song
Date: Fri, 20 Nov 2020 17:37:09 +0800
Subject: Re: [External] Re: [PATCH v5 11/21] mm/hugetlb: Allocate the vmemmap pages associated with each hugetlb page
To: Michal Hocko
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
 paulmck@kernel.org, mchehab+huawei@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
 Matthew Wilcox, Oscar Salvador, "Song Bao Hua (Barry Song)",
 Xiongchun duan, linux-doc@vger.kernel.org, LKML,
 Linux Memory Management List, linux-fsdevel
In-Reply-To: <20201120092826.GL3200@dhcp22.suse.cz>
References: <20201120064325.34492-1-songmuchun@bytedance.com>
 <20201120064325.34492-12-songmuchun@bytedance.com>
 <20201120081123.GC3200@dhcp22.suse.cz>
 <20201120092826.GL3200@dhcp22.suse.cz>

On Fri, Nov 20, 2020 at 5:28 PM Michal Hocko wrote:
>
> On Fri 20-11-20 16:51:59, Muchun Song wrote:
> > On Fri, Nov 20, 2020 at 4:11 PM Michal Hocko wrote:
> > >
> > > On Fri 20-11-20 14:43:15, Muchun Song wrote:
> > > [...]
> > > > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > > > index eda7e3a0b67c..361c4174e222 100644
> > > > --- a/mm/hugetlb_vmemmap.c
> > > > +++ b/mm/hugetlb_vmemmap.c
> > > > @@ -117,6 +117,8 @@
> > > >  #define RESERVE_VMEMMAP_NR 2U
> > > >  #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
> > > >  #define TAIL_PAGE_REUSE -1
> > > > +#define GFP_VMEMMAP_PAGE \
> > > > + (GFP_KERNEL | __GFP_NOFAIL | __GFP_MEMALLOC)
> > >
> > > This is really dangerous! __GFP_MEMALLOC would allow a complete memory
> > > depletion. I am not even sure triggering the OOM killer is a reasonable
> > > behavior. It is just unexpected that shrinking a hugetlb pool can have
> > > destructive side effects. I believe it would be more reasonable to
> > > simply refuse to shrink the pool if we cannot free those pages up. This
> > > sucks as well but it isn't destructive at least.
> >
> > I find the instructions of __GFP_MEMALLOC from the kernel doc.
> >
> > %__GFP_MEMALLOC allows access to all memory. This should only be used when
> > the caller guarantees the allocation will allow more memory to be freed
> > very shortly.
> >
> > Our situation is in line with the description above. We will free a HugeTLB page
> > to the buddy allocator which is much larger than that we allocated shortly.
>
> Yes that is a part of the description. But read it in its full entirety.
>  * %__GFP_MEMALLOC allows access to all memory. This should only be used when
>  * the caller guarantees the allocation will allow more memory to be freed
>  * very shortly e.g. process exiting or swapping. Users either should
>  * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
>  * Users of this flag have to be extremely careful to not deplete the reserve
>  * completely and implement a throttling mechanism which controls the
>  * consumption of the reserve based on the amount of freed memory.
>  * Usage of a pre-allocated pool (e.g. mempool) should be always considered
>  * before using this flag.
>
> GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_HIGH

We want to free the HugeTLB page to the buddy allocator, but before
that we need to allocate some pages to serve as its vmemmap pages, so
we cannot handle an allocation failure here. I think we should replace
__GFP_RETRY_MAYFAIL with __GFP_NOFAIL:

GFP_KERNEL | __GFP_NOFAIL | __GFP_HIGH

This meets our needs here. Thanks.

>
> sounds like a more reasonable fit to me.
>
> --
> Michal Hocko
> SUSE Labs

--
Yours,
Muchun
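For reference, the alternative raised earlier in the thread (refuse to shrink the pool when the vmemmap pages cannot be allocated) can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not kernel code: struct sketch_pool, sketch_alloc_vmemmap() and sketch_shrink_pool() are hypothetical names, the page counts are caller-supplied illustration values, and no real gfp_t flags or kernel APIs are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a hugetlb pool. Nothing here is a
 * kernel structure; the fields only exist to illustrate the policy.
 */
struct sketch_pool {
	size_t nr_huge_pages;   /* huge pages currently held by the pool */
	size_t free_base_pages; /* base pages the model allocator can hand out */
};

/*
 * Model of an allocation that is allowed to fail (the behaviour the
 * thread associates with __GFP_RETRY_MAYFAIL). On failure nothing is
 * consumed, so the caller can back out cleanly.
 */
static bool sketch_alloc_vmemmap(struct sketch_pool *pool, size_t nr)
{
	assert(pool != NULL);
	if (pool->free_base_pages < nr)
		return false;
	pool->free_base_pages -= nr;
	return true;
}

/*
 * Shrink the pool by one huge page.
 * vmemmap_nr:    base pages needed to restore the vmemmap (assumed input)
 * base_per_huge: base pages returned when the huge page is freed (assumed)
 * Returns 0 on success, -1 if the shrink is refused; a refused shrink
 * leaves the pool exactly as it was.
 */
static int sketch_shrink_pool(struct sketch_pool *pool, size_t vmemmap_nr,
			      size_t base_per_huge)
{
	assert(pool != NULL);
	if (pool->nr_huge_pages == 0)
		return -1;
	if (!sketch_alloc_vmemmap(pool, vmemmap_nr))
		return -1; /* refuse instead of dipping into reserves */
	pool->nr_huge_pages--;
	pool->free_base_pages += base_per_huge;
	return 0;
}
```

In this model a failed shrink is annoying but harmless, since the pool keeps its huge page and no reserve is touched. A __GFP_NOFAIL-style policy would instead have no failure branch at all in sketch_alloc_vmemmap(), which is the trade-off being argued above.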