Date: Mon, 23 Nov 2020 08:38:42 +0100
From: Michal Hocko
To: Mike Kravetz
Cc: David Hildenbrand, Muchun Song, corbet@lwn.net, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
	mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
	rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
	jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
	willy@infradead.org,
	osalvador@suse.de, song.bao.hua@hisilicon.com,
	duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page
Message-ID: <20201123073842.GA27488@dhcp22.suse.cz>
References: <20201120064325.34492-1-songmuchun@bytedance.com>
 <20201120084202.GJ3200@dhcp22.suse.cz>
 <6b1533f7-69c6-6f19-fc93-c69750caaecc@redhat.com>
 <20201120093912.GM3200@dhcp22.suse.cz>
 <55e53264-a07a-a3ec-4253-e72c718b4ee6@oracle.com>
In-Reply-To: <55e53264-a07a-a3ec-4253-e72c718b4ee6@oracle.com>

On Fri 20-11-20 09:45:12, Mike Kravetz wrote:
> On 11/20/20 1:43 AM, David Hildenbrand wrote:
[...]
> >>> To keep things easy, maybe simply never allow to free these hugetlb pages
> >>> again for now? If they were reserved during boot and the vmemmap condensed,
> >>> then just let them stick around for all eternity.
> >>
> >> Not sure I understand. Do you propose to only free those vmemmap pages
> >> when the pool is initialized during boot time and never allow them to be
> >> freed afterwards? That would certainly make it safer and maybe even
> >> simpler wrt implementation.
> >
> > Exactly, let's keep it simple for now. I guess most use cases of this
> > (virtualization, databases, ...) will allocate hugepages during boot and
> > never free them.

> Not sure if I agree with that last statement. Database and virtualization
> use cases from my employer allocate hugetlb pages after boot. It is shortly
> after boot, but still not from the boot/kernel command line.

Is there any strong reason for that?

> Somewhat related, but not exactly addressing this issue ...
>
> One idea discussed in a previous patch set was to disable PMD/huge page
> mapping of vmemmap if this feature was enabled. This would eliminate a bunch
> of the complex code doing page table manipulation. It does not address
> the issue of struct page pages going away which is being discussed here,
> but it could be a way to simplify the first version of this code. If this
> is going to be an 'opt in' feature as previously suggested, then eliminating
> the PMD/huge page vmemmap mapping may be acceptable. My guess is that
> sysadmins would only 'opt in' if they expect most of system memory to be used
> by hugetlb pages. We certainly have database and virtualization use cases
> where this is true.

Would this simplify the code considerably? I mean, the vmemmap page tables
will need to be updated anyway, so that code has to stay, and the PMD entry
split shouldn't be the most complex part of that operation. On the other
hand, dropping large pages for all vmemmaps will likely have a performance
cost.

-- 
Michal Hocko
SUSE Labs