Date: Fri, 4 Sep 2020 09:42:07 +0200
From: Michal Hocko
To: Roman Gushchin
Cc: Zi Yan, linux-mm@kvack.org, Rik van Riel, "Kirill A . Shutemov", Matthew Wilcox, Shakeel Butt, Yang Shi, David Nellans, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64
Message-ID: <20200904074207.GC15277@dhcp22.suse.cz>
References: <20200902180628.4052244-1-zi.yan@sent.com> <20200903073254.GP4617@dhcp22.suse.cz> <20200903162527.GF60440@carbon.dhcp.thefacebook.com>
In-Reply-To: <20200903162527.GF60440@carbon.dhcp.thefacebook.com>

On Thu 03-09-20 09:25:27, Roman Gushchin wrote:
> On Thu, Sep 03, 2020 at 09:32:54AM +0200, Michal Hocko wrote:
> > On Wed 02-09-20 14:06:12, Zi Yan wrote:
> > > From: Zi Yan
> > >
> > > Hi all,
> > >
> > > This patchset adds support for 1GB THP on x86_64. It is on top of
> > > v5.9-rc2-mmots-2020-08-25-21-13.
> > >
> > > 1GB THP is more flexible for reducing translation overhead and increasing the
> > > performance of applications with large memory footprint without application
> > > changes compared to hugetlb.
> >
> > Please be more specific about usecases. This better have some strong
> > ones because THP code is complex enough already to add on top solely
> > based on a generic TLB pressure easing.
>
> Hello, Michal!
>
> We at Facebook are using 1 GB hugetlbfs pages and are getting noticeable
> performance wins on some workloads.

Let me clarify. I am not questioning 1GB (or large) pages in general. I
believe it is quite clear that there are usecases which hugely benefit
from them. I am mostly asking about the transparent part of it, which
traditionally means that userspace mostly doesn't have to care and
simply gets them. 2MB THPs have established certain expectations, mostly
a really aggressive pro-active instantiation. This has bitten us many
times and created a "you need to disable THP to fix your problem,
whatever that is" cargo cult. I hope we do not want to repeat that
mistake here again.

> Historically we allocated gigantic pages at the boot time, but recently moved
> to cma-based dynamic approach. Still, hugetlbfs interface requires more management
> than we would like to do. 1 GB THP seems to be a better alternative. So I definitely
> see it as a very useful feature.
>
> Given the cost of an allocation, I'm slightly skeptical about an automatic
> heuristics-based approach, but if an application can explicitly mark target areas
> with madvise(), I don't see why it wouldn't work.

An explicit opt-in sounds much more appropriate to me as well. If we go
with a specific API then I would not make it specific to 1GB pages. Why
can't we have an explicit interface to "defragment" an address space
range into large pages, with the kernel using large pages where
appropriate? Or is the additional copying prohibitively expensive?

-- 
Michal Hocko
SUSE Labs
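
For reference, the kind of explicit opt-in discussed above already exists
for 2MB THP via madvise(MADV_HUGEPAGE). The sketch below only illustrates
that existing interface. It is not the 1GB interface under discussion, and
the helper name map_with_thp_hint() is hypothetical, not something from the
patchset. The hint is advisory, so a failing madvise() is reported rather
than treated as fatal.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map an anonymous private region of 'len' bytes and
 * mark it as a candidate for transparent huge pages.
 *
 * Returns the mapping, or NULL if mmap() fails. '*hint_ok' is set to 1 if
 * the kernel accepted the MADV_HUGEPAGE hint and 0 otherwise (for example
 * when THP is not built in or MADV_HUGEPAGE is not defined).
 */
static void *map_with_thp_hint(size_t len, int *hint_ok)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

#ifdef MADV_HUGEPAGE
    /* Advisory only: the kernel decides whether and when to use huge pages. */
    *hint_ok = (madvise(p, len, MADV_HUGEPAGE) == 0);
#else
    *hint_ok = 0;
#endif
    return p;
}
```

A caller would map a region a few megabytes in size, touch it, and later
munmap() it. Whether any part of the range actually ends up backed by huge
pages depends on alignment, on available contiguous memory, and on the
system-wide THP mode, so the return value says nothing about that.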