Date: Mon, 5 Oct 2020 20:37:46 +0100
From: Matthew Wilcox
To: Zi Yan
Cc: David Hildenbrand, Michal Hocko, linux-mm@kvack.org,
	"Kirill A. Shutemov", Rik van Riel, Roman Gushchin, Shakeel Butt,
	Yang Shi, Jason Gunthorpe, Mike Kravetz, William Kucharski,
	Andrea Arcangeli, John Hubbard, David Nellans,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64
Message-ID: <20201005193746.GO20115@casper.infradead.org>
References: <20200928175428.4110504-1-zi.yan@sent.com>
	<20200930115505.GT2277@dhcp22.suse.cz>
	<73394A41-16D8-431C-9E48-B14D44F045F8@nvidia.com>
	<20201002073205.GC20872@dhcp22.suse.cz>
	<9a7600e2-044a-50ca-acde-bf647932c751@redhat.com>
	<20201002081023.GA4555@dhcp22.suse.cz>
	<645b35a5-970d-dcfe-2b4a-04ebd4444756@redhat.com>
	<20201005155553.GM20115@casper.infradead.org>
	<302C73F4-27BF-459C-8D78-5CBAF812E5CB@nvidia.com>
In-Reply-To: <302C73F4-27BF-459C-8D78-5CBAF812E5CB@nvidia.com>

On Mon, Oct 05, 2020 at 03:12:55PM -0400, Zi Yan wrote:
> On 5 Oct 2020, at 11:55, Matthew Wilcox wrote:
> > One of the longer-term todo items is to support variable sized THPs for
> > anonymous memory, just like I've done for the pagecache.  With that in
> > place, I think scaling up from PMD sized pages to PUD sized pages starts
> > to look more natural.  Itanium and PA-RISC (two architectures that will
> > never be found in phones...) support 1MB, 4MB, 16MB, 64MB and upwards.
> > The RiscV spec you pointed me at the other day confines itself to adding
> > support for 16, 64 & 256kB today, but does note that 8MB, 32MB and 128MB
> > sizes would be possible additions in the future.
>
> Just to understand the todo items clearly.  With your pagecache patchset,
> the kernel should be able to understand variable sized THPs no matter
> whether they are anonymous or not, right?

... yes ...
modulo bugs and places I didn't fix because only anonymous pages can get
there ;-)  There are still quite a few references to HPAGE_PMD_MASK /
SIZE / NR, and I couldn't swear that they're all related to things which
are actually PMD sized.  I did fix a couple of places where the anonymous
path assumed that pages were PMD sized, because I thought we'd probably
want to do that sooner rather than later.

> For anonymous memory, we need kernel policies
> to decide what THP sizes to use at allocation, what to do when under
> memory pressure, and so on.  In terms of implementation, the THP split
> function needs to support going from any order to any lower order.
> Anything I am missing here?

I think that's the bulk of the work.  The swap code also needs work so
we don't have to split pages to swap them out.

> > I think I'm leaning towards not merging this patchset yet.  I'm in
> > agreement with the goals (allowing systems to use PUD-sized pages
> > automatically), but I think we need to improve the infrastructure to
> > make it work well automatically.  Does that make sense?
>
> I agree that this patchset should not be merged in its current form.
> I think PUD THP support is a part of variable sized THP support, but
> the current form of the patchset does not have the "variable sized THP"
> spirit yet and is more like support for a special PUD case.  I guess
> some changes to the existing THP code to make PUD THP less of a special
> case would make the whole patchset more acceptable?
>
> Can you elaborate more on the infrastructure part?  Thanks.

Oh, this paragraph was just summarising the above.  We need to be
consistently using thp_size() instead of HPAGE_PMD_SIZE, etc.  I haven't
put much effort yet into supporting pages which are larger than PMD-size
-- that is, if a page is mapped with a PMD entry, we assume it's
PMD-sized.  Once we can allocate a larger-than-PMD sized page, that
assumption no longer holds.
I assume a lot of that is dealt with in your patchset, although I haven't
audited it to check.

> > (*) It would be nice if hardware provided a way to track D/A on a
> > sub-PTE level when using PMD/PUD sized mappings.  I don't know of any
> > that does that today.
>
> I agree it would be a nice hardware feature, but it also has a high
> cost.  Each TLB entry would need 1024 bits for this, which is about
> 16 TLB entries' worth of space, assuming each entry takes 8B.  Now it
> becomes why not have a bigger TLB. ;)

Oh, we don't have to track at the individual-page level for this to be
useful.  Let's take the RISC-V Sv39 page table entry format as an example:

63-54	attributes
53-28	PPN2
27-19	PPN1
18-10	PPN0
9-8	RSW
7-0	DAGUXWRV

For a 2MB page, we currently insist that bits 18-10 are zero.  If we
repurpose eight of those nine bits as A/D bits, we can track at 512kB
granularity.  For 1GB pages, we can use 16 of the 18 bits to track A/D
at 128MB granularity.  It's not great, but it is quite cheap!