From: Matthew Wilcox <willy@infradead.org>
To: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Shakeel Butt <shakeelb@google.com>,
	Yang Shi <shy828301@gmail.com>, Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	William Kucharski <william.kucharski@oracle.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	David Nellans <dnellans@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64
Date: Mon, 5 Oct 2020 20:37:46 +0100
Message-ID: <20201005193746.GO20115@casper.infradead.org>
In-Reply-To: <302C73F4-27BF-459C-8D78-5CBAF812E5CB@nvidia.com>

On Mon, Oct 05, 2020 at 03:12:55PM -0400, Zi Yan wrote:
> On 5 Oct 2020, at 11:55, Matthew Wilcox wrote:
> > One of the longer-term todo items is to support variable sized THPs for
> > anonymous memory, just like I've done for the pagecache.  With that in
> > place, I think scaling up from PMD sized pages to PUD sized pages starts
> > to look more natural.  Itanium and PA-RISC (two architectures that will
> > never be found in phones...) support 1MB, 4MB, 16MB, 64MB and upwards.
> > The RISC-V spec you pointed me at the other day confines itself to adding
> > support for 16, 64 & 256kB today, but does note that 8MB, 32MB and 128MB
> > sizes would be possible additions in the future.
> 
> Just to understand the todo items clearly: with your pagecache patchset,
> the kernel should be able to understand variable sized THPs whether they
> are anonymous or not, right?

... yes ... modulo bugs and places I didn't fix because only anonymous
pages can get there ;-)  There are still quite a few references to
HPAGE_PMD_MASK / SIZE / NR and I couldn't swear that they're all related
to things which are actually PMD sized.  I did fix a couple of places
where the anonymous path assumed that pages were PMD sized because I
thought we'd probably want to do that sooner rather than later.

> For anonymous memory, we need kernel policies
> to decide what THP sizes to use at allocation, what to do when under
> memory pressure, and so on. In terms of implementation, THP split function
> needs to support from any order to any lower order. Anything I am missing here?

I think that's the bulk of the work.  The swap code also needs work so we
don't have to split pages to swap them out.

> > I think I'm leaning towards not merging this patchset yet.  I'm in
> > agreement with the goals (allowing systems to use PUD-sized pages
> > automatically), but I think we need to improve the infrastructure to
> > make it work well automatically.  Does that make sense?
> 
> I agree that this patchset should not be merged in the current form.
> I think PUD THP support is a part of variable sized THP support, but
> current form of the patchset does not have the “variable sized THP”
> spirit yet and is more like a special PUD case support. I guess some
> changes to existing THP code to make PUD THP less a special case would
> make the whole patchset more acceptable?
> 
> Can you elaborate more on the infrastructure part? Thanks.

Oh, this paragraph was just summarising the above.  We need to
be consistently using thp_size() instead of HPAGE_PMD_SIZE, etc.
I haven't put much effort yet into supporting pages which are larger than
PMD-size -- that is, if a page is mapped with a PMD entry, we assume
it's PMD-sized.  Once we can allocate a larger-than-PMD sized page,
that assumption no longer holds.  I assume a lot of that is dealt with in
your patchset, although I haven't audited it to check for that.
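
To illustrate why hard-coded HPAGE_PMD_SIZE goes wrong once pages can be
larger than a PMD, here is a toy userspace model (not kernel code, and not
part of the original message).  The constants are the usual x86_64 values
with 4kB base pages.  The kernel's thp_size() takes a struct page; the
version below takes the order directly, which is a simplification made
only for this sketch.

```python
# Toy model of THP size arithmetic, assuming x86_64 with 4kB base pages.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

HPAGE_PMD_ORDER = 9    # 2MB page = 512 base pages
HPAGE_PUD_ORDER = 18   # 1GB page = 512 * 512 base pages

HPAGE_PMD_SIZE = PAGE_SIZE << HPAGE_PMD_ORDER


def thp_size(order):
    """Size in bytes of a compound page of the given order.

    Simplified stand-in for the kernel helper, which derives the order
    from the page itself rather than taking it as an argument.
    """
    return PAGE_SIZE << order


if __name__ == "__main__":
    # For a PMD-sized page the constant and the helper agree.
    print(thp_size(HPAGE_PMD_ORDER) == HPAGE_PMD_SIZE)
    # For a PUD-sized page, code using the constant would only account
    # for one part in this many of the page.
    print(thp_size(HPAGE_PUD_ORDER) // HPAGE_PMD_SIZE)
```

Any code path that sizes a flush, copy or accounting update with
HPAGE_PMD_SIZE is only correct for order-9 pages; asking the page for its
size keeps the same path correct for any order.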

> > (*) It would be nice if hardware provided a way to track D/A on a sub-PTE
> > level when using PMD/PUD sized mappings.  I don't know of any that does
> > that today.
> 
> I agree it would be a nice hardware feature, but it also has a high cost.
> Each TLB entry would need 1024 extra bits to support this, which is about
> the size of 16 TLB entries, assuming each entry takes 8B of space. Then the
> question becomes: why not just have a bigger TLB? ;)

Oh, we don't have to track at the individual-page level for this to be
useful.  Let's take the RISC-V Sv39 page table entry format as an example:

63-54 attributes
53-28 PPN2
27-19 PPN1
18-10 PPN0
9-8 RSW
7-0 DAGUXWRV

For a 2MB page, we currently insist that 18-10 are zero.  If we repurpose
eight of those nine bits as A/D bits, we can track at 512kB granularity.
For 1GB pages, we can use 16 of the 18 bits to track A/D at 128MB
granularity.  It's not great, but it is quite cheap!
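
The arithmetic above can be checked with a short sketch (not part of the
original message).  It assumes one Accessed bit and one Dirty bit per
tracked region and a power-of-two number of regions; the function and
variable names are made up for illustration.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30

# Sv39 PTE fields as listed above: name -> (high bit, low bit).
SV39_FIELDS = {"PPN2": (53, 28), "PPN1": (27, 19), "PPN0": (18, 10)}


def field_width(name):
    """Number of bits in the named PTE field."""
    high, low = SV39_FIELDS[name]
    return high - low + 1


def coarse_ad_tracking(page_size, spare_bits):
    """Return (regions, region_size, bits_used) for coarse A/D tracking.

    Each region needs two bits (A and D).  The region count is the
    largest power of two whose A/D pairs fit in spare_bits.
    """
    regions = 1
    while 2 * (regions * 2) <= spare_bits:
        regions *= 2
    return regions, page_size // regions, 2 * regions


def per_base_page_bits(page_size, base_page_size=4 * KiB):
    """Bits needed to track A/D for every base page individually."""
    return 2 * (page_size // base_page_size)


if __name__ == "__main__":
    # 2MB leaf: PPN0 (9 bits) is required to be zero, so it is spare.
    print(coarse_ad_tracking(2 * MiB, field_width("PPN0")))
    # 1GB leaf: PPN1 and PPN0 (18 bits) are both spare.
    spare = field_width("PPN1") + field_width("PPN0")
    print(coarse_ad_tracking(1 * GiB, spare))
    # Per-4kB tracking of a 2MB page, for comparison.
    print(per_base_page_bits(2 * MiB))
```

For a 2MB page this uses 8 of the 9 spare bits for four 512kB regions, and
for a 1GB page 16 of the 18 spare bits for eight 128MB regions, against
1024 bits to track every 4kB page of a 2MB mapping individually.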
