linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] mm: Add PM_THP to /proc/pid/pagemap
@ 2021-11-04 21:46 Mina Almasry
       [not found] ` <YYRZNWZqHy9+11KW@casper.infradead.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Mina Almasry @ 2021-11-04 21:46 UTC
  Cc: Mina Almasry, Paul E . McKenney, Yu Zhao, Jonathan Corbet,
	Andrew Morton, Peter Xu, Ivan Teterevkov, David Hildenbrand,
	Matthew Wilcox, Florian Schmidt, linux-kernel, linux-fsdevel,
	linux-mm

Add PM_THP to allow userspace to detect whether a given virt address is
currently mapped by a hugepage or not.

An example use case is a process requesting hugepages from the kernel
(for example, via a huge tmpfs mount) for a performance-critical region
of memory.  Userspace may want to query whether the kernel is actually
backing this memory with hugepages or not.

Tested manually by adding logging into transhuge-stress.

Signed-off-by: Mina Almasry <almasrymina@google.com>

Cc: David Rientjes <rientjes@google.com>
Cc: Paul E. McKenney <paulmckrcu@fb.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org

---
 fs/proc/task_mmu.c                            |  5 +++++
 tools/testing/selftests/vm/transhuge-stress.c | 21 +++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ad667dbc96f5c..9847514937fc7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1302,6 +1302,7 @@ struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_THP			BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1396,6 +1397,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		flags |= PM_FILE;
 	if (page && page_mapcount(page) == 1)
 		flags |= PM_MMAP_EXCLUSIVE;
+	if (page && PageTransCompound(page))
+		flags |= PM_THP;
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;

@@ -1456,6 +1459,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,

 		if (page && page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
+		if (page && PageTransCompound(page))
+			flags |= PM_THP;

 		for (; addr != end; addr += PAGE_SIZE) {
 			pagemap_entry_t pme = make_pme(frame, flags);
diff --git a/tools/testing/selftests/vm/transhuge-stress.c b/tools/testing/selftests/vm/transhuge-stress.c
index fd7f1b4a96f94..7dce18981fff5 100644
--- a/tools/testing/selftests/vm/transhuge-stress.c
+++ b/tools/testing/selftests/vm/transhuge-stress.c
@@ -16,6 +16,12 @@
 #include <string.h>
 #include <sys/mman.h>

+/*
+ * We can use /proc/pid/pagemap to detect whether the kernel was able to find
+ * hugepages or not. This can be very noisy, so it is disabled by default.
+ */
+#define NO_DETECT_HUGEPAGES
+
 #define PAGE_SHIFT 12
 #define HPAGE_SHIFT 21

@@ -23,6 +29,7 @@
 #define HPAGE_SIZE (1 << HPAGE_SHIFT)

 #define PAGEMAP_PRESENT(ent)	(((ent) & (1ull << 63)) != 0)
+#define PAGEMAP_THP(ent)	(((ent) & (1ull << 58)) != 0)
 #define PAGEMAP_PFN(ent)	((ent) & ((1ull << 55) - 1))

 int pagemap_fd;
@@ -47,10 +54,16 @@ int64_t allocate_transhuge(void *ptr)
 			(uintptr_t)ptr >> (PAGE_SHIFT - 3)) != sizeof(ent))
 		err(2, "read pagemap");

-	if (PAGEMAP_PRESENT(ent[0]) && PAGEMAP_PRESENT(ent[1]) &&
-	    PAGEMAP_PFN(ent[0]) + 1 == PAGEMAP_PFN(ent[1]) &&
-	    !(PAGEMAP_PFN(ent[0]) & ((1 << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)))
-		return PAGEMAP_PFN(ent[0]);
+	if (PAGEMAP_PRESENT(ent[0]) && PAGEMAP_PRESENT(ent[1])) {
+#ifndef NO_DETECT_HUGEPAGES
+		if (!PAGEMAP_THP(ent[0]))
+			fprintf(stderr, "WARNING: detected non THP page\n");
+#endif
+		if (PAGEMAP_PFN(ent[0]) + 1 == PAGEMAP_PFN(ent[1]) &&
+		    !(PAGEMAP_PFN(ent[0]) &
+		      ((1 << (HPAGE_SHIFT - PAGE_SHIFT)) - 1)))
+			return PAGEMAP_PFN(ent[0]);
+	}

 	return -1;
 }
--
2.34.0.rc0.344.g81b53c2807-goog


* Re: [PATCH v2] mm: Add PM_THP to /proc/pid/pagemap
       [not found] ` <YYRZNWZqHy9+11KW@casper.infradead.org>
@ 2021-11-04 22:45   ` Mina Almasry
  2021-11-07 22:56     ` Mina Almasry
  0 siblings, 1 reply; 3+ messages in thread
From: Mina Almasry @ 2021-11-04 22:45 UTC
  To: Matthew Wilcox
  Cc: Paul E . McKenney, Yu Zhao, Jonathan Corbet, Andrew Morton,
	Peter Xu, Ivan Teterevkov, David Hildenbrand, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm

On Thu, Nov 4, 2021 at 3:08 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Nov 04, 2021 at 02:46:35PM -0700, Mina Almasry wrote:
> > Add PM_THP to allow userspace to detect whether a given virt address is
> > currently mapped by a hugepage or not.
>
> Well, no, that's not what that means.
>

Sorry, that was the intention, but I didn't implement the intention correctly.

> > @@ -1396,6 +1397,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> >               flags |= PM_FILE;
> >       if (page && page_mapcount(page) == 1)
> >               flags |= PM_MMAP_EXCLUSIVE;
> > +     if (page && PageTransCompound(page))
> > +             flags |= PM_THP;
>
> All that PageTransCompound() does is call PageCompound().  It doesn't
> tell you if the underlying allocation is PMD sized, nor properly aligned.
>
> And you didn't answer my question about whether you want information about
> whether a large page is being used that's not quite as large as a PMD.
>

Sorry, I thought the implementation would make it clear but I didn't
do that correctly. Right now and for the foreseeable future what I
want to know is whether the page is mapped by a PMD. All the below
work for me:

1. Flag is set if the page is a PMD size THP page.
2. Flag is set if the page is either a PMD size THP page or PMD size
hugetlbfs page.
3. Flag is set if the page is either a PMD size THP page or PMD size
hugetlbfs page or contig PTE size hugetlbfs page.

I prefer #2, and I think it's probably the most extensible for future
use cases: one flag tells whether the page is a PMD hugepage, and
another flag tells whether it is a large cont PTE page.
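
For illustration, here is a minimal userspace sketch of how a reader of
/proc/pid/pagemap could decode one 64-bit entry under option #1 or #2.
The bit positions are the ones from the defines quoted in the patch.
Bit 58 is only the flag proposed in this v2 and is an assumption here:
it means nothing on a kernel that does not carry the patch. The names
pme_info, decode_pme() and pagemap_offset() are hypothetical helpers,
not existing kernel or libc API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout as in the fs/proc/task_mmu.c defines quoted in the patch. */
#define PME_PFN_MASK		((1ULL << 55) - 1)	/* bits 0-54: PFN if present */
#define PME_THP_PROPOSED	(1ULL << 58)	/* assumption: proposed by this patch only */
#define PME_PRESENT		(1ULL << 63)

/* Hypothetical decoded view of one pagemap entry. */
struct pme_info {
	bool present;	/* page is present in RAM */
	bool thp;	/* proposed hugepage flag (bit 58) is set */
	uint64_t pfn;	/* page frame number, 0 if not present */
};

static struct pme_info decode_pme(uint64_t ent)
{
	struct pme_info info;

	info.present = (ent & PME_PRESENT) != 0;
	info.thp = (ent & PME_THP_PROPOSED) != 0;
	/*
	 * Bits 0-54 hold a PFN only for present pages. Since Linux 4.2
	 * the PFN reads back as zero without CAP_SYS_ADMIN.
	 */
	info.pfn = info.present ? (ent & PME_PFN_MASK) : 0;
	return info;
}

/*
 * Byte offset of the entry describing vaddr: pagemap holds one 64-bit
 * entry per virtual page.
 */
static uint64_t pagemap_offset(uint64_t vaddr, unsigned int page_shift)
{
	return (vaddr >> page_shift) * sizeof(uint64_t);
}
```

A caller would pread() 8 bytes from /proc/self/pagemap at
pagemap_offset(addr, 12) (assuming 4 KiB base pages) and pass the value
to decode_pme(). A second flag for cont PTE pages, as mentioned above,
would simply be one more bit test in the same helper.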


* Re: [PATCH v2] mm: Add PM_THP to /proc/pid/pagemap
  2021-11-04 22:45   ` Mina Almasry
@ 2021-11-07 22:56     ` Mina Almasry
  0 siblings, 0 replies; 3+ messages in thread
From: Mina Almasry @ 2021-11-07 22:56 UTC
  To: Matthew Wilcox
  Cc: Paul E . McKenney, Yu Zhao, Jonathan Corbet, Andrew Morton,
	Peter Xu, Ivan Teterevkov, David Hildenbrand, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm

On Thu, Nov 4, 2021 at 3:45 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Thu, Nov 4, 2021 at 3:08 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Nov 04, 2021 at 02:46:35PM -0700, Mina Almasry wrote:
> > > Add PM_THP to allow userspace to detect whether a given virt address is
> > > currently mapped by a hugepage or not.
> >
> > Well, no, that's not what that means.
> >
>
> Sorry, that was the intention, but I didn't implement the intention correctly.
>
> > > @@ -1396,6 +1397,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
> > >               flags |= PM_FILE;
> > >       if (page && page_mapcount(page) == 1)
> > >               flags |= PM_MMAP_EXCLUSIVE;
> > > +     if (page && PageTransCompound(page))
> > > +             flags |= PM_THP;
> >
> > All that PageTransCompound() does is call PageCompound().  It doesn't
> > tell you if the underlying allocation is PMD sized, nor properly aligned.
> >

Sorry again, Matthew, for getting this check wrong. After taking a
deeper look, I see you're completely correct. My check was returning
true on all compound pages without regard to whether they are actually
THP, or whether they're mapped at the PMD level.

I've renamed the flag from PM_THP to PM_HUGE_THP_MAPPING to be more
accurate, and it looks to me like the correct check is: if we're in
pagemap_pmd_range() and the underlying page is_transparent_hugepage(),
then we set the flag.

I'm about to upload v3 with this new check; please take another look.
Thank you for catching this.
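
For context, without such a flag userspace can only approximate this
from PFNs, the way the transhuge-stress check in the patch does for the
first two entries. A minimal sketch that extends the same idea to a
whole PMD-sized range follows. It assumes 4 KiB base pages and a 2 MiB
PMD size (the selftest's PAGE_SHIFT and HPAGE_SHIFT), and the function
name range_looks_pmd_sized() is hypothetical. A positive result only
says the backing memory is physically contiguous and aligned; it does
not prove that the range is mapped by a PMD, which is the information
the flag is meant to provide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* assumption: 4 KiB base pages */
#define HPAGE_SHIFT	21	/* assumption: 2 MiB PMD size */
#define PAGES_PER_PMD	(1ULL << (HPAGE_SHIFT - PAGE_SHIFT))

#define PAGEMAP_PRESENT(ent)	(((ent) & (1ULL << 63)) != 0)
#define PAGEMAP_PFN(ent)	((ent) & ((1ULL << 55) - 1))

/*
 * ents[] holds the pagemap entries for one PMD-aligned virtual range.
 * Returns true if every page is present, the first PFN is aligned to
 * the PMD size, and all PFNs are consecutive.
 */
static bool range_looks_pmd_sized(const uint64_t *ents, size_t n)
{
	uint64_t first;
	size_t i;

	if (n != PAGES_PER_PMD || !PAGEMAP_PRESENT(ents[0]))
		return false;

	first = PAGEMAP_PFN(ents[0]);
	/* PFN 0 is treated as "hidden from an unprivileged reader". */
	if (first == 0 || (first & (PAGES_PER_PMD - 1)))
		return false;

	for (i = 1; i < n; i++) {
		if (!PAGEMAP_PRESENT(ents[i]) ||
		    PAGEMAP_PFN(ents[i]) != first + i)
			return false;
	}
	return true;
}
```

Reading real PFNs needs CAP_SYS_ADMIN, so this heuristic is not usable
by an unprivileged process, whereas a flag bit in the entry would be.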

> > And you didn't answer my question about whether you want information about
> > whether a large page is being used that's not quite as large as a PMD.
> >
>
> Sorry, I thought the implementation would make it clear but I didn't
> do that correctly. Right now and for the foreseeable future what I
> want to know is whether the page is mapped by a PMD. All the below
> work for me:
>
> 1. Flag is set if the page is a PMD size THP page.
> 2. Flag is set if the page is either a PMD size THP page or PMD size
> hugetlbfs page.
> 3. Flag is set if the page is either a PMD size THP page or PMD size
> hugetlbfs page or contig PTE size hugetlbfs page.
>
> I prefer #2, and I think it's probably the most extensible for future
> use cases: one flag tells whether the page is a PMD hugepage, and
> another flag tells whether it is a large cont PTE page.
