* [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
@ 2021-11-23  0:01 Mina Almasry
  2021-11-23  1:10 ` Peter Xu
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Mina Almasry @ 2021-11-23  0:01 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Mina Almasry, David Hildenbrand, Matthew Wilcox,
	Paul E . McKenney, Yu Zhao, Andrew Morton, Peter Xu,
	Ivan Teterevkov, Florian Schmidt, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc

Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
address is currently mapped by a transparent huge page or not.  An example
use case is a process requesting THPs from the kernel (via a huge tmpfs
mount, for example) for a performance-critical region of memory.  Userspace
may want to query whether the kernel is actually backing this memory with
hugepages or not.

The PM_THP_MAPPED bit is set if the virt address is mapped at the PMD
level and the underlying page is a transparent huge page.

A few options were considered:
1. Add /proc/pid/pageflags that exports the same info as
   /proc/kpageflags.  This is not suitable because many kpageflags are
   inappropriate to expose to userspace processes.
2. Simply get this info from the existing /proc/pid/smaps interface.
   There are a couple of issues with that:
   1. /proc/pid/smaps output is human readable and unfriendly to parse
      programmatically.
   2. /proc/pid/smaps is slow because it must walk the whole address
      space rather than the small range we care about.  The cost of
      reading /proc/pid/smaps into userspace buffers is about 800us per
      call, and that doesn't include parsing the output to get the
      information you need.  The cost of querying 1 virt address in
      /proc/pid/pagemap, however, is around 5-7us.

Tested manually by adding logging to transhuge-stress, and by
allocating THPs and querying the PM_THP_MAPPED flag at those
virtual addresses.

Signed-off-by: Mina Almasry <almasrymina@google.com>

Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Paul E. McKenney <paulmckrcu@fb.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Florian Schmidt <florian.schmidt@nutanix.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org


---

Changes in v7:
- Added clarification that smaps is only slow because it looks at the
  whole address space.

Changes in v6:
- Renamed to PM_THP_MAPPED
- Removed changes to transhuge-stress

Changes in v5:
- Added justification for this interface in the commit message!

Changes in v4:
- Removed unnecessary moving of flags variable declaration

Changes in v3:
- Renamed PM_THP to PM_HUGE_THP_MAPPING
- Fixed checks to set PM_HUGE_THP_MAPPING
- Added PM_HUGE_THP_MAPPING docs
---
 Documentation/admin-guide/mm/pagemap.rst | 3 ++-
 fs/proc/task_mmu.c                       | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index fdc19fbc10839..8a0f0064ff336 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -23,7 +23,8 @@ There are four components to pagemap:
     * Bit  56    page exclusively mapped (since 4.2)
     * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
       :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
-    * Bits 57-60 zero
+    * Bit  58    page is a huge (PMD size) THP mapping
+    * Bits 59-60 zero
     * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ad667dbc96f5c..d784a97aa209a 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1302,6 +1302,7 @@ struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_THP_MAPPED		BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1456,6 +1457,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 		if (page && page_mapcount(page) == 1)
 			flags |= PM_MMAP_EXCLUSIVE;
+		if (page && is_transparent_hugepage(page))
+			flags |= PM_THP_MAPPED;
 
 		for (; addr != end; addr += PAGE_SIZE) {
 			pagemap_entry_t pme = make_pme(frame, flags);
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
@ 2021-11-23  1:10 ` Peter Xu
  2021-11-23  1:50 ` David Rientjes
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2021-11-23  1:10 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Matthew Wilcox,
	Paul E . McKenney, Yu Zhao, Andrew Morton, Ivan Teterevkov,
	Florian Schmidt, linux-kernel, linux-fsdevel, linux-mm,
	linux-doc

On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> address is currently mapped by a transparent huge page or not.  Example
> use case is a process requesting THPs from the kernel (via a huge tmpfs
> mount for example), for a performance critical region of memory.  The
> userspace may want to query whether the kernel is actually backing this
> memory by hugepages or not.
> 
> PM_THP_MAPPED bit is set if the virt address is mapped at the PMD
> level and the underlying page is a transparent huge page.
> 
> A few options were considered:
> 1. Add /proc/pid/pageflags that exports the same info as
>    /proc/kpageflags.  This is not appropriate because many kpageflags are
>    inappropriate to expose to userspace processes.
> 2. Simply get this info from the existing /proc/pid/smaps interface.
>    There are a couple of issues with that:
>    1. /proc/pid/smaps output is human readable and unfriendly to
>       programmatically parse.
>    2. /proc/pid/smaps is slow because it must read the whole memory range
>       rather than a small range we care about.  The cost of reading
>       /proc/pid/smaps into userspace buffers is about ~800us per call,
>       and this doesn't include parsing the output to get the information
>       you need. The cost of querying 1 virt address in /proc/pid/pagemaps
>       however is around 5-7us.
> 
> Tested manually by adding logging into transhuge-stress, and by
> allocating THP and querying the PM_THP_MAPPED flag at those
> virtual addresses.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
  2021-11-23  1:10 ` Peter Xu
@ 2021-11-23  1:50 ` David Rientjes
  2021-11-23 12:05 ` David Hildenbrand
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: David Rientjes @ 2021-11-23  1:50 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Matthew Wilcox,
	Paul E . McKenney, Yu Zhao, Andrew Morton, Peter Xu,
	Ivan Teterevkov, Florian Schmidt, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc

On Mon, 22 Nov 2021, Mina Almasry wrote:

> Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> address is currently mapped by a transparent huge page or not.  Example
> use case is a process requesting THPs from the kernel (via a huge tmpfs
> mount for example), for a performance critical region of memory.  The
> userspace may want to query whether the kernel is actually backing this
> memory by hugepages or not.
> 
> PM_THP_MAPPED bit is set if the virt address is mapped at the PMD
> level and the underlying page is a transparent huge page.
> 
> A few options were considered:
> 1. Add /proc/pid/pageflags that exports the same info as
>    /proc/kpageflags.  This is not appropriate because many kpageflags are
>    inappropriate to expose to userspace processes.
> 2. Simply get this info from the existing /proc/pid/smaps interface.
>    There are a couple of issues with that:
>    1. /proc/pid/smaps output is human readable and unfriendly to
>       programmatically parse.
>    2. /proc/pid/smaps is slow because it must read the whole memory range
>       rather than a small range we care about.  The cost of reading
>       /proc/pid/smaps into userspace buffers is about ~800us per call,
>       and this doesn't include parsing the output to get the information
>       you need. The cost of querying 1 virt address in /proc/pid/pagemaps
>       however is around 5-7us.
> 
> Tested manually by adding logging into transhuge-stress, and by
> allocating THP and querying the PM_THP_MAPPED flag at those
> virtual addresses.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
  2021-11-23  1:10 ` Peter Xu
  2021-11-23  1:50 ` David Rientjes
@ 2021-11-23 12:05 ` David Hildenbrand
  2021-11-23 20:51 ` Matthew Wilcox
  2021-11-28  4:10 ` Matthew Wilcox
  4 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-11-23 12:05 UTC (permalink / raw)
  To: Mina Almasry, Jonathan Corbet
  Cc: Matthew Wilcox, Paul E . McKenney, Yu Zhao, Andrew Morton,
	Peter Xu, Ivan Teterevkov, Florian Schmidt, linux-kernel,
	linux-fsdevel, linux-mm, linux-doc

On 23.11.21 01:01, Mina Almasry wrote:
> Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> address is currently mapped by a transparent huge page or not.  Example
> use case is a process requesting THPs from the kernel (via a huge tmpfs
> mount for example), for a performance critical region of memory.  The
> userspace may want to query whether the kernel is actually backing this
> memory by hugepages or not.
> 
> PM_THP_MAPPED bit is set if the virt address is mapped at the PMD
> level and the underlying page is a transparent huge page.
> 
> A few options were considered:
> 1. Add /proc/pid/pageflags that exports the same info as
>    /proc/kpageflags.  This is not appropriate because many kpageflags are
>    inappropriate to expose to userspace processes.
> 2. Simply get this info from the existing /proc/pid/smaps interface.
>    There are a couple of issues with that:
>    1. /proc/pid/smaps output is human readable and unfriendly to
>       programmatically parse.
>    2. /proc/pid/smaps is slow because it must read the whole memory range
>       rather than a small range we care about.  The cost of reading
>       /proc/pid/smaps into userspace buffers is about ~800us per call,
>       and this doesn't include parsing the output to get the information
>       you need. The cost of querying 1 virt address in /proc/pid/pagemaps
>       however is around 5-7us.
> 
> Tested manually by adding logging into transhuge-stress, and by
> allocating THP and querying the PM_THP_MAPPED flag at those
> virtual addresses.
> 
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Paul E. McKenney <paulmckrcu@fb.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
> Cc: Florian Schmidt <florian.schmidt@nutanix.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-mm@kvack.org
> 
> 
> ---
> 
> Changes in v7:
> - Added clarification that smaps is only slow because it looks at the
>   whole address space.
> 
> Changes in v6:
> - Renamed to PM_THP_MAPPED
> - Removed changes to transhuge-stress
> 
> Changes in v5:
> - Added justification for this interface in the commit message!
> 
> Changes in v4:
> - Removed unnecessary moving of flags variable declaration
> 
> Changes in v3:
> - Renamed PM_THP to PM_HUGE_THP_MAPPING
> - Fixed checks to set PM_HUGE_THP_MAPPING
> - Added PM_HUGE_THP_MAPPING docs
> ---
>  Documentation/admin-guide/mm/pagemap.rst | 3 ++-
>  fs/proc/task_mmu.c                       | 3 +++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index fdc19fbc10839..8a0f0064ff336 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -23,7 +23,8 @@ There are four components to pagemap:
>      * Bit  56    page exclusively mapped (since 4.2)
>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
>        :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
> -    * Bits 57-60 zero
> +    * Bit  58    page is a huge (PMD size) THP mapping
> +    * Bits 59-60 zero
>      * Bit  61    page is file-page or shared-anon (since 3.5)
>      * Bit  62    page swapped
>      * Bit  63    page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index ad667dbc96f5c..d784a97aa209a 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1302,6 +1302,7 @@ struct pagemapread {
>  #define PM_SOFT_DIRTY		BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
>  #define PM_UFFD_WP		BIT_ULL(57)
> +#define PM_THP_MAPPED		BIT_ULL(58)
>  #define PM_FILE			BIT_ULL(61)
>  #define PM_SWAP			BIT_ULL(62)
>  #define PM_PRESENT		BIT_ULL(63)
> @@ -1456,6 +1457,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>  
>  		if (page && page_mapcount(page) == 1)
>  			flags |= PM_MMAP_EXCLUSIVE;
> +		if (page && is_transparent_hugepage(page))
> +			flags |= PM_THP_MAPPED;
>  
>  		for (; addr != end; addr += PAGE_SIZE) {
>  			pagemap_entry_t pme = make_pme(frame, flags);
> 

Thanks!

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
                   ` (2 preceding siblings ...)
  2021-11-23 12:05 ` David Hildenbrand
@ 2021-11-23 20:51 ` Matthew Wilcox
  2021-11-23 21:10   ` Mina Almasry
  2021-11-28  4:10 ` Matthew Wilcox
  4 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2021-11-23 20:51 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> address is currently mapped by a transparent huge page or not.  Example
> use case is a process requesting THPs from the kernel (via a huge tmpfs
> mount for example), for a performance critical region of memory.  The
> userspace may want to query whether the kernel is actually backing this
> memory by hugepages or not.

So you want this bit to be clear if the memory is backed by a hugetlb
page?

>  		if (page && page_mapcount(page) == 1)
>  			flags |= PM_MMAP_EXCLUSIVE;
> +		if (page && is_transparent_hugepage(page))
> +			flags |= PM_THP_MAPPED;

because honestly i'd expect it to be more useful to mean "This memory
is mapped by a PMD entry" and then the code would look like:

		if (page)
			flags |= PM_PMD_MAPPED;

(and put a corresponding change in pagemap_hugetlb_range)


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 20:51 ` Matthew Wilcox
@ 2021-11-23 21:10   ` Mina Almasry
  2021-11-23 21:30     ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Mina Almasry @ 2021-11-23 21:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 12:51 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> > address is currently mapped by a transparent huge page or not.  Example
> > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > mount for example), for a performance critical region of memory.  The
> > userspace may want to query whether the kernel is actually backing this
> > memory by hugepages or not.
>
> So you want this bit to be clear if the memory is backed by a hugetlb
> page?
>

Yes, I believe so. I do not see value in telling userspace that a virt
address is backed by a hugetlb page: if the memory is mapped with
MAP_HUGETLB or backed by a hugetlb file, then it is necessarily backed
by hugetlb pages, and there is no vagueness from the kernel here.

Additionally, hugetlb interfaces are size-based rather than
mapping-level based. arm64, for example, supports 64K, 2MB, 32MB and 1G
'huge' pages, and it's an implementation detail that those sizes are
mapped as CONTIG PTE, PMD, CONTIG PMD, and PUD respectively; the
specific mapping mechanism is typically not exposed to userspace and
might not be stable. Assuming pagemap_hugetlb_range() == PMD_MAPPED
would not technically be correct.

> >               if (page && page_mapcount(page) == 1)
> >                       flags |= PM_MMAP_EXCLUSIVE;
> > +             if (page && is_transparent_hugepage(page))
> > +                     flags |= PM_THP_MAPPED;
>
> because honestly i'd expect it to be more useful to mean "This memory
> is mapped by a PMD entry" and then the code would look like:
>
>                 if (page)
>                         flags |= PM_PMD_MAPPED;
>
> (and put a corresponding change in pagemap_hugetlb_range)


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 21:10   ` Mina Almasry
@ 2021-11-23 21:30     ` Matthew Wilcox
  2021-11-23 21:47       ` Mina Almasry
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2021-11-23 21:30 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 01:10:37PM -0800, Mina Almasry wrote:
> On Tue, Nov 23, 2021 at 12:51 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > > Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> > > address is currently mapped by a transparent huge page or not.  Example
> > > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > > mount for example), for a performance critical region of memory.  The
> > > userspace may want to query whether the kernel is actually backing this
> > > memory by hugepages or not.
> >
> > So you want this bit to be clear if the memory is backed by a hugetlb
> > page?
> >
> 
> Yes I believe so. I do not see value in telling the userspace that the
> virt address is backed by a hugetlb page, since if the memory is
> mapped by MAP_HUGETLB or is backed by a hugetlb file then the memory
> is backed by hugetlb pages and there is no vagueness from the kernel
> here.
> 
> Additionally hugetlb interfaces are more size based rather than PMD or
> not. arm64 for example supports 64K, 2MB, 32MB and 1G 'huge' pages and
> it's an implementation detail that those sizes are mapped CONTIG PTE,
> PMD, CONTIG PMD, and PUD respectively, and the specific mapping
> mechanism is typically not exposed to the userspace and might not be
> stable. Assuming pagemap_hugetlb_range() == PMD_MAPPED would not
> technically be correct.

What I've been trying to communicate over the N reviews of this
patch series is that *the same thing is about to happen to THPs*.
Only more so.  THPs are going to be of arbitrary power-of-two size, not
necessarily sizes supported by the hardware.  That means that we need to
be extremely precise about what we mean by "is this a THP?"  Do we just
mean "This is a compound page?"  Do we mean "this is mapped by a PMD?"
Or do we mean something else?  And I feel like I haven't been able to
get that information out of you.



* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 21:30     ` Matthew Wilcox
@ 2021-11-23 21:47       ` Mina Almasry
  2021-11-23 22:03         ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Mina Almasry @ 2021-11-23 21:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 23, 2021 at 01:10:37PM -0800, Mina Almasry wrote:
> > On Tue, Nov 23, 2021 at 12:51 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > > > Add a PM_THP_MAPPED bit to allow userspace to detect whether a given virt
> > > > address is currently mapped by a transparent huge page or not.  Example
> > > > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > > > mount for example), for a performance critical region of memory.  The
> > > > userspace may want to query whether the kernel is actually backing this
> > > > memory by hugepages or not.
> > >
> > > So you want this bit to be clear if the memory is backed by a hugetlb
> > > page?
> > >
> >
> > Yes I believe so. I do not see value in telling the userspace that the
> > virt address is backed by a hugetlb page, since if the memory is
> > mapped by MAP_HUGETLB or is backed by a hugetlb file then the memory
> > is backed by hugetlb pages and there is no vagueness from the kernel
> > here.
> >
> > Additionally hugetlb interfaces are more size based rather than PMD or
> > not. arm64 for example supports 64K, 2MB, 32MB and 1G 'huge' pages and
> > it's an implementation detail that those sizes are mapped CONTIG PTE,
> > PMD, CONTIG PMD, and PUD respectively, and the specific mapping
> > mechanism is typically not exposed to the userspace and might not be
> > stable. Assuming pagemap_hugetlb_range() == PMD_MAPPED would not
> > technically be correct.
>
> What I've been trying to communicate over the N reviews of this
> patch series is that *the same thing is about to happen to THPs*.
> Only more so.  THPs are going to be of arbitrary power-of-two size, not
> necessarily sizes supported by the hardware.  That means that we need to
> be extremely precise about what we mean by "is this a THP?"  Do we just
> mean "This is a compound page?"  Do we mean "this is mapped by a PMD?"
> Or do we mean something else?  And I feel like I haven't been able to
> get that information out of you.
>

Yes, I'm very sorry for the trouble, but I'm also confused about what
the disconnect is. To allocate hugepages I can do either:

mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs

or

madvise(..., MADV_HUGEPAGE)

Note I don't ask the kernel for a specific size, or a specific mapping
mechanism (PMD/contig PTE/contig PMD/PUD), I just ask the kernel for
'huge' pages. I would like to know whether the kernel was successful
in allocating a hugepage or not. Today a THP hugepage AFAICT is PMD
mapped + is_transparent_hugepage(), which is the check I have here. In
the future, THP may become an arbitrary power of two size, and I think
I'll need to update this querying interface once/if that gets merged
to the kernel. I.e, if in the future I allocate pages by using:

mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs

I need the kernel to tell me whether the mapping is 2MB size or not.

If I allocate pages by using:

mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs,

Then I need the kernel to tell me whether the pages are PMD mapped or
not, as I'm doing here.

The current implementation is based on what the current THP
implementation is in the kernel, and depending on future changes to
THP I may need to update it in the future. Does that make sense?


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 21:47       ` Mina Almasry
@ 2021-11-23 22:03         ` Matthew Wilcox
  2021-11-23 22:23           ` Mina Almasry
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2021-11-23 22:03 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 01:47:33PM -0800, Mina Almasry wrote:
> On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > What I've been trying to communicate over the N reviews of this
> > patch series is that *the same thing is about to happen to THPs*.
> > Only more so.  THPs are going to be of arbitrary power-of-two size, not
> > necessarily sizes supported by the hardware.  That means that we need to
> > be extremely precise about what we mean by "is this a THP?"  Do we just
> > mean "This is a compound page?"  Do we mean "this is mapped by a PMD?"
> > Or do we mean something else?  And I feel like I haven't been able to
> > get that information out of you.
> 
> Yes, I'm very sorry for the trouble, but I'm also confused what the
> disconnect is. To allocate hugepages I can do like so:
> 
> mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs
> 
> or
> 
> madvise(..., MADV_HUGEPAGE)
> 
> Note I don't ask the kernel for a specific size, or a specific mapping
> mechanism (PMD/contig PTE/contig PMD/PUD), I just ask the kernel for
> 'huge' pages. I would like to know whether the kernel was successful
> in allocating a hugepage or not. Today a THP hugepage AFAICT is PMD
> mapped + is_transparent_hugepage(), which is the check I have here. In
> the future, THP may become an arbitrary power of two size, and I think
> I'll need to update this querying interface once/if that gets merged
> to the kernel. I.e, if in the future I allocate pages by using:
> 
> mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs
> 
> I need the kernel to tell me whether the mapping is 2MB size or not.
> 
> If I allocate pages by using:
> 
> > mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs,
> 
> Then I need the kernel to tell me whether the pages are PMD mapped or
> not, as I'm doing here.
> 
> The current implementation is based on what the current THP
> implementation is in the kernel, and depending on future changes to
> THP I may need to update it in the future. Does that make sense?

Well, no.  You're adding (or changing, if you like) a userspace API.
We need to be precise about what that userspace API *means*, so that we
don't break it in the future when the implementation changes.  You're
still being fuzzy above.

I have no intention of adding an API like the ones you suggest above to
allow the user to specify what size pages to use.  That seems very strange
to me; how should the user (or sysadmin, or application) know what size is
best for the kernel to use to cache files?  Instead, the kernel observes
the usage pattern of the file (through the readahead mechanism) and grows
the allocation size to fit what the kernel thinks will be most effective.

I do honour some of the existing hints that userspace can provide; eg
VM_HUGEPAGE makes the pagefault path allocate PMD sized pages (if it can).
But there's intentionally no new way to tell the kernel to use pages
of a particular size.  The current implementation will use (at least)
64kB pages if you do reads in 64kB chunks, but that's not guaranteed.


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 22:03         ` Matthew Wilcox
@ 2021-11-23 22:23           ` Mina Almasry
  2021-11-23 22:59             ` Matthew Wilcox
  0 siblings, 1 reply; 17+ messages in thread
From: Mina Almasry @ 2021-11-23 22:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 2:03 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 23, 2021 at 01:47:33PM -0800, Mina Almasry wrote:
> > On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > What I've been trying to communicate over the N reviews of this
> > > patch series is that *the same thing is about to happen to THPs*.
> > > Only more so.  THPs are going to be of arbitrary power-of-two size, not
> > > necessarily sizes supported by the hardware.  That means that we need to
> > > be extremely precise about what we mean by "is this a THP?"  Do we just
> > > mean "This is a compound page?"  Do we mean "this is mapped by a PMD?"
> > > Or do we mean something else?  And I feel like I haven't been able to
> > > get that information out of you.
> >
> > Yes, I'm very sorry for the trouble, but I'm also confused what the
> > disconnect is. To allocate hugepages I can do like so:
> >
> > mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs
> >
> > or
> >
> > madvise(..., MADV_HUGEPAGE)
> >
> > Note I don't ask the kernel for a specific size, or a specific mapping
> > mechanism (PMD/contig PTE/contig PMD/PUD), I just ask the kernel for
> > 'huge' pages. I would like to know whether the kernel was successful
> > in allocating a hugepage or not. Today a THP hugepage AFAICT is PMD
> > mapped + is_transparent_hugepage(), which is the check I have here. In
> > the future, THP may become an arbitrary power of two size, and I think
> > I'll need to update this querying interface once/if that gets merged
> > to the kernel. I.e, if in the future I allocate pages by using:
> >
> > mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs
> >
> > I need the kernel to tell me whether the mapping is 2MB size or not.
> >
> > If I allocate pages by using:
> >
> > > mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs,
> >
> > Then I need the kernel to tell me whether the pages are PMD mapped or
> > not, as I'm doing here.
> >
> > The current implementation is based on what the current THP
> > implementation is in the kernel, and depending on future changes to
> > THP I may need to update it in the future. Does that make sense?
>
> Well, no.  You're adding (or changing, if you like) a userspace API.
> We need to be precise about what that userspace API *means*, so that we
> don't break it in the future when the implementation changes.  You're
> still being fuzzy above.
>
> I have no intention of adding an API like the ones you suggest above to
> allow the user to specify what size pages to use.  That seems very strange
> to me; how should the user (or sysadmin, or application) know what size is
> best for the kernel to use to cache files?  Instead, the kernel observes
> the usage pattern of the file (through the readahead mechanism) and grows
> the allocation size to fit what the kernel thinks will be most effective.
>
> I do honour some of the existing hints that userspace can provide; eg
> VM_HUGEPAGE makes the pagefault path allocate PMD sized pages (if it can).

Right: since VM_HUGEPAGE makes the kernel allocate a PMD-mapped THP
if it can, I want to know whether the page actually is a PMD-mapped
THP or not. The implementation and documentation I'm adding seem
consistent with that AFAICT, but sorry if I missed something.

> But there's intentionally no new way to tell the kernel to use pages
> of a particular size.  The current implementation will use (at least)
> 64kB pages if you do reads in 64kB chunks, but that's not guaranteed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 22:23           ` Mina Almasry
@ 2021-11-23 22:59             ` Matthew Wilcox
  2021-11-23 23:16               ` Mina Almasry
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2021-11-23 22:59 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 02:23:23PM -0800, Mina Almasry wrote:
> On Tue, Nov 23, 2021 at 2:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Nov 23, 2021 at 01:47:33PM -0800, Mina Almasry wrote:
> > > On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > What I've been trying to communicate over the N reviews of this
> > > > patch series is that *the same thing is about to happen to THPs*.
> > > > Only more so.  THPs are going to be of arbitrary power-of-two size, not
> > > > necessarily sizes supported by the hardware.  That means that we need to
> > > > be extremely precise about what we mean by "is this a THP?"  Do we just
> > > > mean "This is a compound page?"  Do we mean "this is mapped by a PMD?"
> > > > Or do we mean something else?  And I feel like I haven't been able to
> > > > get that information out of you.
> > >
> > > Yes, I'm very sorry for the trouble, but I'm also confused what the
> > > disconnect is. To allocate hugepages I can do like so:
> > >
> > > mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs
> > >
> > > or
> > >
> > > madvise(..., MADV_HUGEPAGE)
> > >
> > > Note I don't ask the kernel for a specific size, or a specific mapping
> > > mechanism (PMD/contig PTE/contig PMD/PUD), I just ask the kernel for
> > > 'huge' pages. I would like to know whether the kernel was successful
> > > in allocating a hugepage or not. Today a THP hugepage AFAICT is PMD
> > > mapped + is_transparent_hugepage(), which is the check I have here. In
> > > the future, THP may become an arbitrary power of two size, and I think
> > > I'll need to update this querying interface once/if that gets merged
> > > to the kernel. I.e, if in the future I allocate pages by using:
> > >
> > > mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs
> > >
> > > I need the kernel to tell me whether the mapping is 2MB size or not.
> > >
> > > If I allocate pages by using:
> > >
> > > mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs
> > >
> > > Then I need the kernel to tell me whether the pages are PMD mapped or
> > > not, as I'm doing here.
> > >
> > > The current implementation is based on what the current THP
> > > implementation is in the kernel, and depending on future changes to
> > > THP I may need to update it in the future. Does that make sense?
> >
> > Well, no.  You're adding (or changing, if you like) a userspace API.
> > We need to be precise about what that userspace API *means*, so that we
> > don't break it in the future when the implementation changes.  You're
> > still being fuzzy above.
> >
> > I have no intention of adding an API like the ones you suggest above to
> > allow the user to specify what size pages to use.  That seems very strange
> > to me; how should the user (or sysadmin, or application) know what size is
> > best for the kernel to use to cache files?  Instead, the kernel observes
> > the usage pattern of the file (through the readahead mechanism) and grows
> > the allocation size to fit what the kernel thinks will be most effective.
> >
> > I do honour some of the existing hints that userspace can provide; eg
> > VM_HUGEPAGE makes the pagefault path allocate PMD sized pages (if it can).
> 
> Right, so since VM_HUGEPAGE makes the kernel allocate PMD mapped THP
> if it can, then I want to know if the page is actually a PMD mapped
> THP or not. The implementation and documentation that I'm adding seem
> consistent with that AFAICT, but sorry if I missed something.

So what userspace cares about is that the kernel is mapping the
memory with a PMD entry; it doesn't care whether the file is
being cached in 2MB (or larger) chunks.  So we can drop the 'THP'
from all of this, and just call the bit the PMD mapping bit?


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23 22:59             ` Matthew Wilcox
@ 2021-11-23 23:16               ` Mina Almasry
  0 siblings, 0 replies; 17+ messages in thread
From: Mina Almasry @ 2021-11-23 23:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Nov 23, 2021 at 2:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 23, 2021 at 02:23:23PM -0800, Mina Almasry wrote:
> > On Tue, Nov 23, 2021 at 2:03 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Nov 23, 2021 at 01:47:33PM -0800, Mina Almasry wrote:
> > > > On Tue, Nov 23, 2021 at 1:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > What I've been trying to communicate over the N reviews of this
> > > > > patch series is that *the same thing is about to happen to THPs*.
> > > > > Only more so.  THPs are going to be of arbitrary power-of-two size, not
> > > > > necessarily sizes supported by the hardware.  That means that we need to
> > > > > be extremely precise about what we mean by "is this a THP?"  Do we just
> > > > > mean "This is a compound page?"  Do we mean "this is mapped by a PMD?"
> > > > > Or do we mean something else?  And I feel like I haven't been able to
> > > > > get that information out of you.
> > > >
> > > > Yes, I'm very sorry for the trouble, but I'm also confused what the
> > > > disconnect is. To allocate hugepages I can do like so:
> > > >
> > > > mount -t tmpfs -o huge=always tmpfs /mnt/mytmpfs
> > > >
> > > > or
> > > >
> > > > madvise(..., MADV_HUGEPAGE)
> > > >
> > > > Note I don't ask the kernel for a specific size, or a specific mapping
> > > > mechanism (PMD/contig PTE/contig PMD/PUD), I just ask the kernel for
> > > > 'huge' pages. I would like to know whether the kernel was successful
> > > > in allocating a hugepage or not. Today a THP hugepage AFAICT is PMD
> > > > mapped + is_transparent_hugepage(), which is the check I have here. In
> > > > the future, THP may become an arbitrary power of two size, and I think
> > > > I'll need to update this querying interface once/if that gets merged
> > > > to the kernel. I.e, if in the future I allocate pages by using:
> > > >
> > > > mount -t tmpfs -o huge=2MB tmpfs /mnt/mytmpfs
> > > >
> > > > I need the kernel to tell me whether the mapping is 2MB size or not.
> > > >
> > > > If I allocate pages by using:
> > > >
> > > > mount -t tmpfs -o huge=pmd tmpfs /mnt/mytmpfs
> > > >
> > > > Then I need the kernel to tell me whether the pages are PMD mapped or
> > > > not, as I'm doing here.
> > > >
> > > > The current implementation is based on what the current THP
> > > > implementation is in the kernel, and depending on future changes to
> > > > THP I may need to update it in the future. Does that make sense?
> > >
> > > Well, no.  You're adding (or changing, if you like) a userspace API.
> > > We need to be precise about what that userspace API *means*, so that we
> > > don't break it in the future when the implementation changes.  You're
> > > still being fuzzy above.
> > >
> > > I have no intention of adding an API like the ones you suggest above to
> > > allow the user to specify what size pages to use.  That seems very strange
> > > to me; how should the user (or sysadmin, or application) know what size is
> > > best for the kernel to use to cache files?  Instead, the kernel observes
> > > the usage pattern of the file (through the readahead mechanism) and grows
> > > the allocation size to fit what the kernel thinks will be most effective.
> > >
> > > I do honour some of the existing hints that userspace can provide; eg
> > > VM_HUGEPAGE makes the pagefault path allocate PMD sized pages (if it can).
> >
> > Right, so since VM_HUGEPAGE makes the kernel allocate PMD mapped THP
> > if it can, then I want to know if the page is actually a PMD mapped
> > THP or not. The implementation and documentation that I'm adding seem
> > consistent with that AFAICT, but sorry if I missed something.
>
> So what userspace cares about is that the kernel is mapping the
> memory with a PMD entry; it doesn't care whether the file is
> being cached in 2MB (or larger) chunks.  So we can drop the 'THP'
> from all of this, and just call the bit the PMD mapping bit?

I've thought about this a bit, but I have a couple of problems:

1. It's difficult to implement this for hugetlb pages, or at least I
haven't found a reasonably simple way to do it. hugetlb ranges are
handled by pagemap_hugetlb_range(ptep, hmask, ...), and I can't find a
way to uncover whether ptep points to a pmd_t, a pud_t, or even a
pte_t with the contig PTE bit set. I can easily deduce the size of the
page from the hmask, but I would also need to know the native page
size and which arch I'm running on to convert a page size into "is PMD
mapped or not" information. Very sorry if I missed an easy way to do
this.

2. Semantically, I'm not sure it makes sense to tell the user whether
a page is a PMD hugetlb page or not. For THP it makes some sense,
because userspace asks for hugepages via MADV_HUGEPAGE or huge=always,
and 'huge' here roughly means 'PMD mapped', per your statement that
VM_HUGEPAGE makes the kernel try to allocate PMD size pages. For
hugetlb, userspace never asks for 'huge' pages or PMD mappings per se;
it asks for a specific size, and how that mapping is achieved is
considered an implementation detail, one that may not even be
backwards compatible.


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
                   ` (3 preceding siblings ...)
  2021-11-23 20:51 ` Matthew Wilcox
@ 2021-11-28  4:10 ` Matthew Wilcox
  2021-12-14  0:22   ` Mina Almasry
  4 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2021-11-28  4:10 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> Add PM_THP_MAPPED to allow userspace to detect whether a given virt
> address is currently mapped by a transparent huge page or not.  Example
> use case is a process requesting THPs from the kernel (via a huge tmpfs
> mount for example), for a performance critical region of memory.  The
> userspace may want to query whether the kernel is actually backing this
> memory by hugepages or not.

But what is userspace going to _do_ differently if the kernel hasn't
backed the memory with huge pages?


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-11-28  4:10 ` Matthew Wilcox
@ 2021-12-14  0:22   ` Mina Almasry
  2022-01-04 23:04     ` Mina Almasry
  0 siblings, 1 reply; 17+ messages in thread
From: Mina Almasry @ 2021-12-14  0:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Sat, Nov 27, 2021 at 8:10 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > Add PM_THP_MAPPED to allow userspace to detect whether a given virt
> > address is currently mapped by a transparent huge page or not.  Example
> > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > mount for example), for a performance critical region of memory.  The
> > userspace may want to query whether the kernel is actually backing this
> > memory by hugepages or not.
>
> But what is userspace going to _do_ differently if the kernel hasn't
> backed the memory with huge pages?

Sorry for the late reply here.

My plan right now is to expose this information as metrics and:
1. Understand the kind of hugepage backing we're actually getting if any.
2. If there are drops in hugepage backing we can investigate the
cause, whether it's due to normal memory fragmentation or some
bug/issue.
3. Schedule machines for reboots to defragment the memory if the
hugepage backing is too low.
4. Possibly motivate future work to improve hugepage backing if our
numbers are too low.


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2021-12-14  0:22   ` Mina Almasry
@ 2022-01-04 23:04     ` Mina Almasry
  2022-01-05  4:39       ` Matthew Wilcox
  2022-01-11 23:35       ` William Kucharski
  0 siblings, 2 replies; 17+ messages in thread
From: Mina Almasry @ 2022-01-04 23:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Mon, Dec 13, 2021 at 4:22 PM Mina Almasry <almasrymina@google.com> wrote:
>
> On Sat, Nov 27, 2021 at 8:10 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > > Add PM_THP_MAPPED to allow userspace to detect whether a given virt
> > > address is currently mapped by a transparent huge page or not.  Example
> > > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > > mount for example), for a performance critical region of memory.  The
> > > userspace may want to query whether the kernel is actually backing this
> > > memory by hugepages or not.
> >
> > But what is userspace going to _do_ differently if the kernel hasn't
> > backed the memory with huge pages?
>
> Sorry for the late reply here.
>
> My plan is to expose this information as metrics right now and:
> 1. Understand the kind of hugepage backing we're actually getting if any.
> 2. If there are drops in hugepage backing we can investigate the
> cause, whether it's due to normal memory fragmentation or some
> bug/issue.
> 3. Schedule machines for reboots to defragment the memory if the
> hugepage backing is too low.
> 4. Possibly motivate future work to improve hugepage backing if our
> numbers are too low.

Friendly ping on this. It has been reviewed by a few folks, and
Matthew had questions about the use case, which I've answered in the
email above. Matthew, are you opposed to this patch?


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2022-01-04 23:04     ` Mina Almasry
@ 2022-01-05  4:39       ` Matthew Wilcox
  2022-01-11 23:35       ` William Kucharski
  1 sibling, 0 replies; 17+ messages in thread
From: Matthew Wilcox @ 2022-01-05  4:39 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Jonathan Corbet, David Hildenbrand, Paul E . McKenney, Yu Zhao,
	Andrew Morton, Peter Xu, Ivan Teterevkov, Florian Schmidt,
	linux-kernel, linux-fsdevel, linux-mm, linux-doc

On Tue, Jan 04, 2022 at 03:04:31PM -0800, Mina Almasry wrote:
> On Mon, Dec 13, 2021 at 4:22 PM Mina Almasry <almasrymina@google.com> wrote:
> >
> > On Sat, Nov 27, 2021 at 8:10 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
> > > > Add PM_THP_MAPPED to allow userspace to detect whether a given virt
> > > > address is currently mapped by a transparent huge page or not.  Example
> > > > use case is a process requesting THPs from the kernel (via a huge tmpfs
> > > > mount for example), for a performance critical region of memory.  The
> > > > userspace may want to query whether the kernel is actually backing this
> > > > memory by hugepages or not.
> > >
> > > But what is userspace going to _do_ differently if the kernel hasn't
> > > backed the memory with huge pages?
> >
> > Sorry for the late reply here.
> >
> > My plan is to expose this information as metrics right now and:
> > 1. Understand the kind of hugepage backing we're actually getting if any.
> > 2. If there are drops in hugepage backing we can investigate the
> > cause, whether it's due to normal memory fragmentation or some
> > bug/issue.
> > 3. Schedule machines for reboots to defragment the memory if the
> > hugepage backing is too low.
> > 4. Possibly motivate future work to improve hugepage backing if our
> > numbers are too low.
> 
> Friendly ping on this. It has been reviewed by a few folks, and
> Matthew had questions about the use case, which I've answered in the
> email above. Matthew, are you opposed to this patch?

I'm not convinced you need more than the existing stats
(THP_FAULT_FALLBACK) for the information you claim to want.


* Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
  2022-01-04 23:04     ` Mina Almasry
  2022-01-05  4:39       ` Matthew Wilcox
@ 2022-01-11 23:35       ` William Kucharski
  1 sibling, 0 replies; 17+ messages in thread
From: William Kucharski @ 2022-01-11 23:35 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Matthew Wilcox, Jonathan Corbet, David Hildenbrand,
	Paul E . McKenney, Yu Zhao, Andrew Morton, Peter Xu,
	Ivan Teterevkov, Florian Schmidt, linux-kernel, linux-fsdevel,
	linux-mm, linux-doc



> On Jan 4, 2022, at 4:04 PM, Mina Almasry <almasrymina@google.com> wrote:
> 
> On Mon, Dec 13, 2021 at 4:22 PM Mina Almasry <almasrymina@google.com> wrote:
>> 
>> On Sat, Nov 27, 2021 at 8:10 PM Matthew Wilcox <willy@infradead.org> wrote:
>>> 
>>> On Mon, Nov 22, 2021 at 04:01:02PM -0800, Mina Almasry wrote:
>>>> Add PM_THP_MAPPED to allow userspace to detect whether a given virt
>>>> address is currently mapped by a transparent huge page or not.  Example
>>>> use case is a process requesting THPs from the kernel (via a huge tmpfs
>>>> mount for example), for a performance critical region of memory.  The
>>>> userspace may want to query whether the kernel is actually backing this
>>>> memory by hugepages or not.
>>> 
>>> But what is userspace going to _do_ differently if the kernel hasn't
>>> backed the memory with huge pages?
>> 
>> Sorry for the late reply here.
>> 
>> My plan is to expose this information as metrics right now and:
>> 1. Understand the kind of hugepage backing we're actually getting if any.
>> 2. If there are drops in hugepage backing we can investigate the
>> cause, whether it's due to normal memory fragmentation or some
>> bug/issue.
>> 3. Schedule machines for reboots to defragment the memory if the
>> hugepage backing is too low.
>> 4. Possibly motivate future work to improve hugepage backing if our
>> numbers are too low.
> 
> Friendly ping on this. It has been reviewed by a few folks, and
> Matthew had questions about the use case, which I've answered in the
> email above. Matthew, are you opposed to this patch?

I realize I'm jumping in late on this, but while (1) and (2) are
understandable, this bothers me:

> 3. Schedule machines for reboots to defragment the memory if the
> hugepage backing is too low.

If it's important enough to reboot the machine, shouldn't one be using
hugetlbfs instead?

It seems like user space is trying to track something it shouldn't be concerned
about, rather like if user space were concerned about precisely where in physical
memory its pages were mapped.



end of thread, other threads:[~2022-01-11 23:35 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-23  0:01 [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap Mina Almasry
2021-11-23  1:10 ` Peter Xu
2021-11-23  1:50 ` David Rientjes
2021-11-23 12:05 ` David Hildenbrand
2021-11-23 20:51 ` Matthew Wilcox
2021-11-23 21:10   ` Mina Almasry
2021-11-23 21:30     ` Matthew Wilcox
2021-11-23 21:47       ` Mina Almasry
2021-11-23 22:03         ` Matthew Wilcox
2021-11-23 22:23           ` Mina Almasry
2021-11-23 22:59             ` Matthew Wilcox
2021-11-23 23:16               ` Mina Almasry
2021-11-28  4:10 ` Matthew Wilcox
2021-12-14  0:22   ` Mina Almasry
2022-01-04 23:04     ` Mina Almasry
2022-01-05  4:39       ` Matthew Wilcox
2022-01-11 23:35       ` William Kucharski
