[v3] mm, thp: always specify disabled vmas as nh in smaps

Message ID alpine.DEB.2.21.1809251449060.96762@chino.kir.corp.google.com
State In Next
Commit ddf1c51b2b5dfa2c18f16d791095c53b9491e3c7

Commit Message

David Rientjes Sept. 25, 2018, 9:50 p.m. UTC
Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
introduced a regression in that userspace cannot always determine the set
of vmas where thp is disabled.

Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
to determine if a vma has been disabled from being backed by hugepages.

Prior to that commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
be disabled and emit "nh" as a flag for the corresponding vmas as part of
/proc/pid/smaps.  After that commit, thp is disabled by means of an mm
flag and "nh" is not emitted.

As a result, smaps parsing libraries assume a vma is enabled for thp,
which leaves the user puzzled as to why its memory is not backed by thp.

This also clears the "hg" flag to make the behavior of MADV_HUGEPAGE and
PR_SET_THP_DISABLE definitive.

Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
Signed-off-by: David Rientjes <rientjes@google.com>
---
 v3:
  - reword Documentation/filesystems/proc.txt for eligibility

 v2:
  - clear VM_HUGEPAGE per Vlastimil
  - update Documentation/filesystems/proc.txt to be explicit

 Documentation/filesystems/proc.txt |  7 ++++++-
 fs/proc/task_mmu.c                 | 14 +++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)
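
For readers unfamiliar with the userspace side, the check the commit message
describes can be sketched as below. This is a minimal illustration, not code
from any existing smaps parsing library: the helper name vmflags_has() is
hypothetical, and it assumes the caller passes one line at a time as read
from /proc/<pid>/smaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the two-letter mnemonic `flag` (for example "nh") appears
 * as a whole token on a "VmFlags:" line taken from /proc/<pid>/smaps.
 * Hypothetical helper for illustration only.
 */
static bool vmflags_has(const char *line, const char *flag)
{
	static const char prefix[] = "VmFlags:";
	size_t flen;
	const char *p;

	assert(line != NULL && flag != NULL);

	/* Only the VmFlags line carries the mnemonics. */
	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;

	flen = strlen(flag);
	if (flen == 0)
		return false;

	p = line + sizeof(prefix) - 1;
	while (*p) {
		size_t tlen;

		/* Skip separators, then measure the next mnemonic. */
		p += strspn(p, " \t\n");
		tlen = strcspn(p, " \t\n");
		if (tlen == flen && strncmp(p, flag, flen) == 0)
			return true;
		p += tlen;
	}
	return false;
}
```

A tool built on such a helper treats a vma without "nh" as a candidate for
thp. That inference is what breaks after 1860033237d4, because a process that
called prctl(PR_SET_THP_DISABLE, 1) no longer has "nh" on its vmas.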

Comments

Michal Hocko Sept. 26, 2018, 6:12 a.m. UTC | #1
On Tue 25-09-18 14:50:52, David Rientjes wrote:
[...]
Let's put my general disagreement with the approach aside for a while.
If this is really the best way forward, then is the implementation really
correct?

> +	/*
> +	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
> +	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
> +	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
> +	 * ability to detect vmas where thp is not eligible in the same manner.
> +	 */
> +	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
> +		flags &= ~VM_HUGEPAGE;
> +		flags |= VM_NOHUGEPAGE;
> +	}

Do we want to report all vmas nh? Shouldn't we limit that to THP-able
mappings? It seems quite strange that an application started without
PR_SET_THP_DISABLE wouldn't report nh for most mappings while it would
otherwise. Also when can we have vma->vm_mm == NULL?

> +
>  	seq_puts(m, "VmFlags: ");
>  	for (i = 0; i < BITS_PER_LONG; i++) {
>  		if (!mnemonics[i][0])
>  			continue;
> -		if (vma->vm_flags & (1UL << i)) {
> +		if (flags & (1UL << i)) {
>  			seq_putc(m, mnemonics[i][0]);
>  			seq_putc(m, mnemonics[i][1]);
>  			seq_putc(m, ' ');

Michal Hocko Sept. 26, 2018, 7:17 a.m. UTC | #2
On Wed 26-09-18 08:12:47, Michal Hocko wrote:
> On Tue 25-09-18 14:50:52, David Rientjes wrote:
> [...]
> Let's put my general disagreement with the approach aside for a while.
> If this is really the best way forward, then is the implementation really
> correct?
> 
> > +	/*
> > +	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
> > +	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
> > +	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
> > +	 * ability to detect vmas where thp is not eligible in the same manner.
> > +	 */
> > +	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
> > +		flags &= ~VM_HUGEPAGE;
> > +		flags |= VM_NOHUGEPAGE;
> > +	}
> 
> Do we want to report all vmas nh? Shouldn't we limit that to THP-able
> mappings? It seems quite strange that an application started without
> PR_SET_THP_DISABLE wouldn't report nh for most mappings while it would
> otherwise. Also when can we have vma->vm_mm == NULL?

Hmm, after re-reading your documentation update to "A process mapping
may be advised to not be backed by transparent hugepages by either
madvise(MADV_NOHUGEPAGE) or prctl(PR_SET_THP_DISABLE)." the
implementation matches so scratch my comment.

As I've said, I am not happy about this approach but if there is a
general agreement this is really the best we can do I will not stand in
the way.

Vlastimil Babka Sept. 26, 2018, 8:40 a.m. UTC | #3
On 9/25/18 11:50 PM, David Rientjes wrote:
> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> introduced a regression in that userspace cannot always determine the set
> of vmas where thp is disabled.
> 
> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> to determine if a vma has been disabled from being backed by hugepages.
> 
> Prior to that commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
> be disabled and emit "nh" as a flag for the corresponding vmas as part of
> /proc/pid/smaps.  After that commit, thp is disabled by means of an mm
> flag and "nh" is not emitted.
> 
> As a result, smaps parsing libraries assume a vma is enabled for thp,
> which leaves the user puzzled as to why its memory is not backed by thp.
> 
> This also clears the "hg" flag to make the behavior of MADV_HUGEPAGE and
> PR_SET_THP_DISABLE definitive.
> 
> Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> Signed-off-by: David Rientjes <rientjes@google.com>

Well, as Andrew said, we had the opportunity to provide a more complete
info to userspace e.g. with Michal's suggested /proc/pid/status
enhancement. If this is good enough for you (and nobody else cares) then
I won't block it either. It would be unfortunate though if we could not
revert this in case the MMF_DISABLE_THP querying is implemented later.
Hopefully the only consumers are internal tools such as yours, which can
be easily adapted...

Vlastimil

Patch

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -491,9 +491,14 @@  manner. The codes are the following:
     sd  - soft-dirty flag
     mm  - mixed map area
     hg  - huge page advise flag
-    nh  - no-huge page advise flag
+    nh  - no-huge page advise flag [*]
     mg  - mergable advise flag
 
+ [*] A process mapping may be advised to not be backed by transparent hugepages
+     by either madvise(MADV_NOHUGEPAGE) or prctl(PR_SET_THP_DISABLE).  See
+     Documentation/admin-guide/mm/transhuge.rst for system-wide and process
+     mapping policies.
+
 Note that there is no guarantee that every flag and associated mnemonic will
 be present in all further kernel releases. Things get changed, the flags may
 be vanished or the reverse -- new added.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -653,13 +653,25 @@  static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 	};
+	unsigned long flags = vma->vm_flags;
 	size_t i;
 
+	/*
+	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
+	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
+	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
+	 * ability to detect vmas where thp is not eligible in the same manner.
+	 */
+	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
+		flags &= ~VM_HUGEPAGE;
+		flags |= VM_NOHUGEPAGE;
+	}
+
 	seq_puts(m, "VmFlags: ");
 	for (i = 0; i < BITS_PER_LONG; i++) {
 		if (!mnemonics[i][0])
 			continue;
-		if (vma->vm_flags & (1UL << i)) {
+		if (flags & (1UL << i)) {
 			seq_putc(m, mnemonics[i][0]);
 			seq_putc(m, mnemonics[i][1]);
 			seq_putc(m, ' ');
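
The effect of the task_mmu.c hunk can be modelled outside the kernel as a pure
function, which makes the reporting rule easier to see. This is a userspace
sketch under stated assumptions: smaps_reported_flags() and its boolean
parameter are hypothetical stand-ins for show_smap_vma_flags() and the
MMF_DISABLE_THP test, and the two bit values are assumed to mirror
include/linux/mm.h at the time of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values of the kernel's vm_flags bits, for illustration only. */
#define VM_HUGEPAGE   0x20000000UL	/* reported as "hg" */
#define VM_NOHUGEPAGE 0x40000000UL	/* reported as "nh" */

/*
 * Model of the flag word that smaps would print for a vma.
 * mm_thp_disabled stands in for test_bit(MMF_DISABLE_THP, &mm->flags).
 */
static unsigned long smaps_reported_flags(unsigned long vm_flags,
					  bool mm_thp_disabled)
{
	unsigned long flags = vm_flags;

	if (mm_thp_disabled) {
		flags &= ~VM_HUGEPAGE;	/* never report "hg" ... */
		flags |= VM_NOHUGEPAGE;	/* ... and always report "nh" */
	}
	return flags;
}

/* Usage: the cases discussed in the thread. */
void smaps_flags_examples(void)
{
	/* MADV_HUGEPAGE vma in a PR_SET_THP_DISABLE process: "nh" only. */
	assert(smaps_reported_flags(VM_HUGEPAGE, true) == VM_NOHUGEPAGE);
	/* Without the mm flag, vma flags are reported unchanged. */
	assert(smaps_reported_flags(VM_HUGEPAGE, false) == VM_HUGEPAGE);
	/* Any other vma in a PR_SET_THP_DISABLE process also gains "nh". */
	assert(smaps_reported_flags(0UL, true) == VM_NOHUGEPAGE);
}
```

Note that only the reported copy changes. As in the patch, vma->vm_flags
itself is left untouched, so the override applies to every vma of the mm,
which is the behaviour Michal asks about in comment #1.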