linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] mm, thp: always specify ineligible vmas as nh in smaps
@ 2018-09-24 17:55 David Rientjes
  2018-09-24 18:25 ` Vlastimil Babka
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-09-24 17:55 UTC (permalink / raw)
  To: Alexey Dobriyan, Andrew Morton
  Cc: Vlastimil Babka, Kirill A. Shutemov, Michal Hocko, linux-kernel

Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
introduced a regression in that userspace cannot always determine the set
of vmas where thp is ineligible.

Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
to determine if a vma is eligible to be backed by hugepages.

Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
be disabled and emit "nh" as a flag for the corresponding vmas as part of
/proc/pid/smaps.  After the commit, thp is disabled by means of an mm
flag and "nh" is not emitted.

This causes smaps parsing libraries to assume a vma is eligible for thp
and ends up puzzling the user on why its memory is not backed by thp.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 fs/proc/task_mmu.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -653,13 +653,23 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 	};
+	unsigned long flags = vma->vm_flags;
 	size_t i;
 
+	/*
+	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
+	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
+	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
+	 * ability to detect vmas where thp is not eligible in the same manner.
+	 */
+	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		flags |= VM_NOHUGEPAGE;
+
 	seq_puts(m, "VmFlags: ");
 	for (i = 0; i < BITS_PER_LONG; i++) {
 		if (!mnemonics[i][0])
 			continue;
-		if (vma->vm_flags & (1UL << i)) {
+		if (flags & (1UL << i)) {
 			seq_putc(m, mnemonics[i][0]);
 			seq_putc(m, mnemonics[i][1]);
 			seq_putc(m, ' ');

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 17:55 [patch] mm, thp: always specify ineligible vmas as nh in smaps David Rientjes
@ 2018-09-24 18:25 ` Vlastimil Babka
  2018-09-24 19:17   ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Vlastimil Babka @ 2018-09-24 18:25 UTC (permalink / raw)
  To: David Rientjes, Alexey Dobriyan, Andrew Morton
  Cc: Kirill A. Shutemov, Michal Hocko, linux-kernel, Linux-MM layout,
	Linux API

+CC linux-mm linux-api

On 9/24/18 7:55 PM, David Rientjes wrote:
> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> introduced a regression in that userspace cannot always determine the set
> of vmas where thp is ineligible.
> 
> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> to determine if a vma is eligible to be backed by hugepages.
> 
> Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
> be disabled and emit "nh" as a flag for the corresponding vmas as part of
> /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
> flag and "nh" is not emitted.
> 
> This causes smaps parsing libraries to assume a vma is eligible for thp
> and ends up puzzling the user on why its memory is not backed by thp.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")

Not worth for stable IMO, but makes sense otherwise.

Acked-by: Vlastimil Babka <vbabka@suse.cz>

A question below:

> ---
>  fs/proc/task_mmu.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -653,13 +653,23 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  #endif
>  #endif /* CONFIG_ARCH_HAS_PKEYS */
>  	};
> +	unsigned long flags = vma->vm_flags;
>  	size_t i;
>  
> +	/*
> +	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
> +	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
> +	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
> +	 * ability to detect vmas where thp is not eligible in the same manner.
> +	 */
> +	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +		flags |= VM_NOHUGEPAGE;

Should it also clear VM_HUGEPAGE? In case MMF_DISABLE_THP overrides a
madvise(MADV_HUGEPAGE)'d vma? (I expect it does?)

> +
>  	seq_puts(m, "VmFlags: ");
>  	for (i = 0; i < BITS_PER_LONG; i++) {
>  		if (!mnemonics[i][0])
>  			continue;
> -		if (vma->vm_flags & (1UL << i)) {
> +		if (flags & (1UL << i)) {
>  			seq_putc(m, mnemonics[i][0]);
>  			seq_putc(m, mnemonics[i][1]);
>  			seq_putc(m, ' ');
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 18:25 ` Vlastimil Babka
@ 2018-09-24 19:17   ` David Rientjes
  2018-09-24 19:30     ` [patch v2] " David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-09-24 19:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Alexey Dobriyan, Andrew Morton, Kirill A. Shutemov, Michal Hocko,
	linux-kernel, Linux-MM layout, Linux API

On Mon, 24 Sep 2018, Vlastimil Babka wrote:

> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -653,13 +653,23 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> >  #endif
> >  #endif /* CONFIG_ARCH_HAS_PKEYS */
> >  	};
> > +	unsigned long flags = vma->vm_flags;
> >  	size_t i;
> >  
> > +	/*
> > +	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
> > +	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
> > +	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
> > +	 * ability to detect vmas where thp is not eligible in the same manner.
> > +	 */
> > +	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> > +		flags |= VM_NOHUGEPAGE;
> 
> Should it also clear VM_HUGEPAGE? In case MMF_DISABLE_THP overrides a
> madvise(MADV_HUGEPAGE)'d vma? (I expect it does?)
> 

Good point, I think that is should because MMF_DISABLE_THP will override 
VM_HUGEPAGE.  It looks like the Documentation file is still referencing 
both as advise flags and doesn't address PR_SET_THP_DISABLE.  Let me send 
a v2 with a Documentation update and your suggested fix.  Thanks 
Vlastimil!

> > +
> >  	seq_puts(m, "VmFlags: ");
> >  	for (i = 0; i < BITS_PER_LONG; i++) {
> >  		if (!mnemonics[i][0])
> >  			continue;
> > -		if (vma->vm_flags & (1UL << i)) {
> > +		if (flags & (1UL << i)) {
> >  			seq_putc(m, mnemonics[i][0]);
> >  			seq_putc(m, mnemonics[i][1]);
> >  			seq_putc(m, ' ');
> > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 19:17   ` David Rientjes
@ 2018-09-24 19:30     ` David Rientjes
  2018-09-24 19:56       ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-09-24 19:30 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Alexey Dobriyan, Kirill A. Shutemov, Michal Hocko, linux-kernel,
	linux-mm, linux-api

Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
introduced a regression in that userspace cannot always determine the set
of vmas where thp is ineligible.

Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
to determine if a vma is eligible to be backed by hugepages.

Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
be disabled and emit "nh" as a flag for the corresponding vmas as part of
/proc/pid/smaps.  After the commit, thp is disabled by means of an mm
flag and "nh" is not emitted.

This causes smaps parsing libraries to assume a vma is eligible for thp
and ends up puzzling the user on why its memory is not backed by thp.

This also clears the "hg" flag to make the behavior of MADV_HUGEPAGE and
PR_SET_THP_DISABLE definitive.

Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
Signed-off-by: David Rientjes <rientjes@google.com>
---
 v2:
  - clear VM_HUGEPAGE per Vlastimil
  - update Documentation/filesystems/proc.txt to be explicit

 Documentation/filesystems/proc.txt | 12 ++++++++++--
 fs/proc/task_mmu.c                 | 14 +++++++++++++-
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -490,10 +490,18 @@ manner. The codes are the following:
     dd  - do not include area into core dump
     sd  - soft-dirty flag
     mm  - mixed map area
-    hg  - huge page advise flag
-    nh  - no-huge page advise flag
+    hg  - eligible for transparent hugepages [*]
+    nh  - not eligible for transparent hugepages [*]
     mg  - mergable advise flag
 
+ [*] A process mapping is eligible to be backed by transparent hugepages (thp)
+     depending on system-wide settings and the mapping itself.  See
+     Documentation/admin-guide/mm/transhuge.rst for default behavior.  If a
+     mapping has a flag of "nh", it is not eligible to be backed by hugepages
+     in any condition, either because of prctl(PR_SET_THP_DISABLE) or
+     madvise(MADV_NOHUGEPAGE).  PR_SET_THP_DISABLE takes precedence over any
+     MADV_HUGEPAGE.
+
 Note that there is no guarantee that every flag and associated mnemonic will
 be present in all further kernel releases. Things get changed, the flags may
 be vanished or the reverse -- new added.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -653,13 +653,25 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 	};
+	unsigned long flags = vma->vm_flags;
 	size_t i;
 
+	/*
+	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
+	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
+	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
+	 * ability to detect vmas where thp is not eligible in the same manner.
+	 */
+	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
+		flags &= ~VM_HUGEPAGE;
+		flags |= VM_NOHUGEPAGE;
+	}
+
 	seq_puts(m, "VmFlags: ");
 	for (i = 0; i < BITS_PER_LONG; i++) {
 		if (!mnemonics[i][0])
 			continue;
-		if (vma->vm_flags & (1UL << i)) {
+		if (flags & (1UL << i)) {
 			seq_putc(m, mnemonics[i][0]);
 			seq_putc(m, mnemonics[i][1]);
 			seq_putc(m, ' ');

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 19:30     ` [patch v2] " David Rientjes
@ 2018-09-24 19:56       ` Michal Hocko
  2018-09-24 20:02         ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-09-24 19:56 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Mon 24-09-18 12:30:07, David Rientjes wrote:
> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> introduced a regression in that userspace cannot always determine the set
> of vmas where thp is ineligible.
> 
> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> to determine if a vma is eligible to be backed by hugepages.

I was under impression that nh resp hg flags only tell about the madvise
status. How do you exactly use these flags in an application?

Your eligible rules as defined here:

> + [*] A process mapping is eligible to be backed by transparent hugepages (thp)
> +     depending on system-wide settings and the mapping itself.  See
> +     Documentation/admin-guide/mm/transhuge.rst for default behavior.  If a
> +     mapping has a flag of "nh", it is not eligible to be backed by hugepages
> +     in any condition, either because of prctl(PR_SET_THP_DISABLE) or
> +     madvise(MADV_NOHUGEPAGE).  PR_SET_THP_DISABLE takes precedence over any
> +     MADV_HUGEPAGE.

doesn't seem to match the reality. I do not see all the file backed
mappings to be nh marked. So is this really about eligibility rather
than the madvise status? Maybe it is just the above documentation that
needs to be updated.

That being said, I do not object to the patch, I am just trying to
understand what is the intended usage for the flag that does try to say
more than the madvise status.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 19:56       ` Michal Hocko
@ 2018-09-24 20:02         ` Michal Hocko
  2018-09-24 20:43           ` Vlastimil Babka
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-09-24 20:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Mon 24-09-18 21:56:03, Michal Hocko wrote:
> On Mon 24-09-18 12:30:07, David Rientjes wrote:
> > Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> > introduced a regression in that userspace cannot always determine the set
> > of vmas where thp is ineligible.
> > 
> > Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> > to determine if a vma is eligible to be backed by hugepages.
> 
> I was under impression that nh resp hg flags only tell about the madvise
> status. How do you exactly use these flags in an application?
> 
> Your eligible rules as defined here:
> 
> > + [*] A process mapping is eligible to be backed by transparent hugepages (thp)
> > +     depending on system-wide settings and the mapping itself.  See
> > +     Documentation/admin-guide/mm/transhuge.rst for default behavior.  If a
> > +     mapping has a flag of "nh", it is not eligible to be backed by hugepages
> > +     in any condition, either because of prctl(PR_SET_THP_DISABLE) or
> > +     madvise(MADV_NOHUGEPAGE).  PR_SET_THP_DISABLE takes precedence over any
> > +     MADV_HUGEPAGE.
> 
> doesn't seem to match the reality. I do not see all the file backed
> mappings to be nh marked. So is this really about eligibility rather
> than the madvise status? Maybe it is just the above documentation that
> needs to be updated.
> 
> That being said, I do not object to the patch, I am just trying to
> understand what is the intended usage for the flag that does try to say
> more than the madvise status.

And moreover, how is the PR_SET_THP_DISABLE any different from the
global THP disabled case. Do we want to set all vmas to nh as well?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 20:02         ` Michal Hocko
@ 2018-09-24 20:43           ` Vlastimil Babka
  2018-09-25  5:50             ` Michal Hocko
  2018-09-25 19:52             ` David Rientjes
  0 siblings, 2 replies; 41+ messages in thread
From: Vlastimil Babka @ 2018-09-24 20:43 UTC (permalink / raw)
  To: Michal Hocko, David Rientjes
  Cc: Andrew Morton, Alexey Dobriyan, Kirill A. Shutemov, linux-kernel,
	linux-mm, linux-api

On 9/24/18 10:02 PM, Michal Hocko wrote:
> On Mon 24-09-18 21:56:03, Michal Hocko wrote:
>> On Mon 24-09-18 12:30:07, David Rientjes wrote:
>>> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
>>> introduced a regression in that userspace cannot always determine the set
>>> of vmas where thp is ineligible.
>>>
>>> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
>>> to determine if a vma is eligible to be backed by hugepages.
>>
>> I was under impression that nh resp hg flags only tell about the madvise
>> status. How do you exactly use these flags in an application?
>>
>> Your eligible rules as defined here:
>>
>>> + [*] A process mapping is eligible to be backed by transparent hugepages (thp)
>>> +     depending on system-wide settings and the mapping itself.  See
>>> +     Documentation/admin-guide/mm/transhuge.rst for default behavior.  If a
>>> +     mapping has a flag of "nh", it is not eligible to be backed by hugepages
>>> +     in any condition, either because of prctl(PR_SET_THP_DISABLE) or
>>> +     madvise(MADV_NOHUGEPAGE).  PR_SET_THP_DISABLE takes precedence over any
>>> +     MADV_HUGEPAGE.
>>
>> doesn't seem to match the reality. I do not see all the file backed
>> mappings to be nh marked. So is this really about eligibility rather
>> than the madvise status? Maybe it is just the above documentation that
>> needs to be updated.

Yeah the change from madvise to eligibility in the doc seems to go too far.

>> That being said, I do not object to the patch, I am just trying to
>> understand what is the intended usage for the flag that does try to say
>> more than the madvise status.
> 
> And moreover, how is the PR_SET_THP_DISABLE any different from the
> global THP disabled case. Do we want to set all vmas to nh as well?

Probably not. It's easy to check the global status, but is it possible
to query for the prctl flags of a process? We are looking at process or
even vma-specific flags here. If the prctl was historically implemented
via VM_NOHUGEPAGE and thus reported as such in smaps, it makes sense to
do so even with the MMF_ flag IMHO?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 20:43           ` Vlastimil Babka
@ 2018-09-25  5:50             ` Michal Hocko
  2018-09-25 19:52             ` David Rientjes
  1 sibling, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2018-09-25  5:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Andrew Morton, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Mon 24-09-18 22:43:49, Vlastimil Babka wrote:
> On 9/24/18 10:02 PM, Michal Hocko wrote:
> > On Mon 24-09-18 21:56:03, Michal Hocko wrote:
[...]
> >> That being said, I do not object to the patch, I am just trying to
> >> understand what is the intended usage for the flag that does try to say
> >> more than the madvise status.
> > 
> > And moreover, how is the PR_SET_THP_DISABLE any different from the
> > global THP disabled case. Do we want to set all vmas to nh as well?
> 
> Probably not. It's easy to check the global status, but is it possible
> to query for the prctl flags of a process?

Dunno but I suspect there is no way to check for this.

> We are looking at process or
> even vma-specific flags here. If the prctl was historically implemented
> via VM_NOHUGEPAGE and thus reported as such in smaps, it makes sense to
> do so even with the MMF_ flag IMHO?

Yes if this breaks some userspace which relied on the previous behavior.
But if nothing really broke then I guess it would be better to have the
semantic as clear as possible. Go and check the global status to make
the whole picture doesn't look very sound to me. On the other hand this
VMA has a madvise flag on it sounds quite clear and you know what to
expect at least. Sure the hint might be ignored in the end but well,
these are hints they do not guarantee anything after all.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-24 20:43           ` Vlastimil Babka
  2018-09-25  5:50             ` Michal Hocko
@ 2018-09-25 19:52             ` David Rientjes
  2018-09-25 20:29               ` Michal Hocko
  2018-09-25 21:50               ` [patch v3] mm, thp: always specify disabled vmas as nh in smaps David Rientjes
  1 sibling, 2 replies; 41+ messages in thread
From: David Rientjes @ 2018-09-25 19:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, Andrew Morton, Alexey Dobriyan, Kirill A. Shutemov,
	linux-kernel, linux-mm, linux-api

On Mon, 24 Sep 2018, Vlastimil Babka wrote:

> On 9/24/18 10:02 PM, Michal Hocko wrote:
> > On Mon 24-09-18 21:56:03, Michal Hocko wrote:
> >> On Mon 24-09-18 12:30:07, David Rientjes wrote:
> >>> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> >>> introduced a regression in that userspace cannot always determine the set
> >>> of vmas where thp is ineligible.
> >>>
> >>> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> >>> to determine if a vma is eligible to be backed by hugepages.
> >>
> >> I was under impression that nh resp hg flags only tell about the madvise
> >> status. How do you exactly use these flags in an application?
> >>

This is used to identify heap mappings that should be able to fault thp 
but do not, and they normally point to a low-on-memory or fragmentation 
issue.  After commit 1860033237d4, our users of PR_SET_THP_DISABLE no 
longer show "nh" for their heap mappings so they get reported as having a 
low thp ratio when in reality it is disabled.  It is also used in 
automated testing to ensure that vmas get disabled for thp appropriately 
and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
this, and those tests now break.

> >> Your eligible rules as defined here:
> >>
> >>> + [*] A process mapping is eligible to be backed by transparent hugepages (thp)
> >>> +     depending on system-wide settings and the mapping itself.  See
> >>> +     Documentation/admin-guide/mm/transhuge.rst for default behavior.  If a
> >>> +     mapping has a flag of "nh", it is not eligible to be backed by hugepages
> >>> +     in any condition, either because of prctl(PR_SET_THP_DISABLE) or
> >>> +     madvise(MADV_NOHUGEPAGE).  PR_SET_THP_DISABLE takes precedence over any
> >>> +     MADV_HUGEPAGE.
> >>
> >> doesn't seem to match the reality. I do not see all the file backed
> >> mappings to be nh marked. So is this really about eligibility rather
> >> than the madvise status? Maybe it is just the above documentation that
> >> needs to be updated.
> 
> Yeah the change from madvise to eligibility in the doc seems to go too far.
> 

I'll reword this to explicitly state that "hg" and "nh" mappings either 
allow or disallow thp backing.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-25 19:52             ` David Rientjes
@ 2018-09-25 20:29               ` Michal Hocko
  2018-09-25 21:45                 ` David Rientjes
  2018-09-25 21:50               ` [patch v3] mm, thp: always specify disabled vmas as nh in smaps David Rientjes
  1 sibling, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-09-25 20:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Vlastimil Babka, Andrew Morton, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue 25-09-18 12:52:09, David Rientjes wrote:
> On Mon, 24 Sep 2018, Vlastimil Babka wrote:
> 
> > On 9/24/18 10:02 PM, Michal Hocko wrote:
> > > On Mon 24-09-18 21:56:03, Michal Hocko wrote:
> > >> On Mon 24-09-18 12:30:07, David Rientjes wrote:
> > >>> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> > >>> introduced a regression in that userspace cannot always determine the set
> > >>> of vmas where thp is ineligible.
> > >>>
> > >>> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> > >>> to determine if a vma is eligible to be backed by hugepages.
> > >>
> > >> I was under impression that nh resp hg flags only tell about the madvise
> > >> status. How do you exactly use these flags in an application?
> > >>
> 
> This is used to identify heap mappings that should be able to fault thp 
> but do not, and they normally point to a low-on-memory or fragmentation 
> issue.  After commit 1860033237d4, our users of PR_SET_THP_DISABLE no 
> longer show "nh" for their heap mappings so they get reported as having a 
> low thp ratio when in reality it is disabled.  

I am still not sure I understand the issue completely. How are PR_SET_THP_DISABLE
users any different from the global THP disabled case? Is this only
about the scope? E.g the one who checks for the state cannot check the
PR_SET_THP_DISABLE state? Besides that what are consequences of the
low ratio? Is this an example of somebody using the prctl and still
complaining or an external observer trying to do something useful which
ends up doing contrary?

> It is also used in 
> automated testing to ensure that vmas get disabled for thp appropriately 
> and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> this, and those tests now break.

This sounds like a bit of an abuse to me. It shows how an internal
implementation detail leaks out to the userspace which is something we
should try to avoid.

> > >> Your eligible rules as defined here:
> > >>
> > >>> + [*] A process mapping is eligible to be backed by transparent hugepages (thp)
> > >>> +     depending on system-wide settings and the mapping itself.  See
> > >>> +     Documentation/admin-guide/mm/transhuge.rst for default behavior.  If a
> > >>> +     mapping has a flag of "nh", it is not eligible to be backed by hugepages
> > >>> +     in any condition, either because of prctl(PR_SET_THP_DISABLE) or
> > >>> +     madvise(MADV_NOHUGEPAGE).  PR_SET_THP_DISABLE takes precedence over any
> > >>> +     MADV_HUGEPAGE.
> > >>
> > >> doesn't seem to match the reality. I do not see all the file backed
> > >> mappings to be nh marked. So is this really about eligibility rather
> > >> than the madvise status? Maybe it is just the above documentation that
> > >> needs to be updated.
> > 
> > Yeah the change from madvise to eligibility in the doc seems to go too far.
> > 
> 
> I'll reword this to explicitly state that "hg" and "nh" mappings either 
> allow or disallow thp backing.

How are you going to distinguish a regular THP-able mapping then? I am
still not sure how this is supposed to work. Could you be more specific.
Let's say I have a THP-able mapping (shmem resp. anon for the current
implementation). What is the the matrix for hg/nh wrt. madvice/nomadvise
PR_SET_THP_DISABLE and global THP enabled/disable.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-25 20:29               ` Michal Hocko
@ 2018-09-25 21:45                 ` David Rientjes
  2018-09-25 22:04                   ` Andrew Morton
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-09-25 21:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Andrew Morton, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue, 25 Sep 2018, Michal Hocko wrote:

> > This is used to identify heap mappings that should be able to fault thp 
> > but do not, and they normally point to a low-on-memory or fragmentation 
> > issue.  After commit 1860033237d4, our users of PR_SET_THP_DISABLE no 
> > longer show "nh" for their heap mappings so they get reported as having a 
> > low thp ratio when in reality it is disabled.  
> 
> I am still not sure I understand the issue completely. How are PR_SET_THP_DISABLE
> users any different from the global THP disabled case? Is this only
> about the scope? E.g the one who checks for the state cannot check the
> PR_SET_THP_DISABLE state? Besides that what are consequences of the
> low ratio? Is this an example of somebody using the prctl and still
> complaining or an external observer trying to do something useful which
> ends up doing contrary?
> 

Yes, that is how I found out about this.  The system-wide policy can be 
determined from /sys/kernel/mm/transparent_hugepage/enabled.  If it is 
"always" and heap mappings are not being backed by hugepages and lack the 
"nh" flag, it was considered as a likely fragmentation issue before commit 
1860033237d4.  After commit 1860033237d4, the heap mapping for 
PR_SET_THP_DISABLE users was not showing it actually is prevented from 
faulting thp because of policy, not because of fragmentation.

> > It is also used in 
> > automated testing to ensure that vmas get disabled for thp appropriately 
> > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > this, and those tests now break.
> 
> This sounds like a bit of an abuse to me. It shows how an internal
> implementation detail leaks out to the userspace which is something we
> should try to avoid.
> 

Well, it's already how this has worked for years before commit 
1860033237d4 broke it.  Changing the implementation in the kernel is fine 
as long as you don't break userspace who relies on what is exported to it 
and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
being backed by hugepages.

> > I'll reword this to explicitly state that "hg" and "nh" mappings either 
> > allow or disallow thp backing.
> 
> How are you going to distinguish a regular THP-able mapping then? I am
> still not sure how this is supposed to work. Could you be more specific.

You look for "[heap]" in smaps to determine where the heap mapping is.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [patch v3] mm, thp: always specify disabled vmas as nh in smaps
  2018-09-25 19:52             ` David Rientjes
  2018-09-25 20:29               ` Michal Hocko
@ 2018-09-25 21:50               ` David Rientjes
  2018-09-26  6:12                 ` Michal Hocko
  2018-09-26  8:40                 ` Vlastimil Babka
  1 sibling, 2 replies; 41+ messages in thread
From: David Rientjes @ 2018-09-25 21:50 UTC (permalink / raw)
  To: Andrew Morton, Vlastimil Babka
  Cc: Michal Hocko, Alexey Dobriyan, Kirill A. Shutemov, linux-kernel,
	linux-mm, linux-api

Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
introduced a regression in that userspace cannot always determine the set
of vmas where thp is disabled.

Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
to determine if a vma has been disabled from being backed by hugepages.

Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
be disabled and emit "nh" as a flag for the corresponding vmas as part of
/proc/pid/smaps.  After the commit, thp is disabled by means of an mm
flag and "nh" is not emitted.

This causes smaps parsing libraries to assume a vma is enabled for thp
and ends up puzzling the user on why its memory is not backed by thp.

This also clears the "hg" flag to make the behavior of MADV_HUGEPAGE and
PR_SET_THP_DISABLE definitive.

Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
Signed-off-by: David Rientjes <rientjes@google.com>
---
 v3:
  - reword Documentation/filesystems/proc.txt for eligibility

 v2:
  - clear VM_HUGEPAGE per Vlastimil
  - update Documentation/filesystems/proc.txt to be explicit

 Documentation/filesystems/proc.txt |  7 ++++++-
 fs/proc/task_mmu.c                 | 14 +++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -491,9 +491,14 @@ manner. The codes are the following:
     sd  - soft-dirty flag
     mm  - mixed map area
     hg  - huge page advise flag
-    nh  - no-huge page advise flag
+    nh  - no-huge page advise flag [*]
     mg  - mergable advise flag
 
+ [*] A process mapping may be advised to not be backed by transparent hugepages
+     by either madvise(MADV_NOHUGEPAGE) or prctl(PR_SET_THP_DISABLE).  See
+     Documentation/admin-guide/mm/transhuge.rst for system-wide and process
+     mapping policies.
+
 Note that there is no guarantee that every flag and associated mnemonic will
 be present in all further kernel releases. Things get changed, the flags may
 be vanished or the reverse -- new added.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -653,13 +653,25 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 	};
+	unsigned long flags = vma->vm_flags;
 	size_t i;
 
+	/*
+	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
+	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
+	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
+	 * ability to detect vmas where thp is not eligible in the same manner.
+	 */
+	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
+		flags &= ~VM_HUGEPAGE;
+		flags |= VM_NOHUGEPAGE;
+	}
+
 	seq_puts(m, "VmFlags: ");
 	for (i = 0; i < BITS_PER_LONG; i++) {
 		if (!mnemonics[i][0])
 			continue;
-		if (vma->vm_flags & (1UL << i)) {
+		if (flags & (1UL << i)) {
 			seq_putc(m, mnemonics[i][0]);
 			seq_putc(m, mnemonics[i][1]);
 			seq_putc(m, ' ');

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-25 21:45                 ` David Rientjes
@ 2018-09-25 22:04                   ` Andrew Morton
  2018-09-26  0:55                     ` David Rientjes
  2018-09-26  6:06                     ` Michal Hocko
  0 siblings, 2 replies; 41+ messages in thread
From: Andrew Morton @ 2018-09-25 22:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:

> > > It is also used in 
> > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > this, and those tests now break.
> > 
> > This sounds like a bit of an abuse to me. It shows how an internal
> > implementation detail leaks out to the userspace which is something we
> > should try to avoid.
> > 
> 
> Well, it's already how this has worked for years before commit 
> 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> as long as you don't break userspace who relies on what is exported to it 
> and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> being backed by hugepages.

1860033237d4 was over a year ago so perhaps we don't need to be
too worried about restoring the old interface.  In which case
we have an opportunity to make improvements such as that suggested
by Michal?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-25 22:04                   ` Andrew Morton
@ 2018-09-26  0:55                     ` David Rientjes
  2018-09-26  6:06                     ` Michal Hocko
  1 sibling, 0 replies; 41+ messages in thread
From: David Rientjes @ 2018-09-26  0:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue, 25 Sep 2018, Andrew Morton wrote:

> > > > It is also used in 
> > > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > > this, and those tests now break.
> > > 
> > > This sounds like a bit of an abuse to me. It shows how an internal
> > > implementation detail leaks out to the userspace which is something we
> > > should try to avoid.
> > > 
> > 
> > Well, it's already how this has worked for years before commit 
> > 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> > as long as you don't break userspace who relies on what is exported to it 
> > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> > being backed by hugepages.
> 
> 1860033237d4 was over a year ago so perhaps we don't need to be
> too worried about restoring the old interface.  In which case
> we have an opportunity to make improvements such as that suggested
> by Michal?
> 

The only way to determine if a vma was thp disabled prior to this commit 
was parsing VmFlags from /proc/pid/smaps.  That was possible either 
through MADV_NOHUGEPAGE or PR_SET_THP_DISABLE.  It is perfectly legitimate 
for a test case to check if either are being set correctly through 
userspace libraries or through the kernel itself in the manner in which 
the kernel exports this information.  It is also perfectly legitimate for 
userspace to cull through information in the only way it is exported by 
the kernel to identify reasons for why applications are not having their 
heap backed by transparent hugepages: the mapping is disabled, the 
application is hitting the limit for its mem cgroup, we are low on memory, 
or there are fragmentation issues.  Differentiating between those is 
something our userspace does, and was broken by 1860033237d4.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v2] mm, thp: always specify ineligible vmas as nh in smaps
  2018-09-25 22:04                   ` Andrew Morton
  2018-09-26  0:55                     ` David Rientjes
@ 2018-09-26  6:06                     ` Michal Hocko
  2018-10-02 11:28                       ` [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc Michal Hocko
  1 sibling, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-09-26  6:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue 25-09-18 15:04:06, Andrew Morton wrote:
> On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> 
> > > > It is also used in 
> > > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > > this, and those tests now break.
> > > 
> > > This sounds like a bit of an abuse to me. It shows how an internal
> > > implementation detail leaks out to the userspace which is something we
> > > should try to avoid.
> > > 
> > 
> > Well, it's already how this has worked for years before commit 
> > 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> > as long as you don't break userspace who relies on what is exported to it 
> > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> > being backed by hugepages.
> 
> 1860033237d4 was over a year ago so perhaps we don't need to be
> too worried about restoring the old interface.  In which case
> we have an opportunity to make improvements such as that suggested
> by Michal?

Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace
somehow? E.g. /proc/<pid>/status. It is a process wide thing so
reporting it per VMA sounds strange at best.

This would also keep a sane (and currently documented) semantic for
the smaps flag to be really
    hg  - huge page advise flag
    nh  - no-huge page advise flag
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v3] mm, thp: always specify disabled vmas as nh in smaps
  2018-09-25 21:50               ` [patch v3] mm, thp: always specify disabled vmas as nh in smaps David Rientjes
@ 2018-09-26  6:12                 ` Michal Hocko
  2018-09-26  7:17                   ` Michal Hocko
  2018-09-26  8:40                 ` Vlastimil Babka
  1 sibling, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-09-26  6:12 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue 25-09-18 14:50:52, David Rientjes wrote:
[...]
Let's put my general disagreement with the approach asside for a while.
If this is really the best way forward the is the implementation really
correct?

> +	/*
> +	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
> +	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
> +	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
> +	 * ability to detect vmas where thp is not eligible in the same manner.
> +	 */
> +	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
> +		flags &= ~VM_HUGEPAGE;
> +		flags |= VM_NOHUGEPAGE;
> +	}

Do we want to report all vmas nh? Shouldn't we limit that to THP-able
mappings? It seems quite strange that an application started without
PR_SET_THP_DISABLE wouldn't report nh for most mappings while it would
otherwise. Also when can we have vma->vm_mm == NULL?

> +
>  	seq_puts(m, "VmFlags: ");
>  	for (i = 0; i < BITS_PER_LONG; i++) {
>  		if (!mnemonics[i][0])
>  			continue;
> -		if (vma->vm_flags & (1UL << i)) {
> +		if (flags & (1UL << i)) {
>  			seq_putc(m, mnemonics[i][0]);
>  			seq_putc(m, mnemonics[i][1]);
>  			seq_putc(m, ' ');

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v3] mm, thp: always specify disabled vmas as nh in smaps
  2018-09-26  6:12                 ` Michal Hocko
@ 2018-09-26  7:17                   ` Michal Hocko
  0 siblings, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2018-09-26  7:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed 26-09-18 08:12:47, Michal Hocko wrote:
> On Tue 25-09-18 14:50:52, David Rientjes wrote:
> [...]
> Let's put my general disagreement with the approach asside for a while.
> If this is really the best way forward the is the implementation really
> correct?
> 
> > +	/*
> > +	 * Disabling thp is possible through both MADV_NOHUGEPAGE and
> > +	 * PR_SET_THP_DISABLE.  Both historically used VM_NOHUGEPAGE.  Since
> > +	 * the introduction of MMF_DISABLE_THP, however, userspace needs the
> > +	 * ability to detect vmas where thp is not eligible in the same manner.
> > +	 */
> > +	if (vma->vm_mm && test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) {
> > +		flags &= ~VM_HUGEPAGE;
> > +		flags |= VM_NOHUGEPAGE;
> > +	}
> 
> Do we want to report all vmas nh? Shouldn't we limit that to THP-able
> mappings? It seems quite strange that an application started without
> PR_SET_THP_DISABLE wouldn't report nh for most mappings while it would
> otherwise. Also when can we have vma->vm_mm == NULL?

Hmm, after re-reading your documentation update to "A process mapping
may be advised to not be backed by transparent hugepages by either
madvise(MADV_NOHUGEPAGE) or prctl(PR_SET_THP_DISABLE)." the
implementation matches so scratch my comment.

As I've said, I am not happy about this approach but if there is a
general agreement this is really the best we can do I will not stand in
the way.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [patch v3] mm, thp: always specify disabled vmas as nh in smaps
  2018-09-25 21:50               ` [patch v3] mm, thp: always specify disabled vmas as nh in smaps David Rientjes
  2018-09-26  6:12                 ` Michal Hocko
@ 2018-09-26  8:40                 ` Vlastimil Babka
  1 sibling, 0 replies; 41+ messages in thread
From: Vlastimil Babka @ 2018-09-26  8:40 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Michal Hocko, Alexey Dobriyan, Kirill A. Shutemov, linux-kernel,
	linux-mm, linux-api

On 9/25/18 11:50 PM, David Rientjes wrote:
> Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> introduced a regression in that userspace cannot always determine the set
> of vmas where thp is disabled.
> 
> Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
> to determine if a vma has been disabled from being backed by hugepages.
> 
> Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
> be disabled and emit "nh" as a flag for the corresponding vmas as part of
> /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
> flag and "nh" is not emitted.
> 
> This causes smaps parsing libraries to assume a vma is enabled for thp
> and ends up puzzling the user on why its memory is not backed by thp.
> 
> This also clears the "hg" flag to make the behavior of MADV_HUGEPAGE and
> PR_SET_THP_DISABLE definitive.
> 
> Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> Signed-off-by: David Rientjes <rientjes@google.com>

Well, as Andrew said, we had the opportunity to provide a more complete
info to userspace e.g. with Michal's suggested /proc/pid/status
enhancement. If this is good enough for you (and nobody else cares) then
I won't block it either. It would be unfortunate though if we could not
revert this in case the MMF_DISABLE_THP querying is implemented later.
Hopefully the only consumers are internal tools such as yours, which can
be easily adapted...

Vlastimil

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-09-26  6:06                     ` Michal Hocko
@ 2018-10-02 11:28                       ` Michal Hocko
  2018-10-02 20:29                         ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-02 11:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed 26-09-18 08:06:24, Michal Hocko wrote:
> On Tue 25-09-18 15:04:06, Andrew Morton wrote:
> > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> > 
> > > > > It is also used in 
> > > > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > > > this, and those tests now break.
> > > > 
> > > > This sounds like a bit of an abuse to me. It shows how an internal
> > > > implementation detail leaks out to the userspace which is something we
> > > > should try to avoid.
> > > > 
> > > 
> > > Well, it's already how this has worked for years before commit 
> > > 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> > > as long as you don't break userspace who relies on what is exported to it 
> > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> > > being backed by hugepages.
> > 
> > 1860033237d4 was over a year ago so perhaps we don't need to be
> > too worried about restoring the old interface.  In which case
> > we have an opportunity to make improvements such as that suggested
> > by Michal?
> 
> Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace
> somehow? E.g. /proc/<pid>/status. It is a process wide thing so
> reporting it per VMA sounds strange at best.

So how about this? (not tested yet but it should be pretty
straightforward)
--- 
From 048b29102de326900b54cce78b614345cd77a230 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Tue, 2 Oct 2018 10:53:48 +0200
Subject: [PATCH] mm, proc: report PR_SET_THP_DISABLE in proc

David Rientjes has reported that 1860033237d4 ("mm: make
PR_SET_THP_DISABLE immediately active") has changed the way how
we report THPable VMAs to the userspace. Their monitoring tool is
triggering false alarms on PR_SET_THP_DISABLE tasks because it considers
an insufficient THP usage as a memory fragmentation resp. memory
pressure issue.

Before the said commit each newly created VMA inherited VM_NOHUGEPAGE
flag and that got exposed to the userspace via /proc/<pid>/smaps file.
This implementation had its downsides as explained in the commit message
but it is true that the userspace doesn't have any means to query for
the process wide THP enabled/disabled status.

PR_SET_THP_DISABLE is a process wide flag so it makes a lot of sense
to export in the process wide context rather than per-vma. Introduce
a new field to /proc/<pid>/status which export this status.  If
PR_SET_THP_DISABLE is used the it reports false same as when the THP is
not compiled in. It doesn't consider the global THP status because we
already export that information via sysfs

Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.txt |  3 +++
 fs/proc/array.c                    | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 22b4b00dee31..bafa5cb1685a 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -182,6 +182,7 @@ For example, to get the status information of a process, all you have to do is
   VmSwap:        0 kB
   HugetlbPages:          0 kB
   CoreDumping:    0
+  THP_enabled:	  1
   Threads:        1
   SigQ:   0/28578
   SigPnd: 0000000000000000
@@ -256,6 +257,8 @@ Table 1-2: Contents of the status files (as of 4.8)
  HugetlbPages                size of hugetlb memory portions
  CoreDumping                 process's memory is currently being dumped
                              (killing the process may lead to a corrupted core)
+ THP_enabled		     process is allowed to use THP (returns 0 when
+			     PR_SET_THP_DISABLE is set on the process
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 0ceb3b6b37e7..9d428d5a0ac8 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -392,6 +392,15 @@ static inline void task_core_dumping(struct seq_file *m, struct mm_struct *mm)
 	seq_putc(m, '\n');
 }
 
+static inline void task_thp_status(struct seq_file *m, struct mm_struct *mm)
+{
+	bool thp_enabled = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE);
+
+	if (thp_enabled)
+		thp_enabled = !test_bit(MMF_DISABLE_THP, &mm->flags);
+	seq_printf(m, "THP_enabled:\t%d\n", thp_enabled);
+}
+
 int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 			struct pid *pid, struct task_struct *task)
 {
@@ -406,6 +415,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
 	if (mm) {
 		task_mem(m, mm);
 		task_core_dumping(m, mm);
+		task_thp_status(m, mm);
 		mmput(mm);
 	}
 	task_sig(m, task);
-- 
2.19.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-02 11:28                       ` [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc Michal Hocko
@ 2018-10-02 20:29                         ` David Rientjes
  2018-10-03  7:36                           ` Michal Hocko
  2018-10-03 17:33                           ` Mike Rapoport
  0 siblings, 2 replies; 41+ messages in thread
From: David Rientjes @ 2018-10-02 20:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue, 2 Oct 2018, Michal Hocko wrote:

> On Wed 26-09-18 08:06:24, Michal Hocko wrote:
> > On Tue 25-09-18 15:04:06, Andrew Morton wrote:
> > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> > > 
> > > > > > It is also used in 
> > > > > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > > > > this, and those tests now break.
> > > > > 
> > > > > This sounds like a bit of an abuse to me. It shows how an internal
> > > > > implementation detail leaks out to the userspace which is something we
> > > > > should try to avoid.
> > > > > 
> > > > 
> > > > Well, it's already how this has worked for years before commit 
> > > > 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> > > > as long as you don't break userspace who relies on what is exported to it 
> > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> > > > being backed by hugepages.
> > > 
> > > 1860033237d4 was over a year ago so perhaps we don't need to be
> > > too worried about restoring the old interface.  In which case
> > > we have an opportunity to make improvements such as that suggested
> > > by Michal?
> > 
> > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace
> > somehow? E.g. /proc/<pid>/status. It is a process wide thing so
> > reporting it per VMA sounds strange at best.
> 
> So how about this? (not tested yet but it should be pretty
> straightforward)

Umm, prctl(PR_GET_THP_DISABLE)?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-02 20:29                         ` David Rientjes
@ 2018-10-03  7:36                           ` Michal Hocko
  2018-10-03 22:51                             ` David Rientjes
  2018-10-03 17:33                           ` Mike Rapoport
  1 sibling, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-03  7:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue 02-10-18 13:29:42, David Rientjes wrote:
> On Tue, 2 Oct 2018, Michal Hocko wrote:
> 
> > On Wed 26-09-18 08:06:24, Michal Hocko wrote:
> > > On Tue 25-09-18 15:04:06, Andrew Morton wrote:
> > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> > > > 
> > > > > > > It is also used in 
> > > > > > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > > > > > this, and those tests now break.
> > > > > > 
> > > > > > This sounds like a bit of an abuse to me. It shows how an internal
> > > > > > implementation detail leaks out to the userspace which is something we
> > > > > > should try to avoid.
> > > > > > 
> > > > > 
> > > > > Well, it's already how this has worked for years before commit 
> > > > > 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> > > > > as long as you don't break userspace who relies on what is exported to it 
> > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> > > > > being backed by hugepages.
> > > > 
> > > > 1860033237d4 was over a year ago so perhaps we don't need to be
> > > > too worried about restoring the old interface.  In which case
> > > > we have an opportunity to make improvements such as that suggested
> > > > by Michal?
> > > 
> > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace
> > > somehow? E.g. /proc/<pid>/status. It is a process wide thing so
> > > reporting it per VMA sounds strange at best.
> > 
> > So how about this? (not tested yet but it should be pretty
> > straightforward)
> 
> Umm, prctl(PR_GET_THP_DISABLE)?

/me confused. I thought you want to query for the flag on a
_different_ process. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-02 20:29                         ` David Rientjes
  2018-10-03  7:36                           ` Michal Hocko
@ 2018-10-03 17:33                           ` Mike Rapoport
  1 sibling, 0 replies; 41+ messages in thread
From: Mike Rapoport @ 2018-10-03 17:33 UTC (permalink / raw)
  To: David Rientjes
  Cc: Michal Hocko, Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue, Oct 02, 2018 at 01:29:42PM -0700, David Rientjes wrote:
> On Tue, 2 Oct 2018, Michal Hocko wrote:
> 
> > On Wed 26-09-18 08:06:24, Michal Hocko wrote:
> > > On Tue 25-09-18 15:04:06, Andrew Morton wrote:
> > > > On Tue, 25 Sep 2018 14:45:19 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> > > > 
> > > > > > > It is also used in 
> > > > > > > automated testing to ensure that vmas get disabled for thp appropriately 
> > > > > > > and we used "nh" since that is how PR_SET_THP_DISABLE previously enforced 
> > > > > > > this, and those tests now break.
> > > > > > 
> > > > > > This sounds like a bit of an abuse to me. It shows how an internal
> > > > > > implementation detail leaks out to the userspace which is something we
> > > > > > should try to avoid.
> > > > > > 
> > > > > 
> > > > > Well, it's already how this has worked for years before commit 
> > > > > 1860033237d4 broke it.  Changing the implementation in the kernel is fine 
> > > > > as long as you don't break userspace who relies on what is exported to it 
> > > > > and is the only way to determine if MADV_NOHUGEPAGE is preventing it from 
> > > > > being backed by hugepages.
> > > > 
> > > > 1860033237d4 was over a year ago so perhaps we don't need to be
> > > > too worried about restoring the old interface.  In which case
> > > > we have an opportunity to make improvements such as that suggested
> > > > by Michal?
> > > 
> > > Yeah, can we add a way to export PR_SET_THP_DISABLE to userspace
> > > somehow? E.g. /proc/<pid>/status. It is a process wide thing so
> > > reporting it per VMA sounds strange at best.
> > 
> > So how about this? (not tested yet but it should be pretty
> > straightforward)
> 
> Umm, prctl(PR_GET_THP_DISABLE)?
> 

~/git/linux$ git grep PR_GET_THP_DISABLE
include/uapi/linux/prctl.h:#define PR_GET_THP_DISABLE   42
kernel/sys.c:   case PR_GET_THP_DISABLE:
tools/include/uapi/linux/prctl.h:#define PR_GET_THP_DISABLE     42

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-03  7:36                           ` Michal Hocko
@ 2018-10-03 22:51                             ` David Rientjes
  2018-10-04  5:58                               ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-10-03 22:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed, 3 Oct 2018, Michal Hocko wrote:

> > > So how about this? (not tested yet but it should be pretty
> > > straightforward)
> > 
> > Umm, prctl(PR_GET_THP_DISABLE)?
> 
> /me confused. I thought you want to query for the flag on a
> _different_ process. 

Why would we want to check three locations (system wide setting, prctl 
setting, madvise setting) to determine if a heap can be backed by thp?

If the nh flag being exported to VmFlag is to be extended beyond what my 
patch did, I suggest (1) it does it for the system wide setting as well 
and/or (2) calling a helper function to determine if the vma could be 
backed by thp in the first place regardless of any setting to determine if 
nh/hg is important.

The last thing I suggest is done is adding a third place to check.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-03 22:51                             ` David Rientjes
@ 2018-10-04  5:58                               ` Michal Hocko
  2018-10-04  9:15                                 ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-04  5:58 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed 03-10-18 15:51:05, David Rientjes wrote:
> On Wed, 3 Oct 2018, Michal Hocko wrote:
> 
> > > > So how about this? (not tested yet but it should be pretty
> > > > straightforward)
> > > 
> > > Umm, prctl(PR_GET_THP_DISABLE)?
> > 
> > /me confused. I thought you want to query for the flag on a
> > _different_ process. 
> 
> Why would we want to check three locations (system wide setting, prctl 
> setting, madvise setting) to determine if a heap can be backed by thp?

Because we simply have 3 different ways to control THP? Is this a real
problem?

> If the nh flag being exported to VmFlag is to be extended beyond what my 
> patch did, I suggest (1) it does it for the system wide setting as well 
> and/or (2) calling a helper function to determine if the vma could be 
> backed by thp in the first place regardless of any setting to determine if 
> nh/hg is important.
> 
> The last thing I suggest is done is adding a third place to check.

But conflating the three ways into a single exported symbol (be it nh
or something else) just makes the api more confusing longterm. I am
pretty sure we have made that mistake in the past already.

What if somebody really wants to check for PR_SET_THP_DISABLE? There is
currently no way to do that on a remote process right now AFAICS. So it
makes sense to export the state in general. Any exported API should be
about consistency. If you want to combine all three checks then
just do that in the userspace or in a library function.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-04  5:58                               ` Michal Hocko
@ 2018-10-04  9:15                                 ` David Rientjes
  2018-10-04  9:46                                   ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-10-04  9:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu, 4 Oct 2018, Michal Hocko wrote:

> > > > > So how about this? (not tested yet but it should be pretty
> > > > > straightforward)
> > > > 
> > > > Umm, prctl(PR_GET_THP_DISABLE)?
> > > 
> > > /me confused. I thought you want to query for the flag on a
> > > _different_ process. 
> > 
> > Why would we want to check three locations (system wide setting, prctl 
> > setting, madvise setting) to determine if a heap can be backed by thp?
> 
> Because we simply have 3 different ways to control THP? Is this a real
> problem?
> 

And prior to the offending commit, there were three ways to control thp 
but two ways to determine if a mapping was eligible for thp based on the 
implementation detail of one of those ways.  If there are three ways to 
control thp, userspace is still in the dark wrt which takes precedence 
over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set 
to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE 
shown in /proc/pid/status, etc.

Which one is the ultimate authority?  There's one way to specify it: in a 
single per-mapping location that reveals whether that mapping is eligible 
for thp or not.  So I think it would be a very sane extension so that 
smaps reveals if a mapping can be backed by hugepages or not depending on 
the helper function thp uses itself to determine if it can fault 
hugepages.  I don't think we should have three locations to check and then 
try to resolve which one takes precedence over the other for each 
userspace implementation (and perhaps how the kernel implementation 
evolves).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-04  9:15                                 ` David Rientjes
@ 2018-10-04  9:46                                   ` Michal Hocko
  2018-10-04 18:34                                     ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-04  9:46 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu 04-10-18 02:15:38, David Rientjes wrote:
> On Thu, 4 Oct 2018, Michal Hocko wrote:
> 
> > > > > > So how about this? (not tested yet but it should be pretty
> > > > > > straightforward)
> > > > > 
> > > > > Umm, prctl(PR_GET_THP_DISABLE)?
> > > > 
> > > > /me confused. I thought you want to query for the flag on a
> > > > _different_ process. 
> > > 
> > > Why would we want to check three locations (system wide setting, prctl 
> > > setting, madvise setting) to determine if a heap can be backed by thp?
> > 
> > Because we simply have 3 different ways to control THP? Is this a real
> > problem?
> > 
> 
> And prior to the offending commit, there were three ways to control thp 
> but two ways to determine if a mapping was eligible for thp based on the 
> implementation detail of one of those ways.

Yes, it is really unfortunate that we have ever allowed to leak such an
internal stuff like VMA flags to userspace.

> If there are three ways to 
> control thp, userspace is still in the dark wrt which takes precedence 
> over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set 
> to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE 
> shown in /proc/pid/status, etc.
> 
> Which one is the ultimate authority?

Isn't our documentation good enough? If not then we should document it
properly.

> There's one way to specify it: in a 
> single per-mapping location that reveals whether that mapping is eligible 
> for thp or not.  So I think it would be a very sane extension so that 
> smaps reveals if a mapping can be backed by hugepages or not depending on 
> the helper function thp uses itself to determine if it can fault 
> hugepages.  I don't think we should have three locations to check and then 
> try to resolve which one takes precedence over the other for each 
> userspace implementation (and perhaps how the kernel implementation 
> evolves).

But we really have three different ways to disable thp. Which one has
caused the end result might be interesting/important because different
entities might be under control. You either have to contact your admin
for the global one, or whomever has launched you for the prctl thing. So
the distinction might be important.

Checking 3 different places and the precedence rules is not really
trivial but I do not see any reason why this couldn't be implemented in
a library so the user doesn't really have to scratch head.

If you really insist to have per-vma thing then all right but do not
conflate vma flags and the higher level logic and make it its own line
in the smaps output and make sure it reports only THP able VMAs.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-04  9:46                                   ` Michal Hocko
@ 2018-10-04 18:34                                     ` David Rientjes
  2018-10-09  8:33                                       ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-10-04 18:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu, 4 Oct 2018, Michal Hocko wrote:

> > And prior to the offending commit, there were three ways to control thp 
> > but two ways to determine if a mapping was eligible for thp based on the 
> > implementation detail of one of those ways.
> 
> Yes, it is really unfortunate that we have ever allowed to leak such an
> internal stuff like VMA flags to userspace.
> 

Right, I don't like userspace dependencies on VmFlags in smaps myself, but 
it's the only way we have available that shows whether a single mapping is 
eligible to be backed by thp :/

> > If there are three ways to 
> > control thp, userspace is still in the dark wrt which takes precedence 
> > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set 
> > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE 
> > shown in /proc/pid/status, etc.
> > 
> > Which one is the ultimate authority?
> 
> Isn't our documentation good enough? If not then we should document it
> properly.
> 

No, because the offending commit actually changed the precedence itself: 
PR_SET_THP_DISABLE used to be honored for future mappings and the commit 
changed that for all current mappings.  So as a result of the commit 
itself we would have had to change the documentation and userspace can't 
be expected to keep up with yet a fourth variable: kernel version.  It 
really needs to be simpler, just a per-mapping specifier.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-04 18:34                                     ` David Rientjes
@ 2018-10-09  8:33                                       ` Michal Hocko
  2018-10-15 15:03                                         ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-09  8:33 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu 04-10-18 11:34:11, David Rientjes wrote:
> On Thu, 4 Oct 2018, Michal Hocko wrote:
> 
> > > And prior to the offending commit, there were three ways to control thp 
> > > but two ways to determine if a mapping was eligible for thp based on the 
> > > implementation detail of one of those ways.
> > 
> > Yes, it is really unfortunate that we have ever allowed to leak such an
> > internal stuff like VMA flags to userspace.
> > 
> 
> Right, I don't like userspace dependencies on VmFlags in smaps myself, but 
> it's the only way we have available that shows whether a single mapping is 
> eligible to be backed by thp :/

Which is not the case due to reasons mentioned earlier. It only speaks
about madvise status on the VMA.

> > > If there are three ways to 
> > > control thp, userspace is still in the dark wrt which takes precedence 
> > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set 
> > > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE 
> > > shown in /proc/pid/status, etc.
> > > 
> > > Which one is the ultimate authority?
> > 
> > Isn't our documentation good enough? If not then we should document it
> > properly.
> > 
> 
> No, because the offending commit actually changed the precedence itself: 
> PR_SET_THP_DISABLE used to be honored for future mappings and the commit 
> changed that for all current mappings.

Which is the actual and the full point of the fix as described in the
changelog. The original implementation was poor and inconsistent.

> So as a result of the commit 
> itself we would have had to change the documentation and userspace can't 
> be expected to keep up with yet a fourth variable: kernel version.  It 
> really needs to be simpler, just a per-mapping specifier.

As I've said, if you really need a per-vma granularity then make it a
dedicated line in the output with a clear semantic. Do not make VMA
flags even more confusing.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-09  8:33                                       ` Michal Hocko
@ 2018-10-15 15:03                                         ` Michal Hocko
  2018-10-15 22:25                                           ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-15 15:03 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue 09-10-18 10:33:26, Michal Hocko wrote:
> On Thu 04-10-18 11:34:11, David Rientjes wrote:
> > On Thu, 4 Oct 2018, Michal Hocko wrote:
> > 
> > > > And prior to the offending commit, there were three ways to control thp 
> > > > but two ways to determine if a mapping was eligible for thp based on the 
> > > > implementation detail of one of those ways.
> > > 
> > > Yes, it is really unfortunate that we have ever allowed to leak such an
> > > internal stuff like VMA flags to userspace.
> > > 
> > 
> > Right, I don't like userspace dependencies on VmFlags in smaps myself, but 
> > it's the only way we have available that shows whether a single mapping is 
> > eligible to be backed by thp :/
> 
> Which is not the case due to reasons mentioned earlier. It only speaks
> about madvise status on the VMA.
> 
> > > > If there are three ways to 
> > > > control thp, userspace is still in the dark wrt which takes precedence 
> > > > over the other: we have PR_SET_THP_DISABLE but globally sysfs has it set 
> > > > to "always", or we have MADV_HUGEPAGE set per smaps but PR_SET_THP_DISABLE 
> > > > shown in /proc/pid/status, etc.
> > > > 
> > > > Which one is the ultimate authority?
> > > 
> > > Isn't our documentation good enough? If not then we should document it
> > > properly.
> > > 
> > 
> > No, because the offending commit actually changed the precedence itself: 
> > PR_SET_THP_DISABLE used to be honored for future mappings and the commit 
> > changed that for all current mappings.
> 
> Which is the actual and the full point of the fix as described in the
> changelog. The original implementation was poor and inconsistent.
> 
> > So as a result of the commit 
> > itself we would have had to change the documentation and userspace can't 
> > be expected to keep up with yet a fourth variable: kernel version.  It 
> > really needs to be simpler, just a per-mapping specifier.
> 
> As I've said, if you really need a per-vma granularity then make it a
> dedicated line in the output with a clear semantic. Do not make VMA
> flags even more confusing.

Can we settle with something please?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-15 15:03                                         ` Michal Hocko
@ 2018-10-15 22:25                                           ` David Rientjes
  2018-10-16 10:48                                             ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-10-15 22:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Mon, 15 Oct 2018, Michal Hocko wrote:

> > > No, because the offending commit actually changed the precedence itself: 
> > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit 
> > > changed that for all current mappings.
> > 
> > Which is the actual and the full point of the fix as described in the
> > changelog. The original implementation was poor and inconsistent.
> > 
> > > So as a result of the commit 
> > > itself we would have had to change the documentation and userspace can't 
> > > be expected to keep up with yet a fourth variable: kernel version.  It 
> > > really needs to be simpler, just a per-mapping specifier.
> > 
> > As I've said, if you really need a per-vma granularity then make it a
> > dedicated line in the output with a clear semantic. Do not make VMA
> > flags even more confusing.
> 
> Can we settle with something please?

I don't understand the point of extending smaps with yet another line.  
The only way for a different process to determine if a single vma from 
another process is thp disabled is by the "nh" flag, so it is reasonable 
that userspace reads this.  My patch fixes that.  If smaps is extended 
with another line per your patch, it doesn't change the fact that previous 
binaries are built to check for "nh" so it does not deprecate that.  
("THP_Enabled" is also ambiguous since it only refers to prctl and not the 
default thp setting or madvise.)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-15 22:25                                           ` David Rientjes
@ 2018-10-16 10:48                                             ` Michal Hocko
  2018-10-16 21:24                                               ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-16 10:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Mon 15-10-18 15:25:14, David Rientjes wrote:
> On Mon, 15 Oct 2018, Michal Hocko wrote:
> 
> > > > No, because the offending commit actually changed the precedence itself: 
> > > > PR_SET_THP_DISABLE used to be honored for future mappings and the commit 
> > > > changed that for all current mappings.
> > > 
> > > Which is the actual and the full point of the fix as described in the
> > > changelog. The original implementation was poor and inconsistent.
> > > 
> > > > So as a result of the commit 
> > > > itself we would have had to change the documentation and userspace can't 
> > > > be expected to keep up with yet a fourth variable: kernel version.  It 
> > > > really needs to be simpler, just a per-mapping specifier.
> > > 
> > > As I've said, if you really need a per-vma granularity then make it a
> > > dedicated line in the output with a clear semantic. Do not make VMA
> > > flags even more confusing.
> > 
> > Can we settle with something please?
> 
> I don't understand the point of extending smaps with yet another line.  

Because abusing a vma flag part is just wrong. What are you going to do
when a next bug report states that the flag is set even though no
userspace has set it and that leads to some malfunctioning? Can you rule
that out? Even your abuse of the flag is surprising so why others
wouldn't be?

> The only way for a different process to determine if a single vma from 
> another process is thp disabled is by the "nh" flag, so it is reasonable 
> that userspace reads this.  My patch fixes that.  If smaps is extended 
> with another line per your patch, it doesn't change the fact that previous 
> binaries are built to check for "nh" so it does not deprecate that.  
> ("THP_Enabled" is also ambiguous since it only refers to prctl and not the 
> default thp setting or madvise.)

As I've said there are two things. Exporting PR_SET_THP_DISABLE to
userspace so that a 3rd party process can query it. I've already
explained why that might be useful. If you really insist on having
a per-vma field then let's do it properly now. Are you going to agree on
that? If yes, I am willing to spend my time on that but I am not going
to bother if this will lead to "I want my vma field abuse anyway".
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-16 10:48                                             ` Michal Hocko
@ 2018-10-16 21:24                                               ` David Rientjes
  2018-10-17  7:05                                                 ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-10-16 21:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue, 16 Oct 2018, Michal Hocko wrote:

> > I don't understand the point of extending smaps with yet another line.  
> 
> Because abusing a vma flag part is just wrong. What are you going to do
> when a next bug report states that the flag is set even though no
> userspace has set it and that leads to some malfunctioning? Can you rule
> that out? Even your abuse of the flag is surprising so why others
> wouldn't be?
> 

The flag has taken on the meaning of "thp disabled for this vma", how it 
is set is not the scope of the flag.  If a thp is explicitly disabled from 
being eligible for thp, whether by madvise, prctl, or any future 
mechanism, it should use VM_NOHUGEPAGE or show_smap_vma_flags() needs to 
be modified.

I agree with you that this could have been done better if an interface was 
defined earlier that userspace could have used.  PR_SET_THP_DISABLE was 
merged long after thp had already been merged so this can be a reminder 
that defining clean, robust, and extensible APIs is important, but I'm 
afraid we can't go back in time and change how userspace queries 
information, especially in cases where there was only one way to do it.

> > The only way for a different process to determine if a single vma from 
> > another process is thp disabled is by the "nh" flag, so it is reasonable 
> > that userspace reads this.  My patch fixes that.  If smaps is extended 
> > with another line per your patch, it doesn't change the fact that previous 
> > binaries are built to check for "nh" so it does not deprecate that.  
> > ("THP_Enabled" is also ambiguous since it only refers to prctl and not the 
> > default thp setting or madvise.)
> 
> As I've said there are two things. Exporting PR_SET_THP_DISABLE to
> userspace so that a 3rd party process can query it. I've already
> explained why that might be useful. If you really insist on having
> a per-vma field then let's do it properly now. Are you going to agree on
> that? If yes, I am willing to spend my time on that but I am not going
> to bother if this will lead to "I want my vma field abuse anyway".

I think what you and I want is largely irrelevant :)  What's important is 
that there are userspace implementations that query this today so 
continuing to support it as the way to determine if a vma has been thp 
disabled doesn't seem problematic and guarantees that userspace doesn't 
break.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-16 21:24                                               ` David Rientjes
@ 2018-10-17  7:05                                                 ` Michal Hocko
  2018-10-17 19:59                                                   ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-17  7:05 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Tue 16-10-18 14:24:19, David Rientjes wrote:
> On Tue, 16 Oct 2018, Michal Hocko wrote:
> 
> > > I don't understand the point of extending smaps with yet another line.  
> > 
> > Because abusing a vma flag part is just wrong. What are you going to do
> > when a next bug report states that the flag is set even though no
> > userspace has set it and that leads to some malfunctioning? Can you rule
> > that out? Even your abuse of the flag is surprising so why others
> > wouldn't be?
> > 
> 
> The flag has taken on the meaning of "thp disabled for this vma", how it 
> is set is not the scope of the flag.  If a thp is explicitly disabled from 
> being eligible for thp, whether by madvise, prctl, or any future 
> mechanism, it should use VM_NOHUGEPAGE or show_smap_vma_flags() needs to 
> be modified.

No, this is not the meaning which is documented

nh  - no-huge page advise flag

and as far as I know it is only you who has complained so far.
 
> > As I've said there are two things. Exporting PR_SET_THP_DISABLE to
> > userspace so that a 3rd party process can query it. I've already
> > explained why that might be useful. If you really insist on having
> > a per-vma field then let's do it properly now. Are you going to agree on
> > that? If yes, I am willing to spend my time on that but I am not going
> > to bother if this will lead to "I want my vma field abuse anyway".
> 
> I think what you and I want is largely irrelevant :)  What's important is 
> that there are userspace implementations that query this today so 
> continuing to support it as the way to determine if a vma has been thp 
> disabled doesn't seem problematic and guarantees that userspace doesn't 
> break.

Do you know of any other userspace except your usecase? Is there
anything fundamental that would prevent a proper API adoption for you?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-17  7:05                                                 ` Michal Hocko
@ 2018-10-17 19:59                                                   ` David Rientjes
  2018-10-18  7:00                                                     ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-10-17 19:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed, 17 Oct 2018, Michal Hocko wrote:

> Do you know of any other userspace except your usecase? Is there
> anything fundamental that would prevent a proper API adoption for you?
> 

Yes, it would require us to go back in time and build patched binaries. 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-17 19:59                                                   ` David Rientjes
@ 2018-10-18  7:00                                                     ` Michal Hocko
  2018-11-14 13:23                                                       ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-10-18  7:00 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed 17-10-18 12:59:18, David Rientjes wrote:
> On Wed, 17 Oct 2018, Michal Hocko wrote:
> 
> > Do you know of any other userspace except your usecase? Is there
> > anything fundamental that would prevent a proper API adoption for you?
> > 
> 
> Yes, it would require us to go back in time and build patched binaries. 

I read that as there is a fundamental problem to update existing
binaries. If that is the case then there surely is no way around it
and another sad page in the screwed up APIs book we provide.

But I was under impression that the SW stack which actually does the
monitoring is under your controll. Moreover I was under impression that
you do not use the current vanilla kernel so there is no need for an
immediate change on your end. It is trivial to come up with a backward
compatible way to check for the new flag (if it is not present then
fallback to vma flags).

I am sorry for pushing here but if this is just a matter of a _single_
user which _can_ be fixed with a reasonable effort then I would love to
see the future api unscrewed.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-10-18  7:00                                                     ` Michal Hocko
@ 2018-11-14 13:23                                                       ` Michal Hocko
  2018-11-14 21:41                                                         ` David Rientjes
  0 siblings, 1 reply; 41+ messages in thread
From: Michal Hocko @ 2018-11-14 13:23 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu 18-10-18 09:00:31, Michal Hocko wrote:
> On Wed 17-10-18 12:59:18, David Rientjes wrote:
> > On Wed, 17 Oct 2018, Michal Hocko wrote:
> > 
> > > Do you know of any other userspace except your usecase? Is there
> > > anything fundamental that would prevent a proper API adoption for you?
> > > 
> > 
> > Yes, it would require us to go back in time and build patched binaries. 
> 
> I read that as there is a fundamental problem to update existing
> binaries. If that is the case then there surely is no way around it
> and another sad page in the screwed up APIs book we provide.
> 
> But I was under impression that the SW stack which actually does the
> monitoring is under your controll. Moreover I was under impression that
> you do not use the current vanilla kernel so there is no need for an
> immediate change on your end. It is trivial to come up with a backward
> compatible way to check for the new flag (if it is not present then
> fallback to vma flags).
> 
> I am sorry for pushing here but if this is just a matter of a _single_
> user which _can_ be fixed with a reasonable effort then I would love to
> see the future api unscrewed.

ping
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-11-14 13:23                                                       ` Michal Hocko
@ 2018-11-14 21:41                                                         ` David Rientjes
  2018-11-15  9:02                                                           ` Michal Hocko
  0 siblings, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-11-14 21:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed, 14 Nov 2018, Michal Hocko wrote:

> > > > Do you know of any other userspace except your usecase? Is there
> > > > anything fundamental that would prevent a proper API adoption for you?
> > > > 
> > > 
> > > Yes, it would require us to go back in time and build patched binaries. 
> > 
> > I read that as there is a fundamental problem to update existing
> > binaries. If that is the case then there surely is no way around it
> > and another sad page in the screwed up APIs book we provide.
> > 
> > But I was under impression that the SW stack which actually does the
> > monitoring is under your controll. Moreover I was under impression that
> > you do not use the current vanilla kernel so there is no need for an
> > immediate change on your end. It is trivial to come up with a backward
> > compatible way to check for the new flag (if it is not present then
> > fallback to vma flags).
> > 

The userspace had a single way to determine if thp had been disabled for a 
specific vma and that was broken with your commit.  We have since fixed 
it.  Modifying our software stack to start looking for some field 
somewhere else will not help anybody else that this has affected or will 
affect.  I'm interested in not breaking userspace, not trying a wait and 
see approach to see if anybody else complains once we start looking for 
some other field.  The risk outweighs the reward, it already broke us, and 
I'd prefer not to even open the possibility of breaking anybody else.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-11-14 21:41                                                         ` David Rientjes
@ 2018-11-15  9:02                                                           ` Michal Hocko
  2018-11-15  9:22                                                             ` Michal Hocko
  2018-11-19 22:05                                                             ` David Rientjes
  0 siblings, 2 replies; 41+ messages in thread
From: Michal Hocko @ 2018-11-15  9:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Wed 14-11-18 13:41:12, David Rientjes wrote:
> On Wed, 14 Nov 2018, Michal Hocko wrote:
> 
> > > > > Do you know of any other userspace except your usecase? Is there
> > > > > anything fundamental that would prevent a proper API adoption for you?
> > > > > 
> > > > 
> > > > Yes, it would require us to go back in time and build patched binaries. 
> > > 
> > > I read that as there is a fundamental problem to update existing
> > > binaries. If that is the case then there surely is no way around it
> > > and another sad page in the screwed up APIs book we provide.
> > > 
> > > But I was under impression that the SW stack which actually does the
> > > monitoring is under your controll. Moreover I was under impression that
> > > you do not use the current vanilla kernel so there is no need for an
> > > immediate change on your end. It is trivial to come up with a backward
> > > compatible way to check for the new flag (if it is not present then
> > > fallback to vma flags).
> > > 
> 
> The userspace had a single way to determine if thp had been disabled for a 
> specific vma and that was broken with your commit.  We have since fixed 
> it.  Modifying our software stack to start looking for some field 
> somewhere else will not help anybody else that this has affected or will 
> affect.  I'm interested in not breaking userspace, not trying a wait and 
> see approach to see if anybody else complains once we start looking for 
> some other field.  The risk outweighs the reward, it already broke us, and 
> I'd prefer not to even open the possibility of breaking anybody else.

I very much agree on "do not break userspace" part but this is kind of
gray area. VMA flags are a deep internal implementation detail and
nobody should really depend on it for anything important. The original
motivation for introducing it was CRIU where it is kind of
understandable. I would argue they should find a different way but it is
just too late for them.

For this particular case there was no other bug report except for yours
and if it is possible to fix it on your end then I would really love to
make the a sensible user interface to query the status. If we are going
to change the semantic of the exported flag again then we risk yet
another breakage.

Therefore I am asking whether changing your particular usecase to a new
interface is possible because that would allow to have a longerm
sensible user interface rather than another kludge which still doesn't
cover all the usecases (e.g. there is no way to reliably query the
madvise status after your patch).

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-11-15  9:02                                                           ` Michal Hocko
@ 2018-11-15  9:22                                                             ` Michal Hocko
  2018-11-19 22:05                                                             ` David Rientjes
  1 sibling, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2018-11-15  9:22 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu 15-11-18 10:02:42, Michal Hocko wrote:
> On Wed 14-11-18 13:41:12, David Rientjes wrote:
> > On Wed, 14 Nov 2018, Michal Hocko wrote:
> > 
> > > > > > Do you know of any other userspace except your usecase? Is there
> > > > > > anything fundamental that would prevent a proper API adoption for you?
> > > > > > 
> > > > > 
> > > > > Yes, it would require us to go back in time and build patched binaries. 
> > > > 
> > > > I read that as there is a fundamental problem to update existing
> > > > binaries. If that is the case then there surely is no way around it
> > > > and another sad page in the screwed up APIs book we provide.
> > > > 
> > > > But I was under impression that the SW stack which actually does the
> > > > monitoring is under your controll. Moreover I was under impression that
> > > > you do not use the current vanilla kernel so there is no need for an
> > > > immediate change on your end. It is trivial to come up with a backward
> > > > compatible way to check for the new flag (if it is not present then
> > > > fallback to vma flags).
> > > > 
> > 
> > The userspace had a single way to determine if thp had been disabled for a 
> > specific vma and that was broken with your commit.  We have since fixed 
> > it.  Modifying our software stack to start looking for some field 
> > somewhere else will not help anybody else that this has affected or will 
> > affect.  I'm interested in not breaking userspace, not trying a wait and 
> > see approach to see if anybody else complains once we start looking for 
> > some other field.  The risk outweighs the reward, it already broke us, and 
> > I'd prefer not to even open the possibility of breaking anybody else.
> 
> I very much agree on "do not break userspace" part but this is kind of
> gray area. VMA flags are a deep internal implementation detail and
> nobody should really depend on it for anything important. The original
> motivation for introducing it was CRIU where it is kind of
> understandable. I would argue they should find a different way but it is
> just too late for them.
> 
> For this particular case there was no other bug report except for yours
> and if it is possible to fix it on your end then I would really love to
> make the a sensible user interface to query the status. If we are going
> to change the semantic of the exported flag again then we risk yet
> another breakage.
> 
> Therefore I am asking whether changing your particular usecase to a new
> interface is possible because that would allow to have a longerm
> sensible user interface rather than another kludge which still doesn't
> cover all the usecases (e.g. there is no way to reliably query the
> madvise status after your patch).

Btw. this is essentially the same kind of problem as
http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
where the conclusion was to come up with a saner interface rather than
mimic the previous one.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-11-15  9:02                                                           ` Michal Hocko
  2018-11-15  9:22                                                             ` Michal Hocko
@ 2018-11-19 22:05                                                             ` David Rientjes
  2018-11-20  7:48                                                               ` Michal Hocko
  1 sibling, 1 reply; 41+ messages in thread
From: David Rientjes @ 2018-11-19 22:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Thu, 15 Nov 2018, Michal Hocko wrote:

> > The userspace had a single way to determine if thp had been disabled for a 
> > specific vma and that was broken with your commit.  We have since fixed 
> > it.  Modifying our software stack to start looking for some field 
> > somewhere else will not help anybody else that this has affected or will 
> > affect.  I'm interested in not breaking userspace, not trying a wait and 
> > see approach to see if anybody else complains once we start looking for 
> > some other field.  The risk outweighs the reward, it already broke us, and 
> > I'd prefer not to even open the possibility of breaking anybody else.
> 
> I very much agree on "do not break userspace" part but this is kind of
> gray area. VMA flags are a deep internal implementation detail and
> nobody should really depend on it for anything important. The original
> motivation for introducing it was CRIU where it is kind of
> understandable. I would argue they should find a different way but it is
> just too late for them.
> 
> For this particular case there was no other bug report except for yours
> and if it is possible to fix it on your end then I would really love to
> make the a sensible user interface to query the status. If we are going
> to change the semantic of the exported flag again then we risk yet
> another breakage.
> 
> Therefore I am asking whether changing your particular usecase to a new
> interface is possible because that would allow to have a longerm
> sensible user interface rather than another kludge which still doesn't
> cover all the usecases (e.g. there is no way to reliably query the
> madvise status after your patch).
> 

Providing another interface is great, I have no objection other than 
emitting another line for every vma on the system for smaps is probably 
overkill for something as rare as PR_SET_THP_DISABLE.

That said, I think the current handling of the "nh" flag being emitted in 
smaps is logical and ensures no further userspace breakage.  If that is to 
be removed, I consider it an unnecessary risk.  That would raised in code 
review.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc
  2018-11-19 22:05                                                             ` David Rientjes
@ 2018-11-20  7:48                                                               ` Michal Hocko
  0 siblings, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2018-11-20  7:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Vlastimil Babka, Alexey Dobriyan,
	Kirill A. Shutemov, linux-kernel, linux-mm, linux-api

On Mon 19-11-18 14:05:34, David Rientjes wrote:
> On Thu, 15 Nov 2018, Michal Hocko wrote:
> 
> > > The userspace had a single way to determine if thp had been disabled for a 
> > > specific vma and that was broken with your commit.  We have since fixed 
> > > it.  Modifying our software stack to start looking for some field 
> > > somewhere else will not help anybody else that this has affected or will 
> > > affect.  I'm interested in not breaking userspace, not trying a wait and 
> > > see approach to see if anybody else complains once we start looking for 
> > > some other field.  The risk outweighs the reward, it already broke us, and 
> > > I'd prefer not to even open the possibility of breaking anybody else.
> > 
> > I very much agree on "do not break userspace" part but this is kind of
> > gray area. VMA flags are a deep internal implementation detail and
> > nobody should really depend on it for anything important. The original
> > motivation for introducing it was CRIU where it is kind of
> > understandable. I would argue they should find a different way but it is
> > just too late for them.
> > 
> > For this particular case there was no other bug report except for yours
> > and if it is possible to fix it on your end then I would really love to
> > make the a sensible user interface to query the status. If we are going
> > to change the semantic of the exported flag again then we risk yet
> > another breakage.
> > 
> > Therefore I am asking whether changing your particular usecase to a new
> > interface is possible because that would allow to have a longerm
> > sensible user interface rather than another kludge which still doesn't
> > cover all the usecases (e.g. there is no way to reliably query the
> > madvise status after your patch).
> > 
> 
> Providing another interface is great, I have no objection other than 
> emitting another line for every vma on the system for smaps is probably 
> overkill for something as rare as PR_SET_THP_DISABLE.

Let me think about a full patch and see how it looks like.

> 
> That said, I think the current handling of the "nh" flag being emitted in 
> smaps is logical and ensures no further userspace breakage.

I have already expressed a concern that there is no way to query for
MADV_NOHUGEPAGE if we overload the flag. So this is not a riskfree
option.

> If that is to 
> be removed, I consider it an unnecessary risk.  That would raised in code 
> review.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2018-11-20  7:48 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-24 17:55 [patch] mm, thp: always specify ineligible vmas as nh in smaps David Rientjes
2018-09-24 18:25 ` Vlastimil Babka
2018-09-24 19:17   ` David Rientjes
2018-09-24 19:30     ` [patch v2] " David Rientjes
2018-09-24 19:56       ` Michal Hocko
2018-09-24 20:02         ` Michal Hocko
2018-09-24 20:43           ` Vlastimil Babka
2018-09-25  5:50             ` Michal Hocko
2018-09-25 19:52             ` David Rientjes
2018-09-25 20:29               ` Michal Hocko
2018-09-25 21:45                 ` David Rientjes
2018-09-25 22:04                   ` Andrew Morton
2018-09-26  0:55                     ` David Rientjes
2018-09-26  6:06                     ` Michal Hocko
2018-10-02 11:28                       ` [RFC PATCH] mm, proc: report PR_SET_THP_DISABLE in proc Michal Hocko
2018-10-02 20:29                         ` David Rientjes
2018-10-03  7:36                           ` Michal Hocko
2018-10-03 22:51                             ` David Rientjes
2018-10-04  5:58                               ` Michal Hocko
2018-10-04  9:15                                 ` David Rientjes
2018-10-04  9:46                                   ` Michal Hocko
2018-10-04 18:34                                     ` David Rientjes
2018-10-09  8:33                                       ` Michal Hocko
2018-10-15 15:03                                         ` Michal Hocko
2018-10-15 22:25                                           ` David Rientjes
2018-10-16 10:48                                             ` Michal Hocko
2018-10-16 21:24                                               ` David Rientjes
2018-10-17  7:05                                                 ` Michal Hocko
2018-10-17 19:59                                                   ` David Rientjes
2018-10-18  7:00                                                     ` Michal Hocko
2018-11-14 13:23                                                       ` Michal Hocko
2018-11-14 21:41                                                         ` David Rientjes
2018-11-15  9:02                                                           ` Michal Hocko
2018-11-15  9:22                                                             ` Michal Hocko
2018-11-19 22:05                                                             ` David Rientjes
2018-11-20  7:48                                                               ` Michal Hocko
2018-10-03 17:33                           ` Mike Rapoport
2018-09-25 21:50               ` [patch v3] mm, thp: always specify disabled vmas as nh in smaps David Rientjes
2018-09-26  6:12                 ` Michal Hocko
2018-09-26  7:17                   ` Michal Hocko
2018-09-26  8:40                 ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).