linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
@ 2022-03-04  4:36 Hugh Dickins
  2022-03-04 18:06 ` Oleg Nesterov
  2022-03-04 18:49 ` Liam Howlett
  0 siblings, 2 replies; 15+ messages in thread
From: Hugh Dickins @ 2022-03-04  4:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Oleg Nesterov, Liam Howlett, linux-kernel, linux-mm

v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
introduced vma_merge() to mbind_range(); but unlike madvise, mlock and
mprotect, it put a "continue" to next vma where its precedents go to
update flags on current vma before advancing: that left vma with the
wrong setting in the infamous vma_merge() case 8.

v3.10 commit 1444f92c8498 ("mm: merging memory blocks resets mempolicy")
tried to fix that in vma_adjust(), without fully understanding the issue.

v3.11 commit 3964acd0dbec ("mm: mempolicy: fix mbind_range() &&
vma_adjust() interaction") reverted that, and went about the fix in the
right way, but chose to optimize out an unnecessary mpol_dup() with a
prior mpol_equal() test.  But on tmpfs, that also pessimized out the
vital call to its ->set_policy(), leaving the new mbind unenforced.

Just delete that optimization now (though it could be made conditional
on vma not having a set_policy).  Also remove the "next" variable:
it turned out to be blameless, but also pointless.

Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/mempolicy.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -786,7 +786,6 @@ static int vma_replace_policy(struct vm_area_struct *vma,
 static int mbind_range(struct mm_struct *mm, unsigned long start,
 		       unsigned long end, struct mempolicy *new_pol)
 {
-	struct vm_area_struct *next;
 	struct vm_area_struct *prev;
 	struct vm_area_struct *vma;
 	int err = 0;
@@ -801,8 +800,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 	if (start > vma->vm_start)
 		prev = vma;
 
-	for (; vma && vma->vm_start < end; prev = vma, vma = next) {
-		next = vma->vm_next;
+	for (; vma && vma->vm_start < end; prev = vma, vma = vma->vm_next) {
 		vmstart = max(start, vma->vm_start);
 		vmend   = min(end, vma->vm_end);
 
@@ -817,10 +815,6 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
 				 anon_vma_name(vma));
 		if (prev) {
 			vma = prev;
-			next = vma->vm_next;
-			if (mpol_equal(vma_policy(vma), new_pol))
-				continue;
-			/* vma_merge() joined vma && vma->next, case 8 */
 			goto replace;
 		}
 		if (vma->vm_start != vmstart) {


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-04  4:36 [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge() Hugh Dickins
@ 2022-03-04 18:06 ` Oleg Nesterov
  2022-03-04 22:33   ` Hugh Dickins
  2022-03-04 18:49 ` Liam Howlett
  1 sibling, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2022-03-04 18:06 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Liam Howlett, linux-kernel, linux-mm

On 03/03, Hugh Dickins wrote:
>
> Just delete that optimization now (though it could be made conditional
> on vma not having a set_policy).  Also remove the "next" variable:
> it turned out to be blameless, but also pointless.
>
> Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")

I can't believe I ever looked at this code ;)

Acked-by: Oleg Nesterov <oleg@redhat.com>


offtopic question... can't vma_replace_policy() check vm_ops && vm_ops->set_policy
at the start, before mpol_dup() ?

Oleg.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-04  4:36 [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge() Hugh Dickins
  2022-03-04 18:06 ` Oleg Nesterov
@ 2022-03-04 18:49 ` Liam Howlett
  2022-03-04 19:05   ` Liam Howlett
  1 sibling, 1 reply; 15+ messages in thread
From: Liam Howlett @ 2022-03-04 18:49 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

* Hugh Dickins <hughd@google.com> [220303 23:36]:
> v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
> introduced vma_merge() to mbind_range(); but unlike madvise, mlock and
> mprotect, it put a "continue" to next vma where its precedents go to
> update flags on current vma before advancing: that left vma with the
> wrong setting in the infamous vma_merge() case 8.
> 
> v3.10 commit 1444f92c8498 ("mm: merging memory blocks resets mempolicy")
> tried to fix that in vma_adjust(), without fully understanding the issue.
> 
> v3.11 commit 3964acd0dbec ("mm: mempolicy: fix mbind_range() &&
> vma_adjust() interaction") reverted that, and went about the fix in the
> right way, but chose to optimize out an unnecessary mpol_dup() with a
> prior mpol_equal() test.  But on tmpfs, that also pessimized out the
> vital call to its ->set_policy(), leaving the new mbind unenforced.
> 
> Just delete that optimization now (though it could be made conditional
> on vma not having a set_policy).  Also remove the "next" variable:
> it turned out to be blameless, but also pointless.
> 
> Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> 
>  mm/mempolicy.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -786,7 +786,6 @@ static int vma_replace_policy(struct vm_area_struct *vma,
>  static int mbind_range(struct mm_struct *mm, unsigned long start,
>  		       unsigned long end, struct mempolicy *new_pol)
>  {
> -	struct vm_area_struct *next;
>  	struct vm_area_struct *prev;
>  	struct vm_area_struct *vma;
>  	int err = 0;
> @@ -801,8 +800,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  	if (start > vma->vm_start)
>  		prev = vma;
>  
> -	for (; vma && vma->vm_start < end; prev = vma, vma = next) {
> -		next = vma->vm_next;
> +	for (; vma && vma->vm_start < end; prev = vma, vma = vma->vm_next) {
>  		vmstart = max(start, vma->vm_start);
>  		vmend   = min(end, vma->vm_end);
>  
> @@ -817,10 +815,6 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>  				 anon_vma_name(vma));
>  		if (prev) {
>  			vma = prev;
> -			next = vma->vm_next;
> -			if (mpol_equal(vma_policy(vma), new_pol))
> -				continue;
> -			/* vma_merge() joined vma && vma->next, case 8 */
>  			goto replace;
>  		}
>  		if (vma->vm_start != vmstart) {

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-04 18:49 ` Liam Howlett
@ 2022-03-04 19:05   ` Liam Howlett
  2022-03-04 22:48     ` Hugh Dickins
  0 siblings, 1 reply; 15+ messages in thread
From: Liam Howlett @ 2022-03-04 19:05 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

* Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> * Hugh Dickins <hughd@google.com> [220303 23:36]:
> > v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
> > introduced vma_merge() to mbind_range(); but unlike madvise, mlock and
> > mprotect, it put a "continue" to next vma where its precedents go to
> > update flags on current vma before advancing: that left vma with the
> > wrong setting in the infamous vma_merge() case 8.
> > 
> > v3.10 commit 1444f92c8498 ("mm: merging memory blocks resets mempolicy")
> > tried to fix that in vma_adjust(), without fully understanding the issue.
> > 
> > v3.11 commit 3964acd0dbec ("mm: mempolicy: fix mbind_range() &&
> > vma_adjust() interaction") reverted that, and went about the fix in the
> > right way, but chose to optimize out an unnecessary mpol_dup() with a
> > prior mpol_equal() test.  But on tmpfs, that also pessimized out the
> > vital call to its ->set_policy(), leaving the new mbind unenforced.
> > 

I just thought of something after my initial email

How does the ->set_policy() requirement on tmpfs play out for the
mpol_equal() check earlier in that for loop?


> > Just delete that optimization now (though it could be made conditional
> > on vma not having a set_policy).  Also remove the "next" variable:
> > it turned out to be blameless, but also pointless.
> > 
> > Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > 
> >  mm/mempolicy.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> > 
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -786,7 +786,6 @@ static int vma_replace_policy(struct vm_area_struct *vma,
> >  static int mbind_range(struct mm_struct *mm, unsigned long start,
> >  		       unsigned long end, struct mempolicy *new_pol)
> >  {
> > -	struct vm_area_struct *next;
> >  	struct vm_area_struct *prev;
> >  	struct vm_area_struct *vma;
> >  	int err = 0;
> > @@ -801,8 +800,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
> >  	if (start > vma->vm_start)
> >  		prev = vma;
> >  
> > -	for (; vma && vma->vm_start < end; prev = vma, vma = next) {
> > -		next = vma->vm_next;
> > +	for (; vma && vma->vm_start < end; prev = vma, vma = vma->vm_next) {
> >  		vmstart = max(start, vma->vm_start);
> >  		vmend   = min(end, vma->vm_end);
> >  
> > @@ -817,10 +815,6 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
> >  				 anon_vma_name(vma));
> >  		if (prev) {
> >  			vma = prev;
> > -			next = vma->vm_next;
> > -			if (mpol_equal(vma_policy(vma), new_pol))
> > -				continue;
> > -			/* vma_merge() joined vma && vma->next, case 8 */
> >  			goto replace;
> >  		}
> >  		if (vma->vm_start != vmstart) {
> 
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-04 18:06 ` Oleg Nesterov
@ 2022-03-04 22:33   ` Hugh Dickins
  0 siblings, 0 replies; 15+ messages in thread
From: Hugh Dickins @ 2022-03-04 22:33 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Hugh Dickins, Andrew Morton, Liam Howlett, linux-kernel, linux-mm

On Fri, 4 Mar 2022, Oleg Nesterov wrote:
> On 03/03, Hugh Dickins wrote:
> >
> > Just delete that optimization now (though it could be made conditional
> > on vma not having a set_policy).  Also remove the "next" variable:
> > it turned out to be blameless, but also pointless.
> >
> > Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")
> 
> I can't believe I ever looked at this code ;)

;)

> 
> Acked-by: Oleg Nesterov <oleg@redhat.com>

Thanks.

> 
> offtopic question... can't vma_replace_policy() check vm_ops && vm_ops->set_policy
> at the start, before mpol_dup() ?

You are probably correct, though that is not how I did or would do it.

For me too, it's a long time since I've been in here: I have a large
number of cleanups to mm/mempolicy.c, done in the early days of shmem
huge pages (and leading up to NUMA migration outside of mmap lock),
but never the time to carry them forward; and quite a few other people
have made their own cleanups here too, so hard work to integrate it all.

This particular patch here stood out in my mind as an actual bugfix:
our people found shmem pages being allocated on the wrong node because
of it; last time I tried to work in this area, I did pull this one out
and push it into our tree, before approaching any of the rest of it.

If I look at the last of our trees which had the full set of cleanups,
I see that my vma_replace_policy() had less in, but mbind_range() doing
what you have in mind, checking vm_ops->set_policy in between the
mpol_equal() check at the top of the loop and the vma_merge() below it:
with a comment "If set_policy() is implemented, the vma is nothing more
than a window on to the shared object, and we do not set vm_policy and
we do not split the vma" (then calls the set_policy()).

But I think that depends on other changes in other patches (not
setting vm_policy): now, just as usual, I absolutely do not have
the time to get back into that.  So, thanks for the observation,
but I have to stick with just this bugfix for now (a long now!).

It's good to hear from you, Oleg: stay well.

Hugh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-04 19:05   ` Liam Howlett
@ 2022-03-04 22:48     ` Hugh Dickins
  2022-03-05  2:00       ` Liam Howlett
  0 siblings, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2022-03-04 22:48 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Hugh Dickins, Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

On Fri, 4 Mar 2022, Liam Howlett wrote:
> * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> > * Hugh Dickins <hughd@google.com> [220303 23:36]:
> 
> I just thought of something after my initial email
> 
> How does the ->set_policy() requirement on tmpfs play out for the
> mpol_equal() check earlier in that for loop?

It took me a while to page all this back in (and remind myself of
what is case 8) to answer that question!

The answer is that the mpol_equal() check at the top of the loop is on
an existing, unmodified vma; so it's right to assume that any necessary
set_policy() has already been done.

Whereas the mpol_equal() check being removed in this patch, is being
done on a vma which may have just been extended to cover a greater range:
so although the relevant set_policy() may have already been done on a part
of its range, there is now another part which needs the policy applied.

> > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

Thank you, your review is very welcome (but mainly I Cc'ed to alert
you to how I'm probably stepping on your toes a little here - sorry).

Hugh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-04 22:48     ` Hugh Dickins
@ 2022-03-05  2:00       ` Liam Howlett
  2022-03-05  2:28         ` Hugh Dickins
  0 siblings, 1 reply; 15+ messages in thread
From: Liam Howlett @ 2022-03-05  2:00 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

* Hugh Dickins <hughd@google.com> [220304 17:48]:
> On Fri, 4 Mar 2022, Liam Howlett wrote:
> > * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> > > * Hugh Dickins <hughd@google.com> [220303 23:36]:
> > 
> > I just thought of something after my initial email
> > 
> > How does the ->set_policy() requirement on tmpfs play out for the
> > mpol_equal() check earlier in that for loop?
> 
> It took me a while to page all this back in (and remind myself of
> what is case 8) to answer that question!
> 
> The answer is that the mpol_equal() check at the top of the loop is on
> an existing, unmodified vma; so it's right to assume that any necessary
> set_policy() has already been done.
> 
> Whereas the mpol_equal() check being removed in this patch, is being
> done on a vma which may have just been extended to cover a greater range:
> so although the relevant set_policy() may have already been done on a part
> of its range, there is now another part which needs the policy applied.

Doesn't the policy get checked during vma_merge()?  Specifically the
mpol_equal(policy, vma_policy(next)) check?

> 
> > > Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> 
> Thank you, your review is very welcome (but mainly I Cc'ed to alert
> you to how I'm probably stepping on your toes a little here - sorry).

Yes, I figured as much and I really appreciate it.  I'm quite confident
I translated this bug to my patch set.

Thanks,
Liam

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-05  2:00       ` Liam Howlett
@ 2022-03-05  2:28         ` Hugh Dickins
  2022-03-08 16:05           ` Liam Howlett
  0 siblings, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2022-03-05  2:28 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Hugh Dickins, Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

On Sat, 5 Mar 2022, Liam Howlett wrote:
> * Hugh Dickins <hughd@google.com> [220304 17:48]:
> > On Fri, 4 Mar 2022, Liam Howlett wrote:
> > > * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> > > > * Hugh Dickins <hughd@google.com> [220303 23:36]:
> > > 
> > > I just thought of something after my initial email
> > > 
> > > How does the ->set_policy() requirement on tmpfs play out for the
> > > mpol_equal() check earlier in that for loop?
> > 
> > It took me a while to page all this back in (and remind myself of
> > what is case 8) to answer that question!
> > 
> > The answer is that the mpol_equal() check at the top of the loop is on
> > an existing, unmodified vma; so it's right to assume that any necessary
> > set_policy() has already been done.
> > 
> > Whereas the mpol_equal() check being removed in this patch, is being
> > done on a vma which may have just been extended to cover a greater range:
> > so although the relevant set_policy() may have already been done on a part
> > of its range, there is now another part which needs the policy applied.
> 
> Doesn't the policy get checked during vma_merge()?  Specifically the
> mpol_equal(policy, vma_policy(next)) check?

Sorry, I'm reduced to the unhelpful reply of "Yes. So?"

If vma_merge() finds that vma's new_pol allows it to be merged with prev,
that still requires mbind_range() (or its call to vma_replace_policy())
to set_policy() on prev (now assigned to vma), to apply that new_pol to
the extension of prev - vma_merge() would have checked mpol_equal(),
but would not have done the set_policy().

Hugh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-05  2:28         ` Hugh Dickins
@ 2022-03-08 16:05           ` Liam Howlett
  2022-03-08 21:32             ` Hugh Dickins
  0 siblings, 1 reply; 15+ messages in thread
From: Liam Howlett @ 2022-03-08 16:05 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

* Hugh Dickins <hughd@google.com> [220304 21:29]:
> On Sat, 5 Mar 2022, Liam Howlett wrote:
> > * Hugh Dickins <hughd@google.com> [220304 17:48]:
> > > On Fri, 4 Mar 2022, Liam Howlett wrote:
> > > > * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> > > > > * Hugh Dickins <hughd@google.com> [220303 23:36]:
> > > > 
> > > > I just thought of something after my initial email
> > > > 
> > > > How does the ->set_policy() requirement on tmpfs play out for the
> > > > mpol_equal() check earlier in that for loop?
> > > 
> > > It took me a while to page all this back in (and remind myself of
> > > what is case 8) to answer that question!
> > > 
> > > The answer is that the mpol_equal() check at the top of the loop is on
> > > an existing, unmodified vma; so it's right to assume that any necessary
> > > set_policy() has already been done.
> > > 
> > > Whereas the mpol_equal() check being removed in this patch, is being
> > > done on a vma which may have just been extended to cover a greater range:
> > > so although the relevant set_policy() may have already been done on a part
> > > of its range, there is now another part which needs the policy applied.
> > 
> > Doesn't the policy get checked during vma_merge()?  Specifically the
> > mpol_equal(policy, vma_policy(next)) check?
> 
> Sorry, I'm reduced to the unhelpful reply of "Yes. So?"
> 
> If vma_merge() finds that vma's new_pol allows it to be merged with prev,
> that still requires mbind_range() (or its call to vma_replace_policy())
> to set_policy() on prev (now assigned to vma), to apply that new_pol to
> the extension of prev - vma_merge() would have checked mpol_equal(),
> but would not have done the set_policy().

I must be missing something.  If mpol_equal() isn't sufficient to ensure
we don't need to set_policy(), then why are the other vma_merge() cases
okay - such as madvise_update_vma() and mlock_fixup()?  Won't the mem
policy change in the same way in these cases?

Thanks,
Liam


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-08 16:05           ` Liam Howlett
@ 2022-03-08 21:32             ` Hugh Dickins
  2022-03-09 12:41               ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2022-03-08 21:32 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Hugh Dickins, Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

On Tue, 8 Mar 2022, Liam Howlett wrote:
> * Hugh Dickins <hughd@google.com> [220304 21:29]:
> > On Sat, 5 Mar 2022, Liam Howlett wrote:
> > > * Hugh Dickins <hughd@google.com> [220304 17:48]:
> > > > On Fri, 4 Mar 2022, Liam Howlett wrote:
> > > > > * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> > > > > > * Hugh Dickins <hughd@google.com> [220303 23:36]:
> > > > > 
> > > > > I just thought of something after my initial email
> > > > > 
> > > > > How does the ->set_policy() requirement on tmpfs play out for the
> > > > > mpol_equal() check earlier in that for loop?
> > > > 
> > > > It took me a while to page all this back in (and remind myself of
> > > > what is case 8) to answer that question!
> > > > 
> > > > The answer is that the mpol_equal() check at the top of the loop is on
> > > > an existing, unmodified vma; so it's right to assume that any necessary
> > > > set_policy() has already been done.
> > > > 
> > > > Whereas the mpol_equal() check being removed in this patch, is being
> > > > done on a vma which may have just been extended to cover a greater range:
> > > > so although the relevant set_policy() may have already been done on a part
> > > > of its range, there is now another part which needs the policy applied.
> > > 
> > > Doesn't the policy get checked during vma_merge()?  Specifically the
> > > mpol_equal(policy, vma_policy(next)) check?
> > 
> > Sorry, I'm reduced to the unhelpful reply of "Yes. So?"
> > 
> > If vma_merge() finds that vma's new_pol allows it to be merged with prev,
> > that still requires mbind_range() (or its call to vma_replace_policy())
> > to set_policy() on prev (now assigned to vma), to apply that new_pol to
> > the extension of prev - vma_merge() would have checked mpol_equal(),
> > but would not have done the set_policy().
> 
> I must be missing something.  If mpol_equal() isn't sufficient to ensure
> we don't need to set_policy(), then why are the other vma_merge() cases
> okay - such as madvise_update_vma() and mlock_fixup()?  Won't the mem
> policy change in the same way in these cases?

mlock provides a good example to compare.

Mlocking pages is the business of mlock(), and mlock_fixup() needs to
attend to mm->locked_vm, and calling something to mark as PageMlocked
those pages already in the area now covered by mlock.  But it doesn't
need to worry about set_policy(), that's not its business, and is
unaffected by mlock changes (though merging of vmas needs mpol_equal()
to check that policy is the same, and merging and splitting of vmas
need to maintain the refcount of the shared policy if any).

Whereas NUMA mempolicy is the business of mbind(), and mbind_range()
needs to attend to vma->vm_policy, and if it's a mapping of something
supporting a shared set_policy(), call that to establish the new range
on the object mapped.  But it doesn't need to worry about mm->locked_vm
or whether pages are Mlocked, that's not its business, and is unaffected
by mbind changes (though merging of vmas needs to check VM_LOCKED among
other flags to check that they are the same before it can merge).

Does that help?

Hugh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-08 21:32             ` Hugh Dickins
@ 2022-03-09 12:41               ` Vlastimil Babka
  2022-03-09 19:10                 ` Liam Howlett
  2022-03-11  8:54                 ` Hugh Dickins
  0 siblings, 2 replies; 15+ messages in thread
From: Vlastimil Babka @ 2022-03-09 12:41 UTC (permalink / raw)
  To: Hugh Dickins, Liam Howlett
  Cc: Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

On 3/8/22 22:32, Hugh Dickins wrote:
> On Tue, 8 Mar 2022, Liam Howlett wrote:
>> * Hugh Dickins <hughd@google.com> [220304 21:29]:
>> > On Sat, 5 Mar 2022, Liam Howlett wrote:
>> > > * Hugh Dickins <hughd@google.com> [220304 17:48]:
>> > > > On Fri, 4 Mar 2022, Liam Howlett wrote:
>> > > > > * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
>> > > > > > * Hugh Dickins <hughd@google.com> [220303 23:36]:
>> > > > > 
>> > > > > I just thought of something after my initial email
>> > > > > 
>> > > > > How does the ->set_policy() requirement on tmpfs play out for the
>> > > > > mpol_equal() check earlier in that for loop?
>> > > > 
>> > > > It took me a while to page all this back in (and remind myself of
>> > > > what is case 8) to answer that question!
>> > > > 
>> > > > The answer is that the mpol_equal() check at the top of the loop is on
>> > > > an existing, unmodified vma; so it's right to assume that any necessary
>> > > > set_policy() has already been done.
>> > > > 
>> > > > Whereas the mpol_equal() check being removed in this patch, is being
>> > > > done on a vma which may have just been extended to cover a greater range:
>> > > > so although the relevant set_policy() may have already been done on a part
>> > > > of its range, there is now another part which needs the policy applied.
>> > > 
>> > > Doesn't the policy get checked during vma_merge()?  Specifically the
>> > > mpol_equal(policy, vma_policy(next)) check?
>> > 
>> > Sorry, I'm reduced to the unhelpful reply of "Yes. So?"
>> > 
>> > If vma_merge() finds that vma's new_pol allows it to be merged with prev,
>> > that still requires mbind_range() (or its call to vma_replace_policy())
>> > to set_policy() on prev (now assigned to vma), to apply that new_pol to
>> > the extension of prev - vma_merge() would have checked mpol_equal(),
>> > but would not have done the set_policy().
>> 
>> I must be missing something.  If mpol_equal() isn't sufficient to ensure
>> we don't need to set_policy(), then why are the other vma_merge() cases
>> okay - such as madvise_update_vma() and mlock_fixup()?  Won't the mem
>> policy change in the same way in these cases?
> 
> mlock provides a good example to compare.
> 
> Mlocking pages is the business of mlock(), and mlock_fixup() needs to
> attend to mm->locked_vm, and calling something to mark as PageMlocked
> those pages already in the area now covered by mlock.  But it doesn't
> need to worry about set_policy(), that's not its business, and is
> unaffected by mlock changes (though merging of vmas needs mpol_equal()
> to check that policy is the same, and merging and splitting of vmas
> need to maintain the refcount of the shared policy if any).
> 
> Whereas NUMA mempolicy is the business of mbind(), and mbind_range()
> needs to attend to vma->vm_policy, and if it's a mapping of something
> supporting a shared set_policy(), call that to establish the new range
> on the object mapped.  But it doesn't need to worry about mm->locked_vm
> or whether pages are Mlocked, that's not its business, and is unaffected
> by mbind changes (though merging of vmas needs to check VM_LOCKED among
> other flags to check that they are the same before it can merge).

So if I understand correctly, we have case 8 of vma_merge():

    AAAA
PPPPNNNNXXXX
becomes
PPPPXXXXXXXX 8

N is vma with some old policy different from new_pol
A is the range where we change to new policy new_pol, which happens to be
the same as existing policy of X
Thus vma_merge() extends vma X to include range A - the vma N
vma_merge() succeeds because it's passed new_pol to do the compatibility
checks (although N still has the previous policy)

Before Hugh's patch we would then realize "oh X already has new_pol, nothing
to do". Note that this AFAICS doesn't affect actual pages migration between
nodes, because that happens outside of mbind_range(). But it causes us to
skip vma_replace_policy(), which causes us to skip vm_ops->set_policy, where
tmpfs does something important (we could maybe argue that Hugh didn't
specify the user visible effects of this exactly enough :) what is "leaving
the new mbind unenforced" - are pages not migrated in this case?).

HTH (if I'm right),
Vlastimil

> Does that help?
> 
> Hugh
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-09 12:41               ` Vlastimil Babka
@ 2022-03-09 19:10                 ` Liam Howlett
  2022-03-11  9:33                   ` Hugh Dickins
  2022-03-11  8:54                 ` Hugh Dickins
  1 sibling, 1 reply; 15+ messages in thread
From: Liam Howlett @ 2022-03-09 19:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

* Vlastimil Babka <vbabka@suse.cz> [220309 07:41]:
> On 3/8/22 22:32, Hugh Dickins wrote:
> > On Tue, 8 Mar 2022, Liam Howlett wrote:
> >> * Hugh Dickins <hughd@google.com> [220304 21:29]:
> >> > On Sat, 5 Mar 2022, Liam Howlett wrote:
> >> > > * Hugh Dickins <hughd@google.com> [220304 17:48]:
> >> > > > On Fri, 4 Mar 2022, Liam Howlett wrote:
> >> > > > > * Liam R. Howlett <Liam.Howlett@Oracle.com> [220304 13:49]:
> >> > > > > > * Hugh Dickins <hughd@google.com> [220303 23:36]:
> >> > > > > 
> >> > > > > I just thought of something after my initial email
> >> > > > > 
> >> > > > > How does the ->set_policy() requirement on tmpfs play out for the
> >> > > > > mpol_equal() check earlier in that for loop?
> >> > > > 
> >> > > > It took me a while to page all this back in (and remind myself of
> >> > > > what is case 8) to answer that question!
> >> > > > 
> >> > > > The answer is that the mpol_equal() check at the top of the loop is on
> >> > > > an existing, unmodified vma; so it's right to assume that any necessary
> >> > > > set_policy() has already been done.
> >> > > > 
> >> > > > Whereas the mpol_equal() check being removed in this patch, is being
> >> > > > done on a vma which may have just been extended to cover a greater range:
> >> > > > so although the relevant set_policy() may have already been done on a part
> >> > > > of its range, there is now another part which needs the policy applied.
> >> > > 
> >> > > Doesn't the policy get checked during vma_merge()?  Specifically the
> >> > > mpol_equal(policy, vma_policy(next)) check?
> >> > 
> >> > Sorry, I'm reduced to the unhelpful reply of "Yes. So?"
> >> > 
> >> > If vma_merge() finds that vma's new_pol allows it to be merged with prev,
> >> > that still requires mbind_range() (or its call to vma_replace_policy())
> >> > to set_policy() on prev (now assigned to vma), to apply that new_pol to
> >> > the extension of prev - vma_merge() would have checked mpol_equal(),
> >> > but would not have done the set_policy().
> >> 
> >> I must be missing something.  If mpol_equal() isn't sufficient to ensure
> >> we don't need to set_policy(), then why are the other vma_merge() cases
> >> okay - such as madvise_update_vma() and mlock_fixup()?  Won't the mem
> >> policy change in the same way in these cases?
> > 
> > mlock provides a good example to compare.
> > 
> > Mlocking pages is the business of mlock(), and mlock_fixup() needs to
> > attend to mm->locked_vm, and calling something to mark as PageMlocked
> > those pages already in the area now covered by mlock.  But it doesn't
> > need to worry about set_policy(), that's not its business, and is
> > unaffected by mlock changes (though merging of vmas needs mpol_equal()
> > to check that policy is the same, and merging and splitting of vmas
> > need to maintain the refcount of the shared policy if any).
> > 
> > Whereas NUMA mempolicy is the business of mbind(), and mbind_range()
> > needs to attend to vma->vm_policy, and if it's a mapping of something
> > supporting a shared set_policy(), call that to establish the new range
> > on the object mapped.  But it doesn't need to worry about mm->locked_vm
> > or whether pages are Mlocked, that's not its business, and is unaffected
> > by mbind changes (though merging of vmas needs to check VM_LOCKED among
> > other flags to check that they are the same before it can merge).
> 
> So if I understand correctly, we have case 8 of vma_merge():
> 
>     AAAA
> PPPPNNNNXXXX
> becomes
> PPPPXXXXXXXX 8
> 
> N is vma with some old policy different from new_pol
> A is the range where we change to new policy new_pol, which happens to be
> the same as existing policy of X
> Thus vma_merge() extends vma X to include range A - the vma N
> vma_merge() succeeds because it's passed new_pol to do the compatibility
> checks (although N still has the previous policy)
> 
> Before Hugh's patch we would then realize "oh X already has new_pol, nothing
> to do". Note that this AFAICS doesn't affect actual pages migration between
> nodes, because that happens outside of mbind_range(). But it causes us to
> skip vma_replace_policy(), which causes us to skip vm_ops->set_policy, where
> tmpfs does something important (we could maybe argue that Hugh didn't
> specify the user visible effects of this exactly enough :) what is "leaving
> the new mbind unenforced" - are pages not migrated in this case?).
> 
> HTH (if I'm right),
> Vlastimil
> 
> > Does that help?
> > 

I think so.  You want to set the mpol the same across the vma because we
are specifically calling mbind, but it isn't necessary because the mpol
is equivalent but not the same mpol?  I guess I have the same question
as Vlastimil brought up - why does tmpfs need this?


Thanks,
Liam



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-09 12:41               ` Vlastimil Babka
  2022-03-09 19:10                 ` Liam Howlett
@ 2022-03-11  8:54                 ` Hugh Dickins
  2022-03-11 12:47                   ` Vlastimil Babka
  1 sibling, 1 reply; 15+ messages in thread
From: Hugh Dickins @ 2022-03-11  8:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, Liam Howlett, Andrew Morton, Oleg Nesterov,
	linux-kernel, linux-mm

On Wed, 9 Mar 2022, Vlastimil Babka wrote:
> On 3/8/22 22:32, Hugh Dickins wrote:
> > On Tue, 8 Mar 2022, Liam Howlett wrote:
> >> 
> >> I must be missing something.  If mpol_equal() isn't sufficient to ensure
> >> we don't need to set_policy(), then why are the other vma_merge() cases
> >> okay - such as madvise_update_vma() and mlock_fixup()?  Won't the mem
> >> policy change in the same way in these cases?
> > 
> > mlock provides a good example to compare.
> > 
> > Mlocking pages is the business of mlock(), and mlock_fixup() needs to
> > attend to mm->locked_vm, and calling something to mark as PageMlocked
> > those pages already in the area now covered by mlock.  But it doesn't
> > need to worry about set_policy(), that's not its business, and is
> > unaffected by mlock changes (though merging of vmas needs mpol_equal()
> > to check that policy is the same, and merging and splitting of vmas
> > need to maintain the refcount of the shared policy if any).
> > 
> > Whereas NUMA mempolicy is the business of mbind(), and mbind_range()
> > needs to attend to vma->vm_policy, and if it's a mapping of something
> > supporting a shared set_policy(), call that to establish the new range
> > on the object mapped.  But it doesn't need to worry about mm->locked_vm
> > or whether pages are Mlocked, that's not its business, and is unaffected
> > by mbind changes (though merging of vmas needs to check VM_LOCKED among
> > other flags to check that they are the same before it can merge).
> 
> So if I understand correctly, we have case 8 of vma_merge():
> 
>     AAAA
> PPPPNNNNXXXX
> becomes
> PPPPXXXXXXXX 8
> 
> N is vma with some old policy different from new_pol
> A is the range where we change to new policy new_pol, which happens to be
> the same as existing policy of X
> Thus vma_merge() extends vma X to include range A - the vma N
> vma_merge() succeeds because it's passed new_pol to do the compatibility
> checks (although N still has the previous policy)

I *think* you have it the wrong way round there: my reading is that
this vma_merge() case 8 was correctly handled before, because in its
case !mpol_equal(vma_policy(vma), new_pol): I think case 8 was being
handled correctly, but the other cases were not.

Or was the comment even correct to reference case 8 especially?

I'm afraid bringing it all back to mind is a bit of an effort: I won't
stake my life on it, perhaps I'm the one who has it the wrong way round.

> 
> Before Hugh's patch we would then realize "oh X already has new_pol, nothing
> to do". Note that this AFAICS doesn't affect actual pages migration between
> nodes, because that happens outside of mbind_range(). But it causes us to
> skip vma_replace_policy(), which causes us to skip vm_ops->set_policy, where
> tmpfs does something important (we could maybe argue that Hugh didn't
> specify the user visible effects of this exactly enough :) what is "leaving
> the new mbind unenforced" - are pages not migrated in this case?).

Went back to check the original (internal) report:
mbind MPOL_BIND on tmpfs can result in allocations on the wrong node.
And it was a genuine practical case, though the finder was kind enough
to distil it down to a minimal sequence (and correctly suggest the fix).

The user visible effect was that the pages got allocated on the local node
(happened to be 0), after the mbind() caller had specifically asked for
them to be allocated on node 1.  There was not any page migration involved
in the case reported: the pages simply got allocated on the wrong node.

And yes, on this patch I should have asked for a
Cc: <stable@vger.kernel.org>

> 
> HTH (if I'm right),
> Vlastimil


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-09 19:10                 ` Liam Howlett
@ 2022-03-11  9:33                   ` Hugh Dickins
  0 siblings, 0 replies; 15+ messages in thread
From: Hugh Dickins @ 2022-03-11  9:33 UTC (permalink / raw)
  To: Liam Howlett
  Cc: Vlastimil Babka, Hugh Dickins, Andrew Morton, Oleg Nesterov,
	linux-kernel, linux-mm

On Wed, 9 Mar 2022, Liam Howlett wrote:
> * Vlastimil Babka <vbabka@suse.cz> [220309 07:41]:
> > On 3/8/22 22:32, Hugh Dickins wrote:
> > 
> > > Does that help?
> 
> I think so.  You want to set the mpol the same across the vma because we
> are specifically calling mbind, but it isn't necessary because the mpol
> is equivalent but not the same mpol?

I did not understand that logic, so cannot confirm it.

The crucial point, as I see it, is that a call to shmem_set_policy() was
needed in many cases, but had been "optimized away" by the mpol_equal test.

> I guess I have the same question
> as Vlastimil brought up - why does tmpfs need this?

One way of answering would be: because tmpfs is used to back SysV SHM,
and "man 2 mbind" has a paragraph on the subject of mbind on a shared
memory region created using the shmget(2) system call.

Or you could look in Documentation/admin-guide/mm/numa_memory_policy.rst,
section on "Shared Policy".  (But it's a surprise to me to see how that
makes no mention of tmpfs itself: it's as if it's just an accident that
this is how NUMA mempolicy is applied to mappings of tmpfs files.)

An important comment from ChangeLog-2.6.7 when mm/mempolicy.c came in:
For shmfs/tmpfs/hugetlbfs shared memory the policy is shared between
all users and remembered even when nobody has memory mapped.

(But although hugetlbfs support was intended in the original RFC to lkml,
it was never implemented in the tree, despite appearances: a patch I did
not send this time, but eventually will, removes the appearance of shared
mempolicy on hugetlbfs.)

Hugh


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge()
  2022-03-11  8:54                 ` Hugh Dickins
@ 2022-03-11 12:47                   ` Vlastimil Babka
  0 siblings, 0 replies; 15+ messages in thread
From: Vlastimil Babka @ 2022-03-11 12:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Liam Howlett, Andrew Morton, Oleg Nesterov, linux-kernel, linux-mm

On 3/11/22 09:54, Hugh Dickins wrote:
> On Wed, 9 Mar 2022, Vlastimil Babka wrote:
>> On 3/8/22 22:32, Hugh Dickins wrote:
>> > On Tue, 8 Mar 2022, Liam Howlett wrote:
>> >> 
>> >> I must be missing something.  If mpol_equal() isn't sufficient to ensure
>> >> we don't need to set_policy(), then why are the other vma_merge() cases
>> >> okay - such as madvise_update_vma() and mlock_fixup()?  Won't the mem
>> >> policy change in the same way in these cases?
>> > 
>> > mlock provides a good example to compare.
>> > 
>> > Mlocking pages is the business of mlock(), and mlock_fixup() needs to
>> > attend to mm->locked_vm, and calling something to mark as PageMlocked
>> > those pages already in the area now covered by mlock.  But it doesn't
>> > need to worry about set_policy(), that's not its business, and is
>> > unaffected by mlock changes (though merging of vmas needs mpol_equal()
>> > to check that policy is the same, and merging and splitting of vmas
>> > need to maintain the refcount of the shared policy if any).
>> > 
>> > Whereas NUMA mempolicy is the business of mbind(), and mbind_range()
>> > needs to attend to vma->vm_policy, and if it's a mapping of something
>> > supporting a shared set_policy(), call that to establish the new range
>> > on the object mapped.  But it doesn't need to worry about mm->locked_vm
>> > or whether pages are Mlocked, that's not its business, and is unaffected
>> > by mbind changes (though merging of vmas needs to check VM_LOCKED among
>> > other flags to check that they are the same before it can merge).
>> 
>> So if I understand correctly, we have case 8 of vma_merge():
>> 
>>     AAAA
>> PPPPNNNNXXXX
>> becomes
>> PPPPXXXXXXXX 8
>> 
>> N is vma with some old policy different from new_pol
>> A is the range where we change to new policy new_pol, which happens to be
>> the same as existing policy of X
>> Thus vma_merge() extends vma X to include range A - the vma N
>> vma_merge() succeeds because it's passed new_pol to do the compatibility
>> checks (although N still has the previous policy)
> 
> I *think* you have it the wrong way round there: my reading is that
> this vma_merge() case 8 was correctly handled before, because in its
> case !mpol_equal(vma_policy(vma), new_pol): I think case 8 was being
> handled correctly, but the other cases were not.
> 
> Or was the comment even correct to reference case 8 especially?

I think it wasn't, but...

> I'm afraid bringing it all back to mind is a bit of an effort: I won't
> stake my life on it, perhaps I'm the one who has it the wrong way round.

... same here.

Importantly I believe your patch is the correct solution.

>> 
>> Before Hugh's patch we would then realize "oh X already has new_pol, nothing
>> to do". Note that this AFAICS doesn't affect actual pages migration between
>> nodes, because that happens outside of mbind_range(). But it causes us to
>> skip vma_replace_policy(), which causes us to skip vm_ops->set_policy, where
>> tmpfs does something important (we could maybe argue that Hugh didn't
>> specify the user visible effects of this exactly enough :) what is "leaving
>> the new mbind unenforced" - are pages not migrated in this case?).
> 
> Went back to check the original (internal) report:
> mbind MPOL_BIND on tmpfs can result in allocations on the wrong node.
> And it was a genuine practical case, though the finder was kind enough
> to distil it down to a minimal sequence (and correctly suggest the fix).
> 
> The user visible effect was that the pages got allocated on the local node
> (happened to be 0), after the mbind() caller had specifically asked for
> them to be allocated on node 1.  There was not any page migration involved
> in the case reported: the pages simply got allocated on the wrong node.

That's useful, thanks.

> And yes, on this patch I should have asked for a
> Cc: <stable@vger.kernel.org>

Agree. Andrew can add it, and also the user visible effects above?

Thanks,
Vlastimil

>> 
>> HTH (if I'm right),
>> Vlastimil



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-03-11 12:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-04  4:36 [PATCH mmotm] mempolicy: mbind_range() set_policy() after vma_merge() Hugh Dickins
2022-03-04 18:06 ` Oleg Nesterov
2022-03-04 22:33   ` Hugh Dickins
2022-03-04 18:49 ` Liam Howlett
2022-03-04 19:05   ` Liam Howlett
2022-03-04 22:48     ` Hugh Dickins
2022-03-05  2:00       ` Liam Howlett
2022-03-05  2:28         ` Hugh Dickins
2022-03-08 16:05           ` Liam Howlett
2022-03-08 21:32             ` Hugh Dickins
2022-03-09 12:41               ` Vlastimil Babka
2022-03-09 19:10                 ` Liam Howlett
2022-03-11  9:33                   ` Hugh Dickins
2022-03-11  8:54                 ` Hugh Dickins
2022-03-11 12:47                   ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).