linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
@ 2022-03-03 21:29 Johannes Weiner
  2022-03-03 21:35 ` Nadav Amit
  2022-03-03 21:47 ` Johannes Weiner
  0 siblings, 2 replies; 10+ messages in thread
From: Johannes Weiner @ 2022-03-03 21:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michal Hocko, Vlastimil Babka, linux-mm, linux-kernel

MADV_DONTNEED historically rejects mlocked ranges, but with
MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
there are valid use cases for depopulating locked ranges as well.

Users mlock memory to protect secrets. There are allocators for secure
buffers that want in-use memory generally mlocked, but cleared and
invalidated memory to give up the physical pages. This could be done
with explicit munlock -> mlock calls on free -> alloc of course, but
that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
splits and re-merges - only to get rid of the backing pages.

Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
okay with on-demand initial population. It seems valid to selectively
free some memory during the lifetime of such a process, without having
to mess with its overall policy.

Why add a separate flag? Isn't this a pretty niche usecase?

- MADV_DONTNEED has been bailing on locked vmas forever. It's at least
  conceivable that someone, somewhere is relying on mlock to protect
  data from perhaps broader invalidation calls. Changing this behavior
  now could lead to quiet data corruption.

- It also clarifies expectations around MADV_FREE and maybe
  MADV_REMOVE. It avoids the situation where one quietly behaves
  different than the others. MADV_FREE_LOCKED can be added later.

- The combination of mlock() and madvise() in the first place is
  probably niche. But where it happens, I'd say that dropping pages
  from a locked region once they don't contain secrets or won't page
  anymore is much saner than relying on mlock to protect memory from
  speculative or errant invalidation calls. It's just that we can't
  change the default behavior because of the two previous points.

Given that, an explicit new flag seems to make the most sense.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/madvise.c                           | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 1567a3294c3d..6c1aa92a92e4 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -75,6 +75,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 5604064df464..12dfa14bc985 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -800,6 +800,13 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 	return 0;
 }
 
+static bool can_madv_dontneed_free(struct vm_area_struct *vma, int behavior)
+{
+	if (behavior == MADV_DONTNEED_LOCKED)
+		return !(vma->vm_flags & (VM_HUGETLB|VM_PFNMAP));
+	return can_madv_lru_vma(vma);
+}
+
 static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  struct vm_area_struct **prev,
 				  unsigned long start, unsigned long end,
@@ -808,7 +815,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = vma;
-	if (!can_madv_lru_vma(vma))
+
+	if (!can_madv_dontneed_free(vma, behavior))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -830,7 +838,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_lru_vma(vma))
+		if (!can_madv_dontneed_free(vma, behavior))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
@@ -850,7 +858,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 		VM_WARN_ON(start >= end);
 	}
 
-	if (behavior == MADV_DONTNEED)
+	if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
 		return madvise_dontneed_single_vma(vma, start, end);
 	else if (behavior == MADV_FREE)
 		return madvise_free_single_vma(vma, start, end);
@@ -988,6 +996,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		return madvise_pageout(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
@@ -1113,6 +1122,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 	case MADV_FREE:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-03 21:29 [PATCH] mm: madvise: MADV_DONTNEED_LOCKED Johannes Weiner
@ 2022-03-03 21:35 ` Nadav Amit
  2022-03-03 22:25   ` Johannes Weiner
  2022-03-03 21:47 ` Johannes Weiner
  1 sibling, 1 reply; 10+ messages in thread
From: Nadav Amit @ 2022-03-03 21:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Linux-MM, linux-kernel



> On Mar 3, 2022, at 1:29 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> MADV_DONTNEED historically rejects mlocked ranges, but with
> MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> there are valid use cases for depopulating locked ranges as well.

...

> @@ -850,7 +858,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> 		VM_WARN_ON(start >= end);
> 	}
> 
> -	if (behavior == MADV_DONTNEED)
> +	if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
> 		return madvise_dontneed_single_vma(vma, start, end);
> 	else if (behavior == MADV_FREE)
> 		return madvise_free_single_vma(vma, start, end);
> @@ -988,6 +996,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
> 		return madvise_pageout(vma, prev, start, end);
> 	case MADV_FREE:
> 	case MADV_DONTNEED:
> +	case MADV_DONTNEED_LOCKED:
> 		return madvise_dontneed_free(vma, prev, start, end, behavior);
> 	case MADV_POPULATE_READ:
> 	case MADV_POPULATE_WRITE:
> @@ -1113,6 +1122,7 @@ madvise_behavior_valid(int behavior)
> 	case MADV_REMOVE:
> 	case MADV_WILLNEED:
> 	case MADV_DONTNEED:
> +	case MADV_DONTNEED_LOCKED:
> 	case MADV_FREE:
> 	case MADV_COLD:
> 	case MADV_PAGEOUT:

Don’t you want to change madvise_need_mmap_write() as well and add
MADV_DONTNEED_LOCKED there too?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-03 21:29 [PATCH] mm: madvise: MADV_DONTNEED_LOCKED Johannes Weiner
  2022-03-03 21:35 ` Nadav Amit
@ 2022-03-03 21:47 ` Johannes Weiner
  2022-03-04  9:12   ` Michal Hocko
  2022-03-04 13:08   ` David Hildenbrand
  1 sibling, 2 replies; 10+ messages in thread
From: Johannes Weiner @ 2022-03-03 21:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michal Hocko, Vlastimil Babka, linux-mm, linux-kernel

On Thu, Mar 03, 2022 at 04:29:56PM -0500, Johannes Weiner wrote:
> MADV_DONTNEED historically rejects mlocked ranges, but with
> MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> there are valid use cases for depopulating locked ranges as well.
> 
> Users mlock memory to protect secrets. There are allocators for secure
> buffers that want in-use memory generally mlocked, but cleared and
> invalidated memory to give up the physical pages. This could be done
> with explicit munlock -> mlock calls on free -> alloc of course, but
> that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
> splits and re-merges - only to get rid of the backing pages.
> 
> Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> okay with on-demand initial population. It seems valid to selectively
> free some memory during the lifetime of such a process, without having
> to mess with its overall policy.
> 
> Why add a separate flag? Isn't this a pretty niche usecase?
> 
> - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
>   conceivable that someone, somewhere is relying on mlock to protect
>   data from perhaps broader invalidation calls. Changing this behavior
>   now could lead to quiet data corruption.
> 
> - It also clarifies expectations around MADV_FREE and maybe
>   MADV_REMOVE. It avoids the situation where one quietly behaves
>   different than the others. MADV_FREE_LOCKED can be added later.
> 
> - The combination of mlock() and madvise() in the first place is
>   probably niche. But where it happens, I'd say that dropping pages
>   from a locked region once they don't contain secrets or won't page
>   anymore is much saner than relying on mlock to protect memory from
>   speculative or errant invalidation calls. It's just that we can't
>   change the default behavior because of the two previous points.
> 
> Given that, an explicit new flag seems to make the most sense.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Just for context, I found this discussion back from 2018:

https://lkml.iu.edu/hypermail/linux/kernel/1806.1/00483.html

It seems to me that the usecase wasn't really in question, but people
weren't sure about the API, and then Jason found a workaround before
the discussion really concluded. I was asked internally about this
feature, so I'm submitting another patch in this direction, but with
more thoughts on why I chose to go with a new flag. Hopefully we can
work it out this time around :-)

Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-03 21:35 ` Nadav Amit
@ 2022-03-03 22:25   ` Johannes Weiner
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Weiner @ 2022-03-03 22:25 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Linux-MM, linux-kernel

On Thu, Mar 03, 2022 at 01:35:56PM -0800, Nadav Amit wrote:
> 
> 
> > On Mar 3, 2022, at 1:29 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > 
> > MADV_DONTNEED historically rejects mlocked ranges, but with
> > MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> > there are valid use cases for depopulating locked ranges as well.
> 
> ...
> 
> > @@ -850,7 +858,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> > 		VM_WARN_ON(start >= end);
> > 	}
> > 
> > -	if (behavior == MADV_DONTNEED)
> > +	if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
> > 		return madvise_dontneed_single_vma(vma, start, end);
> > 	else if (behavior == MADV_FREE)
> > 		return madvise_free_single_vma(vma, start, end);
> > @@ -988,6 +996,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
> > 		return madvise_pageout(vma, prev, start, end);
> > 	case MADV_FREE:
> > 	case MADV_DONTNEED:
> > +	case MADV_DONTNEED_LOCKED:
> > 		return madvise_dontneed_free(vma, prev, start, end, behavior);
> > 	case MADV_POPULATE_READ:
> > 	case MADV_POPULATE_WRITE:
> > @@ -1113,6 +1122,7 @@ madvise_behavior_valid(int behavior)
> > 	case MADV_REMOVE:
> > 	case MADV_WILLNEED:
> > 	case MADV_DONTNEED:
> > +	case MADV_DONTNEED_LOCKED:
> > 	case MADV_FREE:
> > 	case MADV_COLD:
> > 	case MADV_PAGEOUT:
> 
> Don’t you want to change madvise_need_mmap_write() as well and add
> MADV_DONTNEED_LOCKED there too?

You're absolutely right, thanks Nadav! It'd be fine, but more
expensive than necessary. Here is the fixlet:

diff --git a/mm/madvise.c b/mm/madvise.c
index 12dfa14bc985..7dbfcd6c955a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -52,6 +52,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_FREE:

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-03 21:47 ` Johannes Weiner
@ 2022-03-04  9:12   ` Michal Hocko
  2022-03-04 13:08   ` David Hildenbrand
  1 sibling, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2022-03-04  9:12 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Andrew Morton, Vlastimil Babka, linux-mm, linux-kernel

[please CC linux-api if you are going to repost with the fix suggested
 by Nadav]

On Thu 03-03-22 16:47:34, Johannes Weiner wrote:
> On Thu, Mar 03, 2022 at 04:29:56PM -0500, Johannes Weiner wrote:
> > MADV_DONTNEED historically rejects mlocked ranges, but with
> > MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> > there are valid use cases for depopulating locked ranges as well.
> > 
> > Users mlock memory to protect secrets. There are allocators for secure
> > buffers that want in-use memory generally mlocked, but cleared and
> > invalidated memory to give up the physical pages. This could be done
> > with explicit munlock -> mlock calls on free -> alloc of course, but
> > that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
> > splits and re-merges - only to get rid of the backing pages.
> > 
> > Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> > okay with on-demand initial population. It seems valid to selectively
> > free some memory during the lifetime of such a process, without having
> > to mess with its overall policy.
> > 
> > Why add a separate flag? Isn't this a pretty niche usecase?
> > 
> > - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
> >   conceivable that someone, somewhere is relying on mlock to protect
> >   data from perhaps broader invalidation calls. Changing this behavior
> >   now could lead to quiet data corruption.
> > 
> > - It also clarifies expectations around MADV_FREE and maybe
> >   MADV_REMOVE. It avoids the situation where one quietly behaves
> >   different than the others. MADV_FREE_LOCKED can be added later.
> > 
> > - The combination of mlock() and madvise() in the first place is
> >   probably niche. But where it happens, I'd say that dropping pages
> >   from a locked region once they don't contain secrets or won't page
> >   anymore is much saner than relying on mlock to protect memory from
> >   speculative or errant invalidation calls. It's just that we can't
> >   change the default behavior because of the two previous points.
> > 
> > Given that, an explicit new flag seems to make the most sense.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Just for context, I found this discussion back from 2018:
> 
> https://lkml.iu.edu/hypermail/linux/kernel/1806.1/00483.html
> 
> It seems to me that the usecase wasn't really in question, but people
> weren't sure about the API, and then Jason found a workaround before
> the discussion really concluded. I was asked internally about this
> feature, so I'm submitting another patch in this direction, but with
> more thoughts on why I chose to go with a new flag. Hopefully we can
> work it out this time around :-)

Thanks for the link. The topic sounded familiar but I couldn't really
remember any details anymore. Now I do remember that I wasn't happy
about special casing MLOCK_ONFAULT. A dedicated madvise operation
is definitely safer and I am OK with that. Presented usecases make sense
to me as well.

Btw. I have a recollection that Mike is working on MADV_DONTNEED support
for hugetlb pages. I do not know the current state of that work. Not
that it would make nay impact on your new flag but some minor changes
might be needed.

Anyway, after the madvise_need_mmap_write is addressed, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-03 21:47 ` Johannes Weiner
  2022-03-04  9:12   ` Michal Hocko
@ 2022-03-04 13:08   ` David Hildenbrand
  1 sibling, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2022-03-04 13:08 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Michal Hocko, Vlastimil Babka, linux-mm, linux-kernel, dgilbert

On 03.03.22 22:47, Johannes Weiner wrote:
> On Thu, Mar 03, 2022 at 04:29:56PM -0500, Johannes Weiner wrote:
>> MADV_DONTNEED historically rejects mlocked ranges, but with
>> MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
>> there are valid use cases for depopulating locked ranges as well.

Indeed, there are. I'd have use for that in QEMU for virtio-mem (which
uses MADV_POPULATE_WRITE+MADV_DONTNEED to dynamically allocate/discard
memory in a sparse memory mapping to be used by the VM, currently
doesn't support mlock for obvious reasons).

Further, QEMU postcopy live migration handling requires an munlockall()
before registering the uffd handler and discarding all RAM via
MADV_DONTNEED. Once postcopy finished, we have to mlock() again. I
didn't check if there would be more stopping uffd from working on a
mlocked region, but this would be one piece of the puzzle I guess.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-04 17:19 Johannes Weiner
  2022-03-04 18:45 ` Mike Kravetz
  2022-03-04 19:26 ` Shakeel Butt
@ 2022-03-08 15:31 ` Vlastimil Babka
  2 siblings, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2022-03-08 15:31 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Michal Hocko, Nadav Amit, David Hildenbrand, dgilbert,
	Mike Kravetz, linux-mm, linux-api, linux-kernel

On 3/4/22 18:19, Johannes Weiner wrote:
> MADV_DONTNEED historically rejects mlocked ranges, but with
> MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> there are valid use cases for depopulating locked ranges as well.
> 
> Users mlock memory to protect secrets. There are allocators for secure
> buffers that want in-use memory generally mlocked, but cleared and
> invalidated memory to give up the physical pages. This could be done
> with explicit munlock -> mlock calls on free -> alloc of course, but
> that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
> splits and re-merges - only to get rid of the backing pages.
> 
> Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> okay with on-demand initial population. It seems valid to selectively
> free some memory during the lifetime of such a process, without having
> to mess with its overall policy.
> 
> Why add a separate flag? Isn't this a pretty niche usecase?
> 
> - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
>   conceivable that someone, somewhere is relying on mlock to protect
>   data from perhaps broader invalidation calls. Changing this behavior
>   now could lead to quiet data corruption.
> 
> - It also clarifies expectations around MADV_FREE and maybe
>   MADV_REMOVE. It avoids the situation where one quietly behaves
>   different than the others. MADV_FREE_LOCKED can be added later.

Looks like the parameter is not a bitmask, so it makes sense to have
MADV_FREE_LOCKED instead of a generic flag that combines with one of those.

> - The combination of mlock() and madvise() in the first place is
>   probably niche. But where it happens, I'd say that dropping pages
>   from a locked region once they don't contain secrets or won't page
>   anymore is much saner than relying on mlock to protect memory from
>   speculative or errant invalidation calls. It's just that we can't
>   change the default behavior because of the two previous points.
> 
> Given that, an explicit new flag seems to make the most sense.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> Acked-by: Michal Hocko <mhocko@suse.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-04 17:19 Johannes Weiner
  2022-03-04 18:45 ` Mike Kravetz
@ 2022-03-04 19:26 ` Shakeel Butt
  2022-03-08 15:31 ` Vlastimil Babka
  2 siblings, 0 replies; 10+ messages in thread
From: Shakeel Butt @ 2022-03-04 19:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Vlastimil Babka, Nadav Amit,
	David Hildenbrand, dgilbert, Mike Kravetz, linux-mm, linux-api,
	linux-kernel

On Fri, Mar 04, 2022 at 12:19:12PM -0500, Johannes Weiner wrote:
> MADV_DONTNEED historically rejects mlocked ranges, but with
> MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> there are valid use cases for depopulating locked ranges as well.

> Users mlock memory to protect secrets. There are allocators for secure
> buffers that want in-use memory generally mlocked, but cleared and
> invalidated memory to give up the physical pages. This could be done
> with explicit munlock -> mlock calls on free -> alloc of course, but
> that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
> splits and re-merges - only to get rid of the backing pages.

> Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> okay with on-demand initial population. It seems valid to selectively
> free some memory during the lifetime of such a process, without having
> to mess with its overall policy.

> Why add a separate flag? Isn't this a pretty niche usecase?

> - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
>    conceivable that someone, somewhere is relying on mlock to protect
>    data from perhaps broader invalidation calls. Changing this behavior
>    now could lead to quiet data corruption.

> - It also clarifies expectations around MADV_FREE and maybe
>    MADV_REMOVE. It avoids the situation where one quietly behaves
>    different than the others. MADV_FREE_LOCKED can be added later.

> - The combination of mlock() and madvise() in the first place is
>    probably niche. But where it happens, I'd say that dropping pages
>    from a locked region once they don't contain secrets or won't page
>    anymore is much saner than relying on mlock to protect memory from
>    speculative or errant invalidation calls. It's just that we can't
>    change the default behavior because of the two previous points.

> Given that, an explicit new flag seems to make the most sense.

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
  2022-03-04 17:19 Johannes Weiner
@ 2022-03-04 18:45 ` Mike Kravetz
  2022-03-04 19:26 ` Shakeel Butt
  2022-03-08 15:31 ` Vlastimil Babka
  2 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2022-03-04 18:45 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Michal Hocko, Vlastimil Babka, Nadav Amit, David Hildenbrand,
	dgilbert, linux-mm, linux-api, linux-kernel

On 3/4/22 09:19, Johannes Weiner wrote:
> MADV_DONTNEED historically rejects mlocked ranges, but with
> MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
> there are valid use cases for depopulating locked ranges as well.
> 
> Users mlock memory to protect secrets. There are allocators for secure
> buffers that want in-use memory generally mlocked, but cleared and
> invalidated memory to give up the physical pages. This could be done
> with explicit munlock -> mlock calls on free -> alloc of course, but
> that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
> splits and re-merges - only to get rid of the backing pages.
> 
> Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> okay with on-demand initial population. It seems valid to selectively
> free some memory during the lifetime of such a process, without having
> to mess with its overall policy.
> 
> Why add a separate flag? Isn't this a pretty niche usecase?
> 
> - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
>   conceivable that someone, somewhere is relying on mlock to protect
>   data from perhaps broader invalidation calls. Changing this behavior
>   now could lead to quiet data corruption.
> 
> - It also clarifies expectations around MADV_FREE and maybe
>   MADV_REMOVE. It avoids the situation where one quietly behaves
>   different than the others. MADV_FREE_LOCKED can be added later.
> 
> - The combination of mlock() and madvise() in the first place is
>   probably niche. But where it happens, I'd say that dropping pages
>   from a locked region once they don't contain secrets or won't page
>   anymore is much saner than relying on mlock to protect memory from
>   speculative or errant invalidation calls. It's just that we can't
>   change the default behavior because of the two previous points.
> 
> Given that, an explicit new flag seems to make the most sense.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/uapi/asm-generic/mman-common.h |  2 ++
>  mm/madvise.c                           | 24 ++++++++++++++----------
>  2 files changed, 16 insertions(+), 10 deletions(-)
> 
> v2:
> - mmap_sem for read is enough for DONTNEED_LOCKED, thanks Nadav
> - rebased on top of Mike's hugetlb DONTNEED patch in -mm

Thanks for rebasing on top of recent changes.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Looks like we both will be making madvise man page changes soon.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
@ 2022-03-04 17:19 Johannes Weiner
  2022-03-04 18:45 ` Mike Kravetz
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Johannes Weiner @ 2022-03-04 17:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Vlastimil Babka, Nadav Amit, David Hildenbrand,
	dgilbert, Mike Kravetz, linux-mm, linux-api, linux-kernel

MADV_DONTNEED historically rejects mlocked ranges, but with
MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
there are valid use cases for depopulating locked ranges as well.

Users mlock memory to protect secrets. There are allocators for secure
buffers that want in-use memory generally mlocked, but cleared and
invalidated memory to give up the physical pages. This could be done
with explicit munlock -> mlock calls on free -> alloc of course, but
that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
splits and re-merges - only to get rid of the backing pages.

Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
okay with on-demand initial population. It seems valid to selectively
free some memory during the lifetime of such a process, without having
to mess with its overall policy.

Why add a separate flag? Isn't this a pretty niche usecase?

- MADV_DONTNEED has been bailing on locked vmas forever. It's at least
  conceivable that someone, somewhere is relying on mlock to protect
  data from perhaps broader invalidation calls. Changing this behavior
  now could lead to quiet data corruption.

- It also clarifies expectations around MADV_FREE and maybe
  MADV_REMOVE. It avoids the situation where one quietly behaves
  different than the others. MADV_FREE_LOCKED can be added later.

- The combination of mlock() and madvise() in the first place is
  probably niche. But where it happens, I'd say that dropping pages
  from a locked region once they don't contain secrets or won't page
  anymore is much saner than relying on mlock to protect memory from
  speculative or errant invalidation calls. It's just that we can't
  change the default behavior because of the two previous points.

Given that, an explicit new flag seems to make the most sense.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/madvise.c                           | 24 ++++++++++++++----------
 2 files changed, 16 insertions(+), 10 deletions(-)

v2:
- mmap_sem for read is enough for DONTNEED_LOCKED, thanks Nadav
- rebased on top of Mike's hugetlb DONTNEED patch in -mm

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 1567a3294c3d..6c1aa92a92e4 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -75,6 +75,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index e4ddd00878b5..5b6d796e55de 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -52,6 +52,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_FREE:
@@ -502,14 +503,9 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
 	tlb_end_vma(tlb, vma);
 }
 
-static inline bool can_madv_lru_non_huge_vma(struct vm_area_struct *vma)
-{
-	return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP));
-}
-
 static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
-	return can_madv_lru_non_huge_vma(vma) && !is_vm_hugetlb_page(vma);
+	return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP|VM_HUGETLB));
 }
 
 static long madvise_cold(struct vm_area_struct *vma,
@@ -787,10 +783,16 @@ static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
 					    unsigned long *end,
 					    int behavior)
 {
-	if (!is_vm_hugetlb_page(vma))
-		return can_madv_lru_non_huge_vma(vma);
+	if (!is_vm_hugetlb_page(vma)) {
+		unsigned int forbidden = VM_PFNMAP;
+
+		if (behavior != MADV_DONTNEED_LOCKED)
+			forbidden |= VM_LOCKED;
+
+		return !(vma->vm_flags & forbidden);
+	}
 
-	if (behavior != MADV_DONTNEED)
+	if (behavior != MADV_DONTNEED && behavior != MADV_DONTNEED_LOCKED)
 		return false;
 	if (start & ~huge_page_mask(hstate_vma(vma)))
 		return false;
@@ -854,7 +856,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 		VM_WARN_ON(start >= end);
 	}
 
-	if (behavior == MADV_DONTNEED)
+	if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
 		return madvise_dontneed_single_vma(vma, start, end);
 	else if (behavior == MADV_FREE)
 		return madvise_free_single_vma(vma, start, end);
@@ -993,6 +995,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		return madvise_pageout(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
@@ -1123,6 +1126,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 	case MADV_FREE:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-03-08 15:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-03 21:29 [PATCH] mm: madvise: MADV_DONTNEED_LOCKED Johannes Weiner
2022-03-03 21:35 ` Nadav Amit
2022-03-03 22:25   ` Johannes Weiner
2022-03-03 21:47 ` Johannes Weiner
2022-03-04  9:12   ` Michal Hocko
2022-03-04 13:08   ` David Hildenbrand
2022-03-04 17:19 Johannes Weiner
2022-03-04 18:45 ` Mike Kravetz
2022-03-04 19:26 ` Shakeel Butt
2022-03-08 15:31 ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).