linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: Allow userland to request that the kernel clear memory on release
@ 2019-04-24 19:14 Matthew Garrett
  2019-04-24 19:28 ` Matthew Wilcox
  2019-04-25 15:32 ` [PATCH] " Christopher Lameter
  0 siblings, 2 replies; 29+ messages in thread
From: Matthew Garrett @ 2019-04-24 19:14 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Matthew Garrett

From: Matthew Garrett <mjg59@google.com>

Applications that hold secrets and wish to avoid them leaking can use
mlock() to prevent the page from being pushed out to swap and
MADV_DONTDUMP to prevent it from being included in core dumps. Applications
can also use atexit() handlers to overwrite secrets on application exit.
However, if an attacker can reboot the system into another OS, they can
dump the contents of RAM and extract secrets. We can avoid this by setting
CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
firmware wipe the contents of RAM before booting another OS, but this means
rebooting takes a *long* time - the expected behaviour is for a clean
shutdown to remove the request after scrubbing secrets from RAM in order to
avoid this.

Unfortunately, if an application exits uncleanly, its secrets may still be
present in RAM. This can't be easily fixed in userland (eg, if the OOM
killer decides to kill a process holding secrets, we're not going to be able
to avoid that), so this patch adds a new flag to madvise() to allow userland
to request that the kernel clear the covered pages whenever the page
reference count hits zero. Since vm_flags is already full on 32-bit, it
will only work on 64-bit systems.

Signed-off-by: Matthew Garrett <mjg59@google.com>
---

I know nothing about mm, so this is doubtless broken in any number of
ways - please let me know how!

 include/linux/mm.h                     |  6 ++++
 include/linux/page-flags.h             |  2 ++
 include/trace/events/mmflags.h         |  4 +--
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/hugetlb.c                           |  2 ++
 mm/madvise.c                           | 39 ++++++++++++++++++++++++++
 mm/mempolicy.c                         |  2 ++
 mm/page_alloc.c                        |  6 ++++
 8 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..64bdab679275 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+
+#define VM_WIPEONRELEASE BIT(37)       /* Clear pages when releasing them */
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP	VM_NONE
 #endif
 
+#ifndef VM_WIPEONRELEASE
+# define VM_WIPEONRELEASE VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9f8712a4b1a5..c52ea8a89c5d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -118,6 +118,7 @@ enum pageflags {
 	PG_reclaim,		/* To be reclaimed asap */
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable"  */
+	PG_wipeonrelease,
 #ifdef CONFIG_MMU
 	PG_mlocked,		/* Page is vma mlocked */
 #endif
@@ -316,6 +317,7 @@ PAGEFLAG(Referenced, referenced, PF_HEAD)
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+PAGEFLAG(WipeOnRelease, wipeonrelease, PF_HEAD) __CLEARPAGEFLAG(WipeOnRelease, wipeonrelease, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d43777e..4e5116a95b82 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -100,13 +100,13 @@
 	{1UL << PG_mappedtodisk,	"mappedtodisk"	},		\
 	{1UL << PG_reclaim,		"reclaim"	},		\
 	{1UL << PG_swapbacked,		"swapbacked"	},		\
-	{1UL << PG_unevictable,		"unevictable"	}		\
+	{1UL << PG_unevictable,		"unevictable"	},		\
+	{1UL << PG_wipeonrelease,	"wipeonrelease"	}		\
 IF_HAVE_PG_MLOCK(PG_mlocked,		"mlocked"	)		\
 IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
 IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
 IF_HAVE_PG_IDLE(PG_young,		"young"		)		\
 IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
-
 #define show_page_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
 	__def_pageflag_names						\
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index abd238d0f7a4..82dfff4a8e3d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -64,6 +64,8 @@
 #define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
 
+#define MADV_WIPEONRELEASE 20
+#define MADV_DONTWIPEONRELEASE 21
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cdc7b2d9100..2816dc5c31f9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1683,6 +1683,8 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
 	page = alloc_huge_page_nodemask(h, node, nodemask);
 	mpol_cond_put(mpol);
+	if (vma->vm_flags & VM_WIPEONRELEASE)
+		SetPageWipeOnRelease(page);
 
 	return page;
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..bf256c1a3b51 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -48,6 +48,23 @@ static int madvise_need_mmap_write(int behavior)
 	}
 }
 
+static int madvise_wipe_on_release(unsigned long start, unsigned long end)
+{
+	struct page *page;
+
+	for (; start < end; start += PAGE_SIZE) {
+		int ret;
+
+		ret = get_user_pages(start, 1, 0, &page, NULL);
+		if (ret != 1)
+			return ret;
+		SetPageWipeOnRelease(page);
+		put_page(page);
+	}
+
+	return 0;
+}
+
 /*
  * We can potentially split a vm area into separate
  * areas, each area with its own behavior.
@@ -92,6 +109,23 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	case MADV_KEEPONFORK:
 		new_flags &= ~VM_WIPEONFORK;
 		break;
+	case MADV_WIPEONRELEASE:
+		/* MADV_WIPEONRELEASE is only supported on anonymous memory. */
+		if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
+		    vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+		madvise_wipe_on_release(start, end);
+		new_flags |= VM_WIPEONRELEASE;
+		break;
+	case MADV_DONTWIPEONRELEASE:
+		if (VM_WIPEONRELEASE == 0) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_WIPEONRELEASE;
+		break;
 	case MADV_DONTDUMP:
 		new_flags |= VM_DONTDUMP;
 		break;
@@ -727,6 +761,8 @@ madvise_behavior_valid(int behavior)
 	case MADV_DODUMP:
 	case MADV_WIPEONFORK:
 	case MADV_KEEPONFORK:
+	case MADV_WIPEONRELEASE:
+	case MADV_DONTWIPEONRELEASE:
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
@@ -785,6 +821,9 @@ madvise_behavior_valid(int behavior)
  *  MADV_DONTDUMP - the application wants to prevent pages in the given range
  *		from being included in its core dump.
  *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
+ *  MADV_WIPEONRELEASE - clear the contents of the memory after the last
+ *		reference to it has been released
+ *  MADV_DONTWIPEONRELEASE - cancel MADV_WIPEONRELEASE
  *
  * return values:
  *  zero    - success
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2219e747df49..c3bda2d9ab8e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2096,6 +2096,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
 	page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 	mpol_cond_put(pol);
 out:
+	if (vma->vm_flags & VM_WIPEONRELEASE)
+		SetPageWipeOnRelease(page);
 	return page;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c6ce20aaf80b..39a37d7601a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1083,11 +1083,17 @@ static __always_inline bool free_pages_prepare(struct page *page,
 					unsigned int order, bool check_free)
 {
 	int bad = 0;
+	int i;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
 
 	trace_mm_page_free(page, order);
 
+	if (PageWipeOnRelease(page)) {
+		for (i = 0; i < (1<<order); i++)
+			clear_highpage(page + i);
+	}
+
 	/*
 	 * Check tail pages before head page information is cleared to
 	 * avoid checking PageCompound for order-0 pages.
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 19:14 [PATCH] mm: Allow userland to request that the kernel clear memory on release Matthew Garrett
@ 2019-04-24 19:28 ` Matthew Wilcox
  2019-04-24 19:33   ` Matthew Garrett
  2019-04-25 15:32 ` [PATCH] " Christopher Lameter
  1 sibling, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2019-04-24 19:28 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, linux-kernel, Matthew Garrett

On Wed, Apr 24, 2019 at 12:14:40PM -0700, Matthew Garrett wrote:
> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> reference count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems.

Your request seems reasonable to me.

> +++ b/include/linux/page-flags.h
> @@ -118,6 +118,7 @@ enum pageflags {
>  	PG_reclaim,		/* To be reclaimed asap */
>  	PG_swapbacked,		/* Page is backed by RAM/swap */
>  	PG_unevictable,		/* Page is "unevictable"  */
> +	PG_wipeonrelease,

But you can't have a new PageFlag.  Can you instead zero the memory in
unmap_single_vma() where we call uprobe_munmap() and untrack_pfn() today?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 19:28 ` Matthew Wilcox
@ 2019-04-24 19:33   ` Matthew Garrett
  2019-04-24 20:20     ` Matthew Wilcox
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-04-24 19:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Linux Kernel Mailing List

On Wed, Apr 24, 2019 at 12:28 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 24, 2019 at 12:14:40PM -0700, Matthew Garrett wrote:
> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.
>
> Your request seems reasonable to me.
>
> > +++ b/include/linux/page-flags.h
> > @@ -118,6 +118,7 @@ enum pageflags {
> >       PG_reclaim,             /* To be reclaimed asap */
> >       PG_swapbacked,          /* Page is backed by RAM/swap */
> >       PG_unevictable,         /* Page is "unevictable"  */
> > +     PG_wipeonrelease,
>
> But you can't have a new PageFlag.  Can you instead zero the memory in
> unmap_single_vma() where we call uprobe_munmap() and untrack_pfn() today?

Is there any way the page could be referenced by something other than
a VMA at this point? If so we probably don't want to zero it here, but
we do want to zero it when the page is finally released (which is why
I went with a page flag)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 19:33   ` Matthew Garrett
@ 2019-04-24 20:20     ` Matthew Wilcox
  2019-04-24 20:22       ` Matthew Garrett
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2019-04-24 20:20 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, Linux Kernel Mailing List

On Wed, Apr 24, 2019 at 12:33:11PM -0700, Matthew Garrett wrote:
> On Wed, Apr 24, 2019 at 12:28 PM Matthew Wilcox <willy@infradead.org> wrote:
> > But you can't have a new PageFlag.  Can you instead zero the memory in
> > unmap_single_vma() where we call uprobe_munmap() and untrack_pfn() today?
> 
> Is there any way the page could be referenced by something other than
> a VMA at this point? If so we probably don't want to zero it here, but
> we do want to zero it when the page is finally released (which is why
> I went with a page flag)

It could be the target/source of direct I/O, or userspace could have
registered it with an RDMA device, or ...

It depends on the semantics you want.  There's no legacy code to
worry about here.  I was seeing this as the equivalent of an atexit()
handler; userspace is saying "When this page is unmapped, zero it".
So it doesn't matter that somebody else might be able to reference it --
userspace could have zeroed it themselves.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 20:20     ` Matthew Wilcox
@ 2019-04-24 20:22       ` Matthew Garrett
  2019-04-24 21:10         ` [PATCH V2] " Matthew Garrett
                           ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Matthew Garrett @ 2019-04-24 20:22 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, Linux Kernel Mailing List

On Wed, Apr 24, 2019 at 1:20 PM Matthew Wilcox <willy@infradead.org> wrote:
> It depends on the semantics you want.  There's no legacy code to
> worry about here.  I was seeing this as the equivalent of an atexit()
> handler; userspace is saying "When this page is unmapped, zero it".
> So it doesn't matter that somebody else might be able to reference it --
> userspace could have zeroed it themselves.

Mm ok that seems reasonable. I'll rework with that in mind.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 20:22       ` Matthew Garrett
@ 2019-04-24 21:10         ` Matthew Garrett
  2019-04-25 12:14           ` Michal Hocko
  2019-04-25 22:58         ` [PATCH V3] " Matthew Garrett
  2019-04-29 19:36         ` [PATCH V4] " Matthew Garrett
  2 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-04-24 21:10 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Matthew Garrett

From: Matthew Garrett <mjg59@google.com>

Applications that hold secrets and wish to avoid them leaking can use
mlock() to prevent the page from being pushed out to swap and
MADV_DONTDUMP to prevent it from being included in core dumps. Applications
can also use atexit() handlers to overwrite secrets on application exit.
However, if an attacker can reboot the system into another OS, they can
dump the contents of RAM and extract secrets. We can avoid this by setting
CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
firmware wipe the contents of RAM before booting another OS, but this means
rebooting takes a *long* time - the expected behaviour is for a clean
shutdown to remove the request after scrubbing secrets from RAM in order to
avoid this.

Unfortunately, if an application exits uncleanly, its secrets may still be
present in RAM. This can't be easily fixed in userland (eg, if the OOM
killer decides to kill a process holding secrets, we're not going to be able
to avoid that), so this patch adds a new flag to madvise() to allow userland
to request that the kernel clear the covered pages whenever the page
reference count hits zero. Since vm_flags is already full on 32-bit, it
will only work on 64-bit systems.

Signed-off-by: Matthew Garrett <mjg59@google.com>
---

Modified to wipe when the VMA is released rather than on page freeing

 include/linux/mm.h                     |  6 ++++++
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/madvise.c                           | 21 +++++++++++++++++++++
 mm/memory.c                            |  3 +++
 4 files changed, 32 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..64bdab679275 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+
+#define VM_WIPEONRELEASE BIT(37)       /* Clear pages when releasing them */
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP	VM_NONE
 #endif
 
+#ifndef VM_WIPEONRELEASE
+# define VM_WIPEONRELEASE VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index abd238d0f7a4..82dfff4a8e3d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -64,6 +64,8 @@
 #define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
 
+#define MADV_WIPEONRELEASE 20
+#define MADV_DONTWIPEONRELEASE 21
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..989c2fde15cf 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	case MADV_KEEPONFORK:
 		new_flags &= ~VM_WIPEONFORK;
 		break;
+	case MADV_WIPEONRELEASE:
+		/* MADV_WIPEONRELEASE is only supported on anonymous memory. */
+		if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
+		    vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags |= VM_WIPEONRELEASE;
+		break;
+	case MADV_DONTWIPEONRELEASE:
+		if (VM_WIPEONRELEASE == 0) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_WIPEONRELEASE;
+		break;
 	case MADV_DONTDUMP:
 		new_flags |= VM_DONTDUMP;
 		break;
@@ -727,6 +743,8 @@ madvise_behavior_valid(int behavior)
 	case MADV_DODUMP:
 	case MADV_WIPEONFORK:
 	case MADV_KEEPONFORK:
+	case MADV_WIPEONRELEASE:
+	case MADV_DONTWIPEONRELEASE:
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
@@ -785,6 +803,9 @@ madvise_behavior_valid(int behavior)
  *  MADV_DONTDUMP - the application wants to prevent pages in the given range
  *		from being included in its core dump.
  *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
+ *  MADV_WIPEONRELEASE - clear the contents of the memory after the last
+ *		reference to it has been released
+ *  MADV_DONTWIPEONRELEASE - cancel MADV_WIPEONRELEASE
  *
  * return values:
  *  zero    - success
diff --git a/mm/memory.c b/mm/memory.c
index ab650c21bccd..ff78b527660e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			page_remove_rmap(page, false);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
+			if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
+			    page_mapcount(page) == 0)
+				clear_highpage(page);
 			if (unlikely(__tlb_remove_page(tlb, page))) {
 				force_flush = 1;
 				addr += PAGE_SIZE;
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 21:10         ` [PATCH V2] " Matthew Garrett
@ 2019-04-25 12:14           ` Michal Hocko
  2019-04-25 12:37             ` Michal Hocko
                               ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Michal Hocko @ 2019-04-25 12:14 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, linux-kernel, Matthew Garrett, linux-api

Please cc linux-api for user visible API proposals (now done). Keep the
rest of the email intact for reference.

On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> From: Matthew Garrett <mjg59@google.com>
> 
> Applications that hold secrets and wish to avoid them leaking can use
> mlock() to prevent the page from being pushed out to swap and
> MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> can also use atexit() handlers to overwrite secrets on application exit.
> However, if an attacker can reboot the system into another OS, they can
> dump the contents of RAM and extract secrets. We can avoid this by setting
> CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> firmware wipe the contents of RAM before booting another OS, but this means
> rebooting takes a *long* time - the expected behaviour is for a clean
> shutdown to remove the request after scrubbing secrets from RAM in order to
> avoid this.
> 
> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> reference count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems.
> 
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
> 
> Modified to wipe when the VMA is released rather than on page freeing
> 
>  include/linux/mm.h                     |  6 ++++++
>  include/uapi/asm-generic/mman-common.h |  2 ++
>  mm/madvise.c                           | 21 +++++++++++++++++++++
>  mm/memory.c                            |  3 +++
>  4 files changed, 32 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6b10c21630f5..64bdab679275 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
>  #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
>  #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
>  #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
> +
> +#define VM_WIPEONRELEASE BIT(37)       /* Clear pages when releasing them */
>  #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
>  
>  #ifdef CONFIG_ARCH_HAS_PKEYS
> @@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
>  # define VM_GROWSUP	VM_NONE
>  #endif
>  
> +#ifndef VM_WIPEONRELEASE
> +# define VM_WIPEONRELEASE VM_NONE
> +#endif
> +
>  /* Bits set in the VMA until the stack is in its final location */
>  #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
>  
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index abd238d0f7a4..82dfff4a8e3d 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -64,6 +64,8 @@
>  #define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
>  #define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
>  
> +#define MADV_WIPEONRELEASE 20
> +#define MADV_DONTWIPEONRELEASE 21
>  /* compatibility flags */
>  #define MAP_FILE	0
>  
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 21a7881a2db4..989c2fde15cf 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
>  	case MADV_KEEPONFORK:
>  		new_flags &= ~VM_WIPEONFORK;
>  		break;
> +	case MADV_WIPEONRELEASE:
> +		/* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> +		if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> +		    vma->vm_flags & VM_SHARED) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags |= VM_WIPEONRELEASE;
> +		break;
> +	case MADV_DONTWIPEONRELEASE:
> +		if (VM_WIPEONRELEASE == 0) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +		new_flags &= ~VM_WIPEONRELEASE;
> +		break;
>  	case MADV_DONTDUMP:
>  		new_flags |= VM_DONTDUMP;
>  		break;
> @@ -727,6 +743,8 @@ madvise_behavior_valid(int behavior)
>  	case MADV_DODUMP:
>  	case MADV_WIPEONFORK:
>  	case MADV_KEEPONFORK:
> +	case MADV_WIPEONRELEASE:
> +	case MADV_DONTWIPEONRELEASE:
>  #ifdef CONFIG_MEMORY_FAILURE
>  	case MADV_SOFT_OFFLINE:
>  	case MADV_HWPOISON:
> @@ -785,6 +803,9 @@ madvise_behavior_valid(int behavior)
>   *  MADV_DONTDUMP - the application wants to prevent pages in the given range
>   *		from being included in its core dump.
>   *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
> + *  MADV_WIPEONRELEASE - clear the contents of the memory after the last
> + *		reference to it has been released
> + *  MADV_DONTWIPEONRELEASE - cancel MADV_WIPEONRELEASE
>   *
>   * return values:
>   *  zero    - success
> diff --git a/mm/memory.c b/mm/memory.c
> index ab650c21bccd..ff78b527660e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			page_remove_rmap(page, false);
>  			if (unlikely(page_mapcount(page) < 0))
>  				print_bad_pte(vma, addr, ptent, page);
> +			if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
> +			    page_mapcount(page) == 0)
> +				clear_highpage(page);
>  			if (unlikely(__tlb_remove_page(tlb, page))) {
>  				force_flush = 1;
>  				addr += PAGE_SIZE;
> -- 
> 2.21.0.593.g511ec345e18-goog

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:14           ` Michal Hocko
@ 2019-04-25 12:37             ` Michal Hocko
  2019-04-25 20:39               ` Matthew Garrett
  2019-04-25 12:40             ` Vlastimil Babka
  2019-04-25 12:42             ` Jann Horn
  2 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-04-25 12:37 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, linux-kernel, Matthew Garrett, linux-api

On Thu 25-04-19 14:14:10, Michal Hocko wrote:
> Please cc linux-api for user visible API proposals (now done). Keep the
> rest of the email intact for reference.
> 
> On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > From: Matthew Garrett <mjg59@google.com>
> > 
> > Applications that hold secrets and wish to avoid them leaking can use
> > mlock() to prevent the page from being pushed out to swap and
> > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > can also use atexit() handlers to overwrite secrets on application exit.
> > However, if an attacker can reboot the system into another OS, they can
> > dump the contents of RAM and extract secrets. We can avoid this by setting
> > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > firmware wipe the contents of RAM before booting another OS, but this means
> > rebooting takes a *long* time - the expected behaviour is for a clean
> > shutdown to remove the request after scrubbing secrets from RAM in order to
> > avoid this.
> > 
> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.

The changelog seems stale. You are hooking into unmap path where the
reference count might be still > 0 and the page still held by somebody.
A previous email from Willy said
"
It could be the target/source of direct I/O, or userspace could have
registered it with an RDMA device, or ...

It depends on the semantics you want.  There's no legacy code to
worry about here.  I was seeing this as the equivalent of an atexit()
handler; userspace is saying "When this page is unmapped, zero it".
So it doesn't matter that somebody else might be able to reference it --
userspace could have zeroed it themselves.
"

I am not sure this is really a bullet proof argumentation but it should
definitely be part of the changelog.

Besides that you inherently assume that the user would do mlock because
you do not try to wipe the swap content. Is this intentional?

Another question would be regarding the targeted user API. There are
some attempts to make all the freed memory to be zeroed/poisoned. Are
users who would like to use this feature also be interested in using
system wide setting as well?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:14           ` Michal Hocko
  2019-04-25 12:37             ` Michal Hocko
@ 2019-04-25 12:40             ` Vlastimil Babka
  2019-04-25 20:45               ` Matthew Garrett
  2019-04-25 12:42             ` Jann Horn
  2 siblings, 1 reply; 29+ messages in thread
From: Vlastimil Babka @ 2019-04-25 12:40 UTC (permalink / raw)
  To: Michal Hocko, Matthew Garrett
  Cc: linux-mm, linux-kernel, Matthew Garrett, linux-api

On 4/25/19 2:14 PM, Michal Hocko wrote:
> Please cc linux-api for user visible API proposals (now done). Keep the
> rest of the email intact for reference.
> 
> On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
>> From: Matthew Garrett <mjg59@google.com>
>>
>> Applications that hold secrets and wish to avoid them leaking can use
>> mlock() to prevent the page from being pushed out to swap and
>> MADV_DONTDUMP to prevent it from being included in core dumps. Applications

So, do we really need a new madvise() flag and VMA flag, or can we just
infer this page clearing from mlock+MADV_DONTDUMP being both applied?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:14           ` Michal Hocko
  2019-04-25 12:37             ` Michal Hocko
  2019-04-25 12:40             ` Vlastimil Babka
@ 2019-04-25 12:42             ` Jann Horn
  2019-04-25 20:43               ` Matthew Garrett
  2019-04-26  5:31               ` Michal Hocko
  2 siblings, 2 replies; 29+ messages in thread
From: Jann Horn @ 2019-04-25 12:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API

On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > From: Matthew Garrett <mjg59@google.com>
> >
> > Applications that hold secrets and wish to avoid them leaking can use
> > mlock() to prevent the page from being pushed out to swap and
> > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > can also use atexit() handlers to overwrite secrets on application exit.
> > However, if an attacker can reboot the system into another OS, they can
> > dump the contents of RAM and extract secrets. We can avoid this by setting
> > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > firmware wipe the contents of RAM before booting another OS, but this means
> > rebooting takes a *long* time - the expected behaviour is for a clean
> > shutdown to remove the request after scrubbing secrets from RAM in order to
> > avoid this.
> >
> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.
[...]
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 21a7881a2db4..989c2fde15cf 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> >       case MADV_KEEPONFORK:
> >               new_flags &= ~VM_WIPEONFORK;
> >               break;
> > +     case MADV_WIPEONRELEASE:
> > +             /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > +             if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > +                 vma->vm_flags & VM_SHARED) {
> > +                     error = -EINVAL;
> > +                     goto out;
> > +             }
> > +             new_flags |= VM_WIPEONRELEASE;
> > +             break;

An interesting effect of this is that it will be possible to set this
on a CoW anon VMA in a fork() child, and then the semantics in the
parent will be subtly different - e.g. if the parent vmsplice()d a
CoWed page into a pipe, then forked an unprivileged child, the child
set MADV_WIPEONRELEASE on its VMA, the parent died somehow, and then
the child died, the page in the pipe would be zeroed out. A child
should not be able to affect its parent like this, I think. If this
was an mmap() flag instead of a madvise() command, that issue could be
avoided. Alternatively, if adding more mmap() flags doesn't work,
perhaps you could scan the VMA and ensure that it contains no pages
yet, or something like that?

> > diff --git a/mm/memory.c b/mm/memory.c
> > index ab650c21bccd..ff78b527660e 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >                       page_remove_rmap(page, false);
> >                       if (unlikely(page_mapcount(page) < 0))
> >                               print_bad_pte(vma, addr, ptent, page);
> > +                     if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
> > +                         page_mapcount(page) == 0)
> > +                             clear_highpage(page);
> >                       if (unlikely(__tlb_remove_page(tlb, page))) {
> >                               force_flush = 1;
> >                               addr += PAGE_SIZE;

Should something like this perhaps be added in page_remove_rmap()
instead? That's where the mapcount is decremented; and looking at
other callers of page_remove_rmap(), in particular the following ones
look interesting:

 - do_huge_pmd_wp_page()/do_huge_pmd_wp_page_fallback() might be
relevant in the case where a forking process contains transparent
hugepages?
 - zap_huge_pmd() is relevant when transparent hugepages are used, I
think (otherwise transparent hugepages might not be wiped?)
 - there are various callers related to migration; I think this is
relevant on a NUMA system where memory is moved between nodes to
improve locality (moving memory to a new page and freeing the old one,
in which case you'd want to wipe the old page)

I think all the callers have a reference to the VMA, so perhaps you
could add a VMA parameter to page_remove_rmap() and then look at the
VMA in there?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 19:14 [PATCH] mm: Allow userland to request that the kernel clear memory on release Matthew Garrett
  2019-04-24 19:28 ` Matthew Wilcox
@ 2019-04-25 15:32 ` Christopher Lameter
  2019-04-25 20:29   ` Matthew Garrett
  1 sibling, 1 reply; 29+ messages in thread
From: Christopher Lameter @ 2019-04-25 15:32 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, linux-kernel, Matthew Garrett

On Wed, 24 Apr 2019, Matthew Garrett wrote:

> Applications that hold secrets and wish to avoid them leaking can use
> mlock() to prevent the page from being pushed out to swap and
> MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> can also use atexit() handlers to overwrite secrets on application exit.
> However, if an attacker can reboot the system into another OS, they can
> dump the contents of RAM and extract secrets. We can avoid this by setting

Well nothing in this patchset deals with that issue.... That hole still
exists afterwards. So is it worth to have this functionality?

> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> reference count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems.

But then the pages are cleared anyways when reallocated to another
process. This just clears it sooner before reuse. So it will reduce the
time that a page contains the secret sauce in case the program is
aborted and cannot run its exit handling.

Is that realy worth extending system calls and adding kernel handling for
this? Maybe the answer is yes given our current concern about anything
related to "security".


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 15:32 ` [PATCH] " Christopher Lameter
@ 2019-04-25 20:29   ` Matthew Garrett
  0 siblings, 0 replies; 29+ messages in thread
From: Matthew Garrett @ 2019-04-25 20:29 UTC (permalink / raw)
  To: Christopher Lameter; +Cc: linux-mm, Linux Kernel Mailing List

On Thu, Apr 25, 2019 at 8:32 AM Christopher Lameter <cl@linux.com> wrote:
>
> On Wed, 24 Apr 2019, Matthew Garrett wrote:
>
> > Applications that hold secrets and wish to avoid them leaking can use
> > mlock() to prevent the page from being pushed out to swap and
> > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > can also use atexit() handlers to overwrite secrets on application exit.
> > However, if an attacker can reboot the system into another OS, they can
> > dump the contents of RAM and extract secrets. We can avoid this by setting
>
> Well nothing in this patchset deals with that issue.... That hole still
> exists afterwards. So is it worth to have this functionality?

On UEFI systems we can set the MOR bit and the firmware will overwrite
RAM on reboot. However, this can take a long time, which makes it
difficult to justify doing by default. We want userland to be able to
assert that secrets have been cleared from RAM and then clear the MOR
flag, but we can't do that if applications can terminate in a way that
prevents them from clearing their secrets.

> > Unfortunately, if an application exits uncleanly, its secrets may still be
> > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > killer decides to kill a process holding secrets, we're not going to be able
> > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > to request that the kernel clear the covered pages whenever the page
> > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > will only work on 64-bit systems.
>
> But then the pages are cleared anyways when reallocated to another
> process. This just clears it sooner before reuse. So it will reduce the
> time that a page contains the secret sauce in case the program is
> aborted and cannot run its exit handling.

On a mostly idle system there's a real chance that nothing will end up
re-using the page before a reboot happens.

> Is that realy worth extending system calls and adding kernel handling for
> this? Maybe the answer is yes given our current concern about anything
> related to "security".

If I didn't think it was worth it, I wouldn't be proposing it :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:37             ` Michal Hocko
@ 2019-04-25 20:39               ` Matthew Garrett
  2019-04-26  5:25                 ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-04-25 20:39 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Linux Kernel Mailing List, Linux API

On Thu, Apr 25, 2019 at 5:37 AM Michal Hocko <mhocko@kernel.org> wrote:
> Besides that you inherently assume that the user would do mlock because
> you do not try to wipe the swap content. Is this intentional?

Yes, given MADV_DONTDUMP doesn't imply mlock I thought it'd be more
consistent to keep those independent.

> Another question would be regarding the targeted user API. There are
> some attempts to make all the freed memory to be zeroed/poisoned. Are
> users who would like to use this feature also be interested in using
> system wide setting as well?

I think that depends on the performance overhead of a global setting,
but it's also influenced by the semantics around when the freeing
occurs, which is something I haven't nailed down yet. If the
expectation is that the page is freed whenever the process exits, even
if the page is in use somewhere else, then we'd still want this to be
separate to poisoning on final page free.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:42             ` Jann Horn
@ 2019-04-25 20:43               ` Matthew Garrett
  2019-04-26  5:31               ` Michal Hocko
  1 sibling, 0 replies; 29+ messages in thread
From: Matthew Garrett @ 2019-04-25 20:43 UTC (permalink / raw)
  To: Jann Horn; +Cc: Michal Hocko, Linux-MM, kernel list, Linux API

On Thu, Apr 25, 2019 at 5:43 AM Jann Horn <jannh@google.com> wrote:
> An interesting effect of this is that it will be possible to set this
> on a CoW anon VMA in a fork() child, and then the semantics in the
> parent will be subtly different - e.g. if the parent vmsplice()d a
> CoWed page into a pipe, then forked an unprivileged child, the child
> set MADV_WIPEONRELEASE on its VMA, the parent died somehow, and then
> the child died, the page in the pipe would be zeroed out. A child
> should not be able to affect its parent like this, I think. If this
> was an mmap() flag instead of a madvise() command, that issue could be
> avoided. Alternatively, if adding more mmap() flags doesn't work,
> perhaps you could scan the VMA and ensure that it contains no pages
> yet, or something like that?

I /think/ my argument here would be not to do that? I agree that it's
unexpected, but I guess the other alternative would be to force a copy
on any existing COW pages in the VMA at madvise() time, and maybe also
at fork() time (sort of like the behaviour of MADV_WIPEONFORK, but
copying the page rather than providing a new zero page)

> I think all the callers have a reference to the VMA, so perhaps you
> could add a VMA parameter to page_remove_rmap() and then look at the
> VMA in there?

I'll dig into that, thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:40             ` Vlastimil Babka
@ 2019-04-25 20:45               ` Matthew Garrett
  0 siblings, 0 replies; 29+ messages in thread
From: Matthew Garrett @ 2019-04-25 20:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Michal Hocko, Linux-MM, Linux Kernel Mailing List, Linux API

On Thu, Apr 25, 2019 at 5:44 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/25/19 2:14 PM, Michal Hocko wrote:
> > Please cc linux-api for user visible API proposals (now done). Keep the
> > rest of the email intact for reference.
> >
> > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> >> From: Matthew Garrett <mjg59@google.com>
> >>
> >> Applications that hold secrets and wish to avoid them leaking can use
> >> mlock() to prevent the page from being pushed out to swap and
> >> MADV_DONTDUMP to prevent it from being included in core dumps. Applications
>
> So, do we really need a new madvise() flag and VMA flag, or can we just
> infer this page clearing from mlock+MADV_DONTDUMP being both applied?

I think the combination would probably imply that this is the
behaviour you want, but I'm a little concerned about changing the
semantics given the corner cases described earlier in the thread. If
we can figure those out in a way that won't break any existing code, I
could buy this.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V3] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 20:22       ` Matthew Garrett
  2019-04-24 21:10         ` [PATCH V2] " Matthew Garrett
@ 2019-04-25 22:58         ` Matthew Garrett
  2019-04-26  7:45           ` Vlastimil Babka
  2019-04-29 19:36         ` [PATCH V4] " Matthew Garrett
  2 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-04-25 22:58 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, linux-api, Matthew Garrett

From: Matthew Garrett <mjg59@google.com>

Applications that hold secrets and wish to avoid them leaking can use
mlock() to prevent the page from being pushed out to swap and
MADV_DONTDUMP to prevent it from being included in core dumps. Applications
can also use atexit() handlers to overwrite secrets on application exit.
However, if an attacker can reboot the system into another OS, they can
dump the contents of RAM and extract secrets. We can avoid this by setting
CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
firmware wipe the contents of RAM before booting another OS, but this means
rebooting takes a *long* time - the expected behaviour is for a clean
shutdown to remove the request after scrubbing secrets from RAM in order to
avoid this.

Unfortunately, if an application exits uncleanly, its secrets may still be
present in RAM. This can't be easily fixed in userland (eg, if the OOM
killer decides to kill a process holding secrets, we're not going to be able
to avoid that), so this patch adds a new flag to madvise() to allow userland
to request that the kernel clear the covered pages whenever the page
map count hits zero. Since vm_flags is already full on 32-bit, it
will only work on 64-bit systems. This is currently only permitted on
private mappings that have not yet been populated in order to simplify
implementation, which should suffice for the envisaged use cases. We can
extend the behaviour later if we come up with a robust set of semantics.

Signed-off-by: Matthew Garrett <mjg59@google.com>
---

Updated based on feedback from Jann - for now let's just prevent setting
the flag on anything that has already mapped some pages, which avoids
child processes being able to interfere with the parent. In addition,
move the page clearing logic into page_remove_rmap() to ensure that we
cover the full set of cases (eg, handling page migration properly).

 include/linux/mm.h                     |  6 ++++++
 include/linux/rmap.h                   |  2 +-
 include/uapi/asm-generic/mman-common.h |  2 ++
 kernel/events/uprobes.c                |  2 +-
 mm/huge_memory.c                       | 12 ++++++------
 mm/hugetlb.c                           |  4 ++--
 mm/khugepaged.c                        |  2 +-
 mm/ksm.c                               |  2 +-
 mm/madvise.c                           | 25 +++++++++++++++++++++++++
 mm/memory.c                            |  6 +++---
 mm/migrate.c                           |  4 ++--
 mm/rmap.c                              |  9 +++++++--
 12 files changed, 57 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..64bdab679275 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+
+#define VM_WIPEONRELEASE BIT(37)       /* Clear pages when releasing them */
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP	VM_NONE
 #endif
 
+#ifndef VM_WIPEONRELEASE
+# define VM_WIPEONRELEASE VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 988d176472df..abb47d623edd 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -177,7 +177,7 @@ void do_page_add_anon_rmap(struct page *, struct vm_area_struct *,
 void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
 		unsigned long, bool);
 void page_add_file_rmap(struct page *, bool);
-void page_remove_rmap(struct page *, bool);
+void page_remove_rmap(struct page *, struct vm_area_struct *, bool);
 
 void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
 			    unsigned long);
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index abd238d0f7a4..82dfff4a8e3d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -64,6 +64,8 @@
 #define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
 
+#define MADV_WIPEONRELEASE 20
+#define MADV_DONTWIPEONRELEASE 21
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c5cde87329c7..2230a1717fe3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -196,7 +196,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	set_pte_at_notify(mm, addr, pvmw.pte,
 			mk_pte(new_page, vma->vm_page_prot));
 
-	page_remove_rmap(old_page, false);
+	page_remove_rmap(old_page, vma, false);
 	if (!page_mapped(old_page))
 		try_to_free_swap(old_page);
 	page_vma_mapped_walk_done(&pvmw);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46bf149..1ad6ee5857b7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1260,7 +1260,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(vma->vm_mm, vmf->pmd, pgtable);
-	page_remove_rmap(page, true);
+	page_remove_rmap(page, vma, true);
 	spin_unlock(vmf->ptl);
 
 	/*
@@ -1410,7 +1410,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		} else {
 			VM_BUG_ON_PAGE(!PageHead(page), page);
-			page_remove_rmap(page, true);
+			page_remove_rmap(page, vma, true);
 			put_page(page);
 		}
 		ret |= VM_FAULT_WRITE;
@@ -1783,7 +1783,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		if (pmd_present(orig_pmd)) {
 			page = pmd_page(orig_pmd);
-			page_remove_rmap(page, true);
+			page_remove_rmap(page, vma, true);
 			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
 			VM_BUG_ON_PAGE(!PageHead(page), page);
 		} else if (thp_migration_supported()) {
@@ -2146,7 +2146,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			set_page_dirty(page);
 		if (!PageReferenced(page) && pmd_young(_pmd))
 			SetPageReferenced(page);
-		page_remove_rmap(page, true);
+		page_remove_rmap(page, vma, true);
 		put_page(page);
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
@@ -2266,7 +2266,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (freeze) {
 		for (i = 0; i < HPAGE_PMD_NR; i++) {
-			page_remove_rmap(page + i, false);
+			page_remove_rmap(page + i, vma, false);
 			put_page(page + i);
 		}
 	}
@@ -2954,7 +2954,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
-	page_remove_rmap(page, true);
+	page_remove_rmap(page, vma, true);
 	put_page(page);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cdc7b2d9100..1df046525861 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3419,7 +3419,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			set_page_dirty(page);
 
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, true);
+		page_remove_rmap(page, vma, true);
 
 		spin_unlock(ptl);
 		tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -3643,7 +3643,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		mmu_notifier_invalidate_range(mm, range.start, range.end);
 		set_huge_pte_at(mm, haddr, ptep,
 				make_huge_pte(vma, new_page, 1));
-		page_remove_rmap(old_page, true);
+		page_remove_rmap(old_page, vma, true);
 		hugepage_add_new_anon_rmap(new_page, vma, haddr);
 		set_page_huge_active(new_page);
 		/* Make the old page be freed below */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 449044378782..20df74dfd954 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -673,7 +673,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 			 * superfluous.
 			 */
 			pte_clear(vma->vm_mm, address, _pte);
-			page_remove_rmap(src_page, false);
+			page_remove_rmap(src_page, vma, false);
 			spin_unlock(ptl);
 			free_page_and_swap_cache(src_page);
 		}
diff --git a/mm/ksm.c b/mm/ksm.c
index fc64874dc6f4..280705f65af7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1193,7 +1193,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	ptep_clear_flush(vma, addr, ptep);
 	set_pte_at_notify(mm, addr, ptep, newpte);
 
-	page_remove_rmap(page, false);
+	page_remove_rmap(page, vma, false);
 	if (!page_mapped(page))
 		try_to_free_swap(page);
 	put_page(page);
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..2a6e616e5c0d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -92,6 +92,26 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	case MADV_KEEPONFORK:
 		new_flags &= ~VM_WIPEONFORK;
 		break;
+	case MADV_WIPEONRELEASE:
+		/*
+		 * MADV_WIPEONRELEASE is only supported on as-yet unallocated
+		 * anonymous memory.
+		 */
+		if (VM_WIPEONRELEASE == 0 || vma->vm_file || vma->anon_vma ||
+		    vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+
+		new_flags |= VM_WIPEONRELEASE;
+		break;
+	case MADV_DONTWIPEONRELEASE:
+		if (VM_WIPEONRELEASE == 0) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_WIPEONRELEASE;
+		break;
 	case MADV_DONTDUMP:
 		new_flags |= VM_DONTDUMP;
 		break;
@@ -727,6 +747,8 @@ madvise_behavior_valid(int behavior)
 	case MADV_DODUMP:
 	case MADV_WIPEONFORK:
 	case MADV_KEEPONFORK:
+	case MADV_WIPEONRELEASE:
+	case MADV_DONTWIPEONRELEASE:
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
@@ -785,6 +807,9 @@ madvise_behavior_valid(int behavior)
  *  MADV_DONTDUMP - the application wants to prevent pages in the given range
  *		from being included in its core dump.
  *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
+ *  MADV_WIPEONRELEASE - clear the contents of the memory after the last
+ *		reference to it has been released
+ *  MADV_DONTWIPEONRELEASE - cancel MADV_WIPEONRELEASE
  *
  * return values:
  *  zero    - success
diff --git a/mm/memory.c b/mm/memory.c
index ab650c21bccd..dd9555bb9aec 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1088,7 +1088,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 					mark_page_accessed(page);
 			}
 			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
+			page_remove_rmap(page, vma, false);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
 			if (unlikely(__tlb_remove_page(tlb, page))) {
@@ -1116,7 +1116,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
+			page_remove_rmap(page, vma, false);
 			put_page(page);
 			continue;
 		}
@@ -2340,7 +2340,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			 * mapcount is visible. So transitively, TLBs to
 			 * old page will be flushed before it can be reused.
 			 */
-			page_remove_rmap(old_page, false);
+			page_remove_rmap(old_page, vma, false);
 		}
 
 		/* Free the old page.. */
diff --git a/mm/migrate.c b/mm/migrate.c
index 663a5449367a..5d3437a6541d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2083,7 +2083,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 
 	page_ref_unfreeze(page, 2);
 	mlock_migrate_page(new_page, page);
-	page_remove_rmap(page, true);
+	page_remove_rmap(page, vma, true);
 	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
 
 	spin_unlock(ptl);
@@ -2313,7 +2313,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			 * drop page refcount. Page won't be freed, as we took
 			 * a reference just above.
 			 */
-			page_remove_rmap(page, false);
+			page_remove_rmap(page, vma, false);
 			put_page(page);
 
 			if (pte_present(pte))
diff --git a/mm/rmap.c b/mm/rmap.c
index b30c7c71d1d9..46dc9946a516 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1292,11 +1292,13 @@ static void page_remove_anon_compound_rmap(struct page *page)
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page:	page to remove mapping from
+ * @vma:	VMA the page belongs to
  * @compound:	uncharge the page as compound or small page
  *
  * The caller needs to hold the pte lock.
  */
-void page_remove_rmap(struct page *page, bool compound)
+void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
+		      bool compound)
 {
 	if (!PageAnon(page))
 		return page_remove_file_rmap(page, compound);
@@ -1321,6 +1323,9 @@ void page_remove_rmap(struct page *page, bool compound)
 	if (PageTransCompound(page))
 		deferred_split_huge_page(compound_head(page));
 
+	if (unlikely(vma->vm_flags & VM_WIPEONRELEASE))
+		clear_highpage(page);
+
 	/*
 	 * It would be tidy to reset the PageAnon mapping here,
 	 * but that might overwrite a racing page_add_anon_rmap
@@ -1652,7 +1657,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 *
 		 * See Documentation/vm/mmu_notifier.rst
 		 */
-		page_remove_rmap(subpage, PageHuge(page));
+		page_remove_rmap(subpage, vma, PageHuge(page));
 		put_page(page);
 	}
 
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 20:39               ` Matthew Garrett
@ 2019-04-26  5:25                 ` Michal Hocko
  2019-04-26 18:08                   ` Matthew Garrett
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-04-26  5:25 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, Linux Kernel Mailing List, Linux API

On Thu 25-04-19 13:39:01, Matthew Garrett wrote:
> On Thu, Apr 25, 2019 at 5:37 AM Michal Hocko <mhocko@kernel.org> wrote:
> > Besides that you inherently assume that the user would do mlock because
> > you do not try to wipe the swap content. Is this intentional?
> 
> Yes, given MADV_DONTDUMP doesn't imply mlock I thought it'd be more
> consistent to keep those independent.

Do we want to fail madvise call on VMAs that are not mlocked then? What
if the munlock happens later after the madvise is called?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 12:42             ` Jann Horn
  2019-04-25 20:43               ` Matthew Garrett
@ 2019-04-26  5:31               ` Michal Hocko
  2019-04-26 13:33                 ` Jann Horn
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-04-26  5:31 UTC (permalink / raw)
  To: Jann Horn
  Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API

On Thu 25-04-19 14:42:52, Jann Horn wrote:
> On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > From: Matthew Garrett <mjg59@google.com>
> > >
> > > Applications that hold secrets and wish to avoid them leaking can use
> > > mlock() to prevent the page from being pushed out to swap and
> > > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > > can also use atexit() handlers to overwrite secrets on application exit.
> > > However, if an attacker can reboot the system into another OS, they can
> > > dump the contents of RAM and extract secrets. We can avoid this by setting
> > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > > firmware wipe the contents of RAM before booting another OS, but this means
> > > rebooting takes a *long* time - the expected behaviour is for a clean
> > > shutdown to remove the request after scrubbing secrets from RAM in order to
> > > avoid this.
> > >
> > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > killer decides to kill a process holding secrets, we're not going to be able
> > > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > > to request that the kernel clear the covered pages whenever the page
> > > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > > will only work on 64-bit systems.
> [...]
> > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > index 21a7881a2db4..989c2fde15cf 100644
> > > --- a/mm/madvise.c
> > > +++ b/mm/madvise.c
> > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > >       case MADV_KEEPONFORK:
> > >               new_flags &= ~VM_WIPEONFORK;
> > >               break;
> > > +     case MADV_WIPEONRELEASE:
> > > +             /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > > +             if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > +                 vma->vm_flags & VM_SHARED) {
> > > +                     error = -EINVAL;
> > > +                     goto out;
> > > +             }
> > > +             new_flags |= VM_WIPEONRELEASE;
> > > +             break;
> 
> An interesting effect of this is that it will be possible to set this
> on a CoW anon VMA in a fork() child, and then the semantics in the
> parent will be subtly different - e.g. if the parent vmsplice()d a
> CoWed page into a pipe, then forked an unprivileged child, the child

Maybe a stupid question. How do you fork an unprivileged child (without
exec)? Child would have to drop priviledges on its own, no?

> set MADV_WIPEONRELEASE on its VMA, the parent died somehow, and then
> the child died, the page in the pipe would be zeroed out. A child
> should not be able to affect its parent like this, I think. If this
> was an mmap() flag instead of a madvise() command, that issue could be
> avoided.

With a VMA flag underneath, I think you can do an early CoW during fork
to prevent from that.

> Alternatively, if adding more mmap() flags doesn't work,
> perhaps you could scan the VMA and ensure that it contains no pages
> yet, or something like that?
> 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index ab650c21bccd..ff78b527660e 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1091,6 +1091,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> > >                       page_remove_rmap(page, false);
> > >                       if (unlikely(page_mapcount(page) < 0))
> > >                               print_bad_pte(vma, addr, ptent, page);
> > > +                     if (unlikely(vma->vm_flags & VM_WIPEONRELEASE) &&
> > > +                         page_mapcount(page) == 0)
> > > +                             clear_highpage(page);
> > >                       if (unlikely(__tlb_remove_page(tlb, page))) {
> > >                               force_flush = 1;
> > >                               addr += PAGE_SIZE;
> 
> Should something like this perhaps be added in page_remove_rmap()
> instead? That's where the mapcount is decremented; and looking at
> other callers of page_remove_rmap(), in particular the following ones
> look interesting:

Well spotted!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V3] mm: Allow userland to request that the kernel clear memory on release
  2019-04-25 22:58         ` [PATCH V3] " Matthew Garrett
@ 2019-04-26  7:45           ` Vlastimil Babka
  2019-04-26 18:10             ` Matthew Garrett
  0 siblings, 1 reply; 29+ messages in thread
From: Vlastimil Babka @ 2019-04-26  7:45 UTC (permalink / raw)
  To: Matthew Garrett, linux-mm; +Cc: linux-kernel, linux-api, Matthew Garrett

On 4/26/19 12:58 AM, Matthew Garrett wrote:
> From: Matthew Garrett <mjg59@google.com>
> 
> Applications that hold secrets and wish to avoid them leaking can use
> mlock() to prevent the page from being pushed out to swap and
> MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> can also use atexit() handlers to overwrite secrets on application exit.
> However, if an attacker can reboot the system into another OS, they can
> dump the contents of RAM and extract secrets. We can avoid this by setting
> CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> firmware wipe the contents of RAM before booting another OS, but this means
> rebooting takes a *long* time - the expected behaviour is for a clean
> shutdown to remove the request after scrubbing secrets from RAM in order to
> avoid this.
> 
> Unfortunately, if an application exits uncleanly, its secrets may still be
> present in RAM. This can't be easily fixed in userland (eg, if the OOM
> killer decides to kill a process holding secrets, we're not going to be able
> to avoid that), so this patch adds a new flag to madvise() to allow userland
> to request that the kernel clear the covered pages whenever the page
> map count hits zero. Since vm_flags is already full on 32-bit, it
> will only work on 64-bit systems. This is currently only permitted on
> private mappings that have not yet been populated in order to simplify
> implementation, which should suffice for the envisaged use cases. We can
> extend the behaviour later if we come up with a robust set of semantics.
> 
> Signed-off-by: Matthew Garrett <mjg59@google.com>
> ---
> 
> Updated based on feedback from Jann - for now let's just prevent setting
> the flag on anything that has already mapped some pages, which avoids
> child processes being able to interfere with the parent. In addition,

That makes the API quite tricky and different from existing madvise()
modes that don't care. One would for example have to call
madvise(MADV_WIPEONRELEASE) before mlock(), otherwise mlock() would
fault the pages in (unless MLOCK_ONFAULT). As such it really looks like
a mmap() flag, but that's less flexible.

How bout just doing the CoW on any such pre-existing pages as part of
the madvise(MADV_WIPEONRELEASE) call?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26  5:31               ` Michal Hocko
@ 2019-04-26 13:33                 ` Jann Horn
  2019-04-26 13:47                   ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Jann Horn @ 2019-04-26 13:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API

On Fri, Apr 26, 2019 at 7:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 25-04-19 14:42:52, Jann Horn wrote:
> > On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> > > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > > From: Matthew Garrett <mjg59@google.com>
> > > >
> > > > Applications that hold secrets and wish to avoid them leaking can use
> > > > mlock() to prevent the page from being pushed out to swap and
> > > > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > > > can also use atexit() handlers to overwrite secrets on application exit.
> > > > However, if an attacker can reboot the system into another OS, they can
> > > > dump the contents of RAM and extract secrets. We can avoid this by setting
> > > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > > > firmware wipe the contents of RAM before booting another OS, but this means
> > > > rebooting takes a *long* time - the expected behaviour is for a clean
> > > > shutdown to remove the request after scrubbing secrets from RAM in order to
> > > > avoid this.
> > > >
> > > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > > killer decides to kill a process holding secrets, we're not going to be able
> > > > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > > > to request that the kernel clear the covered pages whenever the page
> > > > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > > > will only work on 64-bit systems.
> > [...]
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 21a7881a2db4..989c2fde15cf 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > >       case MADV_KEEPONFORK:
> > > >               new_flags &= ~VM_WIPEONFORK;
> > > >               break;
> > > > +     case MADV_WIPEONRELEASE:
> > > > +             /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > > > +             if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > > +                 vma->vm_flags & VM_SHARED) {
> > > > +                     error = -EINVAL;
> > > > +                     goto out;
> > > > +             }
> > > > +             new_flags |= VM_WIPEONRELEASE;
> > > > +             break;
> >
> > An interesting effect of this is that it will be possible to set this
> > on a CoW anon VMA in a fork() child, and then the semantics in the
> > parent will be subtly different - e.g. if the parent vmsplice()d a
> > CoWed page into a pipe, then forked an unprivileged child, the child
>
> Maybe a stupid question. How do you fork an unprivileged child (without
> exec)? Child would have to drop priviledges on its own, no?

Sorry, yes, that's what I meant.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26 13:33                 ` Jann Horn
@ 2019-04-26 13:47                   ` Michal Hocko
  2019-04-26 14:03                     ` Jann Horn
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2019-04-26 13:47 UTC (permalink / raw)
  To: Jann Horn
  Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API

On Fri 26-04-19 15:33:25, Jann Horn wrote:
> On Fri, Apr 26, 2019 at 7:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> > On Thu 25-04-19 14:42:52, Jann Horn wrote:
> > > On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > [...]
> > > > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > > > From: Matthew Garrett <mjg59@google.com>
> > > > >
> > > > > Applications that hold secrets and wish to avoid them leaking can use
> > > > > mlock() to prevent the page from being pushed out to swap and
> > > > > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > > > > can also use atexit() handlers to overwrite secrets on application exit.
> > > > > However, if an attacker can reboot the system into another OS, they can
> > > > > dump the contents of RAM and extract secrets. We can avoid this by setting
> > > > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > > > > firmware wipe the contents of RAM before booting another OS, but this means
> > > > > rebooting takes a *long* time - the expected behaviour is for a clean
> > > > > shutdown to remove the request after scrubbing secrets from RAM in order to
> > > > > avoid this.
> > > > >
> > > > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > > > killer decides to kill a process holding secrets, we're not going to be able
> > > > > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > > > > to request that the kernel clear the covered pages whenever the page
> > > > > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > > > > will only work on 64-bit systems.
> > > [...]
> > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > index 21a7881a2db4..989c2fde15cf 100644
> > > > > --- a/mm/madvise.c
> > > > > +++ b/mm/madvise.c
> > > > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > > >       case MADV_KEEPONFORK:
> > > > >               new_flags &= ~VM_WIPEONFORK;
> > > > >               break;
> > > > > +     case MADV_WIPEONRELEASE:
> > > > > +             /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > > > > +             if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > > > +                 vma->vm_flags & VM_SHARED) {
> > > > > +                     error = -EINVAL;
> > > > > +                     goto out;
> > > > > +             }
> > > > > +             new_flags |= VM_WIPEONRELEASE;
> > > > > +             break;
> > >
> > > An interesting effect of this is that it will be possible to set this
> > > on a CoW anon VMA in a fork() child, and then the semantics in the
> > > parent will be subtly different - e.g. if the parent vmsplice()d a
> > > CoWed page into a pipe, then forked an unprivileged child, the child
> >
> > Maybe a stupid question. How do you fork an unprivileged child (without
> > exec)? Child would have to drop priviledges on its own, no?
> 
> Sorry, yes, that's what I meant.

But then the VMA is gone along with the flag so why does it matter?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26 13:47                   ` Michal Hocko
@ 2019-04-26 14:03                     ` Jann Horn
  2019-04-26 14:08                       ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Jann Horn @ 2019-04-26 14:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API

On Fri, Apr 26, 2019 at 3:47 PM Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 26-04-19 15:33:25, Jann Horn wrote:
> > On Fri, Apr 26, 2019 at 7:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > On Thu 25-04-19 14:42:52, Jann Horn wrote:
> > > > On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > [...]
> > > > > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > > > > From: Matthew Garrett <mjg59@google.com>
> > > > > >
> > > > > > Applications that hold secrets and wish to avoid them leaking can use
> > > > > > mlock() to prevent the page from being pushed out to swap and
> > > > > > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > > > > > can also use atexit() handlers to overwrite secrets on application exit.
> > > > > > However, if an attacker can reboot the system into another OS, they can
> > > > > > dump the contents of RAM and extract secrets. We can avoid this by setting
> > > > > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > > > > > firmware wipe the contents of RAM before booting another OS, but this means
> > > > > > rebooting takes a *long* time - the expected behaviour is for a clean
> > > > > > shutdown to remove the request after scrubbing secrets from RAM in order to
> > > > > > avoid this.
> > > > > >
> > > > > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > > > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > > > > killer decides to kill a process holding secrets, we're not going to be able
> > > > > > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > > > > > to request that the kernel clear the covered pages whenever the page
> > > > > > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > > > > > will only work on 64-bit systems.
> > > > [...]
> > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > index 21a7881a2db4..989c2fde15cf 100644
> > > > > > --- a/mm/madvise.c
> > > > > > +++ b/mm/madvise.c
> > > > > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > > > >       case MADV_KEEPONFORK:
> > > > > >               new_flags &= ~VM_WIPEONFORK;
> > > > > >               break;
> > > > > > +     case MADV_WIPEONRELEASE:
> > > > > > +             /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > > > > > +             if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > > > > +                 vma->vm_flags & VM_SHARED) {
> > > > > > +                     error = -EINVAL;
> > > > > > +                     goto out;
> > > > > > +             }
> > > > > > +             new_flags |= VM_WIPEONRELEASE;
> > > > > > +             break;
> > > >
> > > > An interesting effect of this is that it will be possible to set this
> > > > on a CoW anon VMA in a fork() child, and then the semantics in the
> > > > parent will be subtly different - e.g. if the parent vmsplice()d a
> > > > CoWed page into a pipe, then forked an unprivileged child, the child
> > >
> > > Maybe a stupid question. How do you fork an unprivileged child (without
> > > exec)? Child would have to drop priviledges on its own, no?
> >
> > Sorry, yes, that's what I meant.
>
> But then the VMA is gone along with the flag so why does it matter?

But in theory, the page might still be used somewhere, e.g. as data in
a pipe (into which the parent wrote it) or whatever. Parent
vmsplice()s a page into a pipe, parent exits, child marks the VMA as
WIPEONRELEASE and exits, page gets wiped, someone else reads the page
from the pipe.

Yes, this is very theoretical, and you'd have to write some pretty
weird software for this to matter. But it doesn't seem clean to me to
allow a child to affect the data in e.g. a pipe that it isn't supposed
to have access to like this.

Then again, this could probably already happen, since do_wp_page()
reuses pages depending on only the mapcount, without looking at the
refcount.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26 14:03                     ` Jann Horn
@ 2019-04-26 14:08                       ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2019-04-26 14:08 UTC (permalink / raw)
  To: Jann Horn
  Cc: Matthew Garrett, Linux-MM, kernel list, Matthew Garrett, Linux API

On Fri 26-04-19 16:03:26, Jann Horn wrote:
> On Fri, Apr 26, 2019 at 3:47 PM Michal Hocko <mhocko@kernel.org> wrote:
> > On Fri 26-04-19 15:33:25, Jann Horn wrote:
> > > On Fri, Apr 26, 2019 at 7:31 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > > On Thu 25-04-19 14:42:52, Jann Horn wrote:
> > > > > On Thu, Apr 25, 2019 at 2:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > > [...]
> > > > > > On Wed 24-04-19 14:10:39, Matthew Garrett wrote:
> > > > > > > From: Matthew Garrett <mjg59@google.com>
> > > > > > >
> > > > > > > Applications that hold secrets and wish to avoid them leaking can use
> > > > > > > mlock() to prevent the page from being pushed out to swap and
> > > > > > > MADV_DONTDUMP to prevent it from being included in core dumps. Applications
> > > > > > > can also use atexit() handlers to overwrite secrets on application exit.
> > > > > > > However, if an attacker can reboot the system into another OS, they can
> > > > > > > dump the contents of RAM and extract secrets. We can avoid this by setting
> > > > > > > CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
> > > > > > > firmware wipe the contents of RAM before booting another OS, but this means
> > > > > > > rebooting takes a *long* time - the expected behaviour is for a clean
> > > > > > > shutdown to remove the request after scrubbing secrets from RAM in order to
> > > > > > > avoid this.
> > > > > > >
> > > > > > > Unfortunately, if an application exits uncleanly, its secrets may still be
> > > > > > > present in RAM. This can't be easily fixed in userland (eg, if the OOM
> > > > > > > killer decides to kill a process holding secrets, we're not going to be able
> > > > > > > to avoid that), so this patch adds a new flag to madvise() to allow userland
> > > > > > > to request that the kernel clear the covered pages whenever the page
> > > > > > > reference count hits zero. Since vm_flags is already full on 32-bit, it
> > > > > > > will only work on 64-bit systems.
> > > > > [...]
> > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > > index 21a7881a2db4..989c2fde15cf 100644
> > > > > > > --- a/mm/madvise.c
> > > > > > > +++ b/mm/madvise.c
> > > > > > > @@ -92,6 +92,22 @@ static long madvise_behavior(struct vm_area_struct *vma,
> > > > > > >       case MADV_KEEPONFORK:
> > > > > > >               new_flags &= ~VM_WIPEONFORK;
> > > > > > >               break;
> > > > > > > +     case MADV_WIPEONRELEASE:
> > > > > > > +             /* MADV_WIPEONRELEASE is only supported on anonymous memory. */
> > > > > > > +             if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
> > > > > > > +                 vma->vm_flags & VM_SHARED) {
> > > > > > > +                     error = -EINVAL;
> > > > > > > +                     goto out;
> > > > > > > +             }
> > > > > > > +             new_flags |= VM_WIPEONRELEASE;
> > > > > > > +             break;
> > > > >
> > > > > An interesting effect of this is that it will be possible to set this
> > > > > on a CoW anon VMA in a fork() child, and then the semantics in the
> > > > > parent will be subtly different - e.g. if the parent vmsplice()d a
> > > > > CoWed page into a pipe, then forked an unprivileged child, the child
> > > >
> > > > Maybe a stupid question. How do you fork an unprivileged child (without
> > > > exec)? Child would have to drop priviledges on its own, no?
> > >
> > > Sorry, yes, that's what I meant.
> >
> > But then the VMA is gone along with the flag so why does it matter?
> 
> But in theory, the page might still be used somewhere, e.g. as data in
> a pipe (into which the parent wrote it) or whatever. Parent
> vmsplice()s a page into a pipe, parent exits, child marks the VMA as
> WIPEONRELEASE and exits, page gets wiped, someone else reads the page
> from the pipe.
> 
> Yes, this is very theoretical, and you'd have to write some pretty
> weird software for this to matter. But it doesn't seem clean to me to
> allow a child to affect the data in e.g. a pipe that it isn't supposed
> to have access to like this.
> 
> Then again, this could probably already happen, since do_wp_page()
> reuses pages depending on only the mapcount, without looking at the
> refcount.

OK, now I see your point. I was confused about the unprivileged child.
You are right that this looks weird but we have traditionally trusted
child processes to not do a harm. I guess this falls down to the same
bucket. An early CoW on these mapping should solve the problem AFAICS.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26  5:25                 ` Michal Hocko
@ 2019-04-26 18:08                   ` Matthew Garrett
  2019-04-29 21:44                     ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-04-26 18:08 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, Linux Kernel Mailing List, Linux API

On Thu, Apr 25, 2019 at 10:25 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 25-04-19 13:39:01, Matthew Garrett wrote:
> > Yes, given MADV_DONTDUMP doesn't imply mlock I thought it'd be more
> > consistent to keep those independent.
>
> Do we want to fail madvise call on VMAs that are not mlocked then? What
> if the munlock happens later after the madvise is called?

I'm not sure if it's strictly necessary. We already have various
combinations of features that only make sense when used together and
which can be undermined by later actions. I can see the appeal of
designing this in a way that makes it harder to misuse, but is that
worth additional implementation complexity?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V3] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26  7:45           ` Vlastimil Babka
@ 2019-04-26 18:10             ` Matthew Garrett
  0 siblings, 0 replies; 29+ messages in thread
From: Matthew Garrett @ 2019-04-26 18:10 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: linux-mm, Linux Kernel Mailing List, Linux API

On Fri, Apr 26, 2019 at 12:45 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 4/26/19 12:58 AM, Matthew Garrett wrote:
> > Updated based on feedback from Jann - for now let's just prevent setting
> > the flag on anything that has already mapped some pages, which avoids
> > child processes being able to interfere with the parent. In addition,
>
> That makes the API quite tricky and different from existing madvise()
> modes that don't care. One would for example have to call
> madvise(MADV_WIPEONRELEASE) before mlock(), otherwise mlock() would
> fault the pages in (unless MLOCK_ONFAULT). As such it really looks like
> a mmap() flag, but that's less flexible.
>
> How bout just doing the CoW on any such pre-existing pages as part of
> the madvise(MADV_WIPEONRELEASE) call?

I'll look into the easiest way to do that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH V4] mm: Allow userland to request that the kernel clear memory on release
  2019-04-24 20:22       ` Matthew Garrett
  2019-04-24 21:10         ` [PATCH V2] " Matthew Garrett
  2019-04-25 22:58         ` [PATCH V3] " Matthew Garrett
@ 2019-04-29 19:36         ` Matthew Garrett
  2019-06-05 18:26           ` Matthew Garrett
  2 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-04-29 19:36 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Matthew Garrett

From: Matthew Garrett <mjg59@google.com>

Applications that hold secrets and wish to avoid them leaking can use
mlock() to prevent the page from being pushed out to swap and
MADV_DONTDUMP to prevent it from being included in core dumps. Applications
can also use atexit() handlers to overwrite secrets on application exit.
However, if an attacker can reboot the system into another OS, they can
dump the contents of RAM and extract secrets. We can avoid this by setting
CONFIG_RESET_ATTACK_MITIGATION on UEFI systems in order to request that the
firmware wipe the contents of RAM before booting another OS, but this means
rebooting takes a *long* time - the expected behaviour is for a clean
shutdown to remove the request after scrubbing secrets from RAM in order to
avoid this.

Unfortunately, if an application exits uncleanly, its secrets may still be
present in RAM. This can't be easily fixed in userland (eg, if the OOM
killer decides to kill a process holding secrets, we're not going to be able
to avoid that), so this patch adds a new flag (MADV_WIPEONRELEASE) to
madvise() to allow userland to request that the kernel clear the covered
pages on process exit (clean or otherwise).

If a process clone()s, pages may end up being simultaneously owned by
multiple processes. In this case the parent process may exit before the
child, in which case the pages would not be cleared because the map count
would still be non-zero. This could result in surprising outcomes, so
instead all CoW pages are forcibly copied on fork() and madvise(). Child
processes will still receive copies of the secrets and must be trusted to
also ensure that they are wiped - this can be avoided if the parent also
sets MADV_WIPEONFORK on these regions to ensure that the child receives
clean pages.

Signed-off-by: Matthew Garrett <mjg59@google.com>
---

Further updates based on feedback - we now forcibly copy any CoW pages
in MADV_WIPEONRELEASE areas, and I've moved the wiping logic to
page_remove_rmap() and page_remove_anon_compound_rmap(). I know more
about mm than I did when I started, which means I'm probably more
dangerous than I was this time last week - please do feel free to point
out how I've screwed up.

 include/linux/mm.h                     |  7 +++++
 include/linux/rmap.h                   |  2 +-
 include/uapi/asm-generic/mman-common.h |  2 ++
 kernel/events/uprobes.c                |  2 +-
 kernel/fork.c                          | 10 +++++++
 mm/gup.c                               | 37 ++++++++++++++++++++++++++
 mm/huge_memory.c                       | 12 ++++-----
 mm/hugetlb.c                           |  4 +--
 mm/khugepaged.c                        |  2 +-
 mm/ksm.c                               |  2 +-
 mm/madvise.c                           | 28 +++++++++++++++++++
 mm/memory.c                            |  6 ++---
 mm/migrate.c                           |  4 +--
 mm/rmap.c                              | 28 +++++++++++++++----
 14 files changed, 124 insertions(+), 22 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6b10c21630f5..7841dd282961 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,6 +257,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_HIGH_ARCH_2	BIT(VM_HIGH_ARCH_BIT_2)
 #define VM_HIGH_ARCH_3	BIT(VM_HIGH_ARCH_BIT_3)
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
+
+#define VM_WIPEONRELEASE BIT(37)       /* Clear pages when releasing them */
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
@@ -298,6 +300,10 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_GROWSUP	VM_NONE
 #endif
 
+#ifndef VM_WIPEONRELEASE
+# define VM_WIPEONRELEASE VM_NONE
+#endif
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP	(VM_RAND_READ | VM_SEQ_READ)
 
@@ -1449,6 +1455,7 @@ int generic_error_remove_page(struct address_space *mapping, struct page *page);
 int invalidate_inode_page(struct page *page);
 
 #ifdef CONFIG_MMU
+extern int trigger_cow(unsigned long start, unsigned long end);
 extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 988d176472df..abb47d623edd 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -177,7 +177,7 @@ void do_page_add_anon_rmap(struct page *, struct vm_area_struct *,
 void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
 		unsigned long, bool);
 void page_add_file_rmap(struct page *, bool);
-void page_remove_rmap(struct page *, bool);
+void page_remove_rmap(struct page *, struct vm_area_struct *, bool);
 
 void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
 			    unsigned long);
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index abd238d0f7a4..82dfff4a8e3d 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -64,6 +64,8 @@
 #define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
 
+#define MADV_WIPEONRELEASE 20
+#define MADV_DONTWIPEONRELEASE 21
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c5cde87329c7..2230a1717fe3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -196,7 +196,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	set_pte_at_notify(mm, addr, pvmw.pte,
 			mk_pte(new_page, vma->vm_page_prot));
 
-	page_remove_rmap(old_page, false);
+	page_remove_rmap(old_page, vma, false);
 	if (!page_mapped(old_page))
 		try_to_free_swap(old_page);
 	page_vma_mapped_walk_done(&pvmw);
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..04fe45966042 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -584,6 +584,16 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		if (!(tmp->vm_flags & VM_WIPEONFORK))
 			retval = copy_page_range(mm, oldmm, mpnt);
 
+		/*
+		 * If VM_WIPEONRELEASE is set and VM_WIPEONFORK isn't, ensure
+		 * that the any mapped pages are copied rather than being
+		 * left as CoW - this avoids situations where a parent
+		 * has pages marked as WIPEONRELEASE and a child doesn't
+		 */
+		if (unlikely((tmp->vm_flags & (VM_WIPEONRELEASE|VM_WIPEONFORK))
+			     == VM_WIPEONRELEASE))
+			trigger_cow(tmp->vm_start, tmp->vm_end);
+
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
 
diff --git a/mm/gup.c b/mm/gup.c
index 91819b8ad9cc..bd89795ceaf5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1499,6 +1499,43 @@ struct page *get_dump_page(unsigned long addr)
 }
 #endif /* CONFIG_ELF_CORE */
 
+static int trigger_cow_pte_entry(pte_t *pte, unsigned long addr,
+				 unsigned long next, struct mm_walk *walk)
+{
+	int ret = __get_user_pages(current, current->mm, addr, 1,
+				   FOLL_WRITE | FOLL_TOUCH, NULL, NULL, NULL);
+	if (ret != 1)
+		return ret;
+	return 0;
+}
+
+static int trigger_cow_hugetlb_range(pte_t *pte, unsigned long hmask,
+				     unsigned long addr, unsigned long end,
+				     struct mm_walk *walk)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	int ret = __get_user_pages(current, current->mm, addr, 1,
+				   FOLL_WRITE | FOLL_TOUCH, NULL, NULL, NULL);
+
+	if (ret != 1)
+		return ret;
+#else
+	BUG();
+#endif
+	return 0;
+}
+
+int trigger_cow(unsigned long start, unsigned long end)
+{
+	struct mm_walk cow_walk = {
+		.pte_entry = trigger_cow_pte_entry,
+		.hugetlb_entry = trigger_cow_hugetlb_range,
+		.mm = current->mm,
+	};
+
+	return walk_page_range(start, end, &cow_walk);
+}
+
 /*
  * Generic Fast GUP
  *
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 165ea46bf149..1ad6ee5857b7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1260,7 +1260,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(vma->vm_mm, vmf->pmd, pgtable);
-	page_remove_rmap(page, true);
+	page_remove_rmap(page, vma, true);
 	spin_unlock(vmf->ptl);
 
 	/*
@@ -1410,7 +1410,7 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		} else {
 			VM_BUG_ON_PAGE(!PageHead(page), page);
-			page_remove_rmap(page, true);
+			page_remove_rmap(page, vma, true);
 			put_page(page);
 		}
 		ret |= VM_FAULT_WRITE;
@@ -1783,7 +1783,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		if (pmd_present(orig_pmd)) {
 			page = pmd_page(orig_pmd);
-			page_remove_rmap(page, true);
+			page_remove_rmap(page, vma, true);
 			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
 			VM_BUG_ON_PAGE(!PageHead(page), page);
 		} else if (thp_migration_supported()) {
@@ -2146,7 +2146,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			set_page_dirty(page);
 		if (!PageReferenced(page) && pmd_young(_pmd))
 			SetPageReferenced(page);
-		page_remove_rmap(page, true);
+		page_remove_rmap(page, vma, true);
 		put_page(page);
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
@@ -2266,7 +2266,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (freeze) {
 		for (i = 0; i < HPAGE_PMD_NR; i++) {
-			page_remove_rmap(page + i, false);
+			page_remove_rmap(page + i, vma, false);
 			put_page(page + i);
 		}
 	}
@@ -2954,7 +2954,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
-	page_remove_rmap(page, true);
+	page_remove_rmap(page, vma, true);
 	put_page(page);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cdc7b2d9100..1df046525861 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3419,7 +3419,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			set_page_dirty(page);
 
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, true);
+		page_remove_rmap(page, vma, true);
 
 		spin_unlock(ptl);
 		tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -3643,7 +3643,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 		mmu_notifier_invalidate_range(mm, range.start, range.end);
 		set_huge_pte_at(mm, haddr, ptep,
 				make_huge_pte(vma, new_page, 1));
-		page_remove_rmap(old_page, true);
+		page_remove_rmap(old_page, vma, true);
 		hugepage_add_new_anon_rmap(new_page, vma, haddr);
 		set_page_huge_active(new_page);
 		/* Make the old page be freed below */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 449044378782..20df74dfd954 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -673,7 +673,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
 			 * superfluous.
 			 */
 			pte_clear(vma->vm_mm, address, _pte);
-			page_remove_rmap(src_page, false);
+			page_remove_rmap(src_page, vma, false);
 			spin_unlock(ptl);
 			free_page_and_swap_cache(src_page);
 		}
diff --git a/mm/ksm.c b/mm/ksm.c
index fc64874dc6f4..280705f65af7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1193,7 +1193,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
 	ptep_clear_flush(vma, addr, ptep);
 	set_pte_at_notify(mm, addr, ptep, newpte);
 
-	page_remove_rmap(page, false);
+	page_remove_rmap(page, vma, false);
 	if (!page_mapped(page))
 		try_to_free_swap(page);
 	put_page(page);
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..3e133496c801 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -92,6 +92,29 @@ static long madvise_behavior(struct vm_area_struct *vma,
 	case MADV_KEEPONFORK:
 		new_flags &= ~VM_WIPEONFORK;
 		break;
+	case MADV_WIPEONRELEASE:
+		/* MADV_WIPEONRELEASE is only supported on anonymous memory. */
+		if (VM_WIPEONRELEASE == 0 || vma->vm_file ||
+		    vma->vm_flags & VM_SHARED) {
+			error = -EINVAL;
+			goto out;
+		}
+
+		new_flags |= VM_WIPEONRELEASE;
+		/*
+		 * If the VMA already has backing pages that are mapped by
+		 * multiple processes, ensure that they're CoWed
+		 */
+		if (vma->anon_vma)
+			trigger_cow(start, end);
+		break;
+	case MADV_DONTWIPEONRELEASE:
+		if (VM_WIPEONRELEASE == 0) {
+			error = -EINVAL;
+			goto out;
+		}
+		new_flags &= ~VM_WIPEONRELEASE;
+		break;
 	case MADV_DONTDUMP:
 		new_flags |= VM_DONTDUMP;
 		break;
@@ -727,6 +750,8 @@ madvise_behavior_valid(int behavior)
 	case MADV_DODUMP:
 	case MADV_WIPEONFORK:
 	case MADV_KEEPONFORK:
+	case MADV_WIPEONRELEASE:
+	case MADV_DONTWIPEONRELEASE:
 #ifdef CONFIG_MEMORY_FAILURE
 	case MADV_SOFT_OFFLINE:
 	case MADV_HWPOISON:
@@ -785,6 +810,9 @@ madvise_behavior_valid(int behavior)
  *  MADV_DONTDUMP - the application wants to prevent pages in the given range
  *		from being included in its core dump.
  *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
+ *  MADV_WIPEONRELEASE - clear the contents of the memory after the last
+ *		reference to it has been released
+ *  MADV_DONTWIPEONRELEASE - cancel MADV_WIPEONRELEASE
  *
  * return values:
  *  zero    - success
diff --git a/mm/memory.c b/mm/memory.c
index ab650c21bccd..dd9555bb9aec 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1088,7 +1088,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 					mark_page_accessed(page);
 			}
 			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
+			page_remove_rmap(page, vma, false);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
 			if (unlikely(__tlb_remove_page(tlb, page))) {
@@ -1116,7 +1116,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
-			page_remove_rmap(page, false);
+			page_remove_rmap(page, vma, false);
 			put_page(page);
 			continue;
 		}
@@ -2340,7 +2340,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			 * mapcount is visible. So transitively, TLBs to
 			 * old page will be flushed before it can be reused.
 			 */
-			page_remove_rmap(old_page, false);
+			page_remove_rmap(old_page, vma, false);
 		}
 
 		/* Free the old page.. */
diff --git a/mm/migrate.c b/mm/migrate.c
index 663a5449367a..5d3437a6541d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2083,7 +2083,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 
 	page_ref_unfreeze(page, 2);
 	mlock_migrate_page(new_page, page);
-	page_remove_rmap(page, true);
+	page_remove_rmap(page, vma, true);
 	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
 
 	spin_unlock(ptl);
@@ -2313,7 +2313,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			 * drop page refcount. Page won't be freed, as we took
 			 * a reference just above.
 			 */
-			page_remove_rmap(page, false);
+			page_remove_rmap(page, vma, false);
 			put_page(page);
 
 			if (pte_present(pte))
diff --git a/mm/rmap.c b/mm/rmap.c
index b30c7c71d1d9..f6f4e52299ed 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1251,13 +1251,19 @@ static void page_remove_file_rmap(struct page *page, bool compound)
 	unlock_page_memcg(page);
 }
 
-static void page_remove_anon_compound_rmap(struct page *page)
+static void page_remove_anon_compound_rmap(struct vm_area_struct *vma,
+					   struct page *page)
 {
 	int i, nr;
 
 	if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
 		return;
 
+	if (unlikely(vma->vm_flags & VM_WIPEONRELEASE))
+		for (i = 0; i < HPAGE_PMD_NR; i++)
+			if (page_mapcount(page) == 0)
+				clear_highpage(&page[i]);
+
 	/* Hugepages are not counted in NR_ANON_PAGES for now. */
 	if (unlikely(PageHuge(page)))
 		return;
@@ -1273,8 +1279,15 @@ static void page_remove_anon_compound_rmap(struct page *page)
 		 * themi are still mapped.
 		 */
 		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
-			if (atomic_add_negative(-1, &page[i]._mapcount))
+			if (atomic_add_negative(-1, &page[i]._mapcount)) {
 				nr++;
+				/*
+				 * These will have been missed in the first
+				 * pass, so clear them now
+				 */
+				if (unlikely(vma->vm_flags & VM_WIPEONRELEASE))
+					clear_highpage(&page[i]);
+			}
 		}
 	} else {
 		nr = HPAGE_PMD_NR;
@@ -1292,17 +1305,19 @@ static void page_remove_anon_compound_rmap(struct page *page)
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page:	page to remove mapping from
+ * @vma:	VMA the page belongs to
  * @compound:	uncharge the page as compound or small page
  *
  * The caller needs to hold the pte lock.
  */
-void page_remove_rmap(struct page *page, bool compound)
+void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
+		      bool compound)
 {
 	if (!PageAnon(page))
 		return page_remove_file_rmap(page, compound);
 
 	if (compound)
-		return page_remove_anon_compound_rmap(page);
+		return page_remove_anon_compound_rmap(vma, page);
 
 	/* page still mapped by someone else? */
 	if (!atomic_add_negative(-1, &page->_mapcount))
@@ -1321,6 +1336,9 @@ void page_remove_rmap(struct page *page, bool compound)
 	if (PageTransCompound(page))
 		deferred_split_huge_page(compound_head(page));
 
+	if (unlikely(vma->vm_flags & VM_WIPEONRELEASE))
+		clear_highpage(page);
+
 	/*
 	 * It would be tidy to reset the PageAnon mapping here,
 	 * but that might overwrite a racing page_add_anon_rmap
@@ -1652,7 +1670,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 *
 		 * See Documentation/vm/mmu_notifier.rst
 		 */
-		page_remove_rmap(subpage, PageHuge(page));
+		page_remove_rmap(subpage, vma, PageHuge(page));
 		put_page(page);
 	}
 
-- 
2.21.0.593.g511ec345e18-goog


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH V2] mm: Allow userland to request that the kernel clear memory on release
  2019-04-26 18:08                   ` Matthew Garrett
@ 2019-04-29 21:44                     ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2019-04-29 21:44 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-mm, Linux Kernel Mailing List, Linux API

On Fri 26-04-19 11:08:44, Matthew Garrett wrote:
> On Thu, Apr 25, 2019 at 10:25 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 25-04-19 13:39:01, Matthew Garrett wrote:
> > > Yes, given MADV_DONTDUMP doesn't imply mlock I thought it'd be more
> > > consistent to keep those independent.
> >
> > Do we want to fail madvise call on VMAs that are not mlocked then? What
> > if the munlock happens later after the madvise is called?
> 
> I'm not sure if it's strictly necessary. We already have various
> combinations of features that only make sense when used together and
> which can be undermined by later actions. I can see the appeal of
> designing this in a way that makes it harder to misuse, but is that
> worth additional implementation complexity?

If the complexity is not worth the usual usecases then this should be
really documented and noted that without an mlock you are not getting
the full semantic and you can leave memory behind on the swap partition.

I cannot judge how much that matter but it certainly looks half feature
to me but if nobody is going to use the madvise without mlock then it
looks certainly much easier to implement.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4] mm: Allow userland to request that the kernel clear memory on release
  2019-04-29 19:36         ` [PATCH V4] " Matthew Garrett
@ 2019-06-05 18:26           ` Matthew Garrett
  2019-06-06 22:45             ` Kees Cook
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Garrett @ 2019-06-05 18:26 UTC (permalink / raw)
  To: Linux-MM; +Cc: Linux Kernel Mailing List

On Mon, Apr 29, 2019 at 12:36 PM Matthew Garrett
<matthewgarrett@google.com> wrote:

(snip)

Any further feedback on this? Does it seem conceptually useful?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4] mm: Allow userland to request that the kernel clear memory on release
  2019-06-05 18:26           ` Matthew Garrett
@ 2019-06-06 22:45             ` Kees Cook
  0 siblings, 0 replies; 29+ messages in thread
From: Kees Cook @ 2019-06-06 22:45 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Linux-MM, Matthew Wilcox, Alexander Potapenko, Linux Kernel Mailing List

On Wed, Jun 05, 2019 at 11:26:03AM -0700, Matthew Garrett wrote:
> Any further feedback on this? Does it seem conceptually useful?

Hi!

I love this patch, and I think it can nicely combine with Alexander's
init_on_alloc/free series[1].

One thing I'd like to see changed is that the DONTWIPE call should
wipe the memory. That way, there is no need to "trust" child behavior.
The only way out of the WIPE flag is that the memory gets wiped.

[1] https://patchwork.kernel.org/patch/10967023/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-06-06 22:45 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-24 19:14 [PATCH] mm: Allow userland to request that the kernel clear memory on release Matthew Garrett
2019-04-24 19:28 ` Matthew Wilcox
2019-04-24 19:33   ` Matthew Garrett
2019-04-24 20:20     ` Matthew Wilcox
2019-04-24 20:22       ` Matthew Garrett
2019-04-24 21:10         ` [PATCH V2] " Matthew Garrett
2019-04-25 12:14           ` Michal Hocko
2019-04-25 12:37             ` Michal Hocko
2019-04-25 20:39               ` Matthew Garrett
2019-04-26  5:25                 ` Michal Hocko
2019-04-26 18:08                   ` Matthew Garrett
2019-04-29 21:44                     ` Michal Hocko
2019-04-25 12:40             ` Vlastimil Babka
2019-04-25 20:45               ` Matthew Garrett
2019-04-25 12:42             ` Jann Horn
2019-04-25 20:43               ` Matthew Garrett
2019-04-26  5:31               ` Michal Hocko
2019-04-26 13:33                 ` Jann Horn
2019-04-26 13:47                   ` Michal Hocko
2019-04-26 14:03                     ` Jann Horn
2019-04-26 14:08                       ` Michal Hocko
2019-04-25 22:58         ` [PATCH V3] " Matthew Garrett
2019-04-26  7:45           ` Vlastimil Babka
2019-04-26 18:10             ` Matthew Garrett
2019-04-29 19:36         ` [PATCH V4] " Matthew Garrett
2019-06-05 18:26           ` Matthew Garrett
2019-06-06 22:45             ` Kees Cook
2019-04-25 15:32 ` [PATCH] " Christopher Lameter
2019-04-25 20:29   ` Matthew Garrett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).