All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] mm/secretmem: one fix and one refactoring
@ 2024-03-26 14:32 David Hildenbrand
  2024-03-26 14:32 ` [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-03-26 14:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun

Patch #1 fixes a GUP-fast issue, whereby we might succeed in pinning
secretmem folios. Patch #2 extends the memfd_secret selftest to cover
that case. Patch #3 removes folio_is_secretmem() and instead lets
folio_fast_pin_allowed() cover that case as well.

With this series, the reproducer (+selftests) works as expected. To
test patch #3, the gup_longterm test does exactly what we need, and
keeps on working as expected.

Without the fix:
	TAP version 13
	1..6
	ok 1 mlock limit is respected
	ok 2 file IO is blocked as expected
	not ok 3 vmsplice: unexpected memory access with fresh page
	ok 4 vmsplice is blocked as expected with existing page
	ok 5 process_vm_read is blocked as expected
	ok 6 ptrace is blocked as expected
	# Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0

With the fix:
	TAP version 13
	1..6
	ok 1 mlock limit is respected
	ok 2 file IO is blocked as expected
	ok 3 vmsplice is blocked as expected with fresh page
	ok 4 vmsplice is blocked as expected with existing page
	ok 5 process_vm_read is blocked as expected
	ok 6 ptrace is blocked as expected
	# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

v1 -> v2:
* "mm/secretmem: fix GUP-fast succeeding on secretmem folios"
 -> Drop the LRU check completely
 -> Rephrase patch description
 -> (Dropped RB from Mike)
* "selftests/memfd_secret: add vmsplice() test"
 -> Add test with fresh+existing page
 -> Change pass/fail message
 -> Rephrase patch description
 -> (Dropped RB from Mike)
* "mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into
   gup_fast_folio_allowed()"
 -> Adjust to dropped LRU check
 -> Rename folio_fast_pin_allowed() to gup_fast_folio_allowed()
 -> Rephrase patch description
 -> Add RB from Mike

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: xingwei lee <xrivendell7@gmail.com>
Cc: yue sun <samsun1006219@gmail.com>

David Hildenbrand (3):
  mm/secretmem: fix GUP-fast succeeding on secretmem folios
  selftests/memfd_secret: add vmsplice() test
  mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into
    gup_fast_folio_allowed()

 include/linux/secretmem.h                 | 21 +---------
 mm/gup.c                                  | 48 ++++++++++++---------
 tools/testing/selftests/mm/memfd_secret.c | 51 ++++++++++++++++++++++-
 3 files changed, 79 insertions(+), 41 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios
  2024-03-26 14:32 [PATCH v2 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
@ 2024-03-26 14:32 ` David Hildenbrand
  2024-03-26 14:56   ` Mike Rapoport
  2024-03-26 14:32 ` [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
  2024-03-26 14:32 ` [PATCH v2 3/3] mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed() David Hildenbrand
  2 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2024-03-26 14:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun,
	Miklos Szeredi, stable

folio_is_secretmem() currently relies on secretmem folios being LRU folios,
to save some cycles.

However, folios might reside in a folio batch without the LRU flag set, or
temporarily have their LRU flag cleared. Consequently, the LRU flag is
unreliable for this purpose.

In particular, this is the case when secretmem_fault() allocates a
fresh page and calls filemap_add_folio()->folio_add_lru(). The folio might
be added to the per-cpu folio batch and won't get the LRU flag set until
the batch was drained using e.g., lru_add_drain().

Consequently, folio_is_secretmem() might not detect secretmem folios
and GUP-fast can succeed in grabbing a secretmem folio, crashing the
kernel when we would later try reading/writing to the folio, because
the folio has been unmapped from the directmap.

Fix it by removing that unreliable check.

Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/secretmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 35f3a4a8ceb1..acf7e1a3f3de 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(struct folio *folio)
 	/*
 	 * Using folio_mapping() is quite slow because of the actual call
 	 * instruction.
-	 * We know that secretmem pages are not compound and LRU so we can
+	 * We know that secretmem pages are not compound, so we can
 	 * save a couple of cycles here.
 	 */
-	if (folio_test_large(folio) || !folio_test_lru(folio))
+	if (folio_test_large(folio))
 		return false;
 
 	mapping = (struct address_space *)
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-26 14:32 [PATCH v2 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
  2024-03-26 14:32 ` [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
@ 2024-03-26 14:32 ` David Hildenbrand
  2024-03-26 14:59   ` Mike Rapoport
  2024-03-26 15:10   ` Miklos Szeredi
  2024-03-26 14:32 ` [PATCH v2 3/3] mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed() David Hildenbrand
  2 siblings, 2 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-03-26 14:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun

Let's add a simple reproducer for a scenario where GUP-fast could succeed
on secretmem folios, making vmsplice() succeed instead of failing. The
reproducer is based on a reproducer [1] by Miklos Szeredi.

We want to perform two tests: vmsplice() when a fresh page was just
faulted in, and vmsplice() on an existing page after munmap() that
would drain certain LRU caches/batches in the kernel.

In an ideal world, we could use fallocate(FALLOC_FL_PUNCH_HOLE) /
MADV_REMOVE to remove any existing page. As that is currently not
possible, run the test before any other tests that would allocate memory
in the secretmem fd.

Perform the ftruncate() only once, and check the return value.

[1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 tools/testing/selftests/mm/memfd_secret.c | 51 ++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
index 9b298f6a04b3..9a0597310a76 100644
--- a/tools/testing/selftests/mm/memfd_secret.c
+++ b/tools/testing/selftests/mm/memfd_secret.c
@@ -20,6 +20,7 @@
 #include <unistd.h>
 #include <errno.h>
 #include <stdio.h>
+#include <fcntl.h>
 
 #include "../kselftest.h"
 
@@ -83,6 +84,45 @@ static void test_mlock_limit(int fd)
 	pass("mlock limit is respected\n");
 }
 
+static void test_vmsplice(int fd, const char *desc)
+{
+	ssize_t transferred;
+	struct iovec iov;
+	int pipefd[2];
+	char *mem;
+
+	if (pipe(pipefd)) {
+		fail("pipe failed: %s\n", strerror(errno));
+		return;
+	}
+
+	mem = mmap(NULL, page_size, prot, mode, fd, 0);
+	if (mem == MAP_FAILED) {
+		fail("Unable to mmap secret memory\n");
+		goto close_pipe;
+	}
+
+	/*
+	 * vmsplice() may use GUP-fast, which must also fail. Prefault the
+	 * page table, so GUP-fast could find it.
+	 */
+	memset(mem, PATTERN, page_size);
+
+	iov.iov_base = mem;
+	iov.iov_len = page_size;
+	transferred = vmsplice(pipefd[1], &iov, 1, 0);
+
+	if (transferred < 0 && errno == EFAULT)
+		pass("vmsplice is blocked as expected with %s\n", desc);
+	else
+		fail("vmsplice: unexpected memory access with %s\n", desc);
+
+	munmap(mem, page_size);
+close_pipe:
+	close(pipefd[0]);
+	close(pipefd[1]);
+}
+
 static void try_process_vm_read(int fd, int pipefd[2])
 {
 	struct iovec liov, riov;
@@ -187,7 +227,6 @@ static void test_remote_access(int fd, const char *name,
 		return;
 	}
 
-	ftruncate(fd, page_size);
 	memset(mem, PATTERN, page_size);
 
 	if (write(pipefd[1], &mem, sizeof(mem)) < 0) {
@@ -258,7 +297,7 @@ static void prepare(void)
 				   strerror(errno));
 }
 
-#define NUM_TESTS 4
+#define NUM_TESTS 6
 
 int main(int argc, char *argv[])
 {
@@ -277,9 +316,17 @@ int main(int argc, char *argv[])
 			ksft_exit_fail_msg("memfd_secret failed: %s\n",
 					   strerror(errno));
 	}
+	if (ftruncate(fd, page_size))
+		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));
 
 	test_mlock_limit(fd);
 	test_file_apis(fd);
+	/*
+	 * We have to run the first vmsplice test before any secretmem page was
+	 * allocated for this fd.
+	 */
+	test_vmsplice(fd, "fresh page");
+	test_vmsplice(fd, "existing page");
 	test_process_vm_read(fd);
 	test_ptrace(fd);
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/3] mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed()
  2024-03-26 14:32 [PATCH v2 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
  2024-03-26 14:32 ` [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
  2024-03-26 14:32 ` [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
@ 2024-03-26 14:32 ` David Hildenbrand
  2 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-03-26 14:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun

folio_is_secretmem() is currently only used during GUP-fast. Nowadays,
folio_fast_pin_allowed() performs similar checks during GUP-fast and
contains a lot of careful handling -- READ_ONCE() -- , sanity checks --
lockdep_assert_irqs_disabled() --  and helpful comments on how this
handling is safe and correct.

So let's merge folio_is_secretmem() into folio_fast_pin_allowed(). Rename
folio_fast_pin_allowed() to gup_fast_folio_allowed(), to better match the
new semantics.

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/secretmem.h | 21 ++---------------
 mm/gup.c                  | 48 +++++++++++++++++++++++----------------
 2 files changed, 30 insertions(+), 39 deletions(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index acf7e1a3f3de..e918f96881f5 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -6,25 +6,8 @@
 
 extern const struct address_space_operations secretmem_aops;
 
-static inline bool folio_is_secretmem(struct folio *folio)
+static inline bool secretmem_mapping(struct address_space *mapping)
 {
-	struct address_space *mapping;
-
-	/*
-	 * Using folio_mapping() is quite slow because of the actual call
-	 * instruction.
-	 * We know that secretmem pages are not compound, so we can
-	 * save a couple of cycles here.
-	 */
-	if (folio_test_large(folio))
-		return false;
-
-	mapping = (struct address_space *)
-		((unsigned long)folio->mapping & ~PAGE_MAPPING_FLAGS);
-
-	if (!mapping || mapping != folio->mapping)
-		return false;
-
 	return mapping->a_ops == &secretmem_aops;
 }
 
@@ -38,7 +21,7 @@ static inline bool vma_is_secretmem(struct vm_area_struct *vma)
 	return false;
 }
 
-static inline bool folio_is_secretmem(struct folio *folio)
+static inline bool secretmem_mapping(struct address_space *mapping)
 {
 	return false;
 }
diff --git a/mm/gup.c b/mm/gup.c
index e7510b6ce765..03b74b148e30 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2466,12 +2466,14 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
 #ifdef CONFIG_HAVE_FAST_GUP
 
 /*
- * Used in the GUP-fast path to determine whether a pin is permitted for a
- * specific folio.
+ * Used in the GUP-fast path to determine whether GUP is permitted to work on
+ * a specific folio.
  *
  * This call assumes the caller has pinned the folio, that the lowest page table
  * level still points to this folio, and that interrupts have been disabled.
  *
+ * GUP-fast must reject all secretmem folios.
+ *
  * Writing to pinned file-backed dirty tracked folios is inherently problematic
  * (see comment describing the writable_file_mapping_allowed() function). We
  * therefore try to avoid the most egregious case of a long-term mapping doing
@@ -2481,25 +2483,34 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
  * in the fast path, so instead we whitelist known good cases and if in doubt,
  * fall back to the slow path.
  */
-static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
+static bool gup_fast_folio_allowed(struct folio *folio, unsigned int flags)
 {
+	bool reject_file_backed = false;
 	struct address_space *mapping;
+	bool check_secretmem = false;
 	unsigned long mapping_flags;
 
 	/*
 	 * If we aren't pinning then no problematic write can occur. A long term
 	 * pin is the most egregious case so this is the one we disallow.
 	 */
-	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
+	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) ==
 	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
-		return true;
+		reject_file_backed = true;
+
+	/* We hold a folio reference, so we can safely access folio fields. */
 
-	/* The folio is pinned, so we can safely access folio fields. */
+	/* secretmem folios are always order-0 folios. */
+	if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio))
+		check_secretmem = true;
+
+	if (!reject_file_backed && !check_secretmem)
+		return true;
 
 	if (WARN_ON_ONCE(folio_test_slab(folio)))
 		return false;
 
-	/* hugetlb mappings do not require dirty-tracking. */
+	/* hugetlb neither requires dirty-tracking nor can be secretmem. */
 	if (folio_test_hugetlb(folio))
 		return true;
 
@@ -2535,10 +2546,12 @@ static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
 
 	/*
 	 * At this point, we know the mapping is non-null and points to an
-	 * address_space object. The only remaining whitelisted file system is
-	 * shmem.
+	 * address_space object.
 	 */
-	return shmem_mapping(mapping);
+	if (check_secretmem && secretmem_mapping(mapping))
+		return false;
+	/* The only remaining allowed file system is shmem. */
+	return !reject_file_backed || shmem_mapping(mapping);
 }
 
 static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
@@ -2624,18 +2637,13 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 		if (!folio)
 			goto pte_unmap;
 
-		if (unlikely(folio_is_secretmem(folio))) {
-			gup_put_folio(folio, 1, flags);
-			goto pte_unmap;
-		}
-
 		if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
 		    unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
 			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
 		}
 
-		if (!folio_fast_pin_allowed(folio, flags)) {
+		if (!gup_fast_folio_allowed(folio, flags)) {
 			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
 		}
@@ -2832,7 +2840,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		return 0;
 	}
 
-	if (!folio_fast_pin_allowed(folio, flags)) {
+	if (!gup_fast_folio_allowed(folio, flags)) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
@@ -2903,7 +2911,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		return 0;
 	}
 
-	if (!folio_fast_pin_allowed(folio, flags)) {
+	if (!gup_fast_folio_allowed(folio, flags)) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
@@ -2947,7 +2955,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		return 0;
 	}
 
-	if (!folio_fast_pin_allowed(folio, flags)) {
+	if (!gup_fast_folio_allowed(folio, flags)) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
@@ -2992,7 +3000,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		return 0;
 	}
 
-	if (!folio_fast_pin_allowed(folio, flags)) {
+	if (!gup_fast_folio_allowed(folio, flags)) {
 		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios
  2024-03-26 14:32 ` [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
@ 2024-03-26 14:56   ` Mike Rapoport
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2024-03-26 14:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun, Miklos Szeredi, stable

On Tue, Mar 26, 2024 at 03:32:08PM +0100, David Hildenbrand wrote:
> folio_is_secretmem() currently relies on secretmem folios being LRU folios,
> to save some cycles.
> 
> However, folios might reside in a folio batch without the LRU flag set, or
> temporarily have their LRU flag cleared. Consequently, the LRU flag is
> unreliable for this purpose.
> 
> In particular, this is the case when secretmem_fault() allocates a
> fresh page and calls filemap_add_folio()->folio_add_lru(). The folio might
> be added to the per-cpu folio batch and won't get the LRU flag set until
> the batch was drained using e.g., lru_add_drain().
> 
> Consequently, folio_is_secretmem() might not detect secretmem folios
> and GUP-fast can succeed in grabbing a secretmem folio, crashing the
> kernel when we would later try reading/writing to the folio, because
> the folio has been unmapped from the directmap.
> 
> Fix it by removing that unreliable check.
> 
> Reported-by: xingwei lee <xrivendell7@gmail.com>
> Reported-by: yue sun <samsun1006219@gmail.com>
> Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
> Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
> Tested-by: Miklos Szeredi <mszeredi@redhat.com>
> Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  include/linux/secretmem.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index 35f3a4a8ceb1..acf7e1a3f3de 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(struct folio *folio)
>  	/*
>  	 * Using folio_mapping() is quite slow because of the actual call
>  	 * instruction.
> -	 * We know that secretmem pages are not compound and LRU so we can
> +	 * We know that secretmem pages are not compound, so we can
>  	 * save a couple of cycles here.
>  	 */
> -	if (folio_test_large(folio) || !folio_test_lru(folio))
> +	if (folio_test_large(folio))
>  		return false;
>  
>  	mapping = (struct address_space *)
> -- 
> 2.43.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-26 14:32 ` [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
@ 2024-03-26 14:59   ` Mike Rapoport
  2024-03-26 15:10   ` Miklos Szeredi
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2024-03-26 14:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun

On Tue, Mar 26, 2024 at 03:32:09PM +0100, David Hildenbrand wrote:
> Let's add a simple reproducer for a scenario where GUP-fast could succeed
> on secretmem folios, making vmsplice() succeed instead of failing. The
> reproducer is based on a reproducer [1] by Miklos Szeredi.
> 
> We want to perform two tests: vmsplice() when a fresh page was just
> faulted in, and vmsplice() on an existing page after munmap() that
> would drain certain LRU caches/batches in the kernel.
> 
> In an ideal world, we could use fallocate(FALLOC_FL_PUNCH_HOLE) /
> MADV_REMOVE to remove any existing page. As that is currently not
> possible, run the test before any other tests that would allocate memory
> in the secretmem fd.
> 
> Perform the ftruncate() only once, and check the return value.
> 
> [1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  tools/testing/selftests/mm/memfd_secret.c | 51 ++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
> index 9b298f6a04b3..9a0597310a76 100644
> --- a/tools/testing/selftests/mm/memfd_secret.c
> +++ b/tools/testing/selftests/mm/memfd_secret.c
> @@ -20,6 +20,7 @@
>  #include <unistd.h>
>  #include <errno.h>
>  #include <stdio.h>
> +#include <fcntl.h>
>  
>  #include "../kselftest.h"
>  
> @@ -83,6 +84,45 @@ static void test_mlock_limit(int fd)
>  	pass("mlock limit is respected\n");
>  }
>  
> +static void test_vmsplice(int fd, const char *desc)
> +{
> +	ssize_t transferred;
> +	struct iovec iov;
> +	int pipefd[2];
> +	char *mem;
> +
> +	if (pipe(pipefd)) {
> +		fail("pipe failed: %s\n", strerror(errno));
> +		return;
> +	}
> +
> +	mem = mmap(NULL, page_size, prot, mode, fd, 0);
> +	if (mem == MAP_FAILED) {
> +		fail("Unable to mmap secret memory\n");
> +		goto close_pipe;
> +	}
> +
> +	/*
> +	 * vmsplice() may use GUP-fast, which must also fail. Prefault the
> +	 * page table, so GUP-fast could find it.
> +	 */
> +	memset(mem, PATTERN, page_size);
> +
> +	iov.iov_base = mem;
> +	iov.iov_len = page_size;
> +	transferred = vmsplice(pipefd[1], &iov, 1, 0);
> +
> +	if (transferred < 0 && errno == EFAULT)
> +		pass("vmsplice is blocked as expected with %s\n", desc);
> +	else
> +		fail("vmsplice: unexpected memory access with %s\n", desc);
> +
> +	munmap(mem, page_size);
> +close_pipe:
> +	close(pipefd[0]);
> +	close(pipefd[1]);
> +}
> +
>  static void try_process_vm_read(int fd, int pipefd[2])
>  {
>  	struct iovec liov, riov;
> @@ -187,7 +227,6 @@ static void test_remote_access(int fd, const char *name,
>  		return;
>  	}
>  
> -	ftruncate(fd, page_size);
>  	memset(mem, PATTERN, page_size);
>  
>  	if (write(pipefd[1], &mem, sizeof(mem)) < 0) {
> @@ -258,7 +297,7 @@ static void prepare(void)
>  				   strerror(errno));
>  }
>  
> -#define NUM_TESTS 4
> +#define NUM_TESTS 6
>  
>  int main(int argc, char *argv[])
>  {
> @@ -277,9 +316,17 @@ int main(int argc, char *argv[])
>  			ksft_exit_fail_msg("memfd_secret failed: %s\n",
>  					   strerror(errno));
>  	}
> +	if (ftruncate(fd, page_size))
> +		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));
>  
>  	test_mlock_limit(fd);
>  	test_file_apis(fd);
> +	/*
> +	 * We have to run the first vmsplice test before any secretmem page was
> +	 * allocated for this fd.
> +	 */
> +	test_vmsplice(fd, "fresh page");
> +	test_vmsplice(fd, "existing page");
>  	test_process_vm_read(fd);
>  	test_ptrace(fd);
>  
> -- 
> 2.43.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-26 14:32 ` [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
  2024-03-26 14:59   ` Mike Rapoport
@ 2024-03-26 15:10   ` Miklos Szeredi
  2024-03-26 15:36     ` David Hildenbrand
  1 sibling, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-26 15:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Rapoport,
	Lorenzo Stoakes, xingwei lee, yue sun

On Tue, Mar 26, 2024 at 3:32 PM David Hildenbrand <david@redhat.com> wrote:
>
> Let's add a simple reproducer for a scenario where GUP-fast could succeed
> on secretmem folios, making vmsplice() succeed instead of failing. The
> reproducer is based on a reproducer [1] by Miklos Szeredi.
>
> We want to perform two tests: vmsplice() when a fresh page was just
> faulted in, and vmsplice() on an existing page after munmap() that
> would drain certain LRU caches/batches in the kernel.
>
> In an ideal world, we could use fallocate(FALLOC_FL_PUNCH_HOLE) /
> MADV_REMOVE to remove any existing page. As that is currently not
> possible, run the test before any other tests that would allocate memory
> in the secretmem fd.
>
> Perform the ftruncate() only once, and check the return value.
>
> [1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  tools/testing/selftests/mm/memfd_secret.c | 51 ++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
> index 9b298f6a04b3..9a0597310a76 100644
> --- a/tools/testing/selftests/mm/memfd_secret.c
> +++ b/tools/testing/selftests/mm/memfd_secret.c
> @@ -20,6 +20,7 @@
>  #include <unistd.h>
>  #include <errno.h>
>  #include <stdio.h>
> +#include <fcntl.h>
>
>  #include "../kselftest.h"
>
> @@ -83,6 +84,45 @@ static void test_mlock_limit(int fd)
>         pass("mlock limit is respected\n");
>  }
>
> +static void test_vmsplice(int fd, const char *desc)
> +{
> +       ssize_t transferred;
> +       struct iovec iov;
> +       int pipefd[2];
> +       char *mem;
> +
> +       if (pipe(pipefd)) {
> +               fail("pipe failed: %s\n", strerror(errno));
> +               return;
> +       }
> +
> +       mem = mmap(NULL, page_size, prot, mode, fd, 0);
> +       if (mem == MAP_FAILED) {
> +               fail("Unable to mmap secret memory\n");
> +               goto close_pipe;
> +       }
> +
> +       /*
> +        * vmsplice() may use GUP-fast, which must also fail. Prefault the
> +        * page table, so GUP-fast could find it.
> +        */
> +       memset(mem, PATTERN, page_size);

Shouldn't the non-prefault case be tested as well?

Thanks,
Miklos


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-26 15:10   ` Miklos Szeredi
@ 2024-03-26 15:36     ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2024-03-26 15:36 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Rapoport,
	Lorenzo Stoakes, xingwei lee, yue sun

>> +static void test_vmsplice(int fd, const char *desc)
>> +{
>> +       ssize_t transferred;
>> +       struct iovec iov;
>> +       int pipefd[2];
>> +       char *mem;
>> +
>> +       if (pipe(pipefd)) {
>> +               fail("pipe failed: %s\n", strerror(errno));
>> +               return;
>> +       }
>> +
>> +       mem = mmap(NULL, page_size, prot, mode, fd, 0);
>> +       if (mem == MAP_FAILED) {
>> +               fail("Unable to mmap secret memory\n");
>> +               goto close_pipe;
>> +       }
>> +
>> +       /*
>> +        * vmsplice() may use GUP-fast, which must also fail. Prefault the
>> +        * page table, so GUP-fast could find it.
>> +        */
>> +       memset(mem, PATTERN, page_size);
> 
> Shouldn't the non-prefault case be tested as well?

That's the "easy" case where GUP-fast is never involved, and it should 
mostly be covered by the ptrace/process_vm_read() tests already.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-26 15:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-26 14:32 [PATCH v2 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
2024-03-26 14:32 ` [PATCH v2 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
2024-03-26 14:56   ` Mike Rapoport
2024-03-26 14:32 ` [PATCH v2 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
2024-03-26 14:59   ` Mike Rapoport
2024-03-26 15:10   ` Miklos Szeredi
2024-03-26 15:36     ` David Hildenbrand
2024-03-26 14:32 ` [PATCH v2 3/3] mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed() David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.