linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error
@ 2021-11-16 23:57 Mina Almasry
  2021-11-17  0:00 ` Mina Almasry
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Mina Almasry @ 2021-11-16 23:57 UTC (permalink / raw)
  To: Mike Kravetz, Andrew Morton
  Cc: Mina Almasry, Wei Xu, James Houghton, linux-mm, linux-kernel

Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >=
size, or !huge_pte_none(), the code will detect that new_pagecache_page
== false, and so call restore_reserve_on_error().
In this case I see restore_reserve_on_error() delete the reservation,
and the following call to remove_inode_hugepages() will increment
h->resv_hugepages causing a 100% reproducible leak.

We should treat the is_continue case similar to adding a page into the
pagecache and set new_pagecache_page to true, to indicate that there is
no reservation to restore on the error path, and we need not call
restore_reserve_on_error().

Cc: Wei Xu <weixugc@google.com>

Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reported-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e09159c957e3..25a7a3d84607 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5741,6 +5741,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page = find_lock_page(mapping, idx);
 		if (!page)
 			goto out;
+		/*
+		 * Set new_pagecache_page to true, as we've added a page to the
+		 * pagecache, but userfaultfd hasn't set up a mapping for this
+		 * page yet. If we bail out before setting up the mapping, we
+		 * want to indicate to restore_reserve_on_error() that we've
+		 * added the page to the page cache.
+		 */
+		new_pagecache_page = true;
 	} else if (!*pagep) {
 		/* If a page already exists, then it's UFFDIO_COPY for
 		 * a non-missing case. Return -EEXIST.
--
2.34.0.rc1.387.gb447b232ab-goog

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error
  2021-11-16 23:57 [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error Mina Almasry
@ 2021-11-17  0:00 ` Mina Almasry
  2021-11-17  0:32 ` Andrew Morton
  2021-11-17  0:58 ` Mike Kravetz
  2 siblings, 0 replies; 4+ messages in thread
From: Mina Almasry @ 2021-11-17  0:00 UTC (permalink / raw)
  To: Mike Kravetz, Andrew Morton
  Cc: Wei Xu, James Houghton, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2218 bytes --]

Hi Mike,

On Tue, Nov 16, 2021 at 3:57 PM Mina Almasry <almasrymina@google.com> wrote:
>
> Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
> bail out using "goto out_release_unlock;" in the cases where idx >=
> size, or !huge_pte_none(), the code will detect that new_pagecache_page
> == false, and so call restore_reserve_on_error().
> In this case I see restore_reserve_on_error() delete the reservation,
> and the following call to remove_inode_hugepages() will increment
> h->resv_hugepages causing a 100% reproducible leak.
>

Attached is the .c file with the 100% repro.

> We should treat the is_continue case similar to adding a page into the
> pagecache and set new_pagecache_page to true, to indicate that there is
> no reservation to restore on the error path, and we need not call
> restore_reserve_on_error().
>
> Cc: Wei Xu <weixugc@google.com>
>
> Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> Reported-by: James Houghton <jthoughton@google.com>

Not sure if this is a Cc: stable issue. If it is, I can add in v2.

> ---
>  mm/hugetlb.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e09159c957e3..25a7a3d84607 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5741,6 +5741,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>                 page = find_lock_page(mapping, idx);
>                 if (!page)
>                         goto out;
> +               /*
> +                * Set new_pagecache_page to true, as we've added a page to the
> +                * pagecache, but userfaultfd hasn't set up a mapping for this
> +                * page yet. If we bail out before setting up the mapping, we
> +                * want to indicate to restore_reserve_on_error() that we've
> +                * added the page to the page cache.
> +                */
> +               new_pagecache_page = true;
>         } else if (!*pagep) {
>                 /* If a page already exists, then it's UFFDIO_COPY for
>                  * a non-missing case. Return -EEXIST.
> --
> 2.34.0.rc1.387.gb447b232ab-goog

[-- Attachment #2: hugetlb-leak-test.c --]
[-- Type: text/x-csrc, Size: 2964 bytes --]

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

#ifndef UFFD_FEATURE_MINOR_HUGETLBFS
#define UFFD_FEATURE_MINOR_HUGETLBFS (1 << 9)
#endif

#ifndef UFFDIO_REGISTER_MODE_MINOR
#define UFFDIO_REGISTER_MODE_MINOR ((__u64)1 << 2)
#endif

#ifndef UFFDIO_CONTINUE_MODE_DONTWAKE
#define UFFDIO_CONTINUE_MODE_DONTWAKE ((uint64_t)1 << 0)
#endif

struct uffdio_continue {
	struct uffdio_range range;
	uint64_t mode;
	int64_t bytes_mapped;
};


#ifndef UFFDIO_CONTINUE
#define _UFFDIO_CONTINUE (0x07)
#define UFFDIO_CONTINUE _IOWR(UFFDIO, _UFFDIO_CONTINUE, struct uffdio_continue)
#endif

#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

int userfaultfd(int mode) {
	return syscall(__NR_userfaultfd, mode);
}

int main() {

	int fd = open("/mnt/huge/test", O_RDWR | O_CREAT | S_IRUSR | S_IWUSR);
	if (fd == -1) {
		perror("opening tmpfile in hugetlbfs-mount failed");
		return -1;
	}

	const size_t len_hugepage = 512 * 4096; // 2MB
	const size_t len = 2 * len_hugepage;
	if (ftruncate(fd, len) < 0) {
		perror("ftruncate failed");
		return -1;
	}
	int uffd = userfaultfd(O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0) {
		perror("uffd not created");
		return -1;
	}

	char *hva = (char*)(mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
	if (hva == MAP_FAILED) {
		perror("mmap hva failed");
		return -1;
	}

	char *primary = (char*)(mmap(hva, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
	if (primary == MAP_FAILED) {
		perror("mmap primary failed");
		return -1;
	}

	char *secondary = (char*)(mmap(hva, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
	if (primary == MAP_FAILED) {
		perror("mmap secondary failed");
		return -1;
	}

	struct uffdio_api api;
	api.api = UFFD_API;
	api.features = UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MISSING_HUGETLBFS;
	if (ioctl(uffd, UFFDIO_API, &api) == -1) {
		perror("UFFDIO_API failed");
		return -1;
	}

	memset(primary + len_hugepage, 0, len_hugepage);

	struct uffdio_register reg;
	reg.range.start = (unsigned long)primary;
	reg.range.len = len;
	reg.mode = UFFDIO_REGISTER_MODE_MINOR | UFFDIO_REGISTER_MODE_MISSING;
	reg.ioctls = 0;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1) {
		perror("register failed");
		return -1;
	}

	memset(secondary, 0, len);

	struct uffdio_continue cont = {
		.range = (struct uffdio_range) {
			.start = (uint64_t)primary,
			.len = len,
		},
		.mode = 0,
		.bytes_mapped = 0,
	};
	if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0) {
		perror("UFFDIO_CONTINUE failed");
		printf("Mapped %d bytes\n", cont.bytes_mapped);
	}

	close(uffd);
	munmap(secondary, len);
	munmap(primary, len);
	munmap(hva, len);

	close(fd);

	// After this unlink, a hugepage will be leaked.
	if (unlink("/mnt/huge/test") < 0) {
		perror("unlink failed");
	}

	return 0;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error
  2021-11-16 23:57 [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error Mina Almasry
  2021-11-17  0:00 ` Mina Almasry
@ 2021-11-17  0:32 ` Andrew Morton
  2021-11-17  0:58 ` Mike Kravetz
  2 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2021-11-17  0:32 UTC (permalink / raw)
  To: Mina Almasry; +Cc: Mike Kravetz, Wei Xu, James Houghton, linux-mm, linux-kernel

On Tue, 16 Nov 2021 15:57:32 -0800 Mina Almasry <almasrymina@google.com> wrote:

> Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
> bail out using "goto out_release_unlock;" in the cases where idx >=
> size, or !huge_pte_none(), the code will detect that new_pagecache_page
> == false, and so call restore_reserve_on_error().
> In this case I see restore_reserve_on_error() delete the reservation,
> and the following call to remove_inode_hugepages() will increment
> h->resv_hugepages causing a 100% reproducible leak.
> 
> We should treat the is_continue case similar to adding a page into the
> pagecache and set new_pagecache_page to true, to indicate that there is
> no reservation to restore on the error path, and we need not call
> restore_reserve_on_error().
> 
> Cc: Wei Xu <weixugc@google.com>
> 
> Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")

I added cc:stable to this.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error
  2021-11-16 23:57 [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error Mina Almasry
  2021-11-17  0:00 ` Mina Almasry
  2021-11-17  0:32 ` Andrew Morton
@ 2021-11-17  0:58 ` Mike Kravetz
  2 siblings, 0 replies; 4+ messages in thread
From: Mike Kravetz @ 2021-11-17  0:58 UTC (permalink / raw)
  To: Mina Almasry, Andrew Morton
  Cc: Wei Xu, James Houghton, linux-mm, linux-kernel

Subject:   Re: [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error

To:        Mina Almasry <almasrymina@google.com>, Andrew Morton <akpm@linux-foundation.org>

Cc:        Wei Xu <weixugc@google.com>, James Houghton <jthoughton@google.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org

Bcc:       

-=-=-=-=-=-=-=-=-=# Don't remove this line #=-=-=-=-=-=-=-=-=-

On 11/16/21 3:57 PM, Mina Almasry wrote:

> Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we

> bail out using "goto out_release_unlock;" in the cases where idx >=

> size, or !huge_pte_none(), the code will detect that new_pagecache_page

> == false, and so call restore_reserve_on_error().

> In this case I see restore_reserve_on_error() delete the reservation,

> and the following call to remove_inode_hugepages() will increment

> h->resv_hugepages causing a 100% reproducible leak.

> 

> We should treat the is_continue case similar to adding a page into the

> pagecache and set new_pagecache_page to true, to indicate that there is

> no reservation to restore on the error path, and we need not call

> restore_reserve_on_error().

> 

> Cc: Wei Xu <weixugc@google.com>

> 

> Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")

> Signed-off-by: Mina Almasry <almasrymina@google.com>

> Reported-by: James Houghton <jthoughton@google.com>



Thanks Mina and James!



Technically, the issue was introduced by commit 846be08578ed.  See the

'Note on Fixes tag' in c7b1850dfb41.  It is true that commit c7b1850dfb41

should have taken the 'is_continue' case into account when deciding whether

or not to call restore_reserve_on_error.  However, this issue first

showed up with 846be08578ed.  But, this patch depends on c7b1850dfb41 so

I think c7b1850dfb41 it best for the Fixes tag.



> ---

>  mm/hugetlb.c | 8 ++++++++

>  1 file changed, 8 insertions(+)

> 

> diff --git a/mm/hugetlb.c b/mm/hugetlb.c

> index e09159c957e3..25a7a3d84607 100644

> --- a/mm/hugetlb.c

> +++ b/mm/hugetlb.c

> @@ -5741,6 +5741,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,

>  		page = find_lock_page(mapping, idx);

>  		if (!page)

>  			goto out;

> +		/*

> +		 * Set new_pagecache_page to true, as we've added a page to the

> +		 * pagecache, but userfaultfd hasn't set up a mapping for this



We did not add the the page to the pagecache.  Rather, this is the case

where the page already exists in the cache.  Right?



> +		 * page yet. If we bail out before setting up the mapping, we

> +		 * want to indicate to restore_reserve_on_error() that we've

> +		 * added the page to the page cache.

> +		 */

> +		new_pagecache_page = true;





How about changing the variable name new_pagecache_page to page_in_pagecache?

Then it makes sense both here and below when actually adding to the

cache.  I think we could then drop the above comment.

-- 

Mike Kravetz



>  	} else if (!*pagep) {

>  		/* If a page already exists, then it's UFFDIO_COPY for

>  		 * a non-missing case. Return -EEXIST.

> --

> 2.34.0.rc1.387.gb447b232ab-goog

> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-11-17  0:58 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-16 23:57 [PATCH v1] hugetlb, userfaultfd: Fix reservation restore on userfaultfd error Mina Almasry
2021-11-17  0:00 ` Mina Almasry
2021-11-17  0:32 ` Andrew Morton
2021-11-17  0:58 ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).