* [PATCH 0/2] mm/hugetlb: Fix issues on file sealing and fork
@ 2021-05-01 14:41 Peter Xu
  2021-05-01 14:41 ` [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE Peter Xu
  2021-05-01 14:41 ` [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child Peter Xu
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Xu @ 2021-05-01 14:41 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, peterx,
	Mike Kravetz, Axel Rasmussen

Hugh reported an issue with F_SEAL_FUTURE_WRITE not being applied correctly to
hugetlbfs, which I can easily verify using the memfd_test program.  It seems
the program is hardly ever run with hugetlbfs pages (it uses shmem by default).

Meanwhile I found another, probably even more severe, issue: hugetlb fork
won't wr-protect child cow pages, so the child can potentially write to the
parent's private pages.  Patch 2 addresses that.

After this series is applied, "memfd_test hugetlbfs" should start to pass.

Please review, thanks.

Peter Xu (2):
  mm/hugetlb: Fix F_SEAL_FUTURE_WRITE
  mm/hugetlb: Fix cow where page writtable in child

 fs/hugetlbfs/inode.c |  5 +++++
 include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
 mm/hugetlb.c         |  2 ++
 mm/shmem.c           | 22 ++++------------------
 4 files changed, 43 insertions(+), 18 deletions(-)

-- 
2.31.1




* [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE
  2021-05-01 14:41 [PATCH 0/2] mm/hugetlb: Fix issues on file sealing and fork Peter Xu
@ 2021-05-01 14:41 ` Peter Xu
  2021-05-03 18:55   ` Mike Kravetz
  2021-05-01 14:41 ` [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child Peter Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Xu @ 2021-05-01 14:41 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, peterx,
	Mike Kravetz, Axel Rasmussen

F_SEAL_FUTURE_WRITE has been missing for hugetlb since day one.
There is a test program for it, and it fails consistently.

$ ./memfd_test hugetlbfs
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
mmap() didn't fail as expected
Aborted (core dumped)

I think it's probably because no one is really running the hugetlbfs test.

Fix it by checking FUTURE_WRITE in hugetlbfs_file_mmap() as well, as we do in
shmem_mmap().  Factor out a helper for that.

Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c |  5 +++++
 include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
 mm/shmem.c           | 22 ++++------------------
 3 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a2a42335e8fd2..39922c0f2fc8c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
+	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 	loff_t len, vma_len;
 	int ret;
 	struct hstate *h = hstate_file(file);
 
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
+
 	/*
 	 * vma address alignment (but not the pgoff alignment) has
 	 * already been checked by prepare_hugepage_range.  If you add
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 84fb1697b20ff..c3fd7d504a60e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3200,5 +3200,37 @@ extern int sysctl_nr_trim_pages;
 
 void mem_dump_obj(void *object);
 
+/**
+ * seal_check_future_write - Check for F_SEAL_FUTURE_WRITE flag and handle it
+ * @seals: the seals to check
+ * @vma: the vma to operate on
+ *
+ * Check whether F_SEAL_FUTURE_WRITE is set; if so, do proper check/handling on
+ * the vma flags.  Return 0 if check pass, or <0 for errors.
+ */
+static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
+{
+	if (seals & F_SEAL_FUTURE_WRITE) {
+		/*
+		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
+		 * "future write" seal active.
+		 */
+		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
+			return -EPERM;
+
+		/*
+		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
+		 * MAP_SHARED and read-only, take care to not allow mprotect to
+		 * revert protections on such mappings. Do this only for shared
+		 * mappings. For private mappings, don't need to mask
+		 * VM_MAYWRITE as we still want them to be COW-writable.
+		 */
+		if (vma->vm_flags & VM_SHARED)
+			vma->vm_flags &= ~(VM_MAYWRITE);
+	}
+
+	return 0;
+}
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index 26c76b13ad233..e86a230735b60 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2258,25 +2258,11 @@ int shmem_lock(struct file *file, int lock, struct user_struct *user)
 static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct shmem_inode_info *info = SHMEM_I(file_inode(file));
+	int ret;
 
-	if (info->seals & F_SEAL_FUTURE_WRITE) {
-		/*
-		 * New PROT_WRITE and MAP_SHARED mmaps are not allowed when
-		 * "future write" seal active.
-		 */
-		if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE))
-			return -EPERM;
-
-		/*
-		 * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
-		 * MAP_SHARED and read-only, take care to not allow mprotect to
-		 * revert protections on such mappings. Do this only for shared
-		 * mappings. For private mappings, don't need to mask
-		 * VM_MAYWRITE as we still want them to be COW-writable.
-		 */
-		if (vma->vm_flags & VM_SHARED)
-			vma->vm_flags &= ~(VM_MAYWRITE);
-	}
+	ret = seal_check_future_write(info->seals, vma);
+	if (ret)
+		return ret;
 
 	/* arm64 - allow memory tagging on RAM-based files */
 	vma->vm_flags |= VM_MTE_ALLOWED;
-- 
2.31.1



* [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child
  2021-05-01 14:41 [PATCH 0/2] mm/hugetlb: Fix issues on file sealing and fork Peter Xu
  2021-05-01 14:41 ` [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE Peter Xu
@ 2021-05-01 14:41 ` Peter Xu
  2021-05-03 20:53   ` Mike Kravetz
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Xu @ 2021-05-01 14:41 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, peterx,
	Mike Kravetz, Axel Rasmussen

When fork() copies a hugetlb page range, we remember to wrprotect the src pte
if needed, but we forget about the child!  Without it, the child will be able
to write to the parent's pages when mapped as PROT_READ|PROT_WRITE and
MAP_PRIVATE, which will cause data corruption in the parent process.

This issue can also be exposed by the "memfd_test hugetlbfs" kselftest (once
it gets past the F_SEAL_FUTURE_WRITE test, though).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 629aa4c2259c8..9978fb73b8caf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4056,6 +4056,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				 * See Documentation/vm/mmu_notifier.rst
 				 */
 				huge_ptep_set_wrprotect(src, addr, src_pte);
+				/* Child cannot write too! */
+				entry = huge_pte_wrprotect(entry);
 			}
 
 			page_dup_rmap(ptepage, true);
-- 
2.31.1



* Re: [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE
  2021-05-01 14:41 ` [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE Peter Xu
@ 2021-05-03 18:55   ` Mike Kravetz
  2021-05-03 21:31     ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2021-05-03 18:55 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Mike Kravetz,
	Axel Rasmussen

On 5/1/21 7:41 AM, Peter Xu wrote:
> F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
> There is a test program for that and it fails constantly.
> 
> $ ./memfd_test hugetlbfs
> memfd-hugetlb: CREATE
> memfd-hugetlb: BASIC
> memfd-hugetlb: SEAL-WRITE
> memfd-hugetlb: SEAL-FUTURE-WRITE
> mmap() didn't fail as expected
> Aborted (core dumped)
> 
> I think it's probably because no one is really running the hugetlbfs test.
> 
> Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in
> shmem_mmap().  Generalize a helper for that.
> 
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  fs/hugetlbfs/inode.c |  5 +++++
>  include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
>  mm/shmem.c           | 22 ++++------------------
>  3 files changed, 41 insertions(+), 18 deletions(-)

Thanks Peter and Hugh!

One question below,

> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index a2a42335e8fd2..39922c0f2fc8c 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct inode *inode = file_inode(file);
> +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
>  	loff_t len, vma_len;
>  	int ret;
>  	struct hstate *h = hstate_file(file);
>  
> +	ret = seal_check_future_write(info->seals, vma);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * vma address alignment (but not the pgoff alignment) has
>  	 * already been checked by prepare_hugepage_range.  If you add

The full comment below the code you added is:

	/*
	 * vma address alignment (but not the pgoff alignment) has
	 * already been checked by prepare_hugepage_range.  If you add
	 * any error returns here, do so after setting VM_HUGETLB, so
	 * is_vm_hugetlb_page tests below unmap_region go the right
	 * way when do_mmap unwinds (may be important on powerpc
	 * and ia64).
	 */

This comment was added in commit 68589bc35303 by Hugh, although it
appears David Gibson added the reason for the comment in the commit
message:

"If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
because the given file offset is not hugepage aligned - then do_mmap_pgoff
will go to the unmap_and_free_vma backout path.

But at this stage the vma hasn't been marked as hugepage, and the backout path
will call unmap_region() on it.  That will eventually call down to the
non-hugepage version of unmap_page_range().  On ppc64, at least, that will
cause serious problems if there are any existing hugepage pagetable entries in
the vicinity - for example if there are any other hugepage mappings under the
same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries.  I suspect this will also cause bad problems on ia64, though I don't
have a machine to test it on."

There are still comments in the unmap code about special handling of
ppc64 PUDs.  So, this may still be an issue.

I am trying to dig into the code to determine if this is still an
issue.  Just curious if you looked into this?  It might be simpler and
safer to just put the seal check after setting the VM_HUGETLB flag.

--
Mike Kravetz




* Re: [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child
  2021-05-01 14:41 ` [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child Peter Xu
@ 2021-05-03 20:53   ` Mike Kravetz
  2021-05-03 21:41     ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2021-05-03 20:53 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Axel Rasmussen

On 5/1/21 7:41 AM, Peter Xu wrote:
> When fork() and copy hugetlb page range, we'll remember to wrprotect src pte if
> needed, however we forget about the child!  Without it, the child will be able
> to write to parent's pages when mapped as PROT_READ|PROT_WRITE and MAP_PRIVATE,
> which will cause data corruption in the parent process.
> 
> This issue can also be exposed by "memfd_test hugetlbfs" kselftest (if it can
> pass the F_SEAL_FUTURE_WRITE test first, though).
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/hugetlb.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

I think we need to add, "Fixes: 4eae4efa2c29" as this is now in v5.12
-- 
Mike Kravetz


* Re: [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE
  2021-05-03 18:55   ` Mike Kravetz
@ 2021-05-03 21:31     ` Peter Xu
  2021-05-03 22:28       ` Mike Kravetz
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2021-05-03 21:31 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Mike Kravetz, Axel Rasmussen

Mike,

On Mon, May 03, 2021 at 11:55:41AM -0700, Mike Kravetz wrote:
> On 5/1/21 7:41 AM, Peter Xu wrote:
> > F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
> > There is a test program for that and it fails constantly.
> > 
> > $ ./memfd_test hugetlbfs
> > memfd-hugetlb: CREATE
> > memfd-hugetlb: BASIC
> > memfd-hugetlb: SEAL-WRITE
> > memfd-hugetlb: SEAL-FUTURE-WRITE
> > mmap() didn't fail as expected
> > Aborted (core dumped)
> > 
> > I think it's probably because no one is really running the hugetlbfs test.
> > 
> > Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in
> > shmem_mmap().  Generalize a helper for that.
> > 
> > Reported-by: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  fs/hugetlbfs/inode.c |  5 +++++
> >  include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
> >  mm/shmem.c           | 22 ++++------------------
> >  3 files changed, 41 insertions(+), 18 deletions(-)
> 
> Thanks Peter and Hugh!
> 
> One question below,
> 
> > 
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index a2a42335e8fd2..39922c0f2fc8c 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
> >  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> >  {
> >  	struct inode *inode = file_inode(file);
> > +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
> >  	loff_t len, vma_len;
> >  	int ret;
> >  	struct hstate *h = hstate_file(file);
> >  
> > +	ret = seal_check_future_write(info->seals, vma);
> > +	if (ret)
> > +		return ret;
> > +
> >  	/*
> >  	 * vma address alignment (but not the pgoff alignment) has
> >  	 * already been checked by prepare_hugepage_range.  If you add
> 
> The full comment below the code you added is:
> 
> 	/*
> 	 * vma address alignment (but not the pgoff alignment) has
> 	 * already been checked by prepare_hugepage_range.  If you add
> 	 * any error returns here, do so after setting VM_HUGETLB, so
> 	 * is_vm_hugetlb_page tests below unmap_region go the right
> 	 * way when do_mmap unwinds (may be important on powerpc
> 	 * and ia64).
> 	 */
> 
> This comment was added in commit 68589bc35303 by Hugh, although it
> appears David Gibson added the reason for the comment in the commit
> message:
> 
> "If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
> because the given file offset is not hugepage aligned - then do_mmap_pgoff
> will go to the unmap_and_free_vma backout path.
> 
> But at this stage the vma hasn't been marked as hugepage, and the backout path
> will call unmap_region() on it.  That will eventually call down to the
> non-hugepage version of unmap_page_range().  On ppc64, at least, that will
> cause serious problems if there are any existing hugepage pagetable entries in
> the vicinity - for example if there are any other hugepage mappings under the
> same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
> entries.  I suspect this will also cause bad problems on ia64, though I don't
> have a machine to test it on."
> 
> There are still comments in the unmap code about special handling of
> ppc64 PUDs.  So, this may still be an issue.
> 
> I am trying to dig into the code to determine if this is still and
> issue.  Just curious if you looked into this?  Might be simpler and
> safer to just put the seal check after setting the VM_HUGETLB flag?

Good catch!  I overlooked that, and I definitely haven't looked into it yet.
For now I'd better move the check to after the flags are set, in all cases.

I'll also add:

Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")

Thanks,

-- 
Peter Xu



* Re: [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child
  2021-05-03 20:53   ` Mike Kravetz
@ 2021-05-03 21:41     ` Peter Xu
  2021-05-03 22:10       ` Mike Kravetz
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2021-05-03 21:41 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Axel Rasmussen

On Mon, May 03, 2021 at 01:53:03PM -0700, Mike Kravetz wrote:
> On 5/1/21 7:41 AM, Peter Xu wrote:
> > When fork() and copy hugetlb page range, we'll remember to wrprotect src pte if
> > needed, however we forget about the child!  Without it, the child will be able
> > to write to parent's pages when mapped as PROT_READ|PROT_WRITE and MAP_PRIVATE,
> > which will cause data corruption in the parent process.
> > 
> > This issue can also be exposed by "memfd_test hugetlbfs" kselftest (if it can
> > pass the F_SEAL_FUTURE_WRITE test first, though).
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/hugetlb.c | 2 ++
> >  1 file changed, 2 insertions(+)
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

Thanks!

> 
> I think we need to add, "Fixes: 4eae4efa2c29" as this is now in v5.12

I could be mistaken, but my understanding is that it has been broken since the
initial cow support for hugetlbfs in 2006...  So if we want a Fixes tag, maybe
this?

Fixes: 1e8f889b10d8d ("[PATCH] Hugetlb: Copy on Write support")

-- 
Peter Xu



* Re: [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child
  2021-05-03 21:41     ` Peter Xu
@ 2021-05-03 22:10       ` Mike Kravetz
  2021-05-03 22:24         ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2021-05-03 22:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-mm, linux-kernel, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Axel Rasmussen

On 5/3/21 2:41 PM, Peter Xu wrote:
> On Mon, May 03, 2021 at 01:53:03PM -0700, Mike Kravetz wrote:
>> On 5/1/21 7:41 AM, Peter Xu wrote:
>>> When fork() and copy hugetlb page range, we'll remember to wrprotect src pte if
>>> needed, however we forget about the child!  Without it, the child will be able
>>> to write to parent's pages when mapped as PROT_READ|PROT_WRITE and MAP_PRIVATE,
>>> which will cause data corruption in the parent process.
>>>
>>> This issue can also be exposed by "memfd_test hugetlbfs" kselftest (if it can
>>> pass the F_SEAL_FUTURE_WRITE test first, though).
>>>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  mm/hugetlb.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Thanks!
> 
>>
>> I think we need to add, "Fixes: 4eae4efa2c29" as this is now in v5.12
> 
> I could be mistaken, but my understanding is it's broken from the most initial
> cow support of hugetlbfs in 2006...  So if we want a fixes tag, maybe this?
> 
> Fixes: 1e8f889b10d8d ("[PATCH] Hugetlb: Copy on Write support")
> 

Here is why I think it was broken in 4eae4efa2c29.  Prior to that commit
the code looked like this:

			if (cow) {
				/*
				 * No need to notify as we are downgrading page
				 * table protection not changing it to point
				 * to a new page.
				 *
				 * See Documentation/vm/mmu_notifier.rst
				 */
				huge_ptep_set_wrprotect(src, addr, src_pte);
			}
			entry = huge_ptep_get(src_pte);
			ptepage = pte_page(entry);
			get_page(ptepage);
			page_dup_rmap(ptepage, true);
			set_huge_pte_at(dst, addr, dst_pte, entry);
			hugetlb_count_add(pages_per_huge_page(h), dst);

After setting the wrprotect in the source pte, we 'huge_ptep_get' the
source to create the destination.  Hence, wrprotect will be set in the
destination as well.  It is perhaps not the most efficient, but
I think it 'works'.

It is subtle, or am I missing something?
-- 
Mike Kravetz


* Re: [PATCH 2/2] mm/hugetlb: Fix cow where page writtable in child
  2021-05-03 22:10       ` Mike Kravetz
@ 2021-05-03 22:24         ` Peter Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2021-05-03 22:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Axel Rasmussen

On Mon, May 03, 2021 at 03:10:04PM -0700, Mike Kravetz wrote:
> On 5/3/21 2:41 PM, Peter Xu wrote:
> > On Mon, May 03, 2021 at 01:53:03PM -0700, Mike Kravetz wrote:
> >> On 5/1/21 7:41 AM, Peter Xu wrote:
> >>> When fork() and copy hugetlb page range, we'll remember to wrprotect src pte if
> >>> needed, however we forget about the child!  Without it, the child will be able
> >>> to write to parent's pages when mapped as PROT_READ|PROT_WRITE and MAP_PRIVATE,
> >>> which will cause data corruption in the parent process.
> >>>
> >>> This issue can also be exposed by "memfd_test hugetlbfs" kselftest (if it can
> >>> pass the F_SEAL_FUTURE_WRITE test first, though).
> >>>
> >>> Signed-off-by: Peter Xu <peterx@redhat.com>
> >>> ---
> >>>  mm/hugetlb.c | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>
> >> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > 
> > Thanks!
> > 
> >>
> >> I think we need to add, "Fixes: 4eae4efa2c29" as this is now in v5.12
> > 
> > I could be mistaken, but my understanding is it's broken from the most initial
> > cow support of hugetlbfs in 2006...  So if we want a fixes tag, maybe this?
> > 
> > Fixes: 1e8f889b10d8d ("[PATCH] Hugetlb: Copy on Write support")
> > 
> 
> Here is why I think it was broken in 4eae4efa2c29.  Prior to that commit
> the code looked like this:
> 
> 			if (cow) {
> 				/*
> 				 * No need to notify as we are downgrading page
> 				 * table protection not changing it to point
> 				 * to a new page.
> 				 *
> 				 * See Documentation/vm/mmu_notifier.rst
> 				 */
> 				huge_ptep_set_wrprotect(src, addr, src_pte);
> 			}
> 			entry = huge_ptep_get(src_pte);
> 			ptepage = pte_page(entry);
> 			get_page(ptepage);
> 			page_dup_rmap(ptepage, true);
> 			set_huge_pte_at(dst, addr, dst_pte, entry);
> 			hugetlb_count_add(pages_per_huge_page(h), dst);
> 
> After setting the wrprotect in the source pte, we 'huge_ptep_get' the
> source to create the destination.  Hence, wrprotect will be set in the
> destination as well.  It is perhaps not the most efficient, but
> I think it 'works'.
> 
> It is subtle, or am I missing something?

You're right, thanks Mike.  I'll repost and add the correct Fixes tag.

-- 
Peter Xu



* Re: [PATCH 1/2] mm/hugetlb: Fix F_SEAL_FUTURE_WRITE
  2021-05-03 21:31     ` Peter Xu
@ 2021-05-03 22:28       ` Mike Kravetz
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2021-05-03 22:28 UTC (permalink / raw)
  To: Peter Xu, Mike Kravetz
  Cc: linux-mm, linux-kernel, Hugh Dickins, Andrew Morton,
	Andrea Arcangeli, Axel Rasmussen

On 5/3/21 2:31 PM, Peter Xu wrote:
> Mike,
> 
> On Mon, May 03, 2021 at 11:55:41AM -0700, Mike Kravetz wrote:
>> On 5/1/21 7:41 AM, Peter Xu wrote:
>>> F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
>>> There is a test program for that and it fails constantly.
>>>
>>> $ ./memfd_test hugetlbfs
>>> memfd-hugetlb: CREATE
>>> memfd-hugetlb: BASIC
>>> memfd-hugetlb: SEAL-WRITE
>>> memfd-hugetlb: SEAL-FUTURE-WRITE
>>> mmap() didn't fail as expected
>>> Aborted (core dumped)
>>>
>>> I think it's probably because no one is really running the hugetlbfs test.
>>>
>>> Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in
>>> shmem_mmap().  Generalize a helper for that.
>>>
>>> Reported-by: Hugh Dickins <hughd@google.com>
>>> Signed-off-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>  fs/hugetlbfs/inode.c |  5 +++++
>>>  include/linux/mm.h   | 32 ++++++++++++++++++++++++++++++++
>>>  mm/shmem.c           | 22 ++++------------------
>>>  3 files changed, 41 insertions(+), 18 deletions(-)
>>
>> Thanks Peter and Hugh!
>>
>> One question below,
>>
>>>
>>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>>> index a2a42335e8fd2..39922c0f2fc8c 100644
>>> --- a/fs/hugetlbfs/inode.c
>>> +++ b/fs/hugetlbfs/inode.c
>>> @@ -131,10 +131,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
>>>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>>>  {
>>>  	struct inode *inode = file_inode(file);
>>> +	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
>>>  	loff_t len, vma_len;
>>>  	int ret;
>>>  	struct hstate *h = hstate_file(file);
>>>  
>>> +	ret = seal_check_future_write(info->seals, vma);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>>  	/*
>>>  	 * vma address alignment (but not the pgoff alignment) has
>>>  	 * already been checked by prepare_hugepage_range.  If you add
>>
>> The full comment below the code you added is:
>>
>> 	/*
>> 	 * vma address alignment (but not the pgoff alignment) has
>> 	 * already been checked by prepare_hugepage_range.  If you add
>> 	 * any error returns here, do so after setting VM_HUGETLB, so
>> 	 * is_vm_hugetlb_page tests below unmap_region go the right
>> 	 * way when do_mmap unwinds (may be important on powerpc
>> 	 * and ia64).
>> 	 */
>>
>> This comment was added in commit 68589bc35303 by Hugh, although it
>> appears David Gibson added the reason for the comment in the commit
>> message:
>>
>> "If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
>> because the given file offset is not hugepage aligned - then do_mmap_pgoff
>> will go to the unmap_and_free_vma backout path.
>>
>> But at this stage the vma hasn't been marked as hugepage, and the backout path
>> will call unmap_region() on it.  That will eventually call down to the
>> non-hugepage version of unmap_page_range().  On ppc64, at least, that will
>> cause serious problems if there are any existing hugepage pagetable entries in
>> the vicinity - for example if there are any other hugepage mappings under the
>> same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
>> entries.  I suspect this will also cause bad problems on ia64, though I don't
>> have a machine to test it on."
>>
>> There are still comments in the unmap code about special handling of
>> ppc64 PUDs.  So, this may still be an issue.
>>
>> I am trying to dig into the code to determine if this is still and
>> issue.  Just curious if you looked into this?  Might be simpler and
>> safer to just put the seal check after setting the VM_HUGETLB flag?
> 
> Good catch!  I overlooked on that, and I definitely didn't look into it yet.
> For now I'd better move that check to be after the flag settings in all cases.
> 
> I'll also add:
> 
> Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
> 

Thanks!  With those changes, you can add,

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz
