linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTNUE error handling + accounting
@ 2021-03-25 23:10 Axel Rasmussen
  2021-03-27 21:57 ` Peter Xu
  0 siblings, 1 reply; 2+ messages in thread
From: Axel Rasmussen @ 2021-03-25 23:10 UTC (permalink / raw)
  To: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Rapoport,
	Peter Xu, Shaohua Li, Shuah Khan, Stephen Rothwell, Wang Qing
  Cc: linux-kernel, linux-fsdevel, linux-mm, linux-kselftest,
	Axel Rasmussen, Brian Geffon, Cannon Matthews,
	Dr . David Alan Gilbert, David Rientjes, Michel Lespinasse,
	Mina Almasry, Oliver Upton

Previously, in the error path, we unconditionally removed the page from
the page cache. But in the continue case, we didn't add it - it was
already there because the page is used by a second (non-UFFD-registered)
mapping. So, in that case, it's incorrect to remove it as the other
mapping may still use it normally.

For this error handling failure, trivially exercise it in the
userfaultfd selftest, to detect this kind of bug in the future.

Also, we previously were unconditionally calling shmem_inode_acct_block.
In the continue case, however, this is incorrect, because we would have
already accounted for the RAM usage when the page was originally
allocated (since at this point it's already in the page cache). So,
doing it in the continue case causes us to double-count.

Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
---
 mm/shmem.c                               | 15 ++++++++++-----
 tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..5ac8ea737004 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2379,9 +2379,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	int ret;
 	pgoff_t offset, max_off;
 
-	ret = -ENOMEM;
-	if (!shmem_inode_acct_block(inode, 1))
-		goto out;
+	if (!is_continue) {
+		ret = -ENOMEM;
+		if (!shmem_inode_acct_block(inode, 1))
+			goto out;
+	}
 
 	if (is_continue) {
 		ret = -EFAULT;
@@ -2389,6 +2391,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 		if (!page)
 			goto out_unacct_blocks;
 	} else if (!*pagep) {
+		ret = -ENOMEM;
 		page = shmem_alloc_page(gfp, info, pgoff);
 		if (!page)
 			goto out_unacct_blocks;
@@ -2486,12 +2489,14 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 out_release_unlock:
 	pte_unmap_unlock(dst_pte, ptl);
 	ClearPageDirty(page);
-	delete_from_page_cache(page);
+	if (!is_continue)
+		delete_from_page_cache(page);
 out_release:
 	unlock_page(page);
 	put_page(page);
 out_unacct_blocks:
-	shmem_inode_unacct_blocks(inode, 1);
+	if (!is_continue)
+		shmem_inode_unacct_blocks(inode, 1);
 	goto out;
 }
 #endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 static void continue_range(int ufd, __u64 start, __u64 len)
 {
 	struct uffdio_continue req;
+	int ret;
 
 	req.range.start = start;
 	req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
 	if (ioctl(ufd, UFFDIO_CONTINUE, &req))
 		err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
 		    (uint64_t)start);
+
+	/*
+	 * Error handling within the kernel for continue is subtly different
+	 * from copy or zeropage, so it may be a source of bugs. Trigger an
+	 * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+	 */
+	req.mapped = 0;
+	ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+	if (ret >= 0 || req.mapped != -EEXIST)
+		err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+		    ret, req.mapped);
 }
 
 static void *locking_thread(void *arg)
-- 
2.31.0.291.g576ba9dcdaf-goog



^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTNUE error handling + accounting
  2021-03-25 23:10 [PATCH] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTNUE error handling + accounting Axel Rasmussen
@ 2021-03-27 21:57 ` Peter Xu
  0 siblings, 0 replies; 2+ messages in thread
From: Peter Xu @ 2021-03-27 21:57 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Alexander Viro, Andrea Arcangeli, Andrew Morton, Hugh Dickins,
	Jerome Glisse, Joe Perches, Lokesh Gidra, Mike Rapoport,
	Shaohua Li, Shuah Khan, Stephen Rothwell, Wang Qing,
	linux-kernel, linux-fsdevel, linux-mm, linux-kselftest,
	Brian Geffon, Cannon Matthews, Dr . David Alan Gilbert,
	David Rientjes, Michel Lespinasse, Mina Almasry, Oliver Upton

Axel,

On Thu, Mar 25, 2021 at 04:10:27PM -0700, Axel Rasmussen wrote:
> Previously, in the error path, we unconditionally removed the page from
> the page cache. But in the continue case, we didn't add it - it was
> already there because the page is used by a second (non-UFFD-registered)
> mapping. So, in that case, it's incorrect to remove it as the other
> mapping may still use it normally.
> 
> For this error handling failure, trivially exercise it in the
> userfaultfd selftest, to detect this kind of bug in the future.
> 
> Also, we previously were unconditionally calling shmem_inode_acct_block.
> In the continue case, however, this is incorrect, because we would have
> already accounted for the RAM usage when the page was originally
> allocated (since at this point it's already in the page cache). So,
> doing it in the continue case causes us to double-count.
> 
> Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> ---
>  mm/shmem.c                               | 15 ++++++++++-----
>  tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++++
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d2e0e81b7d2e..5ac8ea737004 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2379,9 +2379,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  	int ret;
>  	pgoff_t offset, max_off;
>  
> -	ret = -ENOMEM;
> -	if (!shmem_inode_acct_block(inode, 1))

IMHO a better change here is to only touch this line into:

        if (!is_continue && !shmem_inode_acct_block(inode, 1))

Then you don't need to touch any other line, also you can drop line [1] below
too as a side benefit.

> -		goto out;
> +	if (!is_continue) {
> +		ret = -ENOMEM;
> +		if (!shmem_inode_acct_block(inode, 1))
> +			goto out;
> +	}
>  
>  	if (is_continue) {
>  		ret = -EFAULT;
> @@ -2389,6 +2391,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  		if (!page)
>  			goto out_unacct_blocks;
>  	} else if (!*pagep) {
> +		ret = -ENOMEM;

[1]

>  		page = shmem_alloc_page(gfp, info, pgoff);
>  		if (!page)
>  			goto out_unacct_blocks;
> @@ -2486,12 +2489,14 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
>  out_release_unlock:
>  	pte_unmap_unlock(dst_pte, ptl);
>  	ClearPageDirty(page);

Should this be conditional too?  IIUC this will clear dirty for the page cache
even if it was dirty.  I'm not sure whether it'll cause data loss.

> -	delete_from_page_cache(page);
> +	if (!is_continue)
> +		delete_from_page_cache(page);
>  out_release:
>  	unlock_page(page);
>  	put_page(page);
>  out_unacct_blocks:
> -	shmem_inode_unacct_blocks(inode, 1);
> +	if (!is_continue)
> +		shmem_inode_unacct_blocks(inode, 1);
>  	goto out;
>  }

Besides the error handling, I looked at the function again and I have another
two thoughts:

1. IMHO in shmem_mcopy_atomic_pte() we should also conditionally call
   pte_mkwrite() just like the hugetlb code too, so as to keep W bit clear when
   !VM_SHARED.

2. I see even more "if (!is_continue)" here.. I'm thinking whether we can
   simply jump to pte installation "if (is_continue)" instead, because
   uffdio-continue shoiuld really be a lightweight operation.

   E.g., most of the things at the middle of the function is not relevant to
   uffd-continue.  To be explicit:

	ret = -EFAULT;
	offset = linear_page_index(dst_vma, dst_addr);
	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
	if (unlikely(offset >= max_off))
		goto out_release;

   These chunk can be put into !is_continue too, then if you see you're
   bypassing mostly everything.  Then error handling of uffdio-continue would
   be simple too, since it could only fail if either page cache not exist, or
   pte changed.  Nothing else could happen.

For above point (1), I even start to doubt whether we should conditionally
grant the write bit for normal uffdio_copy case only if both WRITE|SHARED set
for the vma flags. E.g., shmem_mcopy_atomic_pte() of a normal uffdio-copy will
fill in the page cache into pte, however what if this mapping is privately
mapped?  IMHO we can't apply write bit otherwise the process will be writting
to the page cache directly.

However I think that question will be irrelevant to this patch.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-27 21:57 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25 23:10 [PATCH] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTNUE error handling + accounting Axel Rasmussen
2021-03-27 21:57 ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).