* [PATCH 0/2] mm/uffd: Fix vma merge/split
@ 2023-05-17 15:04 Peter Xu
  2023-05-17 15:04 ` [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma Peter Xu
  2023-05-17 15:04 ` [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible Peter Xu
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Xu @ 2023-05-17 15:04 UTC
  To: linux-kernel, linux-mm
  Cc: Lorenzo Stoakes, Andrew Morton, Liam R . Howlett, Mark Rutland,
	Andrea Arcangeli, Mike Rapoport, peterx, Alexander Viro

This series contains two patches that fix two separate vma merge/split
issues in userfaultfd.  The patchset is based on akpm/mm-hotfixes-unstable
with 2f628010799e reverted (patch 1 is meant to replace that commit, which
seems to be the plan we reached).

The major changes compared to the patches I attached to the reply:

  - Fixed up patch 1 for the vma_prev() side effect pointed out by Liam.  I
    further simplified it to just bring back the two missing lines, so it
    is even shorter.

  - Added Fixes tags to both patches.  I decided to copy stable on both
    patches in this version, even though patch 2 is more or less tentative
    (as I don't see anything going wrong besides the vma not being merged).

Patch 1 fixes a regression in 6.1+ caused by something we overlooked when
converting to the maple tree APIs.  The plan is to use patch 1 to replace
commit 2f628010799e ("mm: userfaultfd: avoid passing an invalid range to
vma_merge()") in the mm-hotfixes-unstable tree if possible, so as to bring
the uffd vma operations back in line with the rest of the code.

Patch 2 fixes a long-standing issue where a vma can be left unmerged, even
though it could be merged, after either a uffd register or unregister.

Many thanks to Lorenzo for noticing this issue from the assert movement
patch, for looking into the problem, and also for providing a reproducer
for the unmerged vma issue [1].

Please have a look, thanks.

[1] https://gist.github.com/lorenzo-stoakes/a11a10f5f479e7a977fc456331266e0e

Peter Xu (2):
  mm/uffd: Fix vma operation where start addr cuts part of vma
  mm/uffd: Allow vma to merge as much as possible

 fs/userfaultfd.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

-- 
2.39.1



* [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 15:04 [PATCH 0/2] mm/uffd: Fix vma merge/split Peter Xu
@ 2023-05-17 15:04 ` Peter Xu
  2023-05-17 17:20   ` Lorenzo Stoakes
  2023-05-17 18:01   ` Liam R. Howlett
  2023-05-17 15:04 ` [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible Peter Xu
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Xu @ 2023-05-17 15:04 UTC
  To: linux-kernel, linux-mm
  Cc: Lorenzo Stoakes, Andrew Morton, Liam R . Howlett, Mark Rutland,
	Andrea Arcangeli, Mike Rapoport, peterx, Alexander Viro,
	linux-stable

It seems vma merging in the uffd paths is broken for both register and
unregister: right now we can feed wrong parameters to vma_merge().  This
was found by a recent patch from Lorenzo Stoakes which moved the asserts
upwards in vma_merge():

https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/

The problem is that in the current code base we don't fix up "prev" for the
case where the "start" address falls within the "prev" vma section.  In
that case "prev" should point to the current vma rather than the previous
one when it is fed to vma_merge().

This patch eliminates the report and makes sure the vma_merge() calls
become legal again.
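
As an illustration only, the "prev" selection described above can be
modelled in userspace.  This is a sketch under assumptions, not the kernel
code: struct toy_vma and toy_pick_prev() are hypothetical names, and the
sorted array stands in for the maple tree walk done by vma_iter_set() and
vma_prev().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
};

/*
 * Return the index of the vma to use as "prev" for an operation that
 * begins at "start", or -1 if there is none.  vmas[] must be sorted by
 * address and non-overlapping.
 */
static int toy_pick_prev(const struct toy_vma *vmas, size_t n,
			 unsigned long start)
{
	int prev = -1;
	size_t i;

	assert(vmas != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (vmas[i].vm_end <= start) {
			/* Entirely below start: plain predecessor. */
			prev = (int)i;
			continue;
		}
		/*
		 * First vma ending above start.  If start cuts into it,
		 * this vma itself becomes "prev" (the two added lines).
		 */
		if (vmas[i].vm_start < start)
			prev = (int)i;
		break;
	}
	return prev;
}
```

With vmas covering [0x1000, 0x3000) and [0x3000, 0x6000), a start of 0x3000
keeps the first vma as "prev", while a start of 0x4000 cuts into the second
vma and so selects the second vma instead.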

One thing to mention is that "Fixes: 29417d292bd0" below is there only to
help explain where the warning can start to trigger; the real commit to
fix is 69dbe6daf104.  Commit 29417d292bd0 only helps us to identify the
issue, but we may want to keep it in Fixes too, just to make tracking
easier for kernel backporters.

Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 29417d292bd0 ("mm/mmap/vma_merge: always check invariants")
Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0fd96d6e39ce..17c8c345dac4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1459,6 +1459,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 
 	vma_iter_set(&vmi, start);
 	prev = vma_prev(&vmi);
+	if (vma->vm_start < start)
+		prev = vma;
 
 	ret = 0;
 	for_each_vma_range(vmi, vma, end) {
@@ -1625,6 +1627,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 
 	vma_iter_set(&vmi, start);
 	prev = vma_prev(&vmi);
+	if (vma->vm_start < start)
+		prev = vma;
+
 	ret = 0;
 	for_each_vma_range(vmi, vma, end) {
 		cond_resched();
-- 
2.39.1



* [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
  2023-05-17 15:04 [PATCH 0/2] mm/uffd: Fix vma merge/split Peter Xu
  2023-05-17 15:04 ` [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma Peter Xu
@ 2023-05-17 15:04 ` Peter Xu
  2023-05-17 17:23   ` Lorenzo Stoakes
  2023-05-17 18:01   ` Liam R. Howlett
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Xu @ 2023-05-17 15:04 UTC
  To: linux-kernel, linux-mm
  Cc: Lorenzo Stoakes, Andrew Morton, Liam R . Howlett, Mark Rutland,
	Andrea Arcangeli, Mike Rapoport, peterx, Alexander Viro,
	linux-stable

We did not pass in the pgoff correctly when registering/unregistering uffd
regions.  This caused incorrect vma merging behavior and can leave
mergeable vmas separate after the ioctls return.

For example, when we have:

  vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)

If someone then unregisters uffd on range (5-9), it should logically become:

  vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)

But with the current code we'll have:

  vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)

This patch allows such a merge to happen correctly before the ioctl returns.

This behavior seems to have existed since the first day of uffd.  The pgoff
passed to vma_merge() is only used to decide whether vmas can be merged,
and what we did here was always pass in a pgoff smaller than we should, so
there should be no other side effect besides not merging.  Let's still
tentatively copy stable for this, even though I don't see anything going
wrong besides the vma being split (which is mostly not user visible).
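
The arithmetic behind the example above can be sketched in userspace as
follows.  This is a simplified model under assumptions, not the kernel's
merge logic: struct toy_vma, TOY_PAGE_SHIFT, toy_pgoff_at() and
toy_offsets_join_next() are hypothetical names, a 4 KiB page size is
assumed, and only the page offset condition is modelled (flags, anon_vma,
file and policy checks are ignored).

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the kernel's struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_start;	/* inclusive */
	unsigned long vm_end;	/* exclusive */
	unsigned long vm_pgoff;	/* page offset of vm_start */
};

/* Page offset of address "start" inside "vma" (the patch's formula). */
static unsigned long toy_pgoff_at(const struct toy_vma *vma,
				  unsigned long start)
{
	assert(start >= vma->vm_start && start < vma->vm_end);
	return vma->vm_pgoff + ((start - vma->vm_start) >> TOY_PAGE_SHIFT);
}

/*
 * Simplified offset check: a range [start, end) whose first page has
 * offset "pgoff" lines up with "next" only if it ends where "next"
 * begins and the page offsets are contiguous.
 */
static bool toy_offsets_join_next(unsigned long pgoff, unsigned long start,
				  unsigned long end,
				  const struct toy_vma *next)
{
	unsigned long pages = (end - start) >> TOY_PAGE_SHIFT;

	return end == next->vm_start && pgoff + pages == next->vm_pgoff;
}
```

Taking vma1 as pages 0-9 with vm_pgoff 0 and vma2 as pages 10-19 with
vm_pgoff 10, unregistering pages 5-9 with the old vma->vm_pgoff (0) gives
0 + 5 != 10, so the offsets do not line up.  With the corrected pgoff (5)
the sum is 10 and the range lines up with vma2.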

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: linux-stable <stable@vger.kernel.org>
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 17c8c345dac4..4e800bb7d2ab 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	bool basic_ioctls;
 	unsigned long start, end, vma_end;
 	struct vma_iterator vmi;
+	pgoff_t pgoff;
 
 	user_uffdio_register = (struct uffdio_register __user *) arg;
 
@@ -1484,8 +1485,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma_end = min(end, vma->vm_end);
 
 		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
+		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
-				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
+				 vma->anon_vma, vma->vm_file, pgoff,
 				 vma_policy(vma),
 				 ((struct vm_userfaultfd_ctx){ ctx }),
 				 anon_vma_name(vma));
@@ -1565,6 +1567,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 	unsigned long start, end, vma_end;
 	const void __user *buf = (void __user *)arg;
 	struct vma_iterator vmi;
+	pgoff_t pgoff;
 
 	ret = -EFAULT;
 	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
@@ -1667,8 +1670,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			uffd_wp_range(vma, start, vma_end - start, false);
 
 		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
+		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
 		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
-				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
+				 vma->anon_vma, vma->vm_file, pgoff,
 				 vma_policy(vma),
 				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
 		if (prev) {
-- 
2.39.1



* Re: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 15:04 ` [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma Peter Xu
@ 2023-05-17 17:20   ` Lorenzo Stoakes
  2023-05-17 18:37     ` Peter Xu
  2023-05-17 18:01   ` Liam R. Howlett
  1 sibling, 1 reply; 12+ messages in thread
From: Lorenzo Stoakes @ 2023-05-17 17:20 UTC
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 11:04:07AM -0400, Peter Xu wrote:
> It seems vma merging with uffd paths is broken with either
> register/unregister, where right now we can feed wrong parameters to
> vma_merge() and it's found by recent patch which moved asserts upwards in
> vma_merge() by Lorenzo Stoakes:
>
> https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
>
> The problem is in the current code base we didn't fixup "prev" for the case
> where "start" address can be within the "prev" vma section.  In that case
> we should have "prev" points to the current vma rather than the previous
> one when feeding to vma_merge().

This doesn't seem quite correct, perhaps - "where start is contained within vma
but not clamped to its start. We need to convert this into case 4 which permits
subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA
will be clamped to the start."

>
> This patch will eliminate the report and make sure vma_merge() calls will
> become legal again.
>
> One thing to mention is that the "Fixes: 29417d292bd0" below is there only
> to help explain where the warning can start to trigger, the real commit to
> fix should be 69dbe6daf104.  Commit 29417d292bd0 helps us to identify the
> issue, but unfortunately we may want to keep it in Fixes too just to ease
> kernel backporters for easier tracking.
>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Fixes: 29417d292bd0 ("mm/mmap/vma_merge: always check invariants")
> Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
> Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> Cc: linux-stable <stable@vger.kernel.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  fs/userfaultfd.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 0fd96d6e39ce..17c8c345dac4 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1459,6 +1459,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>
>  	vma_iter_set(&vmi, start);
>  	prev = vma_prev(&vmi);
> +	if (vma->vm_start < start)
> +		prev = vma;
>
>  	ret = 0;
>  	for_each_vma_range(vmi, vma, end) {
> @@ -1625,6 +1627,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>
>  	vma_iter_set(&vmi, start);
>  	prev = vma_prev(&vmi);
> +	if (vma->vm_start < start)
> +		prev = vma;
> +
>  	ret = 0;
>  	for_each_vma_range(vmi, vma, end) {
>  		cond_resched();
> --
> 2.39.1
>

Other than that looks good:-

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
  2023-05-17 15:04 ` [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible Peter Xu
@ 2023-05-17 17:23   ` Lorenzo Stoakes
  2023-05-17 18:39     ` Peter Xu
  2023-05-17 18:01   ` Liam R. Howlett
  1 sibling, 1 reply; 12+ messages in thread
From: Lorenzo Stoakes @ 2023-05-17 17:23 UTC
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 11:04:08AM -0400, Peter Xu wrote:
> We used to not pass in the pgoff correctly when register/unregister uffd
> regions, it caused incorrect behavior on vma merging and can cause
> mergeable vmas being separate after ioctls return.
>
> For example, when we have:
>
>   vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
>
> Then someone unregisters uffd on range (5-9), it should logically become:
>
>   vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
>
> But with current code we'll have:
>
>   vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
>
> This patch allows such merge to happen correctly before ioctl returns.
>
> This behavior seems to have existed since the 1st day of uffd.  Since pgoff
> for vma_merge() is only used to identify the possibility of vma merging,
> meanwhile here what we did was always passing in a pgoff smaller than what
> we should, so there should have no other side effect besides not merging
> it.  Let's still tentatively copy stable for this, even though I don't see
> anything will go wrong besides vma being split (which is mostly not user
> visible).
>

Maybe a Reported-by me since I discovered the fragmentation was already
happening via the repro? :)

> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  fs/userfaultfd.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 17c8c345dac4..4e800bb7d2ab 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	bool basic_ioctls;
>  	unsigned long start, end, vma_end;
>  	struct vma_iterator vmi;
> +	pgoff_t pgoff;
>
>  	user_uffdio_register = (struct uffdio_register __user *) arg;
>
> @@ -1484,8 +1485,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  		vma_end = min(end, vma->vm_end);
>
>  		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> +				 vma->anon_vma, vma->vm_file, pgoff,
>  				 vma_policy(vma),
>  				 ((struct vm_userfaultfd_ctx){ ctx }),
>  				 anon_vma_name(vma));
> @@ -1565,6 +1567,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	unsigned long start, end, vma_end;
>  	const void __user *buf = (void __user *)arg;
>  	struct vma_iterator vmi;
> +	pgoff_t pgoff;
>
>  	ret = -EFAULT;
>  	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
> @@ -1667,8 +1670,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  			uffd_wp_range(vma, start, vma_end - start, false);
>
>  		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> +				 vma->anon_vma, vma->vm_file, pgoff,
>  				 vma_policy(vma),
>  				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
>  		if (prev) {
> --
> 2.39.1
>

Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>


* Re: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 15:04 ` [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma Peter Xu
  2023-05-17 17:20   ` Lorenzo Stoakes
@ 2023-05-17 18:01   ` Liam R. Howlett
  1 sibling, 0 replies; 12+ messages in thread
From: Liam R. Howlett @ 2023-05-17 18:01 UTC
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Lorenzo Stoakes, Andrew Morton,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

* Peter Xu <peterx@redhat.com> [230517 11:04]:
> It seems vma merging with uffd paths is broken with either
> register/unregister, where right now we can feed wrong parameters to
> vma_merge() and it's found by recent patch which moved asserts upwards in
> vma_merge() by Lorenzo Stoakes:
> 
> https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> 
> The problem is in the current code base we didn't fixup "prev" for the case
> where "start" address can be within the "prev" vma section.  In that case
> we should have "prev" points to the current vma rather than the previous
> one when feeding to vma_merge().
> 
> This patch will eliminate the report and make sure vma_merge() calls will
> become legal again.
> 
> One thing to mention is that the "Fixes: 29417d292bd0" below is there only
> to help explain where the warning can start to trigger, the real commit to
> fix should be 69dbe6daf104.  Commit 29417d292bd0 helps us to identify the
> issue, but unfortunately we may want to keep it in Fixes too just to ease
> kernel backporters for easier tracking.
> 
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Fixes: 29417d292bd0 ("mm/mmap/vma_merge: always check invariants")
> Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
> Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> Cc: linux-stable <stable@vger.kernel.org>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  fs/userfaultfd.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 0fd96d6e39ce..17c8c345dac4 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1459,6 +1459,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  
>  	vma_iter_set(&vmi, start);
>  	prev = vma_prev(&vmi);
> +	if (vma->vm_start < start)
> +		prev = vma;
>  
>  	ret = 0;
>  	for_each_vma_range(vmi, vma, end) {
> @@ -1625,6 +1627,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  
>  	vma_iter_set(&vmi, start);
>  	prev = vma_prev(&vmi);
> +	if (vma->vm_start < start)
> +		prev = vma;
> +
>  	ret = 0;
>  	for_each_vma_range(vmi, vma, end) {
>  		cond_resched();
> -- 
> 2.39.1
> 


* Re: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
  2023-05-17 15:04 ` [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible Peter Xu
  2023-05-17 17:23   ` Lorenzo Stoakes
@ 2023-05-17 18:01   ` Liam R. Howlett
  1 sibling, 0 replies; 12+ messages in thread
From: Liam R. Howlett @ 2023-05-17 18:01 UTC
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Lorenzo Stoakes, Andrew Morton,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

* Peter Xu <peterx@redhat.com> [230517 11:04]:
> We used to not pass in the pgoff correctly when register/unregister uffd
> regions, it caused incorrect behavior on vma merging and can cause
> mergeable vmas being separate after ioctls return.
> 
> For example, when we have:
> 
>   vma1(range 0-9, with uffd), vma2(range 10-19, no uffd)
> 
> Then someone unregisters uffd on range (5-9), it should logically become:
> 
>   vma1(range 0-4, with uffd), vma2(range 5-19, no uffd)
> 
> But with current code we'll have:
> 
>   vma1(range 0-4, with uffd), vma3(range 5-9, no uffd), vma2(range 10-19, no uffd)
> 
> This patch allows such merge to happen correctly before ioctl returns.
> 
> This behavior seems to have existed since the 1st day of uffd.  Since pgoff
> for vma_merge() is only used to identify the possibility of vma merging,
> meanwhile here what we did was always passing in a pgoff smaller than what
> we should, so there should have no other side effect besides not merging
> it.  Let's still tentatively copy stable for this, even though I don't see
> anything will go wrong besides vma being split (which is mostly not user
> visible).
> 
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> Cc: linux-stable <stable@vger.kernel.org>
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  fs/userfaultfd.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 17c8c345dac4..4e800bb7d2ab 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1332,6 +1332,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  	bool basic_ioctls;
>  	unsigned long start, end, vma_end;
>  	struct vma_iterator vmi;
> +	pgoff_t pgoff;
>  
>  	user_uffdio_register = (struct uffdio_register __user *) arg;
>  
> @@ -1484,8 +1485,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
>  		vma_end = min(end, vma->vm_end);
>  
>  		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
> +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> +				 vma->anon_vma, vma->vm_file, pgoff,
>  				 vma_policy(vma),
>  				 ((struct vm_userfaultfd_ctx){ ctx }),
>  				 anon_vma_name(vma));
> @@ -1565,6 +1567,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  	unsigned long start, end, vma_end;
>  	const void __user *buf = (void __user *)arg;
>  	struct vma_iterator vmi;
> +	pgoff_t pgoff;
>  
>  	ret = -EFAULT;
>  	if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
> @@ -1667,8 +1670,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  			uffd_wp_range(vma, start, vma_end - start, false);
>  
>  		new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
> +		pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
>  		prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags,
> -				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> +				 vma->anon_vma, vma->vm_file, pgoff,
>  				 vma_policy(vma),
>  				 NULL_VM_UFFD_CTX, anon_vma_name(vma));
>  		if (prev) {
> -- 
> 2.39.1
> 


* Re: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 17:20   ` Lorenzo Stoakes
@ 2023-05-17 18:37     ` Peter Xu
  2023-05-17 18:40       ` Lorenzo Stoakes
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2023-05-17 18:37 UTC
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 06:20:55PM +0100, Lorenzo Stoakes wrote:
> On Wed, May 17, 2023 at 11:04:07AM -0400, Peter Xu wrote:
> > It seems vma merging with uffd paths is broken with either
> > register/unregister, where right now we can feed wrong parameters to
> > vma_merge() and it's found by recent patch which moved asserts upwards in
> > vma_merge() by Lorenzo Stoakes:
> >
> > https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> >
> > The problem is in the current code base we didn't fixup "prev" for the case
> > where "start" address can be within the "prev" vma section.  In that case
> > we should have "prev" points to the current vma rather than the previous
> > one when feeding to vma_merge().
> 
> This doesn't seem quite correct, perhaps - "where start is contained within vma
> but not clamped to its start. We need to convert this into case 4 which permits
> subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA
> will be clamped to the start."

I think it covers more than case 4 - it can also be case 0 where no merge
will happen?

> 
> >
> > This patch will eliminate the report and make sure vma_merge() calls will
> > become legal again.
> >
> > One thing to mention is that the "Fixes: 29417d292bd0" below is there only
> > to help explain where the warning can start to trigger, the real commit to
> > fix should be 69dbe6daf104.  Commit 29417d292bd0 helps us to identify the
> > issue, but unfortunately we may want to keep it in Fixes too just to ease
> > kernel backporters for easier tracking.
> >
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Reported-by: Mark Rutland <mark.rutland@arm.com>
> > Fixes: 29417d292bd0 ("mm/mmap/vma_merge: always check invariants")
> > Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
> > Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> > Cc: linux-stable <stable@vger.kernel.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  fs/userfaultfd.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 0fd96d6e39ce..17c8c345dac4 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -1459,6 +1459,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> >
> >  	vma_iter_set(&vmi, start);
> >  	prev = vma_prev(&vmi);
> > +	if (vma->vm_start < start)
> > +		prev = vma;
> >
> >  	ret = 0;
> >  	for_each_vma_range(vmi, vma, end) {
> > @@ -1625,6 +1627,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> >
> >  	vma_iter_set(&vmi, start);
> >  	prev = vma_prev(&vmi);
> > +	if (vma->vm_start < start)
> > +		prev = vma;
> > +
> >  	ret = 0;
> >  	for_each_vma_range(vmi, vma, end) {
> >  		cond_resched();
> > --
> > 2.39.1
> >
> 
> Other than that looks good:-
> 
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

Thanks to both on the quick reviews!

-- 
Peter Xu



* Re: [PATCH 2/2] mm/uffd: Allow vma to merge as much as possible
  2023-05-17 17:23   ` Lorenzo Stoakes
@ 2023-05-17 18:39     ` Peter Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2023-05-17 18:39 UTC
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 06:23:18PM +0100, Lorenzo Stoakes wrote:
> Maybe a Reported-by me since I discovered the fragmentation was already
> happening via the repro? :)

Sure!  I'll add it when/if there's a repost.  Thanks.

-- 
Peter Xu



* Re: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 18:37     ` Peter Xu
@ 2023-05-17 18:40       ` Lorenzo Stoakes
  2023-05-17 18:54         ` Peter Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Stoakes @ 2023-05-17 18:40 UTC
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 02:37:41PM -0400, Peter Xu wrote:
> On Wed, May 17, 2023 at 06:20:55PM +0100, Lorenzo Stoakes wrote:
> > On Wed, May 17, 2023 at 11:04:07AM -0400, Peter Xu wrote:
> > > It seems vma merging with uffd paths is broken with either
> > > register/unregister, where right now we can feed wrong parameters to
> > > vma_merge() and it's found by recent patch which moved asserts upwards in
> > > vma_merge() by Lorenzo Stoakes:
> > >
> > > https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> > >
> > > The problem is in the current code base we didn't fixup "prev" for the case
> > > where "start" address can be within the "prev" vma section.  In that case
> > > we should have "prev" points to the current vma rather than the previous
> > > one when feeding to vma_merge().
> >
> > This doesn't seem quite correct, perhaps - "where start is contained within vma
> > but not clamped to its start. We need to convert this into case 4 which permits
> > subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA
> > will be clamped to the start."
>
> I think it covers more than case 4 - it can also be case 0 where no merge
> will happen?

Ugh please let's not call a case that doesn't merge by a number :P but sure of
course it might also not merge.

>
> >
> > >
> > > This patch will eliminate the report and make sure vma_merge() calls will
> > > become legal again.
> > >
> > > One thing to mention is that the "Fixes: 29417d292bd0" below is there only
> > > to help explain where the warning can start to trigger, the real commit to
> > > fix should be 69dbe6daf104.  Commit 29417d292bd0 helps us to identify the
> > > issue, but unfortunately we may want to keep it in Fixes too just to ease
> > > kernel backporters for easier tracking.
> > >
> > > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > > Cc: Mike Rapoport (IBM) <rppt@kernel.org>
> > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > > Reported-by: Mark Rutland <mark.rutland@arm.com>
> > > Fixes: 29417d292bd0 ("mm/mmap/vma_merge: always check invariants")
> > > Fixes: 69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
> > > Closes: https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> > > Cc: linux-stable <stable@vger.kernel.org>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  fs/userfaultfd.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index 0fd96d6e39ce..17c8c345dac4 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> > > @@ -1459,6 +1459,8 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> > >
> > >  	vma_iter_set(&vmi, start);
> > >  	prev = vma_prev(&vmi);
> > > +	if (vma->vm_start < start)
> > > +		prev = vma;
> > >
> > >  	ret = 0;
> > >  	for_each_vma_range(vmi, vma, end) {
> > > @@ -1625,6 +1627,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
> > >
> > >  	vma_iter_set(&vmi, start);
> > >  	prev = vma_prev(&vmi);
> > > +	if (vma->vm_start < start)
> > > +		prev = vma;
> > > +
> > >  	ret = 0;
> > >  	for_each_vma_range(vmi, vma, end) {
> > >  		cond_resched();
> > > --
> > > 2.39.1
> > >
> >
> > Other than that looks good:-
> >
> > Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
>
> Thanks to both on the quick reviews!

No problem!

>
> --
> Peter Xu
>


* Re: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 18:40       ` Lorenzo Stoakes
@ 2023-05-17 18:54         ` Peter Xu
  2023-05-17 19:03           ` Lorenzo Stoakes
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2023-05-17 18:54 UTC
  To: Lorenzo Stoakes
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 07:40:59PM +0100, Lorenzo Stoakes wrote:
> On Wed, May 17, 2023 at 02:37:41PM -0400, Peter Xu wrote:
> > On Wed, May 17, 2023 at 06:20:55PM +0100, Lorenzo Stoakes wrote:
> > > On Wed, May 17, 2023 at 11:04:07AM -0400, Peter Xu wrote:
> > > > It seems vma merging with uffd paths is broken with either
> > > > register/unregister, where right now we can feed wrong parameters to
> > > > vma_merge() and it's found by recent patch which moved asserts upwards in
> > > > vma_merge() by Lorenzo Stoakes:
> > > >
> > > > https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> > > >
> > > > The problem is in the current code base we didn't fixup "prev" for the case
> > > > where "start" address can be within the "prev" vma section.  In that case
> > > > we should have "prev" points to the current vma rather than the previous
> > > > one when feeding to vma_merge().
> > >
> > > This doesn't seem quite correct, perhaps - "where start is contained within vma
> > > but not clamped to its start. We need to convert this into case 4 which permits
> > > subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA
> > > will be clamped to the start."
> >
> > I think it covers more than case 4 - it can also be case 0 where no merge
> > will happen?
> 
> Ugh please let's not call a case that doesn't merge by a number :P but sure of
> course it might also not merge.

To me the original paragraph was still fine. But if you prefer your version
(I'm perfectly fine either way, if you'd like to spell out which cases it
will trigger), it would be:

  It's possible that "start" is contained within vma but not clamped to its
  start.  We need to convert this into either the "cannot merge" case or the
  "can merge" case 4, which permits subdivision of prev by assigning vma to
  prev. As we loop, each subsequent VMA will be clamped to the start.

Does that look good to you?

Thanks,

-- 
Peter Xu
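
The "prev" fixup described in the proposed paragraph can be sketched outside
the kernel as below. This is only a toy model under stated assumptions:
struct toy_vma, toy_find_first() and toy_pick_prev() are hypothetical names
that do not exist in the kernel, VMAs are modelled as a sorted array of
half-open ranges, and "prev" is simply taken to be the array entry before the
first VMA that ends above "start". It is not how the maple tree iterator is
implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the half-open range [vm_start, vm_end). */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Return the index of the first VMA that ends above "start", or -1 if
 * there is none. The array is assumed sorted and non-overlapping.
 */
static int toy_find_first(const struct toy_vma *vmas, int nr,
			  unsigned long start)
{
	for (int i = 0; i < nr; i++)
		if (vmas[i].vm_end > start)
			return i;
	return -1;
}

/*
 * Pick the index of the "prev" that would be handed to the merge step;
 * -1 means there is no prev. This mirrors the two added lines of the
 * patch: when "start" cuts into the current VMA, the current VMA itself
 * becomes prev so that it can be subdivided.
 */
static int toy_pick_prev(const struct toy_vma *vmas, int nr,
			 unsigned long start)
{
	int cur = toy_find_first(vmas, nr, start);
	int prev;

	assert(cur >= 0);	/* the caller must pass a range that hits a VMA */

	prev = cur - 1;		/* the VMA before the current one, if any */
	if (vmas[cur].vm_start < start)
		prev = cur;	/* start lies inside cur: cur becomes prev */

	return prev;
}
```

With three adjacent toy VMAs covering 0x1000-0x3000, 0x3000-0x6000 and
0x6000-0x8000, a "start" of 0x3000 is clamped to the second VMA's start, so
prev stays the first VMA; a "start" of 0x4000 cuts into the second VMA, so
the second VMA itself is picked as prev.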



* Re: [PATCH 1/2] mm/uffd: Fix vma operation where start addr cuts part of vma
  2023-05-17 18:54         ` Peter Xu
@ 2023-05-17 19:03           ` Lorenzo Stoakes
  0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2023-05-17 19:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Liam R . Howlett,
	Mark Rutland, Andrea Arcangeli, Mike Rapoport, Alexander Viro,
	linux-stable

On Wed, May 17, 2023 at 02:54:39PM -0400, Peter Xu wrote:
> On Wed, May 17, 2023 at 07:40:59PM +0100, Lorenzo Stoakes wrote:
> > On Wed, May 17, 2023 at 02:37:41PM -0400, Peter Xu wrote:
> > > On Wed, May 17, 2023 at 06:20:55PM +0100, Lorenzo Stoakes wrote:
> > > > On Wed, May 17, 2023 at 11:04:07AM -0400, Peter Xu wrote:
> > > > > It seems vma merging with uffd paths is broken with either
> > > > > register/unregister, where right now we can feed wrong parameters to
> > > > > vma_merge() and it's found by recent patch which moved asserts upwards in
> > > > > vma_merge() by Lorenzo Stoakes:
> > > > >
> > > > > https://lore.kernel.org/all/ZFunF7DmMdK05MoF@FVFF77S0Q05N.cambridge.arm.com/
> > > > >
> > > > > The problem is in the current code base we didn't fixup "prev" for the case
> > > > > where "start" address can be within the "prev" vma section.  In that case
> > > > > we should have "prev" points to the current vma rather than the previous
> > > > > one when feeding to vma_merge().
> > > >
> > > > This doesn't seem quite correct, perhaps - "where start is contained within vma
> > > > but not clamped to its start. We need to convert this into case 4 which permits
> > > > subdivision of prev by assigning vma to prev. As we loop, each subsequent VMA
> > > > will be clamped to the start."
> > >
> > > I think it covers more than case 4 - it can also be case 0 where no merge
> > > will happen?
> >
> > Ugh please let's not call a case that doesn't merge by a number :P but sure of
> > course it might also not merge.
>
> To me the original paragraph was still fine. But if you prefer your version
> (I'm perfectly fine either way, if you'd like to spell out which cases it
> will trigger), it would be:
>
>   It's possible that "start" is contained within vma but not clamped to its
>   start.  We need to convert this into either the "cannot merge" case or the
>   "can merge" case 4, which permits subdivision of prev by assigning vma to
>   prev. As we loop, each subsequent VMA will be clamped to the start.
>
> Does that look good to you?
>

Looks good to me, thanks for taking the time!

> Thanks,
>
> --
> Peter Xu
>


