linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork
@ 2020-01-06 10:42 Konstantin Khlebnikov
  2020-01-06 20:35 ` Konstantin Khlebnikov
  2020-01-07  2:32 ` Wei Yang
  0 siblings, 2 replies; 5+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-06 10:42 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, linux-kernel, Wei Yang; +Cc: Kirill A. Shutemov

This fixes couple misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
mergeable anon_vma as parent when fork").

First problem caused by initialization order in dup_mmap(): vma->vm_prev
is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
previous VMA in parent mm. This is fixed by rearrangement in dup_mmap().

If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
before all patches in child process related VMAs: DST1 DST2 .. DSTn will
use different anon-vmas: ANON1 ANON2 .. ANONn. Before this patch only DST1
will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0.
With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.

Also this patch moves sharing logic out of anon_vma_clone() into more
specific anon_vma_fork() because this supposed to work only at fork().
Function anon_vma_clone() is more generic is also used at splitting VMAs.

Second problem is hidden behind first one: assumption "Parent has vm_prev,
which implies we have vm_prev" is wrong if first VMA in parent mm has set
flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
because in current code 'prev' actually is same as 'pprev'. To avoid that
this patch just checks pointer and compares vm_start to verify relation
between previous VMAs in parent and child.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
---
 kernel/fork.c |    4 ++--
 mm/rmap.c     |   25 ++++++++++++-------------
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 2508a4f238a3..04ee5e243f65 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -548,6 +548,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		if (retval)
 			goto fail_nomem_policy;
 		tmp->vm_mm = mm;
+		tmp->vm_prev = prev;	/* anon_vma_fork use this */
+		tmp->vm_next = NULL;
 		retval = dup_userfaultfd(tmp, &uf);
 		if (retval)
 			goto fail_nomem_anon_vma_fork;
@@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		} else if (anon_vma_fork(tmp, mpnt))
 			goto fail_nomem_anon_vma_fork;
 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
-		tmp->vm_next = tmp->vm_prev = NULL;
 		file = tmp->vm_file;
 		if (file) {
 			struct inode *inode = file_inode(file);
@@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		 */
 		*pprev = tmp;
 		pprev = &tmp->vm_next;
-		tmp->vm_prev = prev;
 		prev = tmp;
 
 		__vma_link_rb(mm, tmp, rb_link, rb_parent);
diff --git a/mm/rmap.c b/mm/rmap.c
index b3e381919835..77b3aa38d5c2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 {
 	struct anon_vma_chain *avc, *pavc;
 	struct anon_vma *root = NULL;
-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
-
-	/*
-	 * If parent share anon_vma with its vm_prev, keep this sharing in in
-	 * child.
-	 *
-	 * 1. Parent has vm_prev, which implies we have vm_prev.
-	 * 2. Parent and its vm_prev have the same anon_vma.
-	 */
-	if (!dst->anon_vma && src->anon_vma &&
-	    pprev && pprev->anon_vma == src->anon_vma)
-		dst->anon_vma = prev->anon_vma;
-
 
 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma;
@@ -334,6 +321,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
  */
 int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 {
+	struct vm_area_struct *prev = vma->vm_prev, *pprev = pvma->vm_prev;
 	struct anon_vma_chain *avc;
 	struct anon_vma *anon_vma;
 	int error;
@@ -345,6 +333,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
 	vma->anon_vma = NULL;
 
+	/*
+	 * If parent shares anon_vma with its vm_prev, keep this sharing.
+	 *
+	 * Previous VMA could be missing or not match previuos in parent
+	 * if VM_DONTCOPY is set: compare vm_start to avoid this case.
+	 */
+	if (pvma->anon_vma && pprev && prev &&
+	    pprev->anon_vma == pvma->anon_vma &&
+	    pprev->vm_start == prev->vm_start)
+		vma->anon_vma = prev->anon_vma;
+
 	/*
 	 * First, attach the new VMA to the parent VMA's anon_vmas,
 	 * so rmap can find non-COWed pages in child processes.


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork
  2020-01-06 10:42 [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
@ 2020-01-06 20:35 ` Konstantin Khlebnikov
       [not found]   ` <2020010710441027026650@gmail.com>
  2020-01-07  2:32 ` Wei Yang
  1 sibling, 1 reply; 5+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-06 20:35 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, Linux Kernel Mailing List, Wei Yang,
	Kirill A. Shutemov, Li Xinhai

On Mon, Jan 6, 2020 at 1:42 PM Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> This fixes couple misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
> mergeable anon_vma as parent when fork").
>
> First problem caused by initialization order in dup_mmap(): vma->vm_prev
> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
> previous VMA in parent mm. This is fixed by rearrangement in dup_mmap().
>
> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
> use different anon-vmas: ANON1 ANON2 .. ANONn. Before this patch only DST1
> will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0.
> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>
> Also this patch moves sharing logic out of anon_vma_clone() into more
> specific anon_vma_fork() because this supposed to work only at fork().
> Function anon_vma_clone() is more generic is also used at splitting VMAs.
>
> Second problem is hidden behind first one: assumption "Parent has vm_prev,
> which implies we have vm_prev" is wrong if first VMA in parent mm has set
> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
> because in current code 'prev' actually is same as 'pprev'. To avoid that
> this patch just checks pointer and compares vm_start to verify relation
> between previous VMAs in parent and child.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")

Oops, I've forgot to mention that Li Xinhai <lixinhai.lxh@gmail.com>
found and reported this suspicious code. Sorry.

Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t

> ---
>  kernel/fork.c |    4 ++--
>  mm/rmap.c     |   25 ++++++++++++-------------
>  2 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2508a4f238a3..04ee5e243f65 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -548,6 +548,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>                 if (retval)
>                         goto fail_nomem_policy;
>                 tmp->vm_mm = mm;
> +               tmp->vm_prev = prev;    /* anon_vma_fork use this */
> +               tmp->vm_next = NULL;
>                 retval = dup_userfaultfd(tmp, &uf);
>                 if (retval)
>                         goto fail_nomem_anon_vma_fork;
> @@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>                 } else if (anon_vma_fork(tmp, mpnt))
>                         goto fail_nomem_anon_vma_fork;
>                 tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
> -               tmp->vm_next = tmp->vm_prev = NULL;
>                 file = tmp->vm_file;
>                 if (file) {
>                         struct inode *inode = file_inode(file);
> @@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>                  */
>                 *pprev = tmp;
>                 pprev = &tmp->vm_next;
> -               tmp->vm_prev = prev;
>                 prev = tmp;
>
>                 __vma_link_rb(mm, tmp, rb_link, rb_parent);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b3e381919835..77b3aa38d5c2 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>  {
>         struct anon_vma_chain *avc, *pavc;
>         struct anon_vma *root = NULL;
> -       struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
> -
> -       /*
> -        * If parent share anon_vma with its vm_prev, keep this sharing in in
> -        * child.
> -        *
> -        * 1. Parent has vm_prev, which implies we have vm_prev.
> -        * 2. Parent and its vm_prev have the same anon_vma.
> -        */
> -       if (!dst->anon_vma && src->anon_vma &&
> -           pprev && pprev->anon_vma == src->anon_vma)
> -               dst->anon_vma = prev->anon_vma;
> -
>
>         list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>                 struct anon_vma *anon_vma;
> @@ -334,6 +321,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>   */
>  int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>  {
> +       struct vm_area_struct *prev = vma->vm_prev, *pprev = pvma->vm_prev;
>         struct anon_vma_chain *avc;
>         struct anon_vma *anon_vma;
>         int error;
> @@ -345,6 +333,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>         /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>         vma->anon_vma = NULL;
>
> +       /*
> +        * If parent shares anon_vma with its vm_prev, keep this sharing.
> +        *
> +        * Previous VMA could be missing or not match previuos in parent
> +        * if VM_DONTCOPY is set: compare vm_start to avoid this case.
> +        */
> +       if (pvma->anon_vma && pprev && prev &&
> +           pprev->anon_vma == pvma->anon_vma &&
> +           pprev->vm_start == prev->vm_start)
> +               vma->anon_vma = prev->anon_vma;
> +
>         /*
>          * First, attach the new VMA to the parent VMA's anon_vmas,
>          * so rmap can find non-COWed pages in child processes.
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork
  2020-01-06 10:42 [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
  2020-01-06 20:35 ` Konstantin Khlebnikov
@ 2020-01-07  2:32 ` Wei Yang
  1 sibling, 0 replies; 5+ messages in thread
From: Wei Yang @ 2020-01-07  2:32 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, linux-kernel, Wei Yang, Kirill A. Shutemov

On Mon, Jan 06, 2020 at 01:42:40PM +0300, Konstantin Khlebnikov wrote:
>This fixes couple misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>mergeable anon_vma as parent when fork").
>
>First problem caused by initialization order in dup_mmap(): vma->vm_prev
>is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>previous VMA in parent mm. This is fixed by rearrangement in dup_mmap().
>

You are right, I missed this point.

>If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>use different anon-vmas: ANON1 ANON2 .. ANONn. Before this patch only DST1
>will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0.
>With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>
>Also this patch moves sharing logic out of anon_vma_clone() into more
>specific anon_vma_fork() because this supposed to work only at fork().
>Function anon_vma_clone() is more generic is also used at splitting VMAs.
>
>Second problem is hidden behind first one: assumption "Parent has vm_prev,
>which implies we have vm_prev" is wrong if first VMA in parent mm has set
>flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>because in current code 'prev' actually is same as 'pprev'. To avoid that
>this patch just checks pointer and compares vm_start to verify relation
>between previous VMAs in parent and child.
>

Correct here too.

>Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>---
> kernel/fork.c |    4 ++--
> mm/rmap.c     |   25 ++++++++++++-------------
> 2 files changed, 14 insertions(+), 15 deletions(-)
>
>diff --git a/kernel/fork.c b/kernel/fork.c
>index 2508a4f238a3..04ee5e243f65 100644
>--- a/kernel/fork.c
>+++ b/kernel/fork.c
>@@ -548,6 +548,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> 		if (retval)
> 			goto fail_nomem_policy;
> 		tmp->vm_mm = mm;
>+		tmp->vm_prev = prev;	/* anon_vma_fork use this */
>+		tmp->vm_next = NULL;

How about pass prev to anon_vma_fork()? So that we limit all the difference in
anon_vma_fork().

> 		retval = dup_userfaultfd(tmp, &uf);
> 		if (retval)
> 			goto fail_nomem_anon_vma_fork;
>@@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> 		} else if (anon_vma_fork(tmp, mpnt))
> 			goto fail_nomem_anon_vma_fork;
> 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>-		tmp->vm_next = tmp->vm_prev = NULL;
> 		file = tmp->vm_file;
> 		if (file) {
> 			struct inode *inode = file_inode(file);
>@@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> 		 */
> 		*pprev = tmp;
> 		pprev = &tmp->vm_next;
>-		tmp->vm_prev = prev;
> 		prev = tmp;
> 
> 		__vma_link_rb(mm, tmp, rb_link, rb_parent);
>diff --git a/mm/rmap.c b/mm/rmap.c
>index b3e381919835..77b3aa38d5c2 100644
>--- a/mm/rmap.c
>+++ b/mm/rmap.c
>@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> {
> 	struct anon_vma_chain *avc, *pavc;
> 	struct anon_vma *root = NULL;
>-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>-
>-	/*
>-	 * If parent share anon_vma with its vm_prev, keep this sharing in in
>-	 * child.
>-	 *
>-	 * 1. Parent has vm_prev, which implies we have vm_prev.
>-	 * 2. Parent and its vm_prev have the same anon_vma.
>-	 */
>-	if (!dst->anon_vma && src->anon_vma &&
>-	    pprev && pprev->anon_vma == src->anon_vma)
>-		dst->anon_vma = prev->anon_vma;
>-
> 
> 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> 		struct anon_vma *anon_vma;
>@@ -334,6 +321,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>  */
> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> {
>+	struct vm_area_struct *prev = vma->vm_prev, *pprev = pvma->vm_prev;
> 	struct anon_vma_chain *avc;
> 	struct anon_vma *anon_vma;
> 	int error;
>@@ -345,6 +333,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
> 	vma->anon_vma = NULL;
> 
>+	/*
>+	 * If parent shares anon_vma with its vm_prev, keep this sharing.
>+	 *
>+	 * Previous VMA could be missing or not match previuos in parent
>+	 * if VM_DONTCOPY is set: compare vm_start to avoid this case.
>+	 */
>+	if (pvma->anon_vma && pprev && prev &&
>+	    pprev->anon_vma == pvma->anon_vma &&
>+	    pprev->vm_start == prev->vm_start)
>+		vma->anon_vma = prev->anon_vma;
>+
> 	/*
> 	 * First, attach the new VMA to the parent VMA's anon_vmas,
> 	 * so rmap can find non-COWed pages in child processes.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork
       [not found]   ` <2020010710441027026650@gmail.com>
@ 2020-01-07  8:24     ` Konstantin Khlebnikov
       [not found]       ` <2020010717045916228557@gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-07  8:24 UTC (permalink / raw)
  To: lixinhai.lxh, Konstantin Khlebnikov
  Cc: linux-mm, akpm, linux-kernel, richardw.yang, kirill.shutemov

On 07/01/2020 05.44, lixinhai.lxh@gmail.com wrote:
> On 2020-01-07 at 04:35 Konstantin Khlebnikov wrote:
>> On Mon, Jan 6, 2020 at 1:42 PM Konstantin Khlebnikov
>> <khlebnikov@yandex-team.ru> wrote:
>>>
>>> This fixes couple misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>> mergeable anon_vma as parent when fork").
>>>
>>> First problem caused by initialization order in dup_mmap(): vma->vm_prev
>>> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>> previous VMA in parent mm. This is fixed by rearrangement in dup_mmap().
>>>
>>> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>> use different anon-vmas: ANON1 ANON2 .. ANONn. Before this patch only DST1
>>> will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0.
>>> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>>
>>> Also this patch moves sharing logic out of anon_vma_clone() into more
>>> specific anon_vma_fork() because this supposed to work only at fork().
>>> Function anon_vma_clone() is more generic is also used at splitting VMAs.
>>>
>>> Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>> which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>> because in current code 'prev' actually is same as 'pprev'. To avoid that
>>> this patch just checks pointer and compares vm_start to verify relation
>>> between previous VMAs in parent and child.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>
>> Oops, I've forgot to mention that Li Xinhai <lixinhai.lxh@gmail.com>
>> found and reported this suspicious code. Sorry.
>>
>> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>
> 
> Can we change the interface
> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma),
> to
> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma, struct vm_area_struct *pcvma),
> and 'pcvma' means previous child vma.
> so highlight the use of that vma, and the current code sequence for linking 'tmp' vma
> in dup_mmap() is not changed(in case some code would have dependency on that
> linking sequence)

There should be no dependency on linking sequence.
But we could generalize sharing: cloned vma could share prev anon-vma
(or any other actually) if anon_vma->parent == src->anon_vma.
This is more clear than sharing only between related vmas.

> 
> Another issue is for linking the avc for the reused anon_vma. anon_vma_clone()
> use the iteration
> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma),
> to link avc for child vma, and it is unable to reach the resued anon_vma because
> that is from the previous vma not from parent vma. So, in anon_vma_fork(),
> we need to setup the avc link for vma->anon.

Oh, yes. That's another example where current code miraculously stays correct.

> 
>>> ---
>>>    kernel/fork.c |    4 ++--
>>>    mm/rmap.c     |   25 ++++++++++++-------------
>>>    2 files changed, 14 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>> index 2508a4f238a3..04ee5e243f65 100644
>>> --- a/kernel/fork.c
>>> +++ b/kernel/fork.c
>>> @@ -548,6 +548,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>                   if (retval)
>>>                           goto fail_nomem_policy;
>>>                   tmp->vm_mm = mm;
>>> +               tmp->vm_prev = prev;    /* anon_vma_fork use this */
>>> +               tmp->vm_next = NULL;
>>>                   retval = dup_userfaultfd(tmp, &uf);
>>>                   if (retval)
>>>                           goto fail_nomem_anon_vma_fork;
>>> @@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>                   } else if (anon_vma_fork(tmp, mpnt))
>>>                           goto fail_nomem_anon_vma_fork;
>>>                   tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>> -               tmp->vm_next = tmp->vm_prev = NULL;
>>>                   file = tmp->vm_file;
>>>                   if (file) {
>>>                           struct inode *inode = file_inode(file);
>>> @@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>                    */
>>>                   *pprev = tmp;
>>>                   pprev = &tmp->vm_next;
>>> -               tmp->vm_prev = prev;
>>>                   prev = tmp;
>>>
>>>                   __vma_link_rb(mm, tmp, rb_link, rb_parent);
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index b3e381919835..77b3aa38d5c2 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>    {
>>>           struct anon_vma_chain *avc, *pavc;
>>>           struct anon_vma *root = NULL;
>>> -       struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>> -
>>> -       /*
>>> -        * If parent share anon_vma with its vm_prev, keep this sharing in in
>>> -        * child.
>>> -        *
>>> -        * 1. Parent has vm_prev, which implies we have vm_prev.
>>> -        * 2. Parent and its vm_prev have the same anon_vma.
>>> -        */
>>> -       if (!dst->anon_vma && src->anon_vma &&
>>> -           pprev && pprev->anon_vma == src->anon_vma)
>>> -               dst->anon_vma = prev->anon_vma;
>>> -
>>>
>>>           list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>>                   struct anon_vma *anon_vma;
>>> @@ -334,6 +321,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>     */
>>>    int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>    {
>>> +       struct vm_area_struct *prev = vma->vm_prev, *pprev = pvma->vm_prev;
>>>           struct anon_vma_chain *avc;
>>>           struct anon_vma *anon_vma;
>>>           int error;
>>> @@ -345,6 +333,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>           /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>>>           vma->anon_vma = NULL;
>>>
>>> +       /*
>>> +        * If parent shares anon_vma with its vm_prev, keep this sharing.
>>> +        *
>>> +        * Previous VMA could be missing or not match previuos in parent
>>> +        * if VM_DONTCOPY is set: compare vm_start to avoid this case.
>>> +        */
>>> +       if (pvma->anon_vma && pprev && prev &&
>>> +           pprev->anon_vma == pvma->anon_vma &&
>>> +           pprev->vm_start == prev->vm_start)
>>> +               vma->anon_vma = prev->anon_vma;
>>> +
>>>           /*
>>>            * First, attach the new VMA to the parent VMA's anon_vmas,
>>>            * so rmap can find non-COWed pages in child processes.
>>>
>> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork
       [not found]       ` <2020010717045916228557@gmail.com>
@ 2020-01-07  9:35         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 5+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-07  9:35 UTC (permalink / raw)
  To: lixinhai.lxh, Konstantin Khlebnikov
  Cc: linux-mm, akpm, linux-kernel, richardw.yang, kirill.shutemov

On 07/01/2020 12.05, lixinhai.lxh@gmail.com wrote:
> On 2020-01-07 at 16:24 Konstantin Khlebnikov wrote:
>> On 07/01/2020 05.44, lixinhai.lxh@gmail.com wrote:
>>> On 2020-01-07 at 04:35 Konstantin Khlebnikov wrote:
>>>> On Mon, Jan 6, 2020 at 1:42 PM Konstantin Khlebnikov
>>>> <khlebnikov@yandex-team.ru> wrote:
>>>>>
>>>>> This fixes couple misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>>>> mergeable anon_vma as parent when fork").
>>>>>
>>>>> First problem caused by initialization order in dup_mmap(): vma->vm_prev
>>>>> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>>>> previous VMA in parent mm. This is fixed by rearrangement in dup_mmap().
>>>>>
>>>>> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>>>> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>>>> use different anon-vmas: ANON1 ANON2 .. ANONn. Before this patch only DST1
>>>>> will fork new ANON1 and following DST2 .. DSTn will share parent's ANON0.
>>>>> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>>>>
>>>>> Also this patch moves sharing logic out of anon_vma_clone() into more
>>>>> specific anon_vma_fork() because this supposed to work only at fork().
>>>>> Function anon_vma_clone() is more generic is also used at splitting VMAs.
>>>>>
>>>>> Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>>>> which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>>>> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>>>> because in current code 'prev' actually is same as 'pprev'. To avoid that
>>>>> this patch just checks pointer and compares vm_start to verify relation
>>>>> between previous VMAs in parent and child.
>>>>>
>>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>>> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>>>
>>>> Oops, I've forgot to mention that Li Xinhai <lixinhai.lxh@gmail.com>
>>>> found and reported this suspicious code. Sorry.
>>>>
>>>> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>> Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>>>
>>>
>>> Can we change the interface
>>> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma),
>>> to
>>> int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma, struct vm_area_struct *pcvma),
>>> and 'pcvma' means previous child vma.
>>> so highlight the use of that vma, and the current code sequence for linking 'tmp' vma
>>> in dup_mmap() is not changed(in case some code would have dependency on that
>>> linking sequence)
>>
>> There should be no dependency on linking sequence.
>> But we could generalize sharing: cloned vma could share prev anon-vma
>> (or any other actually) if anon_vma->parent == src->anon_vma.
>> This is more clear than sharing only between related vmas.
>>
> I mean in
> int dup_mmap(...)
> {
> 		tmp = vm_area_dup(mpnt);
> 
> //.... some code in this part would have dependency on 'tmp->vm_prev, tmp->vm_next'
> // ... I didn't go through all this part, but need take care about it.
> // so, better to keep the current semantic and pass in *pprev for anon vma call

Yeah, there is weird and unintentional reference in case VM_WIPEONFORK:
anon_vma_prepare -> find_mergeable_anon_vma

This is just dangerous.

> 
> 		/*
> 		 * Link in the new vma and copy the page table entries.
> 		 */
> 		*pprev = tmp;
> 		pprev = &tmp->vm_next;
> 		tmp->vm_prev = prev;
> 		prev = tmp;
> 
> }
> 
>>>
>>> Another issue is for linking the avc for the reused anon_vma. anon_vma_clone()
>>> use the iteration
>>> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma),
>>> to link avc for child vma, and it is unable to reach the resued anon_vma because
>>> that is from the previous vma not from parent vma. So, in anon_vma_fork(),
>>> we need to setup the avc link for vma->anon.
>>
>> Oh, yes. That's another example where current code miraculously stays correct.
>>
>>>
>>>>> ---
>>>>>      kernel/fork.c |    4 ++--
>>>>>      mm/rmap.c     |   25 ++++++++++++-------------
>>>>>      2 files changed, 14 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>> index 2508a4f238a3..04ee5e243f65 100644
>>>>> --- a/kernel/fork.c
>>>>> +++ b/kernel/fork.c
>>>>> @@ -548,6 +548,8 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>>                     if (retval)
>>>>>                             goto fail_nomem_policy;
>>>>>                     tmp->vm_mm = mm;
>>>>> +               tmp->vm_prev = prev;    /* anon_vma_fork use this */
>>>>> +               tmp->vm_next = NULL;
>>>>>                     retval = dup_userfaultfd(tmp, &uf);
>>>>>                     if (retval)
>>>>>                             goto fail_nomem_anon_vma_fork;
>>>>> @@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>>                     } else if (anon_vma_fork(tmp, mpnt))
>>>>>                             goto fail_nomem_anon_vma_fork;
>>>>>                     tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>>>> -               tmp->vm_next = tmp->vm_prev = NULL;
>>>>>                     file = tmp->vm_file;
>>>>>                     if (file) {
>>>>>                             struct inode *inode = file_inode(file);
>>>>> @@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>>                      */
>>>>>                     *pprev = tmp;
>>>>>                     pprev = &tmp->vm_next;
>>>>> -               tmp->vm_prev = prev;
>>>>>                     prev = tmp;
>>>>>
>>>>>                     __vma_link_rb(mm, tmp, rb_link, rb_parent);
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index b3e381919835..77b3aa38d5c2 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>>      {
>>>>>             struct anon_vma_chain *avc, *pavc;
>>>>>             struct anon_vma *root = NULL;
>>>>> -       struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>>>> -
>>>>> -       /*
>>>>> -        * If parent share anon_vma with its vm_prev, keep this sharing in in
>>>>> -        * child.
>>>>> -        *
>>>>> -        * 1. Parent has vm_prev, which implies we have vm_prev.
>>>>> -        * 2. Parent and its vm_prev have the same anon_vma.
>>>>> -        */
>>>>> -       if (!dst->anon_vma && src->anon_vma &&
>>>>> -           pprev && pprev->anon_vma == src->anon_vma)
>>>>> -               dst->anon_vma = prev->anon_vma;
>>>>> -
>>>>>
>>>>>             list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>>>>                     struct anon_vma *anon_vma;
>>>>> @@ -334,6 +321,7 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>>       */
>>>>>      int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>>>      {
>>>>> +       struct vm_area_struct *prev = vma->vm_prev, *pprev = pvma->vm_prev;
>>>>>             struct anon_vma_chain *avc;
>>>>>             struct anon_vma *anon_vma;
>>>>>             int error;
>>>>> @@ -345,6 +333,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>>>             /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>>>>>             vma->anon_vma = NULL;
>>>>>
>>>>> +       /*
>>>>> +        * If parent shares anon_vma with its vm_prev, keep this sharing.
>>>>> +        *
>>>>> +        * Previous VMA could be missing or not match previuos in parent
>>>>> +        * if VM_DONTCOPY is set: compare vm_start to avoid this case.
>>>>> +        */
>>>>> +       if (pvma->anon_vma && pprev && prev &&
>>>>> +           pprev->anon_vma == pvma->anon_vma &&
>>>>> +           pprev->vm_start == prev->vm_start)
>>>>> +               vma->anon_vma = prev->anon_vma;
>>>>> +
>>>>>             /*
>>>>>              * First, attach the new VMA to the parent VMA's anon_vmas,
>>>>>              * so rmap can find non-COWed pages in child processes.
>>>>>
>>>> >

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-01-07  9:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-06 10:42 [PATCH] mm/rmap: fix reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
2020-01-06 20:35 ` Konstantin Khlebnikov
     [not found]   ` <2020010710441027026650@gmail.com>
2020-01-07  8:24     ` Konstantin Khlebnikov
     [not found]       ` <2020010717045916228557@gmail.com>
2020-01-07  9:35         ` Konstantin Khlebnikov
2020-01-07  2:32 ` Wei Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).