All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
@ 2020-01-07 10:19 Konstantin Khlebnikov
  2020-01-07 10:19 ` [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap Konstantin Khlebnikov
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-07 10:19 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, linux-kernel, Wei Yang
  Cc: Rik van Riel, Li Xinhai, Kirill A. Shutemov

This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
way but fortunately still produces valid anon-vma tree, so nothing crashes.

If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
before all patches in child process related VMAs: DST1 DST2 .. DSTn will
fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.

Root problem caused by initialization order in dup_mmap(): vma->vm_prev
is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
previous VMA in parent mm.

Second problem is hidden behind first one: assumption "Parent has vm_prev,
which implies we have vm_prev" is wrong if first VMA in parent mm has set
flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
because in current code 'prev' actually is same as 'pprev'.

Third hidden problem is linking between VMA and anon-vmas whose pages it
could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
shared anon-vma isn't attached. But every mapped page stays reachable in
rmap because we erroneously share anon-vma from parent's previous VMA.

This patch moves sharing logic out of anon_vma_clone() into more specific
anon_vma_fork() because this supposed to work only at fork() and simply
reuses anon_vma from previous VMA if it is forked from the same anon-vma.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
---
 include/linux/rmap.h |    3 ++-
 kernel/fork.c        |    2 +-
 mm/rmap.c            |   23 +++++++++--------------
 3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 988d176472df..560e4480dcd0 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
 int  __anon_vma_prepare(struct vm_area_struct *);
 void unlink_anon_vmas(struct vm_area_struct *);
 int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
-int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
+		  struct vm_area_struct *prev);
 
 static inline int anon_vma_prepare(struct vm_area_struct *vma)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index 2508a4f238a3..c33626993831 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 			tmp->anon_vma = NULL;
 			if (anon_vma_prepare(tmp))
 				goto fail_nomem_anon_vma_fork;
-		} else if (anon_vma_fork(tmp, mpnt))
+		} else if (anon_vma_fork(tmp, mpnt, prev))
 			goto fail_nomem_anon_vma_fork;
 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
 		tmp->vm_next = tmp->vm_prev = NULL;
diff --git a/mm/rmap.c b/mm/rmap.c
index b3e381919835..3c1e04389291 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 {
 	struct anon_vma_chain *avc, *pavc;
 	struct anon_vma *root = NULL;
-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
-
-	/*
-	 * If parent share anon_vma with its vm_prev, keep this sharing in in
-	 * child.
-	 *
-	 * 1. Parent has vm_prev, which implies we have vm_prev.
-	 * 2. Parent and its vm_prev have the same anon_vma.
-	 */
-	if (!dst->anon_vma && src->anon_vma &&
-	    pprev && pprev->anon_vma == src->anon_vma)
-		dst->anon_vma = prev->anon_vma;
-
 
 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma;
@@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
  * the corresponding VMA in the parent process is attached to.
  * Returns 0 on success, non-zero on failure.
  */
-int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
+		  struct vm_area_struct *prev)
 {
 	struct anon_vma_chain *avc;
 	struct anon_vma *anon_vma;
@@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
 	if (!pvma->anon_vma)
 		return 0;
 
+	/* Share anon_vma with previous VMA if it has the same parent. */
+	if (prev && prev->anon_vma &&
+	    prev->anon_vma->parent == pvma->anon_vma) {
+		vma->anon_vma = prev->anon_vma;
+		return anon_vma_clone(vma, prev);
+	}
+
 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
 	vma->anon_vma = NULL;
 


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap
  2020-01-07 10:19 [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
@ 2020-01-07 10:19 ` Konstantin Khlebnikov
  2020-01-07 14:22   ` lixinhai.lxh
  2020-01-07 13:35 ` [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork lixinhai.lxh
  2020-01-08  2:32 ` Wei Yang
  2 siblings, 1 reply; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-07 10:19 UTC (permalink / raw)
  To: linux-mm, Andrew Morton, linux-kernel, Wei Yang
  Cc: Rik van Riel, Li Xinhai, Kirill A. Shutemov

Function vm_area_dup() makes exact copy of structure. It's better to stop
referencing parent structures because various helpers use these pointers.

For example if VM_WIPEONFORK is set then anon_vma_prepare() will try to
merge anon_vma with previous and next VMA. Poking parent VMAs and sharing
their anon_vma is safe for now but is not expected behavior.

Note: this will break commit 4e4a9eb92133 ("mm/rmap.c: reuse mergeable
anon_vma as parent when fork") without related fix because it contains
several flaws hidden by current initialization sequence in dup_mmap().

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 kernel/fork.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index c33626993831..784c9ae56aa9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -544,10 +544,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		tmp = vm_area_dup(mpnt);
 		if (!tmp)
 			goto fail_nomem;
+		tmp->vm_mm = mm;
+		tmp->vm_prev = prev;
+		tmp->vm_next = NULL;
 		retval = vma_dup_policy(mpnt, tmp);
 		if (retval)
 			goto fail_nomem_policy;
-		tmp->vm_mm = mm;
 		retval = dup_userfaultfd(tmp, &uf);
 		if (retval)
 			goto fail_nomem_anon_vma_fork;
@@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		} else if (anon_vma_fork(tmp, mpnt, prev))
 			goto fail_nomem_anon_vma_fork;
 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
-		tmp->vm_next = tmp->vm_prev = NULL;
 		file = tmp->vm_file;
 		if (file) {
 			struct inode *inode = file_inode(file);
@@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 		 */
 		*pprev = tmp;
 		pprev = &tmp->vm_next;
-		tmp->vm_prev = prev;
 		prev = tmp;
 
 		__vma_link_rb(mm, tmp, rb_link, rb_parent);


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-07 10:19 [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
  2020-01-07 10:19 ` [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap Konstantin Khlebnikov
@ 2020-01-07 13:35 ` lixinhai.lxh
  2020-01-08  2:32 ` Wei Yang
  2 siblings, 0 replies; 22+ messages in thread
From: lixinhai.lxh @ 2020-01-07 13:35 UTC (permalink / raw)
  To: khlebnikov, linux-mm, akpm, linux-kernel, richardw.yang
  Cc: Rik van Riel, kirill.shutemov

On 2020-01-07 at 18:19 Konstantin Khlebnikov wrote:
>This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>way but fortunately still produces valid anon-vma tree, so nothing crashes.
>
>If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>
>Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>previous VMA in parent mm.
>
>Second problem is hidden behind first one: assumption "Parent has vm_prev,
>which implies we have vm_prev" is wrong if first VMA in parent mm has set
>flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>because in current code 'prev' actually is same as 'pprev'.
>
>Third hidden problem is linking between VMA and anon-vmas whose pages it
>could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>shared anon-vma isn't attached. But every mapped page stays reachable in
>rmap because we erroneously share anon-vma from parent's previous VMA.
>
>This patch moves sharing logic out of anon_vma_clone() into more specific
>anon_vma_fork() because this supposed to work only at fork() and simply
>reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>
>Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t

The call to anon_vma_clone is nice. It looks fine to me.
Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com>

should I also add Suggested-by: Li Xinhai <lixinhai.lxh@gmail.com> ?

>---
> include/linux/rmap.h |    3 ++-
> kernel/fork.c        |    2 +-
> mm/rmap.c            |   23 +++++++++--------------
> 3 files changed, 12 insertions(+), 16 deletions(-)
>
>diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>index 988d176472df..560e4480dcd0 100644
>--- a/include/linux/rmap.h
>+++ b/include/linux/rmap.h
>@@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
> int  __anon_vma_prepare(struct vm_area_struct *);
> void unlink_anon_vmas(struct vm_area_struct *);
> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>-int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>+	  struct vm_area_struct *prev);
>
> static inline int anon_vma_prepare(struct vm_area_struct *vma)
> {
>diff --git a/kernel/fork.c b/kernel/fork.c
>index 2508a4f238a3..c33626993831 100644
>--- a/kernel/fork.c
>+++ b/kernel/fork.c
>@@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> tmp->anon_vma = NULL;
> if (anon_vma_prepare(tmp))
> goto fail_nomem_anon_vma_fork;
>-	} else if (anon_vma_fork(tmp, mpnt))
>+	} else if (anon_vma_fork(tmp, mpnt, prev))
> goto fail_nomem_anon_vma_fork;
> tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
> tmp->vm_next = tmp->vm_prev = NULL;
>diff --git a/mm/rmap.c b/mm/rmap.c
>index b3e381919835..3c1e04389291 100644
>--- a/mm/rmap.c
>+++ b/mm/rmap.c
>@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> {
> struct anon_vma_chain *avc, *pavc;
> struct anon_vma *root = NULL;
>-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>-
>-	/*
>-	* If parent share anon_vma with its vm_prev, keep this sharing in in
>-	* child.
>-	*
>-	* 1. Parent has vm_prev, which implies we have vm_prev.
>-	* 2. Parent and its vm_prev have the same anon_vma.
>-	*/
>-	if (!dst->anon_vma && src->anon_vma &&
>-	    pprev && pprev->anon_vma == src->anon_vma)
>-	dst->anon_vma = prev->anon_vma;
>-
>
> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> struct anon_vma *anon_vma;
>@@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>  * the corresponding VMA in the parent process is attached to.
>  * Returns 0 on success, non-zero on failure.
>  */
>-int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>+	  struct vm_area_struct *prev)
> {
> struct anon_vma_chain *avc;
> struct anon_vma *anon_vma;
>@@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> if (!pvma->anon_vma)
> return 0;
>
>+	/* Share anon_vma with previous VMA if it has the same parent. */
>+	if (prev && prev->anon_vma &&
>+	    prev->anon_vma->parent == pvma->anon_vma) {
>+	vma->anon_vma = prev->anon_vma;
>+	return anon_vma_clone(vma, prev);
>+	}
>+
> /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
> vma->anon_vma = NULL;
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap
  2020-01-07 10:19 ` [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap Konstantin Khlebnikov
@ 2020-01-07 14:22   ` lixinhai.lxh
  0 siblings, 0 replies; 22+ messages in thread
From: lixinhai.lxh @ 2020-01-07 14:22 UTC (permalink / raw)
  To: khlebnikov, linux-mm, akpm, linux-kernel, richardw.yang
  Cc: Rik van Riel, kirill.shutemov

On 2020-01-07 at 18:19 Konstantin Khlebnikov wrote:
>Function vm_area_dup() makes exact copy of structure. It's better to stop
>referencing parent structures because various helpers use these pointers.
>
>For example if VM_WIPEONFORK is set then anon_vma_prepare() will try to
>merge anon_vma with previous and next VMA. Poking parent VMAs and sharing
>their anon_vma is safe for now but is not expected behavior.
>
>Note: this will break commit 4e4a9eb92133 ("mm/rmap.c: reuse mergeable
>anon_vma as parent when fork") without related fix because it contains
>several flaws hidden by current initialization sequence in dup_mmap().
>
>Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> 

looks fine to me.
Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com>

>---
> kernel/fork.c |    6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/kernel/fork.c b/kernel/fork.c
>index c33626993831..784c9ae56aa9 100644
>--- a/kernel/fork.c
>+++ b/kernel/fork.c
>@@ -544,10 +544,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> tmp = vm_area_dup(mpnt);
> if (!tmp)
> goto fail_nomem;
>+	tmp->vm_mm = mm;
>+	tmp->vm_prev = prev;
>+	tmp->vm_next = NULL;
> retval = vma_dup_policy(mpnt, tmp);
> if (retval)
> goto fail_nomem_policy;
>-	tmp->vm_mm = mm;
> retval = dup_userfaultfd(tmp, &uf);
> if (retval)
> goto fail_nomem_anon_vma_fork;
>@@ -559,7 +561,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> } else if (anon_vma_fork(tmp, mpnt, prev))
> goto fail_nomem_anon_vma_fork;
> tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>-	tmp->vm_next = tmp->vm_prev = NULL;
> file = tmp->vm_file;
> if (file) {
> struct inode *inode = file_inode(file);
>@@ -592,7 +593,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> */
> *pprev = tmp;
> pprev = &tmp->vm_next;
>-	tmp->vm_prev = prev;
> prev = tmp;
>
> __vma_link_rb(mm, tmp, rb_link, rb_parent);
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-07 10:19 [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
  2020-01-07 10:19 ` [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap Konstantin Khlebnikov
  2020-01-07 13:35 ` [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork lixinhai.lxh
@ 2020-01-08  2:32 ` Wei Yang
  2020-01-08  6:42   ` lixinhai.lxh
  2020-01-08 10:40   ` Konstantin Khlebnikov
  2 siblings, 2 replies; 22+ messages in thread
From: Wei Yang @ 2020-01-08  2:32 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, linux-kernel, Wei Yang, Rik van Riel,
	Li Xinhai, Kirill A. Shutemov

On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>way but fortunately still produces valid anon-vma tree, so nothing crashes.
>
>If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>
>Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>previous VMA in parent mm.
>
>Second problem is hidden behind first one: assumption "Parent has vm_prev,
>which implies we have vm_prev" is wrong if first VMA in parent mm has set
>flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>because in current code 'prev' actually is same as 'pprev'.
>
>Third hidden problem is linking between VMA and anon-vmas whose pages it
>could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>shared anon-vma isn't attached. But every mapped page stays reachable in
>rmap because we erroneously share anon-vma from parent's previous VMA.
>
>This patch moves sharing logic out of anon_vma_clone() into more specific
>anon_vma_fork() because this supposed to work only at fork() and simply
>reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>
>Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>---
> include/linux/rmap.h |    3 ++-
> kernel/fork.c        |    2 +-
> mm/rmap.c            |   23 +++++++++--------------
> 3 files changed, 12 insertions(+), 16 deletions(-)
>
>diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>index 988d176472df..560e4480dcd0 100644
>--- a/include/linux/rmap.h
>+++ b/include/linux/rmap.h
>@@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
> int  __anon_vma_prepare(struct vm_area_struct *);
> void unlink_anon_vmas(struct vm_area_struct *);
> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>-int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>+		  struct vm_area_struct *prev);
>
> static inline int anon_vma_prepare(struct vm_area_struct *vma)
> {
>diff --git a/kernel/fork.c b/kernel/fork.c
>index 2508a4f238a3..c33626993831 100644
>--- a/kernel/fork.c
>+++ b/kernel/fork.c
>@@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> 			tmp->anon_vma = NULL;
> 			if (anon_vma_prepare(tmp))
> 				goto fail_nomem_anon_vma_fork;
>-		} else if (anon_vma_fork(tmp, mpnt))
>+		} else if (anon_vma_fork(tmp, mpnt, prev))
> 			goto fail_nomem_anon_vma_fork;
> 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
> 		tmp->vm_next = tmp->vm_prev = NULL;
>diff --git a/mm/rmap.c b/mm/rmap.c
>index b3e381919835..3c1e04389291 100644
>--- a/mm/rmap.c
>+++ b/mm/rmap.c
>@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> {
> 	struct anon_vma_chain *avc, *pavc;
> 	struct anon_vma *root = NULL;
>-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>-
>-	/*
>-	 * If parent share anon_vma with its vm_prev, keep this sharing in in
>-	 * child.
>-	 *
>-	 * 1. Parent has vm_prev, which implies we have vm_prev.
>-	 * 2. Parent and its vm_prev have the same anon_vma.
>-	 */
>-	if (!dst->anon_vma && src->anon_vma &&
>-	    pprev && pprev->anon_vma == src->anon_vma)
>-		dst->anon_vma = prev->anon_vma;
>-
>
> 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> 		struct anon_vma *anon_vma;
>@@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>  * the corresponding VMA in the parent process is attached to.
>  * Returns 0 on success, non-zero on failure.
>  */
>-int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>+		  struct vm_area_struct *prev)
> {
> 	struct anon_vma_chain *avc;
> 	struct anon_vma *anon_vma;
>@@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> 	if (!pvma->anon_vma)
> 		return 0;
>
>+	/* Share anon_vma with previous VMA if it has the same parent. */
>+	if (prev && prev->anon_vma &&
>+	    prev->anon_vma->parent == pvma->anon_vma) {
>+		vma->anon_vma = prev->anon_vma;
>+		return anon_vma_clone(vma, prev);
>+	}
>+

I am afraid this one change the intended behavior. Let's put a chart to
describe.

Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
fork") tries to improve the following situation.

Before the commit, the behavior is like this:

Parent process:

      +-----+            
      | pav |<-----------------+----------------------+
      +-----+                  |                      |
                               |                      |
                   +-----------+          +-----------+
                   |pprev      |          |pvma       |
                   +-----------+          +-----------+

Child Process


      +-----+                     +-----+
      | av1 |<-----------------+  | av2 |<------------+
      +-----+                  |  +-----+             |
                               |                      |
                   +-----------+          +-----------+
                   |prev       |          |vma        |
                   +-----------+          +-----------+


Parent pprev and pvma share the same anon_vma due to
find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
anon_vma for child process's vma.

The purpose of my commit is to give child process the following shape.

      +-----+                            
      | av  |<-----------------+----------------------+
      +-----+                  |                      |
                               |                      |
                   +-----------+          +-----------+
                   |prev       |          |vma        |
                   +-----------+          +-----------+

After this, we reduce the extra "av2" for child process. But yes, because of
the two reasons you found, it didn't do the exact thing.

While if my understanding is correct, the anon_vma_clone() would pick up any
anon_vma in its process tree, except parent's. If this fails to get a reusable
one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.

Let me summarise original behavior:

  * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
    it could not be pvma->anon_vma
  * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
    pvma->anon_vam

Then take a look into your code here.

"prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
pvma's anon_vma. If my understanding is correct, this just match the second
case. For "prev", we didn't find a reusable anon_vma and allocate a new one. 

But how about the first case? prev reuse an anon_vma in the process tree which
is not parent's?

> 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
> 	vma->anon_vma = NULL;
>

--
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-08  2:32 ` Wei Yang
@ 2020-01-08  6:42   ` lixinhai.lxh
  2020-01-08 10:40   ` Konstantin Khlebnikov
  1 sibling, 0 replies; 22+ messages in thread
From: lixinhai.lxh @ 2020-01-08  6:42 UTC (permalink / raw)
  To: richardw.yang, khlebnikov
  Cc: linux-mm, akpm, linux-kernel, richardw.yang, Rik van Riel,
	kirill.shutemov

On 2020-01-08 at 10:32 Wei Yang wrote:
>On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>>This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>>way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>
>>If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>>Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>>will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>>With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>
>>Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>>is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>previous VMA in parent mm.
>>
>>Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>because in current code 'prev' actually is same as 'pprev'.
>>
>>Third hidden problem is linking between VMA and anon-vmas whose pages it
>>could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>>shared anon-vma isn't attached. But every mapped page stays reachable in
>>rmap because we erroneously share anon-vma from parent's previous VMA.
>>
>>This patch moves sharing logic out of anon_vma_clone() into more specific
>>anon_vma_fork() because this supposed to work only at fork() and simply
>>reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>
>>Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>---
>> include/linux/rmap.h |    3 ++-
>> kernel/fork.c        |    2 +-
>> mm/rmap.c            |   23 +++++++++--------------
>> 3 files changed, 12 insertions(+), 16 deletions(-)
>>
>>diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>index 988d176472df..560e4480dcd0 100644
>>--- a/include/linux/rmap.h
>>+++ b/include/linux/rmap.h
>>@@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>> int  __anon_vma_prepare(struct vm_area_struct *);
>> void unlink_anon_vmas(struct vm_area_struct *);
>> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>>-int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>+	  struct vm_area_struct *prev);
>>
>> static inline int anon_vma_prepare(struct vm_area_struct *vma)
>> {
>>diff --git a/kernel/fork.c b/kernel/fork.c
>>index 2508a4f238a3..c33626993831 100644
>>--- a/kernel/fork.c
>>+++ b/kernel/fork.c
>>@@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>> tmp->anon_vma = NULL;
>> if (anon_vma_prepare(tmp))
>> goto fail_nomem_anon_vma_fork;
>>-	} else if (anon_vma_fork(tmp, mpnt))
>>+	} else if (anon_vma_fork(tmp, mpnt, prev))
>> goto fail_nomem_anon_vma_fork;
>> tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>> tmp->vm_next = tmp->vm_prev = NULL;
>>diff --git a/mm/rmap.c b/mm/rmap.c
>>index b3e381919835..3c1e04389291 100644
>>--- a/mm/rmap.c
>>+++ b/mm/rmap.c
>>@@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> {
>> struct anon_vma_chain *avc, *pavc;
>> struct anon_vma *root = NULL;
>>-	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>-
>>-	/*
>>-	* If parent share anon_vma with its vm_prev, keep this sharing in in
>>-	* child.
>>-	*
>>-	* 1. Parent has vm_prev, which implies we have vm_prev.
>>-	* 2. Parent and its vm_prev have the same anon_vma.
>>-	*/
>>-	if (!dst->anon_vma && src->anon_vma &&
>>-	    pprev && pprev->anon_vma == src->anon_vma)
>>-	dst->anon_vma = prev->anon_vma;
>>-
>>
>> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>> struct anon_vma *anon_vma;
>>@@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>  * the corresponding VMA in the parent process is attached to.
>>  * Returns 0 on success, non-zero on failure.
>>  */
>>-int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>+int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>+	  struct vm_area_struct *prev)
>> {
>> struct anon_vma_chain *avc;
>> struct anon_vma *anon_vma;
>>@@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> if (!pvma->anon_vma)
>> return 0;
>>
>>+	/* Share anon_vma with previous VMA if it has the same parent. */
>>+	if (prev && prev->anon_vma &&
>>+	    prev->anon_vma->parent == pvma->anon_vma) {
>>+	vma->anon_vma = prev->anon_vma;
>>+	return anon_vma_clone(vma, prev);
>>+	}
>>+
> 

the checking in first version has done right thing, or anything missed here? 
+	if (pvma->anon_vma && pprev && prev &&
+	    pprev->anon_vma == pvma->anon_vma &&
+	    pprev->vm_start == prev->vm_start)
except that 'pvma->anon_vma' has been checked at the begining of this function,
it can be removed from this block.

>I am afraid this one change the intended behavior. Let's put a chart to
>describe.
>
>Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>fork") tries to improve the following situation.
>
>Before the commit, the behavior is like this:
>
>Parent process:
>
>      +-----+           
>      | pav |<-----------------+----------------------+
>      +-----+                  |                      |
>                               |                      |
>                   +-----------+          +-----------+
>                   |pprev      |          |pvma       |
>                   +-----------+          +-----------+
>
>Child Process
>
>
>      +-----+                     +-----+
>      | av1 |<-----------------+  | av2 |<------------+
>      +-----+                  |  +-----+             |
>                               |                      |
>                   +-----------+          +-----------+
>                   |prev       |          |vma        |
>                   +-----------+          +-----------+
>
>
>Parent pprev and pvma share the same anon_vma due to
>find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>anon_vma for child process's vma.
>
>The purpose of my commit is to give child process the following shape.
>
>      +-----+                           
>      | av  |<-----------------+----------------------+
>      +-----+                  |                      |
>                               |                      |
>                   +-----------+          +-----------+
>                   |prev       |          |vma        |
>                   +-----------+          +-----------+
>
>After this, we reduce the extra "av2" for child process. But yes, because of
>the two reasons you found, it didn't do the exact thing.
>
>While if my understanding is correct, the anon_vma_clone() would pick up any
>anon_vma in its process tree, except parent's. If this fails to get a reusable
>one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>
>Let me summarise original behavior:
>
>  * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>    it could not be pvma->anon_vma
>  * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>    pvma->anon_vam
>
>Then take a look into your code here.
>
>"prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>pvma's anon_vma. If my understanding is correct, this just match the second
>case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>
>But how about the first case? prev reuse an anon_vma in the process tree which
>is not parent's?
>
>> /* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>> vma->anon_vma = NULL;
>>
>
>--
>Wei Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-08  2:32 ` Wei Yang
  2020-01-08  6:42   ` lixinhai.lxh
@ 2020-01-08 10:40   ` Konstantin Khlebnikov
  2020-01-09  2:52     ` Wei Yang
  1 sibling, 1 reply; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-08 10:40 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, Andrew Morton, linux-kernel, Rik van Riel, Li Xinhai,
	Kirill A. Shutemov

On 08/01/2020 05.32, Wei Yang wrote:
> On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>> This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>> mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>> way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>
>> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>> fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>> Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>> will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>
>> Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>> previous VMA in parent mm.
>>
>> Second problem is hidden behind first one: assumption "Parent has vm_prev,
>> which implies we have vm_prev" is wrong if first VMA in parent mm has set
>> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>> because in current code 'prev' actually is same as 'pprev'.
>>
>> Third hidden problem is linking between VMA and anon-vmas whose pages it
>> could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>> shared anon-vma isn't attached. But every mapped page stays reachable in
>> rmap because we erroneously share anon-vma from parent's previous VMA.
>>
>> This patch moves sharing logic out of anon_vma_clone() into more specific
>> anon_vma_fork() because this supposed to work only at fork() and simply
>> reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>> Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>> ---
>> include/linux/rmap.h |    3 ++-
>> kernel/fork.c        |    2 +-
>> mm/rmap.c            |   23 +++++++++--------------
>> 3 files changed, 12 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index 988d176472df..560e4480dcd0 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>> int  __anon_vma_prepare(struct vm_area_struct *);
>> void unlink_anon_vmas(struct vm_area_struct *);
>> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>> -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>> +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>> +		  struct vm_area_struct *prev);
>>
>> static inline int anon_vma_prepare(struct vm_area_struct *vma)
>> {
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 2508a4f238a3..c33626993831 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>> 			tmp->anon_vma = NULL;
>> 			if (anon_vma_prepare(tmp))
>> 				goto fail_nomem_anon_vma_fork;
>> -		} else if (anon_vma_fork(tmp, mpnt))
>> +		} else if (anon_vma_fork(tmp, mpnt, prev))
>> 			goto fail_nomem_anon_vma_fork;
>> 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>> 		tmp->vm_next = tmp->vm_prev = NULL;
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b3e381919835..3c1e04389291 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> {
>> 	struct anon_vma_chain *avc, *pavc;
>> 	struct anon_vma *root = NULL;
>> -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>> -
>> -	/*
>> -	 * If parent share anon_vma with its vm_prev, keep this sharing in in
>> -	 * child.
>> -	 *
>> -	 * 1. Parent has vm_prev, which implies we have vm_prev.
>> -	 * 2. Parent and its vm_prev have the same anon_vma.
>> -	 */
>> -	if (!dst->anon_vma && src->anon_vma &&
>> -	    pprev && pprev->anon_vma == src->anon_vma)
>> -		dst->anon_vma = prev->anon_vma;
>> -
>>
>> 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>> 		struct anon_vma *anon_vma;
>> @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>   * the corresponding VMA in the parent process is attached to.
>>   * Returns 0 on success, non-zero on failure.
>>   */
>> -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>> +		  struct vm_area_struct *prev)
>> {
>> 	struct anon_vma_chain *avc;
>> 	struct anon_vma *anon_vma;
>> @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> 	if (!pvma->anon_vma)
>> 		return 0;
>>
>> +	/* Share anon_vma with previous VMA if it has the same parent. */
>> +	if (prev && prev->anon_vma &&
>> +	    prev->anon_vma->parent == pvma->anon_vma) {
>> +		vma->anon_vma = prev->anon_vma;
>> +		return anon_vma_clone(vma, prev);
>> +	}
>> +
> 
> I am afraid this one change the intended behavior. Let's put a chart to
> describe.
> 
> Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
> fork") tries to improve the following situation.
> 
> Before the commit, the behavior is like this:
> 
> Parent process:
> 
>        +-----+
>        | pav |<-----------------+----------------------+
>        +-----+                  |                      |
>                                 |                      |
>                     +-----------+          +-----------+
>                     |pprev      |          |pvma       |
>                     +-----------+          +-----------+
> 
> Child Process
> 
> 
>        +-----+                     +-----+
>        | av1 |<-----------------+  | av2 |<------------+
>        +-----+                  |  +-----+             |
>                                 |                      |
>                     +-----------+          +-----------+
>                     |prev       |          |vma        |
>                     +-----------+          +-----------+
> 
> 
> Parent pprev and pvma share the same anon_vma due to
> find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
> anon_vma for child process's vma.
> 
> The purpose of my commit is to give child process the following shape.
> 
>        +-----+
>        | av  |<-----------------+----------------------+
>        +-----+                  |                      |
>                                 |                      |
>                     +-----------+          +-----------+
>                     |prev       |          |vma        |
>                     +-----------+          +-----------+
> 
> After this, we reduce the extra "av2" for child process. But yes, because of
> the two reasons you found, it didn't do the exact thing.
> 
> While if my understanding is correct, the anon_vma_clone() would pick up any
> anon_vma in its process tree, except parent's. If this fails to get a reusable
> one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
> 
> Let me summarise original behavior:
> 
>    * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>      it could not be pvma->anon_vma
>    * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>      pvma->anon_vam
> 
> Then take a look into your code here.
> 
> "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
> pvma's anon_vma. If my understanding is correct, this just match the second
> case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
> 
> But how about the first case? prev reuse an anon_vma in the process tree which
> is not parent's?

If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
then third vma will share it. And so on.

Fork works left to right - we don't known about next vma to predict sharing and
choose better options.

But reusing old vma doesn't allocates new one. It's better to not reuse them
second time because this makes tree less optimal (and actually not a tree anymore).
This is just a trick to prevent unlimited growth anon-vma chains in background:
while each anon-vma has at least one vma or two childs then their count is
limited with count of vmas which are visible and limited.

> 
>> 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>> 	vma->anon_vma = NULL;
>>
> 
> --
> Wei Yang
> Help you, Help me
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-08 10:40   ` Konstantin Khlebnikov
@ 2020-01-09  2:52     ` Wei Yang
  2020-01-09  8:54       ` Konstantin Khlebnikov
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2020-01-09  2:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Wei Yang, linux-mm, Andrew Morton, linux-kernel, Rik van Riel,
	Li Xinhai, Kirill A. Shutemov

On Wed, Jan 08, 2020 at 01:40:44PM +0300, Konstantin Khlebnikov wrote:
>On 08/01/2020 05.32, Wei Yang wrote:
>> On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>> > This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>> > mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>> > way but fortunately still produces valid anon-vma tree, so nothing crashes.
>> > 
>> > If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>> > before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>> > fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>> > Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>> > will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>> > With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>> > 
>> > Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>> > is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>> > previous VMA in parent mm.
>> > 
>> > Second problem is hidden behind first one: assumption "Parent has vm_prev,
>> > which implies we have vm_prev" is wrong if first VMA in parent mm has set
>> > flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>> > because in current code 'prev' actually is same as 'pprev'.
>> > 
>> > Third hidden problem is linking between VMA and anon-vmas whose pages it
>> > could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>> > shared anon-vma isn't attached. But every mapped page stays reachable in
>> > rmap because we erroneously share anon-vma from parent's previous VMA.
>> > 
>> > This patch moves sharing logic out of anon_vma_clone() into more specific
>> > anon_vma_fork() because this supposed to work only at fork() and simply
>> > reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>> > 
>> > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> > Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> > Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>> > Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>> > ---
>> > include/linux/rmap.h |    3 ++-
>> > kernel/fork.c        |    2 +-
>> > mm/rmap.c            |   23 +++++++++--------------
>> > 3 files changed, 12 insertions(+), 16 deletions(-)
>> > 
>> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> > index 988d176472df..560e4480dcd0 100644
>> > --- a/include/linux/rmap.h
>> > +++ b/include/linux/rmap.h
>> > @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>> > int  __anon_vma_prepare(struct vm_area_struct *);
>> > void unlink_anon_vmas(struct vm_area_struct *);
>> > int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>> > -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>> > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>> > +		  struct vm_area_struct *prev);
>> > 
>> > static inline int anon_vma_prepare(struct vm_area_struct *vma)
>> > {
>> > diff --git a/kernel/fork.c b/kernel/fork.c
>> > index 2508a4f238a3..c33626993831 100644
>> > --- a/kernel/fork.c
>> > +++ b/kernel/fork.c
>> > @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>> > 			tmp->anon_vma = NULL;
>> > 			if (anon_vma_prepare(tmp))
>> > 				goto fail_nomem_anon_vma_fork;
>> > -		} else if (anon_vma_fork(tmp, mpnt))
>> > +		} else if (anon_vma_fork(tmp, mpnt, prev))
>> > 			goto fail_nomem_anon_vma_fork;
>> > 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>> > 		tmp->vm_next = tmp->vm_prev = NULL;
>> > diff --git a/mm/rmap.c b/mm/rmap.c
>> > index b3e381919835..3c1e04389291 100644
>> > --- a/mm/rmap.c
>> > +++ b/mm/rmap.c
>> > @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> > {
>> > 	struct anon_vma_chain *avc, *pavc;
>> > 	struct anon_vma *root = NULL;
>> > -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>> > -
>> > -	/*
>> > -	 * If parent share anon_vma with its vm_prev, keep this sharing in in
>> > -	 * child.
>> > -	 *
>> > -	 * 1. Parent has vm_prev, which implies we have vm_prev.
>> > -	 * 2. Parent and its vm_prev have the same anon_vma.
>> > -	 */
>> > -	if (!dst->anon_vma && src->anon_vma &&
>> > -	    pprev && pprev->anon_vma == src->anon_vma)
>> > -		dst->anon_vma = prev->anon_vma;
>> > -
>> > 
>> > 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>> > 		struct anon_vma *anon_vma;
>> > @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> >   * the corresponding VMA in the parent process is attached to.
>> >   * Returns 0 on success, non-zero on failure.
>> >   */
>> > -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>> > +		  struct vm_area_struct *prev)
>> > {
>> > 	struct anon_vma_chain *avc;
>> > 	struct anon_vma *anon_vma;
>> > @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> > 	if (!pvma->anon_vma)
>> > 		return 0;
>> > 
>> > +	/* Share anon_vma with previous VMA if it has the same parent. */
>> > +	if (prev && prev->anon_vma &&
>> > +	    prev->anon_vma->parent == pvma->anon_vma) {
>> > +		vma->anon_vma = prev->anon_vma;
>> > +		return anon_vma_clone(vma, prev);
>> > +	}
>> > +
>> 
>> I am afraid this one change the intended behavior. Let's put a chart to
>> describe.
>> 
>> Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>> fork") tries to improve the following situation.
>> 
>> Before the commit, the behavior is like this:
>> 
>> Parent process:
>> 
>>        +-----+
>>        | pav |<-----------------+----------------------+
>>        +-----+                  |                      |
>>                                 |                      |
>>                     +-----------+          +-----------+
>>                     |pprev      |          |pvma       |
>>                     +-----------+          +-----------+
>> 
>> Child Process
>> 
>> 
>>        +-----+                     +-----+
>>        | av1 |<-----------------+  | av2 |<------------+
>>        +-----+                  |  +-----+             |
>>                                 |                      |
>>                     +-----------+          +-----------+
>>                     |prev       |          |vma        |
>>                     +-----------+          +-----------+
>> 
>> 
>> Parent pprev and pvma share the same anon_vma due to
>> find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>> anon_vma for child process's vma.
>> 
>> The purpose of my commit is to give child process the following shape.
>> 
>>        +-----+
>>        | av  |<-----------------+----------------------+
>>        +-----+                  |                      |
>>                                 |                      |
>>                     +-----------+          +-----------+
>>                     |prev       |          |vma        |
>>                     +-----------+          +-----------+
>> 
>> After this, we reduce the extra "av2" for child process. But yes, because of
>> the two reasons you found, it didn't do the exact thing.
>> 
>> While if my understanding is correct, the anon_vma_clone() would pick up any
>> anon_vma in its process tree, except parent's. If this fails to get a reusable
>> one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>> 
>> Let me summarise original behavior:
>> 
>>    * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>>      it could not be pvma->anon_vma
>>    * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>>      pvma->anon_vam
>> 
>> Then take a look into your code here.
>> 
>> "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>> pvma's anon_vma. If my understanding is correct, this just match the second
>> case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>> 
>> But how about the first case? prev reuse an anon_vma in the process tree which
>> is not parent's?
>
>If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
>then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
>then third vma will share it. And so on.

No, I am afraid you are not correct here. Or I don't understand your sentence.

This is my understanding about the behavior before my commit. Suppose av1 and
av2 are both reused from old anon_vma. And if my understanding is correct,
they are different from pvma->anon_vma. Then how your code match this
situatioin?

        +-----+                     +-----+
        | av1 |<-----------------+  | av2 |<------------+
        +-----+                  |  +-----+             |
                                 |                      |
                     +-----------+          +-----------+
                     |prev       |          |vma        |
                     +-----------+          +-----------+

Would you explain your understanding the second and third vma in your
sentence? Which case you are trying to illustrate?

>Fork works left to right - we don't known about next vma to predict sharing and
>choose better options.
>
>But reusing old vma doesn't allocates new one. It's better to not reuse them

You mean reuse old anon_vma here?

>second time because this makes tree less optimal (and actually not a tree anymore).
>This is just a trick to prevent unlimited growth anon-vma chains in background:
>while each anon-vma has at least one vma or two childs then their count is
>limited with count of vmas which are visible and limited.
>
>> 
>> > 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>> > 	vma->anon_vma = NULL;
>> > 
>> 
>> --
>> Wei Yang
>> Help you, Help me
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-09  2:52     ` Wei Yang
@ 2020-01-09  8:54       ` Konstantin Khlebnikov
  2020-01-10  2:30         ` Wei Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-09  8:54 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, Andrew Morton, linux-kernel, Rik van Riel, Li Xinhai,
	Kirill A. Shutemov



On 09/01/2020 05.52, Wei Yang wrote:
> On Wed, Jan 08, 2020 at 01:40:44PM +0300, Konstantin Khlebnikov wrote:
>> On 08/01/2020 05.32, Wei Yang wrote:
>>> On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>>>> This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>>> mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>>>> way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>>>
>>>> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>>> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>>> fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>>>> Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>>>> will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>>>> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>>>
>>>> Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>>>> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>>> previous VMA in parent mm.
>>>>
>>>> Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>>> which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>>> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>>> because in current code 'prev' actually is same as 'pprev'.
>>>>
>>>> Third hidden problem is linking between VMA and anon-vmas whose pages it
>>>> could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>>>> shared anon-vma isn't attached. But every mapped page stays reachable in
>>>> rmap because we erroneously share anon-vma from parent's previous VMA.
>>>>
>>>> This patch moves sharing logic out of anon_vma_clone() into more specific
>>>> anon_vma_fork() because this supposed to work only at fork() and simply
>>>> reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>>> Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>>> ---
>>>> include/linux/rmap.h |    3 ++-
>>>> kernel/fork.c        |    2 +-
>>>> mm/rmap.c            |   23 +++++++++--------------
>>>> 3 files changed, 12 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 988d176472df..560e4480dcd0 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>>>> int  __anon_vma_prepare(struct vm_area_struct *);
>>>> void unlink_anon_vmas(struct vm_area_struct *);
>>>> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>>>> -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>>> +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>>> +		  struct vm_area_struct *prev);
>>>>
>>>> static inline int anon_vma_prepare(struct vm_area_struct *vma)
>>>> {
>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>> index 2508a4f238a3..c33626993831 100644
>>>> --- a/kernel/fork.c
>>>> +++ b/kernel/fork.c
>>>> @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>> 			tmp->anon_vma = NULL;
>>>> 			if (anon_vma_prepare(tmp))
>>>> 				goto fail_nomem_anon_vma_fork;
>>>> -		} else if (anon_vma_fork(tmp, mpnt))
>>>> +		} else if (anon_vma_fork(tmp, mpnt, prev))
>>>> 			goto fail_nomem_anon_vma_fork;
>>>> 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>>> 		tmp->vm_next = tmp->vm_prev = NULL;
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index b3e381919835..3c1e04389291 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>> {
>>>> 	struct anon_vma_chain *avc, *pavc;
>>>> 	struct anon_vma *root = NULL;
>>>> -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>>> -
>>>> -	/*
>>>> -	 * If parent share anon_vma with its vm_prev, keep this sharing in in
>>>> -	 * child.
>>>> -	 *
>>>> -	 * 1. Parent has vm_prev, which implies we have vm_prev.
>>>> -	 * 2. Parent and its vm_prev have the same anon_vma.
>>>> -	 */
>>>> -	if (!dst->anon_vma && src->anon_vma &&
>>>> -	    pprev && pprev->anon_vma == src->anon_vma)
>>>> -		dst->anon_vma = prev->anon_vma;
>>>> -
>>>>
>>>> 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>>> 		struct anon_vma *anon_vma;
>>>> @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>    * the corresponding VMA in the parent process is attached to.
>>>>    * Returns 0 on success, non-zero on failure.
>>>>    */
>>>> -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>> +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>>> +		  struct vm_area_struct *prev)
>>>> {
>>>> 	struct anon_vma_chain *avc;
>>>> 	struct anon_vma *anon_vma;
>>>> @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>> 	if (!pvma->anon_vma)
>>>> 		return 0;
>>>>
>>>> +	/* Share anon_vma with previous VMA if it has the same parent. */
>>>> +	if (prev && prev->anon_vma &&
>>>> +	    prev->anon_vma->parent == pvma->anon_vma) {
>>>> +		vma->anon_vma = prev->anon_vma;
>>>> +		return anon_vma_clone(vma, prev);
>>>> +	}
>>>> +
>>>
>>> I am afraid this one change the intended behavior. Let's put a chart to
>>> describe.
>>>
>>> Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>>> fork") tries to improve the following situation.
>>>
>>> Before the commit, the behavior is like this:
>>>
>>> Parent process:
>>>
>>>         +-----+
>>>         | pav |<-----------------+----------------------+
>>>         +-----+                  |                      |
>>>                                  |                      |
>>>                      +-----------+          +-----------+
>>>                      |pprev      |          |pvma       |
>>>                      +-----------+          +-----------+
>>>
>>> Child Process
>>>
>>>
>>>         +-----+                     +-----+
>>>         | av1 |<-----------------+  | av2 |<------------+
>>>         +-----+                  |  +-----+             |
>>>                                  |                      |
>>>                      +-----------+          +-----------+
>>>                      |prev       |          |vma        |
>>>                      +-----------+          +-----------+
>>>
>>>
>>> Parent pprev and pvma share the same anon_vma due to
>>> find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>>> anon_vma for child process's vma.
>>>
>>> The purpose of my commit is to give child process the following shape.
>>>
>>>         +-----+
>>>         | av  |<-----------------+----------------------+
>>>         +-----+                  |                      |
>>>                                  |                      |
>>>                      +-----------+          +-----------+
>>>                      |prev       |          |vma        |
>>>                      +-----------+          +-----------+
>>>
>>> After this, we reduce the extra "av2" for child process. But yes, because of
>>> the two reasons you found, it didn't do the exact thing.
>>>
>>> While if my understanding is correct, the anon_vma_clone() would pick up any
>>> anon_vma in its process tree, except parent's. If this fails to get a reusable
>>> one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>>>
>>> Let me summarise original behavior:
>>>
>>>     * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>>>       it could not be pvma->anon_vma
>>>     * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>>>       pvma->anon_vam
>>>
>>> Then take a look into your code here.
>>>
>>> "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>>> pvma's anon_vma. If my understanding is correct, this just match the second
>>> case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>>>
>>> But how about the first case? prev reuse an anon_vma in the process tree which
>>> is not parent's?
>>
>> If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
>> then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
>> then third vma will share it. And so on.
> 
> No, I am afraid you are not correct here. Or I don't understand your sentence.
> 
> This is my understanding about the behavior before my commit. Suppose av1 and
> av2 are both reused from old anon_vma. And if my understanding is correct,
> they are different from pvma->anon_vma. Then how your code match this
> situatioin?
> 
>          +-----+                     +-----+
>          | av1 |<-----------------+  | av2 |<------------+
>          +-----+                  |  +-----+             |
>                                   |                      |
>                       +-----------+          +-----------+
>                       |prev       |          |vma        |
>                       +-----------+          +-----------+
> 
> Would you explain your understanding the second and third vma in your
> sentence? Which case you are trying to illustrate?

series of vma in parent with shared AV:

SRC1 - AV0
SRC2 - AV0
SRC3 - AV0
...
SRCn - AV0

in child after fork

DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
DST4 - AV1 - same thing
...
DSTn - AV1

> 
>> Fork works left to right - we don't known about next vma to predict sharing and
>> choose better options.
>>
>> But reusing old vma doesn't allocates new one. It's better to not reuse them
> 
> You mean reuse old anon_vma here?

Picking ancestor AV in anon_vma_clone instead of allocating new one.

> 
>> second time because this makes tree less optimal (and actually not a tree anymore).
>> This is just a trick to prevent unlimited growth anon-vma chains in background:
>> while each anon-vma has at least one vma or two childs then their count is
>> limited with count of vmas which are visible and limited.
>>
>>>
>>>> 	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
>>>> 	vma->anon_vma = NULL;
>>>>
>>>
>>> --
>>> Wei Yang
>>> Help you, Help me
>>>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-09  8:54       ` Konstantin Khlebnikov
@ 2020-01-10  2:30         ` Wei Yang
  2020-01-10  3:23           ` Li Xinhai
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2020-01-10  2:30 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Wei Yang, linux-mm, Andrew Morton, linux-kernel, Rik van Riel,
	Li Xinhai, Kirill A. Shutemov

On Thu, Jan 09, 2020 at 11:54:21AM +0300, Konstantin Khlebnikov wrote:
>
>
>On 09/01/2020 05.52, Wei Yang wrote:
>> On Wed, Jan 08, 2020 at 01:40:44PM +0300, Konstantin Khlebnikov wrote:
>> > On 08/01/2020 05.32, Wei Yang wrote:
>> > > On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>> > > > This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>> > > > mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>> > > > way but fortunately still produces valid anon-vma tree, so nothing crashes.
>> > > > 
>> > > > If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>> > > > before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>> > > > fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>> > > > Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>> > > > will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>> > > > With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>> > > > 
>> > > > Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>> > > > is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>> > > > previous VMA in parent mm.
>> > > > 
>> > > > Second problem is hidden behind first one: assumption "Parent has vm_prev,
>> > > > which implies we have vm_prev" is wrong if first VMA in parent mm has set
>> > > > flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>> > > > because in current code 'prev' actually is same as 'pprev'.
>> > > > 
>> > > > Third hidden problem is linking between VMA and anon-vmas whose pages it
>> > > > could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>> > > > shared anon-vma isn't attached. But every mapped page stays reachable in
>> > > > rmap because we erroneously share anon-vma from parent's previous VMA.
>> > > > 
>> > > > This patch moves sharing logic out of anon_vma_clone() into more specific
>> > > > anon_vma_fork() because this supposed to work only at fork() and simply
>> > > > reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>> > > > 
>> > > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> > > > Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> > > > Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>> > > > Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>> > > > ---
>> > > > include/linux/rmap.h |    3 ++-
>> > > > kernel/fork.c        |    2 +-
>> > > > mm/rmap.c            |   23 +++++++++--------------
>> > > > 3 files changed, 12 insertions(+), 16 deletions(-)
>> > > > 
>> > > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> > > > index 988d176472df..560e4480dcd0 100644
>> > > > --- a/include/linux/rmap.h
>> > > > +++ b/include/linux/rmap.h
>> > > > @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>> > > > int  __anon_vma_prepare(struct vm_area_struct *);
>> > > > void unlink_anon_vmas(struct vm_area_struct *);
>> > > > int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>> > > > -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>> > > > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>> > > > +		  struct vm_area_struct *prev);
>> > > > 
>> > > > static inline int anon_vma_prepare(struct vm_area_struct *vma)
>> > > > {
>> > > > diff --git a/kernel/fork.c b/kernel/fork.c
>> > > > index 2508a4f238a3..c33626993831 100644
>> > > > --- a/kernel/fork.c
>> > > > +++ b/kernel/fork.c
>> > > > @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>> > > > 			tmp->anon_vma = NULL;
>> > > > 			if (anon_vma_prepare(tmp))
>> > > > 				goto fail_nomem_anon_vma_fork;
>> > > > -		} else if (anon_vma_fork(tmp, mpnt))
>> > > > +		} else if (anon_vma_fork(tmp, mpnt, prev))
>> > > > 			goto fail_nomem_anon_vma_fork;
>> > > > 		tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>> > > > 		tmp->vm_next = tmp->vm_prev = NULL;
>> > > > diff --git a/mm/rmap.c b/mm/rmap.c
>> > > > index b3e381919835..3c1e04389291 100644
>> > > > --- a/mm/rmap.c
>> > > > +++ b/mm/rmap.c
>> > > > @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> > > > {
>> > > > 	struct anon_vma_chain *avc, *pavc;
>> > > > 	struct anon_vma *root = NULL;
>> > > > -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>> > > > -
>> > > > -	/*
>> > > > -	 * If parent share anon_vma with its vm_prev, keep this sharing in in
>> > > > -	 * child.
>> > > > -	 *
>> > > > -	 * 1. Parent has vm_prev, which implies we have vm_prev.
>> > > > -	 * 2. Parent and its vm_prev have the same anon_vma.
>> > > > -	 */
>> > > > -	if (!dst->anon_vma && src->anon_vma &&
>> > > > -	    pprev && pprev->anon_vma == src->anon_vma)
>> > > > -		dst->anon_vma = prev->anon_vma;
>> > > > -
>> > > > 
>> > > > 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>> > > > 		struct anon_vma *anon_vma;
>> > > > @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>> > > >    * the corresponding VMA in the parent process is attached to.
>> > > >    * Returns 0 on success, non-zero on failure.
>> > > >    */
>> > > > -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> > > > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>> > > > +		  struct vm_area_struct *prev)
>> > > > {
>> > > > 	struct anon_vma_chain *avc;
>> > > > 	struct anon_vma *anon_vma;
>> > > > @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>> > > > 	if (!pvma->anon_vma)
>> > > > 		return 0;
>> > > > 
>> > > > +	/* Share anon_vma with previous VMA if it has the same parent. */
>> > > > +	if (prev && prev->anon_vma &&
>> > > > +	    prev->anon_vma->parent == pvma->anon_vma) {
>> > > > +		vma->anon_vma = prev->anon_vma;
>> > > > +		return anon_vma_clone(vma, prev);
>> > > > +	}
>> > > > +
>> > > 
>> > > I am afraid this one change the intended behavior. Let's put a chart to
>> > > describe.
>> > > 
>> > > Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>> > > fork") tries to improve the following situation.
>> > > 
>> > > Before the commit, the behavior is like this:
>> > > 
>> > > Parent process:
>> > > 
>> > >         +-----+
>> > >         | pav |<-----------------+----------------------+
>> > >         +-----+                  |                      |
>> > >                                  |                      |
>> > >                      +-----------+          +-----------+
>> > >                      |pprev      |          |pvma       |
>> > >                      +-----------+          +-----------+
>> > > 
>> > > Child Process
>> > > 
>> > > 
>> > >         +-----+                     +-----+
>> > >         | av1 |<-----------------+  | av2 |<------------+
>> > >         +-----+                  |  +-----+             |
>> > >                                  |                      |
>> > >                      +-----------+          +-----------+
>> > >                      |prev       |          |vma        |
>> > >                      +-----------+          +-----------+
>> > > 
>> > > 
>> > > Parent pprev and pvma share the same anon_vma due to
>> > > find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>> > > anon_vma for child process's vma.
>> > > 
>> > > The purpose of my commit is to give child process the following shape.
>> > > 
>> > >         +-----+
>> > >         | av  |<-----------------+----------------------+
>> > >         +-----+                  |                      |
>> > >                                  |                      |
>> > >                      +-----------+          +-----------+
>> > >                      |prev       |          |vma        |
>> > >                      +-----------+          +-----------+
>> > > 
>> > > After this, we reduce the extra "av2" for child process. But yes, because of
>> > > the two reasons you found, it didn't do the exact thing.
>> > > 
>> > > While if my understanding is correct, the anon_vma_clone() would pick up any
>> > > anon_vma in its process tree, except parent's. If this fails to get a reusable
>> > > one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>> > > 
>> > > Let me summarise original behavior:
>> > > 
>> > >     * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>> > >       it could not be pvma->anon_vma
>> > >     * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>> > >       pvma->anon_vam
>> > > 
>> > > Then take a look into your code here.
>> > > 
>> > > "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>> > > pvma's anon_vma. If my understanding is correct, this just match the second
>> > > case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>> > > 
>> > > But how about the first case? prev reuse an anon_vma in the process tree which
>> > > is not parent's?
>> > 
>> > If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
>> > then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
>> > then third vma will share it. And so on.
>> 
>> No, I am afraid you are not correct here. Or I don't understand your sentence.
>> 
>> This is my understanding about the behavior before my commit. Suppose av1 and
>> av2 are both reused from old anon_vma. And if my understanding is correct,
>> they are different from pvma->anon_vma. Then how your code match this
>> situatioin?
>> 
>>          +-----+                     +-----+
>>          | av1 |<-----------------+  | av2 |<------------+
>>          +-----+                  |  +-----+             |
>>                                   |                      |
>>                       +-----------+          +-----------+
>>                       |prev       |          |vma        |
>>                       +-----------+          +-----------+
>> 
>> Would you explain your understanding the second and third vma in your
>> sentence? Which case you are trying to illustrate?
>
>series of vma in parent with shared AV:
>
>SRC1 - AV0
>SRC2 - AV0
>SRC3 - AV0
>...
>SRCn - AV0
>
>in child after fork
>
>DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>DST4 - AV1 - same thing
>...
>DSTn - AV1
>

Yes, your code works for DST3..DSTn. They will pick up AV1 since
(DST2->AV->parent == SRC3->AV).

My question is why DST1 and DST2 has different AV? The purpose of my patch
tries to make child has the same topology and parent. So the ideal look of
child is:

DST1 - AV1
DST2 - AV1
DST2 - AV1
DST3 - AV1
DST4 - AV1

Would you mind putting more words on DST1 and DST2? I didn't fully understand
the logic here.

Thanks

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-10  2:30         ` Wei Yang
@ 2020-01-10  3:23           ` Li Xinhai
  2020-01-10  5:34             ` Wei Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Li Xinhai @ 2020-01-10  3:23 UTC (permalink / raw)
  To: richardw.yang, khlebnikov
  Cc: richardw.yang, linux-mm, akpm, linux-kernel, Rik van Riel,
	kirill.shutemov

On 2020-01-10 at 10:30 Wei Yang wrote:
>On Thu, Jan 09, 2020 at 11:54:21AM +0300, Konstantin Khlebnikov wrote:
>>
>>
>>On 09/01/2020 05.52, Wei Yang wrote:
>>> On Wed, Jan 08, 2020 at 01:40:44PM +0300, Konstantin Khlebnikov wrote:
>>> > On 08/01/2020 05.32, Wei Yang wrote:
>>> > > On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>>> > > > This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>> > > > mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>>> > > > way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>> > > >
>>> > > > If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>> > > > before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>> > > > fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>>> > > > Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>>> > > > will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>>> > > > With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>> > > >
>>> > > > Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>>> > > > is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>> > > > previous VMA in parent mm.
>>> > > >
>>> > > > Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>> > > > which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>> > > > flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>> > > > because in current code 'prev' actually is same as 'pprev'.
>>> > > >
>>> > > > Third hidden problem is linking between VMA and anon-vmas whose pages it
>>> > > > could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>>> > > > shared anon-vma isn't attached. But every mapped page stays reachable in
>>> > > > rmap because we erroneously share anon-vma from parent's previous VMA.
>>> > > >
>>> > > > This patch moves sharing logic out of anon_vma_clone() into more specific
>>> > > > anon_vma_fork() because this supposed to work only at fork() and simply
>>> > > > reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>> > > >
>>> > > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> > > > Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>> > > > Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>> > > > Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>> > > > ---
>>> > > > include/linux/rmap.h |    3 ++-
>>> > > > kernel/fork.c        |    2 +-
>>> > > > mm/rmap.c            |   23 +++++++++--------------
>>> > > > 3 files changed, 12 insertions(+), 16 deletions(-)
>>> > > >
>>> > > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> > > > index 988d176472df..560e4480dcd0 100644
>>> > > > --- a/include/linux/rmap.h
>>> > > > +++ b/include/linux/rmap.h
>>> > > > @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>>> > > > int  __anon_vma_prepare(struct vm_area_struct *);
>>> > > > void unlink_anon_vmas(struct vm_area_struct *);
>>> > > > int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>>> > > > -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>> > > > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>> > > > +	  struct vm_area_struct *prev);
>>> > > >
>>> > > > static inline int anon_vma_prepare(struct vm_area_struct *vma)
>>> > > > {
>>> > > > diff --git a/kernel/fork.c b/kernel/fork.c
>>> > > > index 2508a4f238a3..c33626993831 100644
>>> > > > --- a/kernel/fork.c
>>> > > > +++ b/kernel/fork.c
>>> > > > @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>> > > > tmp->anon_vma = NULL;
>>> > > > if (anon_vma_prepare(tmp))
>>> > > > goto fail_nomem_anon_vma_fork;
>>> > > > -	} else if (anon_vma_fork(tmp, mpnt))
>>> > > > +	} else if (anon_vma_fork(tmp, mpnt, prev))
>>> > > > goto fail_nomem_anon_vma_fork;
>>> > > > tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>> > > > tmp->vm_next = tmp->vm_prev = NULL;
>>> > > > diff --git a/mm/rmap.c b/mm/rmap.c
>>> > > > index b3e381919835..3c1e04389291 100644
>>> > > > --- a/mm/rmap.c
>>> > > > +++ b/mm/rmap.c
>>> > > > @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>> > > > {
>>> > > > struct anon_vma_chain *avc, *pavc;
>>> > > > struct anon_vma *root = NULL;
>>> > > > -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>> > > > -
>>> > > > -	/*
>>> > > > -	* If parent share anon_vma with its vm_prev, keep this sharing in in
>>> > > > -	* child.
>>> > > > -	*
>>> > > > -	* 1. Parent has vm_prev, which implies we have vm_prev.
>>> > > > -	* 2. Parent and its vm_prev have the same anon_vma.
>>> > > > -	*/
>>> > > > -	if (!dst->anon_vma && src->anon_vma &&
>>> > > > -	    pprev && pprev->anon_vma == src->anon_vma)
>>> > > > -	dst->anon_vma = prev->anon_vma;
>>> > > > -
>>> > > >
>>> > > > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>> > > > struct anon_vma *anon_vma;
>>> > > > @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>> > > >    * the corresponding VMA in the parent process is attached to.
>>> > > >    * Returns 0 on success, non-zero on failure.
>>> > > >    */
>>> > > > -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>> > > > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>> > > > +	  struct vm_area_struct *prev)
>>> > > > {
>>> > > > struct anon_vma_chain *avc;
>>> > > > struct anon_vma *anon_vma;
>>> > > > @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>> > > > if (!pvma->anon_vma)
>>> > > > return 0;
>>> > > >
>>> > > > +	/* Share anon_vma with previous VMA if it has the same parent. */
>>> > > > +	if (prev && prev->anon_vma &&
>>> > > > +	    prev->anon_vma->parent == pvma->anon_vma) {
>>> > > > +	vma->anon_vma = prev->anon_vma;
>>> > > > +	return anon_vma_clone(vma, prev);
>>> > > > +	}
>>> > > > +
>>> > >
>>> > > I am afraid this one change the intended behavior. Let's put a chart to
>>> > > describe.
>>> > >
>>> > > Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>>> > > fork") tries to improve the following situation.
>>> > >
>>> > > Before the commit, the behavior is like this:
>>> > >
>>> > > Parent process:
>>> > >
>>> > >         +-----+
>>> > >         | pav |<-----------------+----------------------+
>>> > >         +-----+                  |                      |
>>> > >                                  |                      |
>>> > >                      +-----------+          +-----------+
>>> > >                      |pprev      |          |pvma       |
>>> > >                      +-----------+          +-----------+
>>> > >
>>> > > Child Process
>>> > >
>>> > >
>>> > >         +-----+                     +-----+
>>> > >         | av1 |<-----------------+  | av2 |<------------+
>>> > >         +-----+                  |  +-----+             |
>>> > >                                  |                      |
>>> > >                      +-----------+          +-----------+
>>> > >                      |prev       |          |vma        |
>>> > >                      +-----------+          +-----------+
>>> > >
>>> > >
>>> > > Parent pprev and pvma share the same anon_vma due to
>>> > > find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>>> > > anon_vma for child process's vma.
>>> > >
>>> > > The purpose of my commit is to give child process the following shape.
>>> > >
>>> > >         +-----+
>>> > >         | av  |<-----------------+----------------------+
>>> > >         +-----+                  |                      |
>>> > >                                  |                      |
>>> > >                      +-----------+          +-----------+
>>> > >                      |prev       |          |vma        |
>>> > >                      +-----------+          +-----------+
>>> > >
>>> > > After this, we reduce the extra "av2" for child process. But yes, because of
>>> > > the two reasons you found, it didn't do the exact thing.
>>> > >
>>> > > While if my understanding is correct, the anon_vma_clone() would pick up any
>>> > > anon_vma in its process tree, except parent's. If this fails to get a reusable
>>> > > one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>>> > >
>>> > > Let me summarise original behavior:
>>> > >
>>> > >     * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>>> > >       it could not be pvma->anon_vma
>>> > >     * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>>> > >       pvma->anon_vam
>>> > >
>>> > > Then take a look into your code here.
>>> > >
>>> > > "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>>> > > pvma's anon_vma. If my understanding is correct, this just match the second
>>> > > case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>>> > >
>>> > > But how about the first case? prev reuse an anon_vma in the process tree which
>>> > > is not parent's?
>>> >
>>> > If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
>>> > then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
>>> > then third vma will share it. And so on.
>>>
>>> No, I am afraid you are not correct here. Or I don't understand your sentence.
>>>
>>> This is my understanding about the behavior before my commit. Suppose av1 and
>>> av2 are both reused from old anon_vma. And if my understanding is correct,
>>> they are different from pvma->anon_vma. Then how your code match this
>>> situatioin?
>>>
>>>          +-----+                     +-----+
>>>          | av1 |<-----------------+  | av2 |<------------+
>>>          +-----+                  |  +-----+             |
>>>                                   |                      |
>>>                       +-----------+          +-----------+
>>>                       |prev       |          |vma        |
>>>                       +-----------+          +-----------+
>>>
>>> Would you explain your understanding the second and third vma in your
>>> sentence? Which case you are trying to illustrate?
>>
>>series of vma in parent with shared AV:
>>
>>SRC1 - AV0
>>SRC2 - AV0
>>SRC3 - AV0
>>...
>>SRCn - AV0
>>
>>in child after fork
>>
>>DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>>DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>>DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>>DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>>DST4 - AV1 - same thing
>>...
>>DSTn - AV1
>>
>
>Yes, your code works for DST3..DSTn. They will pick up AV1 since
>(DST2->AV->parent == SRC3->AV).
>
>My question is why DST1 and DST2 has different AV? The purpose of my patch
>tries to make child has the same topology and parent. So the ideal look of
>child is:
>
>DST1 - AV1
>DST2 - AV1
>DST2 - AV1
>DST3 - AV1
>DST4 - AV1
>
>Would you mind putting more words on DST1 and DST2? I didn't fully understand
>the logic here.
>
>Thanks
> 

I think that the first version is doing the work as you expected, but been
revised in second version, to limits the number of users of reused old
anon(which is picked in anon_vma_clone() and keep the tree structure.

>--
>Wei Yang
>Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-10  3:23           ` Li Xinhai
@ 2020-01-10  5:34             ` Wei Yang
  2020-01-10  8:11               ` Konstantin Khlebnikov
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2020-01-10  5:34 UTC (permalink / raw)
  To: Li Xinhai
  Cc: richardw.yang, khlebnikov, linux-mm, akpm, linux-kernel,
	Rik van Riel, kirill.shutemov

On Fri, Jan 10, 2020 at 11:23:59AM +0800, Li Xinhai wrote:
>On 2020-01-10 at 10:30 Wei Yang wrote:
>>On Thu, Jan 09, 2020 at 11:54:21AM +0300, Konstantin Khlebnikov wrote:
>>>
>>>
>>>On 09/01/2020 05.52, Wei Yang wrote:
>>>> On Wed, Jan 08, 2020 at 01:40:44PM +0300, Konstantin Khlebnikov wrote:
>>>> > On 08/01/2020 05.32, Wei Yang wrote:
>>>> > > On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>>>> > > > This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>>> > > > mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>>>> > > > way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>>> > > >
>>>> > > > If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>>> > > > before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>>> > > > fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>>>> > > > Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>>>> > > > will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>>>> > > > With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>>> > > >
>>>> > > > Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>>>> > > > is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>>> > > > previous VMA in parent mm.
>>>> > > >
>>>> > > > Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>>> > > > which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>>> > > > flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>>> > > > because in current code 'prev' actually is same as 'pprev'.
>>>> > > >
>>>> > > > Third hidden problem is linking between VMA and anon-vmas whose pages it
>>>> > > > could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>>>> > > > shared anon-vma isn't attached. But every mapped page stays reachable in
>>>> > > > rmap because we erroneously share anon-vma from parent's previous VMA.
>>>> > > >
>>>> > > > This patch moves sharing logic out of anon_vma_clone() into more specific
>>>> > > > anon_vma_fork() because this supposed to work only at fork() and simply
>>>> > > > reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>>> > > >
>>>> > > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>> > > > Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>> > > > Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>>> > > > Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>>> > > > ---
>>>> > > > include/linux/rmap.h |    3 ++-
>>>> > > > kernel/fork.c        |    2 +-
>>>> > > > mm/rmap.c            |   23 +++++++++--------------
>>>> > > > 3 files changed, 12 insertions(+), 16 deletions(-)
>>>> > > >
>>>> > > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> > > > index 988d176472df..560e4480dcd0 100644
>>>> > > > --- a/include/linux/rmap.h
>>>> > > > +++ b/include/linux/rmap.h
>>>> > > > @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>>>> > > > int  __anon_vma_prepare(struct vm_area_struct *);
>>>> > > > void unlink_anon_vmas(struct vm_area_struct *);
>>>> > > > int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>>>> > > > -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>>> > > > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>>> > > > +	  struct vm_area_struct *prev);
>>>> > > >
>>>> > > > static inline int anon_vma_prepare(struct vm_area_struct *vma)
>>>> > > > {
>>>> > > > diff --git a/kernel/fork.c b/kernel/fork.c
>>>> > > > index 2508a4f238a3..c33626993831 100644
>>>> > > > --- a/kernel/fork.c
>>>> > > > +++ b/kernel/fork.c
>>>> > > > @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>> > > > tmp->anon_vma = NULL;
>>>> > > > if (anon_vma_prepare(tmp))
>>>> > > > goto fail_nomem_anon_vma_fork;
>>>> > > > -	} else if (anon_vma_fork(tmp, mpnt))
>>>> > > > +	} else if (anon_vma_fork(tmp, mpnt, prev))
>>>> > > > goto fail_nomem_anon_vma_fork;
>>>> > > > tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>>> > > > tmp->vm_next = tmp->vm_prev = NULL;
>>>> > > > diff --git a/mm/rmap.c b/mm/rmap.c
>>>> > > > index b3e381919835..3c1e04389291 100644
>>>> > > > --- a/mm/rmap.c
>>>> > > > +++ b/mm/rmap.c
>>>> > > > @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>> > > > {
>>>> > > > struct anon_vma_chain *avc, *pavc;
>>>> > > > struct anon_vma *root = NULL;
>>>> > > > -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>>> > > > -
>>>> > > > -	/*
>>>> > > > -	* If parent share anon_vma with its vm_prev, keep this sharing in in
>>>> > > > -	* child.
>>>> > > > -	*
>>>> > > > -	* 1. Parent has vm_prev, which implies we have vm_prev.
>>>> > > > -	* 2. Parent and its vm_prev have the same anon_vma.
>>>> > > > -	*/
>>>> > > > -	if (!dst->anon_vma && src->anon_vma &&
>>>> > > > -	    pprev && pprev->anon_vma == src->anon_vma)
>>>> > > > -	dst->anon_vma = prev->anon_vma;
>>>> > > > -
>>>> > > >
>>>> > > > list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>>> > > > struct anon_vma *anon_vma;
>>>> > > > @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>> > > >    * the corresponding VMA in the parent process is attached to.
>>>> > > >    * Returns 0 on success, non-zero on failure.
>>>> > > >    */
>>>> > > > -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>> > > > +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>>> > > > +	  struct vm_area_struct *prev)
>>>> > > > {
>>>> > > > struct anon_vma_chain *avc;
>>>> > > > struct anon_vma *anon_vma;
>>>> > > > @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>> > > > if (!pvma->anon_vma)
>>>> > > > return 0;
>>>> > > >
>>>> > > > +	/* Share anon_vma with previous VMA if it has the same parent. */
>>>> > > > +	if (prev && prev->anon_vma &&
>>>> > > > +	    prev->anon_vma->parent == pvma->anon_vma) {
>>>> > > > +	vma->anon_vma = prev->anon_vma;
>>>> > > > +	return anon_vma_clone(vma, prev);
>>>> > > > +	}
>>>> > > > +
>>>> > >
>>>> > > I am afraid this one change the intended behavior. Let's put a chart to
>>>> > > describe.
>>>> > >
>>>> > > Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>>>> > > fork") tries to improve the following situation.
>>>> > >
>>>> > > Before the commit, the behavior is like this:
>>>> > >
>>>> > > Parent process:
>>>> > >
>>>> > >         +-----+
>>>> > >         | pav |<-----------------+----------------------+
>>>> > >         +-----+                  |                      |
>>>> > >                                  |                      |
>>>> > >                      +-----------+          +-----------+
>>>> > >                      |pprev      |          |pvma       |
>>>> > >                      +-----------+          +-----------+
>>>> > >
>>>> > > Child Process
>>>> > >
>>>> > >
>>>> > >         +-----+                     +-----+
>>>> > >         | av1 |<-----------------+  | av2 |<------------+
>>>> > >         +-----+                  |  +-----+             |
>>>> > >                                  |                      |
>>>> > >                      +-----------+          +-----------+
>>>> > >                      |prev       |          |vma        |
>>>> > >                      +-----------+          +-----------+
>>>> > >
>>>> > >
>>>> > > Parent pprev and pvma share the same anon_vma due to
>>>> > > find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>>>> > > anon_vma for child process's vma.
>>>> > >
>>>> > > The purpose of my commit is to give child process the following shape.
>>>> > >
>>>> > >         +-----+
>>>> > >         | av  |<-----------------+----------------------+
>>>> > >         +-----+                  |                      |
>>>> > >                                  |                      |
>>>> > >                      +-----------+          +-----------+
>>>> > >                      |prev       |          |vma        |
>>>> > >                      +-----------+          +-----------+
>>>> > >
>>>> > > After this, we reduce the extra "av2" for child process. But yes, because of
>>>> > > the two reasons you found, it didn't do the exact thing.
>>>> > >
>>>> > > While if my understanding is correct, the anon_vma_clone() would pick up any
>>>> > > anon_vma in its process tree, except parent's. If this fails to get a reusable
>>>> > > one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>>>> > >
>>>> > > Let me summarise original behavior:
>>>> > >
>>>> > >     * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>>>> > >       it could not be pvma->anon_vma
>>>> > >     * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>>>> > >       pvma->anon_vam
>>>> > >
>>>> > > Then take a look into your code here.
>>>> > >
>>>> > > "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>>>> > > pvma's anon_vma. If my understanding is correct, this just match the second
>>>> > > case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>>>> > >
>>>> > > But how about the first case? prev reuse an anon_vma in the process tree which
>>>> > > is not parent's?
>>>> >
>>>> > If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
>>>> > then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
>>>> > then third vma will share it. And so on.
>>>>
>>>> No, I am afraid you are not correct here. Or I don't understand your sentence.
>>>>
>>>> This is my understanding about the behavior before my commit. Suppose av1 and
>>>> av2 are both reused from old anon_vma. And if my understanding is correct,
>>>> they are different from pvma->anon_vma. Then how your code match this
>>>> situatioin?
>>>>
>>>>          +-----+                     +-----+
>>>>          | av1 |<-----------------+  | av2 |<------------+
>>>>          +-----+                  |  +-----+             |
>>>>                                   |                      |
>>>>                       +-----------+          +-----------+
>>>>                       |prev       |          |vma        |
>>>>                       +-----------+          +-----------+
>>>>
>>>> Would you explain your understanding the second and third vma in your
>>>> sentence? Which case you are trying to illustrate?
>>>
>>>series of vma in parent with shared AV:
>>>
>>>SRC1 - AV0
>>>SRC2 - AV0
>>>SRC3 - AV0
>>>...
>>>SRCn - AV0
>>>
>>>in child after fork
>>>
>>>DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>>>DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>>>DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>>>DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>>>DST4 - AV1 - same thing
>>>...
>>>DSTn - AV1
>>>
>>
>>Yes, your code works for DST3..DSTn. They will pick up AV1 since
>>(DST2->AV->parent == SRC3->AV).
>>
>>My question is why DST1 and DST2 has different AV? The purpose of my patch
>>tries to make child has the same topology and parent. So the ideal look of
>>child is:
>>
>>DST1 - AV1
>>DST2 - AV1
>>DST2 - AV1
>>DST3 - AV1
>>DST4 - AV1
>>
>>Would you mind putting more words on DST1 and DST2? I didn't fully understand
>>the logic here.
>>
>>Thanks
>> 
>
>I think that the first version is doing the work as you expected, but been
>revised in second version, to limits the number of users of reused old
>anon(which is picked in anon_vma_clone() and keep the tree structure.
>

Any reason to reduce the reuse? Maybe I lost some point.

>>--
>>Wei Yang
>>Help you, Help me

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-10  5:34             ` Wei Yang
@ 2020-01-10  8:11               ` Konstantin Khlebnikov
  2020-01-11 22:38                 ` Wei Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-10  8:11 UTC (permalink / raw)
  To: Wei Yang, Li Xinhai
  Cc: linux-mm, akpm, linux-kernel, Rik van Riel, kirill.shutemov

On 10/01/2020 08.34, Wei Yang wrote:
> On Fri, Jan 10, 2020 at 11:23:59AM +0800, Li Xinhai wrote:
>> On 2020-01-10 at 10:30 Wei Yang wrote:
>>> On Thu, Jan 09, 2020 at 11:54:21AM +0300, Konstantin Khlebnikov wrote:
>>>>
>>>>
>>>> On 09/01/2020 05.52, Wei Yang wrote:
>>>>> On Wed, Jan 08, 2020 at 01:40:44PM +0300, Konstantin Khlebnikov wrote:
>>>>>> On 08/01/2020 05.32, Wei Yang wrote:
>>>>>>> On Tue, Jan 07, 2020 at 01:19:56PM +0300, Konstantin Khlebnikov wrote:
>>>>>>>> This fixes some misconceptions in commit 4e4a9eb92133 ("mm/rmap.c: reuse
>>>>>>>> mergeable anon_vma as parent when fork"). It merges anon-vma in unexpected
>>>>>>>> way but fortunately still produces valid anon-vma tree, so nothing crashes.
>>>>>>>>
>>>>>>>> If in parent VMAs: SRC1 SRC2 .. SRCn share anon-vma ANON0, then after fork
>>>>>>>> before all patches in child process related VMAs: DST1 DST2 .. DSTn will
>>>>>>>> fork indepndent anon-vmas: ANON1 ANON2 .. ANONn (each is child of ANON0).
>>>>>>>> Before this patch only DST1 will fork new ANON1 and following DST2 .. DSTn
>>>>>>>> will share parent's ANON0 (i.e. anon-vma tree is valid but isn't optimal).
>>>>>>>> With this patch DST1 will create new ANON1 and DST2 .. DSTn will share it.
>>>>>>>>
>>>>>>>> Root problem caused by initialization order in dup_mmap(): vma->vm_prev
>>>>>>>> is set after calling anon_vma_fork(). Thus in anon_vma_fork() it points to
>>>>>>>> previous VMA in parent mm.
>>>>>>>>
>>>>>>>> Second problem is hidden behind first one: assumption "Parent has vm_prev,
>>>>>>>> which implies we have vm_prev" is wrong if first VMA in parent mm has set
>>>>>>>> flag VM_DONTCOPY. Luckily prev->anon_vma doesn't dereference NULL pointer
>>>>>>>> because in current code 'prev' actually is same as 'pprev'.
>>>>>>>>
>>>>>>>> Third hidden problem is linking between VMA and anon-vmas whose pages it
>>>>>>>> could contain. Loop in anon_vma_clone() attaches only parent's anon-vmas,
>>>>>>>> shared anon-vma isn't attached. But every mapped page stays reachable in
>>>>>>>> rmap because we erroneously share anon-vma from parent's previous VMA.
>>>>>>>>
>>>>>>>> This patch moves sharing logic out of anon_vma_clone() into more specific
>>>>>>>> anon_vma_fork() because this supposed to work only at fork() and simply
>>>>>>>> reuses anon_vma from previous VMA if it is forked from the same anon-vma.
>>>>>>>>
>>>>>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>>>>>> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com>
>>>>>>>> Fixes: 4e4a9eb92133 ("mm/rmap.c: reuse mergeable anon_vma as parent when fork")
>>>>>>>> Link: https://lore.kernel.org/linux-mm/CALYGNiNzz+dxHX0g5-gNypUQc3B=8_Scp53-NTOh=zWsdUuHAw@mail.gmail.com/T/#t
>>>>>>>> ---
>>>>>>>> include/linux/rmap.h |    3 ++-
>>>>>>>> kernel/fork.c        |    2 +-
>>>>>>>> mm/rmap.c            |   23 +++++++++--------------
>>>>>>>> 3 files changed, 12 insertions(+), 16 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>>>>> index 988d176472df..560e4480dcd0 100644
>>>>>>>> --- a/include/linux/rmap.h
>>>>>>>> +++ b/include/linux/rmap.h
>>>>>>>> @@ -143,7 +143,8 @@ void anon_vma_init(void);	/* create anon_vma_cachep */
>>>>>>>> int  __anon_vma_prepare(struct vm_area_struct *);
>>>>>>>> void unlink_anon_vmas(struct vm_area_struct *);
>>>>>>>> int anon_vma_clone(struct vm_area_struct *, struct vm_area_struct *);
>>>>>>>> -int anon_vma_fork(struct vm_area_struct *, struct vm_area_struct *);
>>>>>>>> +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>>>>>>> +	  struct vm_area_struct *prev);
>>>>>>>>
>>>>>>>> static inline int anon_vma_prepare(struct vm_area_struct *vma)
>>>>>>>> {
>>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>>>> index 2508a4f238a3..c33626993831 100644
>>>>>>>> --- a/kernel/fork.c
>>>>>>>> +++ b/kernel/fork.c
>>>>>>>> @@ -556,7 +556,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>>>>>>>> tmp->anon_vma = NULL;
>>>>>>>> if (anon_vma_prepare(tmp))
>>>>>>>> goto fail_nomem_anon_vma_fork;
>>>>>>>> -	} else if (anon_vma_fork(tmp, mpnt))
>>>>>>>> +	} else if (anon_vma_fork(tmp, mpnt, prev))
>>>>>>>> goto fail_nomem_anon_vma_fork;
>>>>>>>> tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
>>>>>>>> tmp->vm_next = tmp->vm_prev = NULL;
>>>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>>>> index b3e381919835..3c1e04389291 100644
>>>>>>>> --- a/mm/rmap.c
>>>>>>>> +++ b/mm/rmap.c
>>>>>>>> @@ -269,19 +269,6 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>>>>> {
>>>>>>>> struct anon_vma_chain *avc, *pavc;
>>>>>>>> struct anon_vma *root = NULL;
>>>>>>>> -	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
>>>>>>>> -
>>>>>>>> -	/*
>>>>>>>> -	* If parent share anon_vma with its vm_prev, keep this sharing in in
>>>>>>>> -	* child.
>>>>>>>> -	*
>>>>>>>> -	* 1. Parent has vm_prev, which implies we have vm_prev.
>>>>>>>> -	* 2. Parent and its vm_prev have the same anon_vma.
>>>>>>>> -	*/
>>>>>>>> -	if (!dst->anon_vma && src->anon_vma &&
>>>>>>>> -	    pprev && pprev->anon_vma == src->anon_vma)
>>>>>>>> -	dst->anon_vma = prev->anon_vma;
>>>>>>>> -
>>>>>>>>
>>>>>>>> list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>>>>>>>> struct anon_vma *anon_vma;
>>>>>>>> @@ -332,7 +319,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>>>>>>>>      * the corresponding VMA in the parent process is attached to.
>>>>>>>>      * Returns 0 on success, non-zero on failure.
>>>>>>>>      */
>>>>>>>> -int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>>>>>> +int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma,
>>>>>>>> +	  struct vm_area_struct *prev)
>>>>>>>> {
>>>>>>>> struct anon_vma_chain *avc;
>>>>>>>> struct anon_vma *anon_vma;
>>>>>>>> @@ -342,6 +330,13 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
>>>>>>>> if (!pvma->anon_vma)
>>>>>>>> return 0;
>>>>>>>>
>>>>>>>> +	/* Share anon_vma with previous VMA if it has the same parent. */
>>>>>>>> +	if (prev && prev->anon_vma &&
>>>>>>>> +	    prev->anon_vma->parent == pvma->anon_vma) {
>>>>>>>> +	vma->anon_vma = prev->anon_vma;
>>>>>>>> +	return anon_vma_clone(vma, prev);
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>
>>>>>>> I am afraid this one change the intended behavior. Let's put a chart to
>>>>>>> describe.
>>>>>>>
>>>>>>> Commit 4e4a9eb92133 ("mm/rmap.c: reusemergeable anon_vma as parent when
>>>>>>> fork") tries to improve the following situation.
>>>>>>>
>>>>>>> Before the commit, the behavior is like this:
>>>>>>>
>>>>>>> Parent process:
>>>>>>>
>>>>>>>           +-----+
>>>>>>>           | pav |<-----------------+----------------------+
>>>>>>>           +-----+                  |                      |
>>>>>>>                                    |                      |
>>>>>>>                        +-----------+          +-----------+
>>>>>>>                        |pprev      |          |pvma       |
>>>>>>>                        +-----------+          +-----------+
>>>>>>>
>>>>>>> Child Process
>>>>>>>
>>>>>>>
>>>>>>>           +-----+                     +-----+
>>>>>>>           | av1 |<-----------------+  | av2 |<------------+
>>>>>>>           +-----+                  |  +-----+             |
>>>>>>>                                    |                      |
>>>>>>>                        +-----------+          +-----------+
>>>>>>>                        |prev       |          |vma        |
>>>>>>>                        +-----------+          +-----------+
>>>>>>>
>>>>>>>
>>>>>>> Parent pprev and pvma share the same anon_vma due to
>>>>>>> find_mergeable_anon_vma(). While the anon_vma_clone() would pick up different
>>>>>>> anon_vma for child process's vma.
>>>>>>>
>>>>>>> The purpose of my commit is to give child process the following shape.
>>>>>>>
>>>>>>>           +-----+
>>>>>>>           | av  |<-----------------+----------------------+
>>>>>>>           +-----+                  |                      |
>>>>>>>                                    |                      |
>>>>>>>                        +-----------+          +-----------+
>>>>>>>                        |prev       |          |vma        |
>>>>>>>                        +-----------+          +-----------+
>>>>>>>
>>>>>>> After this, we reduce the extra "av2" for child process. But yes, because of
>>>>>>> the two reasons you found, it didn't do the exact thing.
>>>>>>>
>>>>>>> While if my understanding is correct, the anon_vma_clone() would pick up any
>>>>>>> anon_vma in its process tree, except parent's. If this fails to get a reusable
>>>>>>> one, anon_vma_fork() would allocate one, whose parent is pvma->anon_vma.
>>>>>>>
>>>>>>> Let me summarise original behavior:
>>>>>>>
>>>>>>>       * if anon_vma_clone succeed, it find one anon_vma in the process tree, but
>>>>>>>         it could not be pvma->anon_vma
>>>>>>>       * if anon_vma_clone fail, it will allocate a new anon_vma and its parent is
>>>>>>>         pvma->anon_vam
>>>>>>>
>>>>>>> Then take a look into your code here.
>>>>>>>
>>>>>>> "prev->anon_vma->parent == pvma->anon_vma" means prev->anon_vma parent is
>>>>>>> pvma's anon_vma. If my understanding is correct, this just match the second
>>>>>>> case. For "prev", we didn't find a reusable anon_vma and allocate a new one.
>>>>>>>
>>>>>>> But how about the first case? prev reuse an anon_vma in the process tree which
>>>>>>> is not parent's?
>>>>>>
>>>>>> If anon_vma_clone() pick old anon-vma for first vma in sharing chain (prev)
>>>>>> then second vma (vma) will fork new anon-vma (unless pick another old anon-vma),
>>>>>> then third vma will share it. And so on.
>>>>>
>>>>> No, I am afraid you are not correct here. Or I don't understand your sentence.
>>>>>
>>>>> This is my understanding about the behavior before my commit. Suppose av1 and
>>>>> av2 are both reused from old anon_vma. And if my understanding is correct,
>>>>> they are different from pvma->anon_vma. Then how your code match this
>>>>> situatioin?
>>>>>
>>>>>            +-----+                     +-----+
>>>>>            | av1 |<-----------------+  | av2 |<------------+
>>>>>            +-----+                  |  +-----+             |
>>>>>                                     |                      |
>>>>>                         +-----------+          +-----------+
>>>>>                         |prev       |          |vma        |
>>>>>                         +-----------+          +-----------+
>>>>>
>>>>> Would you explain your understanding the second and third vma in your
>>>>> sentence? Which case you are trying to illustrate?
>>>>
>>>> series of vma in parent with shared AV:
>>>>
>>>> SRC1 - AV0
>>>> SRC2 - AV0
>>>> SRC3 - AV0
>>>> ...
>>>> SRCn - AV0
>>>>
>>>> in child after fork
>>>>
>>>> DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>>>> DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>>>> DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>>>> DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>>>> DST4 - AV1 - same thing
>>>> ...
>>>> DSTn - AV1
>>>>
>>>
>>> Yes, your code works for DST3..DSTn. They will pick up AV1 since
>>> (DST2->AV->parent == SRC3->AV).
>>>
>>> My question is why DST1 and DST2 has different AV? The purpose of my patch
>>> tries to make child has the same topology and parent. So the ideal look of
>>> child is:
>>>
>>> DST1 - AV1
>>> DST2 - AV1
>>> DST2 - AV1
>>> DST3 - AV1
>>> DST4 - AV1
>>>
>>> Would you mind putting more words on DST1 and DST2? I didn't fully understand
>>> the logic here.
>>>
>>> Thanks
>>>
>>
>> I think that the first version is doing the work as you expected, but been
>> revised in second version, to limits the number of users of reused old
>> anon(which is picked in anon_vma_clone() and keep the tree structure.
>>
> 
> Any reason to reduce the reuse? Maybe I lost some point.

I've illustrated how two heuristics (reusing-old and sharing-prev) _could_ work together.
But they both are optional.

At cloning first vma SRC1 -> DST1 there is no prev to share anon vma,
thus works common code which _could_ reuse old vma because it have to.

If there is no old anon-vma which have to be reused then DST1 will allocate
new anon-vma (AV1) and it will be used by DST2 and so on like on your picture.

> 
>>> --
>>> Wei Yang
>>> Help you, Help me
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-10  8:11               ` Konstantin Khlebnikov
@ 2020-01-11 22:38                 ` Wei Yang
  2020-01-12  9:55                   ` Konstantin Khlebnikov
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2020-01-11 22:38 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Wei Yang, Li Xinhai, linux-mm, akpm, linux-kernel, Rik van Riel,
	kirill.shutemov

On Fri, Jan 10, 2020 at 11:11:23AM +0300, Konstantin Khlebnikov wrote:
[...]
>> > > > 
>> > > > series of vma in parent with shared AV:
>> > > > 
>> > > > SRC1 - AV0
>> > > > SRC2 - AV0
>> > > > SRC3 - AV0
>> > > > ...
>> > > > SRCn - AV0
>> > > > 
>> > > > in child after fork
>> > > > 
>> > > > DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>> > > > DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>> > > > DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>> > > > DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>> > > > DST4 - AV1 - same thing
>> > > > ...
>> > > > DSTn - AV1
>> > > > 

To focus on the point, I rearranged the order a little. Suppose your following
comments is explaining the above behavior.

   I've illustrated how two heuristics (reusing-old and sharing-prev) _could_ work together.
   But they both are optional.
   
   At cloning first vma SRC1 -> DST1 there is no prev to share anon vma,
   thus works common code which _could_ reuse old vma because it have to.
   
   If there is no old anon-vma which have to be reused then DST1 will allocate
   new anon-vma (AV1) and it will be used by DST2 and so on like on your picture.

I agree with your 3rd paragraph, but confused with 2nd.

At cloning first vma SRC1 -> DST1, there is no prev so anon_vma_clone() would
pick up a reusable anon_vma. Here you named it AV_OLD_1. This looks good to
me. But I am not sure why you would picked up AV_OLD_2 for DST2? In parent,
SRC1 and SRC2 has the same anon_vma, AV0. So in child, DST1 and DST2 could
also share the same anon_vma, AV_OLD_1.

Sorry for my poor understanding, would you mind giving me more hint on this
change?

>> > > 
>> > > Yes, your code works for DST3..DSTn. They will pick up AV1 since
>> > > (DST2->AV->parent == SRC3->AV).
>> > > 
>> > > My question is why DST1 and DST2 has different AV? The purpose of my patch
>> > > tries to make child has the same topology and parent. So the ideal look of
>> > > child is:
>> > > 
>> > > DST1 - AV1
>> > > DST2 - AV1
>> > > DST2 - AV1
>> > > DST3 - AV1
>> > > DST4 - AV1
>> > > 
>> > > Would you mind putting more words on DST1 and DST2? I didn't fully understand
>> > > the logic here.
>> > > 
>> > > Thanks
>> > > 
>> > 
>> > I think that the first version is doing the work as you expected, but been
>> > revised in second version, to limits the number of users of reused old
>> > anon(which is picked in anon_vma_clone() and keep the tree structure.
>> > 
>> 
>> Any reason to reduce the reuse? Maybe I lost some point.
>
>> 
>> > > --
>> > > Wei Yang
>> > > Help you, Help me
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-11 22:38                 ` Wei Yang
@ 2020-01-12  9:55                   ` Konstantin Khlebnikov
  2020-01-13  0:33                     ` Wei Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-12  9:55 UTC (permalink / raw)
  To: Wei Yang
  Cc: Li Xinhai, linux-mm, akpm, linux-kernel, Rik van Riel, kirill.shutemov



On 12/01/2020 01.38, Wei Yang wrote:
> On Fri, Jan 10, 2020 at 11:11:23AM +0300, Konstantin Khlebnikov wrote:
> [...]
>>>>>>
>>>>>> series of vma in parent with shared AV:
>>>>>>
>>>>>> SRC1 - AV0
>>>>>> SRC2 - AV0
>>>>>> SRC3 - AV0
>>>>>> ...
>>>>>> SRCn - AV0
>>>>>>
>>>>>> in child after fork
>>>>>>
>>>>>> DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>>>>>> DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>>>>>> DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>>>>>> DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>>>>>> DST4 - AV1 - same thing
>>>>>> ...
>>>>>> DSTn - AV1
>>>>>>
> 
> To focus on the point, I rearranged the order a little. Suppose your following
> comments is explaining the above behavior.
> 
>     I've illustrated how two heuristics (reusing-old and sharing-prev) _could_ work together.
>     But they both are optional.
>     
>     At cloning first vma SRC1 -> DST1 there is no prev to share anon vma,
>     thus works common code which _could_ reuse old vma because it have to.
>     
>     If there is no old anon-vma which have to be reused then DST1 will allocate
>     new anon-vma (AV1) and it will be used by DST2 and so on like on your picture.
> 
> I agree with your 3rd paragraph, but confused with 2nd.
> 
> At cloning first vma SRC1 -> DST1, there is no prev so anon_vma_clone() would
> pick up a reusable anon_vma. Here you named it AV_OLD_1. This looks good to
> me. But I am not sure why you would picked up AV_OLD_2 for DST2? In parent,
> SRC1 and SRC2 has the same anon_vma, AV0. So in child, DST1 and DST2 could
> also share the same anon_vma, AV_OLD_1.
> 
> Sorry for my poor understanding, would you mind giving me more hint on this
> change?

For DST2 heuristic "share-with-prev" will not work because if prev (DST1)
uses old AV (AV_OLD_1) and AV_OLD_1->parent isn't SRC2->AV (AV0).
So DST2 could only pick another old AV or allocate new.

My patch uses condition dst->prev->anon_vma->parent == src->anon_vma rather
than obvious src->prev->anon_vma == src->anon_vma because in this way it
eliminates all unwanted corner cases and explicitly verifies that we going to
share related anon-vma.

Heuristic "reuse-old" uses fact that VMA links and AV parent chain are tracked
independently: when VMA reuses old AV it still links to all related AV even
if VMA->AV points into some old AV in the middle of inheritance chain.

> 
>>>>>
>>>>> Yes, your code works for DST3..DSTn. They will pick up AV1 since
>>>>> (DST2->AV->parent == SRC3->AV).
>>>>>
>>>>> My question is why DST1 and DST2 has different AV? The purpose of my patch
>>>>> tries to make child has the same topology and parent. So the ideal look of
>>>>> child is:
>>>>>
>>>>> DST1 - AV1
>>>>> DST2 - AV1
>>>>> DST2 - AV1
>>>>> DST3 - AV1
>>>>> DST4 - AV1
>>>>>
>>>>> Would you mind putting more words on DST1 and DST2? I didn't fully understand
>>>>> the logic here.
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>> I think that the first version is doing the work as you expected, but been
>>>> revised in second version, to limits the number of users of reused old
>>>> anon(which is picked in anon_vma_clone() and keep the tree structure.
>>>>
>>>
>>> Any reason to reduce the reuse? Maybe I lost some point.
>>
>>>
>>>>> --
>>>>> Wei Yang
>>>>> Help you, Help me
>>>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-12  9:55                   ` Konstantin Khlebnikov
@ 2020-01-13  0:33                     ` Wei Yang
  2020-01-13 11:07                       ` Konstantin Khlebnikov
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2020-01-13  0:33 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Wei Yang, Li Xinhai, linux-mm, akpm, linux-kernel, Rik van Riel,
	kirill.shutemov

On Sun, Jan 12, 2020 at 12:55:45PM +0300, Konstantin Khlebnikov wrote:
>
>
>On 12/01/2020 01.38, Wei Yang wrote:
>> On Fri, Jan 10, 2020 at 11:11:23AM +0300, Konstantin Khlebnikov wrote:
>> [...]
>> > > > > > 
>> > > > > > series of vma in parent with shared AV:
>> > > > > > 
>> > > > > > SRC1 - AV0
>> > > > > > SRC2 - AV0
>> > > > > > SRC3 - AV0
>> > > > > > ...
>> > > > > > SRCn - AV0
>> > > > > > 
>> > > > > > in child after fork
>> > > > > > 
>> > > > > > DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>> > > > > > DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>> > > > > > DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>> > > > > > DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>> > > > > > DST4 - AV1 - same thing
>> > > > > > ...
>> > > > > > DSTn - AV1
>> > > > > > 
>> 
>> To focus on the point, I rearranged the order a little. Suppose your following
>> comments is explaining the above behavior.
>> 
>>     I've illustrated how two heuristics (reusing-old and sharing-prev) _could_ work together.
>>     But they both are optional.
>>     At cloning first vma SRC1 -> DST1 there is no prev to share anon vma,
>>     thus works common code which _could_ reuse old vma because it have to.
>>     If there is no old anon-vma which have to be reused then DST1 will allocate
>>     new anon-vma (AV1) and it will be used by DST2 and so on like on your picture.
>> 
>> I agree with your 3rd paragraph, but confused with 2nd.
>> 
>> At cloning first vma SRC1 -> DST1, there is no prev so anon_vma_clone() would
>> pick up a reusable anon_vma. Here you named it AV_OLD_1. This looks good to
>> me. But I am not sure why you would picked up AV_OLD_2 for DST2? In parent,
>> SRC1 and SRC2 has the same anon_vma, AV0. So in child, DST1 and DST2 could
>> also share the same anon_vma, AV_OLD_1.
>> 
>> Sorry for my poor understanding, would you mind giving me more hint on this
>> change?
>
>For DST2 heuristic "share-with-prev" will not work because if prev (DST1)
>uses old AV (AV_OLD_1) and AV_OLD_1->parent isn't SRC2->AV (AV0).
>So DST2 could only pick another old AV or allocate new.

I know this behavior after your change, my question is why you want to do so.

>
>My patch uses condition dst->prev->anon_vma->parent == src->anon_vma rather
>than obvious src->prev->anon_vma == src->anon_vma because in this way it
>eliminates all unwanted corner cases and explicitly verifies that we going to
>share related anon-vma.
>

This do eliminates some corner case, but as you showed child and parent don't
share the same AV topology. To keep the same AV topology is the purpose of my
commit.

I agree you found some bug that previous commit doesn't do it is expected. But
since you change the design a little, I suggest you split this idea to a
separate patch so that reviewer and audience in the future could understand
your approach clearly. Otherwise audience would be confused and hard to track
this change.

For example, you describe the behavior after your change. The second vma would
probably have a different AV from first vma.

>Heuristic "reuse-old" uses fact that VMA links and AV parent chain are tracked
>independently: when VMA reuses old AV it still links to all related AV even
>if VMA->AV points into some old AV in the middle of inheritance chain.
>
>> 
>> > > > > 
>> > > > > Yes, your code works for DST3..DSTn. They will pick up AV1 since
>> > > > > (DST2->AV->parent == SRC3->AV).
>> > > > > 
>> > > > > My question is why DST1 and DST2 has different AV? The purpose of my patch
>> > > > > tries to make child has the same topology and parent. So the ideal look of
>> > > > > child is:
>> > > > > 
>> > > > > DST1 - AV1
>> > > > > DST2 - AV1
>> > > > > DST2 - AV1
>> > > > > DST3 - AV1
>> > > > > DST4 - AV1
>> > > > > 
>> > > > > Would you mind putting more words on DST1 and DST2? I didn't fully understand
>> > > > > the logic here.
>> > > > > 
>> > > > > Thanks
>> > > > > 
>> > > > 
>> > > > I think that the first version is doing the work as you expected, but been
>> > > > revised in second version, to limits the number of users of reused old
>> > > > anon(which is picked in anon_vma_clone() and keep the tree structure.
>> > > > 
>> > > 
>> > > Any reason to reduce the reuse? Maybe I lost some point.
>> > 
>> > > 
>> > > > > --
>> > > > > Wei Yang
>> > > > > Help you, Help me
>> > > 
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-13  0:33                     ` Wei Yang
@ 2020-01-13 11:07                       ` Konstantin Khlebnikov
  2020-01-14  2:09                         ` Wei Yang
  2020-01-14 14:42                         ` Li Xinhai
  0 siblings, 2 replies; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-13 11:07 UTC (permalink / raw)
  To: Wei Yang
  Cc: Li Xinhai, linux-mm, akpm, linux-kernel, Rik van Riel, kirill.shutemov

On 13/01/2020 03.33, Wei Yang wrote:
> On Sun, Jan 12, 2020 at 12:55:45PM +0300, Konstantin Khlebnikov wrote:
>>
>>
>> On 12/01/2020 01.38, Wei Yang wrote:
>>> On Fri, Jan 10, 2020 at 11:11:23AM +0300, Konstantin Khlebnikov wrote:
>>> [...]
>>>>>>>>
>>>>>>>> series of vma in parent with shared AV:
>>>>>>>>
>>>>>>>> SRC1 - AV0
>>>>>>>> SRC2 - AV0
>>>>>>>> SRC3 - AV0
>>>>>>>> ...
>>>>>>>> SRCn - AV0
>>>>>>>>
>>>>>>>> in child after fork
>>>>>>>>
>>>>>>>> DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>>>>>>>> DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>>>>>>>> DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>>>>>>>> DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>>>>>>>> DST4 - AV1 - same thing
>>>>>>>> ...
>>>>>>>> DSTn - AV1
>>>>>>>>
>>>
>>> To focus on the point, I rearranged the order a little. Suppose your following
>>> comments is explaining the above behavior.
>>>
>>>      I've illustrated how two heuristics (reusing-old and sharing-prev) _could_ work together.
>>>      But they both are optional.
>>>      At cloning first vma SRC1 -> DST1 there is no prev to share anon vma,
>>>      thus works common code which _could_ reuse old vma because it have to.
>>>      If there is no old anon-vma which have to be reused then DST1 will allocate
>>>      new anon-vma (AV1) and it will be used by DST2 and so on like on your picture.
>>>
>>> I agree with your 3rd paragraph, but confused with 2nd.
>>>
>>> At cloning first vma SRC1 -> DST1, there is no prev so anon_vma_clone() would
>>> pick up a reusable anon_vma. Here you named it AV_OLD_1. This looks good to
>>> me. But I am not sure why you would picked up AV_OLD_2 for DST2? In parent,
>>> SRC1 and SRC2 has the same anon_vma, AV0. So in child, DST1 and DST2 could
>>> also share the same anon_vma, AV_OLD_1.
>>>
>>> Sorry for my poor understanding, would you mind giving me more hint on this
>>> change?
>>
>> For DST2 heuristic "share-with-prev" will not work because if prev (DST1)
>> uses old AV (AV_OLD_1) and AV_OLD_1->parent isn't SRC2->AV (AV0).
>> So DST2 could only pick another old AV or allocate new.
> 
> I know this behavior after your change, my question is why you want to do so.

Because I want to keep both heuristics.
This seems most sane way of interaction between them.

Unfortunately even this patch is slightly broken.
Condition prev->anon_vma->parent == pvma->anon_vma doesn't guarantee that
prev vma has the same set of anon-vmas like current vma.
I.e. anon_vma_clone(vma, prev) might be not enough for keeping connectivity.
Building such case isn't trivial job but I see nothing that could prevent it.

> 
>>
>> My patch uses condition dst->prev->anon_vma->parent == src->anon_vma rather
>> than obvious src->prev->anon_vma == src->anon_vma because in this way it
>> eliminates all unwanted corner cases and explicitly verifies that we going to
>> share related anon-vma.
>>
> 
> This do eliminates some corner case, but as you showed child and parent don't
> share the same AV topology. To keep the same AV topology is the purpose of my
> commit.
> 
> I agree you found some bug that previous commit doesn't do it is expected. But
> since you change the design a little, I suggest you split this idea to a
> separate patch so that reviewer and audience in the future could understand
> your approach clearly. Otherwise audience would be confused and hard to track
> this change.
> 
> For example, you describe the behavior after your change. The second vma would
> probably have a different AV from first vma.
> 
>> Heuristic "reuse-old" uses fact that VMA links and AV parent chain are tracked
>> independently: when VMA reuses old AV it still links to all related AV even
>> if VMA->AV points into some old AV in the middle of inheritance chain.
>>
>>>
>>>>>>>
>>>>>>> Yes, your code works for DST3..DSTn. They will pick up AV1 since
>>>>>>> (DST2->AV->parent == SRC3->AV).
>>>>>>>
>>>>>>> My question is why DST1 and DST2 has different AV? The purpose of my patch
>>>>>>> tries to make child has the same topology and parent. So the ideal look of
>>>>>>> child is:
>>>>>>>
>>>>>>> DST1 - AV1
>>>>>>> DST2 - AV1
>>>>>>> DST2 - AV1
>>>>>>> DST3 - AV1
>>>>>>> DST4 - AV1
>>>>>>>
>>>>>>> Would you mind putting more words on DST1 and DST2? I didn't fully understand
>>>>>>> the logic here.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>
>>>>>> I think that the first version is doing the work as you expected, but been
>>>>>> revised in second version, to limits the number of users of reused old
>>>>>> anon(which is picked in anon_vma_clone() and keep the tree structure.
>>>>>>
>>>>>
>>>>> Any reason to reduce the reuse? Maybe I lost some point.
>>>>
>>>>>
>>>>>>> --
>>>>>>> Wei Yang
>>>>>>> Help you, Help me
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-13 11:07                       ` Konstantin Khlebnikov
@ 2020-01-14  2:09                         ` Wei Yang
  2020-01-14 14:42                         ` Li Xinhai
  1 sibling, 0 replies; 22+ messages in thread
From: Wei Yang @ 2020-01-14  2:09 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Wei Yang, Li Xinhai, linux-mm, akpm, linux-kernel, Rik van Riel,
	kirill.shutemov

On Mon, Jan 13, 2020 at 02:07:18PM +0300, Konstantin Khlebnikov wrote:
>On 13/01/2020 03.33, Wei Yang wrote:
>> On Sun, Jan 12, 2020 at 12:55:45PM +0300, Konstantin Khlebnikov wrote:
>> > 
>> > 
>> > On 12/01/2020 01.38, Wei Yang wrote:
>> > > On Fri, Jan 10, 2020 at 11:11:23AM +0300, Konstantin Khlebnikov wrote:
>> > > [...]
>> > > > > > > > 
>> > > > > > > > series of vma in parent with shared AV:
>> > > > > > > > 
>> > > > > > > > SRC1 - AV0
>> > > > > > > > SRC2 - AV0
>> > > > > > > > SRC3 - AV0
>> > > > > > > > ...
>> > > > > > > > SRCn - AV0
>> > > > > > > > 
>> > > > > > > > in child after fork
>> > > > > > > > 
>> > > > > > > > DST1 - AV_OLD_1 (some old vma, picked by anon_vma_clone) plus DST1 is attached to same AVs as SRC1
>> > > > > > > > DST2 - AV_OLD_2 (other old vma) plus DST1 is attached to same AVs as SRC2
>> > > > > > > > DST2 - AV1 prev AV parent does not match AV0, no old vma found for reusing -> allocate new one (child of AV0)
>> > > > > > > > DST3 - AV1 - DST2->AV->parent == SRC3->AV (AV0) -> share AV with prev
>> > > > > > > > DST4 - AV1 - same thing
>> > > > > > > > ...
>> > > > > > > > DSTn - AV1
>> > > > > > > > 
>> > > 
>> > > To focus on the point, I rearranged the order a little. Suppose your following
>> > > comments is explaining the above behavior.
>> > > 
>> > >      I've illustrated how two heuristics (reusing-old and sharing-prev) _could_ work together.
>> > >      But they both are optional.
>> > >      At cloning first vma SRC1 -> DST1 there is no prev to share anon vma,
>> > >      thus works common code which _could_ reuse old vma because it have to.
>> > >      If there is no old anon-vma which have to be reused then DST1 will allocate
>> > >      new anon-vma (AV1) and it will be used by DST2 and so on like on your picture.
>> > > 
>> > > I agree with your 3rd paragraph, but confused with 2nd.
>> > > 
>> > > At cloning first vma SRC1 -> DST1, there is no prev so anon_vma_clone() would
>> > > pick up a reusable anon_vma. Here you named it AV_OLD_1. This looks good to
>> > > me. But I am not sure why you would picked up AV_OLD_2 for DST2? In parent,
>> > > SRC1 and SRC2 has the same anon_vma, AV0. So in child, DST1 and DST2 could
>> > > also share the same anon_vma, AV_OLD_1.
>> > > 
>> > > Sorry for my poor understanding, would you mind giving me more hint on this
>> > > change?
>> > 
>> > For DST2 heuristic "share-with-prev" will not work because if prev (DST1)
>> > uses old AV (AV_OLD_1) and AV_OLD_1->parent isn't SRC2->AV (AV0).
>> > So DST2 could only pick another old AV or allocate new.
>> 
>> I know this behavior after your change, my question is why you want to do so.
>
>Because I want to keep both heuristics.
>This seems most sane way of interaction between them.
>

I am not sure this is more sane.

Still suggest to separate your idea into a new patch, so audience could
analysis and notice the change clearly. Otherwise audience would be confused
with this behavior.

>Unfortunately even this patch is slightly broken.
>Condition prev->anon_vma->parent == pvma->anon_vma doesn't guarantee that
>prev vma has the same set of anon-vmas like current vma.
>I.e. anon_vma_clone(vma, prev) might be not enough for keeping connectivity.
>Building such case isn't trivial job but I see nothing that could prevent it.
>
>> 
>> > 
>> > My patch uses condition dst->prev->anon_vma->parent == src->anon_vma rather
>> > than obvious src->prev->anon_vma == src->anon_vma because in this way it
>> > eliminates all unwanted corner cases and explicitly verifies that we going to
>> > share related anon-vma.
>> > 
>> 
>> This do eliminates some corner case, but as you showed child and parent don't
>> share the same AV topology. To keep the same AV topology is the purpose of my
>> commit.
>> 
>> I agree you found some bug that previous commit doesn't do it is expected. But
>> since you change the design a little, I suggest you split this idea to a
>> separate patch so that reviewer and audience in the future could understand
>> your approach clearly. Otherwise audience would be confused and hard to track
>> this change.
>> 
>> For example, you describe the behavior after your change. The second vma would
>> probably have a different AV from first vma.
>> 
>> > Heuristic "reuse-old" uses fact that VMA links and AV parent chain are tracked
>> > independently: when VMA reuses old AV it still links to all related AV even
>> > if VMA->AV points into some old AV in the middle of inheritance chain.
>> > 
>> > > 
>> > > > > > > 
>> > > > > > > Yes, your code works for DST3..DSTn. They will pick up AV1 since
>> > > > > > > (DST2->AV->parent == SRC3->AV).
>> > > > > > > 
>> > > > > > > My question is why DST1 and DST2 has different AV? The purpose of my patch
>> > > > > > > tries to make child has the same topology and parent. So the ideal look of
>> > > > > > > child is:
>> > > > > > > 
>> > > > > > > DST1 - AV1
>> > > > > > > DST2 - AV1
>> > > > > > > DST2 - AV1
>> > > > > > > DST3 - AV1
>> > > > > > > DST4 - AV1
>> > > > > > > 
>> > > > > > > Would you mind putting more words on DST1 and DST2? I didn't fully understand
>> > > > > > > the logic here.
>> > > > > > > 
>> > > > > > > Thanks
>> > > > > > > 
>> > > > > > 
>> > > > > > I think that the first version is doing the work as you expected, but been
>> > > > > > revised in second version, to limits the number of users of reused old
>> > > > > > anon(which is picked in anon_vma_clone() and keep the tree structure.
>> > > > > > 
>> > > > > 
>> > > > > Any reason to reduce the reuse? Maybe I lost some point.
>> > > > 
>> > > > > 
>> > > > > > > --
>> > > > > > > Wei Yang
>> > > > > > > Help you, Help me
>> > > > > 
>> > > 
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-13 11:07                       ` Konstantin Khlebnikov
  2020-01-14  2:09                         ` Wei Yang
@ 2020-01-14 14:42                         ` Li Xinhai
  2020-01-15  1:20                           ` Wei Yang
  1 sibling, 1 reply; 22+ messages in thread
From: Li Xinhai @ 2020-01-14 14:42 UTC (permalink / raw)
  To: khlebnikov, richardw.yang
  Cc: linux-mm, akpm, linux-kernel, Rik van Riel, kirill.shutemov

On 2020-01-13 at 19:07 Konstantin Khlebnikov wrote:

>
>Because I want to keep both heuristics.
>This seems most sane way of interaction between them.
>
>Unfortunately even this patch is slightly broken.
>Condition prev->anon_vma->parent == pvma->anon_vma doesn't guarantee that
>prev vma has the same set of anon-vmas like current vma.
>I.e. anon_vma_clone(vma, prev) might be not enough for keeping connectivity. 

New patch is required?
It is necessary to call anon_vma_clone(vma, pvma) to link all anon_vma which
currently linked by pvma, then link the prev->anon_vma to vma. By this way,
connectivity of vma should be maintained, right?

>Building such case isn't trivial job but I see nothing that could prevent it.
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-14 14:42                         ` Li Xinhai
@ 2020-01-15  1:20                           ` Wei Yang
  2020-01-18  8:04                             ` Konstantin Khlebnikov
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Yang @ 2020-01-15  1:20 UTC (permalink / raw)
  To: Li Xinhai
  Cc: khlebnikov, richardw.yang, linux-mm, akpm, linux-kernel,
	Rik van Riel, kirill.shutemov

On Tue, Jan 14, 2020 at 10:42:52PM +0800, Li Xinhai wrote:
>On 2020-01-13 at 19:07 Konstantin Khlebnikov wrote:
>
>>
>>Because I want to keep both heuristics.
>>This seems most sane way of interaction between them.
>>
>>Unfortunately even this patch is slightly broken.
>>Condition prev->anon_vma->parent == pvma->anon_vma doesn't guarantee that
>>prev vma has the same set of anon-vmas like current vma.
>>I.e. anon_vma_clone(vma, prev) might be not enough for keeping connectivity. 
>
>New patch is required?

My suggestion is separate the fix and new approach instead of mess them into
one patch.

>It is necessary to call anon_vma_clone(vma, pvma) to link all anon_vma which
>currently linked by pvma, then link the prev->anon_vma to vma. By this way,
>connectivity of vma should be maintained, right?
>
>>Building such case isn't trivial job but I see nothing that could prevent it.
>>
>

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-15  1:20                           ` Wei Yang
@ 2020-01-18  8:04                             ` Konstantin Khlebnikov
  2020-01-18 14:00                               ` Wei Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Konstantin Khlebnikov @ 2020-01-18  8:04 UTC (permalink / raw)
  To: Wei Yang, Li Xinhai
  Cc: linux-mm, akpm, linux-kernel, Rik van Riel, kirill.shutemov

On 15/01/2020 04.20, Wei Yang wrote:
> On Tue, Jan 14, 2020 at 10:42:52PM +0800, Li Xinhai wrote:
>> On 2020-01-13 at 19:07 Konstantin Khlebnikov wrote:
>>
>>>
>>> Because I want to keep both heuristics.
>>> This seems most sane way of interaction between them.
>>>
>>> Unfortunately even this patch is slightly broken.
>>> Condition prev->anon_vma->parent == pvma->anon_vma doesn't guarantee that
>>> prev vma has the same set of anon-vmas like current vma.
>>> I.e. anon_vma_clone(vma, prev) might be not enough for keeping connectivity.
>>
>> New patch is required?
> 
> My suggestion is separate the fix and new approach instead of mess them into
> one patch.

Yep, it's messy. Maybe it's could be better to revert recent change,
apply second patch from this set and write something new after that.

> 
>> It is necessary to call anon_vma_clone(vma, pvma) to link all anon_vma which
>> currently linked by pvma, then link the prev->anon_vma to vma. By this way,
>> connectivity of vma should be maintained, right?
>>
>>> Building such case isn't trivial job but I see nothing that could prevent it.
>>>
>>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork
  2020-01-18  8:04                             ` Konstantin Khlebnikov
@ 2020-01-18 14:00                               ` Wei Yang
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2020-01-18 14:00 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Wei Yang, Li Xinhai, linux-mm, akpm, linux-kernel, Rik van Riel,
	kirill.shutemov

On Sat, Jan 18, 2020 at 11:04:21AM +0300, Konstantin Khlebnikov wrote:
>On 15/01/2020 04.20, Wei Yang wrote:
>> On Tue, Jan 14, 2020 at 10:42:52PM +0800, Li Xinhai wrote:
>> > On 2020-01-13??at 19:07??Konstantin Khlebnikov??wrote:
>> > 
>> > > 
>> > > Because I want to keep both heuristics.
>> > > This seems most sane way of interaction between them.
>> > > 
>> > > Unfortunately even this patch is slightly broken.
>> > > Condition prev->anon_vma->parent == pvma->anon_vma doesn't guarantee that
>> > > prev vma has the same set of anon-vmas like current vma.
>> > > I.e. anon_vma_clone(vma, prev) might be not enough for keeping connectivity.
>> > 
>> > New patch is required?
>> 
>> My suggestion is separate the fix and new approach instead of mess them into
>> one patch.
>
>Yep, it's messy. Maybe it's could be better to revert recent change,
>apply second patch from this set and write something new after that.
>

It is up to you.

>> 
>> > It is necessary to call anon_vma_clone(vma,??pvma) to link all anon_vma which
>> > currently linked by pvma, then link the prev->anon_vma to vma. By this way,
>> > connectivity of vma should be maintained, right?
>> > 
>> > > Building such case isn't trivial job but I see nothing that could prevent it.
>> > > 
>> > 
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-01-18 14:00 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-07 10:19 [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork Konstantin Khlebnikov
2020-01-07 10:19 ` [PATCH v2 2/2] kernel/fork: set VMA's mm/prev/next right after vm_area_dup in dup_mmap Konstantin Khlebnikov
2020-01-07 14:22   ` lixinhai.lxh
2020-01-07 13:35 ` [PATCH v2 1/2] mm/rmap: fix and simplify reusing mergeable anon_vma as parent when fork lixinhai.lxh
2020-01-08  2:32 ` Wei Yang
2020-01-08  6:42   ` lixinhai.lxh
2020-01-08 10:40   ` Konstantin Khlebnikov
2020-01-09  2:52     ` Wei Yang
2020-01-09  8:54       ` Konstantin Khlebnikov
2020-01-10  2:30         ` Wei Yang
2020-01-10  3:23           ` Li Xinhai
2020-01-10  5:34             ` Wei Yang
2020-01-10  8:11               ` Konstantin Khlebnikov
2020-01-11 22:38                 ` Wei Yang
2020-01-12  9:55                   ` Konstantin Khlebnikov
2020-01-13  0:33                     ` Wei Yang
2020-01-13 11:07                       ` Konstantin Khlebnikov
2020-01-14  2:09                         ` Wei Yang
2020-01-14 14:42                         ` Li Xinhai
2020-01-15  1:20                           ` Wei Yang
2020-01-18  8:04                             ` Konstantin Khlebnikov
2020-01-18 14:00                               ` Wei Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.