linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
To: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org,
	brauner@kernel.org, surenb@google.com,
	michael.christie@oracle.com, peterz@infradead.org,
	mathieu.desnoyers@efficios.com, npiggin@gmail.com,
	avagin@gmail.com, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 11/11] fork: Use __mt_dup() to duplicate maple tree in dup_mmap()
Date: Wed, 26 Jul 2023 13:06:45 -0400	[thread overview]
Message-ID: <20230726170645.2m2rbk325dy727eo@revolver> (raw)
In-Reply-To: <20230726080916.17454-12-zhangpeng.00@bytedance.com>

* Peng Zhang <zhangpeng.00@bytedance.com> [230726 04:10]:
> Use __mt_dup() to duplicate the old maple tree in dup_mmap(), and then
> directly modify the entries of VMAs in the new maple tree, which can
> get better performance. dup_mmap() is used by fork(), so this patch
> optimizes fork(). The optimization effect is proportional to the number
> of VMAs.
> 
> Due to the introduction of this method, the optimization in
> (maple_tree: add a fast path case in mas_wr_slot_store())[1] no longer
> has an effect here, but it is also an optimization of the maple tree.
> 
> There is a unixbench test suite[2] where 'spawn' is used to test fork().
> 'spawn' only has 23 VMAs by default, so I tweaked the benchmark code a
> bit to use mmap() to control the number of VMAs. Therefore, the
> performance under different numbers of VMAs can be measured.
> 
> Insert code like below into 'spawn':
> for (int i = 0; i < 200; ++i) {
> 	size_t size = 10 * getpagesize();
> 	void *addr;
> 
> 	if (i & 1) {
> 		addr = mmap(NULL, size, PROT_READ,
> 			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 	} else {
> 		addr = mmap(NULL, size, PROT_WRITE,
> 			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> 	}
> 	if (addr == MAP_FAILED)
> 		...
> }
> 
> Based on next-20230721, use 'spawn' under 23, 203, and 4023 VMAs, test
> 4 times in 30 seconds each time, and get the following numbers. These
> numbers are the number of fork() successes in 30s (average of the best
> 3 out of 4). By the way, based on next-20230725, I reverted [1], and
> tested it together as a comparison. In order to ensure the reliability
> of the test results, these tests were run on a physical machine.
> 
> 		23VMAs		223VMAs		4023VMAs
> revert [1]:	159104.00	73316.33	6787.00

You can probably remove the revert benchmark from this since there is no
reason to revert the previous change. The change is worth while on its
own, so it's better to have the numbers more clear by having with and
without this series.

> 
> 		+0.77%		+0.42%		+0.28%
> next-20230721:	160321.67	73624.67	6806.33
> 
> 		+2.77%		+15.42%		+29.86%
> apply this:	164751.67	84980.33	8838.67

What is the difference between using this patch with mas_replace_entry()
and mas_store_entry()?

> 
> It can be seen that the performance improvement is proportional to
> the number of VMAs. With 23 VMAs, performance improves by about 3%,
> with 223 VMAs, performance improves by about 15%, and with 4023 VMAs,
> performance improves by about 30%.
> 
> [1] https://lore.kernel.org/lkml/20230628073657.75314-4-zhangpeng.00@bytedance.com/
> [2] https://github.com/kdlucas/byte-unixbench/tree/master
> 
> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
> ---
>  kernel/fork.c | 35 +++++++++++++++++++++++++++--------
>  mm/mmap.c     | 14 ++++++++++++--
>  2 files changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index f81149739eb9..ef80025b62d6 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -650,7 +650,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  	int retval;
>  	unsigned long charge = 0;
>  	LIST_HEAD(uf);
> -	VMA_ITERATOR(old_vmi, oldmm, 0);
>  	VMA_ITERATOR(vmi, mm, 0);
>  
>  	uprobe_start_dup_mmap();
> @@ -678,17 +677,40 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		goto out;
>  	khugepaged_fork(mm, oldmm);
>  
> -	retval = vma_iter_bulk_alloc(&vmi, oldmm->map_count);
> -	if (retval)
> +	/* Use __mt_dup() to efficiently build an identical maple tree. */
> +	retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_NOWAIT | __GFP_NOWARN);
> +	if (unlikely(retval))
>  		goto out;
>  
>  	mt_clear_in_rcu(vmi.mas.tree);
> -	for_each_vma(old_vmi, mpnt) {
> +	for_each_vma(vmi, mpnt) {
>  		struct file *file;
>  
>  		vma_start_write(mpnt);
>  		if (mpnt->vm_flags & VM_DONTCOPY) {
>  			vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
> +
> +			/*
> +			 * Since the new tree is exactly the same as the old one,
> +			 * we need to remove the unneeded VMAs.
> +			 */
> +			mas_store(&vmi.mas, NULL);
> +
> +			/*
> +			 * Even removing an entry may require memory allocation,
> +			 * and if removal fails, we use XA_ZERO_ENTRY to mark
> +			 * from which VMA it failed. The case of encountering
> +			 * XA_ZERO_ENTRY will be handled in exit_mmap().
> +			 */
> +			if (unlikely(mas_is_err(&vmi.mas))) {
> +				retval = xa_err(vmi.mas.node);
> +				mas_reset(&vmi.mas);
> +				if (mas_find(&vmi.mas, ULONG_MAX))
> +					mas_replace_entry(&vmi.mas,
> +							  XA_ZERO_ENTRY);
> +				goto loop_out;
> +			}
> +
>  			continue;
>  		}
>  		charge = 0;
> @@ -750,8 +772,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  			hugetlb_dup_vma_private(tmp);
>  
>  		/* Link the vma into the MT */
> -		if (vma_iter_bulk_store(&vmi, tmp))
> -			goto fail_nomem_vmi_store;
> +		mas_replace_entry(&vmi.mas, tmp);
>  
>  		mm->map_count++;
>  		if (!(tmp->vm_flags & VM_WIPEONFORK))
> @@ -778,8 +799,6 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  	uprobe_end_dup_mmap();
>  	return retval;
>  
> -fail_nomem_vmi_store:
> -	unlink_anon_vmas(tmp);
>  fail_nomem_anon_vma_fork:
>  	mpol_put(vma_policy(tmp));
>  fail_nomem_policy:
> diff --git a/mm/mmap.c b/mm/mmap.c
> index bc91d91261ab..5bfba2fb0e39 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3184,7 +3184,11 @@ void exit_mmap(struct mm_struct *mm)
>  	arch_exit_mmap(mm);
>  
>  	vma = mas_find(&mas, ULONG_MAX);
> -	if (!vma) {
> +	/*
> +	 * If dup_mmap() fails to remove a VMA marked VM_DONTCOPY,
> +	 * xa_is_zero(vma) may be true.
> +	 */
> +	if (!vma || xa_is_zero(vma)) {
>  		/* Can happen if dup_mmap() received an OOM */
>  		mmap_read_unlock(mm);
>  		return;
> @@ -3222,7 +3226,13 @@ void exit_mmap(struct mm_struct *mm)
>  		remove_vma(vma, true);
>  		count++;
>  		cond_resched();
> -	} while ((vma = mas_find(&mas, ULONG_MAX)) != NULL);
> +		vma = mas_find(&mas, ULONG_MAX);
> +		/*
> +		 * If xa_is_zero(vma) is true, it means that subsequent VMAs
> +		 * donot need to be removed. Can happen if dup_mmap() fails to
> +		 * remove a VMA marked VM_DONTCOPY.
> +		 */
> +	} while (vma != NULL && !xa_is_zero(vma));
>  
>  	BUG_ON(count != mm->map_count);
>  
> -- 
> 2.20.1
> 

  reply	other threads:[~2023-07-26 17:07 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-26  8:09 [PATCH 00/11] Introduce mt_dup() to improve the performance of fork() Peng Zhang
2023-07-26  8:09 ` [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}() Peng Zhang
2023-07-26 14:58   ` Liam R. Howlett
2023-07-31  9:52     ` Peng Zhang
2023-07-31 16:08       ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 02/11] maple_tree: Validate MAPLE_ENODE and ma_nonleaf_data_end() Peng Zhang
2023-07-26  8:09 ` [PATCH 03/11] maple_tree: Add some helper functions Peng Zhang
2023-07-26 15:02   ` Liam R. Howlett
2023-07-26 15:08     ` Matthew Wilcox
2023-07-31 11:45       ` Peng Zhang
2023-08-11 17:28         ` Liam R. Howlett
2023-07-31 11:40     ` Peng Zhang
2023-07-26  8:09 ` [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup() Peng Zhang
2023-07-26 16:03   ` Liam R. Howlett
2023-07-31 12:24     ` Peng Zhang
2023-07-31 16:27       ` Liam R. Howlett
2023-08-16 13:41         ` Peng Zhang
2023-08-16 18:30           ` Liam R. Howlett
2023-08-18 11:53             ` Peng Zhang
2023-08-18 16:13               ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 05/11] maple_tree: Add test for mt_dup() Peng Zhang
2023-07-26 16:06   ` Liam R. Howlett
2023-07-31 12:32     ` Peng Zhang
2023-07-31 16:41       ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 06/11] maple_tree: Introduce mas_replace_entry() to directly replace an entry Peng Zhang
2023-07-26 16:08   ` Liam R. Howlett
2023-07-31 12:39     ` Peng Zhang
2023-07-31 16:48       ` Liam R. Howlett
2023-08-16 13:11         ` Peng Zhang
2023-08-16 17:40           ` Liam R. Howlett
2023-08-18  9:39             ` Peng Zhang
2023-08-18 16:15               ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 07/11] maple_tree: Update the documentation of maple tree Peng Zhang
2023-07-26  8:09 ` [PATCH 08/11] maple_tree: Skip other tests when BENCH is enabled Peng Zhang
2023-07-26  8:09 ` [PATCH 09/11] maple_tree: Update check_forking() and bench_forking() Peng Zhang
2023-07-26  8:09 ` [PATCH 10/11] MAINTAINERS: Add co-maintainer for maple tree Peng Zhang
2023-07-26 16:39   ` Liam R. Howlett
2023-07-31 12:55     ` Peng Zhang
2023-07-31 20:55       ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 11/11] fork: Use __mt_dup() to duplicate maple tree in dup_mmap() Peng Zhang
2023-07-26 17:06   ` Liam R. Howlett [this message]
2023-07-31 12:59     ` Peng Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230726170645.2m2rbk325dy727eo@revolver \
    --to=liam.howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    --cc=zhangpeng.00@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).