From: jun qian <qianjun.kernel@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: ast@kernel.org, daniel@iogearbox.net, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com, andriin@fb.com,
	john.fastabend@gmail.com, kpsingh@chromium.org,
	Linux-MM <linux-mm@kvack.org>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Subject: Re: [PATCH V2 1/1] mm:improve the performance during fork
Date: Tue, 6 Apr 2021 10:14:16 +0800	[thread overview]
Message-ID: <CAKc596KE+mN1xOXppTOtJY7UDDLSb+zg2kwj=x8AzMN_Px2DuQ@mail.gmail.com> (raw)
In-Reply-To: <20210330224406.5e195f3b8b971ff2a56c657d@linux-foundation.org>

On Wed, Mar 31, 2021 at 1:44 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 29 Mar 2021 20:36:35 +0800 qianjun.kernel@gmail.com wrote:
>
> > From: jun qian <qianjun.kernel@gmail.com>
> >
> > In our project, many business-visible delays come from fork(), so
> > we started investigating why fork is so time-consuming. Tracing
> > fork with ftrace's function_graph tracer, I found that
> > vm_normal_page() is called tens of thousands of times, and that
> > each call takes only a few nanoseconds, yet vm_normal_page() is
> > not an inline function. So I think that making it inline may
> > reduce the call overhead.
> >
> > I did the following experiment:
> >
> > Use the bpftrace tool to measure the fork time:
> >
> > bpftrace -e 'kprobe:_do_fork /comm=="redis-server"/ {@st=nsecs;}
> > kretprobe:_do_fork /comm=="redis-server"/
> > {printf("the fork time is %d us\n", (nsecs-@st)/1000)}'
> >
> > no inline vm_normal_page:
> > result:
> > the fork time is 40743 us
> > the fork time is 41746 us
> > the fork time is 41336 us
> > the fork time is 42417 us
> > the fork time is 40612 us
> > the fork time is 40930 us
> > the fork time is 41910 us
> >
> > inline vm_normal_page:
> > result:
> > the fork time is 39276 us
> > the fork time is 38974 us
> > the fork time is 39436 us
> > the fork time is 38815 us
> > the fork time is 39878 us
> > the fork time is 39176 us
> >
> > In the same test environment, this yields a 3% to 4% performance
> > improvement.
> >
> > Note: the test data is from kernel 4.18.0-193.6.3.el8_2.v1.1.x86_64,
> > because our product runs the redis server on this kernel version.
> > For test data against the latest kernel, please refer to the v1
> > patch.
> >
> > Comparing the resulting vmlinux sizes:
> >                   inline           non-inline       diff
> > vmlinux size      9709248 bytes    9709824 bytes    -576 bytes
> >
>
> I get very different results with gcc-7.2.0:
>
> q:/usr/src/25> size mm/memory.o
>    text    data     bss     dec     hex filename
>   74898    3375      64   78337   13201 mm/memory.o-before
>   75119    3363      64   78546   132d2 mm/memory.o-after
>
> That's a somewhat significant increase in code size, and larger code
> size has a worsened cache footprint.
>
> Not that this is necessarily a bad thing for a function which is
> tightly called many times in succession, as vm_normal_page() is.
>
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -592,7 +592,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
> >   * PFNMAP mappings in order to support COWable mappings.
> >   *
> >   */
> > -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> > +inline struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> >                           pte_t pte)
> >  {
> >       unsigned long pfn = pte_pfn(pte);
>
> I'm a bit surprised this made any difference - rumour has it that
> modern gcc just ignores `inline' and makes up its own mind.  Which is
> why we added __always_inline.
>
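Right, since plain `inline' is only a hint, the kernel's
__always_inline annotation would take the decision away from gcc
entirely. An untested sketch of that variant (not the patch I
actually posted):

--- a/mm/memory.c
+++ b/mm/memory.c
@@ ... @@
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
-			    pte_t pte)
+__always_inline struct page *vm_normal_page(struct vm_area_struct *vma,
+					    unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);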
The kernel version: kernel-4.18.0-193.6.3.el8_2
gcc version 8.4.1 20200928 (Red Hat 8.4.1-1) (GCC)

I ran the test again and got the results below; later I will retest
on the latest kernel with a newer gcc.

757368576  vmlinux   inline
757381440  vmlinux   no inline
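
For the retest on the latest kernel, note that _do_fork() was renamed
to kernel_clone() in v5.10, so the bpftrace probes above would need
adjusting, roughly like this (untested):

bpftrace -e 'kprobe:kernel_clone /comm=="redis-server"/ {@st=nsecs;}
kretprobe:kernel_clone /comm=="redis-server"/
{printf("the fork time is %d us\n", (nsecs-@st)/1000)}'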


Thread overview: 5+ messages
2021-03-29 12:36 [PATCH V2 1/1] mm:improve the performance during fork qianjun.kernel
2021-03-31  5:44 ` Andrew Morton
2021-03-31 12:11   ` Vlastimil Babka
2021-03-31 14:42     ` Vlastimil Babka
2021-04-06  2:14   ` jun qian [this message]
