From: Will Deacon <will@kernel.org>
To: Prathu Baronia <prathu.baronia@oneplus.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	catalin.marinas@arm.com, alexander.duyck@gmail.com,
	chintan.pandya@oneplus.com, mhocko@suse.com,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	gregkh@linuxfoundation.com, gthelen@google.com, jack@suse.cz,
	ken.lin@oneplus.com, gasine.xu@oneplus.com, ying.huang@intel.com,
	mark.rutland@arm.com
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
Date: Tue, 5 May 2020 09:59:21 +0100
Message-ID: <20200505085919.GB16980@willie-the-truck>
In-Reply-To: <20200501085855.c5dzk5hfrdzunqdl@oneplus.com>

On Fri, May 01, 2020 at 02:28:55PM +0530, Prathu Baronia wrote:
> Platform and setup conditions:
> Qualcomm's SM8150 platform under controlled conditions (i.e. only CPU0 and CPU6
> turned on and set to max frequency, and DDR set to performance governor).
> ---------------------------------------------------------------------------
> 
> ---------------------------------------------------------------------------
> Summary:
> 	We observed a ~61% improvement in execution time of clearing a hugepage
> 	on arm64 if we increase the granularity, i.e. the chunk size, to 64KB
> 	from 4KB for each chunk clearing subroutine call.
> ---------------------------------------------------------------------------
> 
> For the base build:
> 
> clear_huge_page() ftrace profile
> --------------------------------
> - CPU0:
> 	- Samples: 95
> 	- Mean: 242.099 us
> 	- Std dev: 45.0096 us

That's one hell of a deviation. Any idea what's going on there?

> - CPU6:
> 	- Samples: 61
> 	- Mean: 258.372 us
> 	- Std dev: 22.0754 us
> 
> With patches [PATCH {1,2,3}/4] provided at the end where we just revert the
> forward-reverse traversal code we observed:
> 
> clear_huge_page() ftrace profile
> --------------------------------
> - CPU0:
> 	- Samples: 77
> 	- Mean: 234.568 us
> 	- Std dev: 6.52 us
> - CPU6:
> 	- Samples: 81
> 	- Mean: 259.437 us
> 	- Std dev: 19.25 us
> 
> We were expecting a bit of an improvement in arm64's case because of our
> hypothesis that reverse traversal is considerably slower on arm64, but after
> Will Deacon's test code showed similar timings for forward and reverse
> traversals we dug a bit deeper into this.
> 
> I found that in the case of arm64 a page is cleared using a special clear_page.S
> assembly routine instead of an explicit call to memset. With the below patch we
> bypassed the assembly routine and observed an improvement in the execution time
> of clear_huge_page on CPU0.
> 
>  diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>  index ea5cdbd8c2c3..a0a97a95aee8 100644
>  --- a/include/linux/highmem.h
>  +++ b/include/linux/highmem.h
>  @@ -158,7 +158,7 @@ do { \
>  static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>  {
>         void *addr = kmap_atomic(page);
>  -      clear_user_page(addr, vaddr, page);
>  +      memset(addr, 0x0, PAGE_SIZE);
>         kunmap_atomic(addr);
>  }
>  #endif
> 
> For reference I will call the above patch v-exp.
> 
> When v-exp is applied on base we observed:
> 
> clear_huge_page() ftrace profile
> --------------------------------
> - CPU0:
> 	- Samples: 71
> 	- Mean: 124.657 us
> 	- Std dev: 0.494165 us

This doesn't make any sense to me. memset() of zero is special-cased to
use the DC ZVA instruction in a loop:

3:
	dc	zva, dst
	add	dst, dst, zva_len_x
	subs	count, count, zva_len_x
	b.ge	3b

which is basically the same as clear_page():

1:	dc	zva, x0
	add	x0, x0, x1
	tst	x0, #(PAGE_SIZE - 1)
	b.ne	1b
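
To make the comparison concrete, here is a rough portable C model of the two
loops. It is only a sketch: memset() of one block stands in for a single
"dc zva", and MODEL_PAGE_SIZE and MODEL_ZVA_LEN are assumed values (the real
block size is reported by DCZID_EL0). The count-driven loop uses a plain
remaining-bytes check rather than the subs/b.ge idiom, and the function names
are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */
#define MODEL_ZVA_LEN   64u	/* assumed DC ZVA block size */

/* Count-driven loop, in the style of the memset() zeroing path. */
static void zero_by_count(unsigned char *dst, size_t count)
{
	while (count >= MODEL_ZVA_LEN) {
		memset(dst, 0, MODEL_ZVA_LEN);	/* stands in for: dc zva, dst */
		dst += MODEL_ZVA_LEN;
		count -= MODEL_ZVA_LEN;
	}
}

/*
 * Address-driven loop, in the style of clear_page(): keep going until the
 * pointer reaches the next page boundary. dst must be block-aligned and
 * lie within a page-aligned buffer.
 */
static void zero_to_page_end(unsigned char *dst)
{
	do {
		memset(dst, 0, MODEL_ZVA_LEN);	/* stands in for: dc zva, x0 */
		dst += MODEL_ZVA_LEN;
	} while ((uintptr_t)dst & (MODEL_PAGE_SIZE - 1));
}
```

For a page-aligned, page-sized buffer both functions issue the same number of
block clears to the same addresses; only the termination test differs.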

Are you able to reproduce this in userspace?
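
Something along these lines would do as a starting point. It is a minimal
sketch, not a definitive benchmark: time_chunked_clear_ns() is a hypothetical
helper, the empty asm statement is a GCC/Clang-style compiler barrier, and
whether the libc memset() takes a DC ZVA path at all depends on the libc and
the CPU.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/*
 * Zero `total` bytes at `buf` using one memset() call per `chunk` bytes and
 * return the elapsed time in nanoseconds. `total` is assumed to be a
 * multiple of `chunk`.
 */
static int64_t time_chunked_clear_ns(unsigned char *buf, size_t total,
				     size_t chunk)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (size_t off = 0; off < total; off += chunk) {
		memset(buf + off, 0, chunk);
		/* Compiler barrier: keep each memset() from being elided. */
		__asm__ volatile("" : : "r"(buf) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
	       (end.tv_nsec - start.tv_nsec);
}
```

Call it on a pre-faulted buffer (2MB would match a PMD-sized hugepage with 4KB
pages, which is an assumption about your configuration) with chunk set to 4096
and then 65536, pin the process to CPU0 or CPU6 with taskset, and compare the
mean and deviation over a similar number of samples to your ftrace runs.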

Will

