From: Will Deacon <will@kernel.org>
To: Prathu Baronia <prathu.baronia@oneplus.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
catalin.marinas@arm.com, alexander.duyck@gmail.com,
chintan.pandya@oneplus.com, mhocko@suse.com,
akpm@linux-foundation.org, linux-mm@kvack.org,
gregkh@linuxfoundation.com, gthelen@google.com, jack@suse.cz,
ken.lin@oneplus.com, gasine.xu@oneplus.com, ying.huang@intel.com,
mark.rutland@arm.com
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
Date: Tue, 5 May 2020 09:59:21 +0100 [thread overview]
Message-ID: <20200505085919.GB16980@willie-the-truck> (raw)
In-Reply-To: <20200501085855.c5dzk5hfrdzunqdl@oneplus.com>
On Fri, May 01, 2020 at 02:28:55PM +0530, Prathu Baronia wrote:
> Platform and setup conditions:
> Qualcomm's SM8150 platform under controlled conditions (i.e. only CPU0 and CPU6
> turned on and set to max frequency, and DDR set to the performance governor).
> ---------------------------------------------------------------------------
>
> ---------------------------------------------------------------------------
> Summary:
> We observed a ~61% improvement in the execution time of clearing a hugepage
> on arm64 if we increase the granularity, i.e. the chunk size cleared per
> subroutine call, from 4KB to 64KB.
> ---------------------------------------------------------------------------
>
> For the base build:
>
> clear_huge_page() ftrace profile
> --------------------------------
> - CPU0:
> - Samples: 95
> - Mean: 242.099 us
> - Std dev: 45.0096 us
That's one hell of a deviation. Any idea what's going on there?
> - CPU6:
> - Samples: 61
> - Mean: 258.372 us
> - Std dev: 22.0754 us
>
> With patches [PATCH {1,2,3}/4] (provided at the end), which just revert the
> forward-reverse traversal code, we observed:
>
> clear_huge_page() ftrace profile
> --------------------------------
> - CPU0:
> - Samples: 77
> - Mean: 234.568 us
> - Std dev: 6.52 us
> - CPU6:
> - Samples: 81
> - Mean: 259.437 us
> - Std dev: 19.25 us
>
> We were expecting some improvement in the arm64 case because of our hypothesis
> that reverse traversal is considerably slower on arm64, but after Will Deacon's
> test code showed similar timings for forward and reverse traversals we dug a
> bit deeper into this.
>
> I found that in the case of arm64 a page is cleared using a special clear_page.S
> assembly routine instead of an explicit call to memset. With the below patch we
> bypassed the assembly routine and observed an improvement in the execution time
> of clear_huge_page on CPU0.
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index ea5cdbd8c2c3..a0a97a95aee8 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -158,7 +158,7 @@ do {						\
>  static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>  {
>  	void *addr = kmap_atomic(page);
> -	clear_user_page(addr, vaddr, page);
> +	memset(addr, 0x0, PAGE_SIZE);
>  	kunmap_atomic(addr);
>  }
>  #endif
>
> For reference I will call the above patch v-exp.
>
> When v-exp is applied on top of the base build we observed:
>
> clear_huge_page() ftrace profile
> --------------------------------
> - CPU0:
> - Samples: 71
> - Mean: 124.657 us
> - Std dev: 0.494165 us
This doesn't make any sense to me. memset() of zero is special-cased to
use the DC ZVA instruction in a loop:
3:
	dc	zva, dst
	add	dst, dst, zva_len_x
	subs	count, count, zva_len_x
	b.ge	3b
which is basically the same as clear_page():
1:	dc	zva, x0
	add	x0, x0, x1
	tst	x0, #(PAGE_SIZE - 1)
	b.ne	1b
Are you able to reproduce this in userspace?
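A minimal (untested) userspace sketch along these lines might do as a starting
point; the buffer size, chunk sizes and iteration count below are only
placeholders:

/*
 * Untested sketch: time zeroing a 2MB buffer via memset() in 4KB chunks,
 * 64KB chunks, and a single 2MB call.
 */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE	(2UL << 20)	/* one 2MB hugepage worth of data */
#define ITERATIONS	100

static double clear_us(void *buf, size_t chunk)
{
	struct timespec t0, t1;
	size_t off;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERATIONS; i++)
		for (off = 0; off < BUF_SIZE; off += chunk)
			memset((char *)buf + off, 0, chunk);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Average microseconds per full 2MB clear. */
	return ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) / (1e3 * ITERATIONS);
}

int main(void)
{
	void *buf;

	/* Page-aligned so memset() can take its DC ZVA fast path. */
	if (posix_memalign(&buf, 4096, BUF_SIZE))
		return 1;

	printf(" 4KB chunks: %8.2f us\n", clear_us(buf, 4UL << 10));
	printf("64KB chunks: %8.2f us\n", clear_us(buf, 64UL << 10));
	printf("one 2MB call: %7.2f us\n", clear_us(buf, BUF_SIZE));

	free(buf);
	return 0;
}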
Will