linux-mm.kvack.org archive mirror
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Prathu Baronia <prathu.baronia@oneplus.com>
Cc: Michal Hocko <mhocko@suse.com>,
	Chintan Pandya <chintan.pandya@oneplus.com>,
	 "Huang, Ying" <ying.huang@intel.com>,
	akpm@linux-foundation.com,  linux-mm <linux-mm@kvack.org>,
	gregkh@linuxfoundation.com,  Greg Thelen <gthelen@google.com>,
	jack@suse.cz, Ken Lin <ken.lin@oneplus.com>,
	 Gasine Xu <gasine.xu@oneplus.com>
Subject: Re: [PATCH v2] mm: Optimized hugepage zeroing & copying from user
Date: Tue, 14 Apr 2020 12:32:57 -0700	[thread overview]
Message-ID: <CAKgT0Ud2zeZO7-akPCLySUAbh5ePF=Kp0V+kaBpV63woQXk_xg@mail.gmail.com> (raw)
In-Reply-To: <20200414184743.GB2097@oneplus.com>

On Tue, Apr 14, 2020 at 11:47 AM Prathu Baronia
<prathu.baronia@oneplus.com> wrote:
>
> The 04/14/2020 19:03, Michal Hocko wrote:
> > I still have a hard time seeing why the kmap machinery should introduce any
> > slowdown here. Previous data posted while discussing v1 didn't really
> > show anything outside of the noise.
> >
> You are right, the multiple barriers are not responsible for the slowdown; rather,
> removing kmap_atomic() is what allows us to call memset and memcpy for larger sizes.
> I will reframe this part of the commit text in v3 to present it more cleanly.
> >
> > It would be really nice to provide std
> >
> Here is the data with std:-
> ----------------------------------------------------------------------
> Results:
> ----------------------------------------------------------------------
> Results for ARM64 target (SM8150, CPUs 0 & 6 online, running at max
> frequency)
> All numbers are means over 100 iterations; variation is negligible.
> ----------------------------------------------------------------------
> - Oneshot : 3389.26 us  std: 79.1377 us
> - Forward : 8876.16 us  std: 172.699 us
> - Reverse : 18157.6 us  std: 111.713 us
> ----------------------------------------------------------------------
>
> ----------------------------------------------------------------------
> Results for x86-64 (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, only CPU 0 in
> max frequency, DDR also running at max frequency.) All numbers are means
> over 100 iterations; variation is negligible.
> ----------------------------------------------------------------------
> - Oneshot : 3203.49 us  std: 115.4086 us
> - Forward : 5766.46 us  std: 328.6299 us
> - Reverse : 5187.86 us  std: 341.1918 us
> ----------------------------------------------------------------------
>
> >
> > No. There is absolutely zero reason to add a config option for this. The
> > kernel should have all the information to make an educated guess.
> >
> I will try to incorporate this in v3, but currently I don't have any idea how
> to go about implementing the guessing logic. I would really appreciate it if
> you could suggest a way to approach it.
>
> > Also before going any further. The patch which has introduced the
> > optimization was c79b57e462b5 ("mm: hugetlb: clear target sub-page last
> > when clearing huge page"). It is based on an artificial benchmark which
> > to my knowledge doesn't represent any real workload. Your measurements
> > are based on a different benchmark. Your numbers clearly show that some
> > assumptions used for the optimization are not architecture neutral.
> >
> But the oneshot numbers are significantly better on both architectures. I think
> the oneshot approach should theoretically provide better results than the serial
> approach on all architectures. Isn't it fair to go ahead with the oneshot
> approach?

I think the point that Michal is getting at is that there are other
tests that need to be run. You are running the test on just one core.
What happens as we start fanning this out and having multiple
instances running per socket? We would be flooding the LLC in addition
to overwriting all the other caches.

If you take a look at commit c6ddfb6c58903 ("mm, clear_huge_page: move
order algorithm into a separate function"), they were running the tests
on multiple threads simultaneously because their concern was flooding the
LLC. I wonder if we couldn't look at bypassing the cache entirely, using
something like __copy_user_nocache for most of the copy, and then copying
only the last pieces that we expect to be immediately accessed through
the cache.



Thread overview: 27+ messages
2020-04-14 15:38 [PATCH v2] mm: Optimized hugepage zeroing & copying from user Prathu Baronia
2020-04-14 17:03 ` Michal Hocko
2020-04-14 17:41   ` Daniel Jordan
     [not found]   ` <20200414184743.GB2097@oneplus.com>
2020-04-14 19:32     ` Alexander Duyck [this message]
2020-04-15  3:40       ` Huang, Ying
2020-04-15 11:09         ` Michal Hocko
2020-04-19 12:05       ` Prathu Baronia
2020-04-14 19:40     ` Michal Hocko
2020-04-15  3:27 ` Huang, Ying
2020-04-16  1:21   ` Huang, Ying
2020-04-19 15:58   ` Prathu Baronia
2020-04-20  0:18     ` Huang, Ying
2020-04-21  9:36       ` Prathu Baronia
2020-04-21 10:09         ` Will Deacon
2020-04-21 12:47           ` Vlastimil Babka
2020-04-21 12:48             ` Vlastimil Babka
2020-04-21 13:39               ` Will Deacon
2020-04-21 13:48                 ` Vlastimil Babka
2020-04-21 13:56                   ` Chintan Pandya
2020-04-22  8:18                   ` Will Deacon
2020-04-22 11:19                     ` Will Deacon
2020-04-22 14:38                       ` Prathu Baronia
2020-05-01  8:58                         ` Prathu Baronia
2020-05-05  8:59                           ` Will Deacon
2020-04-21 13:00             ` Michal Hocko
2020-04-21 13:10               ` Will Deacon
2020-04-17  7:48 ` [mm] 134c8b410f: vm-scalability.median -7.9% regression kernel test robot
