From: Chintan Pandya <chintan.pandya@oneplus.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>,
Prathu Baronia <prathu.baronia@oneplus.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"gthelen@google.com" <gthelen@google.com>,
"jack@suse.cz" <jack@suse.cz>, Ken Lin <ken.lin@oneplus.com>,
Gasine Xu <Gasine.Xu@Oneplus.com>
Subject: RE: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
Date: Sat, 11 Apr 2020 15:40:01 +0000 [thread overview]
Message-ID: <SG2PR04MB2921E6D51681B935C0F85EEA91DF0@SG2PR04MB2921.apcprd04.prod.outlook.com> (raw)
In-Reply-To: <87lfn390db.fsf@yhuang-dev.intel.com>
> > Generally, many architectures are optimized for serial loads, be it
> > initialization or access, as it is simplest form of prediction. Any
> > random access pattern would kill that pre-fetching. And for now, I
> > suspect that to be the case here. Probably, we can run more tests to confirm
> this part.
>
> Please prove your theory with test. Better to test x86 too.
Here is the userspace test code I wrote:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define SZ_1M 0x100000
#define SZ_4K 0x1000
#define NUM   100

int main(void)
{
	char *p, *q, *r;
	unsigned long total_pages, total_size, i, j;
	struct timeval t0, t1, t2, t3;
	long elapsed;

	printf("Hello World\n");

	total_size = NUM * SZ_1M;
	total_pages = NUM * (SZ_1M / SZ_4K);

	p = malloc(total_size);
	q = malloc(total_size);
	r = malloc(total_size);
	if (!p || !q || !r) {
		perror("malloc");
		return 1;
	}

	/* So that all pages get allocated */
	memset(r, 0xa, total_size);
	memset(q, 0xa, total_size);
	memset(p, 0xa, total_size);

	gettimeofday(&t0, NULL);
	/* One-shot memset */
	memset(r, 0xd, total_size);
	gettimeofday(&t1, NULL);

	/* Traverse in forward order */
	for (j = 0; j < total_pages; j++)
		memset(q + (j * SZ_4K), 0xc, SZ_4K);
	gettimeofday(&t2, NULL);

	/* Traverse in reverse order */
	for (i = 0; i < total_pages; i++)
		memset(p + total_size - (i + 1) * SZ_4K, 0xb, SZ_4K);
	gettimeofday(&t3, NULL);

	free(p);
	free(q);
	free(r);

	/* Report results */
	elapsed = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
	printf("One shot: %ld micro seconds\n", elapsed);
	elapsed = (t2.tv_sec - t1.tv_sec) * 1000000L + (t2.tv_usec - t1.tv_usec);
	printf("Forward order: %ld micro seconds\n", elapsed);
	elapsed = (t3.tv_sec - t2.tv_sec) * 1000000L + (t3.tv_usec - t2.tv_usec);
	printf("Reverse order: %ld micro seconds\n", elapsed);
	return 0;
}
------------------------------------------------------------------------------------------------
Results for ARM64 target (SM8150, CPU0 and CPU6 online, running at max frequency).
All numbers are the mean of 100 iterations; variation is negligible.
- Oneshot : 3389.26 us
- Forward : 8876.16 us
- Reverse : 18157.6 us
Results for x86-64 (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, only CPU0 at max frequency).
All numbers are the mean of 100 iterations; variation is negligible.
- Oneshot : 3203.49 us
- Forward : 5766.46 us
- Reverse : 5187.86 us
To conclude, I observed that serial writes are indeed optimized on the ARM
processor. Strangely, though, memset in reverse order consistently performs
better than forward order across multiple x86 machines. I don't have much
insight into x86, so to be precise, I restrict my earlier suspicion to ARM only.
>
> Best Regards,
> Huang, Ying
Thread overview: 14+ messages
2020-04-03 8:18 [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases Prathu Baronia
2020-04-03 8:52 ` Michal Hocko
2020-04-09 15:29 ` Prathu Baronia
2020-04-09 15:45 ` Michal Hocko
[not found] ` <SG2PR04MB2921D2AAA8726318EF53D83691DE0@SG2PR04MB2921.apcprd04.prod.outlook.com>
2020-04-10 9:05 ` Huang, Ying
2020-04-11 15:40 ` Chintan Pandya [this message]
2020-04-11 20:47 ` Alexander Duyck
2020-04-13 15:33 ` Prathu Baronia
2020-04-13 16:24 ` Alexander Duyck
2020-04-14 1:10 ` Huang, Ying
2020-04-10 18:54 ` Alexander Duyck
2020-04-11 8:45 ` Chintan Pandya
2020-04-14 15:55 ` Daniel Jordan
2020-04-14 17:33 ` Chintan Pandya