From: Andy Shevchenko <andy.shevchenko@gmail.com>
To: Thorsten Leemhuis <linux@leemhuis.info>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Randy Dunlap <rdunlap@infradead.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"open list:KERNEL SELFTEST FRAMEWORK"
<linux-kselftest@vger.kernel.org>,
KUnit Development <kunit-dev@googlegroups.com>,
Linux Media Mailing List <linux-media@vger.kernel.org>,
netdev <netdev@vger.kernel.org>,
Brendan Higgins <brendanhiggins@google.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Sakari Ailus <sakari.ailus@linux.intel.com>,
Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Thomas Graf <tgraf@suug.ch>,
Herbert Xu <herbert@gondor.apana.org.au>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 0/4] kernel.h further split
Date: Wed, 13 Oct 2021 13:31:38 +0300
Message-ID: <YWa1igOl4eAxv6FL@smile.fi.intel.com>
In-Reply-To: <20211008113758.6cbee642@t14s>
On Fri, Oct 08, 2021 at 11:37:58AM +0200, Thorsten Leemhuis wrote:
> On Thu, 7 Oct 2021 14:51:15 +0300
> Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > On Thu, Oct 7, 2021 at 1:34 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > > On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> > > > > The kernel.h header is a collection of things that are not related
> > > > > to each other and are often used in unrelated compilation units,
> > > > > especially when drivers need only one or two macro definitions from it.
> > > >
> > > > Here is the split of container_of(). The goals are the following:
> > > > - untwist the dependency hell a bit
> > > > - drop kernel.h inclusion where it's only used for container_of()
> > > > - speed up C preprocessing.
> > > >
> > > > People, like Greg KH and Miguel Ojeda, were asking about the
> > > > latter. Read below the methodology and test setup with outcome
> > > > numbers.
> > > >
> > > > The methodology
> > > > ===============
> > > > > The question here is how to measure, in a more or less clean way,
> > > > > the C preprocessing time when building a project like the Linux
> > > > > kernel. To answer it, let's look around and see what tools we
> > > > > have that may help. Aha, here is the ccache tool, which seems quite
> > > > > plausible to use. Its core idea is to preprocess a C file,
> > > > > compute a hash (MD4) and compare it to the ones in the cache. If
> > > > > found, return the object file, avoiding the compilation stage.
> > > > >
> > > > > Taking this property of ccache into account, configure and use
> > > > > it in the steps below:
> > > >
> > > > 1. Configure kernel with allyesconfig
> > > >
> > > > 2. Make it with `make` to be sure that the cache is filled with
> > > > the latest data. I.o.w. warm up the cache.
> > > >
> > > > 3. Run `make -s` (silent mode to reduce the influence of
> > > > the unrelated things, like console output) 10 times and
> > > > measure 'real' time spent.
> > > >
> > > > 4. Repeat 1-3 for each patch or patch set to get data sets before
> > > > and after.
> > > >
> > > > When we get the raw data, calculating median will show us the
> > > > number. Comparing them before and after we will see the
> > > > difference.
> > > >
> > > > The setup
> > > > =========
> > > > I have used the Intel x86_64 server platform (see partial output
> > > > of `lscpu` below):
> > > >
> > > > $ lscpu
> > > > Architecture: x86_64
> > > > CPU op-mode(s): 32-bit, 64-bit
> > > > Address sizes: 46 bits physical, 48 bits virtual
> > > > Byte Order: Little Endian
> > > > CPU(s): 88
> > > > On-line CPU(s) list: 0-87
> > > > Vendor ID: GenuineIntel
> > > > Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > > CPU family: 6
> > > > Model: 79
> > > > Thread(s) per core: 2
> > > > Core(s) per socket: 22
> > > > Socket(s): 2
> > > > Stepping: 1
> > > > CPU max MHz: 3600.0000
> > > > CPU min MHz: 1200.0000
> > > > ...
> > > > Caches (sum of all):
> > > > L1d: 1.4 MiB (44 instances)
> > > > L1i: 1.4 MiB (44 instances)
> > > > L2: 11 MiB (44 instances)
> > > > L3: 110 MiB (2 instances)
> > > > NUMA:
> > > > NUMA node(s): 2
> > > > NUMA node0 CPU(s): 0-21,44-65
> > > > NUMA node1 CPU(s): 22-43,66-87
> > > > Vulnerabilities:
> > > > Itlb multihit: KVM: Mitigation: Split huge pages
> > > > > L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
> > > > > Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> > > > > Meltdown: Mitigation; PTI
> > > > > Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
> > > > > Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> > > > > Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
> > > > > Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
> > > >
> > > > With the following GCC:
> > > >
> > > > $ gcc --version
> > > > gcc (Debian 10.3.0-11) 10.3.0
> > > >
> > > > The commands I have run during the measurement were:
> > > >
> > > > rm -rf $O
> > > > make O=$O allyesconfig
> > > > time make O=$O -s -j64 # this step has been measured
>
> BTW, what kcbench does in the end is not that different, but it only
> builds the config once and then uses it for all further testing.
Since I measure only the third operation, re-creating the configuration
file shouldn't affect the result.
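For anyone who wants to repeat the measurement, the loop can be sketched roughly as below. This is only an illustration in Python, not the script that produced the numbers: the helper names time_build() and kernel_build_cmds() are made up, and only the three commands themselves come from the cover letter. The point it shows is that the cleanup and allyesconfig steps run outside the timed region.

```python
import subprocess
import time


def time_build(setup_cmds, timed_cmd, runs=10):
    """Run setup_cmds untimed, then time timed_cmd; repeat `runs` times.

    Returns the wall-clock durations in seconds, sorted ascending.
    (Hypothetical helper, not part of any existing tool.)
    """
    samples = []
    for _ in range(runs):
        # Untimed preparation, e.g. cleaning the output directory and
        # regenerating the configuration.
        for cmd in setup_cmds:
            subprocess.run(cmd, check=True)
        # Only this step contributes to the measurement.
        start = time.perf_counter()
        subprocess.run(timed_cmd, check=True)
        samples.append(time.perf_counter() - start)
    return sorted(samples)


def kernel_build_cmds(out_dir, jobs=64):
    """Build the command lists from the cover letter, with $O as out_dir."""
    setup = [
        ["rm", "-rf", out_dir],
        ["make", f"O={out_dir}", "allyesconfig"],
    ]
    timed = ["make", f"O={out_dir}", "-s", f"-j{jobs}"]
    return setup, timed
```

Calling time_build(*kernel_build_cmds(out_dir)) from the top of a kernel tree, with ccache already configured and warmed up, would give one sorted data set; the same call after applying a patch gives the other.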
> > > > The raw data and median
> > > > =======================
> > > > > Before patch 2 (yes, I have measured the effect of patch 2 only) in
> > > > > the series (the data is sorted by time):
> > > >
> > > > real 2m8.794s
> > > > real 2m11.183s
> > > > real 2m11.235s
> > > > real 2m11.639s
> > > > real 2m11.960s
> > > > real 2m12.014s
> > > > real 2m12.609s
> > > > real 2m13.177s
> > > > real 2m13.462s
> > > > real 2m19.132s
> > > >
> > > > After patch 2 has been applied:
> > > >
> > > > real 2m8.536s
> > > > real 2m8.776s
> > > > real 2m9.071s
> > > > real 2m9.459s
> > > > real 2m9.531s
> > > > real 2m9.610s
> > > > real 2m10.356s
> > > > real 2m10.430s
> > > > real 2m11.117s
> > > > real 2m11.885s
> > > >
> > > > Median values are:
> > > > 131.987s before
> > > > 129.571s after
> > > >
> > > > > We see a steady speedup of 1.83%.
> > >
> > > You do know about kcbench:
> > > https://gitlab.com/knurd42/kcbench.git
> > >
> > > Try running that to make it such that we know how it was tested :)
> >
> > I'll try it.
> >
> > Meanwhile, Thorsten, can you have a look at my approach and tell if it
> > makes sense?
>
> I'm not the right person to ask here, I don't know enough about the
> inner working of ccache and C preprocessing. Reminder: I'm not a real
> kernel/C developer, but more kind of a parasite that lives on the
> fringes of kernel development. ;-) Kcbench in fact originated as a
> benchmark for the computer magazine I used to work for – where
> I also did quite a few benchmarks. But that knowledge might be helpful
> here:
>
> The measurements before and after patch 2 was applied get slower over
> time. That is a hint that something is interfering. Is the disk filling
> up and making the fs do more work? Or is the machine getting too hot? It
> IMHO would be worth investigating and ruling out, as the differences
> you are looking for are likely quite small.
I tried to explain above, and in my other replies, why my methodology is
closer to what we need to measure. TL;DR: mathematically O() shadows o(),
and as we know the CPU and disk usage during compilation is huge in
comparison to C preprocessing. I'm not sure what you are referring to by
"slower over time", since I explicitly said that I have _sorted_ the data.
Nothing needs to be done here, I believe.
> Also: the last run of the first measurement cycle is off by quite a
> bit, so I wouldn't even include the result, as there likely was something
> that disturbed the benchmark.
I believe you missed the very same remark, i.e. that the data is sorted.
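To re-check the arithmetic, here is a minimal sketch (Python is just a convenient choice, it was not used for the original measurements, and to_seconds() is a made-up helper). It takes the two sorted data sets quoted above, computes the medians and the relative speedup, and then repeats the calculation with the slowest "before" sample dropped.

```python
from statistics import median


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m8.794s' to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


# Sorted 'real' times copied from the cover letter.
before = [to_seconds(s) for s in (
    "2m8.794s", "2m11.183s", "2m11.235s", "2m11.639s", "2m11.960s",
    "2m12.014s", "2m12.609s", "2m13.177s", "2m13.462s", "2m19.132s",
)]
after = [to_seconds(s) for s in (
    "2m8.536s", "2m8.776s", "2m9.071s", "2m9.459s", "2m9.531s",
    "2m9.610s", "2m10.356s", "2m10.430s", "2m11.117s", "2m11.885s",
)]


def speedup_percent(old, new):
    """Relative reduction of the median build time, in percent."""
    return (median(old) - median(new)) / median(old) * 100


full = speedup_percent(before, after)
# Same comparison with the slowest 'before' sample (2m19.132s) left out.
trimmed = speedup_percent(before[:-1], after)

print(f"all samples:      {full:.2f}%")
print(f"outlier dropped:  {trimmed:.2f}%")
```

With all samples this gives the 1.83% from the cover letter. Leaving out the 2m19.132s run moves the "before" median from 131.987s to 131.960s and the speedup to about 1.81%, so the single slow run barely influences the result.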
> And I might be missing something, but why were you using "-j 64" on a
> machine with 44 cores/88 threads?
Because that machine has other processes running, and I would like to
minimize fluctuations in CPU scheduling when some process requires
a resource to perform a little work.
> I wonder if that might lead to
> interesting effects due to SMT (some core will run two threads, other
> only one). Using either "-j 44" or "-j 88" might be better.
How can -j64 be better? Nothing guarantees that any of the cores will
be half-loaded. But -j88 is worse, because any process that wakes up and
requires a resource may affect the measurements.
> But I
> suggest you run kcbench once without specifying "-j", as that will
> check which setting is the fastest on this system – and then use that
> for all further tests.
Next time I will try this approach, thanks for your reply and insights!
--
With Best Regards,
Andy Shevchenko