linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: "Sverdlin, Alexander (Nokia - DE/Ulm)" <alexander.sverdlin@nokia.com>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jason Vas Dias <jason.vas.dias@gmail.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH v3 1/2] x86/vdso: Move mult and shift into struct vgtod_ts
Date: Sun, 23 Jun 2019 12:18:26 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.21.1906231008170.32342@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20190605144116.28553-2-alexander.sverdlin@nokia.com>

Alexander,

On Wed, 5 Jun 2019, Sverdlin, Alexander (Nokia - DE/Ulm) wrote:

> From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
> 
> This is a preparation for CLOCK_MONOTONIC_RAW vDSO implementation.
> Coincidentally it had a slight performance improvement as well:
> 
> ---- Test code ----
>  #include <errno.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <time.h>
>  #include <unistd.h>
> 
>  #define CLOCK_TYPE CLOCK_MONOTONIC_RAW
>  #define DURATION_SEC 10
> 
> int main(int argc, char **argv)
> {
> 	struct timespec t, end;
> 	unsigned long long cnt = 0;
> 
> 	clock_gettime(CLOCK_TYPE, &end);
> 	end.tv_sec += DURATION_SEC;
> 
> 	do {
> 		clock_gettime(CLOCK_TYPE, &t);
> 		++cnt;
> 	} while (t.tv_sec < end.tv_sec || t.tv_nsec < end.tv_nsec);
> 
> 	dprintf(STDOUT_FILENO, "%llu", cnt);
> 
> 	return EXIT_SUCCESS;
> }
> -------------------
> 
> The results from the above test program:
> 
> Clock                   Before  After   Diff
> -----                   ------  -----   ----
> CLOCK_MONOTONIC         355.5M  359.6M  +1%
> CLOCK_REALTIME          355.5M  359.6M  +1%

Been there done that. But the results of the micro benchmark above have to
be taken with a grain of salt.

Let's look at the cache side effects of this change:

Before:

struct vsyscall_gtod_data {
	unsigned int               seq;                  /*     0     4 */
	int                        vclock_mode;          /*     4     4 */
	u64                        cycle_last;           /*     8     8 */
	u64                        mask;                 /*    16     8 */
	u32                        mult;                 /*    24     4 */
	u32                        shift;                /*    28     4 */
	struct vgtod_ts            basetime[12];         /*    32   192 */
	/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
	int                        tz_minuteswest;       /*   224     4 */
	int                        tz_dsttime;           /*   228     4 */

	/* size: 232, cachelines: 4, members: 9 */
	/* last cacheline: 40 bytes */
};

After:

struct vsyscall_gtod_data {
	unsigned int               seq;                  /*     0     4 */
	int                        vclock_mode;          /*     4     4 */
	u64                        cycle_last;           /*     8     8 */
	u64                        mask;                 /*    16     8 */
	struct vgtod_ts            basetime[12];         /*    24   288 */
	/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
	int                        tz_minuteswest;       /*   312     4 */
	int                        tz_dsttime;           /*   316     4 */

	/* size: 320, cachelines: 5, members: 7 */
};

So the interesting change is here:

	struct vgtod_ts            basetime[12];         /*    32   192 */

vs.

	struct vgtod_ts            basetime[12];         /*    24   288 */

In the original version struct vgtod_ts occupies 16 bytes in the modified
version it has 24 bytes. As a consequence:

	CLOCK				Before		After
	data->basetime[CLOCK_REALTIME]	32 .. 47	24 .. 47
	data->basetime[CLOCK_MONOTONIC]	48 .. 63	48 .. 71

That means that the CLOCK_MONOTONIC storage now overlaps into a second
cache line. CLOCK_MONOTONIC is widely used.

Of course on a micro benchmark this does not matter at all, but in real
world use cases which care about cache pressure it does.

Another thing is that just adding some small computation to the benchmark
loop makes the advantage of the modified version go away and it becomes
slightly slower.

On a real world application which is a cache heavy and uses CLOCK MONOTONIC
extensivly I can observe a significant L1 miss increase (%2) with the
modified version.

Now there is something else which is interesting:

On a guest (pvclock) I can observe with your benchmark:

 Before:

	MONO  cnt: 576.7
	REAL  cnt: 576.6

 After: 

 	MONO  cnt: 577.1
	REAL  cnt: 577.0

But on a (different) host:

 Before:

       MONO   cnt: 353.9
       REAL   cnt: 354.4

 After:

       MONO   cnt: 347.9
       REAL   cnt: 347.1

Yuck!

So this is really sensitive and I'm inclined not to do that due to the
cache line side effect.

The alternative solution for this is what Vincenzo has in his unified VDSO
patch series:

  https://lkml.kernel.org/r/20190621095252.32307-1-vincenzo.frascino@arm.com

It leaves the data struct unmodified and has a separate array for the raw
clock. That does not have the side effects at all.

I'm in the process of merging that series and I actually adapted your
scheme to the new unified infrastructure where it has exactly the same
effects as with your original patches against the x86 version.

Thanks,

	tglx

  reply	other threads:[~2019-06-23 11:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-28  8:01 [PATCH] x86/vdso: implement clock_gettime(CLOCK_MONOTONIC_RAW, ...) Sverdlin, Alexander (Nokia - DE/Ulm)
2019-05-28 18:01 ` Thomas Gleixner
2019-06-04 16:42   ` [PATCH v2] " Sverdlin, Alexander (Nokia - DE/Ulm)
2019-06-04 17:39     ` Andy Lutomirski
2019-06-05 14:41       ` [PATCH v3 0/2] x86/vdso: CLOCK_MONOTONIC_RAW implementation Sverdlin, Alexander (Nokia - DE/Ulm)
2019-06-05 14:41         ` [PATCH v3 1/2] x86/vdso: Move mult and shift into struct vgtod_ts Sverdlin, Alexander (Nokia - DE/Ulm)
2019-06-23 10:18           ` Thomas Gleixner [this message]
2019-06-24  9:16             ` Sverdlin, Alexander (Nokia - DE/Ulm)
2019-06-24  9:40               ` Thomas Gleixner
2019-06-27 12:00                 ` Sverdlin, Alexander (Nokia - DE/Ulm)
2019-06-27 12:07                   ` Thomas Gleixner
2019-06-27 12:15                     ` Sverdlin, Alexander (Nokia - DE/Ulm)
2019-06-27 12:21                       ` Thomas Gleixner
2019-06-05 14:41         ` [PATCH v3 2/2] x86/vdso: implement clock_gettime(CLOCK_MONOTONIC_RAW, ...) Sverdlin, Alexander (Nokia - DE/Ulm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1906231008170.32342@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=alexander.sverdlin@nokia.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jason.vas.dias@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).