From: Thomas Gleixner <email@example.com>
To: Tom Lendacky <firstname.lastname@example.org>,
Dave Hansen <email@example.com>,
Cc: firstname.lastname@example.org, Andrew Cooper <email@example.com>,
"Edgecombe, Rick P" <firstname.lastname@example.org>
Subject: Re: [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported
Date: Fri, 22 Apr 2022 21:30:19 +0200
Message-ID: <87bkws6hmc.ffs@tglx>
On Wed, Apr 20 2022 at 13:15, Tom Lendacky wrote:
> On 4/19/22 16:22, Thomas Gleixner wrote:
>>> That was bare metal and I just checked that this was a production config
>>> and not some weird debug muck which breaks large pages. I'll look deeper
>>> into that.
>> I can't find any reasonable explanation. The pages are definitely large
>> pages, so yes the dTLB miss count does not make sense, but it's
>> consistently faster and it's always the dTLB miss count which makes the
>> big difference according to perf.
>> For enhanced fun, I ran the lot on an AMD Zen3 machine and with the same
>> test case (hackbench -l 10000) repeated 10 times by perf stat this is
>> consistently slower than the non-optimized variant. There is at least an
>> explanation for that. A tight loop of 1 Mio xgetbv(1) invocations takes
>> 9 Mio cycles on a SKL-X and 50 Mio cycles on an AMD Zen3.
> I'll take a look into this and see what I find. Might be interesting to
> see if the actual XSAVES is slower or quicker, too, based on the input mask.
> If the performance slowdown shows up in real world benchmarks, we might
> want to consider not using the xgetbv() call on AMD.
As things stand, I'm not going to pursue this further at the moment.
The effect on SKL-X is not explainable; in particular, the dTLB miss
count decrease does not make any sense. Aside from that, I just figured
out that it is very sensitive to kernel configurations, and I have no
idea yet which screw exactly to turn to make the effect come and go.
So I'll just go and add the XSAVEC support alone, as that's actually
something which _is_ beneficial for guests.
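
For readers who want to reproduce the xgetbv(1) cost comparison quoted
above (1 Mio invocations taking 9 Mio cycles on SKL-X versus 50 Mio on
Zen3, i.e. roughly 9 versus 50 cycles per call), the following is a
minimal user-space sketch. It is not the test used in this thread. The
helper names (cycles_per_call, xgetbv1_supported, bench_xgetbv1) are
hypothetical, and it assumes GCC or clang on x86. It counts TSC ticks,
which are reference cycles rather than core cycles, so the absolute
numbers will only approximate what perf reports.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Average cost per call, e.g. 9 Mio cycles / 1 Mio calls = 9 cycles. */
static double cycles_per_call(uint64_t total_cycles, uint64_t calls)
{
	assert(calls > 0);
	return (double)total_cycles / (double)calls;
}

#if HAVE_X86
/*
 * XGETBV with ECX=1 requires OSXSAVE (CPUID.1:ECX bit 27) and the
 * XGETBV1 feature bit (CPUID.(EAX=0xD,ECX=1):EAX bit 2).
 */
static int xgetbv1_supported(void)
{
	unsigned int a, b, c, d;

	if (!__get_cpuid(1, &a, &b, &c, &d) || !(c & (1u << 27)))
		return 0;
	if (!__get_cpuid_count(0xd, 1, &a, &b, &c, &d))
		return 0;
	return (a >> 2) & 1;
}

/* Read the extended control register selected by index. */
static inline uint64_t xgetbv(uint32_t index)
{
	uint32_t lo, hi;

	__asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(index));
	return ((uint64_t)hi << 32) | lo;
}
#endif

/*
 * Returns the average number of TSC ticks per xgetbv(1) call over
 * 'loops' iterations, or -1.0 if XGETBV1 is not available here.
 */
static double bench_xgetbv1(uint64_t loops)
{
#if HAVE_X86
	uint64_t start, end, sink = 0;

	if (!xgetbv1_supported())
		return -1.0;

	start = __rdtsc();
	for (uint64_t i = 0; i < loops; i++)
		sink += xgetbv(1);
	end = __rdtsc();

	(void)sink;	/* result is only accumulated to keep the loop honest */
	return cycles_per_call(end - start, loops);
#else
	(void)loops;
	return -1.0;
#endif
}
```

Calling bench_xgetbv1(1000000) from a small main() and printing the
result gives a per-call figure to compare across machines. Pinning the
task to one CPU and repeating the run several times should reduce noise.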
Thread overview: 16+ messages
2022-04-04 12:11 [patch 0/3] x86/fpu/xsave: Add XSAVEC support and XGETBV1 utilization Thomas Gleixner
2022-04-04 12:11 ` [patch 1/3] x86/fpu/xsave: Support XSAVEC in the kernel Thomas Gleixner
2022-04-04 16:10 ` Andrew Cooper
2022-04-14 14:43 ` Dave Hansen
2022-04-25 13:11 ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2022-04-04 12:11 ` [patch 2/3] x86/fpu/xsave: Prepare for optimized compaction Thomas Gleixner
2022-04-14 15:46 ` Dave Hansen
2022-04-19 12:39 ` Thomas Gleixner
2022-04-19 13:33 ` Thomas Gleixner
2022-04-04 12:11 ` [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported Thomas Gleixner
2022-04-14 17:24 ` Dave Hansen
2022-04-19 13:43 ` Thomas Gleixner
2022-04-19 21:22 ` Thomas Gleixner
2022-04-20 18:15 ` Tom Lendacky
2022-04-22 19:30 ` Thomas Gleixner [this message]
2022-04-23 15:20 ` Dave Hansen