[v12,10/18] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions

Message ID 20200511045311.4785-11-sashal@kernel.org
State New
Series
  • Enable FSGSBASE instructions

Commit Message

Sasha Levin May 11, 2020, 4:53 a.m. UTC
From: "Chang S. Bae" <chang.seok.bae@intel.com>

Add CPU feature conditional FS/GS base access to the relevant helper
functions. That allows accelerating certain FS/GS base operations in
subsequent changes.

Note that, while possible, the user space entry/exit GS base operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 arch/x86/include/asm/fsgsbase.h | 27 +++++++--------
 arch/x86/kernel/process_64.c    | 58 +++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 15 deletions(-)

Comments

Thomas Gleixner May 18, 2020, 6:20 p.m. UTC | #1
Sasha Levin <sashal@kernel.org> writes:
> +unsigned long x86_gsbase_read_cpu_inactive(void)
> +{
> +	unsigned long gsbase;
> +
> +	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> +		bool need_restore = false;
> +		unsigned long flags;
> +
> +		/*
> +		 * We read the inactive GS base value by swapping
> +		 * to make it the active one. But we cannot allow
> +		 * an interrupt while we switch to and from.
> +		 */
> +		if (!irqs_disabled()) {
> +			local_irq_save(flags);
> +			need_restore = true;
> +		}
> +
> +		native_swapgs();
> +		gsbase = rdgsbase();
> +		native_swapgs();
> +
> +		if (need_restore)
> +			local_irq_restore(flags);

Where does this crap come from?

This conditional irqsave gunk is clearly NOT what was in the tip tree
before it got reverted:

  a86b4625138d ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")

In https://lore.kernel.org/r/87ftcrtckg.fsf@nanos.tec.linutronix.de I
explicitly asked for this:

     - Made sure that the cleanups I did when merging them initially have
       been picked up. I'm not going to waste another couple of days on
       this mess just to revert it because it hadn't seen any serious
       testing in development.

and you confirmed in https://lore.kernel.org/r/20200426025243.GJ13035@sasha-vm

       Based on your revert
       (https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/cpu&id=049331f277fef1c3f2527c2c9afa1d285e9a1247)
       I believe that we have all the relevant patches in the series.

And the above while it might not have exploded yet, is simply broken
because the 'swapgs rd/wr swapgs' sequence is not protected against
kprobes. There is even a big fat comment in that original commit:

 /*
  * Out of line to be protected from kprobes. It is not used on Xen
  * paravirt. When paravirt support is needed, it needs to be renamed
  * with native_ prefix.
  */

Yes, you surely got all patches from the git tree and made sure that the
result reflects that.

I've just extracted the original commits from git and applied them and
fixed the trivial rejects. Then I diffed the result against this lot:

 - That above gunk, which is the worst of all

 - In paranoid_exit()

-	TRACE_IRQS_IRETQ_DEBUG
+	TRACE_IRQS_OFF_DEBUG

 - Dropped comments vs. FENCE_SWAPGS and a gazillion of comment
   changes to make reading the diff harder.

Then I gave up looking at it.

It took me ~ 20 minutes (ignoring selftests and documentation) to fixup
the rejects and create a patch queue which is reflecting the state
before the revert and does not have complete crap in it.

This required adding one preparatory patch dealing with the changes in
copy_thread_tls() and no, not by inlining all of that twice.

It took me another 5 minutes to get rid of the local_irq_save/restore()
in save_fsgs() on top without any conditional crap.

I'm seriously tired of this FSGSBASE mess. Every single version I've
looked at in several years was a trainwreck.

Don't bother to send out a new version of this in a frenzy. For my
mental sake I'm not going to look at yet another cobbled together
trainwreck anytime soon.

If you read the above carefully you might find a recipe of properly
engineering this so it's easy to verify against the old version.

Yours, seriously grumpy

       tglx
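
For readers following the 'swapgs rd/wr swapgs' discussion, here is a
minimal user-space sketch of what the sequence does to the two GS base
values. It is a model only, not kernel code: struct cpu_model and every
model_* name are hypothetical, and the two fields merely stand in for
the active GS base and MSR_KERNEL_GS_BASE. The comment in the middle
marks the window in which the kernel's own GS base is not active, which
is the window the interrupt and kprobes concerns above are about.

```c
#include <assert.h>

/* Hypothetical model of one CPU's two GS base slots; not kernel code. */
struct cpu_model {
	unsigned long gs_base;        /* active value, what RDGSBASE would return */
	unsigned long kernel_gs_base; /* inactive value, models MSR_KERNEL_GS_BASE */
};

/* Models SWAPGS: exchange the active and the inactive value. */
static void model_swapgs(struct cpu_model *cpu)
{
	unsigned long tmp = cpu->gs_base;

	cpu->gs_base = cpu->kernel_gs_base;
	cpu->kernel_gs_base = tmp;
}

/* Read the inactive value by temporarily making it the active one. */
unsigned long model_gsbase_read_inactive(struct cpu_model *cpu)
{
	unsigned long gsbase;

	model_swapgs(cpu);
	/* Window: the original active value is parked in the inactive slot. */
	gsbase = cpu->gs_base;
	model_swapgs(cpu);

	return gsbase;
}

/* Write the inactive value the same way. */
void model_gsbase_write_inactive(struct cpu_model *cpu, unsigned long gsbase)
{
	model_swapgs(cpu);
	/* Same window as in the read helper. */
	cpu->gs_base = gsbase;
	model_swapgs(cpu);
}
```

In the model, both helpers leave the active value untouched once they
return. Anything that runs inside the marked window and relies on the
active value would see the wrong one.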
Sasha Levin May 18, 2020, 8:24 p.m. UTC | #2
Thank you for taking the time to review this.

On Mon, May 18, 2020 at 08:20:08PM +0200, Thomas Gleixner wrote:
>Sasha Levin <sashal@kernel.org> writes:
>> +unsigned long x86_gsbase_read_cpu_inactive(void)
>> +{
>> +	unsigned long gsbase;
>> +
>> +	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
>> +		bool need_restore = false;
>> +		unsigned long flags;
>> +
>> +		/*
>> +		 * We read the inactive GS base value by swapping
>> +		 * to make it the active one. But we cannot allow
>> +		 * an interrupt while we switch to and from.
>> +		 */
>> +		if (!irqs_disabled()) {
>> +			local_irq_save(flags);
>> +			need_restore = true;
>> +		}
>> +
>> +		native_swapgs();
>> +		gsbase = rdgsbase();
>> +		native_swapgs();
>> +
>> +		if (need_restore)
>> +			local_irq_restore(flags);
>
>Where does this crap come from?
>
>This conditional irqsave gunk is clearly NOT what was in the tip tree
>before it got reverted:
>
>  a86b4625138d ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")

It wasn't in the reverted series, it came in Intel's v9 series, with
these comments in the cover letter:

	Updates from v8 [10]:
	[...]
	* Simplified GS base helper functions (Tony L.)

>In https://lore.kernel.org/r/87ftcrtckg.fsf@nanos.tec.linutronix.de I
>explicitly asked for this:
>
>     - Made sure that the cleanups I did when merging them initially have
>       been picked up. I'm not going to waste another couple of days on
>       this mess just to revert it because it hadn't seen any serious
>       testing in development.
>
>and you confirmed in https://lore.kernel.org/r/20200426025243.GJ13035@sasha-vm
>
>       Based on your revert
>       (https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/cpu&id=049331f277fef1c3f2527c2c9afa1d285e9a1247)
>       I believe that we have all the relevant patches in the series.

And I did, Thomas. While I'm not intimately familiar with the code I
made sure that all the patches that came on top of the merged series
before it got reverted made it into this new series. However, more work
has happened here after the revert and I would expect that the code in
this new series will be different than the code you reverted last year.

>And the above while it might not have exploded yet, is simply broken
>because the 'swapgs rd/wr swapgs' sequence is not protected against
>kprobes. There is even a big fat comment in that original commit:
>
> /*
>  * Out of line to be protected from kprobes. It is not used on Xen
>  * paravirt. When paravirt support is needed, it needs to be renamed
>  * with native_ prefix.
>  */
>
>Yes, you surely got all patches from the git tree and made sure that the
>result reflects that.
>
>I've just extracted the original commits from git and applied them and
>fixed the trivial rejects. Then I diffed the result against this lot:
>
> - That above gunk, which is the worst of all

Changed in v9 of the series.

> - In paranoid_exit()
>
>-	TRACE_IRQS_IRETQ_DEBUG
>+	TRACE_IRQS_OFF_DEBUG

(assuming we're looking at the same thing here) Changed in v8 of the
series.

> - Dropped comments vs. FENCE_SWAPGS and a gazillion of comment
>   changes to make reading the diff harder.

Changed in every version after the revert:

  - v7:
    - "Add more comments for entry changes"
  - v8:
    - "Carried on Thomas' edits on multiple changelogs and comments"
  - v9:
    - "Fixed typos (Randy D.) and massaged a few sentences in the
      documentation"

>Then I gave up looking at it.
>
>It took me ~ 20 minutes (ignoring selftests and documentation) to fixup
>the rejects and create a patch queue which is reflecting the state
>before the revert and does not have complete crap in it.
>
>This required adding one preparatory patch dealing with the changes in
>copy_thread_tls() and no, not by inlining all of that twice.
>
>It took me another 5 minutes to get rid of the local_irq_save/restore()
>in save_fsgs() on top without any conditional crap.
>
>I'm seriously tired of this FSGSBASE mess. Every single version I've
>looked at in several years was a trainwreck.
>
>Don't bother to send out a new version of this in a frenzy. For my
>mental sake I'm not going to look at yet another cobbled together
>trainwreck anytime soon.
>
>If you read the above carefully you might find a recipe of properly
>engineering this so it's easy to verify against the old version.

I'm a bit confused about the surprise here that v12 is different than
the reverted patches. There were multiple rounds of review which
resulted in the code being more than just a revert of the revert along
with a small fix.

This very issue was brought up by Andy in v7 of the series:

On Mon, Sep 16, 2019 at 11:38 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 12 Sep 2019, Andy Lutomirski wrote:
> > On 9/12/19 1:06 PM, Chang S. Bae wrote:
> > > Updates from v7 [7]:
> > > (1) Consider FSGSBASE when determining which Spectre SWAPGS mitigations are
> > >      required.
> > > (2) Fixed save_fsgs() to be aware of interrupt conditions
> > > (3) Made selftest changes based on Andy's previous fixes and cleanups
> > > (4) Included Andy's paranoid exit cleanup
> > > (5) Included documentation rewritten by Thomas
> > > (6) Carried on Thomas' edits on multiple changelogs and comments
> > > (7) Used '[FS|GS] base' consistently, except for selftest where GSBASE has
> > >      been already used in its test messages
> > > (8) Dropped the READ_MSR_GSBASE macro
> > >
> >
> > This looks unpleasant to review.  I wonder if it would be better to unrevert
> > the reversion, merge up to Linus' tree or -tip, and then base the changes on
> > top of that.
>
> I don't think that's a good idea. The old code is broken in several ways
> and not bisectable. So we really better start from scratch.

And this is what we have here, a series that has more than trivial
differences from the revert, and is more of a pain to review. Look at
what you did with your 25 minutes: you've reverted the revert and went
on to apply fixes on top of it, exactly the thing you've asked
not to do a few months prior.

No need to worry about me sending a new series, as I can't - I just
don't know what you want to see at this point: on one hand you're saying
"we really better start from scratch" and on the other hand "this
conditional irqsave gunk is clearly NOT what was in the tip tree before
it got reverted", you're making suggestions to change comments only to
later complain that "a gazillion of comment changes make reading the
diff harder".
Thomas Gleixner May 18, 2020, 10:59 p.m. UTC | #3
Sasha,

Sasha Levin <sashal@kernel.org> writes:
> Thank you for taking the time to review this.

welcome and sorry for the explosion.

> On Mon, May 18, 2020 at 08:20:08PM +0200, Thomas Gleixner wrote:
>>Sasha Levin <sashal@kernel.org> writes:
>>This conditional irqsave gunk is clearly NOT what was in the tip tree
>>before it got reverted:
>>
>>  a86b4625138d ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
>
> It wasn't in the reverted series, it came in Intel's v9 series, with
> these comments in the cover letter:
>
> 	Updates from v8 [10]:
> 	[...]
> 	* Simplified GS base helper functions (Tony L.)

Ok. I never looked at that series because that requested confirmation
that nothing will regress due to the ptrace changes was not there. After
a bit of handwaving this dried out. So I completely missed that back
then. And I did not look at any later variant which had 0day complaints.

> And I did, Thomas. While I'm not intimately familiar with the code I
> made sure that all the patches that came on top of the merged series
> before it got reverted made it into this new series. However, more work
> has happened here after the revert and I would expect that the code in
> this new series will be different than the code you reverted last
> year.

It's obvious that it would be different from what was merged simply
because the affected code has changed but not in substantial points like
losing a kprobes protection by "simplifying" something which was
carefully done in the first place.

It's not your fault at all, you just happened to be the messenger. The
people responsible for that mess owe you at least a beer.

>> - In paranoid_exit()
>>
>>-	TRACE_IRQS_IRETQ_DEBUG
>>+	TRACE_IRQS_OFF_DEBUG
>
> (assuming we're looking at the same thing here) Changed in v8 of the
> series.

Sigh.

> I'm a bit confused about the surprise here that v12 is different than
> the reverted patches. There were multiple rounds of review which
> resulted in the code being more than just a revert of the revert along
> with a small fix.
>
>> > This looks unpleasant to review.  I wonder if it would be better to unrevert
>> > the reversion, merge up to Linus' tree or -tip, and then base the changes on
>> > top of that.
>>
>> I don't think that's a good idea. The old code is broken in several ways
>> and not bisectable. So we really better start from scratch.
>
> And this is what we have here, a series that has more than trivial
> differences from the revert, and is more of a pain to review. Look at
> what you did with your 25 minutes: you've reverted the revert and went
> on to apply fixes on top of it, exactly the thing you've asked
> not to do a few months prior.

I did that to analyse whether that new series has everything that was
fixed back then and did not introduce new bugs. Mission accomplished.

> No need to worry about me sending a new series, as I can't - I just
> don't know what you want to see at this point: on one hand you're saying
> "we really better start from scratch" and on the other hand "this
> conditional irqsave gunk is clearly NOT what was in the tip tree before
> it got reverted", you're making suggestions to change comments only to
> later complain that "a gazillion of comment changes make reading the
> diff harder". 

Gah. That comment change thing was just an annoyance and I complained
about it because I was already grumpy as hell.

So what I meant is that the blind revert of the revert, i.e. just
reapplying the previous stuff, is horrible. Simply because the reverted
patches were already not bisectable. And then applying random changes on
top does not make it any better.

So yes, I would have done exactly what I started with:

   1) Extract the original patches from git

   2) Apply them and fixup the rejects

and on top of that:

   3) Make them bisectable by folding back the fixes to the right place
      and reordering them which creates a result which is equivalent to
      'start from scratch' but without losing context and introducing
      new bugs. Simply because it's trivial to diff against the state
      before the revert.

   4) Do the 'improvements' on top, discuss them and fold them back.

For what you tried to do I would have omitted #4 completely and then
done:

   5) Rebase the latest Intel variant

   6) Diff the results ideally step by step

   7) Analyze the deltas carefully and if unsure about the result
      ask.
      
   That way you really would have noticed that this helper patch is
   substantially different and you would have noticed that the kprobes
   protection is gone. Also that would have clearly shown you the IRQ
   flag wreckage.

So to go forward can you please just do #1 - #3 first?

Vs. the s/GSBASE/GS base/g comment changes: I don't mind them per se,
but they are incomplete because they just change it in the new code
while there are still the original comments using GSBASE. So either we
change it wholesale or not at all. If so, then this wants to be a
separate patch right at the beginning of the new series which changes
the existing comments before introducing a different variant.

That "simplified" handling is going nowhere. That conditional irq
disable and the redundant conditionals and the out of line invocation in
switch_to() are just not going to happen.

So when comparing it to the latest Intel trainwreck ignore that part
completely.

I've uploaded my quick shot with a few cleanups on top (folded back) for
reference:

  https://tglx.de/~tglx/patches-fsgs.tar

Uncompiled and untested. I'm not claiming it's bug free either. If you
find one, please keep it. Hope that helps.

Thanks,

        tglx
David Laight May 19, 2020, 12:20 p.m. UTC | #4
From: Sasha Levin
> Sent: 18 May 2020 21:25
> Thank you for taking the time to review this.
> 
> On Mon, May 18, 2020 at 08:20:08PM +0200, Thomas Gleixner wrote:
> >Sasha Levin <sashal@kernel.org> writes:
> >> +unsigned long x86_gsbase_read_cpu_inactive(void)
> >> +{
> >> +	unsigned long gsbase;
> >> +
> >> +	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
> >> +		bool need_restore = false;
> >> +		unsigned long flags;
> >> +
> >> +		/*
> >> +		 * We read the inactive GS base value by swapping
> >> +		 * to make it the active one. But we cannot allow
> >> +		 * an interrupt while we switch to and from.
> >> +		 */
> >> +		if (!irqs_disabled()) {
> >> +			local_irq_save(flags);
> >> +			need_restore = true;
> >> +		}
> >> +
> >> +		native_swapgs();
> >> +		gsbase = rdgsbase();
> >> +		native_swapgs();

Does local_irq_save() even do anything useful here?
You need to actually execute CLI, not just set a
flag that indicates interrupts shouldn't happen.
(Which is what I think local_irq_save() might do.)

You also (probably) need to disable NMIs.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Thomas Gleixner May 19, 2020, 2:48 p.m. UTC | #5
David Laight <David.Laight@ACULAB.COM> writes:
> From: Sasha Levin
>> >> +		native_swapgs();
>> >> +		gsbase = rdgsbase();
>> >> +		native_swapgs();
>
> Does local_irq_save() even do anything useful here?
> You need to actually execute CLI, not just set a
> flag that indicates interrupts shouldn't happen.
> (Which is what I think local_irq_save() might do.)

  local_irq_save()
    raw_local_irq_save()
      arch_local_irq_save()
        arch_local_irq_disable()
          native_irq_disable()
            asm("CLI")

> You also (probably) need to disable NMIs.

The NMI entry can deal with that obviously.

Thanks,

        tglx
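
The call chain in the reply above ends in CLI, and local_irq_save()
records the previous interrupt state before disabling. A small
user-space sketch of that save/restore pattern follows. Everything in
it is an assumption made for illustration: model_if stands in for the
CPU interrupt flag and the model_* functions only mimic the semantics,
they are not the kernel implementation. It suggests why a separate
irqs_disabled() check before saving adds nothing: an unconditional
save/restore pair already hands back whatever state it found.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the interrupt flag; true = interrupts enabled. */
static bool model_if = true;

static bool model_irqs_disabled(void)
{
	return !model_if;
}

/* Models local_irq_save(): remember the current state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = model_if ? 1UL : 0UL;

	model_if = false; /* the CLI step in the call chain */
	return flags;
}

/* Models local_irq_restore(): put back exactly the saved state. */
static void model_irq_restore(unsigned long flags)
{
	model_if = (flags != 0);
}

/*
 * Unconditional save/restore around a critical section. Returns whether
 * interrupts were disabled while inside the section.
 */
bool model_critical_section(void)
{
	unsigned long flags = model_irq_save();
	bool disabled_inside = model_irqs_disabled();

	model_irq_restore(flags);
	return disabled_inside;
}
```

Called with the model flag either set or clear, the section runs with
interrupts disabled and the flag ends up as it was on entry.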
David Laight May 20, 2020, 9:13 a.m. UTC | #6
From: Thomas Gleixner
> Sent: 19 May 2020 15:48
> 
> David Laight <David.Laight@ACULAB.COM> writes:
> > From: Sasha Levin
> >> >> +		native_swapgs();
> >> >> +		gsbase = rdgsbase();
> >> >> +		native_swapgs();
> >
> > Does local_irq_save() even do anything useful here?
> > You need to actually execute CLI, not just set a
> > flag that indicates interrupts shouldn't happen.
> > (Which is what I think local_irq_save() might do.)
> 
>   local_irq_save()
>     raw_local_irq_save()
>       arch_local_irq_save()
>         arch_local_irq_disable()
>           native_irq_disable()
>             asm("CLI")

Ah, I was expecting software 'tricks' to avoid the expensive CLI.
But that call chain probably costs more - unless it is all inlined.

	David


Patch

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177499b40..aefd53767a5d4 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@  static __always_inline void wrgsbase(unsigned long gsbase)
 	asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
 }
 
+#include <asm/cpufeature.h>
+
 /* Helper functions for reading/writing FS/GS base */
 
 static inline unsigned long x86_fsbase_read_cpu(void)
 {
 	unsigned long fsbase;
 
-	rdmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		fsbase = rdfsbase();
+	else
+		rdmsrl(MSR_FS_BASE, fsbase);
 
 	return fsbase;
 }
 
-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
-	unsigned long gsbase;
-
-	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
-	return gsbase;
-}
-
 static inline void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-	wrmsrl(MSR_FS_BASE, fsbase);
+	if (static_cpu_has(X86_FEATURE_FSGSBASE))
+		wrfsbase(fsbase);
+	else
+		wrmsrl(MSR_FS_BASE, fsbase);
 }
 
-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
-	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
 
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5ef9d8f25b0e8..aaa65f284b9b9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -328,6 +328,64 @@  static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
 	return base;
 }
 
+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+	unsigned long gsbase;
+
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		bool need_restore = false;
+		unsigned long flags;
+
+		/*
+		 * We read the inactive GS base value by swapping
+		 * to make it the active one. But we cannot allow
+		 * an interrupt while we switch to and from.
+		 */
+		if (!irqs_disabled()) {
+			local_irq_save(flags);
+			need_restore = true;
+		}
+
+		native_swapgs();
+		gsbase = rdgsbase();
+		native_swapgs();
+
+		if (need_restore)
+			local_irq_restore(flags);
+	} else {
+		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+
+	return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+		bool need_restore = false;
+		unsigned long flags;
+
+		/*
+		 * We write the inactive GS base value by swapping
+		 * to make it the active one. But we cannot allow
+		 * an interrupt while we switch to and from.
+		 */
+		if (!irqs_disabled()) {
+			local_irq_save(flags);
+			need_restore = true;
+		}
+
+		native_swapgs();
+		wrgsbase(gsbase);
+		native_swapgs();
+
+		if (need_restore)
+			local_irq_restore(flags);
+	} else {
+		wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	}
+}
+
 unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
 	unsigned long fsbase;