linux-kernel.vger.kernel.org archive mirror
* [patch 0/3] x86/process: Optimize __switch_to_extra()
@ 2016-12-15 16:44 Thomas Gleixner
  2016-12-15 16:44 ` [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-15 16:44 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Kyle Huey, Andy Lutomirski

GCC generates lousy code in __switch_to_extra(). Aside from that, some of
the operations there are implemented suboptimally.

This series, inspired by a patch from Kyle, helps the compiler to be less
stupid by explicitly giving it the hints to optimize on, and replaces the
open-coded bit toggle mechanisms with proper helper functions.
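
The gist of the resulting pattern, sketched from the hunks below
(illustrative only, the real changes are in the patches):

	unsigned long tifp = task_thread_info(prev_p)->flags;
	unsigned long tifn = task_thread_info(next_p)->flags;

	/* One xor tells which bits actually changed between the tasks */
	if ((tifp ^ tifn) & _TIF_BLOCKSTEP)
		toggle_debugctlmsr(DEBUGCTLMSR_BTF);

	if ((tifp ^ tifn) & _TIF_NOTSC)
		cr4_toggle_bits(X86_CR4_TSD);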

The resulting change in text size of arch/x86/kernel/process.o:

	 64bit	    32bit
Before:	  3726	     5986
After:	  3646	     5834
Delta:	    80	      152

The number of conditional jumps is also reduced:

	 64bit	    32bit
Before:	     8	       13
After:	     5	       10

Thanks,

	tglx

---
 include/asm/processor.h |   12 ++++++++
 include/asm/tlbflush.h  |   10 ++++++
 kernel/process.c        |   70 +++++++++++++++++++-----------------------------
 3 files changed, 51 insertions(+), 41 deletions(-)

* [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch
  2016-12-15 16:44 [patch 0/3] x86/process: Optimize __switch_to_extra() Thomas Gleixner
@ 2016-12-15 16:44 ` Thomas Gleixner
  2016-12-15 17:28   ` Andy Lutomirski
  2016-12-15 16:44 ` [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra() Thomas Gleixner
  2016-12-15 16:44 ` [patch 3/3] x86/process: Optimize TIF_NOTSC switch Thomas Gleixner
  2 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-15 16:44 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Kyle Huey, Andy Lutomirski

Provide and use a separate helper for toggling the DEBUGCTLMSR_BTF bit
instead of doing it open coded with a branch and potentially evaluating
boot_cpu_data twice.

x86_64:
text	   data	    bss	    dec	    hex
3694	   8505	     16	  12215	   2fb7	Before
3662	   8505	     16	  12183	   2f97	After

i386:
text	   data	    bss	    dec	    hex
5986	   9388	   1804	  17178	   431a	Before
5906	   9388	   1804	  17098	   42ca	After

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |   12 ++++++++++++
 arch/x86/kernel/process.c        |   10 ++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -676,6 +676,18 @@ static inline void update_debugctlmsr(un
 	wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctlmsr);
 }
 
+static inline void toggle_debugctlmsr(unsigned long mask)
+{
+	unsigned long msrval;
+
+#ifndef CONFIG_X86_DEBUGCTLMSR
+	if (boot_cpu_data.x86 < 6)
+		return;
+#endif
+	rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
+	wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
+}
+
 extern void set_task_blockstep(struct task_struct *task, bool on);
 
 /* Boot loader type from the setup header: */
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -209,14 +209,8 @@ void __switch_to_xtra(struct task_struct
 
 	propagate_user_return_notify(prev_p, next_p);
 
-	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl &= ~DEBUGCTLMSR_BTF;
-		if (tifn & _TIF_BLOCKSTEP)
-			debugctl |= DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
+	if ((tifp ^ tifn) & _TIF_BLOCKSTEP)
+		toggle_debugctlmsr(DEBUGCTLMSR_BTF);
 
 	if ((tifp ^ tifn) & _TIF_NOTSC) {
 		if (tifn & _TIF_NOTSC)

* [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()
  2016-12-15 16:44 [patch 0/3] x86/process: Optimize __switch_to_extra() Thomas Gleixner
  2016-12-15 16:44 ` [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch Thomas Gleixner
@ 2016-12-15 16:44 ` Thomas Gleixner
  2016-12-15 17:20   ` Peter Zijlstra
  2016-12-15 17:24   ` Andy Lutomirski
  2016-12-15 16:44 ` [patch 3/3] x86/process: Optimize TIF_NOTSC switch Thomas Gleixner
  2 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-15 16:44 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Kyle Huey, Andy Lutomirski

Help the compiler to avoid reevaluating the thread flags for each checked
bit by reordering the bit checks and providing an explicit xor for
evaluation.

x86_64: arch/x86/kernel/process.o
text	   data	    bss	    dec	    hex
3726	   8505	     16	  12247	   2fd7	Before
3694	   8505	     16	  12215	   2fb7	After

i386:  No change

Originally-from: Kyle Huey <khuey@kylehuey.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/process.c |   54 ++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -174,48 +174,56 @@ int set_tsc_mode(unsigned int val)
 	return 0;
 }
 
+static inline void switch_to_bitmap(struct tss_struct *tss,
+				    struct thread_struct *prev,
+				    struct thread_struct *next,
+				    unsigned long tifp, unsigned long tifn)
+{
+	if (tifn & _TIF_IO_BITMAP) {
+		/*
+		 * Copy the relevant range of the IO bitmap.
+		 * Normally this is 128 bytes or less:
+		 */
+		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
+		       max(prev->io_bitmap_max, next->io_bitmap_max));
+	} else if (tifp & _TIF_IO_BITMAP) {
+		/*
+		 * Clear any possible leftover bits:
+		 */
+		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+	}
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss)
 {
 	struct thread_struct *prev, *next;
+	unsigned long tifp, tifn;
 
 	prev = &prev_p->thread;
 	next = &next_p->thread;
 
-	if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^
-	    test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) {
+	tifn = task_thread_info(next_p)->flags;
+	tifp = task_thread_info(prev_p)->flags;
+	switch_to_bitmap(tss, prev, next, tifp, tifn);
+
+	propagate_user_return_notify(prev_p, next_p);
+
+	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
 		unsigned long debugctl = get_debugctlmsr();
 
 		debugctl &= ~DEBUGCTLMSR_BTF;
-		if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP))
+		if (tifn & _TIF_BLOCKSTEP)
 			debugctl |= DEBUGCTLMSR_BTF;
-
 		update_debugctlmsr(debugctl);
 	}
 
-	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
-	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
-		/* prev and next are different */
-		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
+	if ((tifp ^ tifn) & _TIF_NOTSC) {
+		if (tifn & _TIF_NOTSC)
 			hard_disable_TSC();
 		else
 			hard_enable_TSC();
 	}
-
-	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
-		/*
-		 * Copy the relevant range of the IO bitmap.
-		 * Normally this is 128 bytes or less:
-		 */
-		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(prev->io_bitmap_max, next->io_bitmap_max));
-	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
-		/*
-		 * Clear any possible leftover bits:
-		 */
-		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
-	}
-	propagate_user_return_notify(prev_p, next_p);
 }
 
 /*

* [patch 3/3] x86/process: Optimize TIF_NOTSC switch
  2016-12-15 16:44 [patch 0/3] x86/process: Optimize __switch_to_extra() Thomas Gleixner
  2016-12-15 16:44 ` [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch Thomas Gleixner
  2016-12-15 16:44 ` [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra() Thomas Gleixner
@ 2016-12-15 16:44 ` Thomas Gleixner
  2016-12-15 17:31   ` Andy Lutomirski
  2 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-15 16:44 UTC (permalink / raw)
  To: LKML; +Cc: x86, Peter Zijlstra, Kyle Huey, Andy Lutomirski

Provide and use a helper to toggle the CR4.TSD bit instead of open coding
it with a branch.

x86_64:
text	   data	    bss	    dec	    hex
3662	   8505	     16	  12183	   2f97	Before
3646	   8505	     16	  12167	   2f87	After

i386:
text	   data	    bss	    dec	    hex
5906	   9388	   1804	  17098	   42ca	Before
5834	   9324	   1740	  16898	   4202	After

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tlbflush.h |   10 ++++++++++
 arch/x86/kernel/process.c       |   22 ++++------------------
 2 files changed, 14 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -110,6 +110,16 @@ static inline void cr4_clear_bits(unsign
 	}
 }
 
+static inline void cr4_toggle_bits(unsigned long mask)
+{
+	unsigned long cr4;
+
+	cr4 = this_cpu_read(cpu_tlbstate.cr4);
+	cr4 ^= mask;
+	this_cpu_write(cpu_tlbstate.cr4, cr4);
+	__write_cr4(cr4);
+}
+
 /* Read the CR4 shadow. */
 static inline unsigned long cr4_read_shadow(void)
 {
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -116,11 +116,6 @@ void flush_thread(void)
 	fpu__clear(&tsk->thread.fpu);
 }
 
-static void hard_disable_TSC(void)
-{
-	cr4_set_bits(X86_CR4_TSD);
-}
-
 void disable_TSC(void)
 {
 	preempt_disable();
@@ -129,15 +124,10 @@ void disable_TSC(void)
 		 * Must flip the CPU state synchronously with
 		 * TIF_NOTSC in the current running context.
 		 */
-		hard_disable_TSC();
+		cr4_set_bits(X86_CR4_TSD);
 	preempt_enable();
 }
 
-static void hard_enable_TSC(void)
-{
-	cr4_clear_bits(X86_CR4_TSD);
-}
-
 static void enable_TSC(void)
 {
 	preempt_disable();
@@ -146,7 +136,7 @@ static void enable_TSC(void)
 		 * Must flip the CPU state synchronously with
 		 * TIF_NOTSC in the current running context.
 		 */
-		hard_enable_TSC();
+		cr4_clear_bits(X86_CR4_TSD);
 	preempt_enable();
 }
 
@@ -212,12 +202,8 @@ void __switch_to_xtra(struct task_struct
 	if ((tifp ^ tifn) & _TIF_BLOCKSTEP)
 		toggle_debugctlmsr(DEBUGCTLMSR_BTF);
 
-	if ((tifp ^ tifn) & _TIF_NOTSC) {
-		if (tifn & _TIF_NOTSC)
-			hard_disable_TSC();
-		else
-			hard_enable_TSC();
-	}
+	if ((tifp ^ tifn) & _TIF_NOTSC)
+		cr4_toggle_bits(X86_CR4_TSD);
 }
 
 /*

* Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()
  2016-12-15 16:44 ` [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra() Thomas Gleixner
@ 2016-12-15 17:20   ` Peter Zijlstra
  2016-12-15 17:26     ` Thomas Gleixner
  2016-12-15 17:24   ` Andy Lutomirski
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2016-12-15 17:20 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Kyle Huey, Andy Lutomirski

On Thu, Dec 15, 2016 at 04:44:02PM -0000, Thomas Gleixner wrote:
>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>  		      struct tss_struct *tss)
>  {
>  	struct thread_struct *prev, *next;
> +	unsigned long tifp, tifn;
>  
>  	prev = &prev_p->thread;
>  	next = &next_p->thread;
>  
> +	tifn = task_thread_info(next_p)->flags;
> +	tifp = task_thread_info(prev_p)->flags;
> +	switch_to_bitmap(tss, prev, next, tifp, tifn);
> +
> +	propagate_user_return_notify(prev_p, next_p);
> +
> +	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
>  		unsigned long debugctl = get_debugctlmsr();
>  
>  		debugctl &= ~DEBUGCTLMSR_BTF;
> +		if (tifn & _TIF_BLOCKSTEP)
>  			debugctl |= DEBUGCTLMSR_BTF;
>  		update_debugctlmsr(debugctl);
>  	}

Going by the toggle pattern you have elsewhere, wouldn't that then be:

	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
		unsigned long debugctl = get_debugctlmsr();

		debugctl ^= DEBUGCTLMSR_BTF;
		update_debugctlmsr(debugctl);
	}

?

* Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()
  2016-12-15 16:44 ` [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra() Thomas Gleixner
  2016-12-15 17:20   ` Peter Zijlstra
@ 2016-12-15 17:24   ` Andy Lutomirski
  1 sibling, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-12-15 17:24 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> -       if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^
> -           test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) {
> +       tifn = task_thread_info(next_p)->flags;
> +       tifp = task_thread_info(prev_p)->flags;

Minor nit, but I think that a sufficiently clever compiler could
interpret this to mean "no one else is modifying these flags, so I can
do clever crazy things".  Wrapping these in READ_ONCE might be
helpful.
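
For illustration, that wrapping would look roughly like this against the
posted hunk (a sketch, not a tested patch):

	/* Read the flag words exactly once; the compiler may not re-load them */
	tifn = READ_ONCE(task_thread_info(next_p)->flags);
	tifp = READ_ONCE(task_thread_info(prev_p)->flags);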

* Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()
  2016-12-15 17:20   ` Peter Zijlstra
@ 2016-12-15 17:26     ` Thomas Gleixner
  2016-12-15 17:33       ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-15 17:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, x86, Kyle Huey, Andy Lutomirski

On Thu, 15 Dec 2016, Peter Zijlstra wrote:
> On Thu, Dec 15, 2016 at 04:44:02PM -0000, Thomas Gleixner wrote:
> >  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> >  		      struct tss_struct *tss)
> >  {
> >  	struct thread_struct *prev, *next;
> > +	unsigned long tifp, tifn;
> >  
> >  	prev = &prev_p->thread;
> >  	next = &next_p->thread;
> >  
> > +	tifn = task_thread_info(next_p)->flags;
> > +	tifp = task_thread_info(prev_p)->flags;
> > +	switch_to_bitmap(tss, prev, next, tifp, tifn);
> > +
> > +	propagate_user_return_notify(prev_p, next_p);
> > +
> > +	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
> >  		unsigned long debugctl = get_debugctlmsr();
> >  
> >  		debugctl &= ~DEBUGCTLMSR_BTF;
> > +		if (tifn & _TIF_BLOCKSTEP)
> >  			debugctl |= DEBUGCTLMSR_BTF;
> >  		update_debugctlmsr(debugctl);
> >  	}
> 
> Going by the toggle pattern you have elsewhere, wouldn't that then be:
> 
> 	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
> 		unsigned long debugctl = get_debugctlmsr();
> 
> 		debugctl ^= DEBUGCTLMSR_BTF;
> 		update_debugctlmsr(debugctl);
> 	}

See the next patch

* Re: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch
  2016-12-15 16:44 ` [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch Thomas Gleixner
@ 2016-12-15 17:28   ` Andy Lutomirski
  2016-12-16  8:47     ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-12-15 17:28 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Provide and use a seperate helper for toggling the DEBUGCTLMSR_BTF bit
> instead of doing it open coded with a branch and eventually evaluating
> boot_cpu_data twice.
>
> x86_64:
> 3694       8505      16   12215    2fb7 Before
> 3662       8505      16   12183    2f97 After
>
> i386:
> 5986       9388    1804   17178    431a Before
> 5906       9388    1804   17098    42ca After
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/include/asm/processor.h |   12 ++++++++++++
>  arch/x86/kernel/process.c        |   10 ++--------
>  2 files changed, 14 insertions(+), 8 deletions(-)
>
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -676,6 +676,18 @@ static inline void update_debugctlmsr(un
>         wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctlmsr);
>  }
>
> +static inline void toggle_debugctlmsr(unsigned long mask)
> +{
> +       unsigned long msrval;
> +
> +#ifndef CONFIG_X86_DEBUGCTLMSR
> +       if (boot_cpu_data.x86 < 6)
> +               return;
> +#endif
> +       rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
> +       wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
> +}
> +

This scares me.  If the MSR ever gets out of sync with the TI flag,
this will malfunction.  And IIRC the MSR is highly magical and the CPU
clears it all by itself under a variety of not-so-well documented
circumstances.

How about adding a real feature bit and doing:

if (!static_cpu_has(X86_FEATURE_BLOCKSTEP))
  return;

rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
msrval &= ~DEBUGCTLMSR_BTF;
msrval |= ((tifn >> TIF_BLOCKSTEP) & 1) << DEBUGCTLMSR_BIT;
wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval);

--Andy

* Re: [patch 3/3] x86/process: Optimize TIF_NOTSC switch
  2016-12-15 16:44 ` [patch 3/3] x86/process: Optimize TIF_NOTSC switch Thomas Gleixner
@ 2016-12-15 17:31   ` Andy Lutomirski
  2016-12-16  8:50     ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-12-15 17:31 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Provide and use a toggle helper instead of doing it with a branch.
>
> x86_64:
> 3662       8505      16   12183    2f97 Before
> 3646       8505      16   12167    2f87 After
>
> i386:
> 5906       9388    1804   17098    42ca Before
> 5834       9324    1740   16898    4202 After
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/include/asm/tlbflush.h |   10 ++++++++++
>  arch/x86/kernel/process.c       |   22 ++++------------------
>  2 files changed, 14 insertions(+), 18 deletions(-)
>
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -110,6 +110,16 @@ static inline void cr4_clear_bits(unsign
>         }
>  }
>
> +static inline void cr4_toggle_bits(unsigned long mask)
> +{
> +       unsigned long cr4;
> +
> +       cr4 = this_cpu_read(cpu_tlbstate.cr4);
> +       cr4 ^= mask;
> +       this_cpu_write(cpu_tlbstate.cr4, cr4);
> +       __write_cr4(cr4);
> +}

This scares me for the same reason as BTF, although this should at
least be less fragile.  But how about:

static inline void cr4_set_bit_to(unsigned long mask, bool set)
{
  ...
  cr4 &= ~mask;
  cr4 ^= (set << ilog2(mask));
  ...
}

This should generate code that's nearly as good.
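
Filled in along the lines of cr4_toggle_bits() above, that would read
roughly as follows (a sketch with the elided parts guessed, assuming a
single-bit mask):

	static inline void cr4_set_bit_to(unsigned long mask, bool set)
	{
		unsigned long cr4;

		cr4 = this_cpu_read(cpu_tlbstate.cr4);
		cr4 &= ~mask;
		cr4 ^= (unsigned long)set << ilog2(mask);
		this_cpu_write(cpu_tlbstate.cr4, cr4);
		__write_cr4(cr4);
	}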

* Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()
  2016-12-15 17:26     ` Thomas Gleixner
@ 2016-12-15 17:33       ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-12-15 17:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86, Kyle Huey, Andy Lutomirski

On Thu, Dec 15, 2016 at 06:26:28PM +0100, Thomas Gleixner wrote:
> See the next patch

Duh, I'm an idiot. For some reason I thought this one got missed.

* Re: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch
  2016-12-15 17:28   ` Andy Lutomirski
@ 2016-12-16  8:47     ` Thomas Gleixner
  2016-12-16 19:22       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-16  8:47 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Thu, 15 Dec 2016, Andy Lutomirski wrote:
> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > +static inline void toggle_debugctlmsr(unsigned long mask)
> > +{
> > +       unsigned long msrval;
> > +
> > +#ifndef CONFIG_X86_DEBUGCTLMSR
> > +       if (boot_cpu_data.x86 < 6)
> > +               return;
> > +#endif
> > +       rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
> > +       wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
> > +}
> > +
> 
> This scares me.  If the MSR ever gets out of sync with the TI flag,
> this will malfunction.  And IIRC the MSR is highly magical and the CPU
> clears it all by itself under a variety of not-so-well documented
> circumstances.

If that is true, then the code today is broken as well: when the CPU has
cleared the BTF bit behind our back and both prev and next have
TIF_BLOCKSTEP set, the flags do not differ and the MSR won't be updated
for the next task.

Then we should not use the TIF flag, but store a debugmask in thread info
and do:

	if (prev->debugmask || next->debugmask) {
		if (static_cpu_has(X86_FEATURE_BLOCKSTEP)) {
			rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
			msrval &= ~DEBUGCTLMSR_BTF;
			msrval |= next->debugmask;
			wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
		}
	}

Thanks,

	tglx

* Re: [patch 3/3] x86/process: Optimize TIF_NOTSC switch
  2016-12-15 17:31   ` Andy Lutomirski
@ 2016-12-16  8:50     ` Thomas Gleixner
  2016-12-16 18:34       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-16  8:50 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Thu, 15 Dec 2016, Andy Lutomirski wrote:
> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > +static inline void cr4_toggle_bits(unsigned long mask)
> > +{
> > +       unsigned long cr4;
> > +
> > +       cr4 = this_cpu_read(cpu_tlbstate.cr4);
> > +       cr4 ^= mask;
> > +       this_cpu_write(cpu_tlbstate.cr4, cr4);
> > +       __write_cr4(cr4);
> > +}
> 
> This scares me for the same reason as BTF, although this should at
> least be less fragile.  But how about:

If that is fragile, then all cr4 manipulation code is fragile, because it
relies on cpu_tlbstate.cr4. The TIF flag and that per-CPU shadow are kept
in sync.
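
For context, the existing set/clear helpers follow the same
shadow-then-write pattern, roughly (quoted from memory, the in-tree
bodies may differ in detail):

	static inline void cr4_set_bits(unsigned long mask)
	{
		unsigned long cr4;

		cr4 = this_cpu_read(cpu_tlbstate.cr4);
		if ((cr4 | mask) != cr4) {
			cr4 |= mask;
			this_cpu_write(cpu_tlbstate.cr4, cr4);
			__write_cr4(cr4);
		}
	}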

Thanks,

	tglx

* Re: [patch 3/3] x86/process: Optimize TIF_NOTSC switch
  2016-12-16  8:50     ` Thomas Gleixner
@ 2016-12-16 18:34       ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-12-16 18:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Fri, Dec 16, 2016 at 12:50 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 15 Dec 2016, Andy Lutomirski wrote:
>> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > +static inline void cr4_toggle_bits(unsigned long mask)
>> > +{
>> > +       unsigned long cr4;
>> > +
>> > +       cr4 = this_cpu_read(cpu_tlbstate.cr4);
>> > +       cr4 ^= mask;
>> > +       this_cpu_write(cpu_tlbstate.cr4, cr4);
>> > +       __write_cr4(cr4);
>> > +}
>>
>> This scares me for the same reason as BTF, although this should at
>> least be less fragile.  But how about:
>
> If that is fragile then all cr4 manipulation code is fragile because it
> relies on cpu_tlbstate.cr4. The TIF flag and that per cpu thing are kept in
> sync.

True.

* Re: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch
  2016-12-16  8:47     ` Thomas Gleixner
@ 2016-12-16 19:22       ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-12-16 19:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, X86 ML, Peter Zijlstra, Kyle Huey, Andy Lutomirski

On Fri, Dec 16, 2016 at 12:47 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 15 Dec 2016, Andy Lutomirski wrote:
>> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> > +static inline void toggle_debugctlmsr(unsigned long mask)
>> > +{
>> > +       unsigned long msrval;
>> > +
>> > +#ifndef CONFIG_X86_DEBUGCTLMSR
>> > +       if (boot_cpu_data.x86 < 6)
>> > +               return;
>> > +#endif
>> > +       rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
>> > +       wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
>> > +}
>> > +
>>
>> This scares me.  If the MSR ever gets out of sync with the TI flag,
>> this will malfunction.  And IIRC the MSR is highly magical and the CPU
>> clears it all by itself under a variety of not-so-well documented
>> circumstances.
>
> If that is true, then the code today is broken as well, when the flag has
> been cleared and both prev and next have the flag set. Then it won't be
> updated for the next task.
>
> The we should not use the TIF flag and store a debugmask in thread info and
> do:
>
>         if (prev->debugmask || next->debugmask) {
>                 if (static_cpu_has(X86_FEATURE_BLOCKSTEP)) {
>                         rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
>                         msrval &= DEBUGCTLMSR_BTF;
>                         msrval |= next->debugmask;
>                 }
>         }

Seems reasonable to me.  Although keeping it in flags might simplify
the logic a bit.  FWIW, I doubt we care about performance much when
either prev or next has the bit set.

--Andy
