All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/current: Provide additional information to optimise get_cpu_info()
@ 2014-09-01 10:58 Andrew Cooper
  2014-09-01 11:24 ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Cooper @ 2014-09-01 10:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
inline assembly "or" in get_cpu_info()", this is achieved by providing more
information to the compiler.

With this modification, gcc replaces the older:
    mov imm, %reg
    and %rsp, %reg

with:
    mov %rsp, %reg
    and imm, %reg

which is one byte shorter.  It also considers all general purpose registers
for %reg rather than just the legacy ones (i.e. will now use %r12 etc), which
allows for better register scheduling in larger functions.

This causes a net drop of almost 4K of .text

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/include/asm-x86/current.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 2081015..b95fd79 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -25,9 +25,9 @@ struct cpu_info {
 
 static inline struct cpu_info *get_cpu_info(void)
 {
-    unsigned long tos;
-    __asm__ ( "and %%rsp,%0" : "=r" (tos) : "0" (~(STACK_SIZE-1)) );
-    return (struct cpu_info *)(tos + STACK_SIZE) - 1;
+    register unsigned long sp asm("rsp");
+
+    return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/current: Provide additional information to optimise get_cpu_info()
  2014-09-01 10:58 [PATCH] x86/current: Provide additional information to optimise get_cpu_info() Andrew Cooper
@ 2014-09-01 11:24 ` Jan Beulich
  2014-09-01 12:18   ` Andrew Cooper
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2014-09-01 11:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 01.09.14 at 12:58, <andrew.cooper3@citrix.com> wrote:
> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
> inline assembly "or" in get_cpu_info()", this is achieved by providing more
> information to the compiler.
> 
> With this modification, gcc replaces the older:
>     mov imm, %reg
>     and %rsp, %reg
> 
> with:
>     mov %rsp, %reg
>     and imm, %reg
> 
> which is one byte shorter.

I'm in no way opposed to the change, but is that really true? Afaict
it can be 1 byte shorter only when %rax gets selected as the register
here.

>  It also considers all general purpose registers
> for %reg rather than just the legacy ones (i.e. will now use %r12 etc), 
> which
> allows for better register scheduling in larger functions.

Same here - why would with the old code not all registers be
available for selection by the compiler?

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/current: Provide additional information to optimise get_cpu_info()
  2014-09-01 11:24 ` Jan Beulich
@ 2014-09-01 12:18   ` Andrew Cooper
  2014-09-01 12:32     ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Cooper @ 2014-09-01 12:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Xen-devel

On 01/09/14 12:24, Jan Beulich wrote:
>>>> On 01.09.14 at 12:58, <andrew.cooper3@citrix.com> wrote:
>> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
>> inline assembly "or" in get_cpu_info()", this is achieved by providing more
>> information to the compiler.
>>
>> With this modification, gcc replaces the older:
>>     mov imm, %reg
>>     and %rsp, %reg
>>
>> with:
>>     mov %rsp, %reg
>>     and imm, %reg
>>
>> which is one byte shorter.
> I'm in no way opposed to the change, but is that really true? Afaict
> it can be 1 byte shorter only when %rax gets selected as the register
> here.

Oh - quite possibly only %rax, but that still makes up the majority of
instances in shorter functions, where %rax was previously chosen as well.

I also note that the exact position of the lookup gets deferred in some
cases until after an early exit from the function.

>
>>  It also considers all general purpose registers
>> for %reg rather than just the legacy ones (i.e. will now use %r12 etc), 
>> which
>> allows for better register scheduling in larger functions.
> Same here - why would with the old code not all registers be
> available for selection by the compiler?

I suspect it has something to do with the choices available from the asm
parameter.  There no mnemonics to specify the newer registers, which is
a holdover from the 32bit days.  I suspect there is some implicit limit
to just the legacy GPRs.

Either way, my observations of the change in generated asm is that
before the change, no REX.R registers were used, whereas they are used
afterwards.

~Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/current: Provide additional information to optimise get_cpu_info()
  2014-09-01 12:18   ` Andrew Cooper
@ 2014-09-01 12:32     ` Jan Beulich
  2014-09-01 15:27       ` [PATCH v2] " Andrew Cooper
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2014-09-01 12:32 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Xen-devel

>>> On 01.09.14 at 14:18, <andrew.cooper3@citrix.com> wrote:
> On 01/09/14 12:24, Jan Beulich wrote:
>>>>> On 01.09.14 at 12:58, <andrew.cooper3@citrix.com> wrote:
>>> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
>>> inline assembly "or" in get_cpu_info()", this is achieved by providing more
>>> information to the compiler.
>>>
>>> With this modification, gcc replaces the older:
>>>     mov imm, %reg
>>>     and %rsp, %reg
>>>
>>> with:
>>>     mov %rsp, %reg
>>>     and imm, %reg
>>>
>>> which is one byte shorter.
>> I'm in no way opposed to the change, but is that really true? Afaict
>> it can be 1 byte shorter only when %rax gets selected as the register
>> here.
> 
> Oh - quite possibly only %rax, but that still makes up the majority of
> instances in shorter functions, where %rax was previously chosen as well.
> 
> I also note that the exact position of the lookup gets deferred in some
> cases until after an early exit from the function.
> 
>>
>>>  It also considers all general purpose registers
>>> for %reg rather than just the legacy ones (i.e. will now use %r12 etc), 
>>> which
>>> allows for better register scheduling in larger functions.
>> Same here - why would with the old code not all registers be
>> available for selection by the compiler?
> 
> I suspect it has something to do with the choices available from the asm
> parameter.  There no mnemonics to specify the newer registers, which is
> a holdover from the 32bit days.  I suspect there is some implicit limit
> to just the legacy GPRs.

Certainly not (minus bugs in the compiler) - "r" definitely includes
all registers other than %rsp.

> Either way, my observations of the change in generated asm is that
> before the change, no REX.R registers were used, whereas they are used
> afterwards.

Perhaps really just happens to be that way.

In any event, I'd prefer if you could either strip or correct
imprecise parts from the description, just to not misguide anyone
looking at the change later.

Jan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] x86/current: Provide additional information to optimise get_cpu_info()
  2014-09-01 12:32     ` Jan Beulich
@ 2014-09-01 15:27       ` Andrew Cooper
  2014-09-13 16:10         ` Marcin Cieslak
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Cooper @ 2014-09-01 15:27 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich

Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
inline assembly "or" in get_cpu_info()", this is achieved by providing more
information to the compiler.

This causes a net drop of almost 4K of .text

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>

---
v2: Less speculation about generated code in the comment
---
 xen/include/asm-x86/current.h |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 2081015..b95fd79 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -25,9 +25,9 @@ struct cpu_info {
 
 static inline struct cpu_info *get_cpu_info(void)
 {
-    unsigned long tos;
-    __asm__ ( "and %%rsp,%0" : "=r" (tos) : "0" (~(STACK_SIZE-1)) );
-    return (struct cpu_info *)(tos + STACK_SIZE) - 1;
+    register unsigned long sp asm("rsp");
+
+    return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] x86/current: Provide additional information to optimise get_cpu_info()
  2014-09-01 15:27       ` [PATCH v2] " Andrew Cooper
@ 2014-09-13 16:10         ` Marcin Cieslak
  2014-09-15  8:16           ` Andrew Cooper
  0 siblings, 1 reply; 7+ messages in thread
From: Marcin Cieslak @ 2014-09-13 16:10 UTC (permalink / raw)
  To: xen-devel

>> Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
> inline assembly "or" in get_cpu_info()", this is achieved by providing more
> information to the compiler.
>
> This causes a net drop of almost 4K of .text
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
>
> ---
> v2: Less speculation about generated code in the comment
> ---
>  xen/include/asm-x86/current.h |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
> index 2081015..b95fd79 100644
> --- a/xen/include/asm-x86/current.h
> +++ b/xen/include/asm-x86/current.h
> @@ -25,9 +25,9 @@ struct cpu_info {
>  
>  static inline struct cpu_info *get_cpu_info(void)
>  {
> -    unsigned long tos;
> -    __asm__ ( "and %%rsp,%0" : "=r" (tos) : "0" (~(STACK_SIZE-1)) );
> -    return (struct cpu_info *)(tos + STACK_SIZE) - 1;
> +    register unsigned long sp asm("rsp");
> +
> +    return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
>  }

Hello, it seems to me that the above code fails on me with clang 3.4
on FreeBSD-CURRENT:

xen/include/asm/current.h:30:33:
	error: variable 'sp' is uninitialized when used here [-Werror,-Wuninitialized]

Reverting df0ae94fd56d5f9c64089364efecb1793442360b helps.

There is a workaround suggested in http://llvm.org/devmtg/2014-02/slides/moller-llvmlinux.pdf:

diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index b95fd79..e133d9d 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -26,6 +26,7 @@ struct cpu_info {
 static inline struct cpu_info *get_cpu_info(void)
 {
     register unsigned long sp asm("rsp");
+    asm("" : "=r" (sp));
 
     return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
 }

That silences the warning (not sure it it works).

//Marcin

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] x86/current: Provide additional information to optimise get_cpu_info()
  2014-09-13 16:10         ` Marcin Cieslak
@ 2014-09-15  8:16           ` Andrew Cooper
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Cooper @ 2014-09-15  8:16 UTC (permalink / raw)
  To: Marcin Cieslak, xen-devel

On 13/09/2014 17:10, Marcin Cieslak wrote:
>>> Andrew Cooper<andrew.cooper3@citrix.com>  wrote:
>> Exactly as with c/s d55c5eefe "x86: use compiler visible "add" instead of
>> inline assembly "or" in get_cpu_info()", this is achieved by providing more
>> information to the compiler.
>>
>> This causes a net drop of almost 4K of .text
>>
>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>> CC: Jan Beulich<JBeulich@suse.com>
>>
>> ---
>> v2: Less speculation about generated code in the comment
>> ---
>>   xen/include/asm-x86/current.h |    6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
>> index 2081015..b95fd79 100644
>> --- a/xen/include/asm-x86/current.h
>> +++ b/xen/include/asm-x86/current.h
>> @@ -25,9 +25,9 @@ struct cpu_info {
>>   
>>   static inline struct cpu_info *get_cpu_info(void)
>>   {
>> -    unsigned long tos;
>> -    __asm__ ( "and %%rsp,%0" : "=r" (tos) : "0" (~(STACK_SIZE-1)) );
>> -    return (struct cpu_info *)(tos + STACK_SIZE) - 1;
>> +    register unsigned long sp asm("rsp");
>> +
>> +    return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
>>   }
> Hello, it seems to me that the above code fails on me with clang 3.4
> on FreeBSD-CURRENT:
>
> xen/include/asm/current.h:30:33:
> 	error: variable 'sp' is uninitialized when used here [-Werror,-Wuninitialized]

That is rather unfortunate for clang.  The stack pointer has an 
initialised and perfectly good value anywhere this function can be used.

> Reverting df0ae94fd56d5f9c64089364efecb1793442360b helps.
>
> There is a workaround suggested inhttp://llvm.org/devmtg/2014-02/slides/moller-llvmlinux.pdf:
>
> diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
> index b95fd79..e133d9d 100644
> --- a/xen/include/asm-x86/current.h
> +++ b/xen/include/asm-x86/current.h
> @@ -26,6 +26,7 @@ struct cpu_info {
>   static inline struct cpu_info *get_cpu_info(void)
>   {
>       register unsigned long sp asm("rsp");
> +    asm("" : "=r" (sp));
>   
>       return (struct cpu_info *)((sp & ~(STACK_SIZE-1)) + STACK_SIZE) - 1;
>   }
>
> That silences the warning (not sure it it works).

It functions under GCC as well, but undoes some (but not all of) the 
improvements introduced as a result of df0ae94f.

It would probably be acceptable in a suitable #ifdef, along with comment 
why this seemingly redundant statement is present.

~Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-15  8:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-01 10:58 [PATCH] x86/current: Provide additional information to optimise get_cpu_info() Andrew Cooper
2014-09-01 11:24 ` Jan Beulich
2014-09-01 12:18   ` Andrew Cooper
2014-09-01 12:32     ` Jan Beulich
2014-09-01 15:27       ` [PATCH v2] " Andrew Cooper
2014-09-13 16:10         ` Marcin Cieslak
2014-09-15  8:16           ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.