kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] selftests: kvm: Fix an unexpected failure with newer gcc compiler
@ 2020-08-14 13:21 Yang Weijiang
  2020-08-17 16:42 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Yang Weijiang @ 2020-08-14 13:21 UTC (permalink / raw)
  To: kvm, pbonzini, shuah, peterx, linux-kselftest; +Cc: Yang Weijiang

If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated
binary looks like over-optimized by gcc:

asm volatile("ss_start: "
             "xor %%rax,%%rax\n\t"
             "cpuid\n\t"
             "movl $0x1a0,%%ecx\n\t"
             "rdmsr\n\t"
             : : : "rax", "ecx");

is translated to :

  000000000040194e <ss_start>:
  40194e:       31 c0                   xor    %eax,%eax     <----- rax->eax?
  401950:       0f a2                   cpuid
  401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
  401957:       0f 32                   rdmsr

As you can see rax is replaced with eax in taret binary code.
But if I replace %%rax with %%r8 or any GPR from r8~15, then I get below
expected binary:

0000000000401950 <ss_start>:
  401950:       45 31 ff                xor    %r15d,%r15d
  401953:       0f a2                   cpuid
  401955:       b9 a0 01 00 00          mov    $0x1a0,%ecx
  40195a:       0f 32                   rdmsr

The difference is the length of xor instruction(2 Byte vs 3 Byte),
so this makes below hard-coded instruction length cannot pass runtime check:

        /* Instruction lengths starting at ss_start */
        int ss_size[4] = {
                3,              /* xor */   <-------- 2 or 3?
                2,              /* cpuid */
                5,              /* mov */
                2,              /* rdmsr */
        };
Note:
Use 8.2.1 or older gcc, it generates expected 3 bytes xor target code.

I use the default Makefile to build the binaries, and I cannot figure out why this
happens, so it comes this patch, maybe you have better solution to resolve the
issue. If you know how things work in this way, please let me know, thanks!

Below is the capture from my environments:
========================================================================
gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

0000000000401950 <ss_start>:
  401950:       45 31 ff                xor    %r15d,%r15d
  401953:       0f a2                   cpuid
  401955:       b9 a0 01 00 00          mov    $0x1a0,%ecx
  40195a:       0f 32                   rdmsr

  000000000040194f <ss_start>:
  40194f:       31 db                   xor    %ebx,%ebx
  401951:       0f a2                   cpuid
  401953:       b9 a0 01 00 00          mov    $0x1a0,%ecx
  401958:       0f 32                   rdmsr

  000000000040194e <ss_start>:
  40194e:       31 c0                   xor    %eax,%eax
  401950:       0f a2                   cpuid
  401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
  401957:       0f 32                   rdmsr

==========================================================================

gcc (GCC) 8.2.1 20180905 (Red Hat 8.2.1-3)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

0000000000401750 <ss_start>:
  401750:       48 31 c0                xor    %rax,%rax
  401753:       0f a2                   cpuid
  401755:       b9 a0 01 00 00          mov    $0x1a0,%ecx
  40175a:       0f 32                   rdmsr

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 tools/testing/selftests/kvm/x86_64/debug_regs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
index 8162c58a1234..74641cfa8ace 100644
--- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
@@ -40,11 +40,11 @@ static void guest_code(void)
 
 	/* Single step test, covers 2 basic instructions and 2 emulated */
 	asm volatile("ss_start: "
-		     "xor %%rax,%%rax\n\t"
+		     "xor %%r15,%%r15\n\t"
 		     "cpuid\n\t"
 		     "movl $0x1a0,%%ecx\n\t"
 		     "rdmsr\n\t"
-		     : : : "rax", "ecx");
+		     : : : "r15", "ecx");
 
 	/* DR6.BD test */
 	asm volatile("bd_start: mov %%dr0, %%rax" : : : "rax");
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] selftests: kvm: Fix an unexpected failure with newer gcc compiler
  2020-08-14 13:21 [PATCH] selftests: kvm: Fix an unexpected failure with newer gcc compiler Yang Weijiang
@ 2020-08-17 16:42 ` Sean Christopherson
  2020-08-17 17:19   ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2020-08-17 16:42 UTC (permalink / raw)
  To: Yang Weijiang; +Cc: kvm, pbonzini, shuah, peterx, linux-kselftest

On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
> If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated
> binary looks like over-optimized by gcc:
> 
> asm volatile("ss_start: "
>              "xor %%rax,%%rax\n\t"
>              "cpuid\n\t"
>              "movl $0x1a0,%%ecx\n\t"
>              "rdmsr\n\t"
>              : : : "rax", "ecx");
> 
> is translated to :
> 
>   000000000040194e <ss_start>:
>   40194e:       31 c0                   xor    %eax,%eax     <----- rax->eax?
>   401950:       0f a2                   cpuid
>   401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
>   401957:       0f 32                   rdmsr
> 
> As you can see rax is replaced with eax in taret binary code.

It's an optimization.  `xor rax, rax` and `xor eax, eax` yield the exact
same result, as writing the lower 32 bits of a GPR in 64-bit mode clears
the upper 32 bits.  Using the eax variant avoids the REX prefix and saves
a byte of code.

> But if I replace %%rax with %%r8 or any GPR from r8~15, then I get below
> expected binary:
> 
> 0000000000401950 <ss_start>:
>   401950:       45 31 ff                xor    %r15d,%r15d

This is not replacing %rax with %r15, it's replacing it with %r15d, which
is the equivalent of %eax.  But that's beside the point.  Encoding GPRs
r8-r15 requires a REX prefix, so even though you avoid REX.W you still need
REX.R, and thus end up with a 3 byte instruction.

>   401953:       0f a2                   cpuid

Note, CPUID consumes EAX.  It doesn't look like the code actually consumes
the CPUID output, but switching to r15 is at best bizarre.

>   401955:       b9 a0 01 00 00          mov    $0x1a0,%ecx
>   40195a:       0f 32                   rdmsr
> 
> The difference is the length of xor instruction(2 Byte vs 3 Byte),
> so this makes below hard-coded instruction length cannot pass runtime check:
> 
>         /* Instruction lengths starting at ss_start */
>         int ss_size[4] = {
>                 3,              /* xor */   <-------- 2 or 3?
>                 2,              /* cpuid */
>                 5,              /* mov */
>                 2,              /* rdmsr */
>         };
> Note:
> Use 8.2.1 or older gcc, it generates expected 3 bytes xor target code.
> 
> I use the default Makefile to build the binaries, and I cannot figure out why this
> happens, so it comes this patch, maybe you have better solution to resolve the
> issue. If you know how things work in this way, please let me know, thanks!

Use `xor %%eax, %%eax`.  That should always generate a 2 byte instruction.
Encoding a 64-bit operation would technically be legal, but I doubt any
compiler would do that in practice.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] selftests: kvm: Fix an unexpected failure with newer gcc compiler
  2020-08-17 16:42 ` Sean Christopherson
@ 2020-08-17 17:19   ` Paolo Bonzini
  2020-08-18 13:25     ` Yang Weijiang
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2020-08-17 17:19 UTC (permalink / raw)
  To: Sean Christopherson, Yang Weijiang; +Cc: kvm, shuah, peterx, linux-kselftest

On 17/08/20 18:42, Sean Christopherson wrote:
> On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
>> If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated
>> binary looks like over-optimized by gcc:
>>
>> asm volatile("ss_start: "
>>              "xor %%rax,%%rax\n\t"
>>              "cpuid\n\t"
>>              "movl $0x1a0,%%ecx\n\t"
>>              "rdmsr\n\t"
>>              : : : "rax", "ecx");
>>
>> is translated to :
>>
>>   000000000040194e <ss_start>:
>>   40194e:       31 c0                   xor    %eax,%eax     <----- rax->eax?
>>   401950:       0f a2                   cpuid
>>   401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
>>   401957:       0f 32                   rdmsr
>>
>> As you can see rax is replaced with eax in taret binary code.
> 
> It's an optimization.  `xor rax, rax` and `xor eax, eax` yield the exact
> same result, as writing the lower 32 bits of a GPR in 64-bit mode clears
> the upper 32 bits.  Using the eax variant avoids the REX prefix and saves
> a byte of code.

I would have expected that from binutils though, not GCC.

> Use `xor %%eax, %%eax`.  That should always generate a 2 byte instruction.
> Encoding a 64-bit operation would technically be legal, but I doubt any
> compiler would do that in practice.

Indeed, and in addition the clobbers are incorrect since they miss rbx
and rdx.  I've sent a patch.

Paolo


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] selftests: kvm: Fix an unexpected failure with newer gcc compiler
  2020-08-17 17:19   ` Paolo Bonzini
@ 2020-08-18 13:25     ` Yang Weijiang
  0 siblings, 0 replies; 4+ messages in thread
From: Yang Weijiang @ 2020-08-18 13:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Yang Weijiang, kvm, shuah, peterx, linux-kselftest

On Mon, Aug 17, 2020 at 07:19:17PM +0200, Paolo Bonzini wrote:
> On 17/08/20 18:42, Sean Christopherson wrote:
> > On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
> >> If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated
> >> binary looks like over-optimized by gcc:
> >>
> >> asm volatile("ss_start: "
> >>              "xor %%rax,%%rax\n\t"
> >>              "cpuid\n\t"
> >>              "movl $0x1a0,%%ecx\n\t"
> >>              "rdmsr\n\t"
> >>              : : : "rax", "ecx");
> >>
> >> is translated to :
> >>
> >>   000000000040194e <ss_start>:
> >>   40194e:       31 c0                   xor    %eax,%eax     <----- rax->eax?
> >>   401950:       0f a2                   cpuid
> >>   401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
> >>   401957:       0f 32                   rdmsr
> >>
> >> As you can see rax is replaced with eax in taret binary code.
> > 
> > It's an optimization.  `xor rax, rax` and `xor eax, eax` yield the exact
> > same result, as writing the lower 32 bits of a GPR in 64-bit mode clears
> > the upper 32 bits.  Using the eax variant avoids the REX prefix and saves
> > a byte of code.
> 
> I would have expected that from binutils though, not GCC.
> 
> > Use `xor %%eax, %%eax`.  That should always generate a 2 byte instruction.
> > Encoding a 64-bit operation would technically be legal, but I doubt any
> > compiler would do that in practice.
> 
> Indeed, and in addition the clobbers are incorrect since they miss rbx
> and rdx.  I've sent a patch.
>
Thanks Paolo and Sean for the feedback!

> Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-08-18 13:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14 13:21 [PATCH] selftests: kvm: Fix an unexpected failure with newer gcc compiler Yang Weijiang
2020-08-17 16:42 ` Sean Christopherson
2020-08-17 17:19   ` Paolo Bonzini
2020-08-18 13:25     ` Yang Weijiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).