* Xen HVM regression on certain Intel CPUs
@ 2013-03-27 15:26 Stefan Bader
  2013-03-27 15:53 ` Stefan Bader
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 15:26 UTC (permalink / raw)
  To: xen-devel; +Cc: H. Peter Anvin, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 2152 bytes --]

Recently I ran some experiments on newer hardware and realized that any kernel
at or after v3.5, booted in 64-bit mode (Xen version 4.2.1), fails to bring up
any APs (message about CPU stuck). I was able to bisect normally into a range
of realmode changes and then manually drill down to the following commit:

commit cda846f101fb1396b6924f1d9b68ac3d42de5403
Author: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Date:   Tue May 8 21:22:46 2012 +0300

    x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline

    This patch changes 64-bit trampoline so that CR4 and
    EFER are provided by the kernel instead of using fixed
    values.

From the Xen debugging console it was possible to gather a bit more data which
pointed to a failure very close to setting CR4 in startup_32. On this particular
hardware the saved CR4 (about to be set) was 0x1407f0.

This would set two flags that somehow feel dangerous: PGE (page global enable)
and SMEP (supervisor mode execution protection). SMEP turns out to be the main
offender and the following change allows the APs to start:

--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -93,7 +93,9 @@ ENTRY(startup_32)
        movl    %edx, %fs
        movl    %edx, %gs

-       movl    pa_tr_cr4, %eax
+       movl    $X86_CR4_SMEP, %eax
+       notl    %eax
+       andl    pa_tr_cr4, %eax
        movl    %eax, %cr4              # Enable PAE mode

        # Setup trampoline 4 level pagetables

Now I am not completely convinced that this is really the way to go. Likely the
Xen hypervisor should not start up the guest with CR4 on the BP containing those
flags. But maybe it still makes sense to mask some dangerous ones off in the
realmode code (btw, it seemed that masking the assignments in arch_setup or
setup_realmode did not work).

And finally I am wondering why the SMEP flag in CR4 is set anyway. My
understanding would be that this should only be done if cpuid[7].ebx has bit7
set. And this does not seem to be the case at least on the one box I was doing
the bisection on.

-Stefan


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 15:26 Xen HVM regression on certain Intel CPUs Stefan Bader
@ 2013-03-27 15:53 ` Stefan Bader
  2013-03-27 16:04   ` Konrad Rzeszutek Wilk
  2013-03-27 16:18   ` H. Peter Anvin
  0 siblings, 2 replies; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 15:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Konrad Rzeszutek Wilk, H. Peter Anvin


[-- Attachment #1.1: Type: text/plain, Size: 2673 bytes --]

On 27.03.2013 16:26, Stefan Bader wrote:
> And finally I am wondering why the SMEP flag in CR4 is set anyway. My
> understanding would be that this should only be done if cpuid[7].ebx has bit7
> set. And this does not seem to be the case at least on the one box I was doing
> the bisection on.

Seems that I was relying on the wrong source of information when checking SMEP
support. The cpuid command seems to be at fault, because /proc/cpuinfo does
report it. So that at least explains where that comes from... sorry for that.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 15:53 ` Stefan Bader
@ 2013-03-27 16:04   ` Konrad Rzeszutek Wilk
  2013-03-27 16:09     ` H. Peter Anvin
  2013-03-27 16:45     ` Stefan Bader
  2013-03-27 16:18   ` H. Peter Anvin
  1 sibling, 2 replies; 30+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-27 16:04 UTC (permalink / raw)
  To: Stefan Bader, wei.y.yang, haitao.shan, xin.li; +Cc: xen-devel, H. Peter Anvin

On Wed, Mar 27, 2013 at 04:53:16PM +0100, Stefan Bader wrote:
> Seems that I was relying on the wrong source of information when checking SMEP
> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So that
> at least explains where that comes from... sorry for that.

OK, so if you boot Xen with smep=1 (which disables SMEP, kind of a
counterintuitive flag), would that work fine?

CC-ing the Intel folks who added this in.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:04   ` Konrad Rzeszutek Wilk
@ 2013-03-27 16:09     ` H. Peter Anvin
  2013-03-27 16:24       ` Stefan Bader
  2013-03-27 16:45     ` Stefan Bader
  1 sibling, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 16:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Stefan Bader

On 03/27/2013 09:04 AM, Konrad Rzeszutek Wilk wrote:
>> Seems that I was relying on the wrong source of information when checking SMEP
>> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So that
>> at least explains where that comes from... sorry for that.
> 
> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuive flag)
> that would work fine?
> 
> CC-ing the Intel folks who added this in.
> 

If it is present in /proc/cpuinfo and not in cpuid it means the kernel
thinks it has SMEP but the CPU doesn't... an obvious case of fail.
However, *where the hell* does the bit come from in the first place?

That is what we need to track down.

When you say Xen HVM, am I correct in assuming that neither CPUID nor
CR4 operations in the main kernel are run through paravirt_ops?

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 15:53 ` Stefan Bader
  2013-03-27 16:04   ` Konrad Rzeszutek Wilk
@ 2013-03-27 16:18   ` H. Peter Anvin
  1 sibling, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 16:18 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 03/27/2013 08:53 AM, Stefan Bader wrote:
> Seems that I was relying on the wrong source of information when
> checking SMEP support. The cpuid command seems at fail. But
> /proc/cpuinfo reports it. So that at least explains where that
> comes from... sorry for that.

What do /proc/cpuinfo and cpuid (or x86info) show for the BSP and the
APs, respectively?  Are there any inconsistencies between them?

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:09     ` H. Peter Anvin
@ 2013-03-27 16:24       ` Stefan Bader
  2013-03-27 16:32         ` H. Peter Anvin
  2013-03-27 16:32         ` Stefano Stabellini
  0 siblings, 2 replies; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 16:24 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 3162 bytes --]

On 27.03.2013 17:09, H. Peter Anvin wrote:
> On 03/27/2013 09:04 AM, Konrad Rzeszutek Wilk wrote:
>>> Seems that I was relying on the wrong source of information when checking SMEP
>>> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So that
>>> at least explains where that comes from... sorry for that.
>>
>> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuive flag)
>> that would work fine?
>>
>> CC-ing the Intel folks who added this in.
>>
> 
> If it is present in /proc/cpuinfo and not in cpuid it means the kernel
> thinks it has SMEP but the CPU doesn't... an obvious case of fail.
> However, *where the hell* does the bit come from in the first place?

I did not yet have time to track down all sources but I thought that
/proc/cpuinfo is in some way assembled from whatever cpuid info the kernel has.
I am more suspicious of the cpuid command I was using. Let me check for x86info.

> 
> That is what we need to track down.
> 
> When you say Xen HVM, am I correct in assuming that neither CPUID nor
> CR4 operations in the main kernel are run through paravirt_ops?

Not paravirt ops likely but the hypervisor traps access. At least cpuid from
within a hvm guest I expect to be filtered. So when checking things I went to
bare-metal.

Will fetch more info and get back.

-Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:24       ` Stefan Bader
@ 2013-03-27 16:32         ` H. Peter Anvin
  2013-03-27 16:32         ` Stefano Stabellini
  1 sibling, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 16:32 UTC (permalink / raw)
  To: Stefan Bader
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]

On 03/27/2013 09:24 AM, Stefan Bader wrote:
>> 
>> When you say Xen HVM, am I correct in assuming that neither CPUID
>> nor CR4 operations in the main kernel are run through
>> paravirt_ops?
> 
> Not paravirt ops likely but the hypervisor traps access. At least
> cpuid from within a hvm guest I expect to be filtered. So when
> checking things I went to bare-metal.
> 
> Will fetch more info and get back.
> 

Hypervisor traps is one thing... they should be consistent no matter
where in the code they happen... unless they are broken.

Try this CPUID program.  This uses the kernel /dev interface which may
be somewhat suboptimal in case CPUID in userspace actually differs,
but it would be interesting to know what it outputs.

	-hpa



[-- Attachment #2: cpuid.c --]
[-- Type: text/x-csrc, Size: 3245 bytes --]

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

struct cpuid {
  uint32_t eax, ebx, ecx, edx;
};

static int cpuid(int cpu, uint32_t leaf, uint32_t subleaf, struct cpuid *data)
{
  static int fd = -1;
  static int last_cpu;
  off_t offset = leaf + ((off_t)subleaf << 32);

  if (fd < 0 || last_cpu != cpu) {
    char devstr[64];
    if (fd >= 0)
      close(fd);
    snprintf(devstr, sizeof devstr, "/dev/cpu/%d/cpuid", cpu);
    fd = open(devstr, O_RDONLY);
    last_cpu = cpu;
  }
  return pread(fd, data, sizeof(*data), offset) == sizeof(*data) ? 0 : -1;
}

static char *make_string(uint32_t val)
{
  static char string[5] = "xxxx";
  int i, ch;

  for ( i = 0 ; i < 4 ; i++ ) {
    ch = val & 0xff;
    string[i] = isprint(ch) ? ch : '.';
    val >>= 8;
  }

  return string;
}

static void print_cpuid_level(uint32_t leaf, uint32_t subleaf,
			      struct cpuid *lvl)
{
  printf("%08x %08x:  ", leaf, subleaf);
  printf("%08x %s  ", lvl->eax, make_string(lvl->eax));
  printf("%08x %s  ", lvl->ebx, make_string(lvl->ebx));
  printf("%08x %s  ", lvl->ecx, make_string(lvl->ecx));
  printf("%08x %s\n", lvl->edx, make_string(lvl->edx));
}

static void dump_cpuid_leaf(int cpu, uint32_t leaf)
{
  struct cpuid lvl, lastlvl, lvl0;
  uint32_t subleaf;

  /* Bail out if subleaf 0 cannot be read; lvl0 would be uninitialized */
  if (cpuid(cpu, leaf, 0, &lvl0))
    return;
  print_cpuid_level(leaf, 0, &lvl0);

  /*
   * There is no standard mechanism for enumerating the number of
   * subleaves, this is a heuristic...
   */
  lastlvl = lvl0;

  for (subleaf = 1; subleaf != 0; subleaf++) {
    if (cpuid(cpu, leaf, subleaf, &lvl))
      return;

    switch (leaf) {
    case 4:
      if ((lvl.eax & 0x1f) == 0 || !memcmp(&lvl, &lastlvl, sizeof lvl))
	return;
      break;

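    case 7:
      /* EAX of subleaf 0 reports the highest valid subleaf of leaf 7 */
      if (subleaf > lvl0.eax)
	return;
      break;

    case 0xb:
      /* Stop once the level type field (ECX bits 15:8) and above read zero */
      if ((lvl.ecx & ~0xff) == 0)
	return;
      break;

    case 0xd:
      if ((lvl.eax | lvl.ebx | lvl.ecx | lvl.edx) == 0)
	return;
      break;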
    default:
      /* Generic, anticipatory rules */
      /* Exclude ecx here for levels which return the initial ecx value */
      if ((lvl.eax | lvl.ebx | lvl.ecx | lvl.edx) == 0)
	return;
    
      if (!memcmp(&lvl, &lvl0, sizeof lvl))
	return;
      break;
    }
    
    print_cpuid_level(leaf, subleaf, &lvl);

    lastlvl = lvl;
  }
}

static void dump_levels(int cpu, uint32_t region)
{
  static struct cpuid invalid_leaf;
  struct cpuid max;
  uint32_t n;

  if (cpuid(cpu, region, 0, &max))
    return;

  /*
   * Intel processors may return the last group 0 CPUID leaf instead
   * all zero for a not-present level
   */
  if (region == 0) {
    cpuid(cpu, max.eax+1, 0, &invalid_leaf);
  } else {
    if (!memcmp(&max, &invalid_leaf, sizeof(struct cpuid)))
      return;
  }

  if ( (max.eax & 0xffff0000) == region ) {
    for ( n = region ; n <= max.eax ; n++ ) {
      dump_cpuid_leaf(cpu, n);
    }
  }
}

int main(int argc, char *argv[])
{
  int cpu;
  uint32_t n;

  cpu = (argc > 1) ? atoi(argv[1]) : 0;
  
  printf("Leaf     Subleaf    EAX            EBX            ECX            EDX            \n");

  for ( n = 0 ; n <= 0xffff ; n++ ) {
    dump_levels(cpu, n << 16);
  }

  return 0;
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:24       ` Stefan Bader
  2013-03-27 16:32         ` H. Peter Anvin
@ 2013-03-27 16:32         ` Stefano Stabellini
  1 sibling, 0 replies; 30+ messages in thread
From: Stefano Stabellini @ 2013-03-27 16:32 UTC (permalink / raw)
  To: Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	xin.li, H. Peter Anvin

On Wed, 27 Mar 2013, Stefan Bader wrote:
> On 27.03.2013 17:09, H. Peter Anvin wrote:
> > When you say Xen HVM, am I correct in assuming that neither CPUID nor
> > CR4 operations in the main kernel are run through paravirt_ops?
> 
> Not paravirt ops likely but the hypervisor traps access. At least cpuid from
> within a hvm guest I expect to be filtered. So when checking things I went to
> bare-metal.

That's right.
Both cr4 and cpuid are trapped, so from the Linux POV they look like
native ops, no paravirt_ops involved.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:04   ` Konrad Rzeszutek Wilk
  2013-03-27 16:09     ` H. Peter Anvin
@ 2013-03-27 16:45     ` Stefan Bader
  2013-03-27 16:52       ` H. Peter Anvin
                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 16:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, H. Peter Anvin


[-- Attachment #1.1: Type: text/plain, Size: 3632 bytes --]

On 27.03.2013 17:04, Konrad Rzeszutek Wilk wrote:
>> Seems that I was relying on the wrong source of information when checking SMEP
>> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So that
>> at least explains where that comes from... sorry for that.
> 
> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuive flag)
> that would work fine?

Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
since I just quickly did this without checking whether Xen 4.2.1 understands the
flag already.

Second, using x86info --all on bare metal does show bits set for cpuid[7] and
/proc/cpuinfo values are consistent across BP and APs. So I am a tool for using
the wrong tool there.

So I would say the main issue to look at is why reading cr4 as a HVM guest
produces the flags on boot. Surely the hypervisor itself has set certain things
up but likely there are some expectations about the initial setup on boot.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:45     ` Stefan Bader
@ 2013-03-27 16:52       ` H. Peter Anvin
  2013-03-27 17:17         ` Stefan Bader
  2013-03-27 17:28       ` Stefan Bader
  2013-03-27 20:24       ` Keir Fraser
  2 siblings, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 16:52 UTC (permalink / raw)
  To: Stefan Bader
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk

On 03/27/2013 09:45 AM, Stefan Bader wrote:
> 
> Rebooting with smep=1 as a hv argument does not fix it. But I
> would be careful since I just quickly did this without checking
> whether Xen 4.2.1 undestands the flag already.
> 
> Second using x86info --all on bare metal does show bits set for 
> cpuid[7] and /proc/cpuinfo values are consistent across BP and
> APs. So I am a tool for using the wrong tool there.
> 
> So I would say the main issue to look at is why reading cr4 as a 
> HVM guest produces the flags on boot. Surely the hypervisor itself 
> has set certain things up but likely there are some epxectations 
> about the initial setup on boot.
> 

What does x86info and /proc/cpuinfo show in HVM?

The inbound %cr4 shouldn't matter at all, we try to not rely on it.

If the hypervisor presents SMEP to the guest then the guest is pretty
obviously going to try to use it.

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:52       ` H. Peter Anvin
@ 2013-03-27 17:17         ` Stefan Bader
  2013-03-27 17:23           ` H. Peter Anvin
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 17:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 1532 bytes --]

On 27.03.2013 17:52, H. Peter Anvin wrote:
> On 03/27/2013 09:45 AM, Stefan Bader wrote:
>>
>> Rebooting with smep=1 as a hv argument does not fix it. But I
>> would be careful since I just quickly did this without checking
>> whether Xen 4.2.1 undestands the flag already.
>>
>> Second using x86info --all on bare metal does show bits set for 
>> cpuid[7] and /proc/cpuinfo values are consistent across BP and
>> APs. So I am a tool for using the wrong tool there.
>>
>> So I would say the main issue to look at is why reading cr4 as a 
>> HVM guest produces the flags on boot. Surely the hypervisor itself 
>> has set certain things up but likely there are some epxectations 
>> about the initial setup on boot.
>>
> 
> What does x86info and /proc/cpuinfo show in HVM?

x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep set.

> 
> The inbound %cr4 shouldn't matter at all, we try to not rely on it.
> 
> If the hypervisor presents SMEP to the guest then the guest is pretty
> obviously going to try to use it.

To me it looks like when bootstrapping the APs things are not yet ready to use
it. If I did not miss something, the only place that the saved contents of cr4
are used is in startup_32 when the cpus are brought up. And then they just stop
dead. I would need to read more code, but it is a bit weird that the BP is not
affected.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 17:17         ` Stefan Bader
@ 2013-03-27 17:23           ` H. Peter Anvin
  2013-03-27 17:38             ` Stefan Bader
  2013-03-28 13:34             ` Jan Beulich
  0 siblings, 2 replies; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 17:23 UTC (permalink / raw)
  To: Stefan Bader
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk

On 03/27/2013 10:17 AM, Stefan Bader wrote:
>> What does x86info and /proc/cpuinfo show in HVM?
> 
> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
> set.

On all CPUs?

>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>> it.
>> 
>> If the hypervisor presents SMEP to the guest then the guest is
>> pretty obviously going to try to use it.
> 
> To me it looks like when bootstrapping the APs things are not yet
> ready to use it. If I did not miss something, the only place that
> the saved contents of cr4 are used is in startup_32 when the cpus
> are brought up. And then just stop dead. Would need to read more
> code but a bit weird why the BP is not affected.

This feels like a bug in Xen, but I don't know for sure yet.  Either
which way, it is odd.  That write to cr4 should be entirely legitimate.

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:45     ` Stefan Bader
  2013-03-27 16:52       ` H. Peter Anvin
@ 2013-03-27 17:28       ` Stefan Bader
  2013-03-27 17:30         ` H. Peter Anvin
  2013-03-27 20:24       ` Keir Fraser
  2 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 17:28 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, H. Peter Anvin


[-- Attachment #1.1: Type: text/plain, Size: 4254 bytes --]

On 27.03.2013 17:45, Stefan Bader wrote:
> On 27.03.2013 17:04, Konrad Rzeszutek Wilk wrote:
>> On Wed, Mar 27, 2013 at 04:53:16PM +0100, Stefan Bader wrote:
>>> On 27.03.2013 16:26, Stefan Bader wrote:
>>>> Recently I ran some experiments on newer hardware and realized that when booting
>>>> any kernel newer or equal to v3.5 (Xen version 4.2.1) in 64bit mode would fail
>>>> to bring up any APs (message about CPU Stuck). I was able to normally bisect
>>>> into a range of realmode changes and then manually drill down to the following
>>>> commit:
>>>>
>>>> commit cda846f101fb1396b6924f1d9b68ac3d42de5403
>>>> Author: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
>>>> Date:   Tue May 8 21:22:46 2012 +0300
>>>>
>>>>     x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline
>>>>
>>>>     This patch changes 64-bit trampoline so that CR4 and
>>>>     EFER are provided by the kernel instead of using fixed
>>>>     values.
>>>>
>>>> From the Xen debugging console it was possible to gather a bit more data which
>>>> pointed to a failure very close to setting CR4 in startup_32. On this particular
>>>> hardware the saved CR4 (about to be set) was 0x1407f0.
>>>>
>>>> This would set two flags that somehow feel dangerous: PGE (page global enable)
>>>> and SMEP (supervisor mode execution protection). SMEP turns out to be the main
>>>> offender and the following change allows the APs to start:
>>>>
>>>> --- a/arch/x86/realmode/rm/trampoline_64.S
>>>> +++ b/arch/x86/realmode/rm/trampoline_64.S
>>>> @@ -93,7 +93,9 @@ ENTRY(startup_32)
>>>>         movl    %edx, %fs
>>>>         movl    %edx, %gs
>>>>
>>>> -       movl    pa_tr_cr4, %eax
>>>> +       movl    $X86_CR4_SMEP, %eax
>>>> +       notl    %eax
>>>> +       andl    pa_tr_cr4, %eax
>>>>         movl    %eax, %cr4              # Enable PAE mode
>>>>
>>>>         # Setup trampoline 4 level pagetables
>>>>
>>>> Now I am not completely convinced that this is really the way to go. Likely the
>>>> Xen hypervisor should not start up the guest with CR4 on the BP containing those
>>>> flags. But maybe it still makes sense to mask some dangerous ones off in the
>>>> realmode code (btw, it seemed that masking the assignments in arch_setup or
>>>> setup_realmode did not work).
>>>>
>>>> And finally I am wondering why the SMEP flag in CR4 is set anyway. My
>>>> understanding would be that this should only be done if cpuid[7].ebx has bit7
>>>> set. And this does not seem to be the case at least on the one box I was doing
>>>> the bisection on.
>>>
>>> Seems that I was relying on the wrong source of information when checking SMEP
>>> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So that
>>> at least explains where that comes from... sorry for that.
>>
>> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of counterintuive flag)
>> that would work fine?
> 
> Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
> since I just quickly did this without checking whether Xen 4.2.1 undestands the
> flag already.

I will need more time to look into this (probably not today), but it feels like
at least the cpuid flags passed on to the HVM guest may not be influenced by the
smep boot argument. It is probably rather something I could do by masking in the
config of the guest (which could be another pain, as I normally configure those
via libvirt).

> 
> Second using x86info --all on bare metal does show bits set for cpuid[7] and
> /proc/cpuinfo values are consistent across BP and APs. So I am a tool for using
> the wrong tool there.
> 
> So I would say the main issue to look at is why reading cr4 as a HVM guest
> produces the flags on boot. Surely the hypervisor itself has set certain things
> up but likely there are some epxectations about the initial setup on boot.
> 
>>
>> CC-ing the Intel folks who added this in.
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 17:28       ` Stefan Bader
@ 2013-03-27 17:30         ` H. Peter Anvin
  2013-03-27 17:40           ` Stefan Bader
  0 siblings, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 17:30 UTC (permalink / raw)
  To: Stefan Bader
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk

On 03/27/2013 10:28 AM, Stefan Bader wrote:
> 
> I will need more time to look into this (and unlikely today) but it
> feels like at least the cpuid flags passed on to HVM guest may be
> not influenced by the smep boot argument. Probably rather something
> I could do by masking in the config of the guest (which could be
> another pain as I normally configure those via libvirt).
> 

There is a "nosmep" kernel command line option.

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 17:23           ` H. Peter Anvin
@ 2013-03-27 17:38             ` Stefan Bader
  2013-03-28 13:34             ` Jan Beulich
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 17:38 UTC (permalink / raw)
  To: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1317 bytes --]

On 27.03.2013 18:23, H. Peter Anvin wrote:
> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>> What does x86info and /proc/cpuinfo show in HVM?
>>
>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>> set.
> 
> On all CPUs?

x86info thinks it is one core with HT, so there is only one cpuid line for
that.
> 
>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>> it.
>>>
>>> If the hypervisor presents SMEP to the guest then the guest is
>>> pretty obviously going to try to use it.
>>
>> To me it looks like when bootstrapping the APs things are not yet
>> ready to use it. If I did not miss something, the only place that
>> the saved contents of cr4 are used is in startup_32 when the cpus
>> are brought up. And then just stop dead. Would need to read more
>> code but a bit weird why the BP is not affected.
> 
> This feels like a bug in Xen, but I don't know for sure yet.  Either
> which way, it is odd.  That write to cr4 should be entirely legitimate.

It could well be. Unfortunately it is one that a change in the kernel triggers.
Not exactly your problem, but a PITA nonetheless.

-Stefan
> 
> 	-hpa
> 
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 17:30         ` H. Peter Anvin
@ 2013-03-27 17:40           ` Stefan Bader
  2013-03-27 17:44             ` H. Peter Anvin
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-27 17:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 565 bytes --]

On 27.03.2013 18:30, H. Peter Anvin wrote:
> On 03/27/2013 10:28 AM, Stefan Bader wrote:
>>
>> I will need more time to look into this (and unlikely today) but it
>> feels like at least the cpuid flags passed on to HVM guest may be
>> not influenced by the smep boot argument. Probably rather something
>> I could do by masking in the config of the guest (which could be
>> another pain as I normally configure those via libvirt).
>>
> 
> There is an "nosmep" kernel command line option.

Ignoring SMEP on the guest side (booting with nosmep) does help.

> 
> 	-hpa
> 
> 




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 17:40           ` Stefan Bader
@ 2013-03-27 17:44             ` H. Peter Anvin
  0 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-27 17:44 UTC (permalink / raw)
  To: Stefan Bader
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk

On 03/27/2013 10:40 AM, Stefan Bader wrote:
> On 27.03.2013 18:30, H. Peter Anvin wrote:
>> On 03/27/2013 10:28 AM, Stefan Bader wrote:
>>> 
>>> I will need more time to look into this (and unlikely today)
>>> but it feels like at least the cpuid flags passed on to HVM
>>> guest may be not influenced by the smep boot argument. Probably
>>> rather something I could do by masking in the config of the
>>> guest (which could be another pain as I normally configure
>>> those via libvirt).
>>> 
>> 
>> There is an "nosmep" kernel command line option.
> 
> Ignoring it on that side does help.
> 

As one would expect.  Are CPUID and /proc/cpuinfo still consistent
across all CPUs inside the HVM?

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 16:45     ` Stefan Bader
  2013-03-27 16:52       ` H. Peter Anvin
  2013-03-27 17:28       ` Stefan Bader
@ 2013-03-27 20:24       ` Keir Fraser
  2013-03-28 15:06         ` Stefan Bader
  2 siblings, 1 reply; 30+ messages in thread
From: Keir Fraser @ 2013-03-27 20:24 UTC (permalink / raw)
  To: Stefan Bader, Konrad Rzeszutek Wilk
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, H. Peter Anvin

On 27/03/2013 16:45, "Stefan Bader" <stefan.bader@canonical.com> wrote:

>>> Seems that I was relying on the wrong source of information when checking
>>> SMEP
>>> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So
>>> that
>>> at least explains where that comes from... sorry for that.
>> 
>> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of
>> counterintuive flag)
>> that would work fine?
> 
> Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
> since I just quickly did this without checking whether Xen 4.2.1 undestands
> the
> flag already.

Yes, the flag is understood by all Xen 4.2 releases. However it is not
inverted as you believe: it really is smep=0 or smep=off or even no-smep to
disable SMEP. smep=1 will enable SMEP (which is the default anyway).

I also checked how CPUID.SMEP gets set for an HVM guest, and it is very
obviously masked off if SMEP support has been disabled or is unavailable. So
I do not think we can be erroneously passing the CPUID flag to the guest.
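
For reference, the CPUID bit in question is bit 7 of EBX in leaf 7 (sub-leaf 0).
The following is only a minimal sketch and is not taken from the Xen or Linux
sources; the helper name ebx_has_smep() and the macro are made up for
illustration. It checks a raw EBX value such as the 0xbbb that x86info reported
inside the guest:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 7 advertises SMEP. */
#define CPUID7_EBX_SMEP (UINT32_C(1) << 7)

/* Hypothetical helper: true if a raw leaf-7 EBX value advertises SMEP. */
bool ebx_has_smep(uint32_t ebx)
{
    return (ebx & CPUID7_EBX_SMEP) != 0;
}
```

With the value from this thread, ebx_has_smep(0xbbb) is true, which matches
/proc/cpuinfo showing smep in the guest.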

 -- Keir

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 17:23           ` H. Peter Anvin
  2013-03-27 17:38             ` Stefan Bader
@ 2013-03-28 13:34             ` Jan Beulich
  2013-03-28 15:02               ` Stefan Bader
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2013-03-28 13:34 UTC (permalink / raw)
  To: Stefan Bader, H. Peter Anvin
  Cc: wei.y.yang, xen-devel, haitao.shan, xin.li, Konrad Rzeszutek Wilk

>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>> What does x86info and /proc/cpuinfo show in HVM?
>> 
>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>> set.
> 
> On all CPUs?
> 
>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>> it.
>>> 
>>> If the hypervisor presents SMEP to the guest then the guest is
>>> pretty obviously going to try to use it.
>> 
>> To me it looks like when bootstrapping the APs things are not yet
>> ready to use it. If I did not miss something, the only place that
>> the saved contents of cr4 are used is in startup_32 when the cpus
>> are brought up. And then just stop dead. Would need to read more
>> code but a bit weird why the BP is not affected.
> 
> This feels like a bug in Xen, but I don't know for sure yet.  Either
> which way, it is odd.  That write to cr4 should be entirely legitimate.

And I would guess one that got fixed already.

Stefan, please try 4.2.2-rc1, or (separately)
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
(which I think requires the immediately preceding
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
too).

Jan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-28 13:34             ` Jan Beulich
@ 2013-03-28 15:02               ` Stefan Bader
  2013-03-28 16:39                 ` Stefan Bader
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-28 15:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	xin.li, H. Peter Anvin


[-- Attachment #1.1: Type: text/plain, Size: 2443 bytes --]

On 28.03.2013 14:34, Jan Beulich wrote:
>>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
>> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>>> What does x86info and /proc/cpuinfo show in HVM?
>>>
>>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>>> set.
>>
>> On all CPUs?
>>
>>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>>> it.
>>>>
>>>> If the hypervisor presents SMEP to the guest then the guest is
>>>> pretty obviously going to try to use it.
>>>
>>> To me it looks like when bootstrapping the APs things are not yet
>>> ready to use it. If I did not miss something, the only place that
>>> the saved contents of cr4 are used is in startup_32 when the cpus
>>> are brought up. And then just stop dead. Would need to read more
>>> code but a bit weird why the BP is not affected.
>>
>> This feels like a bug in Xen, but I don't know for sure yet.  Either
>> which way, it is odd.  That write to cr4 should be entirely legitimate.
> 
> And I would guess one that got fixed already.
> 
> Stefan, please try 4.2.2-rc1, or (separately)
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
> (which I think requires the immediately preceding
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
> too).

The explanation behind those patches makes a lot of sense as a description of
what is going wrong. Unfortunately the two patches above on their own do not fix
the problem (I will make another attempt with 4.2.2-rc1).

For a bit more info I am running a kernel inside the HVM guest which shows the
contents of the cr4 shadow used in the trampoline. Out of interest I compared
those values to the ones used on a bare metal boot and both are identical
(0x1407F0).
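
To make that value easier to read, here is a small sketch that names the bits
set in a CR4 value. It is an illustration only: cr4_decode() and the table are
hypothetical, and the table lists just a subset of the architectural CR4 bits
(positions as documented in the Intel SDM).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct cr4_bit { unsigned int shift; const char *name; };

/* Subset of the architectural CR4 bits, in ascending bit order. */
static const struct cr4_bit cr4_bits[] = {
    { 0, "VME" }, { 1, "PVI" }, { 2, "TSD" }, { 3, "DE" },
    { 4, "PSE" }, { 5, "PAE" }, { 6, "MCE" }, { 7, "PGE" },
    { 8, "PCE" }, { 9, "OSFXSR" }, { 10, "OSXMMEXCPT" },
    { 13, "VMXE" }, { 14, "SMXE" }, { 16, "FSGSBASE" },
    { 17, "PCIDE" }, { 18, "OSXSAVE" }, { 20, "SMEP" },
};

/* Write the names of all set bits into buf, space separated; returns buf. */
char *cr4_decode(uint32_t cr4, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(cr4_bits) / sizeof(cr4_bits[0]); i++) {
        if (!(cr4 & (UINT32_C(1) << cr4_bits[i].shift)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", cr4_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated, stop here */
        used += (size_t)n;
    }
    return buf;
}
```

For 0x1407F0 this yields PSE, PAE, MCE, PGE, PCE, OSFXSR, OSXMMEXCPT, OSXSAVE
and SMEP, so both PAE and SMEP are part of the value the trampoline loads.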

That gives some explanation for why the patch above fails. Looking at the code
for cr4 updates in vmx_update_guest_cr(), a few lines above the new SMEP
handling there already was code which would clear the PAE flag when
paging_mode_hap(v->domain) was true. That condition would also need to be true
for the SMEP flag to get cleared. And the PAE flag was (and has to be) set
before.

I will be looking into this further.

-Stefan
> 
> Jan
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-27 20:24       ` Keir Fraser
@ 2013-03-28 15:06         ` Stefan Bader
  2013-03-28 15:42           ` H. Peter Anvin
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-28 15:06 UTC (permalink / raw)
  To: Keir Fraser
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	xin.li, H. Peter Anvin


[-- Attachment #1.1: Type: text/plain, Size: 1628 bytes --]

On 27.03.2013 21:24, Keir Fraser wrote:
> On 27/03/2013 16:45, "Stefan Bader" <stefan.bader@canonical.com> wrote:
> 
>>>> Seems that I was relying on the wrong source of information when checking
>>>> SMEP
>>>> support. The cpuid command seems at fail. But /proc/cpuinfo reports it. So
>>>> that
>>>> at least explains where that comes from... sorry for that.
>>>
>>> OK, so if you boot Xen with smep=1 (which disables SMEP, kind of
>>> counterintuive flag)
>>> that would work fine?
>>
>> Rebooting with smep=1 as a hv argument does not fix it. But I would be careful
>> since I just quickly did this without checking whether Xen 4.2.1 undestands
>> the
>> flag already.
> 
> Yes, the flag is understood by all Xen 4.2 releases. However it is not
> inverted as you believe: it really is smep=0 or smep=off or even no-smep to
> disable SMEP. smep=1 will enable SMEP (which is the default anyway).
> 
> I also checked how CPUID.SMEP gets set for an HVM guest, and it is very
> obviously masked off if SMEP support has been disabled or is unavailable. So
> I do not think we can be erroneously passing the CPUID flag to the guest.

No, you are completely right. The inverted boolean got me for good. So to
summarize:

- smep=0 as hypervisor argument avoids the problem for all guests
- nosmep as HVM guest argument avoids the problem for that guest
- /proc/cpuinfo correctly reflects whether smep has been masked off or not
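
The last point can also be checked programmatically. This is a minimal sketch
under the assumption that the caller has already read a "flags" line from
/proc/cpuinfo; cpuinfo_flags_has() is a hypothetical helper, not an existing
API. It matches whole words only, so a longer flag name that merely contains
"smep" does not count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if `token` appears as a whole word in a
 * "flags : ..." line as printed by /proc/cpuinfo.
 */
bool cpuinfo_flags_has(const char *line, const char *token)
{
    size_t tlen = strlen(token);
    const char *p = line;

    if (tlen == 0)
        return false;
    while ((p = strstr(p, token)) != NULL) {
        /* A match only counts if it is delimited by whitespace or line ends. */
        bool start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[tlen];
        bool end_ok = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (start_ok && end_ok)
            return true;
        p++;
    }
    return false;
}
```

Called as cpuinfo_flags_has(line, "smep"), it should return false in a guest
booted with nosmep or on a host booted with smep=0.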

> 
>  -- Keir
> 
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-28 15:06         ` Stefan Bader
@ 2013-03-28 15:42           ` H. Peter Anvin
  2013-03-28 16:12             ` Stefan Bader
  0 siblings, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2013-03-28 15:42 UTC (permalink / raw)
  To: Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	Keir Fraser, xin.li

On 03/28/2013 08:06 AM, Stefan Bader wrote:
> 
> No you are completely right. The inverse boolean got me for good.
> So to summarize:
> 
> - smep=0 as hypervisor argument avoids the problem for all guests -
> nosmep as hvm guest arguement avoids the problem for that guest -
> /proc/cpuinfo correctly reflects whether smep has been masked off
> or not
> 

Please try the patch Jan pointed to.

	-hpa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-28 15:42           ` H. Peter Anvin
@ 2013-03-28 16:12             ` Stefan Bader
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Bader @ 2013-03-28 16:12 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	Keir Fraser, xin.li


[-- Attachment #1.1: Type: text/plain, Size: 656 bytes --]

On 28.03.2013 16:42, H. Peter Anvin wrote:
> On 03/28/2013 08:06 AM, Stefan Bader wrote:
>>
>> No you are completely right. The inverse boolean got me for good.
>> So to summarize:
>>
>> - smep=0 as hypervisor argument avoids the problem for all guests -
>> nosmep as hvm guest arguement avoids the problem for that guest -
>> /proc/cpuinfo correctly reflects whether smep has been masked off
>> or not
>>
> 
> Please try to patch Jan pointed to.

I did, but it did not work. I elaborate on it a bit more in the reply I wrote to
his mail. In short, I think the code that would clear smep is not reached.

-Stefan
> 
> 	-hpa
> 
> 




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-28 15:02               ` Stefan Bader
@ 2013-03-28 16:39                 ` Stefan Bader
  2013-04-03 11:56                   ` Stefan Bader
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-03-28 16:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	xin.li, H. Peter Anvin


[-- Attachment #1.1: Type: text/plain, Size: 2544 bytes --]

On 28.03.2013 16:02, Stefan Bader wrote:
> On 28.03.2013 14:34, Jan Beulich wrote:
>>>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>>>> What does x86info and /proc/cpuinfo show in HVM?
>>>>
>>>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>>>> set.
>>>
>>> On all CPUs?
>>>
>>>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>>>> it.
>>>>>
>>>>> If the hypervisor presents SMEP to the guest then the guest is
>>>>> pretty obviously going to try to use it.
>>>>
>>>> To me it looks like when bootstrapping the APs things are not yet
>>>> ready to use it. If I did not miss something, the only place that
>>>> the saved contents of cr4 are used is in startup_32 when the cpus
>>>> are brought up. And then just stop dead. Would need to read more
>>>> code but a bit weird why the BP is not affected.
>>>
>>> This feels like a bug in Xen, but I don't know for sure yet.  Either
>>> which way, it is odd.  That write to cr4 should be entirely legitimate.
>>
>> And I would guess one that got fixed already.
>>
>> Stefan, please try 4.2.2-rc1, or (separately)
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
>> (which I think requires the immediately preceding
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
>> too).
> 
> The backing explanation does make a lot of sense in reasoning what is going
> wrong. Unfortunately the two patches above on their own do not fix the problem
> (I will try to make another go with 4.2.2-rc1).

The whole of 4.2.2-rc1 has the same outcome (smep still present in
trampoline_cr4_features).
> 
> For a bit more info I am running a kernel inside the HVM guest which shows the
> contents of the cr4 shadow used in the trampoline. Out of interest I compared
> those values to the ones used on a bare metal boot and both are identical
> (0x1407F0).
> 
> That somehow gives some explanation for the patch above failing. Looking at the
> code for cr4 updates in vmx_update_guest_cr() a few lines above the new SMEP
> handling, there already was code which would clear the PAE flag when
> paging_mode_hap(v->domain) was true. And that would need to be true if the SMEP
> flag should get cleared. And the PAE flag was (and has to be) set before.
> 

> Will be looking into this further.
Going back to gather more info and to find some fix.

-Stefan





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Xen HVM regression on certain Intel CPUs
  2013-03-28 16:39                 ` Stefan Bader
@ 2013-04-03 11:56                   ` Stefan Bader
  2013-04-03 12:43                     ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Bader @ 2013-04-03 11:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Konrad Rzeszutek Wilk, wei.y.yang, haitao.shan,
	xin.li, H. Peter Anvin


[-- Attachment #1.1.1: Type: text/plain, Size: 5696 bytes --]

On 28.03.2013 17:39, Stefan Bader wrote:
> On 28.03.2013 16:02, Stefan Bader wrote:
>> On 28.03.2013 14:34, Jan Beulich wrote:
>>>>>> On 27.03.13 at 18:23, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>>> On 03/27/2013 10:17 AM, Stefan Bader wrote:
>>>>>> What does x86info and /proc/cpuinfo show in HVM?
>>>>>
>>>>> x86info cpuid[7].ebx = 0xbbb and /proc/cpuinfo also shows smep
>>>>> set.
>>>>
>>>> On all CPUs?
>>>>
>>>>>> The inbound %cr4 shouldn't matter at all, we try to not rely on
>>>>>> it.
>>>>>>
>>>>>> If the hypervisor presents SMEP to the guest then the guest is
>>>>>> pretty obviously going to try to use it.
>>>>>
>>>>> To me it looks like when bootstrapping the APs things are not yet
>>>>> ready to use it. If I did not miss something, the only place that
>>>>> the saved contents of cr4 are used is in startup_32 when the cpus
>>>>> are brought up. And then just stop dead. Would need to read more
>>>>> code but a bit weird why the BP is not affected.
>>>>
>>>> This feels like a bug in Xen, but I don't know for sure yet.  Either
>>>> which way, it is odd.  That write to cr4 should be entirely legitimate.
>>>
>>> And I would guess one that got fixed already.
>>>
>>> Stefan, please try 4.2.2-rc1, or (separately)
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=485f374230d39e153d7b9786e3d0336bd52ee661
>>> (which I think requires the immediately preceding
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=1e6275a95d3e35a72939b588f422bb761ba82f6b
>>> too).
>>
>> The backing explanation does make a lot of sense in reasoning what is going
>> wrong. Unfortunately the two patches above on their own do not fix the problem
>> (I will try to make another go with 4.2.2-rc1).
> 
> The whole of 4.2.2-rc1 has the same (smep still present in
> trampoline_cr4_features) outcome.
>>
>> For a bit more info I am running a kernel inside the HVM guest which shows the
>> contents of the cr4 shadow used in the trampoline. Out of interest I compared
>> those values to the ones used on a bare metal boot and both are identical
>> (0x1407F0).
>>
>> That somehow gives some explanation for the patch above failing. Looking at the
>> code for cr4 updates in vmx_update_guest_cr() a few lines above the new SMEP
>> handling, there already was code which would clear the PAE flag when
>> paging_mode_hap(v->domain) was true. And that would need to be true if the SMEP
>> flag should get cleared. And the PAE flag was (and has to be) set before.
>>
> 
>> Will be looking into this further.
> Going back to gather more info and to find some fix.
> 

I added some more debugging output to the hypervisor to verify the state of HAP.
This showed that while HAP is available on the system, it is not used for the
HVM guests. It looks like this would require some flags to be set when creating
the guest domains, and I assume this is not happening because I have to stay with
the xm stack for the libvirt setup for now (requires some repackaging which has
not been done yet).

So the guest is not using HAP, but some form of paging still seems to be in use
even when the guest VCPU itself is not using paging. I therefore changed
vmx_update_guest_cr() to clear SMEP whenever the guest VCPU is in non-paging
mode, regardless of HAP, and that seems to prevent the hangs. Does this look
like a reasonable upstream Xen change?

From eccbc4cf0916c6d4388f658965c79770bd0ba10f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Wed, 3 Apr 2013 12:06:24 +0200
Subject: [PATCH] VMX: Always disable SMEP when guest is in non-paging mode

commit e7dda8ec9fc9020e4f53345cdbb18a2e82e54a65
  VMX: disable SMEP feature when guest is in non-paging mode

disabled the SMEP bit if a guest VCPU was using HAP and was not
in paging mode. However I could observe VCPUs getting stuck in
the trampoline after the following patch in the Linux kernel
changed the way CR4 gets set up:
  x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline

The change will set CR4 from already set flags which includes the
SMEP bit. On bare metal this does not matter as the CPU is in non-
paging mode at that time. But Xen seems to use the emulated non-
paging mode regardless of HAP (I verified that on the guests I was
seeing the issue, HAP was not used).

Therefore it seems right to unset the SMEP bit for a VCPU that is
not in paging-mode, regardless of its HAP usage.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dbefb..a869ed4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1161,13 +1161,16 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         if ( paging_mode_hap(v->domain) && !hvm_paging_enabled(v) )
         {
             v->arch.hvm_vcpu.hw_cr[4] |= X86_CR4_PSE;
             v->arch.hvm_vcpu.hw_cr[4] &= ~X86_CR4_PAE;
+        }
+        if ( !hvm_paging_enabled(v) )
+        {
             /*
              * SMEP is disabled if CPU is in non-paging mode in hardware.
              * However Xen always uses paging mode to emulate guest non-paging
-             * mode with HAP. To emulate this behavior, SMEP needs to be
-             * manually disabled when guest switches to non-paging mode.
+             * mode. To emulate this behavior, SMEP needs to be manually
+             * disabled when guest VCPU is in non-paging mode.
              */
             v->arch.hvm_vcpu.hw_cr[4] &= ~X86_CR4_SMEP;
         }
         __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
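
To illustrate the effect of the change above, here is a stand-alone model of
just the lines the patch touches. It is a sketch, not Xen code: the functions
hw_cr4_before() and hw_cr4_after() and the CR4_* macros are hypothetical, and
the two booleans stand in for paging_mode_hap(v->domain) and
hvm_paging_enabled(v).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PSE  (UINT32_C(1) << 4)
#define CR4_PAE  (UINT32_C(1) << 5)
#define CR4_SMEP (UINT32_C(1) << 20)

/* Before the patch: SMEP is only cleared for HAP guests without paging. */
uint32_t hw_cr4_before(uint32_t cr4, bool hap, bool guest_paging)
{
    if (hap && !guest_paging) {
        cr4 |= CR4_PSE;
        cr4 &= ~CR4_PAE;
        cr4 &= ~CR4_SMEP;
    }
    return cr4;
}

/* After the patch: SMEP is cleared whenever guest paging is off. */
uint32_t hw_cr4_after(uint32_t cr4, bool hap, bool guest_paging)
{
    if (hap && !guest_paging) {
        cr4 |= CR4_PSE;
        cr4 &= ~CR4_PAE;
    }
    if (!guest_paging)
        cr4 &= ~CR4_SMEP;
    return cr4;
}
```

In this model, a non-HAP guest VCPU with paging disabled and a CR4 of 0x1407F0
keeps SMEP set before the patch and has it cleared afterwards, while the HAP
case and the paging-enabled case are unchanged.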


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: 0001-VMX-Always-disable-SMEP-when-guest-is-in-non-paging-.patch --]
[-- Type: text/x-diff; name="0001-VMX-Always-disable-SMEP-when-guest-is-in-non-paging-.patch", Size: 2379 bytes --]

From eccbc4cf0916c6d4388f658965c79770bd0ba10f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Wed, 3 Apr 2013 12:06:24 +0200
Subject: [PATCH] VMX: Always disable SMEP when guest is in non-paging mode

commit e7dda8ec9fc9020e4f53345cdbb18a2e82e54a65
  VMX: disable SMEP feature when guest is in non-paging mode

disabled the SMEP bit if a guest VCPU was using HAP and was not
in paging mode. However I could observe VCPUs getting stuck in
the trampoline after the following patch in the Linux kernel
changed the way CR4 gets set up:
  x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline

The change will set CR4 from already set flags which includes the
SMEP bit. On bare metal this does not matter as the CPU is in non-
paging mode at that time. But Xen seems to use the emulated non-
paging mode regardless of HAP (I verified that on the guests I was
seeing the issue, HAP was not used).

Therefore it seems right to unset the SMEP bit for a VCPU that is
not in paging mode, regardless of its HAP usage.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 04dbefb..a869ed4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1161,13 +1161,16 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
         if ( paging_mode_hap(v->domain) && !hvm_paging_enabled(v) )
         {
             v->arch.hvm_vcpu.hw_cr[4] |= X86_CR4_PSE;
             v->arch.hvm_vcpu.hw_cr[4] &= ~X86_CR4_PAE;
+        }
+        if ( !hvm_paging_enabled(v) )
+        {
             /*
              * SMEP is disabled if CPU is in non-paging mode in hardware.
              * However Xen always uses paging mode to emulate guest non-paging
-             * mode with HAP. To emulate this behavior, SMEP needs to be 
-             * manually disabled when guest switches to non-paging mode.
+             * mode. To emulate this behavior, SMEP needs to be manually
+             * disabled when guest VCPU is in non-paging mode.
              */
             v->arch.hvm_vcpu.hw_cr[4] &= ~X86_CR4_SMEP;
         }
         __vmwrite(GUEST_CR4, v->arch.hvm_vcpu.hw_cr[4]);
-- 
1.7.9.5
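
To make the effect of the change easier to follow, here is a small
stand-alone model of the CR4 handling after the patch. It is only a sketch:
model_hw_cr4() and model_hw_cr4_examples() are hypothetical names, not Xen
functions, and the model assumes the hardware CR4 starts out equal to the
value the guest asked for, which simplifies what vmx_update_guest_cr()
really does. The bit positions are the architectural ones, and 0x1407f0 is
the saved CR4 value reported at the start of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_PSE   (UINT32_C(1) << 4)   /* page size extensions */
#define X86_CR4_PAE   (UINT32_C(1) << 5)   /* physical address extension */
#define X86_CR4_SMEP  (UINT32_C(1) << 20)  /* supervisor mode execution protection */

/*
 * Hypothetical helper, not part of Xen: returns the CR4 value that would be
 * written to the VMCS for a guest that requested `cr4`.
 *   hap          - guest uses hardware assisted paging
 *   guest_paging - guest VCPU has paging enabled
 */
uint32_t model_hw_cr4(uint32_t cr4, bool hap, bool guest_paging)
{
    /* Unchanged by the patch: HAP specific fixups for non-paging mode. */
    if (hap && !guest_paging) {
        cr4 |= X86_CR4_PSE;
        cr4 &= ~X86_CR4_PAE;
    }

    /* New with the patch: clear SMEP in non-paging mode, HAP or not. */
    if (!guest_paging)
        cr4 &= ~X86_CR4_SMEP;

    return cr4;
}

/* Walks the three interesting cases for the CR4 value seen on the AP. */
void model_hw_cr4_examples(void)
{
    const uint32_t saved = UINT32_C(0x1407f0);

    /* Non-HAP guest, paging off: only SMEP is dropped (the reported case). */
    assert(model_hw_cr4(saved, false, false) == UINT32_C(0x0407f0));
    /* HAP guest, paging off: PAE and SMEP are dropped, PSE stays set. */
    assert(model_hw_cr4(saved, true, false) == UINT32_C(0x0407d0));
    /* Paging on: the value is passed through untouched. */
    assert(model_hw_cr4(saved, true, true) == saved);
}
```

Before the patch the SMEP line sat inside the first block, so the non-HAP
case kept SMEP set while the guest believed it was running unpaged.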


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: Xen HVM regression on certain Intel CPUs
  2013-04-03 11:56                   ` Stefan Bader
@ 2013-04-03 12:43                     ` Jan Beulich
  2013-04-03 14:28                       ` Keir Fraser
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2013-04-03 12:43 UTC (permalink / raw)
  To: Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, Eddie Dong, wei.y.yang,
	haitao.shan, Dongxiao Xu, xin.li, Jun Nakajima, H. Peter Anvin,
	xiantao.zhang

>>> On 03.04.13 at 13:56, Stefan Bader <stefan.bader@canonical.com> wrote:
> I added some more debugging output to the hypervisor to verify the state of HAP.
> This showed that while HAP is available on the system, it is not used for the
> HVM guests. It looks like this would require some flags to be set when creating
> the guest domains and I assume this is not happening because I have to stay with
> the xm stack for the libvirt setup for now (requires some repackaging which
> hasn't been done, yet).
> 
> So the guest isn't using HAP but does seem to use some form of paging even if
> the guest VCPU is not using paging. So I changed the vmx_update_guest_cr()
> function in that way and that seems to prevent the hangs. Does this look like a
> reasonable upstream Xen change?

Yes, it looks appropriate. But I'd like this to be confirmed by the
authors of the original change and/or the VMX maintainers (added
to Cc).

Nevertheless it's very odd to not use HAP on a machine capable
of it...

Jan

* Re: Xen HVM regression on certain Intel CPUs
  2013-04-03 12:43                     ` Jan Beulich
@ 2013-04-03 14:28                       ` Keir Fraser
  2013-04-03 15:00                         ` Xu, Dongxiao
  0 siblings, 1 reply; 30+ messages in thread
From: Keir Fraser @ 2013-04-03 14:28 UTC (permalink / raw)
  To: Jan Beulich, Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, Eddie Dong, wei.y.yang,
	haitao.shan, Dongxiao Xu, xin.li, Jun Nakajima, H. Peter Anvin,
	xiantao.zhang

On 03/04/2013 13:43, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>> On 03.04.13 at 13:56, Stefan Bader <stefan.bader@canonical.com> wrote:
>> I added some more debugging output to the hypervisor to verify the state of
>> HAP.
>> This showed that while HAP is available on the system, it is not used for the
>> HVM guests. It looks like this would require some flags to be set when
>> creating
>> the guest domains and I assume this is not happening because I have to stay
>> with
>> the xm stack for the libvirt setup for now (requires some repackaging which
>> hasn't been done, yet).
>> 
>> So the guest isn't using HAP but does seem to use some form of paging even if
>> the guest VCPU is not using paging. So I changed the vmx_update_guest_cr()
>> function in that way and that seems to prevent the hangs. Does this look like
>> a
>> reasonable upstream Xen change?
> 
> Yes, it looks appropriate. But I'd like this to be confirmed by the
> authors of the original change and/or the VMX maintainers (added
> to Cc).

It can have my ack straight away.

Acked-by: Keir Fraser <keir@xen.org>

Nonetheless it would be nice to get a VMX maintainer ack too, though I'm
pretty sure this patch is correct.

 -- Keir

> Nevertheless it's very odd to not use HAP on a machine capable
> of it...
> 
> Jan

* Re: Xen HVM regression on certain Intel CPUs
  2013-04-03 14:28                       ` Keir Fraser
@ 2013-04-03 15:00                         ` Xu, Dongxiao
  2013-04-03 15:48                           ` H. Peter Anvin
  0 siblings, 1 reply; 30+ messages in thread
From: Xu, Dongxiao @ 2013-04-03 15:00 UTC (permalink / raw)
  To: Keir Fraser, Jan Beulich, Stefan Bader
  Cc: xen-devel, Konrad Rzeszutek Wilk, Dong, Eddie, wei.y.yang, Shan,
	Haitao, xin.li, Nakajima, Jun, H. Peter Anvin, Zhang, Xiantao

> -----Original Message-----
> From: Keir Fraser [mailto:keir.xen@gmail.com]
> Sent: Wednesday, April 03, 2013 10:28 PM
> To: Jan Beulich; Stefan Bader
> Cc: xen-devel@lists.xensource.com; Konrad Rzeszutek Wilk; Dong, Eddie;
> wei.y.yang@intel.com; Shan, Haitao; Xu, Dongxiao; xin.li@intel.com; Nakajima,
> Jun; H. Peter Anvin; Zhang, Xiantao
> Subject: Re: [Xen-devel] Xen HVM regression on certain Intel CPUs
> 
> On 03/04/2013 13:43, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
> >>>> On 03.04.13 at 13:56, Stefan Bader <stefan.bader@canonical.com>
> wrote:
> >> I added some more debugging output to the hypervisor to verify the state of
> >> HAP.
> >> This showed that while HAP is available on the system, it is not used for the
> >> HVM guests. It looks like this would require some flags to be set when
> >> creating
> >> the guest domains and I assume this is not happening because I have to stay
> >> with
> >> the xm stack for the libvirt setup for now (requires some repackaging which
> >> hasn't been done, yet).
> >>
> >> So the guest isn't using HAP but does seem to use some form of paging even
> if
> >> the guest VCPU is not using paging. So I changed the
> vmx_update_guest_cr()
> >> function in that way and that seems to prevent the hangs. Does this look like
> >> a
> >> reasonable upstream Xen change?
> >
> > Yes, it looks appropriate. But I'd like this to be confirmed by the
> > authors of the original change and/or the VMX maintainers (added
> > to Cc).
> 
> It can have my ack straight away.
> 
> Acked-by: Keir Fraser <keir@xen.org>
> 
> Nonetheless it would be nice to get a VMX maintainer ack too, though I'm
> pretty sure this patch is correct.

Yes, it is a good fix. Thank you!
I didn't test non-HAP case when I made the patch to fix this SMEP issue. 

Acked-by: Dongxiao Xu <dongxiao.xu@intel.com>

Thanks,
Dongxiao

> 
>  -- Keir
> 
> > Nevertheless it's very odd to not use HAP on a machine capable
> > of it...
> >
> > Jan

* Re: Xen HVM regression on certain Intel CPUs
  2013-04-03 15:00                         ` Xu, Dongxiao
@ 2013-04-03 15:48                           ` H. Peter Anvin
  2013-04-03 16:05                             ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2013-04-03 15:48 UTC (permalink / raw)
  To: Xu, Dongxiao
  Cc: xen-devel, Nakajima, Jun, Konrad Rzeszutek Wilk, Shan, Haitao,
	wei.y.yang, Dong, Eddie, Stefan Bader, Keir Fraser, xin.li,
	Jan Beulich, Zhang, Xiantao

On 04/03/2013 08:00 AM, Xu, Dongxiao wrote:
> 
> Yes, it is a good fix. Thank you!
> I didn't test non-HAP case when I made the patch to fix this SMEP issue. 
> 

Now, won't SMAP have exactly the same issue (and so need to be added to
the same mask?)

	-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

* Re: Xen HVM regression on certain Intel CPUs
  2013-04-03 15:48                           ` H. Peter Anvin
@ 2013-04-03 16:05                             ` Jan Beulich
  0 siblings, 0 replies; 30+ messages in thread
From: Jan Beulich @ 2013-04-03 16:05 UTC (permalink / raw)
  To: Dongxiao Xu, H. Peter Anvin
  Cc: xen-devel, Konrad Rzeszutek Wilk, Eddie Dong, wei.y.yang,
	Haitao Shan, Stefan Bader, Keir Fraser, xin.li, Jun Nakajima,
	Xiantao Zhang

>>> On 03.04.13 at 17:48, "H. Peter Anvin" <hpa@zytor.com> wrote:
> On 04/03/2013 08:00 AM, Xu, Dongxiao wrote:
>> 
>> Yes, it is a good fix. Thank you!
>> I didn't test non-HAP case when I made the patch to fix this SMEP issue. 
>> 
> 
> Now, won't SMAP have exactly the same issue (and so need to be added to
> the same mask?)

Whenever the hypervisor starts supporting SMAP, yes.

Jan
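
For illustration only, extending the mask as suggested above could look
roughly like the sketch below. This is not Xen code: the hypervisor does not
support SMAP at this point, and CR4_PAGING_ONLY_MASK,
strip_paging_only_bits() and strip_paging_only_bits_examples() are made-up
names. Only the bit positions (SMEP is CR4 bit 20, SMAP is CR4 bit 21) are
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_SMEP (UINT32_C(1) << 20)  /* supervisor mode execution protection */
#define X86_CR4_SMAP (UINT32_C(1) << 21)  /* supervisor mode access prevention */

/* Hypothetical name: CR4 bits to hide while the guest VCPU runs unpaged. */
#define CR4_PAGING_ONLY_MASK (X86_CR4_SMEP | X86_CR4_SMAP)

/*
 * Hypothetical helper, not part of Xen: drops the paging-only bits from the
 * CR4 value given to the hardware when the guest VCPU has paging disabled.
 */
uint32_t strip_paging_only_bits(uint32_t cr4, bool guest_paging)
{
    return guest_paging ? cr4 : (cr4 & ~CR4_PAGING_ONLY_MASK);
}

/* 0x3407f0 is the CR4 value from this thread with the SMAP bit added. */
void strip_paging_only_bits_examples(void)
{
    const uint32_t with_smap = UINT32_C(0x1407f0) | X86_CR4_SMAP;

    /* Paging off: both SMEP and SMAP are cleared. */
    assert(strip_paging_only_bits(with_smap, false) == UINT32_C(0x0407f0));
    /* Paging on: nothing is changed. */
    assert(strip_paging_only_bits(with_smap, true) == with_smap);
}
```

Keeping the bits in one mask means a later feature of the same kind would
only need to be added in a single place.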
