* ACPI suspend/resume on Dell Inspirons 1464/1564/1764
@ 2010-05-04 22:25 Roger Cruz
  2010-05-04 22:52 ` Jeremy Fitzhardinge
  2010-05-12 18:38 ` Roger Cruz
  0 siblings, 2 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-04 22:25 UTC (permalink / raw)
  To: xen-devel




Hello fellow Xen developers,

I'm about to start debugging why Dell Inspirons running Xen 3.4.2 fail to resume after a suspend operation.  A colleague has also found that the problem exists on bare-metal Linux (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/571422) and an upstream patch has been created (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commitdiff;h=29c60ccc1a408371885d79d8f8c081fbcb9b10be).  

I would like to find out if anyone in the Xen community has encountered this problem and if a fix is in the works.  Otherwise, I will attempt to provide a similar solution to Linux's patch. 

thanks
Roger


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-04 22:25 ACPI suspend/resume on Dell Inspirons 1464/1564/1764 Roger Cruz
@ 2010-05-04 22:52 ` Jeremy Fitzhardinge
  2010-05-04 23:06   ` Roger Cruz
  2010-05-12 18:38 ` Roger Cruz
  1 sibling, 1 reply; 17+ messages in thread
From: Jeremy Fitzhardinge @ 2010-05-04 22:52 UTC (permalink / raw)
  To: Roger Cruz; +Cc: xen-devel

On 05/04/2010 03:25 PM, Roger Cruz wrote:
>
> I'm about to start debugging why Dell Inspirons running Xen 3.4.2 fail
> to resume after a suspend operation.  A colleague has also found that
> the problem exists on bare-metal Linux
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/571422) and an
> upstream patch has been created
> (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commitdiff;h=29c60ccc1a408371885d79d8f8c081fbcb9b10be). 
>
> I would like to find out if anyone in the Xen community has
> encountered this problem and if a fix is in the works.  Otherwise, I
> will attempt to provide a similar solution to Linux's patch.
>
> ACPI suspend/resume on Dell Inspirons 1464/1564/1764

I think the Linux fix should work under Xen as well.  Have you tried a
xen/stable-2.6.32.x kernel (assuming the fix is now in mainline Linux)?
If not, you should be able to apply it to that kernel and try it out.

Thanks,
    J


* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-04 22:52 ` Jeremy Fitzhardinge
@ 2010-05-04 23:06   ` Roger Cruz
  0 siblings, 0 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-04 23:06 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel




It is my understanding that my colleague tried the upstream patch on bare metal and it works, but when the same patch is applied to our 2.6.32 dom0 Xen kernel, it does not.  He believes the issue needs to be addressed in the hypervisor's resume path.  I'm just jumping into this problem right now so I'm coming up to speed.  I wanted to first find out from the community if anyone else had seen a similar issue.  It is a tough problem for us to debug as we don't have any serial port on these Dell machines, so we have no indication of how far the resume code has reached.

Thanks
Roger




* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-04 22:25 ACPI suspend/resume on Dell Inspirons 1464/1564/1764 Roger Cruz
  2010-05-04 22:52 ` Jeremy Fitzhardinge
@ 2010-05-12 18:38 ` Roger Cruz
  2010-05-18 22:34   ` Roger Cruz
  1 sibling, 1 reply; 17+ messages in thread
From: Roger Cruz @ 2010-05-12 18:38 UTC (permalink / raw)
  To: Roger Cruz, xen-devel




We have made some progress in getting the Inspiron laptops to work under Xen.  We tried xen-unstable and xen-4.0.0 and discovered that xen-unstable can resume whereas xen-4.0.0 cannot.  Through trial and error, we have been able to narrow down the actual changes that allowed it to work.  It looks like moving the trampoline code down from its 0x8c000 location allowed it to resume.  

So we took the change below and applied it to our 3.4.2 tree.  However, we still have a problem in our 3.4.2 tree with this patch applied.  If an HVM guest is running, the resume will fail with the exact same behavior as before.  Due to our environment setup, we have not been able to test xen-unstable with an HVM guest, so we can't say if this problem is fixed in xen-unstable or not.  Can someone familiar with these changes provide a clue as to what is going on?  How does having an HVM guest running affect the resume functionality?  Running PV Linux guests does not affect resume; only HVM guests do.


--- old/xen-3.4.2/xen/include/asm-x86/config.h	2010-05-12 11:44:35.243564976 -0400
+++ new/xen-3.4.2/xen/include/asm-x86/config.h	2010-05-12 11:44:35.026578602 -0400
@@ -96,7 +96,7 @@
 /* Primary stack is restricted to 8kB by guard pages. */
 #define PRIMARY_STACK_SIZE 8192
 
-#define BOOT_TRAMPOLINE 0x8c000
+#define BOOT_TRAMPOLINE 0x7c000
 #define bootsym_phys(sym)                                 \
     (((unsigned long)&(sym)-(unsigned long)&trampoline_start)+BOOT_TRAMPOLINE)
 #define bootsym(sym)                                      \
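
For readers unfamiliar with the macro touched by this hunk, here is a minimal standalone sketch of the arithmetic that bootsym_phys() performs: the symbol's offset from the start of the trampoline image is added to the fixed low-memory base.  Only the two base addresses come from the patch; the array standing in for the trampoline image, the helper name sketch_bootsym_phys() and the 0x40 offset used below are hypothetical.

```c
#include <stdint.h>

/* Base addresses taken from the patch. */
#define OLD_BOOT_TRAMPOLINE 0x8c000UL
#define NEW_BOOT_TRAMPOLINE 0x7c000UL

/* Hypothetical stand-in for the trampoline image linked into Xen. */
static unsigned char trampoline_image[4096];

/*
 * Same arithmetic as bootsym_phys(): distance of the symbol from the
 * start of the image, rebased onto the low-memory copy at 'base'.
 */
static unsigned long sketch_bootsym_phys(const void *sym, unsigned long base)
{
    return (unsigned long)((uintptr_t)sym - (uintptr_t)trampoline_image) + base;
}
```

For a symbol 0x40 bytes into the image, the old base gives 0x8c040 and the new base gives 0x7c040, so every trampoline symbol moves down by 64 kB.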






* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-12 18:38 ` Roger Cruz
@ 2010-05-18 22:34   ` Roger Cruz
  2010-05-19  7:25     ` Keir Fraser
  0 siblings, 1 reply; 17+ messages in thread
From: Roger Cruz @ 2010-05-18 22:34 UTC (permalink / raw)
  To: Roger Cruz




A little more info.  I am now able to wake up the Dell Inspiron 1764 after I put it to sleep.  I found that the code commented out below would cause the problems in my system.  I have yet to understand why these variables don't end up with the expected values.  If anyone has any thoughts that they would like to share on how this code works and why it is comparing to stored variables, I would very much like to hear them.

Thank you
Roger R. Cruz


diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c

@@ -191,19 +192,25 @@
         cpu_has_vmx_ins_outs_instr_info = !!(vmx_basic_msr_high & (1U<<22));
         vmx_display_features();
     }
+#if 0
     else
     {
         /* Globals are already initialised: re-check them. */
         BUG_ON(vmcs_revision_id != vmx_basic_msr_low);
         BUG_ON(vmx_pin_based_exec_control != _vmx_pin_based_exec_control);
         BUG_ON(vmx_cpu_based_exec_control != _vmx_cpu_based_exec_control);
         BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
         BUG_ON(vmx_vmexit_control != _vmx_vmexit_control);
         BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);
         BUG_ON(cpu_has_vmx_ins_outs_instr_info !=
                !!(vmx_basic_msr_high & (1U<<22)));
     }

+#endif
     /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
     BUG_ON((vmx_basic_msr_high & 0x1fff) > PAGE_SIZE);
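
As background for the checks in this hunk, the following is a small sketch of how the two 32-bit halves of the IA32_VMX_BASIC MSR map onto the fields the code tests: the revision identifier in the low half, the VMCS region size in bits 44:32 (the 0x1fff mask on the high half) and the INS/OUTS information flag in bit 54 (bit 22 of the high half).  The struct and function names are hypothetical and are not taken from vmcs.c.

```c
#include <stdint.h>

/* Hypothetical container for the fields vmcs.c derives from the MSR. */
struct vmx_basic_info {
    uint32_t revision_id;    /* low half, as compared against vmcs_revision_id */
    uint32_t vmcs_size;      /* bits 44:32 of the MSR, in bytes */
    int      ins_outs_info;  /* bit 54 of the MSR */
};

/* Decode the low and high halves as returned by a 64-bit MSR read. */
static struct vmx_basic_info decode_vmx_basic(uint32_t lo, uint32_t hi)
{
    struct vmx_basic_info info;

    info.revision_id   = lo;
    info.vmcs_size     = hi & 0x1fff;
    info.ins_outs_info = !!(hi & (1U << 22));
    return info;
}
```

Printing these three fields at boot and again on resume would be one way to see whether the MSR itself reads differently after S3 or whether the saved globals have changed.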






* Re: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-18 22:34   ` Roger Cruz
@ 2010-05-19  7:25     ` Keir Fraser
  2010-05-19 14:30       ` Roger Cruz
  0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-05-19  7:25 UTC (permalink / raw)
  To: Roger Cruz, xen-devel

On 18/05/2010 23:34, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

> A little more info.  I am now able to wake up the Dell Inspiron 1764 after I
> put it to sleep.  I found that the code commented out below would cause the
> problems in my system.  I have yet to understand why these variables don't end
> up with the expected values.  If anyone has any thoughts that they would like
> to share on how this code works and why it is comparing to stored variables, I
> would very much like to hear them.

The BUG_ONs are to detect VMX versioning inconsistencies between processors.
The weird thing here is that you presumably brought all CPUs online during
initial system boot with no problem. So somehow something has changed only
after resume from S3. I think you will need to add tracing to discover which
BUG_ON is failing, and why.

Incidentally, in my CPU hotplug cleanup I will be making it so that CPUs
that fail the checks will fail to come online, rather than crash the system.
Which is a bit of an improvement, but obviously something is buggy
underlying this (possibly in BIOS code).
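
To make the pattern concrete, here is a rough sketch, not the actual Xen code, of the idea described above: the first CPU to initialise records its VMX capabilities, and every later caller (another CPU, or the same CPU after resume) is compared against that record.  In this sketch a mismatch returns an error so the caller can refuse to bring the CPU online instead of crashing.  The struct, the function name and the return convention are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical snapshot of the values the BUG_ON()s compare. */
struct vmx_caps {
    uint32_t revision_id;
    uint32_t pin_based_exec_control;
    uint32_t cpu_based_exec_control;
    uint32_t secondary_exec_control;
    uint32_t vmexit_control;
    uint32_t vmentry_control;
};

static struct vmx_caps boot_caps;   /* recorded by the first caller */
static int boot_caps_valid;

/* Returns 0 if the probed values are acceptable, -1 on any mismatch. */
static int vmx_check_caps(const struct vmx_caps *probed)
{
    if (!boot_caps_valid) {
        /* First CPU: remember what it reported. */
        boot_caps = *probed;
        boot_caps_valid = 1;
        return 0;
    }

    /* Later callers must agree with the first CPU on every field. */
    if (probed->revision_id            != boot_caps.revision_id ||
        probed->pin_based_exec_control != boot_caps.pin_based_exec_control ||
        probed->cpu_based_exec_control != boot_caps.cpu_based_exec_control ||
        probed->secondary_exec_control != boot_caps.secondary_exec_control ||
        probed->vmexit_control         != boot_caps.vmexit_control ||
        probed->vmentry_control        != boot_caps.vmentry_control)
        return -1;

    return 0;
}
```

Note that the comparison only holds if the saved record itself survives suspend intact; if something overwrites it, the check fails even on identical hardware.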

 -- Keir


* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19  7:25     ` Keir Fraser
@ 2010-05-19 14:30       ` Roger Cruz
  2010-05-19 14:50         ` Keir Fraser
  2010-05-19 16:36         ` Roger Cruz
  0 siblings, 2 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-19 14:30 UTC (permalink / raw)
  To: Keir Fraser, xen-devel



Keir and Jan,

Thank you for responding to my message.  Here is some additional info that may be of interest.

1) The problem has been reproduced on Dell Inspiron 1564 and 1764 models.  They do not have a serial port, so tracing of any sort has been impossible.  The system reboots when resuming from sleep, so any in-memory state is also lost.  If you have any suggestions on other tracing mechanisms, I'm all ears.  It has been very time-consuming to do it my way (see below).

2) The way I narrowed down the problem to these lines of code was by inserting a "while(1);" loop at different points in the code.  When it didn't reboot, I knew it had gotten to my while loop.  I just kept moving the while loop until I found the lines I highlighted in my previous message.  Below is what my debug code looks like:

        //       if (sleeploop) while(1);  // it did not reboot up to this point
        BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
        //       if (sleeploop) while(1);  // did not reboot up to this point.
        BUG_ON(vmx_vmexit_control != _vmx_vmexit_control); 
        //        if (sleeploop) while(1);  // Rebooted before here.
        BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);

3) You can see above that the vmx_vmexit_control check was the point at which the crash/reboot was being triggered.  However, if I commented out just that line, I would still see a reboot.  Only when I commented out the whole block did it finally work.  Is something overwriting the location of these variables, such that when I commented out a line of code, it moved the data segment and caused a different variable to be overwritten?  I need to be able to explain this behavior, so I will be working towards that today.

4) My initial thoughts were that the BIOS was overwriting some of these locations, so I performed an experiment that I believe rules out the BIOS.  I commented out the code in power.c that puts the CPU into the sleep mode.  This had the effect of going through most of the sleep and wakeup code in power.c (though it does not go through all of the wakeup.S initialization).  When I did this, it still failed to resume from sleep as long as an HVM domain was present.  Here is the diff on power.c

diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/acpi/power.c
--- a/xen-3.4.2/xen/arch/x86/acpi/power.c
+++ b/xen-3.4.2/xen/arch/x86/acpi/power.c
@@ -208,9 +208,11 @@
     switch ( state )
     {
     case ACPI_STATE_S3:
+#if 0
         do_suspend_lowlevel();
         system_reset_counter++;
         error = tboot_s3_resume();
+#endif        
         break;
     case ACPI_STATE_S5:
         acpi_enter_sleep_state(ACPI_STATE_S5);

5) The problem occurs even when Xen is run in uni-processor mode.  I achieved this by adding "nosmp=1 maxcpus=1" to the grub command line that boots xen.  I confirmed that Xen only reported one physical CPU, namely CPU0.  This should have avoided any issues with waking up other non-boot processors.

6) Finally, I narrowed down the type and condition of the domain that would exhibit the problem, by using Python to create a domain whose definition I could control.  If I set "flags" to 0, the problem does not show up.  If I set it to "1" (hvm) and do NOT execute the "xc.domain_max_vcpus" call, the problem does not show up.  However, once I add one VCPU to this domain, the problem occurs.

#! /usr/bin/python
import sys
sys.path.append('/usr/lib/python2.6/site-packages')
import xen.lowlevel.xc
from xen.xend import uuid
xc = xen.lowlevel.xc.xc()
domid=xc.domain_create(domid=0,ssidref=0,handle=uuid.fromString("bad0beef-dead-beef-dead-beefdeadbeef"), flags=1)

print domid 
xc.domain_max_vcpus(domid, 1)


Roger R. Cruz








* Re: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 14:30       ` Roger Cruz
@ 2010-05-19 14:50         ` Keir Fraser
  2010-05-19 14:59           ` Keir Fraser
  2010-05-19 16:36         ` Roger Cruz
  1 sibling, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-05-19 14:50 UTC (permalink / raw)
  To: Roger Cruz, xen-devel

On 19/05/2010 15:30, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

> 2) The way I narrow down the problem to these lines of code was by inserting a
> "while(1);" loop at different points in the code.  When it didn't reboot, I
> knew it had gotten to my while loop.  I just kept moving the while loop until
> I found the lines I highlighted in my previous msg.  Below is what my debug
> code looks like:

Your system seems to hobble along just fine if you remove the BUG_ON()s, so
why not convert them into printk() warnings? Or if it's too early for
printk, stash some info in memory and printk() it at the very end of S3
resume.
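
One way the "stash some info in memory" idea could look is sketched below, under the assumption that a small static array is safe to write to this early in resume.  The buffer size, struct and function names are all hypothetical, and printf() stands in for Xen's printk() so the sketch compiles outside the hypervisor.

```c
#include <stdint.h>
#include <stdio.h>

#define RESUME_LOG_MAX 16   /* arbitrary capacity for this sketch */

struct resume_log_entry {
    const char *name;       /* which check failed */
    uint32_t    expected;   /* value saved at boot */
    uint32_t    actual;     /* value probed on resume */
};

static struct resume_log_entry resume_log[RESUME_LOG_MAX];
static unsigned int resume_log_count;

/* Use in place of BUG_ON(): record the mismatch and carry on. */
static void resume_log_check(const char *name, uint32_t expected, uint32_t actual)
{
    if (expected != actual && resume_log_count < RESUME_LOG_MAX) {
        resume_log[resume_log_count].name     = name;
        resume_log[resume_log_count].expected = expected;
        resume_log[resume_log_count].actual   = actual;
        resume_log_count++;
    }
}

/* Call at the very end of resume; returns the number of entries printed. */
static unsigned int resume_log_flush(void)
{
    unsigned int i, n = resume_log_count;

    for (i = 0; i < n; i++)
        printf("%s: expected %#x, got %#x\n", resume_log[i].name,
               (unsigned int)resume_log[i].expected,
               (unsigned int)resume_log[i].actual);
    resume_log_count = 0;
    return n;
}
```

Each BUG_ON(a != b) would become resume_log_check("a", a, b), with a single resume_log_flush() call once the console is usable again.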

> 3) You can see above that the vmx_vmexit_control check was the point at which
> the crash/reboot was being triggered.  However, if I commented out just that
> line, I would still see a reboot.  Only when I commented the whole block out
> did it finally work.   Is something overwriting the location of these
> variables such that when I commented out a line of code, it moved the data
> segment causing a different variable to be overwritten?    I need to be able
> to explain this behavior.  So I will working towards that today.

I would assume that more than one of the BUG_ON()s is triggering. So if you
just comment out the first offending one that you find, you instead fall
foul of a second one.
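
A quick way to test that theory would be to evaluate every comparison before acting on any of them.  The sketch below is purely illustrative: it assumes the six saved and probed values are gathered into arrays in the same order as the BUG_ON() lines (revision id, pin-based, cpu-based, secondary, vmexit, vmentry), and returns a bitmask with one bit per failing check.  The function name and the array layout are hypothetical.

```c
#include <stdint.h>

#define VMX_NR_CHECKS 6   /* one slot per BUG_ON() comparison */

/*
 * Compare all saved/probed pairs and return a mask in which bit i is
 * set when check i fails, rather than stopping at the first failure.
 */
static unsigned int vmx_mismatch_mask(const uint32_t saved[VMX_NR_CHECKS],
                                      const uint32_t probed[VMX_NR_CHECKS])
{
    unsigned int i, mask = 0;

    for (i = 0; i < VMX_NR_CHECKS; i++)
        if (saved[i] != probed[i])
            mask |= 1U << i;
    return mask;
}
```

A mask with more than one bit set would confirm that several checks fail at once, which would explain why removing a single BUG_ON() still ends in a reboot.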

> 4) My initial thoughts were that the BIOS was overwriting some of these
> locations, so I performed an experiment that I believe rules out the BIOS.  I
> commented out the code in power.c that puts the CPU into the sleep mode.  This
> had the effect of going through most of the sleep and wakeup code in power.c
> (it does not go through all the wakeup.S initialization as well).  When I did
> this, it still failed to resume from sleep as long as an HVM domain was
> present.  Here is the diff on power.c

Yep, that patch should do the expected thing and do everything except the
actual BIOS S3 transition.

Well, overall this does sound like a memory corruption issue, not a BIOS or
platform issue. You need to printk out the contents of variables
contributing to your failing BUG_ON()s and see what's written there, I
think.

 -- Keir

> 5) The problem occurs even when Xen is run in uni-processor mode.  I achieved
> this by adding "nosmp=1 maxcpus=1" to the grub command line that boots xen.  I
> confirmed that Xen only reported one physical CPU, namely CPU0.  This should
> have avoided any issues with waking up other non-boot processors.
> 
> 6) Finally, I narrowed down the type of domain and condition of the domain
> that would exhibit the problem, by using python to create a domain with me
> being able to control its definition.  If I set "flags" to 0, the problem
> does not show up.  If I set it to "1" (hvm) and do NOT execute the
> "xc.domain_max_vcpus" call, the problem does not show up.  However, once I add
> one VCPU to this domain, the problem occurs.
> 
> #! /usr/bin/python
> import sys
> sys.path.append('/usr/lib/python2.6/site-packages')
> import xen.lowlevel.xc
> from xen.xend import uuid
> xc = xen.lowlevel.xc.xc()
> domid=xc.domain_create(domid=0,ssidref=0,handle=uuid.fromString("bad0beef-dead-beef-dead-beefdeadbeef"), flags=1)
> 
> print domid
> xc.domain_max_vcpus(domid, 1)
> 
> 
> Roger R. Cruz
> 
> 
> 
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Wed 5/19/2010 3:25 AM
> To: Roger Cruz; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
> 
> On 18/05/2010 23:34, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:
> 
>> A little more info.  I am now able to wake up the Dell Inspiron 1764 after I
>> put it to sleep.  I found that the code commented out below would cause the
>> problems in my system.  I have yet to understand why these variables don't
>> end up with the expected values.  If anyone has any thoughts that they would
>> like to share on how this code works and why it is comparing to stored
>> variables, I would very much like to hear them.
> 
> The BUG_ONs are to detect VMX versioning inconsistencies between processors.
> The weird thing here is that you presumably brought all CPUs online during
> initial system boot with no problem. So somehow something has changed only
> after resume from S3. I think you will need to add tracing to discover which
> BUG_ON is failing, and why.
> 
> Incidentally, in my CPU hotplug cleanup I will be making it so that CPUs
> that fail the checks will fail to come online, rather than crash the system.
> Which is a bit of an improvement, but obviously something is buggy
> underlying this (possibly in BIOS code).
> 
>  -- Keir
> 
>> Thank you
>> Roger R. Cruz
>> 
>> 
>> diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
>> --- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
>> 
>> @@ -191,19 +192,25 @@
>>          cpu_has_vmx_ins_outs_instr_info = !!(vmx_basic_msr_high & (1U<<22));
>>          vmx_display_features();
>>      }
>> +#if 0
>>      else
>>      {
>>          /* Globals are already initialised: re-check them. */
>>          BUG_ON(vmcs_revision_id != vmx_basic_msr_low);
>>          BUG_ON(vmx_pin_based_exec_control != _vmx_pin_based_exec_control);
>>          BUG_ON(vmx_cpu_based_exec_control != _vmx_cpu_based_exec_control);
>>          BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
>>          BUG_ON(vmx_vmexit_control != _vmx_vmexit_control);
>>          BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);
>>          BUG_ON(cpu_has_vmx_ins_outs_instr_info !=
>>                 !!(vmx_basic_msr_high & (1U<<22)));
>>      }
>> 
>> +#endif
>>      /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
>>      BUG_ON((vmx_basic_msr_high & 0x1fff) > PAGE_SIZE);
>> 
>> 
>> -----Original Message-----
>> From: Roger Cruz
>> Sent: Wed 5/12/2010 2:38 PM
>> To: Roger Cruz; xen-devel@lists.xensource.com
>> Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
>> 
>> 
>> We have made some progress in getting the inspiron laptops to work under Xen.
>> We tried xenunstable and xen-4.0.0 and discovered that xenunstable can resume
>> whereas xen-4.0.0 cannot.  Through trial and error, we have been able to
>> narrow down the actual changes that allowed it to work.  It looks like moving
>> the trampoline code down from its 0x8c000 location allowed it to resume.
>> 
>> So we took the change below and applied it to our 3.4.2 tree.  However, we
>> still have a problem in our 3.4.2 tree with this patch applied.  If an HVM
>> guest is running, the resume will fail with the exact same behavior as
>> before.  Due to our environment setup, we have not been able to test
>> xenunstable with an HVM guest, so we can't say if this problem is fixed in
>> xenunstable or not.  Can someone familiar with these changes provide a clue
>> as to what is going on?  How does having an HVM guest running affect the
>> resume functionality?  Running PV linux guests does not affect resume, only
>> HVM guests do.
>> 
>> 
>> --- old/xen-3.4.2/xen/include/asm-x86/config.h  2010-05-12 11:44:35.243564976 -0400
>> +++ new/xen-3.4.2/xen/include/asm-x86/config.h  2010-05-12 11:44:35.026578602 -0400
>> @@ -96,7 +96,7 @@
>>  /* Primary stack is restricted to 8kB by guard pages. */
>>  #define PRIMARY_STACK_SIZE 8192
>> 
>> -#define BOOT_TRAMPOLINE 0x8c000
>> +#define BOOT_TRAMPOLINE 0x7c000
>>  #define bootsym_phys(sym)                                 \
>>      (((unsigned long)&(sym)-(unsigned long)&trampoline_start)+BOOT_TRAMPOLINE)
>>  #define bootsym(sym)                                      \
>> 
>> 
>> 
>> -------
>> 
>> Hello fellow Xen developers,
>> 
>> I'm about to start debugging why Dell Inspirons running Xen 3.4.2 fail to
>> resume after a suspend operation.  A colleague has also found that the
>> problem exists on bare-metal Linux
>> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/571422) and an upstream
>> patch has been created
>> (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commitdiff;h=29c60ccc1a408371885d79d8f8c081fbcb9b10be).
>> 
>> I would like to find out if anyone in the Xen community has encountered this
>> problem and if a fix is in the works.  Otherwise, I will attempt to provide a
>> similar solution to Linux's patch.
>> 
>> thanks
>> Roger
>> 
>> 
>> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 14:50         ` Keir Fraser
@ 2010-05-19 14:59           ` Keir Fraser
  0 siblings, 0 replies; 17+ messages in thread
From: Keir Fraser @ 2010-05-19 14:59 UTC (permalink / raw)
  To: Roger Cruz, xen-devel

On 19/05/2010 15:50, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Well, overall this does sound like a memory corruption issue, not a BIOS or
> platform issue. You need to printk out the contents of variables
> contributing to your failing BUG_ON()s and see what's written there, I
> think.

Dumping the BUG_ON-checked fields during boot (when they presumably have the
correct values) would of course be useful too. So you can compare with the
bad values.

 K.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 14:30       ` Roger Cruz
  2010-05-19 14:50         ` Keir Fraser
@ 2010-05-19 16:36         ` Roger Cruz
  2010-05-19 19:24           ` Keir Fraser
  2010-05-19 19:26           ` Roger Cruz
  1 sibling, 2 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-19 16:36 UTC (permalink / raw)
  To: Roger Cruz, Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 11586 bytes --]

Keir,

Following your recommendation to store the variables that are being checked on the BUG_ON, here is what I found to be different.

Upon platform boot. These are my base values.

(XEN)     vmx_vmexit_control = 0xfefff
(XEN)     vmx_vmentry_control = 0x51ff

(XEN)     _vmx_vmexit_control = 0xfefff
(XEN)     _vmx_vmentry_control = 0x51ff

When the sleep code is entered at "int acpi_enter_sleep(struct xenpf_enter_acpi_sleep *sleep)" in power.c, I print out the values as well.

(XEN) *** ACPI Enter Sleep has been called
(XEN)     vmx_vmexit_control = 0x3efff
(XEN)     vmx_vmentry_control = 0x11ff
(XEN) 
(XEN)     _vmx_vmexit_control = 0xfefff
(XEN)     _vmx_vmentry_control = 0x51ff

When hvm_cpu_up returns (hvm_cpu_up is where the BUG_ON code is invoked), I also print the values.

(XEN)     vmx_vmexit_control = 0x3efff
(XEN)     vmx_vmentry_control = 0x11ff

(XEN)     _vmx_vmexit_control = 0xfefff
(XEN)     _vmx_vmentry_control = 0x51ff

As one can see, even before the sleep code is entered, the "saved" vmx_vmexit_control and vmx_vmentry_control variables, against which we compare upon wakeup, already differ from their boot values in a few bits.  The only place I found in the code that twiddles these bits is in vmcs.c, in "static int construct_vmcs(struct vcpu *v)":

    if ( paging_mode_hap(d) )
    {
        v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
                                          CPU_BASED_CR3_LOAD_EXITING |
                                          CPU_BASED_CR3_STORE_EXITING);
    }
    else
    {
        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
        vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
                                VM_EXIT_LOAD_HOST_PAT);
        vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
    }
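
As a cross-check, XOR-ing the printed boot values with the values seen at
sleep time shows exactly which bits changed. The sketch below assumes the PAT
control bit positions given in the Intel SDM (VM-exit bits 18 and 19, VM-entry
bit 14); the macro names match the snippet above, but the numeric values are
written from memory and should be checked against vmcs.h. The helper
changed_bits() is hypothetical:

```c
#include <stdint.h>

/* Assumed bit values for the PAT controls; verify against vmcs.h. */
#define VM_EXIT_SAVE_GUEST_PAT   0x00040000u  /* VM-exit control, bit 18 */
#define VM_EXIT_LOAD_HOST_PAT    0x00080000u  /* VM-exit control, bit 19 */
#define VM_ENTRY_LOAD_GUEST_PAT  0x00004000u  /* VM-entry control, bit 14 */

/* Hypothetical helper: bits that differ between the boot value and now. */
uint32_t changed_bits(uint32_t at_boot, uint32_t now)
{
    return at_boot ^ now;
}

/*
 * With the values printed above:
 *   changed_bits(0xfefff, 0x3efff) is 0xc0000, i.e.
 *       VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT
 *   changed_bits(0x51ff, 0x11ff) is 0x4000, i.e.
 *       VM_ENTRY_LOAD_GUEST_PAT
 */
```

Under those assumptions the cleared bits are precisely the three that the
else branch masks out, which is consistent with construct_vmcs() being the
code that modifies the globals.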

Roger R. Cruz


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com on behalf of Roger Cruz
Sent: Wed 5/19/2010 10:30 AM
To: Keir Fraser; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
Keir and Jan,

Thank you for responding to my message.  Here is some additional info that may be of interest.

1) The problem has been reproduced on Dell Inspiron 1564 and 1764 models.  They do not have a serial port, so tracing of any sort has been impossible.  The system reboots when resuming from sleep, so any in-memory state is also lost.  If you have any suggestions on other tracing mechanisms I'm all ears.  It has been very time-consuming to do it my way (see below).

2) The way I narrowed down the problem to these lines of code was by inserting a "while(1);" loop at different points in the code.  When it didn't reboot, I knew it had gotten to my while loop.  I just kept moving the while loop until I found the lines I highlighted in my previous msg.  Below is what my debug code looks like:

        //       if (sleeploop) while(1);  // it did not reboot up to this point
        BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
        //       if (sleeploop) while(1);  // did not reboot up to this point.
        BUG_ON(vmx_vmexit_control != _vmx_vmexit_control); 
        //        if (sleeploop) while(1);  // Rebooted before here.
        BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);

3) You can see above that the vmx_vmexit_control check was the point at which the crash/reboot was being triggered.  However, if I commented out just that line, I would still see a reboot.  Only when I commented the whole block out did it finally work.  Is something overwriting the location of these variables such that when I commented out a line of code, it moved the data segment causing a different variable to be overwritten?  I need to be able to explain this behavior.  So I will be working towards that today.

4) My initial thoughts were that the BIOS was overwriting some of these locations, so I performed an experiment that I believe rules out the BIOS.  I commented out the code in power.c that puts the CPU into the sleep mode.  This had the effect of going through most of the sleep and wakeup code in power.c (it does not go through all the wakeup.S initialization as well).  When I did this, it still failed to resume from sleep as long as an HVM domain was present.  Here is the diff on power.c

diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/acpi/power.c
--- a/xen-3.4.2/xen/arch/x86/acpi/power.c
+++ b/xen-3.4.2/xen/arch/x86/acpi/power.c
@@ -208,9 +208,11 @@
     switch ( state )
     {
     case ACPI_STATE_S3:
+#if 0
         do_suspend_lowlevel();
         system_reset_counter++;
         error = tboot_s3_resume();
+#endif        
         break;
     case ACPI_STATE_S5:
         acpi_enter_sleep_state(ACPI_STATE_S5);

5) The problem occurs even when Xen is run in uni-processor mode.  I achieved this by adding "nosmp=1 maxcpus=1" to the grub command line that boots xen.  I confirmed that Xen only reported one physical CPU, namely CPU0.  This should have avoided any issues with waking up other non-boot processors.

6) Finally, I narrowed down the type of domain and condition of the domain that would exhibit the problem, by using Python to create a domain whose definition I could control.  If I set "flags" to 0, the problem does not show up.  If I set it to "1" (hvm) and do NOT execute the "xc.domain_max_vcpus" call, the problem does not show up.  However, once I add one VCPU to this domain, the problem occurs.

#! /usr/bin/python
import sys
sys.path.append('/usr/lib/python2.6/site-packages')
import xen.lowlevel.xc
from xen.xend import uuid
xc = xen.lowlevel.xc.xc()
domid=xc.domain_create(domid=0,ssidref=0,handle=uuid.fromString("bad0beef-dead-beef-dead-beefdeadbeef"), flags=1)

print domid 
xc.domain_max_vcpus(domid, 1)


Roger R. Cruz



-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wed 5/19/2010 3:25 AM
To: Roger Cruz; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
On 18/05/2010 23:34, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

> A little more info.  I am now able to wake up the Dell Inspiron 1764 after I
> put it to sleep.  I found that the code commented out below would cause the
> problems in my system.  I have yet to understand why these variables don't end
> up with the expected values.  If anyone has any thoughts that they would like
> to share on how this code works and why it is comparing to stored variables, I
> would very much like to hear them.

The BUG_ONs are to detect VMX versioning inconsistencies between processors.
The weird thing here is that you presumably brought all CPUs online during
initial system boot with no problem. So somehow something has changed only
after resume from S3. I think you will need to add tracing to discover which
BUG_ON is failing, and why.

Incidentally, in my CPU hotplug cleanup I will be making it so that CPUs
that fail the checks will fail to come online, rather than crash the system.
Which is a bit of an improvement, but obviously something is buggy
underlying this (possibly in BIOS code).

 -- Keir

> Thank you
> Roger R. Cruz
> 
> 
> diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
> --- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
> 
> @@ -191,19 +192,25 @@
>          cpu_has_vmx_ins_outs_instr_info = !!(vmx_basic_msr_high & (1U<<22));
>          vmx_display_features();
>      }
> +#if 0
>      else
>      {
>          /* Globals are already initialised: re-check them. */
>          BUG_ON(vmcs_revision_id != vmx_basic_msr_low);
>          BUG_ON(vmx_pin_based_exec_control != _vmx_pin_based_exec_control);
>          BUG_ON(vmx_cpu_based_exec_control != _vmx_cpu_based_exec_control);
>          BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
>          BUG_ON(vmx_vmexit_control != _vmx_vmexit_control);
>          BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);
>          BUG_ON(cpu_has_vmx_ins_outs_instr_info !=
>                 !!(vmx_basic_msr_high & (1U<<22)));
>      }
> 
> +#endif
>      /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
>      BUG_ON((vmx_basic_msr_high & 0x1fff) > PAGE_SIZE);
> 
> 
> -----Original Message-----
> From: Roger Cruz
> Sent: Wed 5/12/2010 2:38 PM
> To: Roger Cruz; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
> 
> 
> We have made some progress in getting the inspiron laptops to work under Xen.
> We tried xenunstable and xen-4.0.0 and discovered that xenunstable can resume
> whereas xen-4.0.0 cannot.  Through trial and error, we have been able to
> narrow down the actual changes that allowed it to work.  It looks like moving
> the trampoline code down from its 0x8c000 location allowed it to resume.
> 
> So we took the change below and applied it to our 3.4.2 tree.  However, we
> still have a problem in our 3.4.2 tree with this patch applied.  If an HVM
> guest is running, the resume will fail with the exact same behavior as before.
> Due to our environment setup, we have not been able to test xenunstable with
> an HVM guest, so we can't say if this problem is fixed in xenunstable or not.
> Can someone familiar with these changes provide a clue as to what is going on?
> How does having an HVM guest running affect the resume functionality?  Running
> PV linux guests does not affect resume, only HVM guests do.
> 
> 
> --- old/xen-3.4.2/xen/include/asm-x86/config.h  2010-05-12 11:44:35.243564976 -0400
> +++ new/xen-3.4.2/xen/include/asm-x86/config.h  2010-05-12 11:44:35.026578602 -0400
> @@ -96,7 +96,7 @@
>  /* Primary stack is restricted to 8kB by guard pages. */
>  #define PRIMARY_STACK_SIZE 8192
> 
> -#define BOOT_TRAMPOLINE 0x8c000
> +#define BOOT_TRAMPOLINE 0x7c000
>  #define bootsym_phys(sym)                                 \
>      (((unsigned long)&(sym)-(unsigned long)&trampoline_start)+BOOT_TRAMPOLINE)
>  #define bootsym(sym)                                      \
> 
> 
> 
> -------
> 
> Hello fellow Xen developers,
> 
> I'm about to start debugging why Dell Inspirons running Xen 3.4.2 fail to
> resume after a suspend operation.  A colleague has also found that the problem
> exists on bare-metal Linux
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/571422) and an upstream
> patch has been created
> (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commitdiff;h=29c60ccc1a408371885d79d8f8c081fbcb9b10be).
> 
> I would like to find out if anyone in the Xen community has encountered this
> problem and if a fix is in the works.  Otherwise, I will attempt to provide a
> similar solution to Linux's patch.
> 
> thanks
> Roger
> 
> 
> 








^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 16:36         ` Roger Cruz
@ 2010-05-19 19:24           ` Keir Fraser
  2010-05-19 19:41             ` Roger Cruz
  2010-05-19 19:54             ` Keir Fraser
  2010-05-19 19:26           ` Roger Cruz
  1 sibling, 2 replies; 17+ messages in thread
From: Keir Fraser @ 2010-05-19 19:24 UTC (permalink / raw)
  To: Roger Cruz, xen-devel

On 19/05/2010 17:36, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

>     else
>     {
>         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
>         vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
>                                 VM_EXIT_LOAD_HOST_PAT);
>         vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
>     }

This is the bug. construct_vmcs() should make local copies of
vmx_vmexit_control and vmx_vmentry_control, and only clear bits in those
local copies. It should then __vmwrite() those local copies. I will make a
patch and apply to xen-unstable and xen-4.0 and xen-3.4.
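
A rough sketch of the local-copy approach described above, not the actual
patch. It assumes the PAT bit values from the Intel SDM and uses the boot
values reported earlier in the thread as initialisers; struct fake_vmcs,
construct_vmcs_sketch() and the hap_enabled parameter are hypothetical
stand-ins for the real VMCS, construct_vmcs() and paging_mode_hap(d), and
plain stores stand in for __vmwrite():

```c
#include <stdint.h>

/* Assumed bit values for the PAT controls; verify against vmcs.h. */
#define VM_EXIT_SAVE_GUEST_PAT   0x00040000u
#define VM_EXIT_LOAD_HOST_PAT    0x00080000u
#define VM_ENTRY_LOAD_GUEST_PAT  0x00004000u

/* Globals computed once at boot and re-checked when a CPU comes back up. */
uint32_t vmx_vmexit_control  = 0xfefff;
uint32_t vmx_vmentry_control = 0x51ff;

/* Hypothetical stand-in for the per-vCPU VMCS fields. */
struct fake_vmcs {
    uint32_t vm_exit_controls;
    uint32_t vm_entry_controls;
};

/* hap_enabled stands in for paging_mode_hap(d). */
void construct_vmcs_sketch(struct fake_vmcs *vmcs, int hap_enabled)
{
    /* Work on local copies so the globals keep their boot-time values. */
    uint32_t vmexit_ctl  = vmx_vmexit_control;
    uint32_t vmentry_ctl = vmx_vmentry_control;

    if (!hap_enabled) {
        vmexit_ctl  &= ~(VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT);
        vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_PAT;
    }

    /* The real code would __vmwrite() these into the VMCS. */
    vmcs->vm_exit_controls  = vmexit_ctl;
    vmcs->vm_entry_controls = vmentry_ctl;
}
```

Because only the local copies are masked, the re-check on resume still sees
the unmodified globals and the BUG_ON()s no longer fire.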

 Thanks,
 Keir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 16:36         ` Roger Cruz
  2010-05-19 19:24           ` Keir Fraser
@ 2010-05-19 19:26           ` Roger Cruz
  1 sibling, 0 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-19 19:26 UTC (permalink / raw)
  To: Roger Cruz


[-- Attachment #1.1: Type: text/plain, Size: 14529 bytes --]


I have a working solution.  The problem occurs because creating an HVM domain without EPT support clears bits in the global vmx_vmexit_control and vmx_vmentry_control variables.  When hvm_cpu_up later re-checks them against the freshly computed values, the BUG_ON fires because of the mismatch.

If you guys find it acceptable, I can port it to xenunstable for integration into that tree.

Roger R. Cruz


diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
@@ -46,7 +46,9 @@
 u32 vmx_cpu_based_exec_control __read_mostly;
 u32 vmx_secondary_exec_control __read_mostly;
 u32 vmx_vmexit_control __read_mostly;
+u32 vmx_vmexit_control_must_clear __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
+u32 vmx_vmentry_control_must_clear __read_mostly;
 bool_t cpu_has_vmx_ins_outs_instr_info __read_mostly;
 
 static DEFINE_PER_CPU(struct vmcs_struct *, host_vmcs);
@@ -187,7 +189,9 @@
         vmx_cpu_based_exec_control = _vmx_cpu_based_exec_control;
         vmx_secondary_exec_control = _vmx_secondary_exec_control;
         vmx_vmexit_control         = _vmx_vmexit_control;
+        vmx_vmexit_control_must_clear = 0;
         vmx_vmentry_control        = _vmx_vmentry_control;
+        vmx_vmentry_control_must_clear = 0;
         cpu_has_vmx_ins_outs_instr_info = !!(vmx_basic_msr_high & (1U<<22));
         vmx_display_features();
     }
@@ -198,7 +202,9 @@
         BUG_ON(vmx_pin_based_exec_control != _vmx_pin_based_exec_control);
         BUG_ON(vmx_cpu_based_exec_control != _vmx_cpu_based_exec_control);
         BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
+        _vmx_vmexit_control &= ~vmx_vmexit_control_must_clear;
         BUG_ON(vmx_vmexit_control != _vmx_vmexit_control);
+        _vmx_vmentry_control &= ~vmx_vmentry_control_must_clear;
         BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);
         BUG_ON(cpu_has_vmx_ins_outs_instr_info !=
                !!(vmx_basic_msr_high & (1U<<22)));
@@ -533,9 +539,11 @@
     else
     {
         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
-        vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
-                                VM_EXIT_LOAD_HOST_PAT);
-        vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
+        vmx_vmexit_control_must_clear |= (VM_EXIT_SAVE_GUEST_PAT |
+                                         VM_EXIT_LOAD_HOST_PAT);
+        vmx_vmexit_control &= ~vmx_vmexit_control_must_clear;
+        vmx_vmentry_control_must_clear |= VM_ENTRY_LOAD_GUEST_PAT;
+        vmx_vmentry_control &= ~vmx_vmentry_control_must_clear;
     }
 
     /* Do not enable Monitor Trap Flag unless start single step debug */


-----Original Message-----
From: Roger Cruz
Sent: Wed 5/19/2010 12:36 PM
To: Roger Cruz; Keir Fraser; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
Keir,

Following your recommendation to store the variables that are being checked on the BUG_ON, here is what I found to be different.

Upon platform boot. These are my base values.

(XEN)     vmx_vmexit_control = 0xfefff
(XEN)     vmx_vmentry_control = 0x51ff

(XEN)     _vmx_vmexit_control = 0xfefff
(XEN)     _vmx_vmentry_control = 0x51ff

When the sleep code is entered at "int acpi_enter_sleep(struct xenpf_enter_acpi_sleep *sleep)" in power.c, I print out the values as well.

(XEN) *** ACPI Enter Sleep has been called
(XEN)     vmx_vmexit_control = 0x3efff
(XEN)     vmx_vmentry_control = 0x11ff
(XEN) 
(XEN)     _vmx_vmexit_control = 0xfefff
(XEN)     _vmx_vmentry_control = 0x51ff

When hvm_cpu_up returns (hvm_cpu_up is where the BUG_ON code is invoked), I also print the values.

(XEN)     vmx_vmexit_control = 0x3efff
(XEN)     vmx_vmentry_control = 0x11ff

(XEN)     _vmx_vmexit_control = 0xfefff
(XEN)     _vmx_vmentry_control = 0x51ff

As one can see, even before the sleep code is entered, the "saved" vmx_vmexit_control and vmx_vmentry_control variables, against which we compare upon wakeup, already differ from their boot values in a few bits.  The only place I found in the code that twiddles these bits is in vmcs.c, in "static int construct_vmcs(struct vcpu *v)":

    if ( paging_mode_hap(d) )
    {
        v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
                                          CPU_BASED_CR3_LOAD_EXITING |
                                          CPU_BASED_CR3_STORE_EXITING);
    }
    else
    {
        v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
        vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
                                VM_EXIT_LOAD_HOST_PAT);
        vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
    }

Roger R. Cruz


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com on behalf of Roger Cruz
Sent: Wed 5/19/2010 10:30 AM
To: Keir Fraser; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
Keir and Jan,

Thank you for responding to my message.  Here is some additional info that may be of interest.

1) The problem has been reproduced on Dell Inspiron 1564 and 1764 models.  They do not have a serial port, so tracing of any sort has been impossible.  The system reboots when resuming from sleep, so any in-memory state is also lost.  If you have any suggestions on other tracing mechanisms I'm all ears.  It has been very time-consuming to do it my way (see below).

2) The way I narrowed down the problem to these lines of code was by inserting a "while(1);" loop at different points in the code.  When it didn't reboot, I knew it had gotten to my while loop.  I just kept moving the while loop until I found the lines I highlighted in my previous msg.  Below is what my debug code looks like:

        //       if (sleeploop) while(1);  // it did not reboot up to this point
        BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
        //       if (sleeploop) while(1);  // did not reboot up to this point.
        BUG_ON(vmx_vmexit_control != _vmx_vmexit_control); 
        //        if (sleeploop) while(1);  // Rebooted before here.
        BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);

3) You can see above that the vmx_vmexit_control check was the point at which the crash/reboot was being triggered.  However, if I commented out just that line, I would still see a reboot.  Only when I commented the whole block out did it finally work.  Is something overwriting the location of these variables such that when I commented out a line of code, it moved the data segment causing a different variable to be overwritten?  I need to be able to explain this behavior.  So I will be working towards that today.

4) My initial thoughts were that the BIOS was overwriting some of these locations, so I performed an experiment that I believe rules out the BIOS.  I commented out the code in power.c that puts the CPU into the sleep mode.  This had the effect of going through most of the sleep and wakeup code in power.c (it does not go through all the wakeup.S initialization as well).  When I did this, it still failed to resume from sleep as long as an HVM domain was present.  Here is the diff on power.c

diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/acpi/power.c
--- a/xen-3.4.2/xen/arch/x86/acpi/power.c
+++ b/xen-3.4.2/xen/arch/x86/acpi/power.c
@@ -208,9 +208,11 @@
     switch ( state )
     {
     case ACPI_STATE_S3:
+#if 0
         do_suspend_lowlevel();
         system_reset_counter++;
         error = tboot_s3_resume();
+#endif        
         break;
     case ACPI_STATE_S5:
         acpi_enter_sleep_state(ACPI_STATE_S5);

5) The problem occurs even when Xen is run in uni-processor mode.  I achieved this by adding "nosmp=1 maxcpus=1" to the grub command line that boots xen.  I confirmed that Xen only reported one physical CPU, namely CPU0.  This should have avoided any issues with waking up other non-boot processors.

6) Finally, I narrowed down the type of domain and condition of the domain that would exhibit the problem, by using Python to create a domain whose definition I could control.  If I set "flags" to 0, the problem does not show up.  If I set it to "1" (hvm) and do NOT execute the "xc.domain_max_vcpus" call, the problem does not show up.  However, once I add one VCPU to this domain, the problem occurs.

#! /usr/bin/python
import sys
sys.path.append('/usr/lib/python2.6/site-packages')
import xen.lowlevel.xc
from xen.xend import uuid
xc = xen.lowlevel.xc.xc()
domid=xc.domain_create(domid=0,ssidref=0,handle=uuid.fromString("bad0beef-dead-beef-dead-beefdeadbeef"), flags=1)

print domid 
xc.domain_max_vcpus(domid, 1)


Roger R. Cruz



-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wed 5/19/2010 3:25 AM
To: Roger Cruz; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
On 18/05/2010 23:34, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

> A little more info.  I am now able to wake up the Dell Inspiron 1764 after I
> put it to sleep.  I found that the code commented out below would cause the
> problems in my system.  I have yet to understand why these variables don't end
> up with the expected values.  If anyone has any thoughts that they would like
> to share on how this code works and why it is comparing to stored variables, I
> would very much like to hear them.

The BUG_ONs are to detect VMX versioning inconsistencies between processors.
The weird thing here is that you presumably brought all CPUs online during
initial system boot with no problem. So somehow something has changed only
after resume from S3. I think you will need to add tracing to discover which
BUG_ON is failing, and why.

Incidentally, in my CPU hotplug cleanup I will be making it so that CPUs
that fail the checks will fail to come online, rather than crash the system.
Which is a bit of an improvement, but obviously something is buggy
underlying this (possibly in BIOS code).

 -- Keir

> Thank you
> Roger R. Cruz
> 
> 
> diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
> --- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
> 
> @@ -191,19 +192,25 @@
>          cpu_has_vmx_ins_outs_instr_info = !!(vmx_basic_msr_high & (1U<<22));
>          vmx_display_features();
>      }
> +#if 0
>      else
>      {
>          /* Globals are already initialised: re-check them. */
>          BUG_ON(vmcs_revision_id != vmx_basic_msr_low);
>          BUG_ON(vmx_pin_based_exec_control != _vmx_pin_based_exec_control);
>          BUG_ON(vmx_cpu_based_exec_control != _vmx_cpu_based_exec_control);
>          BUG_ON(vmx_secondary_exec_control != _vmx_secondary_exec_control);
>          BUG_ON(vmx_vmexit_control != _vmx_vmexit_control);
>          BUG_ON(vmx_vmentry_control != _vmx_vmentry_control);
>          BUG_ON(cpu_has_vmx_ins_outs_instr_info !=
>                 !!(vmx_basic_msr_high & (1U<<22)));
>      }
> 
> +#endif
>      /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
>      BUG_ON((vmx_basic_msr_high & 0x1fff) > PAGE_SIZE);
> 
> 
> -----Original Message-----
> From: Roger Cruz
> Sent: Wed 5/12/2010 2:38 PM
> To: Roger Cruz; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
> 
> 
> We have made some progress in getting the Inspiron laptops to work under Xen.
> We tried xen-unstable and xen-4.0.0 and discovered that xen-unstable can resume
> whereas xen-4.0.0 cannot.  Through trial and error, we have been able to
> narrow down the actual changes that allowed it to work.  It looks like moving
> the trampoline code down from its 0x8c000 location allowed it to resume.
> 
> So we took the change below and applied it to our 3.4.2 tree.  However, we
> still have a problem in our 3.4.2 tree with this patch applied.  If an HVM
> guest is running, the resume will fail with the exact same behavior as before.
> Due to our environment setup, we have not been able to test xen-unstable with
> an HVM guest, so we can't say if this problem is fixed in xen-unstable or not.
> Can someone familiar with these changes provide a clue as to what is going on?
> How does having an HVM guest running affect the resume functionality?  Running
> PV Linux guests does not affect resume, only HVM guests do.
> 
> 
> --- old/xen-3.4.2/xen/include/asm-x86/config.h  2010-05-12 11:44:35.243564976
> -0400
> +++ new/xen-3.4.2/xen/include/asm-x86/config.h  2010-05-12 11:44:35.026578602
> -0400
> @@ -96,7 +96,7 @@
>  /* Primary stack is restricted to 8kB by guard pages. */
>  #define PRIMARY_STACK_SIZE 8192
> 
> -#define BOOT_TRAMPOLINE 0x8c000
> +#define BOOT_TRAMPOLINE 0x7c000
>  #define bootsym_phys(sym)                                 \
>      (((unsigned long)&(sym)-(unsigned
> long)&trampoline_start)+BOOT_TRAMPOLINE)
>  #define bootsym(sym)                                      \
> 
> 
> 
> -------
> 
> Hello fellow Xen developers,
> 
> I'm about to start debugging why Dell Inspirons running Xen 3.4.2 fail to
> resume after a suspend operation.  A colleague has also found that the problem
> exists on bare-metal Linux
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/571422) and an upstream
> patch has been created
> (http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commitdiff;h=29c60cc
> c1a408371885d79d8f8c081fbcb9b10be).
> 
> I would like to find out if anyone in the Xen community has encountered this
> problem and if a fix is in the works.  Otherwise, I will attempt to provide a
> similar solution to Linux's patch.
> 
> thanks
> Roger
> 
> 
> 








[-- Attachment #1.2: Type: text/html, Size: 21806 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 19:24           ` Keir Fraser
@ 2010-05-19 19:41             ` Roger Cruz
  2010-05-19 19:50               ` Roger Cruz
  2010-05-19 19:54             ` Keir Fraser
  1 sibling, 1 reply; 17+ messages in thread
From: Roger Cruz @ 2010-05-19 19:41 UTC (permalink / raw)
  To: Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1257 bytes --]


OK.  My patch attempted to preserve the changes across the global variables as that is what I thought the intent was.  If I understood you right, the changes in construct_vmcs don't need to apply to future vmcs creations so the changes may only be done locally.  

When do you think an official patch will be available? 

Thank you.
Roger



-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wed 5/19/2010 3:24 PM
To: Roger Cruz; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
On 19/05/2010 17:36, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

>    else
>     {
>         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
>         vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
>                                 VM_EXIT_LOAD_HOST_PAT);
>         vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
>     }

This is the bug. construct_vmcs() should make local copies of
vmx_vmexit_control and vmx_vmentry_control, and only clear bits in those
local copies. It should then __vmwrite() those local copies. I will make a
patch and apply to xen-unstable and xen-4.0 and xen-3.4.
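
To make the failure mode concrete, here is a small stand-alone simulation. It is only a sketch: the file-scope variable stands in for Xen's global vmx_vmexit_control, the bit values are illustrative, and construct_buggy(), construct_fixed(), global_vmexit_control() and recheck_passes() are hypothetical names, not Xen functions. It shows that clearing bits in the global while building one VMCS makes a later "re-check against the saved global" comparison fail, whereas clearing them in a local copy leaves the global intact.

```c
#include <stdint.h>

/* Illustrative bit values; the real definitions live in Xen's VMX headers. */
#define VM_EXIT_SAVE_GUEST_PAT 0x00040000u
#define VM_EXIT_LOAD_HOST_PAT  0x00080000u
#define SOME_OTHER_EXIT_BIT    0x00008000u

/* Stand-in for the global filled in once when the first CPU is set up. */
static uint32_t vmx_vmexit_control =
    VM_EXIT_SAVE_GUEST_PAT | VM_EXIT_LOAD_HOST_PAT | SOME_OTHER_EXIT_BIT;

/* Accessor so callers can snapshot the global. */
uint32_t global_vmexit_control(void)
{
    return vmx_vmexit_control;
}

/* Buggy variant: a non-HAP guest permanently strips bits from the global. */
uint32_t construct_buggy(int hap)
{
    if ( !hap )
        vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
                                VM_EXIT_LOAD_HOST_PAT);
    return vmx_vmexit_control;   /* value that would be written to the VMCS */
}

/* Fixed variant: only a local copy is modified, the global stays as detected. */
uint32_t construct_fixed(int hap)
{
    uint32_t vmexit_control = vmx_vmexit_control;

    if ( !hap )
        vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
                            VM_EXIT_LOAD_HOST_PAT);
    return vmexit_control;       /* value that would be written to the VMCS */
}

/* Models the consistency check run when a CPU is brought up again. */
int recheck_passes(uint32_t detected_on_cpu)
{
    return vmx_vmexit_control == detected_on_cpu;
}
```

Both variants return the same per-VMCS value; the difference is only in what is left behind in the global. After construct_fixed(0) the re-check against the originally detected value still passes, while after construct_buggy(0) it no longer does, which matches the observation in this thread that resume only broke once an HVM guest had been created.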

 Thanks,
 Keir




[-- Attachment #1.2: Type: text/html, Size: 2268 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 19:41             ` Roger Cruz
@ 2010-05-19 19:50               ` Roger Cruz
  0 siblings, 0 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-19 19:50 UTC (permalink / raw)
  To: Roger Cruz, Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3393 bytes --]


In case this helps, here is our newly updated patch that doesn't modify the global variables.  I have tested it on a Dell 1764 which has the i5 chips.


diff -r 6b2b1470f009 xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen-3.4.2/xen/arch/x86/hvm/vmx/vmcs.c
@@ -515,6 +515,8 @@
     struct domain *d = v->domain;
     uint16_t sysenter_cs;
     unsigned long sysenter_eip;
+    u32 _vmx_vmexit_control;
+    u32 _vmx_vmentry_control;
 
     vmx_vmcs_enter(v);
 
@@ -524,6 +526,9 @@
     v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control;
     v->arch.hvm_vmx.secondary_exec_control = vmx_secondary_exec_control;
 
+    _vmx_vmexit_control = vmx_vmexit_control;
+    _vmx_vmentry_control = vmx_vmentry_control;
+    
     if ( paging_mode_hap(d) )
     {
         v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
@@ -533,17 +538,17 @@
     else
     {
         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
-        vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
+        _vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
                                 VM_EXIT_LOAD_HOST_PAT);
-        vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
+        _vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
     }
 
     /* Do not enable Monitor Trap Flag unless start single step debug */
     v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
 
     __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
-    __vmwrite(VM_EXIT_CONTROLS, vmx_vmexit_control);
-    __vmwrite(VM_ENTRY_CONTROLS, vmx_vmentry_control);
+    __vmwrite(VM_EXIT_CONTROLS, _vmx_vmexit_control);
+    __vmwrite(VM_ENTRY_CONTROLS, _vmx_vmentry_control);
 
     if ( cpu_has_vmx_secondary_exec_control )
         __vmwrite(SECONDARY_VM_EXEC_CONTROL,


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com on behalf of Roger Cruz
Sent: Wed 5/19/2010 3:41 PM
To: Keir Fraser; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 

OK.  My patch attempted to preserve the changes across the global variables as that is what I thought the intent was.  If I understood you right, the changes in construct_vmcs don't need to apply to future vmcs creations so the changes may only be done locally.  

When do you think an official patch will be available? 

Thank you.
Roger



-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wed 5/19/2010 3:24 PM
To: Roger Cruz; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
On 19/05/2010 17:36, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:

>    else
>     {
>         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
>         vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
>                                 VM_EXIT_LOAD_HOST_PAT);
>         vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
>     }

This is the bug. construct_vmcs() should make local copies of
vmx_vmexit_control and vmx_vmentry_control, and only clear bits in those
local copies. It should then __vmwrite() those local copies. I will make a
patch and apply to xen-unstable and xen-4.0 and xen-3.4.

 Thanks,
 Keir





[-- Attachment #1.2: Type: text/html, Size: 5533 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 19:24           ` Keir Fraser
  2010-05-19 19:41             ` Roger Cruz
@ 2010-05-19 19:54             ` Keir Fraser
  2010-05-19 19:59               ` Roger Cruz
  1 sibling, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-05-19 19:54 UTC (permalink / raw)
  To: Roger Cruz, xen-devel

On 19/05/2010 20:24, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> On 19/05/2010 17:36, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:
> 
>>    else
>>     {
>>         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
>>         vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
>>                                 VM_EXIT_LOAD_HOST_PAT);
>>         vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
>>     }
> 
> This is the bug. construct_vmcs() should make local copies of
> vmx_vmexit_control and vmx_vmentry_control, and only clear bits in those
> local copies. It should then __vmwrite() those local copies. I will make a
> patch and apply to xen-unstable and xen-4.0 and xen-3.4.

Done -- xen-unstable:21435, xen-4.0-testing:21157, xen-3.4-testing:19971

 K.

>  Thanks,
>  Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel


* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
  2010-05-19 19:54             ` Keir Fraser
@ 2010-05-19 19:59               ` Roger Cruz
  0 siblings, 0 replies; 17+ messages in thread
From: Roger Cruz @ 2010-05-19 19:59 UTC (permalink / raw)
  To: Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1297 bytes --]

Wow, you are fast.  Thanks a bunch!

Roger


-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
Sent: Wed 5/19/2010 3:54 PM
To: Roger Cruz; xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] ACPI suspend/resume on Dell Inspirons 1464/1564/1764
 
On 19/05/2010 20:24, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> On 19/05/2010 17:36, "Roger Cruz" <roger.cruz@virtualcomputer.com> wrote:
> 
>>    else
>>     {
>>         v->arch.hvm_vmx.secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_EPT;
>>         vmx_vmexit_control &= ~(VM_EXIT_SAVE_GUEST_PAT |
>>                                 VM_EXIT_LOAD_HOST_PAT);
>>         vmx_vmentry_control &= ~VM_ENTRY_LOAD_GUEST_PAT;
>>     }
> 
> This is the bug. construct_vmcs() should make local copies of
> vmx_vmexit_control and vmx_vmentry_control, and only clear bits in those
> local copies. It should then __vmwrite() those local copies. I will make a
> patch and apply to xen-unstable and xen-4.0 and xen-3.4.

Done -- xen-unstable:21435, xen-4.0-testing:21157, xen-3.4-testing:19971

 K.

>  Thanks,
>  Keir
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel




[-- Attachment #1.2: Type: text/html, Size: 2468 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: ACPI suspend/resume on Dell Inspirons 1464/1564/1764
@ 2010-05-19  7:10 Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2010-05-19  7:10 UTC (permalink / raw)
  To: roger.cruz; +Cc: xen-devel

>>> "Roger Cruz"  05/19/10 12:39 AM >>>
>
>A little more info.  I am now able to wake up the Dell Inspiron 1764 after I put it to sleep.  I found that the code commented out below would cause the problems in my system.  I have yet to understand why these variables don't end up with the expected values.  If anyone has any thoughts that they would like to share on how this code works and why it is comparing to stored variables, I would very much like to hear them.

The checks are done because code elsewhere expects that the capabilities found on individual CPUs are consistent across the whole system. If you found that one of these BUG_ON()s triggers, you can certainly also find out which one it is, and what the specific discrepancy is.

Jan


end of thread, other threads:[~2010-05-19 19:59 UTC | newest]

Thread overview: 17+ messages
2010-05-04 22:25 ACPI suspend/resume on Dell Inspirons 1464/1564/1764 Roger Cruz
2010-05-04 22:52 ` Jeremy Fitzhardinge
2010-05-04 23:06   ` Roger Cruz
2010-05-12 18:38 ` Roger Cruz
2010-05-18 22:34   ` Roger Cruz
2010-05-19  7:25     ` Keir Fraser
2010-05-19 14:30       ` Roger Cruz
2010-05-19 14:50         ` Keir Fraser
2010-05-19 14:59           ` Keir Fraser
2010-05-19 16:36         ` Roger Cruz
2010-05-19 19:24           ` Keir Fraser
2010-05-19 19:41             ` Roger Cruz
2010-05-19 19:50               ` Roger Cruz
2010-05-19 19:54             ` Keir Fraser
2010-05-19 19:59               ` Roger Cruz
2010-05-19 19:26           ` Roger Cruz
2010-05-19  7:10 Jan Beulich
