All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: linux-kernel@vger.kernel.org, xen-devel@lists.xensource.com,
	lenb@kernel.org, linux-acpi@vger.kernel.org, hpa@zytor.com,
	x86@kernel.org
Subject: Re: [PATCH 4/4] xen/acpi: Prep saved_context cr3 values.
Date: Thu, 17 Jan 2013 09:48:35 -0500	[thread overview]
Message-ID: <20130117144835.GC2552@phenom.dumpdata.com> (raw)
In-Reply-To: <1350481786-4969-5-git-send-email-konrad.wilk@oracle.com>

On Wed, Oct 17, 2012 at 09:49:46AM -0400, Konrad Rzeszutek Wilk wrote:
> When save_processor_state is executed it executes a bunch of
> pvops calls to save the CPU state values. When it comes back
> from Xen's S3 (so acpi_enter_sleep_state, which ends up calling
> xen_acpi_notify_hypervisor_state), it ends up back in the
> assembler code in wakeup_[32|64].S. It skips the wakeup
> calls (so wakeup_pmode_return and wakeup_long64) as that has
> been done in the hypervisor. Instead it continues on in
> the resume_point (64-bit) or ret_point (32-bit). Most of the
> calls in there are safe to be executed except when it comes to
> reloading of cr3 (which it only does on 64-bit kernels). Since
> they are native assembler calls and Xen expects a PV kernel to
> prep those to use machine address for cr3 that is what
> we are going to do. Note: that it is not Machine Frame Numbers
> (those are used in the MMUEXT_NEW_BASEPTR hypercall for cr3
> installation) but the machine physical address.
> 
> When the assembler code executes this mov %ebx, cr3 it ends
> end up trapped in the hypervisor (traps.c) which properly now
> sets the cr3 value.

And then I found out that after this nice
	 mov %ebx,cr3

in the assembler code (resume_point), it ends up calling
__restore_processor_state later on which does:

	write_cr3(ctxt->cr3);

and since that is a pvops call which expects physical address and in
this patch we  modified the ctxt->cr3 to be a machine address value, we end
up giving the hypervisor the wrong value.

This workaround makes it work.. but it is the ugly green-puke spotted
duct-tape variant.

So I think this idea of modifying the ctxt->cr3 to without modifying
the resume_point is a no-go. Len suggested at some point doing this
in the resume point:

-- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -83,12 +83,16 @@ resume_point:
        movq    $saved_context, %rax
        movq    saved_context_cr4(%rax), %rbx
        movq    %rbx, %cr4
+
+       cmp     $0x0, saved_context_skip(%rax)
+       jne     skip
        movq    saved_context_cr3(%rax), %rbx
        movq    %rbx, %cr3
        movq    saved_context_cr2(%rax), %rbx
        movq    %rbx, %cr2
        movq    saved_context_cr0(%rax), %rbx
        movq    %rbx, %cr0
+skip:
        pushq   pt_regs_flags(%rax)
        popfq
        movq    pt_regs_sp(%rax), %rsp

and that makes it work too. Surprisingly it also makes it work on
baremetal! (Note: Only tested right now on AMD).

anyhow, here is the duct-tape:

commit 09194f091d1eaef7b1aac0289f46acd7cc8c0845
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date:   Fri Oct 19 13:41:10 2012 -0400

    xen/acpi: Workaround movq X, %cr3 and write_cr3 pvops call using same value.
    
    This is a quirk to work-around the discontinuity of the
    x86_lowlevel_suspend code on x86_64. In it, after calling
    acpi_suspend it calls resume_point, which restores the registers
    and one of them is a movq X, %cr3. The movq X in that case needs
    to be machine address. Later on it calls to  restore_processor_state()
    which uses the pvops variant (write_cr3) - which expects X to be
    physical address. This function restores the cr3 value to be a
    physical address to allow the pvops variant (write_cr3) to work.
    
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fd1f3dd..dfe1332 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1082,7 +1082,20 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 		if (smp_processor_id() == 0)
 			xen_set_pat(((u64)high << 32) | low);
 		break;
-
+#if defined(CONFIG_X86_64)
+	case MSR_EFER:
+		/* Piggyback on the fact that in powers/cpu.c we do
+ 		 * a wrmsr before the paravirt write_cr3. The cr3 value at that
+ 		 * stage in saved_context.cr3 is a machine address instead of
+ 		 * the physical address (this is done in drivers/xen/acpi.c to
+ 		 * compensate for 'return_point' in wakeup_64.S doing an:
+ 		 * movw %ebx, cr3). Anyhow, we piggy back here to reload the
+ 		 * cr3 value of the saved_context. This is done b/c otherwise
+ 		 * xen_read_cr3 will try to find the cr3 for the user-space
+ 		 * case - and feed it to the hypercall (which would fail).
+ 		 */
+		xen_acpi_reload_cr3_value();
+#endif
 	default:
 		ret = native_write_msr_safe(msr, low, high);
 	}
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 25e612c..a97414d 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -40,8 +40,22 @@
 #ifdef CONFIG_X86_64
 #include <asm/suspend_64.h>
 extern struct saved_context saved_context;
-#endif
 
+/*
+ * This is a quirk to work-around the discontinuity of the
+ * x86_lowlevel_suspend code on x86_64. In it, after calling
+ * acpi_suspend it calls resume_point, which restores the registers
+ * and one of them is a movq X, %cr3. The movq X in that case needs
+ * to be machine address. Later on it calls to  restore_processor_state()
+ * which uses the pvops variant (write_cr3) - which expects X to be
+ * physical address. This function restores the cr3 value to be a
+ * physical address to allow the pvops variant (write_cr3) to work.
+ */
+void xen_acpi_reload_cr3_value(void)
+{
+	saved_context.cr3 = read_cr3();
+}
+#endif
 int xen_acpi_notify_hypervisor_state(u8 sleep_state,
 				     u32 pm1a_cnt, u32 pm1b_cnt)
 {
@@ -71,6 +85,11 @@ int xen_acpi_notify_hypervisor_state(u8 sleep_state,
 	{
 		unsigned long mfn;
 
+		/* Flushes out any multi-calls. */
+		arch_flush_lazy_mmu_mode();
+
+		/* Re-reads the CR3 in case of pending multicalls */
+		saved_context.cr3 = read_cr3();
 		/* resume_point in wakeup_64.s barrels through and does:
 		 * movq    saved_context_cr3(%rax), %rbx
 		 * movq    %rbx, %cr3
@@ -80,6 +99,14 @@ int xen_acpi_notify_hypervisor_state(u8 sleep_state,
 		/* and back to a physical address */
 		mfn = PFN_PHYS(mfn);
 		saved_context.cr3 = mfn;
+
+		/* Sadly, this has the end result that if we the resume code
+ 		 * does the movq X, %cr3 and then later uses the X value to do
+ 		 * an pvops call (write_cr3), we have a discontinuity.
+ 		 * The movq expects a machine frame address while the pvops call
+ 		 * expects a physical frame address. We fix this up with
+ 		 * xen_acpi_reload_cr3_quirk which we put in wrmsr code.
+ 		 */
 	}
 #endif
 	HYPERVISOR_dom0_op(&op);
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index 48a9c01..ccf26f1 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -42,7 +42,9 @@
 
 int xen_acpi_notify_hypervisor_state(u8 sleep_state,
 				     u32 pm1a_cnt, u32 pm1b_cnd);
-
+#ifdef CONFIG_X86_64
+void xen_acpi_reload_cr3_value(void);
+#endif
 static inline void xen_acpi_sleep_register(void)
 {
 	if (xen_initial_domain())
@@ -53,6 +55,11 @@ static inline void xen_acpi_sleep_register(void)
 static inline void xen_acpi_sleep_register(void)
 {
 }
+#ifdef CONFIG_X86_64
+static inline void xen_acpi_reload_cr3_value(void)
+{
+}
+#endif
 #endif
 
 #endif	/* _XEN_ACPI_H */

  reply	other threads:[~2013-01-18  3:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-17 13:49 [RFC] ACPI S3 and Xen (suprisingly small\!) Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 1/4] x86/wakeup/sleep: Check whether the TSS GDT descriptor is empty before using it Konrad Rzeszutek Wilk
2012-10-18  0:03   ` H. Peter Anvin
2012-10-18 14:47     ` Konrad Rzeszutek Wilk
2012-10-18 15:01       ` H. Peter Anvin
2013-01-17 14:41     ` Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 2/4] xen/lowlevel: Implement pvop call for load_idt (sidt) Konrad Rzeszutek Wilk
2012-10-17 23:51   ` H. Peter Anvin
2012-10-18 14:45     ` Konrad Rzeszutek Wilk
2012-10-18 15:02       ` H. Peter Anvin
2013-01-17 14:36     ` Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 3/4] xen/lowlevel: Implement pvop call for store_gdt (gidt) Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 4/4] xen/acpi: Prep saved_context cr3 values Konrad Rzeszutek Wilk
2013-01-17 14:48   ` Konrad Rzeszutek Wilk [this message]
2012-10-17 16:03 ` [RFC] ACPI S3 and Xen (suprisingly small\!) H. Peter Anvin
2012-10-17 16:10   ` Is: axe read_tscp pvops call. Was: " Konrad Rzeszutek Wilk
2012-10-17 16:39     ` Konrad Rzeszutek Wilk
2012-10-17 16:54       ` H. Peter Anvin
2012-10-17 16:50     ` H. Peter Anvin
2012-10-17 16:54       ` Konrad Rzeszutek Wilk
2012-10-17 17:35         ` H. Peter Anvin
2012-10-18 15:22           ` [Xen-devel] " Dan Magenheimer
2012-10-18 15:28             ` H. Peter Anvin
2012-10-18 15:56               ` Dan Magenheimer
2012-10-18 16:17                 ` Borislav Petkov
2012-10-18 16:44                   ` Stefano Stabellini
2012-10-18 17:04                     ` H. Peter Anvin
2012-10-18 16:37                 ` H. Peter Anvin
2012-10-19 15:48                   ` Is: Xen architecture document. Was: " Konrad Rzeszutek Wilk
2012-10-19 17:45                     ` H. Peter Anvin
2012-10-18 16:31         ` David Vrabel
2012-10-18 16:31           ` David Vrabel
2012-10-18 17:42           ` Konrad Rzeszutek Wilk
2012-10-18 18:02             ` David Vrabel
2012-10-17 17:46 ` Ben Guthro
2012-10-17 17:43   ` Konrad Rzeszutek Wilk
2012-10-17 18:00     ` Ben Guthro
2012-10-19 18:49       ` Konrad Rzeszutek Wilk
2012-10-20  1:23         ` Ben Guthro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130117144835.GC2552@phenom.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=hpa@zytor.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.