From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: linux-kernel@vger.kernel.org, xen-devel@lists.xensource.com,
lenb@kernel.org, linux-acpi@vger.kernel.org, hpa@zytor.com,
x86@kernel.org
Subject: Re: [PATCH 4/4] xen/acpi: Prep saved_context cr3 values.
Date: Thu, 17 Jan 2013 09:48:35 -0500 [thread overview]
Message-ID: <20130117144835.GC2552@phenom.dumpdata.com> (raw)
In-Reply-To: <1350481786-4969-5-git-send-email-konrad.wilk@oracle.com>
On Wed, Oct 17, 2012 at 09:49:46AM -0400, Konrad Rzeszutek Wilk wrote:
> When save_processor_state is executed it executes a bunch of
> pvops calls to save the CPU state values. When it comes back
> from Xen's S3 (so acpi_enter_sleep_state, which ends up calling
> xen_acpi_notify_hypervisor_state), it ends up back in the
> assembler code in wakeup_[32|64].S. It skips the wakeup
> calls (so wakeup_pmode_return and wakeup_long64) as that has
> been done in the hypervisor. Instead it continues on in
> the resume_point (64-bit) or ret_point (32-bit). Most of the
> calls in there are safe to be executed except when it comes to
> reloading of cr3 (which it only does on 64-bit kernels). Since
> they are native assembler calls and Xen expects a PV kernel to
> prep those to use machine address for cr3 that is what
> we are going to do. Note: that it is not Machine Frame Numbers
> (those are used in the MMUEXT_NEW_BASEPTR hypercall for cr3
> installation) but the machine physical address.
>
> When the assembler code executes this mov %ebx, cr3 it ends
> end up trapped in the hypervisor (traps.c) which properly now
> sets the cr3 value.
And then I found out that after this nice
mov %ebx,cr3
in the assembler code (resume_point), it ends up calling
__restore_processor_state later on which does:
write_cr3(ctxt->cr3);
and since that is a pvops call which expects physical address and in
this patch we modified the ctxt->cr3 to be a machine address value, we end
up giving the hypervisor the wrong value.
This workaround makes it work.. but it is the ugly green-puke spotted
duct-tape variant.
So I think this idea of modifying the ctxt->cr3 to without modifying
the resume_point is a no-go. Len suggested at some point doing this
in the resume point:
-- a/arch/x86/kernel/acpi/wakeup_64.S
+++ b/arch/x86/kernel/acpi/wakeup_64.S
@@ -83,12 +83,16 @@ resume_point:
movq $saved_context, %rax
movq saved_context_cr4(%rax), %rbx
movq %rbx, %cr4
+
+ cmp $0x0, saved_context_skip(%rax)
+ jne skip
movq saved_context_cr3(%rax), %rbx
movq %rbx, %cr3
movq saved_context_cr2(%rax), %rbx
movq %rbx, %cr2
movq saved_context_cr0(%rax), %rbx
movq %rbx, %cr0
+skip:
pushq pt_regs_flags(%rax)
popfq
movq pt_regs_sp(%rax), %rsp
and that makes it work too. Surprisingly it also makes it work on
baremetal! (Note: Only tested right now on AMD).
anyhow, here is the duct-tape:
commit 09194f091d1eaef7b1aac0289f46acd7cc8c0845
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri Oct 19 13:41:10 2012 -0400
xen/acpi: Workaround movq X, %cr3 and write_cr3 pvops call using same value.
This is a quirk to work-around the discontinuity of the
x86_lowlevel_suspend code on x86_64. In it, after calling
acpi_suspend it calls resume_point, which restores the registers
and one of them is a movq X, %cr3. The movq X in that case needs
to be machine address. Later on it calls to restore_processor_state()
which uses the pvops variant (write_cr3) - which expects X to be
physical address. This function restores the cr3 value to be a
physical address to allow the pvops variant (write_cr3) to work.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index fd1f3dd..dfe1332 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1082,7 +1082,20 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
if (smp_processor_id() == 0)
xen_set_pat(((u64)high << 32) | low);
break;
-
+#if defined(CONFIG_X86_64)
+ case MSR_EFER:
+ /* Piggyback on the fact that in powers/cpu.c we do
+ * a wrmsr before the paravirt write_cr3. The cr3 value at that
+ * stage in saved_context.cr3 is a machine address instead of
+ * the physical address (this is done in drivers/xen/acpi.c to
+ * compensate for 'return_point' in wakeup_64.S doing an:
+ * movw %ebx, cr3). Anyhow, we piggy back here to reload the
+ * cr3 value of the saved_context. This is done b/c otherwise
+ * xen_read_cr3 will try to find the cr3 for the user-space
+ * case - and feed it to the hypercall (which would fail).
+ */
+ xen_acpi_reload_cr3_value();
+#endif
default:
ret = native_write_msr_safe(msr, low, high);
}
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 25e612c..a97414d 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -40,8 +40,22 @@
#ifdef CONFIG_X86_64
#include <asm/suspend_64.h>
extern struct saved_context saved_context;
-#endif
+/*
+ * This is a quirk to work-around the discontinuity of the
+ * x86_lowlevel_suspend code on x86_64. In it, after calling
+ * acpi_suspend it calls resume_point, which restores the registers
+ * and one of them is a movq X, %cr3. The movq X in that case needs
+ * to be machine address. Later on it calls to restore_processor_state()
+ * which uses the pvops variant (write_cr3) - which expects X to be
+ * physical address. This function restores the cr3 value to be a
+ * physical address to allow the pvops variant (write_cr3) to work.
+ */
+void xen_acpi_reload_cr3_value(void)
+{
+ saved_context.cr3 = read_cr3();
+}
+#endif
int xen_acpi_notify_hypervisor_state(u8 sleep_state,
u32 pm1a_cnt, u32 pm1b_cnt)
{
@@ -71,6 +85,11 @@ int xen_acpi_notify_hypervisor_state(u8 sleep_state,
{
unsigned long mfn;
+ /* Flushes out any multi-calls. */
+ arch_flush_lazy_mmu_mode();
+
+ /* Re-reads the CR3 in case of pending multicalls */
+ saved_context.cr3 = read_cr3();
/* resume_point in wakeup_64.s barrels through and does:
* movq saved_context_cr3(%rax), %rbx
* movq %rbx, %cr3
@@ -80,6 +99,14 @@ int xen_acpi_notify_hypervisor_state(u8 sleep_state,
/* and back to a physical address */
mfn = PFN_PHYS(mfn);
saved_context.cr3 = mfn;
+
+ /* Sadly, this has the end result that if we the resume code
+ * does the movq X, %cr3 and then later uses the X value to do
+ * an pvops call (write_cr3), we have a discontinuity.
+ * The movq expects a machine frame address while the pvops call
+ * expects a physical frame address. We fix this up with
+ * xen_acpi_reload_cr3_quirk which we put in wrmsr code.
+ */
}
#endif
HYPERVISOR_dom0_op(&op);
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index 48a9c01..ccf26f1 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -42,7 +42,9 @@
int xen_acpi_notify_hypervisor_state(u8 sleep_state,
u32 pm1a_cnt, u32 pm1b_cnd);
-
+#ifdef CONFIG_X86_64
+void xen_acpi_reload_cr3_value(void);
+#endif
static inline void xen_acpi_sleep_register(void)
{
if (xen_initial_domain())
@@ -53,6 +55,11 @@ static inline void xen_acpi_sleep_register(void)
static inline void xen_acpi_sleep_register(void)
{
}
+#ifdef CONFIG_X86_64
+static inline void xen_acpi_reload_cr3_value(void)
+{
+}
+#endif
#endif
#endif /* _XEN_ACPI_H */
next prev parent reply other threads:[~2013-01-18 3:49 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-17 13:49 [RFC] ACPI S3 and Xen (suprisingly small\!) Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 1/4] x86/wakeup/sleep: Check whether the TSS GDT descriptor is empty before using it Konrad Rzeszutek Wilk
2012-10-18 0:03 ` H. Peter Anvin
2012-10-18 14:47 ` Konrad Rzeszutek Wilk
2012-10-18 15:01 ` H. Peter Anvin
2013-01-17 14:41 ` Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 2/4] xen/lowlevel: Implement pvop call for load_idt (sidt) Konrad Rzeszutek Wilk
2012-10-17 23:51 ` H. Peter Anvin
2012-10-18 14:45 ` Konrad Rzeszutek Wilk
2012-10-18 15:02 ` H. Peter Anvin
2013-01-17 14:36 ` Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 3/4] xen/lowlevel: Implement pvop call for store_gdt (gidt) Konrad Rzeszutek Wilk
2012-10-17 13:49 ` [PATCH 4/4] xen/acpi: Prep saved_context cr3 values Konrad Rzeszutek Wilk
2013-01-17 14:48 ` Konrad Rzeszutek Wilk [this message]
2012-10-17 16:03 ` [RFC] ACPI S3 and Xen (suprisingly small\!) H. Peter Anvin
2012-10-17 16:10 ` Is: axe read_tscp pvops call. Was: " Konrad Rzeszutek Wilk
2012-10-17 16:39 ` Konrad Rzeszutek Wilk
2012-10-17 16:54 ` H. Peter Anvin
2012-10-17 16:50 ` H. Peter Anvin
2012-10-17 16:54 ` Konrad Rzeszutek Wilk
2012-10-17 17:35 ` H. Peter Anvin
2012-10-18 15:22 ` [Xen-devel] " Dan Magenheimer
2012-10-18 15:28 ` H. Peter Anvin
2012-10-18 15:56 ` Dan Magenheimer
2012-10-18 16:17 ` Borislav Petkov
2012-10-18 16:44 ` Stefano Stabellini
2012-10-18 17:04 ` H. Peter Anvin
2012-10-18 16:37 ` H. Peter Anvin
2012-10-19 15:48 ` Is: Xen architecture document. Was: " Konrad Rzeszutek Wilk
2012-10-19 17:45 ` H. Peter Anvin
2012-10-18 16:31 ` David Vrabel
2012-10-18 16:31 ` David Vrabel
2012-10-18 17:42 ` Konrad Rzeszutek Wilk
2012-10-18 18:02 ` David Vrabel
2012-10-17 17:46 ` Ben Guthro
2012-10-17 17:43 ` Konrad Rzeszutek Wilk
2012-10-17 18:00 ` Ben Guthro
2012-10-19 18:49 ` Konrad Rzeszutek Wilk
2012-10-20 1:23 ` Ben Guthro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130117144835.GC2552@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=hpa@zytor.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.