All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0 of 6] Fix kexec in Xen (take 3)
@ 2011-05-25 14:32 Andrew Cooper
  2011-05-25 14:32 ` [PATCH 1 of 6] APIC: record local APIC state on boot Andrew Cooper
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This set of patches is designed to get the kexec path working again on Xen 4.x

kdump kernels can't boot if x2apic mode is enabled and the ACPI tables dont state this fact.  They also cant boot at all with interrupt remapping enabled.

These patches cause xen to track the BSP local APIC boot state and return to it before kexec'ing to a new kernel.  It also makes sure to disable IO virtualisation.

One area which is problematic is disabling interrupt remapping.  lapic_suspend() calls iommu_disable_x2apic_IR() which in a previous thread was deemed to be Intel specific and only works by chance on AMD boxes by effectivly being a NOP.  As lapic_suspend() is generic code, does this mean that we can't/don't ever disable interrupt remapping on AMD boxes?

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1 of 6] APIC: record local APIC state on boot
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
@ 2011-05-25 14:32 ` Andrew Cooper
  2011-05-25 14:32 ` [PATCH 2 of 6] APIC: remove 'enabled_via_apicbase' variable Andrew Cooper
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Xen does not store the boot local APIC state which leads to problems
when shutting down for a kexec jump.  This patch records the boot
state so we can return to the boot state when kexec'ing.

Section 3.8 of the MultiProcessor spec states that when the bios hands
over to the operating system, all AP locals APICs should be disabled, and
the BSP local APIC is up to the bios with regards to compatability.
Therefore, we only need to record the BSP mode.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 37c77bacb52a -r 8d30fccc0771 xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Mon May 23 17:38:28 2011 +0100
+++ b/xen/arch/x86/apic.c	Wed May 25 15:10:24 2011 +0100
@@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
 static bool_t __initdata opt_x2apic = 1;
 boolean_param("x2apic", opt_x2apic);
 
+enum apic_mode __read_mostly apic_boot_mode = APIC_MODE_INVALID;
+
 bool_t __read_mostly x2apic_enabled = 0;
 bool_t __read_mostly directed_eoi_enabled = 0;
 
@@ -1437,8 +1439,64 @@ int __init APIC_init_uniprocessor (void)
     return 0;
 }
 
+/* Needs to be called once per CPU during startup.  It records the state the BIOS
+ * leaves the local APIC so we can tare back down upon shutdown/crash
+ */
+void __init record_boot_APIC_mode(void)
+{
+
+    /* Sanity check - we should only ever run once, but could possibly be called
+     * several times */
+    if( APIC_MODE_INVALID != apic_boot_mode )
+        return;
+
+    apic_boot_mode = current_local_apic_mode();
+
+    apic_printk(APIC_DEBUG, "APIC boot state is '%s' on core #%d\n",
+                apic_mode_to_str(apic_boot_mode), smp_processor_id());
+}
+
+/* Look at the bits in MSR_IA32_APICBASE and work out which APIC mode we are in */
+const enum apic_mode current_local_apic_mode(void)
+{
+    u64 msr_contents;
+
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+
+    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
+         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
+        return APIC_MODE_X2APIC;
+    else
+        {
+            /* EN bit should always be valid as long as we can read the MSR
+             */
+            if ( msr_contents & MSR_IA32_APICBASE_ENABLE )
+                return APIC_MODE_XAPIC;
+            else
+                return APIC_MODE_DISABLED;
+        }
+}
+
 void check_for_unexpected_msi(unsigned int vector)
 {
     unsigned long v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
     BUG_ON(v & (1 << (vector & 0x1f)));
 }
+
+const char * apic_mode_to_str(const enum apic_mode mode)
+{
+    switch(mode)
+        {
+        case APIC_MODE_INVALID:
+            return "invalid";
+        case APIC_MODE_DISABLED:
+            return "disabled";
+        case APIC_MODE_XAPIC:
+            return "xapic";
+        case APIC_MODE_X2APIC:
+            return "x2apic";
+        default:
+            return "unrecognised";
+        }
+}
diff -r 37c77bacb52a -r 8d30fccc0771 xen/arch/x86/genapic/probe.c
--- a/xen/arch/x86/genapic/probe.c	Mon May 23 17:38:28 2011 +0100
+++ b/xen/arch/x86/genapic/probe.c	Wed May 25 15:10:24 2011 +0100
@@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
 { 
 	int i, changed;
 
+    record_boot_APIC_mode();
+
 	check_x2apic_preenabled();
 	cmdline_apic = changed = (genapic != NULL);
 
diff -r 37c77bacb52a -r 8d30fccc0771 xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h	Mon May 23 17:38:28 2011 +0100
+++ b/xen/include/asm-x86/apic.h	Wed May 25 15:10:24 2011 +0100
@@ -21,6 +21,19 @@
 #define IO_APIC_REDIR_DEST_LOGICAL	0x00800
 #define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
 
+/* Possible APIC states */
+enum apic_mode { APIC_MODE_INVALID,  /* Not set yet */
+                 APIC_MODE_DISABLED, /* Some bioses disable by default for compatability */
+                 APIC_MODE_XAPIC,    /* xAPIC mode - default upon chipset reset */
+                 APIC_MODE_X2APIC    /* x2APIC mode - common for large smp machines */
+};
+
+/* Bootstrap processor local APIC boot mode - so we can taredown to bios state */
+extern enum apic_mode apic_boot_mode;
+
+/* enum apic_mode -> str function for logging/debug */
+const char * apic_mode_to_str(const enum apic_mode);
+
 extern u8 apic_verbosity;
 extern bool_t x2apic_enabled;
 extern bool_t directed_eoi_enabled;
@@ -203,6 +216,8 @@ extern void disable_APIC_timer(void);
 extern void enable_APIC_timer(void);
 extern int lapic_suspend(void);
 extern int lapic_resume(void);
+extern void record_boot_APIC_mode(void);
+extern const enum apic_mode current_local_apic_mode(void);
 
 extern int check_nmi_watchdog (void);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2 of 6] APIC: remove 'enabled_via_apicbase' variable
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
  2011-05-25 14:32 ` [PATCH 1 of 6] APIC: record local APIC state on boot Andrew Cooper
@ 2011-05-25 14:32 ` Andrew Cooper
  2011-05-25 14:32 ` [PATCH 3 of 6] APIC: add crash_disable_local_APIC Andrew Cooper
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

The use of this varable was only sensible when the choice for local APIC
mode was between enabled or disabled.  Now that x2apic is about, it is
wrong, and causes a protection fault in certain cases when trying to tear
down x2apic mode.

The only place where its use is relevent in the code is in
disable_local_APIC which is very out of date.  It has now been updated
to be in line with the MultiProcessor Spec, as well as being able
to successfully shut down from x2apic mode without the possibility of
throwing a protection fault (which leads to a general protection fault).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 8d30fccc0771 -r 17991cc69e88 xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Wed May 25 15:10:24 2011 +0100
+++ b/xen/arch/x86/apic.c	Wed May 25 15:10:52 2011 +0100
@@ -165,8 +165,6 @@ void __init apic_intr_init(void)
 /* Using APIC to generate smp_local_timer_interrupt? */
 static bool_t __read_mostly using_apic_timer;
 
-static bool_t __read_mostly enabled_via_apicbase;
-
 int get_physical_broadcast(void)
 {
     if (modern_apic())
@@ -330,6 +328,8 @@ void disconnect_bsp_APIC(int virt_wire_s
 
 void disable_local_APIC(void)
 {
+    u64 msr_contents;
+
     clear_local_APIC();
 
     /*
@@ -339,10 +339,39 @@ void disable_local_APIC(void)
     apic_write_around(APIC_SPIV,
         apic_read(APIC_SPIV) & ~APIC_SPIV_APIC_ENABLED);
 
-    if (enabled_via_apicbase) {
-        uint64_t msr_content;
-        rdmsrl(MSR_IA32_APICBASE, msr_content);
-        wrmsrl(MSR_IA32_APICBASE, msr_content & ~MSR_IA32_APICBASE_ENABLE);
+    /* Always disable the APIC */
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+    msr_contents &= ~ ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
+    wrmsrl(MSR_IA32_APICBASE, msr_contents);
+
+    /* If we are the boot processor, stick the local apic back to how we found
+     * it on boot. */
+    if( smp_processor_id() != 0 )
+    {
+
+        switch(apic_boot_mode)
+        {
+        case APIC_MODE_DISABLED:
+            break; /* Nothing to do - we did this above */
+        case APIC_MODE_XAPIC:
+        {
+            msr_contents |= MSR_IA32_APICBASE_ENABLE;
+            wrmsrl(MSR_IA32_APICBASE, msr_contents);
+            break;
+        }
+        case APIC_MODE_X2APIC:
+        {
+            msr_contents |= ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
+            wrmsrl(MSR_IA32_APICBASE, msr_contents);
+            break;
+        }
+        default:
+        {
+            printk("Hit default case when reverting lapic to boot state on core #%d\n",
+                   smp_processor_id());
+            break;
+        }
+        }
     }
 }
 
@@ -874,7 +903,6 @@ static int __init detect_init_APIC (void
             wrmsrl(MSR_IA32_APICBASE,
                 msr_content | MSR_IA32_APICBASE_ENABLE
                 | APIC_DEFAULT_PHYS_BASE);
-            enabled_via_apicbase = 1;
         }
     }
     /*

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3 of 6] APIC: add crash_disable_local_APIC
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
  2011-05-25 14:32 ` [PATCH 1 of 6] APIC: record local APIC state on boot Andrew Cooper
  2011-05-25 14:32 ` [PATCH 2 of 6] APIC: remove 'enabled_via_apicbase' variable Andrew Cooper
@ 2011-05-25 14:32 ` Andrew Cooper
  2011-05-25 14:32 ` [PATCH 4 of 6] IOMMU: Sanitise some of our pointer work Andrew Cooper
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This new function is needed because on an SMP system, we are
not likely to be crashing on the BootStrap Processor.  As a result,
we need to leave the crashing CPU in the state that the BSP was
booted with, so that the kdump kernel can take over as if it
were booting on the real BSP.

diff -r 17991cc69e88 -r 2f91c312ade5 xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Wed May 25 15:10:52 2011 +0100
+++ b/xen/arch/x86/apic.c	Wed May 25 15:11:58 2011 +0100
@@ -375,6 +375,56 @@ void disable_local_APIC(void)
     }
 }
 
+void crash_disable_local_APIC(bool_t crashing_cpu)
+{
+    u64 msr_contents;
+
+    clear_local_APIC();
+
+    /*
+     * Disable APIC (implies clearing of registers
+     * for 82489DX!).
+     */
+    apic_write_around(APIC_SPIV,
+        apic_read(APIC_SPIV) & ~APIC_SPIV_APIC_ENABLED);
+
+    /* Always disable the APIC */
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+    msr_contents &= ~ ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
+    wrmsrl(MSR_IA32_APICBASE, msr_contents);
+
+    /* If we are the crashing processor, stick the local apic back to how we found
+     * it on boot, on the bootstrap processor. The crashdump kernel knows that it
+     * may not be booting on the real bootstrap processor. */
+    if( crashing_cpu )
+    {
+        switch(apic_boot_mode)
+        {
+        case APIC_MODE_DISABLED:
+            break; /* Nothing to do - we did this above */
+        case APIC_MODE_XAPIC:
+        {
+            msr_contents |= MSR_IA32_APICBASE_ENABLE;
+            wrmsrl(MSR_IA32_APICBASE, msr_contents);
+            break;
+        }
+        case APIC_MODE_X2APIC:
+        {
+            msr_contents |= ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
+            wrmsrl(MSR_IA32_APICBASE, msr_contents);
+            break;
+        }
+        default:
+        {
+            printk("Hit default case when reverting lapic to boot state on core #%d\n",
+                   smp_processor_id());
+            break;
+        }
+        }
+    }
+}
+
+
 /*
  * This is to verify that we're looking at a real local APIC.
  * Check these against your board if the CPUs aren't getting
diff -r 17991cc69e88 -r 2f91c312ade5 xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h	Wed May 25 15:10:52 2011 +0100
+++ b/xen/include/asm-x86/apic.h	Wed May 25 15:11:58 2011 +0100
@@ -195,6 +195,7 @@ extern void clear_local_APIC(void);
 extern void connect_bsp_APIC (void);
 extern void disconnect_bsp_APIC (int virt_wire_setup);
 extern void disable_local_APIC (void);
+extern void crash_disable_local_APIC (bool_t crashing_cpu);
 extern int verify_local_APIC (void);
 extern void cache_APIC_registers (void);
 extern void sync_Arb_IDs (void);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 4 of 6] IOMMU: Sanitise some of our pointer work
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
                   ` (2 preceding siblings ...)
  2011-05-25 14:32 ` [PATCH 3 of 6] APIC: add crash_disable_local_APIC Andrew Cooper
@ 2011-05-25 14:32 ` Andrew Cooper
  2011-05-25 17:28   ` Konrad Rzeszutek Wilk
  2011-05-25 14:32 ` [PATCH 5 of 6] IOMMU: add crash_shutdown iommu_op Andrew Cooper
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This is not related to the rest of my kdump changes, but as pointed
out by Konrad in a previous thread, we really should make these checks
before blindly calling them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 2f91c312ade5 -r 80f7a773887d xen/drivers/passthrough/iommu.c
--- a/xen/drivers/passthrough/iommu.c	Wed May 25 15:11:58 2011 +0100
+++ b/xen/drivers/passthrough/iommu.c	Wed May 25 15:12:23 2011 +0100
@@ -407,17 +407,17 @@ unsigned int iommu_read_apic_from_ire(un
     return ops->read_apic_from_ire(apic, reg);
 }
 
-void iommu_resume()
+void iommu_resume(void)
 {
     const struct iommu_ops *ops = iommu_get_ops();
-    if ( iommu_enabled )
+    if ( iommu_enabled && ops && ops->resume )
         ops->resume();
 }
 
-void iommu_suspend()
+void iommu_suspend(void)
 {
     const struct iommu_ops *ops = iommu_get_ops();
-    if ( iommu_enabled )
+    if ( iommu_enabled && ops && ops->resume )
         ops->suspend();
 }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 5 of 6] IOMMU: add crash_shutdown iommu_op
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
                   ` (3 preceding siblings ...)
  2011-05-25 14:32 ` [PATCH 4 of 6] IOMMU: Sanitise some of our pointer work Andrew Cooper
@ 2011-05-25 14:32 ` Andrew Cooper
  2011-05-25 14:32 ` [PATCH 6 of 6] KEXEC: disable iommu jumping into the kdump kernel Andrew Cooper
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

The kdump kernel has problems booting with interrupt/dma
remapping enabled, so we need a new iommu_ops called
crash_shutdown which is basically suspend but doesn't
need to bother saving state.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 80f7a773887d -r d3026545e9a0 xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c	Wed May 25 15:12:23 2011 +0100
+++ b/xen/drivers/passthrough/amd/iommu_init.c	Wed May 25 15:12:37 2011 +0100
@@ -921,6 +921,14 @@ void amd_iommu_suspend(void)
         disable_iommu(iommu);
 }
 
+void amd_iommu_crash_shutdown(void)
+{
+    struct amd_iommu *iommu;
+
+    for_each_amd_iommu ( iommu )
+        disable_iommu(iommu);
+}
+
 void amd_iommu_resume(void)
 {
     struct amd_iommu *iommu;
diff -r 80f7a773887d -r d3026545e9a0 xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c	Wed May 25 15:12:23 2011 +0100
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c	Wed May 25 15:12:37 2011 +0100
@@ -459,4 +459,5 @@ const struct iommu_ops amd_iommu_ops = {
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .share_p2m = amd_iommu_share_p2m,
+    .crash_shutdown = amd_iommu_crash_shutdown,
 };
diff -r 80f7a773887d -r d3026545e9a0 xen/drivers/passthrough/iommu.c
--- a/xen/drivers/passthrough/iommu.c	Wed May 25 15:12:23 2011 +0100
+++ b/xen/drivers/passthrough/iommu.c	Wed May 25 15:12:37 2011 +0100
@@ -421,6 +421,13 @@ void iommu_suspend(void)
         ops->suspend();
 }
 
+void iommu_crash_shutdown(void)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    if ( ops && ops->crash_shutdown )
+        ops->crash_shutdown();
+}
+
 void iommu_share_p2m_table(struct domain* d)
 {
     const struct iommu_ops *ops = iommu_get_ops();
diff -r 80f7a773887d -r d3026545e9a0 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c	Wed May 25 15:12:23 2011 +0100
+++ b/xen/drivers/passthrough/vtd/iommu.c	Wed May 25 15:12:37 2011 +0100
@@ -2238,6 +2238,25 @@ static void vtd_suspend(void)
     }
 }
 
+static void vtd_crash_shutdown(void)
+{
+    struct acpi_drhd_unit *drhd;
+    struct iommu *iommu;
+
+    if ( !iommu_enabled )
+        return;
+
+    iommu_flush_all();
+
+    for_each_drhd_unit ( drhd )
+    {
+        iommu = drhd->iommu;
+        iommu_disable_translation(iommu);
+    }
+
+    iommu_disable_x2apic_IR();
+}
+
 static void vtd_resume(void)
 {
     struct acpi_drhd_unit *drhd;
@@ -2289,6 +2308,7 @@ const struct iommu_ops intel_iommu_ops =
     .suspend = vtd_suspend,
     .resume = vtd_resume,
     .share_p2m = iommu_set_pgd,
+    .crash_shutdown = vtd_crash_shutdown,
 };
 
 /*
diff -r 80f7a773887d -r d3026545e9a0 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h	Wed May 25 15:12:23 2011 +0100
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h	Wed May 25 15:12:37 2011 +0100
@@ -98,6 +98,7 @@ unsigned int amd_iommu_read_ioapic_from_
 /* power management support */
 void amd_iommu_resume(void);
 void amd_iommu_suspend(void);
+void amd_iommu_crash_shutdown(void);
 
 static inline u32 get_field_from_reg_u32(u32 reg_value, u32 mask, u32 shift)
 {
diff -r 80f7a773887d -r d3026545e9a0 xen/include/xen/iommu.h
--- a/xen/include/xen/iommu.h	Wed May 25 15:12:23 2011 +0100
+++ b/xen/include/xen/iommu.h	Wed May 25 15:12:37 2011 +0100
@@ -133,6 +133,7 @@ struct iommu_ops {
     void (*suspend)(void);
     void (*resume)(void);
     void (*share_p2m)(struct domain *d);
+    void (*crash_shutdown)(void);
 };
 
 void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
@@ -142,6 +143,7 @@ unsigned int iommu_read_apic_from_ire(un
 
 void iommu_suspend(void);
 void iommu_resume(void);
+void iommu_crash_shutdown(void);
 
 void iommu_set_dom0_mapping(struct domain *d);
 void iommu_share_p2m_table(struct domain *d);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 6 of 6] KEXEC: disable iommu jumping into the kdump kernel
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
                   ` (4 preceding siblings ...)
  2011-05-25 14:32 ` [PATCH 5 of 6] IOMMU: add crash_shutdown iommu_op Andrew Cooper
@ 2011-05-25 14:32 ` Andrew Cooper
  2011-05-25 15:01 ` [PATCH 0 of 6] Fix kexec in Xen (take 3) Wei Wang2
  2011-05-25 16:14 ` Jan Beulich
  7 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 14:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

kdump kernels are unable to boot with IOMMU enabled.
this patch disabled IOMMU mode and removes some of the generic
code from the shutdown path which doesnt work after other
CPUs have been shot down.

Because we need to replace the calls to disable_local_APIC,
we remove the calls to __stop_this_cpu and replace them inline,
with suitable modification regarding interrupts and local APICs.

At the bottom of nmi_shootdown_cpus, remove the call to
local_irq_enable as this causes us to jump into purgatory with
the interrupt flag enabled.  From a quick qrep through the current
kexec-tools source, there is a fair amount of time before the
interrupt flag is touched, meaning that we could possibly be
servicing interrupts in the Xen context even though we have really
crashed and left.

hpet_disable_legacy_broadcast has been split sideways into a crash
version which forgoes the locks (as only 1 cpu is still running by
this point), and forgoes the IPI to all other processors.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r d3026545e9a0 -r cb005f9078d3 xen/arch/x86/crash.c
--- a/xen/arch/x86/crash.c	Wed May 25 15:12:37 2011 +0100
+++ b/xen/arch/x86/crash.c	Wed May 25 15:30:43 2011 +0100
@@ -27,6 +27,7 @@
 #include <asm/hvm/support.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
+#include <xen/iommu.h>
 
 static atomic_t waiting_for_crash_ipi;
 static unsigned int crashing_cpu;
@@ -43,7 +44,10 @@ static int crash_nmi_callback(struct cpu
 
     kexec_crash_save_cpu();
 
-    __stop_this_cpu();
+    crash_disable_local_APIC(FALSE);
+    hvm_cpu_down();
+    clts();
+    asm volatile ( "fninit" );
 
     atomic_dec(&waiting_for_crash_ipi);
 
@@ -77,10 +81,20 @@ static void nmi_shootdown_cpus(void)
         msecs--;
     }
 
-    __stop_this_cpu();
+    crash_disable_local_APIC(TRUE);
+    hvm_cpu_down();
+    clts();
+    asm volatile ( "fninit" );
+
+    /* This is a bit of a hack due to the problems with the x2apic_enabled
+     * variable, but we can't do any better without a significant refactoring
+     * of the APIC code */
+    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
+        x2apic_enabled = 1;
+    else
+        x2apic_enabled = 0;
+
     disable_IO_APIC();
-
-    local_irq_enable();
 }
 
 void machine_crash_shutdown(void)
@@ -89,6 +103,10 @@ void machine_crash_shutdown(void)
 
     nmi_shootdown_cpus();
 
+    /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
+     * happy when booting if interrupt/dma remapping is still enabled */
+    iommu_crash_shutdown();
+
     info = kexec_crash_save_info();
     info->xen_phys_start = xen_phys_start;
     info->dom0_pfn_to_mfn_frame_list_list =
diff -r d3026545e9a0 -r cb005f9078d3 xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c	Wed May 25 15:12:37 2011 +0100
+++ b/xen/arch/x86/hpet.c	Wed May 25 15:30:43 2011 +0100
@@ -670,6 +670,32 @@ void hpet_disable_legacy_broadcast(void)
     smp_send_event_check_mask(&cpu_online_map);
 }
 
+/* This function is similar to the regular
+ * hpet_disable_legacy_broadcast function, except it is called
+ * on the crash path with only the current processor up, so we
+ * can forget the locks and really cant send an event check IPI
+ * to the other processors */
+void crash_hpet_disable_legacy_broadcast(void)
+{
+    u32 cfg;
+
+    if ( !hpet_events || !(hpet_events->flags & HPET_EVT_LEGACY) )
+        return;
+
+    hpet_events->flags |= HPET_EVT_DISABLE;
+
+    /* disable HPET T0 */
+    cfg = hpet_read32(HPET_Tn_CFG(0));
+    cfg &= ~HPET_TN_ENABLE;
+    hpet_write32(cfg, HPET_Tn_CFG(0));
+
+    /* Stop HPET legacy interrupts */
+    cfg = hpet_read32(HPET_CFG);
+    cfg &= ~HPET_CFG_LEGACY;
+    hpet_write32(cfg, HPET_CFG);
+
+}
+
 void hpet_broadcast_enter(void)
 {
     unsigned int cpu = smp_processor_id();
diff -r d3026545e9a0 -r cb005f9078d3 xen/arch/x86/machine_kexec.c
--- a/xen/arch/x86/machine_kexec.c	Wed May 25 15:12:37 2011 +0100
+++ b/xen/arch/x86/machine_kexec.c	Wed May 25 15:30:43 2011 +0100
@@ -89,7 +89,7 @@ void machine_kexec(xen_kexec_image_t *im
     };
 
     if ( hpet_broadcast_is_available() )
-        hpet_disable_legacy_broadcast();
+        crash_hpet_disable_legacy_broadcast();
 
     /*
      * compat_machine_kexec() returns to idle pagetables, which requires us
diff -r d3026545e9a0 -r cb005f9078d3 xen/include/asm-x86/hpet.h
--- a/xen/include/asm-x86/hpet.h	Wed May 25 15:12:37 2011 +0100
+++ b/xen/include/asm-x86/hpet.h	Wed May 25 15:30:43 2011 +0100
@@ -73,5 +73,6 @@ void hpet_broadcast_enter(void);
 void hpet_broadcast_exit(void);
 int hpet_broadcast_is_available(void);
 void hpet_disable_legacy_broadcast(void);
+void crash_hpet_disable_legacy_broadcast(void);
 
 #endif /* __X86_HPET_H__ */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0 of 6] Fix kexec in Xen (take 3)
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
                   ` (5 preceding siblings ...)
  2011-05-25 14:32 ` [PATCH 6 of 6] KEXEC: disable iommu jumping into the kdump kernel Andrew Cooper
@ 2011-05-25 15:01 ` Wei Wang2
  2011-05-25 16:14 ` Jan Beulich
  7 siblings, 0 replies; 14+ messages in thread
From: Wei Wang2 @ 2011-05-25 15:01 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

On Wednesday 25 May 2011 16:32:02 Andrew Cooper wrote:
> This set of patches is designed to get the kexec path working again on Xen
> 4.x
>
> kdump kernels can't boot if x2apic mode is enabled and the ACPI tables dont
> state this fact.  They also cant boot at all with interrupt remapping
> enabled.
>
> These patches cause xen to track the BSP local APIC boot state and return
> to it before kexec'ing to a new kernel.  It also makes sure to disable IO
> virtualisation.
>
> One area which is problematic is disabling interrupt remapping. 
> lapic_suspend() calls iommu_disable_x2apic_IR() which in a previous thread
> was deemed to be Intel specific and only works by chance on AMD boxes by
> effectivly being a NOP.  As lapic_suspend() is generic code, does this mean
> that we can't/don't ever disable interrupt remapping on AMD boxes?

Yes, there is no explicit way to disable amd iommu like disable_intremap() for 
vtd. The reason is that interrupt remapping is enabled per device and there 
is no global flag to disable it. We need to visit every device entry and then 
disable per device, but it still sounds doable... Maybe we could make 
disable_intremap() generic for both Intel and AMD?
Thanks,
Wei


> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0 of 6] Fix kexec in Xen (take 3)
  2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
                   ` (6 preceding siblings ...)
  2011-05-25 15:01 ` [PATCH 0 of 6] Fix kexec in Xen (take 3) Wei Wang2
@ 2011-05-25 16:14 ` Jan Beulich
  2011-05-25 16:30   ` Andrew Cooper
  7 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2011-05-25 16:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 25.05.11 at 16:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> This set of patches is designed to get the kexec path working again on Xen 
> 4.x
> 
> kdump kernels can't boot if x2apic mode is enabled and the ACPI tables dont 
> state this fact.  They also cant boot at all with interrupt remapping 
> enabled.
> 
> These patches cause xen to track the BSP local APIC boot state and return to 
> it before kexec'ing to a new kernel.  It also makes sure to disable IO 
> virtualisation.

I was about to reply to the individual patches, but they just seem
too inconsistent to me (comments not matching code, without it
being clear whether code or comment is wrong; functions introduced
that have no callers). Can you work on getting them into a
state suitable for reviewing?

Further I don't buy your pseudo-quoting of the MP spec saying
that secondary CPUs' local APICs have to be disabled. Keir already
pointed out on your previous submission that in order for them to
receive the INIT and Startup IPIs they must be enabled.

Jan

> One area which is problematic is disabling interrupt remapping.  
> lapic_suspend() calls iommu_disable_x2apic_IR() which in a previous thread 
> was deemed to be Intel specific and only works by chance on AMD boxes by 
> effectivly being a NOP.  As lapic_suspend() is generic code, does this mean 
> that we can't/don't ever disable interrupt remapping on AMD boxes?
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0 of 6] Fix kexec in Xen (take 3)
  2011-05-25 16:14 ` Jan Beulich
@ 2011-05-25 16:30   ` Andrew Cooper
  2011-05-25 21:35     ` Keir Fraser
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2011-05-25 16:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel



On 25/05/11 17:14, Jan Beulich wrote:
>>>> On 25.05.11 at 16:32, Andrew Cooper<andrew.cooper3@citrix.com>  wrote:
>> This set of patches is designed to get the kexec path working again on Xen
>> 4.x
>>
>> kdump kernels can't boot if x2apic mode is enabled and the ACPI tables dont
>> state this fact.  They also cant boot at all with interrupt remapping
>> enabled.
>>
>> These patches cause xen to track the BSP local APIC boot state and return to
>> it before kexec'ing to a new kernel.  It also makes sure to disable IO
>> virtualisation.
> I was about to reply to the individual patches, but they just seem
> too inconsistent to me (comments not matching code, without it
> being clear whether code or comment is wrong; functions introduced
> that have no callers). Can you work on getting them into a
> state suitable for reviewing?
I was splitting the patches up to make them smaller and modular.  With 
the patches as a full series, there are no functions without callers.

Which comments don't match the code?
> Further I don't buy your pseudo-quoting of the MP spec saying
> that secondary CPUs' local APICs have to be disabled. Keir already
> pointed out on your previous submission that in order for them to
> receive the INIT and Startup IPIs they must be enabled.
What Keir said and what the MP spec states are in direct contraction.  
Please do correct me if I have misread/misinterpreted the spec, but:

Section 3.8 states that all local APICs are disabled when the BIOS hands 
over to the OS.

and

Section 3.7.3 states that the INIT IPI twiddles the APIC reset lines, 
which enabled them when they come out of reset, thus receiving and 
handling the IPI.

~Andrew
> Jan
>
>> One area which is problematic is disabling interrupt remapping.
>> lapic_suspend() calls iommu_disable_x2apic_IR() which in a previous thread
>> was deemed to be Intel specific and only works by chance on AMD boxes by
>> effectivly being a NOP.  As lapic_suspend() is generic code, does this mean
>> that we can't/don't ever disable interrupt remapping on AMD boxes?
>>
>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4 of 6] IOMMU: Sanitise some of our pointer work
  2011-05-25 14:32 ` [PATCH 4 of 6] IOMMU: Sanitise some of our pointer work Andrew Cooper
@ 2011-05-25 17:28   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-25 17:28 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, May 25, 2011 at 03:32:06PM +0100, Andrew Cooper wrote:
> This is not related to the rest of my kdump changes, but as pointed
> out by Konrad in a previous thread, we really should make these checks
> before blindly calling them.
> 

You might want to change the title.
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r 2f91c312ade5 -r 80f7a773887d xen/drivers/passthrough/iommu.c
> --- a/xen/drivers/passthrough/iommu.c	Wed May 25 15:11:58 2011 +0100
> +++ b/xen/drivers/passthrough/iommu.c	Wed May 25 15:12:23 2011 +0100
> @@ -407,17 +407,17 @@ unsigned int iommu_read_apic_from_ire(un
>      return ops->read_apic_from_ire(apic, reg);
>  }
>  
> -void iommu_resume()
> +void iommu_resume(void)
>  {
>      const struct iommu_ops *ops = iommu_get_ops();
> -    if ( iommu_enabled )
> +    if ( iommu_enabled && ops && ops->resume )
>          ops->resume();
>  }
>  
> -void iommu_suspend()
> +void iommu_suspend(void)
>  {
>      const struct iommu_ops *ops = iommu_get_ops();
> -    if ( iommu_enabled )
> +    if ( iommu_enabled && ops && ops->resume )

resume? Not suspend?
>          ops->suspend();
>  }
>  
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0 of 6] Fix kexec in Xen (take 3)
  2011-05-25 16:30   ` Andrew Cooper
@ 2011-05-25 21:35     ` Keir Fraser
  2011-05-26  9:12       ` Andrew Cooper
  0 siblings, 1 reply; 14+ messages in thread
From: Keir Fraser @ 2011-05-25 21:35 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel

On 25/05/2011 17:30, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:

>> I was about to reply to the individual patches, but they just seem
>> too inconsistent to me (comments not matching code, without it
>> being clear whether code or comment is wrong; functions introduced
>> that have no callers). Can you work on getting them into a
>> state suitable for reviewing?
> I was splitting the patches up to make them smaller and modular.  With
> the patches as a full series, there are no functions without callers.
> 
> Which comments don't match the code?

+    /* If we are the boot processor, stick the local apic back to how we
found
+     * it on boot. */
+    if( smp_processor_id() != 0 )

It's a pretty fundamental one.

>> Further I don't buy your pseudo-quoting of the MP spec saying
>> that secondary CPUs' local APICs have to be disabled. Keir already
>> pointed out on your previous submission that in order for them to
>> receive the INIT and Startup IPIs they must be enabled.
> What Keir said and what the MP spec states are in direct contraction.
> Please do correct me if I have misread/misinterpreted the spec, but:
> 
> Section 3.8 states that all local APICs are disabled when the BIOS hands
> over to the OS.
> 
> and
> 
> Section 3.7.3 states that the INIT IPI twiddles the APIC reset lines,
> which enabled them when they come out of reset, thus receiving and
> handling the IPI.

You are quoting from a spec that is nearly 15 years old, and particularly
addressing 486 and older systems with discrete APICs. Xen has never run on
such systems.

A better reference for APIC behaviour is Chapter 10 of Volume 3A of the
Intel Software Developer Manual. See 10.4.7.1 particularly. The APIC is
software disabled on startup -- meaning that the enable bit in the SPIV
register is clear. That is quite different from *hardware* disable (via the
APICBASE MSR) which your patch attempts to deal with. In this latter case
the APIC would be totally shut down and it would not be possible to
INIT-SIPI the secondary processor. The software disable (via SPIV) is very
much a semi-disabled state (and disable_local_APIC() already returns an APIC
to that state).

 -- Keir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0 of 6] Fix kexec in Xen (take 3)
  2011-05-25 21:35     ` Keir Fraser
@ 2011-05-26  9:12       ` Andrew Cooper
  2011-05-26  9:19         ` Keir Fraser
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2011-05-26  9:12 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel



On 25/05/11 22:35, Keir Fraser wrote:
> On 25/05/2011 17:30, "Andrew Cooper"<andrew.cooper3@citrix.com>  wrote:
>
>>> I was about to reply to the individual patches, but they just seem
>>> too inconsistent to me (comments not matching code, without it
>>> being clear whether code or comment is wrong; functions introduced
>>> that have no callers). Can you work on getting them into a
>>> state suitable for reviewing?
>> I was splitting the patches up to make them smaller and modular.  With
>> the patches as a full series, there are no functions without callers.
>>
>> Which comments don't match the code?
> +    /* If we are the boot processor, stick the local apic back to how we
> found
> +     * it on boot. */
> +    if( smp_processor_id() != 0 )
>
> It's a pretty fundamental one.
Oops - point taken. I should never attempt to do a trivial cleanup of code.
>>> Further I don't buy your pseudo-quoting of the MP spec saying
>>> that secondary CPUs' local APICs have to be disabled. Keir already
>>> pointed out on your previous submission that in order for them to
>>> receive the INIT and Startup IPIs they must be enabled.
>> What Keir said and what the MP spec states are in direct contraction.
>> Please do correct me if I have misread/misinterpreted the spec, but:
>>
>> Section 3.8 states that all local APICs are disabled when the BIOS hands
>> over to the OS.
>>
>> and
>>
>> Section 3.7.3 states that the INIT IPI twiddles the APIC reset lines,
>> which enabled them when they come out of reset, thus receiving and
>> handling the IPI.
> You are quoting from a spec that is nearly 15 years old, and particularly
> addressing 486 and older systems with discrete APICs. Xen has never run on
> such systems.
>
> A better reference for APIC behaviour is Chapter 10 of Volume 3A of the
> Intel Software Developer Manual. See 10.4.7.1 particularly. The APIC is
> software disabled on startup -- meaning that the enable bit in the SPIV
> register is clear. That is quite different from *hardware* disable (via the
> APICBASE MSR) which your patch attempts to deal with. In this latter case
> the APIC would be totally shut down and it would not be possible to
> INIT-SIPI the secondary processor. The software disable (via SPIV) is very
> much a semi-disabled state (and disable_local_APIC() already returns an APIC
> to that state).
>
>   -- Keir
>
Ok - I will read up on this more, and then I guess I have some code to 
change.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0 of 6] Fix kexec in Xen (take 3)
  2011-05-26  9:12       ` Andrew Cooper
@ 2011-05-26  9:19         ` Keir Fraser
  0 siblings, 0 replies; 14+ messages in thread
From: Keir Fraser @ 2011-05-26  9:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On 26/05/2011 10:12, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:

>> A better reference for APIC behaviour is Chapter 10 of Volume 3A of the
>> Intel Software Developer Manual. See 10.4.7.1 particularly. The APIC is
>> software disabled on startup -- meaning that the enable bit in the SPIV
>> register is clear. That is quite different from *hardware* disable (via the
>> APICBASE MSR) which your patch attempts to deal with. In this latter case
>> the APIC would be totally shut down and it would not be possible to
>> INIT-SIPI the secondary processor. The software disable (via SPIV) is very
>> much a semi-disabled state (and disable_local_APIC() already returns an APIC
>> to that state).
>> 
>>   -- Keir
>> 
> Ok - I will read up on this more, and then I guess I have some code to
> change.

Yes, please do.

Also if you can please try to avoid making crash-specific versions of
cleanup/shutdown functions. I would actually rather have a global variable
indicating I-am-crashing-on-cpu-x, and have at that from the existing
shutdown functions to modify their behaviour.

If you can fix that plus curb your impulse to do more to the APIC code than
necessary then we have a starting point for further review.

 -- Keir

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-05-26  9:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-25 14:32 [PATCH 0 of 6] Fix kexec in Xen (take 3) Andrew Cooper
2011-05-25 14:32 ` [PATCH 1 of 6] APIC: record local APIC state on boot Andrew Cooper
2011-05-25 14:32 ` [PATCH 2 of 6] APIC: remove 'enabled_via_apicbase' variable Andrew Cooper
2011-05-25 14:32 ` [PATCH 3 of 6] APIC: add crash_disable_local_APIC Andrew Cooper
2011-05-25 14:32 ` [PATCH 4 of 6] IOMMU: Sanitise some of our pointer work Andrew Cooper
2011-05-25 17:28   ` Konrad Rzeszutek Wilk
2011-05-25 14:32 ` [PATCH 5 of 6] IOMMU: add crash_shutdown iommu_op Andrew Cooper
2011-05-25 14:32 ` [PATCH 6 of 6] KEXEC: disable iommu jumping into the kdump kernel Andrew Cooper
2011-05-25 15:01 ` [PATCH 0 of 6] Fix kexec in Xen (take 3) Wei Wang2
2011-05-25 16:14 ` Jan Beulich
2011-05-25 16:30   ` Andrew Cooper
2011-05-25 21:35     ` Keir Fraser
2011-05-26  9:12       ` Andrew Cooper
2011-05-26  9:19         ` Keir Fraser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.