All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0 of 3] Fix kexec path in xen (take 2)
@ 2011-05-18 18:08 Andrew Cooper
  2011-05-18 18:08 ` [PATCH 1 of 3] apic: record local apic state on boot Andrew Cooper
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This set of patches is the minimal subset required to get the kexec path working again on Xen 4.x

kdump kernels can't boot if x2apic mode is enabled and the ACPI tables dont state this fact.  They also cant boot at all with interrupt remapping enabled.

These patches cause xen to track the local apic boot state and return to it before kexec'ing to a new kernel.  It also makes sure to disable IO virtualisation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 18:08 [PATCH 0 of 3] Fix kexec path in xen (take 2) Andrew Cooper
@ 2011-05-18 18:08 ` Andrew Cooper
  2011-05-18 18:49   ` Konrad Rzeszutek Wilk
  2011-05-18 23:40   ` Keir Fraser
  2011-05-18 18:08 ` [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable Andrew Cooper
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

Xen does not store the boot local apic state which leads to problems
when shutting down for a kexec jump.  This patch records the boot
state so we can return to the boot state when kexec'ing.

This is per CPU because all 3 bioses on the boxes I have tested dont
enabled all local apics on boot.  As a result, we have to return to
the bios state so the ACPI tables match up with the hardware state
for the booting kernel.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Tue May 17 17:32:19 2011 +0100
+++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
@@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
 static bool_t __initdata opt_x2apic = 1;
 boolean_param("x2apic", opt_x2apic);
 
+DEFINE_PER_CPU_READ_MOSTLY(enum apic_mode, apic_boot_mode) = APIC_MODE_INVALID;
+
 bool_t __read_mostly x2apic_enabled = 0;
 bool_t __read_mostly directed_eoi_enabled = 0;
 
@@ -1437,6 +1439,41 @@ int __init APIC_init_uniprocessor (void)
     return 0;
 }
 
+/* Needs to be called once per CPU during startup.  It records the state the BIOS
+ * leaves the local APIC so we can tare back down upon shutdown/crash
+ */
+void __init record_boot_APIC_mode(void)
+{
+    enum apic_mode this_apic_mode;
+    u64 msr_contents;
+
+    this_apic_mode = APIC_MODE_INVALID;
+
+    /* Sanity check - we should only ever run once */
+    BUG_ON( APIC_MODE_INVALID != this_cpu(apic_boot_mode) );
+
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+
+    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
+         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
+        this_apic_mode = APIC_MODE_X2APIC;
+    else
+        {
+            /* EN bit should always be valid as long as we can read the MSR
+             * Can anyone confirm this?
+             */
+            if ( msr_contents & MSR_IA32_APICBASE_ENABLE )
+                this_apic_mode = APIC_MODE_XAPIC;
+            else
+                this_apic_mode = APIC_MODE_DISABLED;
+        }
+
+    this_cpu(apic_boot_mode) = this_apic_mode;
+    apic_printk(APIC_DEBUG, "APIC boot state is %d on core #%d\n",
+                this_apic_mode, smp_processor_id());
+}
+
 void check_for_unexpected_msi(unsigned int vector)
 {
     unsigned long v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/genapic/probe.c
--- a/xen/arch/x86/genapic/probe.c	Tue May 17 17:32:19 2011 +0100
+++ b/xen/arch/x86/genapic/probe.c	Wed May 18 19:00:13 2011 +0100
@@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
 { 
 	int i, changed;
 
+    record_boot_APIC_mode();
+
 	check_x2apic_preenabled();
 	cmdline_apic = changed = (genapic != NULL);
 
diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c	Tue May 17 17:32:19 2011 +0100
+++ b/xen/arch/x86/smpboot.c	Wed May 18 19:00:13 2011 +0100
@@ -388,6 +388,9 @@ void start_secondary(void *unused)
 
     microcode_resume_cpu(cpu);
 
+    /* Record boot apic state */
+    record_boot_APIC_mode();
+
     wmb();
     startup_cpu_idle_loop();
 }
diff -r f531ed84b066 -r 62a8ce6595ad xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h	Tue May 17 17:32:19 2011 +0100
+++ b/xen/include/asm-x86/apic.h	Wed May 18 19:00:13 2011 +0100
@@ -21,6 +21,16 @@
 #define IO_APIC_REDIR_DEST_LOGICAL	0x00800
 #define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
 
+/* Possible APIC states */
+enum apic_mode { APIC_MODE_INVALID,  /* Not set yet */
+                 APIC_MODE_DISABLED, /* Some bioses disable by default for compatability */
+                 APIC_MODE_XAPIC,    /* xAPIC mode - default upon chipset reset */
+                 APIC_MODE_X2APIC    /* x2APIC mode - common for large smp machines */
+};
+
+/* PerCPU local APIC boot mode - so we can taredown to bios state */
+DECLARE_PER_CPU(enum apic_mode, apic_boot_mode);
+
 extern u8 apic_verbosity;
 extern bool_t x2apic_enabled;
 extern bool_t directed_eoi_enabled;
@@ -203,6 +213,7 @@ extern void disable_APIC_timer(void);
 extern void enable_APIC_timer(void);
 extern int lapic_suspend(void);
 extern int lapic_resume(void);
+extern void record_boot_APIC_mode(void);
 
 extern int check_nmi_watchdog (void);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable
  2011-05-18 18:08 [PATCH 0 of 3] Fix kexec path in xen (take 2) Andrew Cooper
  2011-05-18 18:08 ` [PATCH 1 of 3] apic: record local apic state on boot Andrew Cooper
@ 2011-05-18 18:08 ` Andrew Cooper
  2011-05-18 18:53   ` Konrad Rzeszutek Wilk
  2011-05-18 18:08 ` [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel Andrew Cooper
  2011-05-18 18:39 ` [PATCH 0 of 3] Fix kexec path in xen (take 2) Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

The use of this varable was only sensible when the choice for lapic mode
was between enabled or disabled.  Now that x2apic is about, it is wrong,
and causes a protection fault in certain cases when trying to tare down
x2apic mode.

The only place where its use is relevent in the code is in disable_local_APIC
which has been changed to correctly tare down the local APIC without a
protection fault (which leads to a general protection fault).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 62a8ce6595ad -r e80b5280fe2f xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
+++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
@@ -165,8 +165,6 @@ void __init apic_intr_init(void)
 /* Using APIC to generate smp_local_timer_interrupt? */
 static bool_t __read_mostly using_apic_timer;
 
-static bool_t __read_mostly enabled_via_apicbase;
-
 int get_physical_broadcast(void)
 {
     if (modern_apic())
@@ -330,6 +328,9 @@ void disconnect_bsp_APIC(int virt_wire_s
 
 void disable_local_APIC(void)
 {
+    u64 msr_contents;
+    enum apic_mode current_mode;
+
     clear_local_APIC();
 
     /*
@@ -339,10 +340,54 @@ void disable_local_APIC(void)
     apic_write_around(APIC_SPIV,
         apic_read(APIC_SPIV) & ~APIC_SPIV_APIC_ENABLED);
 
-    if (enabled_via_apicbase) {
-        uint64_t msr_content;
-        rdmsrl(MSR_IA32_APICBASE, msr_content);
-        wrmsrl(MSR_IA32_APICBASE, msr_content & ~MSR_IA32_APICBASE_ENABLE);
+    /* Work out what apic mode we are in */
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+
+    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
+         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
+        current_mode = APIC_MODE_X2APIC;
+    else
+        {
+            /* EN bit should always be valid as long as we can read the MSR
+             * Can anyone confirm this?
+             */
+            if ( msr_contents & MSR_IA32_APICBASE_ENABLE )
+                current_mode = APIC_MODE_XAPIC;
+            else
+                current_mode = APIC_MODE_DISABLED;
+        }
+
+    /* See what (if anything) we need to do to revert to boot mode */
+    if( current_mode != this_cpu(apic_boot_mode) )
+    {
+        rdmsrl(MSR_IA32_APICBASE, msr_contents);
+        msr_contents &= ~ ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
+        wrmsrl(MSR_IA32_APICBASE, msr_contents);
+
+        switch(this_cpu(apic_boot_mode))
+        {
+        case APIC_MODE_DISABLED:
+            break; /* Nothing to do - we did this above */
+        case APIC_MODE_XAPIC:
+        {
+            msr_contents |= MSR_IA32_APICBASE_ENABLE;
+            wrmsrl(MSR_IA32_APICBASE, msr_contents);
+            break;
+        }
+        case APIC_MODE_X2APIC:
+        {
+            msr_contents |= ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
+            wrmsrl(MSR_IA32_APICBASE, msr_contents);
+            break;
+        }
+        default:
+        {
+            printk("Hit default case when reverting lapic to boot state on core #%d\n",
+                   smp_processor_id());
+            break;
+        }
+        }
     }
 }
 
@@ -874,7 +919,6 @@ static int __init detect_init_APIC (void
             wrmsrl(MSR_IA32_APICBASE,
                 msr_content | MSR_IA32_APICBASE_ENABLE
                 | APIC_DEFAULT_PHYS_BASE);
-            enabled_via_apicbase = 1;
         }
     }
     /*

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-18 18:08 [PATCH 0 of 3] Fix kexec path in xen (take 2) Andrew Cooper
  2011-05-18 18:08 ` [PATCH 1 of 3] apic: record local apic state on boot Andrew Cooper
  2011-05-18 18:08 ` [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable Andrew Cooper
@ 2011-05-18 18:08 ` Andrew Cooper
  2011-05-18 18:49   ` Konrad Rzeszutek Wilk
  2011-05-18 18:39 ` [PATCH 0 of 3] Fix kexec path in xen (take 2) Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 18:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

kdump kernels are unable to boot with IOMMU enabled,
this patch disabled IOMMU mode and removes some of the generic
code from the shutdown path which doesnt work after other
CPUs have been shot down.

Also, leave local interrupts disabled when jumping into pugatory
as we have no idea whats in there and really dont want to be
servicing interrupts when our entire state is invalid.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/crash.c
--- a/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
+++ b/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
@@ -27,6 +27,8 @@
 #include <asm/hvm/support.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
+#include <xen/iommu.h>
+#include <asm/hvm/iommu.h>
 
 static atomic_t waiting_for_crash_ipi;
 static unsigned int crashing_cpu;
@@ -43,7 +45,10 @@ static int crash_nmi_callback(struct cpu
 
     kexec_crash_save_cpu();
 
-    __stop_this_cpu();
+    disable_local_APIC();
+    hvm_cpu_down();
+    clts();
+    asm volatile ( "fninit" );
 
     atomic_dec(&waiting_for_crash_ipi);
 
@@ -56,6 +61,7 @@ static int crash_nmi_callback(struct cpu
 static void nmi_shootdown_cpus(void)
 {
     unsigned long msecs;
+    u64 msr_contents;
 
     local_irq_disable();
 
@@ -77,18 +83,43 @@ static void nmi_shootdown_cpus(void)
         msecs--;
     }
 
-    __stop_this_cpu();
+    disable_local_APIC();
+    hvm_cpu_down();
+    clts();
+    asm volatile ( "fninit" );
+
+    /* This is a bit of a hack but there is no other way to shutdown correctly
+     * without a significant refactoring of the APIC code */
+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
+         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
+        x2apic_enabled = 1;
+    else
+        x2apic_enabled = 0;
+
     disable_IO_APIC();
-
-    local_irq_enable();
 }
 
 void machine_crash_shutdown(void)
 {
     crash_xen_info_t *info;
+    const struct iommu_ops * ops;
 
     nmi_shootdown_cpus();
 
+    /* Yes i know this is hacky but it is the easiest solution.  I should add an iommu_ops
+     * function called crash() or so which just disables the iommu 'fun' without saving state
+     */
+    ops = iommu_get_ops();
+    if(ops)
+        ops->suspend();
+
+    /* Yes i know this is from driver/passthrough/vtd/ but it appears to be architecture
+     * independant, and also bears little/no relation to x2apic.  Needs cleaning up
+     */
+    iommu_disable_x2apic_IR();
+
+
     info = kexec_crash_save_info();
     info->xen_phys_start = xen_phys_start;
     info->dom0_pfn_to_mfn_frame_list_list =
diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c	Wed May 18 19:00:13 2011 +0100
+++ b/xen/arch/x86/hpet.c	Wed May 18 19:00:13 2011 +0100
@@ -670,6 +670,33 @@ void hpet_disable_legacy_broadcast(void)
     smp_send_event_check_mask(&cpu_online_map);
 }
 
+/* This function is similar to the regular
+ * hpet_disable_legacy_broadcast function, except it is called
+ * on the crash path with only the current processor up, so we
+ * can forget the locks and really cant send an event check IPI
+ * to the other processors */
+void crash_hpet_disable_legacy_broadcast(void)
+{
+    u32 cfg;
+
+    if ( !hpet_events || !(hpet_events->flags & HPET_EVT_LEGACY) )
+        return;
+
+    hpet_events->flags |= HPET_EVT_DISABLE;
+
+    /* disable HPET T0 */
+    cfg = hpet_read32(HPET_Tn_CFG(0));
+    cfg &= ~HPET_TN_ENABLE;
+    hpet_write32(cfg, HPET_Tn_CFG(0));
+
+    /* Stop HPET legacy interrupts */
+    cfg = hpet_read32(HPET_CFG);
+    cfg &= ~HPET_CFG_LEGACY;
+    hpet_write32(cfg, HPET_CFG);
+
+}
+
+
 void hpet_broadcast_enter(void)
 {
     unsigned int cpu = smp_processor_id();
diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/machine_kexec.c
--- a/xen/arch/x86/machine_kexec.c	Wed May 18 19:00:13 2011 +0100
+++ b/xen/arch/x86/machine_kexec.c	Wed May 18 19:00:13 2011 +0100
@@ -97,7 +97,7 @@ void machine_kexec(xen_kexec_image_t *im
     };
 
     if ( hpet_broadcast_is_available() )
-        hpet_disable_legacy_broadcast();
+        crash_hpet_disable_legacy_broadcast();
 
     /*
      * compat_machine_kexec() returns to idle pagetables, which requires us
diff -r e80b5280fe2f -r aaf44d1a903d xen/include/asm-x86/hpet.h
--- a/xen/include/asm-x86/hpet.h	Wed May 18 19:00:13 2011 +0100
+++ b/xen/include/asm-x86/hpet.h	Wed May 18 19:00:13 2011 +0100
@@ -73,5 +73,6 @@ void hpet_broadcast_enter(void);
 void hpet_broadcast_exit(void);
 int hpet_broadcast_is_available(void);
 void hpet_disable_legacy_broadcast(void);
+void crash_hpet_disable_legacy_broadcast(void);
 
 #endif /* __X86_HPET_H__ */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0 of 3] Fix kexec path in xen (take 2)
  2011-05-18 18:08 [PATCH 0 of 3] Fix kexec path in xen (take 2) Andrew Cooper
                   ` (2 preceding siblings ...)
  2011-05-18 18:08 ` [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel Andrew Cooper
@ 2011-05-18 18:39 ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 18:39 UTC (permalink / raw)
  To: Andrew Cooper, dkiper; +Cc: xen-devel

On Wed, May 18, 2011 at 07:08:13PM +0100, Andrew Cooper wrote:
> This set of patches is the minimal subset required to get the kexec path working again on Xen 4.x

You might want to copy Daniel on these patches: dkiper@net-space.pl as he is working
on a project alongside this.

> 
> kdump kernels can't boot if x2apic mode is enabled and the ACPI tables dont state this fact.  They also cant boot at all with interrupt remapping enabled.
> 
> These patches cause xen to track the local apic boot state and return to it before kexec'ing to a new kernel.  It also makes sure to disable IO virtualisation.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-18 18:08 ` [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel Andrew Cooper
@ 2011-05-18 18:49   ` Konrad Rzeszutek Wilk
  2011-05-18 20:48     ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 18:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, May 18, 2011 at 07:08:16PM +0100, Andrew Cooper wrote:
> kdump kernels are unable to boot with IOMMU enabled,
> this patch disabled IOMMU mode and removes some of the generic
> code from the shutdown path which doesnt work after other
> CPUs have been shot down.
> 
> Also, leave local interrupts disabled when jumping into pugatory

purgatory?
> as we have no idea whats in there and really dont want to be
> servicing interrupts when our entire state is invalid.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/crash.c
> --- a/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
> +++ b/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
> @@ -27,6 +27,8 @@
>  #include <asm/hvm/support.h>
>  #include <asm/apic.h>
>  #include <asm/io_apic.h>
> +#include <xen/iommu.h>
> +#include <asm/hvm/iommu.h>
>  
>  static atomic_t waiting_for_crash_ipi;
>  static unsigned int crashing_cpu;
> @@ -43,7 +45,10 @@ static int crash_nmi_callback(struct cpu
>  
>      kexec_crash_save_cpu();
>  
> -    __stop_this_cpu();
> +    disable_local_APIC();
> +    hvm_cpu_down();
> +    clts();
> +    asm volatile ( "fninit" );

Can you provide a comment why you are using fninit and clt?
Is this what the Linux kernel does too when it goes through the kexec path?
>  
>      atomic_dec(&waiting_for_crash_ipi);
>  
> @@ -56,6 +61,7 @@ static int crash_nmi_callback(struct cpu
>  static void nmi_shootdown_cpus(void)
>  {
>      unsigned long msecs;
> +    u64 msr_contents;
>  
>      local_irq_disable();
>  
> @@ -77,18 +83,43 @@ static void nmi_shootdown_cpus(void)
>          msecs--;
>      }
>  
> -    __stop_this_cpu();
> +    disable_local_APIC();
> +    hvm_cpu_down();
> +    clts();
> +    asm volatile ( "fninit" );
> +
> +    /* This is a bit of a hack but there is no other way to shutdown correctly
> +     * without a significant refactoring of the APIC code */
> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
> +         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
> +        x2apic_enabled = 1;
> +    else
> +        x2apic_enabled = 0;
> +
>      disable_IO_APIC();
> -
> -    local_irq_enable();

Why?
>  }
>  
>  void machine_crash_shutdown(void)
>  {
>      crash_xen_info_t *info;
> +    const struct iommu_ops * ops;
>  
>      nmi_shootdown_cpus();
>  
> +    /* Yes i know this is hacky but it is the easiest solution.  I should add an iommu_ops
> +     * function called crash() or so which just disables the iommu 'fun' without saving state
> +     */
> +    ops = iommu_get_ops();
> +    if(ops)
> +        ops->suspend();

Uh, no checking if ops->suspend exists?

> +
> +    /* Yes i know this is from driver/passthrough/vtd/ but it appears to be architecture
> +     * independant, and also bears little/no relation to x2apic.  Needs cleaning up

What about AMD VI IOMMUs? Does it work when that IOMMU is used?

> +     */
> +    iommu_disable_x2apic_IR();

Can't that function be done in the suspend code of the IOMMU?
> +
> +
>      info = kexec_crash_save_info();
>      info->xen_phys_start = xen_phys_start;
>      info->dom0_pfn_to_mfn_frame_list_list =
> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/hpet.c
> --- a/xen/arch/x86/hpet.c	Wed May 18 19:00:13 2011 +0100
> +++ b/xen/arch/x86/hpet.c	Wed May 18 19:00:13 2011 +0100
> @@ -670,6 +670,33 @@ void hpet_disable_legacy_broadcast(void)
>      smp_send_event_check_mask(&cpu_online_map);
>  }
>  
> +/* This function is similar to the regular
> + * hpet_disable_legacy_broadcast function, except it is called
> + * on the crash path with only the current processor up, so we
> + * can forget the locks and really cant send an event check IPI
> + * to the other processors */
> +void crash_hpet_disable_legacy_broadcast(void)
> +{
> +    u32 cfg;
> +
> +    if ( !hpet_events || !(hpet_events->flags & HPET_EVT_LEGACY) )
> +        return;
> +
> +    hpet_events->flags |= HPET_EVT_DISABLE;
> +
> +    /* disable HPET T0 */
> +    cfg = hpet_read32(HPET_Tn_CFG(0));
> +    cfg &= ~HPET_TN_ENABLE;
> +    hpet_write32(cfg, HPET_Tn_CFG(0));
> +
> +    /* Stop HPET legacy interrupts */
> +    cfg = hpet_read32(HPET_CFG);
> +    cfg &= ~HPET_CFG_LEGACY;
> +    hpet_write32(cfg, HPET_CFG);
> +
> +}
> +
> +
>  void hpet_broadcast_enter(void)
>  {
>      unsigned int cpu = smp_processor_id();
> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/machine_kexec.c
> --- a/xen/arch/x86/machine_kexec.c	Wed May 18 19:00:13 2011 +0100
> +++ b/xen/arch/x86/machine_kexec.c	Wed May 18 19:00:13 2011 +0100
> @@ -97,7 +97,7 @@ void machine_kexec(xen_kexec_image_t *im
>      };
>  
>      if ( hpet_broadcast_is_available() )
> -        hpet_disable_legacy_broadcast();
> +        crash_hpet_disable_legacy_broadcast();
>  
>      /*
>       * compat_machine_kexec() returns to idle pagetables, which requires us
> diff -r e80b5280fe2f -r aaf44d1a903d xen/include/asm-x86/hpet.h
> --- a/xen/include/asm-x86/hpet.h	Wed May 18 19:00:13 2011 +0100
> +++ b/xen/include/asm-x86/hpet.h	Wed May 18 19:00:13 2011 +0100
> @@ -73,5 +73,6 @@ void hpet_broadcast_enter(void);
>  void hpet_broadcast_exit(void);
>  int hpet_broadcast_is_available(void);
>  void hpet_disable_legacy_broadcast(void);
> +void crash_hpet_disable_legacy_broadcast(void);
>  
>  #endif /* __X86_HPET_H__ */
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 18:08 ` [PATCH 1 of 3] apic: record local apic state on boot Andrew Cooper
@ 2011-05-18 18:49   ` Konrad Rzeszutek Wilk
  2011-05-18 20:27     ` Andrew Cooper
  2011-05-18 23:40   ` Keir Fraser
  1 sibling, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 18:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, May 18, 2011 at 07:08:14PM +0100, Andrew Cooper wrote:
> Xen does not store the boot local apic state which leads to problems
> when shutting down for a kexec jump.  This patch records the boot
> state so we can return to the boot state when kexec'ing.
> 
> This is per CPU because all 3 bioses on the boxes I have tested dont
> enabled all local apics on boot.  As a result, we have to return to
> the bios state so the ACPI tables match up with the hardware state
> for the booting kernel.

Which ACPI table requires this?

> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/apic.c
> --- a/xen/arch/x86/apic.c	Tue May 17 17:32:19 2011 +0100
> +++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
> @@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
>  static bool_t __initdata opt_x2apic = 1;
>  boolean_param("x2apic", opt_x2apic);
>  
> +DEFINE_PER_CPU_READ_MOSTLY(enum apic_mode, apic_boot_mode) = APIC_MODE_INVALID;
> +
>  bool_t __read_mostly x2apic_enabled = 0;
>  bool_t __read_mostly directed_eoi_enabled = 0;
>  
> @@ -1437,6 +1439,41 @@ int __init APIC_init_uniprocessor (void)
>      return 0;
>  }
>  
> +/* Needs to be called once per CPU during startup.  It records the state the BIOS
> + * leaves the local APIC so we can tare back down upon shutdown/crash

tare?

> + */
> +void __init record_boot_APIC_mode(void)
> +{
> +    enum apic_mode this_apic_mode;
> +    u64 msr_contents;
> +
> +    this_apic_mode = APIC_MODE_INVALID;
> +
> +    /* Sanity check - we should only ever run once */
> +    BUG_ON( APIC_MODE_INVALID != this_cpu(apic_boot_mode) );
> +
> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
> +
> +    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
> +         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
> +        this_apic_mode = APIC_MODE_X2APIC;
> +    else
> +        {
> +            /* EN bit should always be valid as long as we can read the MSR
> +             * Can anyone confirm this?

Email Vivek Goyal. He is the kexec/kdump maintainer.

> +             */
> +            if ( msr_contents & MSR_IA32_APICBASE_ENABLE )
> +                this_apic_mode = APIC_MODE_XAPIC;
> +            else
> +                this_apic_mode = APIC_MODE_DISABLED;
> +        }
> +
> +    this_cpu(apic_boot_mode) = this_apic_mode;
> +    apic_printk(APIC_DEBUG, "APIC boot state is %d on core #%d\n",
> +                this_apic_mode, smp_processor_id());

This begs of a function to convert those enums to strings..
> +}
> +
>  void check_for_unexpected_msi(unsigned int vector)
>  {
>      unsigned long v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/genapic/probe.c
> --- a/xen/arch/x86/genapic/probe.c	Tue May 17 17:32:19 2011 +0100
> +++ b/xen/arch/x86/genapic/probe.c	Wed May 18 19:00:13 2011 +0100
> @@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
>  { 
>  	int i, changed;
>  
> +    record_boot_APIC_mode();
> +

The spacing looks odd.

>  	check_x2apic_preenabled();
>  	cmdline_apic = changed = (genapic != NULL);
>  
> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/smpboot.c
> --- a/xen/arch/x86/smpboot.c	Tue May 17 17:32:19 2011 +0100
> +++ b/xen/arch/x86/smpboot.c	Wed May 18 19:00:13 2011 +0100
> @@ -388,6 +388,9 @@ void start_secondary(void *unused)
>  
>      microcode_resume_cpu(cpu);
>  
> +    /* Record boot apic state */
> +    record_boot_APIC_mode();
> +
>      wmb();
>      startup_cpu_idle_loop();
>  }
> diff -r f531ed84b066 -r 62a8ce6595ad xen/include/asm-x86/apic.h
> --- a/xen/include/asm-x86/apic.h	Tue May 17 17:32:19 2011 +0100
> +++ b/xen/include/asm-x86/apic.h	Wed May 18 19:00:13 2011 +0100
> @@ -21,6 +21,16 @@
>  #define IO_APIC_REDIR_DEST_LOGICAL	0x00800
>  #define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
>  
> +/* Possible APIC states */
> +enum apic_mode { APIC_MODE_INVALID,  /* Not set yet */
> +                 APIC_MODE_DISABLED, /* Some bioses disable by default for compatability */
> +                 APIC_MODE_XAPIC,    /* xAPIC mode - default upon chipset reset */
> +                 APIC_MODE_X2APIC    /* x2APIC mode - common for large smp machines */
> +};
> +
> +/* PerCPU local APIC boot mode - so we can taredown to bios state */

taredown?

> +DECLARE_PER_CPU(enum apic_mode, apic_boot_mode);
> +
>  extern u8 apic_verbosity;
>  extern bool_t x2apic_enabled;
>  extern bool_t directed_eoi_enabled;
> @@ -203,6 +213,7 @@ extern void disable_APIC_timer(void);
>  extern void enable_APIC_timer(void);
>  extern int lapic_suspend(void);
>  extern int lapic_resume(void);
> +extern void record_boot_APIC_mode(void);
>  
>  extern int check_nmi_watchdog (void);
>  
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable
  2011-05-18 18:08 ` [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable Andrew Cooper
@ 2011-05-18 18:53   ` Konrad Rzeszutek Wilk
  2011-05-18 20:35     ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 18:53 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, May 18, 2011 at 07:08:15PM +0100, Andrew Cooper wrote:
> The use of this varable was only sensible when the choice for lapic mode

variable
> was between enabled or disabled.  Now that x2apic is about, it is wrong,
> and causes a protection fault in certain cases when trying to tare down

tare?

> x2apic mode.
> 
> The only place where its use is relevent in the code is in disable_local_APIC
                                          ^- is         ^^^ take that out.

> which has been changed to correctly tare down the local APIC without a

teardown?
> protection fault (which leads to a general protection fault).

So if you don't have x2apic, then it is wrong to disable the LAPIC mode?
What about older hardware?

> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r 62a8ce6595ad -r e80b5280fe2f xen/arch/x86/apic.c
> --- a/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
> +++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
> @@ -165,8 +165,6 @@ void __init apic_intr_init(void)
>  /* Using APIC to generate smp_local_timer_interrupt? */
>  static bool_t __read_mostly using_apic_timer;
>  
> -static bool_t __read_mostly enabled_via_apicbase;
> -
>  int get_physical_broadcast(void)
>  {
>      if (modern_apic())
> @@ -330,6 +328,9 @@ void disconnect_bsp_APIC(int virt_wire_s
>  
>  void disable_local_APIC(void)
>  {
> +    u64 msr_contents;
> +    enum apic_mode current_mode;
> +
>      clear_local_APIC();
>  
>      /*
> @@ -339,10 +340,54 @@ void disable_local_APIC(void)
>      apic_write_around(APIC_SPIV,
>          apic_read(APIC_SPIV) & ~APIC_SPIV_APIC_ENABLED);
>  
> -    if (enabled_via_apicbase) {
> -        uint64_t msr_content;
> -        rdmsrl(MSR_IA32_APICBASE, msr_content);
> -        wrmsrl(MSR_IA32_APICBASE, msr_content & ~MSR_IA32_APICBASE_ENABLE);
> +    /* Work out what apic mode we are in */
> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
> +
> +    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
> +         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
> +        current_mode = APIC_MODE_X2APIC;
> +    else
> +        {
> +            /* EN bit should always be valid as long as we can read the MSR
> +             * Can anyone confirm this?

Might want to email Vivek Goyal..
> +             */
> +            if ( msr_contents & MSR_IA32_APICBASE_ENABLE )
> +                current_mode = APIC_MODE_XAPIC;
> +            else
> +                current_mode = APIC_MODE_DISABLED;
> +        }
> +
> +    /* See what (if anything) we need to do to revert to boot mode */
> +    if( current_mode != this_cpu(apic_boot_mode) )
> +    {
> +        rdmsrl(MSR_IA32_APICBASE, msr_contents);
> +        msr_contents &= ~ ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
> +        wrmsrl(MSR_IA32_APICBASE, msr_contents);
> +
> +        switch(this_cpu(apic_boot_mode))
> +        {
> +        case APIC_MODE_DISABLED:
> +            break; /* Nothing to do - we did this above */
> +        case APIC_MODE_XAPIC:
> +        {
> +            msr_contents |= MSR_IA32_APICBASE_ENABLE;
> +            wrmsrl(MSR_IA32_APICBASE, msr_contents);
> +            break;
> +        }
> +        case APIC_MODE_X2APIC:
> +        {
> +            msr_contents |= ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
> +            wrmsrl(MSR_IA32_APICBASE, msr_contents);
> +            break;
> +        }
> +        default:
> +        {
> +            printk("Hit default case when reverting lapic to boot state on core #%d\n",
> +                   smp_processor_id());
> +            break;
> +        }
> +        }
>      }
>  }
>  
> @@ -874,7 +919,6 @@ static int __init detect_init_APIC (void
>              wrmsrl(MSR_IA32_APICBASE,
>                  msr_content | MSR_IA32_APICBASE_ENABLE
>                  | APIC_DEFAULT_PHYS_BASE);
> -            enabled_via_apicbase = 1;
>          }
>      }
>      /*
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 18:49   ` Konrad Rzeszutek Wilk
@ 2011-05-18 20:27     ` Andrew Cooper
  2011-05-18 20:43       ` Konrad Rzeszutek Wilk
  2011-05-19  0:54       ` Tian, Kevin
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 20:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
> On Wed, May 18, 2011 at 07:08:14PM +0100, Andrew Cooper wrote:
>> Xen does not store the boot local apic state which leads to problems
>> when shutting down for a kexec jump.  This patch records the boot
>> state so we can return to the boot state when kexec'ing.
>>
>> This is per CPU because all 3 bioses on the boxes I have tested dont
>> enabled all local apics on boot.  As a result, we have to return to
>> the bios state so the ACPI tables match up with the hardware state
>> for the booting kernel.
> Which ACPI table requires this?
Cant remember offhand but linux (2.6.32) was doing finger pointing at 
the multi-processor tables
>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/apic.c
>> --- a/xen/arch/x86/apic.c	Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
>> @@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
>>   static bool_t __initdata opt_x2apic = 1;
>>   boolean_param("x2apic", opt_x2apic);
>>
>> +DEFINE_PER_CPU_READ_MOSTLY(enum apic_mode, apic_boot_mode) = APIC_MODE_INVALID;
>> +
>>   bool_t __read_mostly x2apic_enabled = 0;
>>   bool_t __read_mostly directed_eoi_enabled = 0;
>>
>> @@ -1437,6 +1439,41 @@ int __init APIC_init_uniprocessor (void)
>>       return 0;
>>   }
>>
>> +/* Needs to be called once per CPU during startup.  It records the state the BIOS
>> + * leaves the local APIC so we can tare back down upon shutdown/crash
> tare?
I fail at spelling - I meant "tear down"
>> + */
>> +void __init record_boot_APIC_mode(void)
>> +{
>> +    enum apic_mode this_apic_mode;
>> +    u64 msr_contents;
>> +
>> +    this_apic_mode = APIC_MODE_INVALID;
>> +
>> +    /* Sanity check - we should only ever run once */
>> +    BUG_ON( APIC_MODE_INVALID != this_cpu(apic_boot_mode) );
>> +
>> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
>> +
>> +    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
>> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
>> +&&  (msr_contents&  MSR_IA32_APICBASE_EXTD) )
>> +        this_apic_mode = APIC_MODE_X2APIC;
>> +    else
>> +        {
>> +            /* EN bit should always be valid as long as we can read the MSR
>> +             * Can anyone confirm this?
> Email Vivek Goyal. He is the kexec/kdump maintainer.
He doesn't appear in the maintainers file, and I don't have an email 
address.
>> +             */
>> +            if ( msr_contents&  MSR_IA32_APICBASE_ENABLE )
>> +                this_apic_mode = APIC_MODE_XAPIC;
>> +            else
>> +                this_apic_mode = APIC_MODE_DISABLED;
>> +        }
>> +
>> +    this_cpu(apic_boot_mode) = this_apic_mode;
>> +    apic_printk(APIC_DEBUG, "APIC boot state is %d on core #%d\n",
>> +                this_apic_mode, smp_processor_id());
> This begs of a function to convert those enums to strings..
True - this was left over from my debugging - I shall fix that tomorrow.
>> +}
>> +
>>   void check_for_unexpected_msi(unsigned int vector)
>>   {
>>       unsigned long v = apic_read(APIC_ISR + ((vector&  ~0x1f)>>  1));
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/genapic/probe.c
>> --- a/xen/arch/x86/genapic/probe.c	Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/arch/x86/genapic/probe.c	Wed May 18 19:00:13 2011 +0100
>> @@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
>>   {
>>   	int i, changed;
>>
>> +    record_boot_APIC_mode();
>> +
> The spacing looks odd.
Does it?  Looks fine for me - I have been following the 1 tab to 4 
spaces convention in half the codebase.
>>   	check_x2apic_preenabled();
>>   	cmdline_apic = changed = (genapic != NULL);
>>
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/smpboot.c
>> --- a/xen/arch/x86/smpboot.c	Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/arch/x86/smpboot.c	Wed May 18 19:00:13 2011 +0100
>> @@ -388,6 +388,9 @@ void start_secondary(void *unused)
>>
>>       microcode_resume_cpu(cpu);
>>
>> +    /* Record boot apic state */
>> +    record_boot_APIC_mode();
>> +
>>       wmb();
>>       startup_cpu_idle_loop();
>>   }
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/include/asm-x86/apic.h
>> --- a/xen/include/asm-x86/apic.h	Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/include/asm-x86/apic.h	Wed May 18 19:00:13 2011 +0100
>> @@ -21,6 +21,16 @@
>>   #define IO_APIC_REDIR_DEST_LOGICAL	0x00800
>>   #define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
>>
>> +/* Possible APIC states */
>> +enum apic_mode { APIC_MODE_INVALID,  /* Not set yet */
>> +                 APIC_MODE_DISABLED, /* Some bioses disable by default for compatability */
>> +                 APIC_MODE_XAPIC,    /* xAPIC mode - default upon chipset reset */
>> +                 APIC_MODE_X2APIC    /* x2APIC mode - common for large smp machines */
>> +};
>> +
>> +/* PerCPU local APIC boot mode - so we can taredown to bios state */
> taredown?
same comment as above re. spelling
>> +DECLARE_PER_CPU(enum apic_mode, apic_boot_mode);
>> +
>>   extern u8 apic_verbosity;
>>   extern bool_t x2apic_enabled;
>>   extern bool_t directed_eoi_enabled;
>> @@ -203,6 +213,7 @@ extern void disable_APIC_timer(void);
>>   extern void enable_APIC_timer(void);
>>   extern int lapic_suspend(void);
>>   extern int lapic_resume(void);
>> +extern void record_boot_APIC_mode(void);
>>
>>   extern int check_nmi_watchdog (void);
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable
  2011-05-18 18:53   ` Konrad Rzeszutek Wilk
@ 2011-05-18 20:35     ` Andrew Cooper
  2011-05-18 20:45       ` Konrad Rzeszutek Wilk
  2011-05-19  3:31       ` Tian, Kevin
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 20:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



On 18/05/11 19:53, Konrad Rzeszutek Wilk wrote:
> On Wed, May 18, 2011 at 07:08:15PM +0100, Andrew Cooper wrote:
>> The use of this varable was only sensible when the choice for lapic mode
> variable
>> was between enabled or disabled.  Now that x2apic is about, it is wrong,
>> and causes a protection fault in certain cases when trying to tare down
> tare?
>
>> x2apic mode.
>>
>> The only place where its use is relevent in the code is in disable_local_APIC
>                                            ^- is         ^^^ take that out.
>
>> which has been changed to correctly tare down the local APIC without a
> teardown?
>> protection fault (which leads to a general protection fault).
> So if you don't have x2apic, then it is wrong to disable the LAPIC mode?
> What about older hardware?
I guess I wasn't very clear in my description.  In older hardware 
without x2apic, it is correct to simply twiddle the ENABLE bit in the 
APICBASE MSR.  However, with x2apic mode enabled, setting the ENABLE bit 
from 1 to 0 while leaving the EXTD bit set will result in a protection 
fault which will propagate to a general protection fault the same 
codepath will be called in the fault handler.  As a result, the current 
code in disable_local_APIC will result in a GPF if the BIOS boots with 
LAPICs disabled (fine as per the spec for compatibility) and xen decided 
to take advantage of x2apic mode.
>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>
>> diff -r 62a8ce6595ad -r e80b5280fe2f xen/arch/x86/apic.c
>> --- a/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
>> +++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
>> @@ -165,8 +165,6 @@ void __init apic_intr_init(void)
>>   /* Using APIC to generate smp_local_timer_interrupt? */
>>   static bool_t __read_mostly using_apic_timer;
>>
>> -static bool_t __read_mostly enabled_via_apicbase;
>> -
>>   int get_physical_broadcast(void)
>>   {
>>       if (modern_apic())
>> @@ -330,6 +328,9 @@ void disconnect_bsp_APIC(int virt_wire_s
>>
>>   void disable_local_APIC(void)
>>   {
>> +    u64 msr_contents;
>> +    enum apic_mode current_mode;
>> +
>>       clear_local_APIC();
>>
>>       /*
>> @@ -339,10 +340,54 @@ void disable_local_APIC(void)
>>       apic_write_around(APIC_SPIV,
>>           apic_read(APIC_SPIV)&  ~APIC_SPIV_APIC_ENABLED);
>>
>> -    if (enabled_via_apicbase) {
>> -        uint64_t msr_content;
>> -        rdmsrl(MSR_IA32_APICBASE, msr_content);
>> -        wrmsrl(MSR_IA32_APICBASE, msr_content&  ~MSR_IA32_APICBASE_ENABLE);
>> +    /* Work out what apic mode we are in */
>> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
>> +
>> +    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
>> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
>> +&&  (msr_contents&  MSR_IA32_APICBASE_EXTD) )
>> +        current_mode = APIC_MODE_X2APIC;
>> +    else
>> +        {
>> +            /* EN bit should always be valid as long as we can read the MSR
>> +             * Can anyone confirm this?
> Might want to email Vivek Goyal..
>> +             */
>> +            if ( msr_contents&  MSR_IA32_APICBASE_ENABLE )
>> +                current_mode = APIC_MODE_XAPIC;
>> +            else
>> +                current_mode = APIC_MODE_DISABLED;
>> +        }
>> +
>> +    /* See what (if anything) we need to do to revert to boot mode */
>> +    if( current_mode != this_cpu(apic_boot_mode) )
>> +    {
>> +        rdmsrl(MSR_IA32_APICBASE, msr_contents);
>> +        msr_contents&= ~ ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
>> +        wrmsrl(MSR_IA32_APICBASE, msr_contents);
>> +
>> +        switch(this_cpu(apic_boot_mode))
>> +        {
>> +        case APIC_MODE_DISABLED:
>> +            break; /* Nothing to do - we did this above */
>> +        case APIC_MODE_XAPIC:
>> +        {
>> +            msr_contents |= MSR_IA32_APICBASE_ENABLE;
>> +            wrmsrl(MSR_IA32_APICBASE, msr_contents);
>> +            break;
>> +        }
>> +        case APIC_MODE_X2APIC:
>> +        {
>> +            msr_contents |= ( MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD );
>> +            wrmsrl(MSR_IA32_APICBASE, msr_contents);
>> +            break;
>> +        }
>> +        default:
>> +        {
>> +            printk("Hit default case when reverting lapic to boot state on core #%d\n",
>> +                   smp_processor_id());
>> +            break;
>> +        }
>> +        }
>>       }
>>   }
>>
>> @@ -874,7 +919,6 @@ static int __init detect_init_APIC (void
>>               wrmsrl(MSR_IA32_APICBASE,
>>                   msr_content | MSR_IA32_APICBASE_ENABLE
>>                   | APIC_DEFAULT_PHYS_BASE);
>> -            enabled_via_apicbase = 1;
>>           }
>>       }
>>       /*
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 20:27     ` Andrew Cooper
@ 2011-05-18 20:43       ` Konrad Rzeszutek Wilk
  2011-05-19  0:56         ` Tian, Kevin
  2011-05-19  8:34         ` Ian Campbell
  2011-05-19  0:54       ` Tian, Kevin
  1 sibling, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 20:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, May 18, 2011 at 09:27:56PM +0100, Andrew Cooper wrote:
> 
> 
> On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
> >On Wed, May 18, 2011 at 07:08:14PM +0100, Andrew Cooper wrote:
> >>Xen does not store the boot local apic state which leads to problems
> >>when shutting down for a kexec jump.  This patch records the boot
> >>state so we can return to the boot state when kexec'ing.
> >>
> >>This is per CPU because all 3 bioses on the boxes I have tested dont
> >>enabled all local apics on boot.  As a result, we have to return to
> >>the bios state so the ACPI tables match up with the hardware state
> >>for the booting kernel.
> >Which ACPI table requires this?
> Cant remember offhand but linux (2.6.32) was doing finger pointing
> at the multi-processor tables

MP tables? Those don't get used when ACPI is used.. 

> >>Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
> >>
> >>diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/apic.c
> >>--- a/xen/arch/x86/apic.c	Tue May 17 17:32:19 2011 +0100
> >>+++ b/xen/arch/x86/apic.c	Wed May 18 19:00:13 2011 +0100
> >>@@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
> >>  static bool_t __initdata opt_x2apic = 1;
> >>  boolean_param("x2apic", opt_x2apic);
> >>
> >>+DEFINE_PER_CPU_READ_MOSTLY(enum apic_mode, apic_boot_mode) = APIC_MODE_INVALID;
> >>+
> >>  bool_t __read_mostly x2apic_enabled = 0;
> >>  bool_t __read_mostly directed_eoi_enabled = 0;
> >>
> >>@@ -1437,6 +1439,41 @@ int __init APIC_init_uniprocessor (void)
> >>      return 0;
> >>  }
> >>
> >>+/* Needs to be called once per CPU during startup.  It records the state the BIOS
> >>+ * leaves the local APIC so we can tare back down upon shutdown/crash
> >tare?
> I fail at spelling - I meant "tear down"
> >>+ */
> >>+void __init record_boot_APIC_mode(void)
> >>+{
> >>+    enum apic_mode this_apic_mode;
> >>+    u64 msr_contents;
> >>+
> >>+    this_apic_mode = APIC_MODE_INVALID;
> >>+
> >>+    /* Sanity check - we should only ever run once */
> >>+    BUG_ON( APIC_MODE_INVALID != this_cpu(apic_boot_mode) );
> >>+
> >>+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
> >>+
> >>+    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else reserved */
> >>+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
> >>+&&  (msr_contents&  MSR_IA32_APICBASE_EXTD) )
> >>+        this_apic_mode = APIC_MODE_X2APIC;
> >>+    else
> >>+        {
> >>+            /* EN bit should always be valid as long as we can read the MSR
> >>+             * Can anyone confirm this?
> >Email Vivek Goyal. He is the kexec/kdump maintainer.
> He doesn't appear in the maintainers file, and I don't have an email

Hmm..This is what I see in the MAINTAINERS file:
KDUMP
M:      Vivek Goyal <vgoyal@redhat.com>
M:      Haren Myneni <hbabu@us.ibm.com>
L:      kexec@lists.infradead.org
W:      http://lse.sourceforge.net/kdump/
S:      Maintained
F:      Documentation/kdump/


> address.

http://lmgtfy.com/?q=Vivek+Goyal+kexec


> >>+             */
> >>+            if ( msr_contents&  MSR_IA32_APICBASE_ENABLE )
> >>+                this_apic_mode = APIC_MODE_XAPIC;
> >>+            else
> >>+                this_apic_mode = APIC_MODE_DISABLED;
> >>+        }
> >>+
> >>+    this_cpu(apic_boot_mode) = this_apic_mode;
> >>+    apic_printk(APIC_DEBUG, "APIC boot state is %d on core #%d\n",
> >>+                this_apic_mode, smp_processor_id());
> >This begs of a function to convert those enums to strings..
> True - this was left over from my debugging - I shall fix that tomorrow.

Ok.
> >>+}
> >>+
> >>  void check_for_unexpected_msi(unsigned int vector)
> >>  {
> >>      unsigned long v = apic_read(APIC_ISR + ((vector&  ~0x1f)>>  1));
> >>diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/genapic/probe.c
> >>--- a/xen/arch/x86/genapic/probe.c	Tue May 17 17:32:19 2011 +0100
> >>+++ b/xen/arch/x86/genapic/probe.c	Wed May 18 19:00:13 2011 +0100
> >>@@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
> >>  {
> >>  	int i, changed;
> >>
> >>+    record_boot_APIC_mode();
> >>+
> >The spacing looks odd.
> Does it?  Looks fine for me - I have been following the 1 tab to 4
> spaces convention in half the codebase.

Could be my mailer..

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable
  2011-05-18 20:35     ` Andrew Cooper
@ 2011-05-18 20:45       ` Konrad Rzeszutek Wilk
  2011-05-19  3:31       ` Tian, Kevin
  1 sibling, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 20:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

> >So if you don't have x2apic, then it is wrong to disable the LAPIC mode?
> >What about older hardware?
> I guess I wasn't very clear in my description.  In older hardware
> without x2apic, it is correct to simply twiddle the ENABLE bit in
> the APICBASE MSR.  However, with x2apic mode enabled, setting the
> ENABLE bit from 1 to 0 while leaving the EXTD bit set will result in
> a protection fault which will propagate to a general protection
> fault the same codepath will be called in the fault handler.  As a
> result, the current code in disable_local_APIC will result in a GPF
> if the BIOS boots with LAPICs disabled (fine as per the spec for
> compatibility) and xen decided to take advantage of x2apic mode.

Ok. You might also point to the appropiate section of the x2APIC
spec about this. I think it is 2.7.1.2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-18 18:49   ` Konrad Rzeszutek Wilk
@ 2011-05-18 20:48     ` Andrew Cooper
  2011-05-18 20:57       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 20:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
> On Wed, May 18, 2011 at 07:08:16PM +0100, Andrew Cooper wrote:
>> kdump kernels are unable to boot with IOMMU enabled,
>> this patch disabled IOMMU mode and removes some of the generic
>> code from the shutdown path which doesnt work after other
>> CPUs have been shot down.
>>
>> Also, leave local interrupts disabled when jumping into pugatory
> purgatory?
purgatory is the bit of code which the a crashing kernel jumps into, 
which pretends to do minimal bootloader things to book the kdump 
kernel.  It is part of the kexec-tools package.

>> as we have no idea whats in there and really dont want to be
>> servicing interrupts when our entire state is invalid.
>>
>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>
>> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/crash.c
>> --- a/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
>> +++ b/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
>> @@ -27,6 +27,8 @@
>>   #include<asm/hvm/support.h>
>>   #include<asm/apic.h>
>>   #include<asm/io_apic.h>
>> +#include<xen/iommu.h>
>> +#include<asm/hvm/iommu.h>
>>
>>   static atomic_t waiting_for_crash_ipi;
>>   static unsigned int crashing_cpu;
>> @@ -43,7 +45,10 @@ static int crash_nmi_callback(struct cpu
>>
>>       kexec_crash_save_cpu();
>>
>> -    __stop_this_cpu();
>> +    disable_local_APIC();
>> +    hvm_cpu_down();
>> +    clts();
>> +    asm volatile ( "fninit" );
> Can you provide a comment why you are using fninit and clt?
> Is this what the Linux kernel does too when it goes through the kexec path?
I was replacing __stop_this_cpu() with the safe subset of its contents - 
it was a verbatim copy minus the SMP stuff which the regular 
__stop_this_cpu() does.  I suppose I could have split __stop_this_cpu() 
to __crash_stop_this_cpu() but it didn't seem worth making such a 
trivially small function.
>>
>>       atomic_dec(&waiting_for_crash_ipi);
>>
>> @@ -56,6 +61,7 @@ static int crash_nmi_callback(struct cpu
>>   static void nmi_shootdown_cpus(void)
>>   {
>>       unsigned long msecs;
>> +    u64 msr_contents;
>>
>>       local_irq_disable();
>>
>> @@ -77,18 +83,43 @@ static void nmi_shootdown_cpus(void)
>>           msecs--;
>>       }
>>
>> -    __stop_this_cpu();
>> +    disable_local_APIC();
>> +    hvm_cpu_down();
>> +    clts();
>> +    asm volatile ( "fninit" );
>> +
>> +    /* This is a bit of a hack but there is no other way to shutdown correctly
>> +     * without a significant refactoring of the APIC code */
>> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
>> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
>> +&&  (msr_contents&  MSR_IA32_APICBASE_EXTD) )
>> +        x2apic_enabled = 1;
>> +    else
>> +        x2apic_enabled = 0;
>> +
>>       disable_IO_APIC();
>> -
>> -    local_irq_enable();
> Why?
Because that local_irq_enable() results in the interrupt flag being set 
when jumping into purgatory, which (at the moment) doesn't touch 
interrupts at all.  The result is that interrupts from PCI devices which 
are unaware of the crash are (potentially) being serviced by the xen 
handlers even though we have left the Xen context for good.
>>   }
>>
>>   void machine_crash_shutdown(void)
>>   {
>>       crash_xen_info_t *info;
>> +    const struct iommu_ops * ops;
>>
>>       nmi_shootdown_cpus();
>>
>> +    /* Yes i know this is hacky but it is the easiest solution.  I should add an iommu_ops
>> +     * function called crash() or so which just disables the iommu 'fun' without saving state
>> +     */
>> +    ops = iommu_get_ops();
>> +    if(ops)
>> +        ops->suspend();
> Uh, no checking if ops->suspend exists?
>
True - at the moment both intel and amd iommu_ops are fully implemented 
but I will add an extra condition to the if statement.
>> +
>> +    /* Yes i know this is from driver/passthrough/vtd/ but it appears to be architecture
>> +     * independant, and also bears little/no relation to x2apic.  Needs cleaning up
> What about AMD VI IOMMUs? Does it work when that IOMMU is used?
>
It worked on the AMD box I tested the code on.  Like the comment says - 
as far as I can tell, it is architecture independent code.
>> +     */
>> +    iommu_disable_x2apic_IR();
> Can't that function be done in the suspend code of the IOMMU?
There is a comment in iommu suspend stating that it cant and isn't done, 
but rather is left for the local/ioapic_suspend functions which dont 
properly work in the kexec path.
>> +
>> +
>>       info = kexec_crash_save_info();
>>       info->xen_phys_start = xen_phys_start;
>>       info->dom0_pfn_to_mfn_frame_list_list =
>> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/hpet.c
>> --- a/xen/arch/x86/hpet.c	Wed May 18 19:00:13 2011 +0100
>> +++ b/xen/arch/x86/hpet.c	Wed May 18 19:00:13 2011 +0100
>> @@ -670,6 +670,33 @@ void hpet_disable_legacy_broadcast(void)
>>       smp_send_event_check_mask(&cpu_online_map);
>>   }
>>
>> +/* This function is similar to the regular
>> + * hpet_disable_legacy_broadcast function, except it is called
>> + * on the crash path with only the current processor up, so we
>> + * can forget the locks and really cant send an event check IPI
>> + * to the other processors */
>> +void crash_hpet_disable_legacy_broadcast(void)
>> +{
>> +    u32 cfg;
>> +
>> +    if ( !hpet_events || !(hpet_events->flags&  HPET_EVT_LEGACY) )
>> +        return;
>> +
>> +    hpet_events->flags |= HPET_EVT_DISABLE;
>> +
>> +    /* disable HPET T0 */
>> +    cfg = hpet_read32(HPET_Tn_CFG(0));
>> +    cfg&= ~HPET_TN_ENABLE;
>> +    hpet_write32(cfg, HPET_Tn_CFG(0));
>> +
>> +    /* Stop HPET legacy interrupts */
>> +    cfg = hpet_read32(HPET_CFG);
>> +    cfg&= ~HPET_CFG_LEGACY;
>> +    hpet_write32(cfg, HPET_CFG);
>> +
>> +}
>> +
>> +
>>   void hpet_broadcast_enter(void)
>>   {
>>       unsigned int cpu = smp_processor_id();
>> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/machine_kexec.c
>> --- a/xen/arch/x86/machine_kexec.c	Wed May 18 19:00:13 2011 +0100
>> +++ b/xen/arch/x86/machine_kexec.c	Wed May 18 19:00:13 2011 +0100
>> @@ -97,7 +97,7 @@ void machine_kexec(xen_kexec_image_t *im
>>       };
>>
>>       if ( hpet_broadcast_is_available() )
>> -        hpet_disable_legacy_broadcast();
>> +        crash_hpet_disable_legacy_broadcast();
>>
>>       /*
>>        * compat_machine_kexec() returns to idle pagetables, which requires us
>> diff -r e80b5280fe2f -r aaf44d1a903d xen/include/asm-x86/hpet.h
>> --- a/xen/include/asm-x86/hpet.h	Wed May 18 19:00:13 2011 +0100
>> +++ b/xen/include/asm-x86/hpet.h	Wed May 18 19:00:13 2011 +0100
>> @@ -73,5 +73,6 @@ void hpet_broadcast_enter(void);
>>   void hpet_broadcast_exit(void);
>>   int hpet_broadcast_is_available(void);
>>   void hpet_disable_legacy_broadcast(void);
>> +void crash_hpet_disable_legacy_broadcast(void);
>>
>>   #endif /* __X86_HPET_H__ */
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-18 20:48     ` Andrew Cooper
@ 2011-05-18 20:57       ` Konrad Rzeszutek Wilk
  2011-05-18 21:24         ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-18 20:57 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Wed, May 18, 2011 at 09:48:36PM +0100, Andrew Cooper wrote:
> 
> 
> On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
> >On Wed, May 18, 2011 at 07:08:16PM +0100, Andrew Cooper wrote:
> >>kdump kernels are unable to boot with IOMMU enabled,
> >>this patch disabled IOMMU mode and removes some of the generic
> >>code from the shutdown path which doesnt work after other
> >>CPUs have been shot down.
> >>
> >>Also, leave local interrupts disabled when jumping into pugatory
> >purgatory?
> purgatory is the bit of code which the a crashing kernel jumps into,
> which pretends to do minimal bootloader things to book the kdump
> kernel.  It is part of the kexec-tools package.

Ok. Might want to include that in the description.

> 
> >>as we have no idea whats in there and really dont want to be
> >>servicing interrupts when our entire state is invalid.
> >>
> >>Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
> >>
> >>diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/crash.c
> >>--- a/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
> >>+++ b/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
> >>@@ -27,6 +27,8 @@
> >>  #include<asm/hvm/support.h>
> >>  #include<asm/apic.h>
> >>  #include<asm/io_apic.h>
> >>+#include<xen/iommu.h>
> >>+#include<asm/hvm/iommu.h>
> >>
> >>  static atomic_t waiting_for_crash_ipi;
> >>  static unsigned int crashing_cpu;
> >>@@ -43,7 +45,10 @@ static int crash_nmi_callback(struct cpu
> >>
> >>      kexec_crash_save_cpu();
> >>
> >>-    __stop_this_cpu();
> >>+    disable_local_APIC();
> >>+    hvm_cpu_down();
> >>+    clts();
> >>+    asm volatile ( "fninit" );
> >Can you provide a comment why you are using fninit and clt?
> >Is this what the Linux kernel does too when it goes through the kexec path?
> I was replacing __stop_this_cpu() with the safe subset of its
> contents - it was a verbatim copy minus the SMP stuff which the
> regular __stop_this_cpu() does.  I suppose I could have split
> __stop_this_cpu() to __crash_stop_this_cpu() but it didn't seem
> worth making such a trivially small function.

Why can't you use the SMP version? I know you are not running
in SMP mode, but does it hurt?

> >>
> >>      atomic_dec(&waiting_for_crash_ipi);
> >>
> >>@@ -56,6 +61,7 @@ static int crash_nmi_callback(struct cpu
> >>  static void nmi_shootdown_cpus(void)
> >>  {
> >>      unsigned long msecs;
> >>+    u64 msr_contents;
> >>
> >>      local_irq_disable();
> >>
> >>@@ -77,18 +83,43 @@ static void nmi_shootdown_cpus(void)
> >>          msecs--;
> >>      }
> >>
> >>-    __stop_this_cpu();
> >>+    disable_local_APIC();
> >>+    hvm_cpu_down();
> >>+    clts();
> >>+    asm volatile ( "fninit" );
> >>+
> >>+    /* This is a bit of a hack but there is no other way to shutdown correctly
> >>+     * without a significant refactoring of the APIC code */
> >>+    rdmsrl(MSR_IA32_APICBASE, msr_contents);
> >>+    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
> >>+&&  (msr_contents&  MSR_IA32_APICBASE_EXTD) )
> >>+        x2apic_enabled = 1;
> >>+    else
> >>+        x2apic_enabled = 0;
> >>+
> >>      disable_IO_APIC();
> >>-
> >>-    local_irq_enable();
> >Why?
> Because that local_irq_enable() results in the interrupt flag being
> set when jumping into purgatory, which (at the moment) doesn't touch
> interrupts at all.  The result is that interrupts from PCI devices
> which are unaware of the crash are (potentially) being serviced by
> the xen handlers even though we have left the Xen context for good.

Yikes. Please do explain this in the code right there were you
remove the local_irq_enable..

> >>  }
> >>
> >>  void machine_crash_shutdown(void)
> >>  {
> >>      crash_xen_info_t *info;
> >>+    const struct iommu_ops * ops;
> >>
> >>      nmi_shootdown_cpus();
> >>
> >>+    /* Yes i know this is hacky but it is the easiest solution.  I should add an iommu_ops
> >>+     * function called crash() or so which just disables the iommu 'fun' without saving state
> >>+     */
> >>+    ops = iommu_get_ops();
> >>+    if(ops)
> >>+        ops->suspend();
> >Uh, no checking if ops->suspend exists?
> >
> True - at the moment both intel and amd iommu_ops are fully
> implemented but I will add an extra condition to the if statement.
> >>+
> >>+    /* Yes i know this is from driver/passthrough/vtd/ but it appears to be architecture
> >>+     * independant, and also bears little/no relation to x2apic.  Needs cleaning up
> >What about AMD VI IOMMUs? Does it work when that IOMMU is used?
> >
> It worked on the AMD box I tested the code on.  Like the comment
> says - as far as I can tell, it is architecture independent code.
> >>+     */
> >>+    iommu_disable_x2apic_IR();
> >Can't that function be done in the suspend code of the IOMMU?
> There is a comment in iommu suspend stating that it cant and isn't
> done, but rather is left for the local/ioapic_suspend functions
> which dont properly work in the kexec path.

OK, how about just moving it out of driver/passthrought/vtd then?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-18 20:57       ` Konrad Rzeszutek Wilk
@ 2011-05-18 21:24         ` Andrew Cooper
  2011-05-19 14:32           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2011-05-18 21:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel



On 18/05/11 21:57, Konrad Rzeszutek Wilk wrote:
> On Wed, May 18, 2011 at 09:48:36PM +0100, Andrew Cooper wrote:
>>
>> On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
>>> On Wed, May 18, 2011 at 07:08:16PM +0100, Andrew Cooper wrote:
>>>> kdump kernels are unable to boot with IOMMU enabled,
>>>> this patch disabled IOMMU mode and removes some of the generic
>>>> code from the shutdown path which doesnt work after other
>>>> CPUs have been shot down.
>>>>
>>>> Also, leave local interrupts disabled when jumping into pugatory
>>> purgatory?
>> purgatory is the bit of code which the a crashing kernel jumps into,
>> which pretends to do minimal bootloader things to book the kdump
>> kernel.  It is part of the kexec-tools package.
> Ok. Might want to include that in the description.
>
>>>> as we have no idea whats in there and really dont want to be
>>>> servicing interrupts when our entire state is invalid.
>>>>
>>>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>>>
>>>> diff -r e80b5280fe2f -r aaf44d1a903d xen/arch/x86/crash.c
>>>> --- a/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
>>>> +++ b/xen/arch/x86/crash.c	Wed May 18 19:00:13 2011 +0100
>>>> @@ -27,6 +27,8 @@
>>>>   #include<asm/hvm/support.h>
>>>>   #include<asm/apic.h>
>>>>   #include<asm/io_apic.h>
>>>> +#include<xen/iommu.h>
>>>> +#include<asm/hvm/iommu.h>
>>>>
>>>>   static atomic_t waiting_for_crash_ipi;
>>>>   static unsigned int crashing_cpu;
>>>> @@ -43,7 +45,10 @@ static int crash_nmi_callback(struct cpu
>>>>
>>>>       kexec_crash_save_cpu();
>>>>
>>>> -    __stop_this_cpu();
>>>> +    disable_local_APIC();
>>>> +    hvm_cpu_down();
>>>> +    clts();
>>>> +    asm volatile ( "fninit" );
>>> Can you provide a comment why you are using fninit and clt?
>>> Is this what the Linux kernel does too when it goes through the kexec path?
>> I was replacing __stop_this_cpu() with the safe subset of its
>> contents - it was a verbatim copy minus the SMP stuff which the
>> regular __stop_this_cpu() does.  I suppose I could have split
>> __stop_this_cpu() to __crash_stop_this_cpu() but it didn't seem
>> worth making such a trivially small function.
> Why can't you use the SMP version? I know you are not running
> in SMP mode, but does it hurt?
>
It sadly does hurt.  All the code relating to the x2apic_enabled 
variable is broken.  That variable is used to mean "is the user 
expecting me to use x2apic mode", "did the bios boot me in x2apic mode" 
and "what mode is the boot processor currently in".  For normal 
operation, it works (assuming no race conditions creep in when starting 
non-boot processors).  However,  x2apic mode really needs to be per_cpu 
because it depends on the lapic MSR as to whether you should be using 
MSRs or MMIO to talk to the lapic/ioapic registers, and you kindly get a 
protection fault if you use MSRs outside of x2apic mode.  As a result, 1 
global variable doesn't cut it when you are doing an nmi shootdown of 
processors.  I was in the middle of making a fix for that, but Keir made 
it clear that such a patch would not be accepted.
>>>>       atomic_dec(&waiting_for_crash_ipi);
>>>>
>>>> @@ -56,6 +61,7 @@ static int crash_nmi_callback(struct cpu
>>>>   static void nmi_shootdown_cpus(void)
>>>>   {
>>>>       unsigned long msecs;
>>>> +    u64 msr_contents;
>>>>
>>>>       local_irq_disable();
>>>>
>>>> @@ -77,18 +83,43 @@ static void nmi_shootdown_cpus(void)
>>>>           msecs--;
>>>>       }
>>>>
>>>> -    __stop_this_cpu();
>>>> +    disable_local_APIC();
>>>> +    hvm_cpu_down();
>>>> +    clts();
>>>> +    asm volatile ( "fninit" );
>>>> +
>>>> +    /* This is a bit of a hack but there is no other way to shutdown correctly
>>>> +     * without a significant refactoring of the APIC code */
>>>> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
>>>> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
>>>> +&&   (msr_contents&   MSR_IA32_APICBASE_EXTD) )
>>>> +        x2apic_enabled = 1;
>>>> +    else
>>>> +        x2apic_enabled = 0;
>>>> +
>>>>       disable_IO_APIC();
>>>> -
>>>> -    local_irq_enable();
>>> Why?
>> Because that local_irq_enable() results in the interrupt flag being
>> set when jumping into purgatory, which (at the moment) doesn't touch
>> interrupts at all.  The result is that interrupts from PCI devices
>> which are unaware of the crash are (potentially) being serviced by
>> the xen handlers even though we have left the Xen context for good.
> Yikes. Please do explain this in the code right there were you
> remove the local_irq_enable..
>
>>>>   }
>>>>
>>>>   void machine_crash_shutdown(void)
>>>>   {
>>>>       crash_xen_info_t *info;
>>>> +    const struct iommu_ops * ops;
>>>>
>>>>       nmi_shootdown_cpus();
>>>>
>>>> +    /* Yes i know this is hacky but it is the easiest solution.  I should add an iommu_ops
>>>> +     * function called crash() or so which just disables the iommu 'fun' without saving state
>>>> +     */
>>>> +    ops = iommu_get_ops();
>>>> +    if(ops)
>>>> +        ops->suspend();
>>> Uh, no checking if ops->suspend exists?
>>>
>> True - at the moment both intel and amd iommu_ops are fully
>> implemented but I will add an extra condition to the if statement.
>>>> +
>>>> +    /* Yes i know this is from driver/passthrough/vtd/ but it appears to be architecture
>>>> +     * independant, and also bears little/no relation to x2apic.  Needs cleaning up
>>> What about AMD VI IOMMUs? Does it work when that IOMMU is used?
>>>
>> It worked on the AMD box I tested the code on.  Like the comment
>> says - as far as I can tell, it is architecture independent code.
>>>> +     */
>>>> +    iommu_disable_x2apic_IR();
>>> Can't that function be done in the suspend code of the IOMMU?
>> There is a comment in iommu suspend stating that it cant and isn't
>> done, but rather is left for the local/ioapic_suspend functions
>> which dont properly work in the kexec path.
> OK, how about just moving it out of driver/passthrought/vtd then?
Because that code is fragile enough without me poking about in it.  I 
would prefer someone with more knowledge about IOMMU to make that call.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 18:08 ` [PATCH 1 of 3] apic: record local apic state on boot Andrew Cooper
  2011-05-18 18:49   ` Konrad Rzeszutek Wilk
@ 2011-05-18 23:40   ` Keir Fraser
  2011-05-19 11:14     ` Andrew Cooper
  1 sibling, 1 reply; 27+ messages in thread
From: Keir Fraser @ 2011-05-18 23:40 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel

On 18/05/2011 19:08, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:

> Xen does not store the boot local apic state which leads to problems
> when shutting down for a kexec jump.  This patch records the boot
> state so we can return to the boot state when kexec'ing.
> 
> This is per CPU because all 3 bioses on the boxes I have tested dont
> enabled all local apics on boot.

You mean some CPUs have APIC totally disabled? Impossible: you need the APIC
enabled to be able to INIT-SIPI bootstrap the secondary processors. And Xen
itself has no code to enable disabled APICs (except some very legacy code
that executes only in the uniprocessor case, which I'm pretty sure you're
not touching).

You call your new function so late in secondary CPU bringup that our own
APIC setup has already run (via
smp_callin->{x2apic_ap_setup,setup_local_APIC}). Which would indicate that
the boot state of the secondary CPUs is not of importance to you, since you
I'm sure tested your patch as working okay in this current form, and that
would imply that you don't need to record per-cpu boot state.

The singleshot BUG_ON in record_boot_APIC_mode is broken, since secondary
CPUs can be started up multiple times (we can offline/online them). You need
to bail on repeated invocations. Or not be recording boot state for
secondary cpus since apparently it isn't needed...

We're going to be going through these patches very slowly and painfully,
because frankly the arguments and assertions seem to be plenty full of
holes.

 -- Keir

> As a result, we have to return to
> the bios state so the ACPI tables match up with the hardware state
> for the booting kernel.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/apic.c
> --- a/xen/arch/x86/apic.c Tue May 17 17:32:19 2011 +0100
> +++ b/xen/arch/x86/apic.c Wed May 18 19:00:13 2011 +0100
> @@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
>  static bool_t __initdata opt_x2apic = 1;
>  boolean_param("x2apic", opt_x2apic);
>  
> +DEFINE_PER_CPU_READ_MOSTLY(enum apic_mode, apic_boot_mode) =
> APIC_MODE_INVALID;
> +
>  bool_t __read_mostly x2apic_enabled = 0;
>  bool_t __read_mostly directed_eoi_enabled = 0;
>  
> @@ -1437,6 +1439,41 @@ int __init APIC_init_uniprocessor (void)
>      return 0;
>  }
>  
> +/* Needs to be called once per CPU during startup.  It records the state the
> BIOS
> + * leaves the local APIC so we can tare back down upon shutdown/crash
> + */
> +void __init record_boot_APIC_mode(void)
> +{
> +    enum apic_mode this_apic_mode;
> +    u64 msr_contents;
> +
> +    this_apic_mode = APIC_MODE_INVALID;
> +
> +    /* Sanity check - we should only ever run once */
> +    BUG_ON( APIC_MODE_INVALID != this_cpu(apic_boot_mode) );
> +
> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
> +
> +    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else
> reserved */
> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
> +         && (msr_contents & MSR_IA32_APICBASE_EXTD) )
> +        this_apic_mode = APIC_MODE_X2APIC;
> +    else
> +        {
> +            /* EN bit should always be valid as long as we can read the MSR
> +             * Can anyone confirm this?
> +             */
> +            if ( msr_contents & MSR_IA32_APICBASE_ENABLE )
> +                this_apic_mode = APIC_MODE_XAPIC;
> +            else
> +                this_apic_mode = APIC_MODE_DISABLED;
> +        }
> +
> +    this_cpu(apic_boot_mode) = this_apic_mode;
> +    apic_printk(APIC_DEBUG, "APIC boot state is %d on core #%d\n",
> +                this_apic_mode, smp_processor_id());
> +}
> +
>  void check_for_unexpected_msi(unsigned int vector)
>  {
>      unsigned long v = apic_read(APIC_ISR + ((vector & ~0x1f) >> 1));
> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/genapic/probe.c
> --- a/xen/arch/x86/genapic/probe.c Tue May 17 17:32:19 2011 +0100
> +++ b/xen/arch/x86/genapic/probe.c Wed May 18 19:00:13 2011 +0100
> @@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
>  { 
> int i, changed;
>  
> +    record_boot_APIC_mode();
> +
> check_x2apic_preenabled();
> cmdline_apic = changed = (genapic != NULL);
>  
> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/smpboot.c
> --- a/xen/arch/x86/smpboot.c Tue May 17 17:32:19 2011 +0100
> +++ b/xen/arch/x86/smpboot.c Wed May 18 19:00:13 2011 +0100
> @@ -388,6 +388,9 @@ void start_secondary(void *unused)
>  
>      microcode_resume_cpu(cpu);
>  
> +    /* Record boot apic state */
> +    record_boot_APIC_mode();
> +
>      wmb();
>      startup_cpu_idle_loop();
>  }
> diff -r f531ed84b066 -r 62a8ce6595ad xen/include/asm-x86/apic.h
> --- a/xen/include/asm-x86/apic.h Tue May 17 17:32:19 2011 +0100
> +++ b/xen/include/asm-x86/apic.h Wed May 18 19:00:13 2011 +0100
> @@ -21,6 +21,16 @@
>  #define IO_APIC_REDIR_DEST_LOGICAL 0x00800
>  #define IO_APIC_REDIR_DEST_PHYSICAL 0x00000
>  
> +/* Possible APIC states */
> +enum apic_mode { APIC_MODE_INVALID,  /* Not set yet */
> +                 APIC_MODE_DISABLED, /* Some bioses disable by default for
> compatability */
> +                 APIC_MODE_XAPIC,    /* xAPIC mode - default upon chipset
> reset */
> +                 APIC_MODE_X2APIC    /* x2APIC mode - common for large smp
> machines */
> +};
> +
> +/* PerCPU local APIC boot mode - so we can taredown to bios state */
> +DECLARE_PER_CPU(enum apic_mode, apic_boot_mode);
> +
>  extern u8 apic_verbosity;
>  extern bool_t x2apic_enabled;
>  extern bool_t directed_eoi_enabled;
> @@ -203,6 +213,7 @@ extern void disable_APIC_timer(void);
>  extern void enable_APIC_timer(void);
>  extern int lapic_suspend(void);
>  extern int lapic_resume(void);
> +extern void record_boot_APIC_mode(void);
>  
>  extern int check_nmi_watchdog (void);
>  
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 20:27     ` Andrew Cooper
  2011-05-18 20:43       ` Konrad Rzeszutek Wilk
@ 2011-05-19  0:54       ` Tian, Kevin
  1 sibling, 0 replies; 27+ messages in thread
From: Tian, Kevin @ 2011-05-19  0:54 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk; +Cc: xen-devel

> From: Andrew Cooper
> Sent: Thursday, May 19, 2011 4:28 AM
> 
> On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
> > On Wed, May 18, 2011 at 07:08:14PM +0100, Andrew Cooper wrote:
> >> Xen does not store the boot local apic state which leads to problems
> >> when shutting down for a kexec jump.  This patch records the boot
> >> state so we can return to the boot state when kexec'ing.
> >>
> >> This is per CPU because all 3 bioses on the boxes I have tested dont
> >> enabled all local apics on boot.  As a result, we have to return to
> >> the bios state so the ACPI tables match up with the hardware state
> >> for the booting kernel.
> > Which ACPI table requires this?
> Cant remember offhand but linux (2.6.32) was doing finger pointing at the
> multi-processor tables

it would be good to understand exactly how CPU states don't match the
ACPI table. As Keir commented, your current patch is bogus which may
just work around the issue in another way.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 20:43       ` Konrad Rzeszutek Wilk
@ 2011-05-19  0:56         ` Tian, Kevin
  2011-05-19  8:34         ` Ian Campbell
  1 sibling, 0 replies; 27+ messages in thread
From: Tian, Kevin @ 2011-05-19  0:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper; +Cc: xen-devel

> From: Konrad Rzeszutek Wilk
> Sent: Thursday, May 19, 2011 4:43 AM
> 
> On Wed, May 18, 2011 at 09:27:56PM +0100, Andrew Cooper wrote:
> >
> >
> > On 18/05/11 19:49, Konrad Rzeszutek Wilk wrote:
> > >On Wed, May 18, 2011 at 07:08:14PM +0100, Andrew Cooper wrote:
> > >>Xen does not store the boot local apic state which leads to problems
> > >>when shutting down for a kexec jump.  This patch records the boot
> > >>state so we can return to the boot state when kexec'ing.
> > >>
> > >>This is per CPU because all 3 bioses on the boxes I have tested dont
> > >>enabled all local apics on boot.  As a result, we have to return to
> > >>the bios state so the ACPI tables match up with the hardware state
> > >>for the booting kernel.
> > >Which ACPI table requires this?
> > Cant remember offhand but linux (2.6.32) was doing finger pointing at
> > the multi-processor tables
> 
> MP tables? Those don't get used when ACPI is used..

He may mean ACPI MADT table...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable
  2011-05-18 20:35     ` Andrew Cooper
  2011-05-18 20:45       ` Konrad Rzeszutek Wilk
@ 2011-05-19  3:31       ` Tian, Kevin
  1 sibling, 0 replies; 27+ messages in thread
From: Tian, Kevin @ 2011-05-19  3:31 UTC (permalink / raw)
  To: Andrew Cooper, Konrad Rzeszutek Wilk; +Cc: xen-devel

> From: Andrew Cooper
> Sent: Thursday, May 19, 2011 4:36 AM
> 
> >
> >> which has been changed to correctly tare down the local APIC without
> >> a
> > teardown?
> >> protection fault (which leads to a general protection fault).
> > So if you don't have x2apic, then it is wrong to disable the LAPIC mode?
> > What about older hardware?
> I guess I wasn't very clear in my description.  In older hardware without
> x2apic, it is correct to simply twiddle the ENABLE bit in the APICBASE MSR.
> However, with x2apic mode enabled, setting the ENABLE bit from 1 to 0 while
> leaving the EXTD bit set will result in a protection fault which will propagate to
> a general protection fault the same codepath will be called in the fault handler.
> As a result, the current code in disable_local_APIC will result in a GPF if the
> BIOS boots with LAPICs disabled (fine as per the spec for compatibility) and xen
> decided to take advantage of x2apic mode.

Though above is true a bug, I'm curious whether it's a hard requirement to reset
APIC state to the BIOS state, or whether there's other way around to handle it
in kexec path. Could you check how upstream Linux handles this? In a glimpse
upstream Linux only tries to recover APIC to BIOS state for 32bit, and also lacks
of special treatment on x2apic. Then how would 64bit linux work with kexec?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 20:43       ` Konrad Rzeszutek Wilk
  2011-05-19  0:56         ` Tian, Kevin
@ 2011-05-19  8:34         ` Ian Campbell
  2011-05-19 16:21           ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2011-05-19  8:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, xen-devel

On Wed, 2011-05-18 at 21:43 +0100, Konrad Rzeszutek Wilk wrote:
> > >Email Vivek Goyal. He is the kexec/kdump maintainer.
> > He doesn't appear in the maintainers file, and I don't have an email
> 
> Hmm..This is what I see in the MAINTAINERS file:
> KDUMP
> M:      Vivek Goyal <vgoyal@redhat.com>
> M:      Haren Myneni <hbabu@us.ibm.com>
> L:      kexec@lists.infradead.org
> W:      http://lse.sourceforge.net/kdump/
> S:      Maintained
> F:      Documentation/kdump/

That's the Linux MAINTAINERS file, which is why Andrew probably didn't
look there since this is a Xen side patch. The question didn't really
seem like a kdump maintainer one in any case, more of a "person who
knows about APICs one", perhaps those are sometimes the same person ;-)

Ian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-18 23:40   ` Keir Fraser
@ 2011-05-19 11:14     ` Andrew Cooper
  2011-05-19 14:26       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2011-05-19 11:14 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel



On 19/05/11 00:40, Keir Fraser wrote:
> On 18/05/2011 19:08, "Andrew Cooper"<andrew.cooper3@citrix.com>  wrote:
>
>> Xen does not store the boot local apic state which leads to problems
>> when shutting down for a kexec jump.  This patch records the boot
>> state so we can return to the boot state when kexec'ing.
>>
>> This is per CPU because all 3 bioses on the boxes I have tested dont
>> enabled all local apics on boot.
> You mean some CPUs have APIC totally disabled?
That is what I meant but now I have double checked the trace, I realize 
I was mis-interpreting the Xen APIC debug trace.  As a result, I do 
agree that most of the rest of this patch is based on false assumptions.
>   Impossible: you need the APIC
> enabled to be able to INIT-SIPI bootstrap the secondary processors.
Nope.  The MP spec section 3.8 states that all AP lapics must be 
disabled when the bios passes over to the OS.  Section 3.7.3 states that 
the INIT-IPI is special and twiddles the PRST pin for external lapics, 
or the INIT pin for internal lapics.  The result in either case is the 
lapic resetting, enabling themselves in the process, and reading the 
reset vector.
>   And Xen
> itself has no code to enable disabled APICs (except some very legacy code
> that executes only in the uniprocessor case, which I'm pretty sure you're
> not touching).
>
> You call your new function so late in secondary CPU bringup that our own
> APIC setup has already run (via
> smp_callin->{x2apic_ap_setup,setup_local_APIC}). Which would indicate that
> the boot state of the secondary CPUs is not of importance to you, since you
> I'm sure tested your patch as working okay in this current form, and that
> would imply that you don't need to record per-cpu boot state.
Hmm I thought I had traced the codepath but I clearly got that wrong.

Either way, given the new clarification from the MP spec, my suggestion 
would be to fully disable the lapics in the nmi_crash_handler just 
before looping over halt().  Experimental evidence shows however that 
linux 2.6.32 kdump (and presumably earlier) can not function if its BSP 
lapic is in x2apic mode when the bios would have left it disabled or in 
xapic mode.
> The singleshot BUG_ON in record_boot_APIC_mode is broken, since secondary
> CPUs can be started up multiple times (we can offline/online them). You need
> to bail on repeated invocations. Or not be recording boot state for
> secondary cpus since apparently it isn't needed...
Fair enough - I didn't realize that it was possible to be called several 
times
> We're going to be going through these patches very slowly and painfully,
> because frankly the arguments and assertions seem to be plenty full of
> holes.
>
>   -- Keir
>> As a result, we have to return to
>> the bios state so the ACPI tables match up with the hardware state
>> for the booting kernel.
>>
>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/apic.c
>> --- a/xen/arch/x86/apic.c Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/arch/x86/apic.c Wed May 18 19:00:13 2011 +0100
>> @@ -74,6 +74,8 @@ u8 __read_mostly apic_verbosity;
>>   static bool_t __initdata opt_x2apic = 1;
>>   boolean_param("x2apic", opt_x2apic);
>>
>> +DEFINE_PER_CPU_READ_MOSTLY(enum apic_mode, apic_boot_mode) =
>> APIC_MODE_INVALID;
>> +
>>   bool_t __read_mostly x2apic_enabled = 0;
>>   bool_t __read_mostly directed_eoi_enabled = 0;
>>
>> @@ -1437,6 +1439,41 @@ int __init APIC_init_uniprocessor (void)
>>       return 0;
>>   }
>>
>> +/* Needs to be called once per CPU during startup.  It records the state the
>> BIOS
>> + * leaves the local APIC so we can tare back down upon shutdown/crash
>> + */
>> +void __init record_boot_APIC_mode(void)
>> +{
>> +    enum apic_mode this_apic_mode;
>> +    u64 msr_contents;
>> +
>> +    this_apic_mode = APIC_MODE_INVALID;
>> +
>> +    /* Sanity check - we should only ever run once */
>> +    BUG_ON( APIC_MODE_INVALID != this_cpu(apic_boot_mode) );
>> +
>> +    rdmsrl(MSR_IA32_APICBASE, msr_contents);
>> +
>> +    /* Reading EXTD bit from the MSR is only valid if CPUID says so, else
>> reserved */
>> +    if ( cpu_has(&current_cpu_data, X86_FEATURE_X2APIC)
>> +&&  (msr_contents&  MSR_IA32_APICBASE_EXTD) )
>> +        this_apic_mode = APIC_MODE_X2APIC;
>> +    else
>> +        {
>> +            /* EN bit should always be valid as long as we can read the MSR
>> +             * Can anyone confirm this?
>> +             */
>> +            if ( msr_contents&  MSR_IA32_APICBASE_ENABLE )
>> +                this_apic_mode = APIC_MODE_XAPIC;
>> +            else
>> +                this_apic_mode = APIC_MODE_DISABLED;
>> +        }
>> +
>> +    this_cpu(apic_boot_mode) = this_apic_mode;
>> +    apic_printk(APIC_DEBUG, "APIC boot state is %d on core #%d\n",
>> +                this_apic_mode, smp_processor_id());
>> +}
>> +
>>   void check_for_unexpected_msi(unsigned int vector)
>>   {
>>       unsigned long v = apic_read(APIC_ISR + ((vector&  ~0x1f)>>  1));
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/genapic/probe.c
>> --- a/xen/arch/x86/genapic/probe.c Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/arch/x86/genapic/probe.c Wed May 18 19:00:13 2011 +0100
>> @@ -60,6 +60,8 @@ void __init generic_apic_probe(void)
>>   {
>> int i, changed;
>>
>> +    record_boot_APIC_mode();
>> +
>> check_x2apic_preenabled();
>> cmdline_apic = changed = (genapic != NULL);
>>
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/arch/x86/smpboot.c
>> --- a/xen/arch/x86/smpboot.c Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/arch/x86/smpboot.c Wed May 18 19:00:13 2011 +0100
>> @@ -388,6 +388,9 @@ void start_secondary(void *unused)
>>
>>       microcode_resume_cpu(cpu);
>>
>> +    /* Record boot apic state */
>> +    record_boot_APIC_mode();
>> +
>>       wmb();
>>       startup_cpu_idle_loop();
>>   }
>> diff -r f531ed84b066 -r 62a8ce6595ad xen/include/asm-x86/apic.h
>> --- a/xen/include/asm-x86/apic.h Tue May 17 17:32:19 2011 +0100
>> +++ b/xen/include/asm-x86/apic.h Wed May 18 19:00:13 2011 +0100
>> @@ -21,6 +21,16 @@
>>   #define IO_APIC_REDIR_DEST_LOGICAL 0x00800
>>   #define IO_APIC_REDIR_DEST_PHYSICAL 0x00000
>>
>> +/* Possible APIC states */
>> +enum apic_mode { APIC_MODE_INVALID,  /* Not set yet */
>> +                 APIC_MODE_DISABLED, /* Some bioses disable by default for
>> compatability */
>> +                 APIC_MODE_XAPIC,    /* xAPIC mode - default upon chipset
>> reset */
>> +                 APIC_MODE_X2APIC    /* x2APIC mode - common for large smp
>> machines */
>> +};
>> +
>> +/* PerCPU local APIC boot mode - so we can taredown to bios state */
>> +DECLARE_PER_CPU(enum apic_mode, apic_boot_mode);
>> +
>>   extern u8 apic_verbosity;
>>   extern bool_t x2apic_enabled;
>>   extern bool_t directed_eoi_enabled;
>> @@ -203,6 +213,7 @@ extern void disable_APIC_timer(void);
>>   extern void enable_APIC_timer(void);
>>   extern int lapic_suspend(void);
>>   extern int lapic_resume(void);
>> +extern void record_boot_APIC_mode(void);
>>
>>   extern int check_nmi_watchdog (void);
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-19 11:14     ` Andrew Cooper
@ 2011-05-19 14:26       ` Konrad Rzeszutek Wilk
  2011-05-19 14:43         ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-19 14:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel

> evidence shows however that linux 2.6.32 kdump (and presumably

I would suggest you use the latest linux kernel (2.6.39) to 
look at the kdump implementation, much fresher.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-18 21:24         ` Andrew Cooper
@ 2011-05-19 14:32           ` Konrad Rzeszutek Wilk
  2011-05-20  0:33             ` Kay, Allen M
  2011-05-20 21:55             ` Kay, Allen M
  0 siblings, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-19 14:32 UTC (permalink / raw)
  To: Andrew Cooper, allen.m.kay; +Cc: xen-devel

> >>>What about AMD VI IOMMUs? Does it work when that IOMMU is used?
> >>>
> >>It worked on the AMD box I tested the code on.  Like the comment
> >>says - as far as I can tell, it is architecture independent code.
> >>>>+     */
> >>>>+    iommu_disable_x2apic_IR();
> >>>Can't that function be done in the suspend code of the IOMMU?
> >>There is a comment in iommu suspend stating that it cant and isn't
> >>done, but rather is left for the local/ioapic_suspend functions
> >>which dont properly work in the kexec path.
> >OK, how about just moving it out of driver/passthrought/vtd then?
> Because that code is fragile enough without me poking about in it.
> I would prefer someone with more knowledge about IOMMU to make that
> call.

OK. Lets CC him here then.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-19 14:26       ` Konrad Rzeszutek Wilk
@ 2011-05-19 14:43         ` Andrew Cooper
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Cooper @ 2011-05-19 14:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Keir Fraser, xen-devel



On 19/05/11 15:26, Konrad Rzeszutek Wilk wrote:
>> evidence shows however that linux 2.6.32 kdump (and presumably
> I would suggest you use the latest linux kernel (2.6.39) to
> look at the kdump implementation, much fresher.
True, but at the moment, my problem is to fix the fact that our next 
release of XenServer (which runs 2.6.32) doesn't successfully kdump.  
Therefore, that was the base of my investigation of the problem.

I will experiment with later kernels when I get a chance, but from 
comparing the similarities in the codepaths, I'm not sure the results 
will be much different.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1 of 3] apic: record local apic state on boot
  2011-05-19  8:34         ` Ian Campbell
@ 2011-05-19 16:21           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-05-19 16:21 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Andrew Cooper, xen-devel

On Thu, May 19, 2011 at 09:34:46AM +0100, Ian Campbell wrote:
> On Wed, 2011-05-18 at 21:43 +0100, Konrad Rzeszutek Wilk wrote:
> > > >Email Vivek Goyal. He is the kexec/kdump maintainer.
> > > He doesn't appear in the maintainers file, and I don't have an email
> > 
> > Hmm..This is what I see in the MAINTAINERS file:
> > KDUMP
> > M:      Vivek Goyal <vgoyal@redhat.com>
> > M:      Haren Myneni <hbabu@us.ibm.com>
> > L:      kexec@lists.infradead.org
> > W:      http://lse.sourceforge.net/kdump/
> > S:      Maintained
> > F:      Documentation/kdump/
> 
> That's the Linux MAINTAINERS file, which is why Andrew probably didn't
> look there since this is a Xen side patch. The question didn't really
> seem like a kdump maintainer one in any case, more of a "person who
> knows about APICs one", perhaps those are sometimes the same person ;-)

Oh yeah. I sat next to Vivek when he was cursing out IO-APICs,
SCSI controllers, network cards being brain-dead when the Linux kernel
barfed. He is your man.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-19 14:32           ` Konrad Rzeszutek Wilk
@ 2011-05-20  0:33             ` Kay, Allen M
  2011-05-20 21:55             ` Kay, Allen M
  1 sibling, 0 replies; 27+ messages in thread
From: Kay, Allen M @ 2011-05-20  0:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper; +Cc: xen-devel

The functions iommu_enable_x2apic_IR()/iommu_disable_x2apic_IR() should really be architectural specific.  They should not be called from common code without going through iommu API. The reason it worked on AMD box is because it returns if the list acpi_drhd_units is empty.  On AMD box, this list is empty since it is only populated only on Intel VT-d enabled systems.  We will clean this up.

I don't know why is disable_intremap() called separately.  It seems to me we should be able to call disable_qinval() and disable_intremap() in vtd_suspend().

I will give it a try tomorrow and see if I can find a clue the code is written this way.

Allen

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] 
Sent: Thursday, May 19, 2011 7:32 AM
To: Andrew Cooper; Kay, Allen M
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel

> >>>What about AMD VI IOMMUs? Does it work when that IOMMU is used?
> >>>
> >>It worked on the AMD box I tested the code on.  Like the comment
> >>says - as far as I can tell, it is architecture independent code.
> >>>>+     */
> >>>>+    iommu_disable_x2apic_IR();
> >>>Can't that function be done in the suspend code of the IOMMU?
> >>There is a comment in iommu suspend stating that it cant and isn't
> >>done, but rather is left for the local/ioapic_suspend functions
> >>which dont properly work in the kexec path.
> >OK, how about just moving it out of driver/passthrought/vtd then?
> Because that code is fragile enough without me poking about in it.
> I would prefer someone with more knowledge about IOMMU to make that
> call.

OK. Lets CC him here then.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel
  2011-05-19 14:32           ` Konrad Rzeszutek Wilk
  2011-05-20  0:33             ` Kay, Allen M
@ 2011-05-20 21:55             ` Kay, Allen M
  1 sibling, 0 replies; 27+ messages in thread
From: Kay, Allen M @ 2011-05-20 21:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Andrew Cooper; +Cc: xen-devel

Andrew,

Given that there is some cleanup work needed with x2apic/suspend, you can go ahead and add calls to disable_intremap() and disable_qinval() in vtd_suspend() for now.  This seems the right thing to do by looking at the code.  We will look at this again when we revisit and clean up x2apic/suspend code.

Once you are done with kexec work, can you write a xen wiki page so that we can follow your instructions to test kexec when we touch this part of the code?

Allen

-----Original Message-----
From: Kay, Allen M 
Sent: Thursday, May 19, 2011 5:33 PM
To: 'Konrad Rzeszutek Wilk'; Andrew Cooper
Cc: xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel

The functions iommu_enable_x2apic_IR()/iommu_disable_x2apic_IR() should really be architectural specific.  They should not be called from common code without going through iommu API. The reason it worked on AMD box is because it returns if the list acpi_drhd_units is empty.  On AMD box, this list is empty since it is only populated only on Intel VT-d enabled systems.  We will clean this up.

I don't know why is disable_intremap() called separately.  It seems to me we should be able to call disable_qinval() and disable_intremap() in vtd_suspend().

I will give it a try tomorrow and see if I can find a clue the code is written this way.

Allen

-----Original Message-----
From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com] 
Sent: Thursday, May 19, 2011 7:32 AM
To: Andrew Cooper; Kay, Allen M
Cc: xen-devel@lists.xensource.com
Subject: Re: [Xen-devel] [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel

> >>>What about AMD VI IOMMUs? Does it work when that IOMMU is used?
> >>>
> >>It worked on the AMD box I tested the code on.  Like the comment
> >>says - as far as I can tell, it is architecture independent code.
> >>>>+     */
> >>>>+    iommu_disable_x2apic_IR();
> >>>Can't that function be done in the suspend code of the IOMMU?
> >>There is a comment in iommu suspend stating that it cant and isn't
> >>done, but rather is left for the local/ioapic_suspend functions
> >>which dont properly work in the kexec path.
> >OK, how about just moving it out of driver/passthrought/vtd then?
> Because that code is fragile enough without me poking about in it.
> I would prefer someone with more knowledge about IOMMU to make that
> call.

OK. Lets CC him here then.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-05-20 21:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-18 18:08 [PATCH 0 of 3] Fix kexec path in xen (take 2) Andrew Cooper
2011-05-18 18:08 ` [PATCH 1 of 3] apic: record local apic state on boot Andrew Cooper
2011-05-18 18:49   ` Konrad Rzeszutek Wilk
2011-05-18 20:27     ` Andrew Cooper
2011-05-18 20:43       ` Konrad Rzeszutek Wilk
2011-05-19  0:56         ` Tian, Kevin
2011-05-19  8:34         ` Ian Campbell
2011-05-19 16:21           ` Konrad Rzeszutek Wilk
2011-05-19  0:54       ` Tian, Kevin
2011-05-18 23:40   ` Keir Fraser
2011-05-19 11:14     ` Andrew Cooper
2011-05-19 14:26       ` Konrad Rzeszutek Wilk
2011-05-19 14:43         ` Andrew Cooper
2011-05-18 18:08 ` [PATCH 2 of 3] apic: remove 'enabled_via_apicbase' variable Andrew Cooper
2011-05-18 18:53   ` Konrad Rzeszutek Wilk
2011-05-18 20:35     ` Andrew Cooper
2011-05-18 20:45       ` Konrad Rzeszutek Wilk
2011-05-19  3:31       ` Tian, Kevin
2011-05-18 18:08 ` [PATCH 3 of 3] kexec: disable iommu jumping into the kdump kernel Andrew Cooper
2011-05-18 18:49   ` Konrad Rzeszutek Wilk
2011-05-18 20:48     ` Andrew Cooper
2011-05-18 20:57       ` Konrad Rzeszutek Wilk
2011-05-18 21:24         ` Andrew Cooper
2011-05-19 14:32           ` Konrad Rzeszutek Wilk
2011-05-20  0:33             ` Kay, Allen M
2011-05-20 21:55             ` Kay, Allen M
2011-05-18 18:39 ` [PATCH 0 of 3] Fix kexec path in xen (take 2) Konrad Rzeszutek Wilk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.