All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
@ 2014-07-15 13:40 Vitaly Kuznetsov
  2014-07-15 13:40 ` [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info Vitaly Kuznetsov
                   ` (9 more replies)
  0 siblings, 10 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Stefano Stabellini, Andrew Jones, linux-kernel

With this patch series I'm trying to address several issues with kexec on pvhvm:
- shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
- create specific pvhvm shutdown handler for kexec (2nd patch)
- GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
- MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)

This patch series can be tested on single vCPU guest. We still have SMP issues with
pvhvm guests and kexec which require additional fixes.

Olaf Hering (1):
  xen PVonHVM: use E820_Reserved area for shared_info

Vitaly Kuznetsov (3):
  xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  xen/pvhvm: Unmap all PIRQs on startup and shutdown
  xen/pvhvm: Make MSI IRQs work after kexec

 arch/x86/pci/xen.c               |  3 +-
 arch/x86/xen/enlighten.c         | 83 +++++++++++++++++++++++++++++++---------
 arch/x86/xen/smp.c               | 10 +++++
 arch/x86/xen/smp.h               |  1 +
 arch/x86/xen/suspend.c           |  2 +-
 arch/x86/xen/xen-ops.h           |  2 +-
 drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++
 include/xen/events.h             |  3 ++
 8 files changed, 158 insertions(+), 22 deletions(-)

-- 
1.9.3


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
  2014-07-15 13:40 ` [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
                     ` (3 more replies)
  2014-07-15 13:40 ` [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown() Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  9 siblings, 4 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Stefano Stabellini, Andrew Jones, linux-kernel

From: Olaf Hering <olaf@aepfle.de>

This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
("xen PVonHVM: move shared_info to MMIO before kexec").

Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into an
E820 reserved memory area.

The pfn containing the shared_info is located somewhere in RAM. This will
cause trouble if the current kernel is doing a kexec boot into a new
kernel. The new kernel (and its startup code) can not know where the pfn
is, so it can not reserve the page. The hypervisor will continue to update
the pfn, and as a result memory corruption occours in the new kernel.

The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
will usually not allocate areas up to FE700000, so FE700000 is expected
to work also with older toolstacks.

In Xen3 there is no reserved area at a fixed location. If the guest is
started on such old hosts the shared_info page will be placed in RAM. As
a result kexec can not be used.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)

[On resume we need to reset the xen_vcpu_info, which the original
patch did not do]

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
 arch/x86/xen/suspend.c   |  2 +-
 arch/x86/xen/xen-ops.h   |  2 +-
 3 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ffb101e..a11af62 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
 }
 
-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+#define HVM_SHARED_INFO_ADDR 0xFE700000UL
+static struct shared_info *xen_hvm_shared_info;
+static unsigned long xen_hvm_sip_phys;
+static int xen_major, xen_minor;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+}
+static void __init xen_hvm_set_shared_info(struct shared_info *sip)
+{
+	int cpu;
+
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
@@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
 	}
 }
 
-#ifdef CONFIG_XEN_PVHVM
+/* Reconnect the shared_info pfn to a (new) mfn */
+void xen_hvm_resume_shared_info(void)
+{
+	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
+	xen_hvm_set_shared_info(xen_hvm_shared_info);
+}
+
+/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
+ * On these old tools the shared info page will be placed in E820_Ram.
+ * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
+ * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
+ * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
+ * here for the shared info page. */
+static void __init xen_hvm_init_shared_info(void)
+{
+	if (xen_major < 4) {
+		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
+	} else {
+		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
+		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
+		xen_hvm_shared_info =
+		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+	}
+	xen_hvm_resume_shared_info();
+}
+
 static void __init init_hvm_pv_info(void)
 {
-	int major, minor;
-	uint32_t eax, ebx, ecx, edx, pages, msr, base;
+	uint32_t  ecx, edx, pages, msr, base;
 	u64 pfn;
 
 	base = xen_cpuid_base();
-	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
-
-	major = eax >> 16;
-	minor = eax & 0xffff;
-	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
-
 	cpuid(base + 2, &pages, &msr, &ecx, &edx);
 
 	pfn = __pa(hypercall_page);
@@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
 
 static uint32_t __init xen_hvm_platform(void)
 {
+	uint32_t eax, ebx, ecx, edx, base;
+
 	if (xen_pv_domain())
 		return 0;
 
-	return xen_cpuid_base();
+	base = xen_cpuid_base();
+	if (!base)
+		return 0;
+
+	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
+
+	xen_major = eax >> 16;
+	xen_minor = eax & 0xffff;
+
+	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
+
+	return 1;
 }
 
 bool xen_hvm_need_lapic(void)
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index c4df9db..8d6793e 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 97d8765..d083e82 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -43,7 +43,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 13:40 ` Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

From: Olaf Hering <olaf@aepfle.de>

This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
("xen PVonHVM: move shared_info to MMIO before kexec").

Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into an
E820 reserved memory area.

The pfn containing the shared_info is located somewhere in RAM. This will
cause trouble if the current kernel is doing a kexec boot into a new
kernel. The new kernel (and its startup code) can not know where the pfn
is, so it can not reserve the page. The hypervisor will continue to update
the pfn, and as a result memory corruption occours in the new kernel.

The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
will usually not allocate areas up to FE700000, so FE700000 is expected
to work also with older toolstacks.

In Xen3 there is no reserved area at a fixed location. If the guest is
started on such old hosts the shared_info page will be placed in RAM. As
a result kexec can not be used.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)

[On resume we need to reset the xen_vcpu_info, which the original
patch did not do]

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
 arch/x86/xen/suspend.c   |  2 +-
 arch/x86/xen/xen-ops.h   |  2 +-
 3 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ffb101e..a11af62 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
 }
 
-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+#define HVM_SHARED_INFO_ADDR 0xFE700000UL
+static struct shared_info *xen_hvm_shared_info;
+static unsigned long xen_hvm_sip_phys;
+static int xen_major, xen_minor;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+}
+static void __init xen_hvm_set_shared_info(struct shared_info *sip)
+{
+	int cpu;
+
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
@@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
 	}
 }
 
-#ifdef CONFIG_XEN_PVHVM
+/* Reconnect the shared_info pfn to a (new) mfn */
+void xen_hvm_resume_shared_info(void)
+{
+	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
+	xen_hvm_set_shared_info(xen_hvm_shared_info);
+}
+
+/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
+ * On these old tools the shared info page will be placed in E820_Ram.
+ * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
+ * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
+ * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
+ * here for the shared info page. */
+static void __init xen_hvm_init_shared_info(void)
+{
+	if (xen_major < 4) {
+		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
+	} else {
+		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
+		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
+		xen_hvm_shared_info =
+		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+	}
+	xen_hvm_resume_shared_info();
+}
+
 static void __init init_hvm_pv_info(void)
 {
-	int major, minor;
-	uint32_t eax, ebx, ecx, edx, pages, msr, base;
+	uint32_t  ecx, edx, pages, msr, base;
 	u64 pfn;
 
 	base = xen_cpuid_base();
-	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
-
-	major = eax >> 16;
-	minor = eax & 0xffff;
-	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
-
 	cpuid(base + 2, &pages, &msr, &ecx, &edx);
 
 	pfn = __pa(hypercall_page);
@@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
 
 static uint32_t __init xen_hvm_platform(void)
 {
+	uint32_t eax, ebx, ecx, edx, base;
+
 	if (xen_pv_domain())
 		return 0;
 
-	return xen_cpuid_base();
+	base = xen_cpuid_base();
+	if (!base)
+		return 0;
+
+	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
+
+	xen_major = eax >> 16;
+	xen_minor = eax & 0xffff;
+
+	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
+
+	return 1;
 }
 
 bool xen_hvm_need_lapic(void)
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index c4df9db..8d6793e 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 97d8765..d083e82 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -43,7 +43,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2014-07-15 13:40 ` [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown() Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
                     ` (5 more replies)
  2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  9 siblings, 6 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Stefano Stabellini, Andrew Jones, linux-kernel

PVHVM guest requires special actions before kexec. Register specific
xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/enlighten.c | 9 +++++++++
 arch/x86/xen/smp.c       | 9 +++++++++
 arch/x86/xen/smp.h       | 1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a11af62..8074e4a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
 	.notifier_call	= xen_hvm_cpu_notify,
 };
 
+static void xen_pvhvm_kexec_shutdown(void)
+{
+	xen_kexec_shutdown();
+	native_machine_shutdown();
+}
+
 static void __init xen_hvm_guest_init(void)
 {
 	init_hvm_pv_info();
@@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
 	x86_init.irqs.intr_init = xen_init_IRQ;
 	xen_hvm_init_time_ops();
 	xen_hvm_init_mmu_ops();
+#ifdef CONFIG_KEXEC
+	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
+#endif
 }
 
 static uint32_t __init xen_hvm_platform(void)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..35dcf39 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -18,6 +18,7 @@
 #include <linux/smp.h>
 #include <linux/irq_work.h>
 #include <linux/tick.h>
+#include <linux/kexec.h>
 
 #include <asm/paravirt.h>
 #include <asm/desc.h>
@@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 	native_cpu_die(cpu);
 }
 
+void xen_kexec_shutdown(void)
+{
+#ifdef CONFIG_KEXEC
+	if (!kexec_in_progress)
+		return;
+#endif
+}
+
 void __init xen_hvm_smp_init(void)
 {
 	if (!xen_have_vector_callback)
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..1af0493 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
 
+extern void xen_kexec_shutdown(void);
 #endif
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
  2014-07-15 13:40 ` [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info Vitaly Kuznetsov
  2014-07-15 13:40 ` Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 13:40 ` Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

PVHVM guest requires special actions before kexec. Register specific
xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/enlighten.c | 9 +++++++++
 arch/x86/xen/smp.c       | 9 +++++++++
 arch/x86/xen/smp.h       | 1 +
 3 files changed, 19 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a11af62..8074e4a 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
 	.notifier_call	= xen_hvm_cpu_notify,
 };
 
+static void xen_pvhvm_kexec_shutdown(void)
+{
+	xen_kexec_shutdown();
+	native_machine_shutdown();
+}
+
 static void __init xen_hvm_guest_init(void)
 {
 	init_hvm_pv_info();
@@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
 	x86_init.irqs.intr_init = xen_init_IRQ;
 	xen_hvm_init_time_ops();
 	xen_hvm_init_mmu_ops();
+#ifdef CONFIG_KEXEC
+	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
+#endif
 }
 
 static uint32_t __init xen_hvm_platform(void)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..35dcf39 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -18,6 +18,7 @@
 #include <linux/smp.h>
 #include <linux/irq_work.h>
 #include <linux/tick.h>
+#include <linux/kexec.h>
 
 #include <asm/paravirt.h>
 #include <asm/desc.h>
@@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 	native_cpu_die(cpu);
 }
 
+void xen_kexec_shutdown(void)
+{
+#ifdef CONFIG_KEXEC
+	if (!kexec_in_progress)
+		return;
+#endif
+}
+
 void __init xen_hvm_smp_init(void)
 {
 	if (!xen_have_vector_callback)
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..1af0493 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
 
+extern void xen_kexec_shutdown(void);
 #endif
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2014-07-15 13:40 ` Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
                     ` (3 more replies)
  2014-07-15 13:40 ` Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  9 siblings, 4 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Stefano Stabellini, Andrew Jones, linux-kernel

When kexec is being run PIRQs from Qemu-emulated devices are still
mapped to old event channels and new kernel has no information about
that. Trying to map them twice results in the following in Xen's dmesg:

 (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
 (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
 (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
 ...

 and the following in new kernel's dmesg:

 [   92.286796] xen:events: Failed to obtain physical IRQ 4

The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
when kexec was requested and on every kernel startup. We need to do this
twice to deal with the following issues:
- startup-time unmapping is required to make kdump work;
- shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
- shutdown-time unmapping is required to make Qemu-emulated NICs work after
  kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
  is being performed).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/smp.c               |  1 +
 drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
 include/xen/events.h             |  3 ++
 3 files changed, 80 insertions(+)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 35dcf39..e2b4deb 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
 #ifdef CONFIG_KEXEC
 	if (!kexec_in_progress)
 		return;
+	xen_unmap_all_pirqs();
 #endif
 }
 
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c919d3d..7701c7f 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
 static bool fifo_events = true;
 module_param(fifo_events, bool, 0);
 
+void xen_unmap_all_pirqs(void)
+{
+	int pirq, rc, gsi, irq, evtchn;
+	struct physdev_unmap_pirq unmap_irq;
+	struct irq_info *info;
+	struct evtchn_close close;
+
+	mutex_lock(&irq_mapping_update_lock);
+
+	list_for_each_entry(info, &xen_irq_list_head, list) {
+		if (info->type != IRQT_PIRQ)
+			continue;
+
+		pirq = info->u.pirq.pirq;
+		gsi = info->u.pirq.gsi;
+		evtchn = info->evtchn;
+		irq = info->irq;
+
+		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
+			gsi, pirq, irq, evtchn);
+
+		if (evtchn > 0) {
+			close.port = evtchn;
+			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
+							&close) != 0)
+				pr_warn("close evtchn %d failed\n", evtchn);
+		}
+
+		unmap_irq.pirq = pirq;
+		unmap_irq.domid = DOMID_SELF;
+
+		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
+		if (rc)
+			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
+				gsi, pirq, irq, rc);
+	}
+
+	mutex_unlock(&irq_mapping_update_lock);
+}
+EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
+
+static void xen_startup_unmap_pirqs(void)
+{
+	struct evtchn_status status;
+	int port, rc = -ENOENT;
+	struct physdev_unmap_pirq unmap_irq;
+	struct evtchn_close close;
+
+	memset(&status, 0, sizeof(status));
+	for (port = 0; port < xen_evtchn_max_channels(); port++) {
+		status.dom = DOMID_SELF;
+		status.port = port;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
+		if (rc < 0)
+			continue;
+		if (status.status == EVTCHNSTAT_pirq) {
+			close.port = port;
+			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
+							&close) != 0)
+				pr_warn("xen: failed to close evtchn %d\n",
+					port);
+			unmap_irq.pirq = status.u.pirq;
+			unmap_irq.domid = DOMID_SELF;
+			pr_warn("xen: unmapping previously mapped pirq %d\n",
+				unmap_irq.pirq);
+			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
+						  &unmap_irq) != 0)
+				pr_warn("xen: failed to unmap pirq %d\n",
+					unmap_irq.pirq);
+		}
+	}
+}
+
+
 void __init xen_init_IRQ(void)
 {
 	int ret = -EINVAL;
@@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
 		xen_callback_vector();
 
 	if (xen_hvm_domain()) {
+		xen_startup_unmap_pirqs();
+
 		native_init_IRQ();
 		/* pci_xen_hvm_init must be called after native_init_IRQ so that
 		 * __acpi_register_gsi can point at the right function */
diff --git a/include/xen/events.h b/include/xen/events.h
index 8bee7a7..3f9f428 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
 /* Determine whether to ignore this IRQ if it is passed to a guest. */
 int xen_test_irq_shared(int irq);
 
+/* Unmap all PIRQs and close event channels */
+void xen_unmap_all_pirqs(void);
+
 /* initialize Xen IRQ subsystem */
 void xen_init_IRQ(void);
 #endif	/* _XEN_EVENTS_H */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 13:40 ` [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

When kexec is being run PIRQs from Qemu-emulated devices are still
mapped to old event channels and new kernel has no information about
that. Trying to map them twice results in the following in Xen's dmesg:

 (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
 (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
 (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
 ...

 and the following in new kernel's dmesg:

 [   92.286796] xen:events: Failed to obtain physical IRQ 4

The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
when kexec was requested and on every kernel startup. We need to do this
twice to deal with the following issues:
- startup-time unmapping is required to make kdump work;
- shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
- shutdown-time unmapping is required to make Qemu-emulated NICs work after
  kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
  is being performed).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/smp.c               |  1 +
 drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
 include/xen/events.h             |  3 ++
 3 files changed, 80 insertions(+)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 35dcf39..e2b4deb 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
 #ifdef CONFIG_KEXEC
 	if (!kexec_in_progress)
 		return;
+	xen_unmap_all_pirqs();
 #endif
 }
 
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c919d3d..7701c7f 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
 static bool fifo_events = true;
 module_param(fifo_events, bool, 0);
 
+void xen_unmap_all_pirqs(void)
+{
+	int pirq, rc, gsi, irq, evtchn;
+	struct physdev_unmap_pirq unmap_irq;
+	struct irq_info *info;
+	struct evtchn_close close;
+
+	mutex_lock(&irq_mapping_update_lock);
+
+	list_for_each_entry(info, &xen_irq_list_head, list) {
+		if (info->type != IRQT_PIRQ)
+			continue;
+
+		pirq = info->u.pirq.pirq;
+		gsi = info->u.pirq.gsi;
+		evtchn = info->evtchn;
+		irq = info->irq;
+
+		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
+			gsi, pirq, irq, evtchn);
+
+		if (evtchn > 0) {
+			close.port = evtchn;
+			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
+							&close) != 0)
+				pr_warn("close evtchn %d failed\n", evtchn);
+		}
+
+		unmap_irq.pirq = pirq;
+		unmap_irq.domid = DOMID_SELF;
+
+		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
+		if (rc)
+			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
+				gsi, pirq, irq, rc);
+	}
+
+	mutex_unlock(&irq_mapping_update_lock);
+}
+EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
+
+static void xen_startup_unmap_pirqs(void)
+{
+	struct evtchn_status status;
+	int port, rc = -ENOENT;
+	struct physdev_unmap_pirq unmap_irq;
+	struct evtchn_close close;
+
+	memset(&status, 0, sizeof(status));
+	for (port = 0; port < xen_evtchn_max_channels(); port++) {
+		status.dom = DOMID_SELF;
+		status.port = port;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
+		if (rc < 0)
+			continue;
+		if (status.status == EVTCHNSTAT_pirq) {
+			close.port = port;
+			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
+							&close) != 0)
+				pr_warn("xen: failed to close evtchn %d\n",
+					port);
+			unmap_irq.pirq = status.u.pirq;
+			unmap_irq.domid = DOMID_SELF;
+			pr_warn("xen: unmapping previously mapped pirq %d\n",
+				unmap_irq.pirq);
+			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
+						  &unmap_irq) != 0)
+				pr_warn("xen: failed to unmap pirq %d\n",
+					unmap_irq.pirq);
+		}
+	}
+}
+
+
 void __init xen_init_IRQ(void)
 {
 	int ret = -EINVAL;
@@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
 		xen_callback_vector();
 
 	if (xen_hvm_domain()) {
+		xen_startup_unmap_pirqs();
+
 		native_init_IRQ();
 		/* pci_xen_hvm_init must be called after native_init_IRQ so that
 		 * __acpi_register_gsi can point at the right function */
diff --git a/include/xen/events.h b/include/xen/events.h
index 8bee7a7..3f9f428 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
 /* Determine whether to ignore this IRQ if it is passed to a guest. */
 int xen_test_irq_shared(int irq);
 
+/* Unmap all PIRQs and close event channels */
+void xen_unmap_all_pirqs(void);
+
 /* initialize Xen IRQ subsystem */
 void xen_init_IRQ(void);
 #endif	/* _XEN_EVENTS_H */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2014-07-15 13:40 ` Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:21   ` Konrad Rzeszutek Wilk
  2014-07-15 15:21   ` Konrad Rzeszutek Wilk
  2014-07-15 13:40 ` Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Stefano Stabellini, Andrew Jones, linux-kernel

When kexec was peformed MSI IRQs for passthrough-ed devices were already
mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
fails as we have no IRQ mapping information for that. Requesting for new
mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
we don't recieve these IRQs.

RFC: I wasn't able to understand why commit af42b8d1 which introduced
xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
not sure when __write_msi_msg() with new PIRQ will result in new mapping.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/pci/xen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 905956f..685e8f1 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		__read_msi_msg(msidesc, &msg);
 		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
 			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
+		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
 			pirq = xen_allocate_pirq_msi(dev, msidesc);
 			if (pirq < 0) {
 				irq = -ENODEV;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2014-07-15 13:40 ` [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec Vitaly Kuznetsov
@ 2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-28 13:24 ` [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec David Vrabel
  2014-07-28 13:24 ` David Vrabel
  9 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 13:40 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	Boris Ostrovsky

When kexec was peformed MSI IRQs for passthrough-ed devices were already
mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
fails as we have no IRQ mapping information for that. Requesting for new
mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
we don't recieve these IRQs.

RFC: I wasn't able to understand why commit af42b8d1 which introduced
xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
not sure when __write_msi_msg() with new PIRQ will result in new mapping.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/pci/xen.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 905956f..685e8f1 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		__read_msi_msg(msidesc, &msg);
 		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
 			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
+		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
 			pirq = xen_allocate_pirq_msi(dev, msidesc);
 			if (pirq < 0) {
 				irq = -ENODEV;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:06   ` Konrad Rzeszutek Wilk
  2014-07-15 15:43     ` Vitaly Kuznetsov
  2014-07-15 15:43     ` Vitaly Kuznetsov
  2014-07-28 13:33   ` David Vrabel
  2014-07-28 13:33   ` David Vrabel
  3 siblings, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
> From: Olaf Hering <olaf@aepfle.de>
> 
> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> ("xen PVonHVM: move shared_info to MMIO before kexec").
> 
> Currently kexec in a PVonHVM guest fails with a triple fault because the
> new kernel overwrites the shared info page. The exact failure depends on
> the size of the kernel image. This patch moves the pfn from RAM into an
> E820 reserved memory area.
> 
> The pfn containing the shared_info is located somewhere in RAM. This will
> cause trouble if the current kernel is doing a kexec boot into a new
> kernel. The new kernel (and its startup code) can not know where the pfn
> is, so it can not reserve the page. The hypervisor will continue to update
> the pfn, and as a result memory corruption occours in the new kernel.
> 
> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> will usually not allocate areas up to FE700000, so FE700000 is expected
> to work also with older toolstacks.
> 
> In Xen3 there is no reserved area at a fixed location. If the guest is
> started on such old hosts the shared_info page will be placed in RAM. As
> a result kexec can not be used.

So this looks right, the one thing that we really need to check 
is e9daff24a266307943457086533041bd971d0ef9

   This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
    
    We are doing this b/c on 32-bit PVonHVM with older hypervisors
    (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
    we use it for the time keeping, and the timekeeping code loops
    forever - as the version field never changes. Olaf says to
    revert it, so lets do that.
    

Could you kindly test that the migration on 32-bit PVHVM guests
on older hypervisors works?

> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
> 
> [On resume we need to reset the xen_vcpu_info, which the original
> patch did not do]
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
>  arch/x86/xen/suspend.c   |  2 +-
>  arch/x86/xen/xen-ops.h   |  2 +-
>  3 files changed, 58 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ffb101e..a11af62 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  #endif
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +#ifdef CONFIG_XEN_PVHVM
> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
> +static struct shared_info *xen_hvm_shared_info;
> +static unsigned long xen_hvm_sip_phys;
> +static int xen_major, xen_minor;
> +
> +static void xen_hvm_connect_shared_info(unsigned long pfn)
>  {
> -	int cpu;
>  	struct xen_add_to_physmap xatp;
> -	static struct shared_info *shared_info_page = 0;
>  
> -	if (!shared_info_page)
> -		shared_info_page = (struct shared_info *)
> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>  	xatp.domid = DOMID_SELF;
>  	xatp.idx = 0;
>  	xatp.space = XENMAPSPACE_shared_info;
> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> +	xatp.gpfn = pfn;
>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>  		BUG();
>  
> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +}
> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
> +{
> +	int cpu;
> +
> +	HYPERVISOR_shared_info = sip;
>  
>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>  	 * page, we use it in the event channel upcall and in some pvclock
> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
>  	}
>  }
>  
> -#ifdef CONFIG_XEN_PVHVM
> +/* Reconnect the shared_info pfn to a (new) mfn */
> +void xen_hvm_resume_shared_info(void)
> +{
> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
> +}
> +
> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
> + * On these old tools the shared info page will be placed in E820_Ram.
> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
> + * here for the shared info page. */
> +static void __init xen_hvm_init_shared_info(void)
> +{
> +	if (xen_major < 4) {
> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
> +	} else {
> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
> +		xen_hvm_shared_info =
> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
> +	}
> +	xen_hvm_resume_shared_info();
> +}
> +
>  static void __init init_hvm_pv_info(void)
>  {
> -	int major, minor;
> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
> +	uint32_t  ecx, edx, pages, msr, base;
>  	u64 pfn;
>  
>  	base = xen_cpuid_base();
> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> -
> -	major = eax >> 16;
> -	minor = eax & 0xffff;
> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
> -
>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
>  
>  	pfn = __pa(hypercall_page);
> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>  
>  static uint32_t __init xen_hvm_platform(void)
>  {
> +	uint32_t eax, ebx, ecx, edx, base;
> +
>  	if (xen_pv_domain())
>  		return 0;
>  
> -	return xen_cpuid_base();
> +	base = xen_cpuid_base();
> +	if (!base)
> +		return 0;
> +
> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> +
> +	xen_major = eax >> 16;
> +	xen_minor = eax & 0xffff;
> +
> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
> +
> +	return 1;
>  }
>  
>  bool xen_hvm_need_lapic(void)
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index c4df9db..8d6793e 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
>  	int cpu;
> -	xen_hvm_init_shared_info();
> +	xen_hvm_resume_shared_info();
>  	xen_callback_vector();
>  	xen_unplug_emulated_devices();
>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 97d8765..d083e82 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 13:40 ` Vitaly Kuznetsov
@ 2014-07-15 15:06   ` Konrad Rzeszutek Wilk
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
> From: Olaf Hering <olaf@aepfle.de>
> 
> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> ("xen PVonHVM: move shared_info to MMIO before kexec").
> 
> Currently kexec in a PVonHVM guest fails with a triple fault because the
> new kernel overwrites the shared info page. The exact failure depends on
> the size of the kernel image. This patch moves the pfn from RAM into an
> E820 reserved memory area.
> 
> The pfn containing the shared_info is located somewhere in RAM. This will
> cause trouble if the current kernel is doing a kexec boot into a new
> kernel. The new kernel (and its startup code) can not know where the pfn
> is, so it can not reserve the page. The hypervisor will continue to update
> the pfn, and as a result memory corruption occours in the new kernel.
> 
> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> will usually not allocate areas up to FE700000, so FE700000 is expected
> to work also with older toolstacks.
> 
> In Xen3 there is no reserved area at a fixed location. If the guest is
> started on such old hosts the shared_info page will be placed in RAM. As
> a result kexec can not be used.

So this looks right, the one thing that we really need to check 
is e9daff24a266307943457086533041bd971d0ef9

   This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
    
    We are doing this b/c on 32-bit PVonHVM with older hypervisors
    (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
    we use it for the time keeping, and the timekeeping code loops
    forever - as the version field never changes. Olaf says to
    revert it, so lets do that.
    

Could you kindly test that the migration on 32-bit PVHVM guests
on older hypervisors works?

> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
> 
> [On resume we need to reset the xen_vcpu_info, which the original
> patch did not do]
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
>  arch/x86/xen/suspend.c   |  2 +-
>  arch/x86/xen/xen-ops.h   |  2 +-
>  3 files changed, 58 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ffb101e..a11af62 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  #endif
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +#ifdef CONFIG_XEN_PVHVM
> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
> +static struct shared_info *xen_hvm_shared_info;
> +static unsigned long xen_hvm_sip_phys;
> +static int xen_major, xen_minor;
> +
> +static void xen_hvm_connect_shared_info(unsigned long pfn)
>  {
> -	int cpu;
>  	struct xen_add_to_physmap xatp;
> -	static struct shared_info *shared_info_page = 0;
>  
> -	if (!shared_info_page)
> -		shared_info_page = (struct shared_info *)
> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>  	xatp.domid = DOMID_SELF;
>  	xatp.idx = 0;
>  	xatp.space = XENMAPSPACE_shared_info;
> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> +	xatp.gpfn = pfn;
>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>  		BUG();
>  
> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> +}
> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
> +{
> +	int cpu;
> +
> +	HYPERVISOR_shared_info = sip;
>  
>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>  	 * page, we use it in the event channel upcall and in some pvclock
> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
>  	}
>  }
>  
> -#ifdef CONFIG_XEN_PVHVM
> +/* Reconnect the shared_info pfn to a (new) mfn */
> +void xen_hvm_resume_shared_info(void)
> +{
> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
> +}
> +
> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
> + * On these old tools the shared info page will be placed in E820_Ram.
> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
> + * here for the shared info page. */
> +static void __init xen_hvm_init_shared_info(void)
> +{
> +	if (xen_major < 4) {
> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
> +	} else {
> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
> +		xen_hvm_shared_info =
> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
> +	}
> +	xen_hvm_resume_shared_info();
> +}
> +
>  static void __init init_hvm_pv_info(void)
>  {
> -	int major, minor;
> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
> +	uint32_t  ecx, edx, pages, msr, base;
>  	u64 pfn;
>  
>  	base = xen_cpuid_base();
> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> -
> -	major = eax >> 16;
> -	minor = eax & 0xffff;
> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
> -
>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
>  
>  	pfn = __pa(hypercall_page);
> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>  
>  static uint32_t __init xen_hvm_platform(void)
>  {
> +	uint32_t eax, ebx, ecx, edx, base;
> +
>  	if (xen_pv_domain())
>  		return 0;
>  
> -	return xen_cpuid_base();
> +	base = xen_cpuid_base();
> +	if (!base)
> +		return 0;
> +
> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> +
> +	xen_major = eax >> 16;
> +	xen_minor = eax & 0xffff;
> +
> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
> +
> +	return 1;
>  }
>  
>  bool xen_hvm_need_lapic(void)
> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> index c4df9db..8d6793e 100644
> --- a/arch/x86/xen/suspend.c
> +++ b/arch/x86/xen/suspend.c
> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>  {
>  #ifdef CONFIG_XEN_PVHVM
>  	int cpu;
> -	xen_hvm_init_shared_info();
> +	xen_hvm_resume_shared_info();
>  	xen_callback_vector();
>  	xen_unplug_emulated_devices();
>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 97d8765..d083e82 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
>  void xen_vcpu_restore(void);
>  
>  void xen_callback_vector(void);
> -void xen_hvm_init_shared_info(void);
> +void xen_hvm_resume_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 ` Vitaly Kuznetsov
@ 2014-07-15 15:09   ` Konrad Rzeszutek Wilk
  2014-07-15 15:52     ` Vitaly Kuznetsov
  2014-07-15 15:52     ` Vitaly Kuznetsov
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Tue, Jul 15, 2014 at 03:40:38PM +0200, Vitaly Kuznetsov wrote:
> PVHVM guest requires special actions before kexec. Register specific
> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
> 

This looks close to what I had sent as an RFC to you?

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/enlighten.c | 9 +++++++++
>  arch/x86/xen/smp.c       | 9 +++++++++
>  arch/x86/xen/smp.h       | 1 +
>  3 files changed, 19 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a11af62..8074e4a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>  	.notifier_call	= xen_hvm_cpu_notify,
>  };
>  
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	xen_kexec_shutdown();
> +	native_machine_shutdown();
> +}
> +
>  static void __init xen_hvm_guest_init(void)
>  {
>  	init_hvm_pv_info();
> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
>  	x86_init.irqs.intr_init = xen_init_IRQ;
>  	xen_hvm_init_time_ops();
>  	xen_hvm_init_mmu_ops();
> +#ifdef CONFIG_KEXEC
> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> +#endif
>  }
>  
>  static uint32_t __init xen_hvm_platform(void)
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 7005974..35dcf39 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -18,6 +18,7 @@
>  #include <linux/smp.h>
>  #include <linux/irq_work.h>
>  #include <linux/tick.h>
> +#include <linux/kexec.h>
>  
>  #include <asm/paravirt.h>
>  #include <asm/desc.h>
> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
>  	native_cpu_die(cpu);
>  }
>  
> +void xen_kexec_shutdown(void)
> +{
> +#ifdef CONFIG_KEXEC
> +	if (!kexec_in_progress)
> +		return;
> +#endif
> +}
> +
>  void __init xen_hvm_smp_init(void)
>  {
>  	if (!xen_have_vector_callback)
> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
> index c7c2d89..1af0493 100644
> --- a/arch/x86/xen/smp.h
> +++ b/arch/x86/xen/smp.h
> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
>  extern void xen_send_IPI_all(int vector);
>  extern void xen_send_IPI_self(int vector);
>  
> +extern void xen_kexec_shutdown(void);
>  #endif
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:09   ` Konrad Rzeszutek Wilk
  2014-07-15 17:41   ` Boris Ostrovsky
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Tue, Jul 15, 2014 at 03:40:38PM +0200, Vitaly Kuznetsov wrote:
> PVHVM guest requires special actions before kexec. Register specific
> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
> 

This looks close to what I had sent as an RFC to you?

> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/enlighten.c | 9 +++++++++
>  arch/x86/xen/smp.c       | 9 +++++++++
>  arch/x86/xen/smp.h       | 1 +
>  3 files changed, 19 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a11af62..8074e4a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>  	.notifier_call	= xen_hvm_cpu_notify,
>  };
>  
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	xen_kexec_shutdown();
> +	native_machine_shutdown();
> +}
> +
>  static void __init xen_hvm_guest_init(void)
>  {
>  	init_hvm_pv_info();
> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
>  	x86_init.irqs.intr_init = xen_init_IRQ;
>  	xen_hvm_init_time_ops();
>  	xen_hvm_init_mmu_ops();
> +#ifdef CONFIG_KEXEC
> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> +#endif
>  }
>  
>  static uint32_t __init xen_hvm_platform(void)
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 7005974..35dcf39 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -18,6 +18,7 @@
>  #include <linux/smp.h>
>  #include <linux/irq_work.h>
>  #include <linux/tick.h>
> +#include <linux/kexec.h>
>  
>  #include <asm/paravirt.h>
>  #include <asm/desc.h>
> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
>  	native_cpu_die(cpu);
>  }
>  
> +void xen_kexec_shutdown(void)
> +{
> +#ifdef CONFIG_KEXEC
> +	if (!kexec_in_progress)
> +		return;
> +#endif
> +}
> +
>  void __init xen_hvm_smp_init(void)
>  {
>  	if (!xen_have_vector_callback)
> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
> index c7c2d89..1af0493 100644
> --- a/arch/x86/xen/smp.h
> +++ b/arch/x86/xen/smp.h
> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
>  extern void xen_send_IPI_all(int vector);
>  extern void xen_send_IPI_self(int vector);
>  
> +extern void xen_kexec_shutdown(void);
>  #endif
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-15 13:40 ` [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec Vitaly Kuznetsov
  2014-07-15 15:21   ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:21   ` Konrad Rzeszutek Wilk
  2014-07-16  9:01     ` Vitaly Kuznetsov
  2014-07-16  9:01     ` Vitaly Kuznetsov
  1 sibling, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov, stefano.stabellini
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> fails as we have no IRQ mapping information for that. Requesting for new
> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> we don't recieve these IRQs.

receive

How come '__write_msi_msg' does not result in new MSI IRQs?

Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
from the device and updating the internal PIRQ<->IRQ code to match
with the reality?

> 
> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> not sure when __write_msi_msg() with new PIRQ will result in new mapping.

We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
At least that is how the commit you mentioned worked.

In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
that we might be called twice by a buggy driver. As such we want to check
our PIRQ<->IRQ to figure this out.

> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/pci/xen.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 905956f..685e8f1 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  		__read_msi_msg(msidesc, &msg);
>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> -		    xen_irq_from_pirq(pirq) < 0) {
> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>  			if (pirq < 0) {
>  				irq = -ENODEV;
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-15 13:40 ` [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec Vitaly Kuznetsov
@ 2014-07-15 15:21   ` Konrad Rzeszutek Wilk
  2014-07-15 15:21   ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> fails as we have no IRQ mapping information for that. Requesting for new
> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> we don't recieve these IRQs.

receive

How come '__write_msi_msg' does not result in new MSI IRQs?

Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
from the device and updating the internal PIRQ<->IRQ code to match
with the reality?

> 
> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> not sure when __write_msi_msg() with new PIRQ will result in new mapping.

We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
At least that is how the commit you mentioned worked.

In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
that we might be called twice by a buggy driver. As such we want to check
our PIRQ<->IRQ to figure this out.

> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/pci/xen.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 905956f..685e8f1 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  		__read_msi_msg(msidesc, &msg);
>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> -		    xen_irq_from_pirq(pirq) < 0) {
> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>  			if (pirq < 0) {
>  				irq = -ENODEV;
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:23   ` Konrad Rzeszutek Wilk
  2014-07-16  9:37     ` Vitaly Kuznetsov
  2014-07-16  9:37     ` Vitaly Kuznetsov
  2014-07-28 13:43   ` David Vrabel
  2014-07-28 13:43   ` David Vrabel
  3 siblings, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
> When kexec is being run PIRQs from Qemu-emulated devices are still
> mapped to old event channels and new kernel has no information about
> that. Trying to map them twice results in the following in Xen's dmesg:
> 
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>  ...
> 
>  and the following in new kernel's dmesg:
> 
>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
> 
> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
> when kexec was requested and on every kernel startup. We need to do this
> twice to deal with the following issues:
> - startup-time unmapping is required to make kdump work;
> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>   is being performed).

How does this work when you boot the guest under Xen 4.4 where the FIFO events
are used? Does it still work correctly?

Thanks.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/smp.c               |  1 +
>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
>  include/xen/events.h             |  3 ++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 35dcf39..e2b4deb 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>  #ifdef CONFIG_KEXEC
>  	if (!kexec_in_progress)
>  		return;
> +	xen_unmap_all_pirqs();
>  #endif
>  }
>  
> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> index c919d3d..7701c7f 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>  static bool fifo_events = true;
>  module_param(fifo_events, bool, 0);
>  
> +void xen_unmap_all_pirqs(void)
> +{
> +	int pirq, rc, gsi, irq, evtchn;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct irq_info *info;
> +	struct evtchn_close close;
> +
> +	mutex_lock(&irq_mapping_update_lock);
> +
> +	list_for_each_entry(info, &xen_irq_list_head, list) {
> +		if (info->type != IRQT_PIRQ)
> +			continue;
> +
> +		pirq = info->u.pirq.pirq;
> +		gsi = info->u.pirq.gsi;
> +		evtchn = info->evtchn;
> +		irq = info->irq;
> +
> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
> +			gsi, pirq, irq, evtchn);
> +
> +		if (evtchn > 0) {
> +			close.port = evtchn;
> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> +							&close) != 0)
> +				pr_warn("close evtchn %d failed\n", evtchn);
> +		}
> +
> +		unmap_irq.pirq = pirq;
> +		unmap_irq.domid = DOMID_SELF;
> +
> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
> +		if (rc)
> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
> +				gsi, pirq, irq, rc);
> +	}
> +
> +	mutex_unlock(&irq_mapping_update_lock);
> +}
> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);

Why the EXPORT? Is this used by modules?
> +
> +static void xen_startup_unmap_pirqs(void)
> +{
> +	struct evtchn_status status;
> +	int port, rc = -ENOENT;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct evtchn_close close;
> +
> +	memset(&status, 0, sizeof(status));
> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
> +		status.dom = DOMID_SELF;
> +		status.port = port;
> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
> +		if (rc < 0)
> +			continue;
> +		if (status.status == EVTCHNSTAT_pirq) {
> +			close.port = port;
> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> +							&close) != 0)
> +				pr_warn("xen: failed to close evtchn %d\n",
> +					port);
> +			unmap_irq.pirq = status.u.pirq;
> +			unmap_irq.domid = DOMID_SELF;
> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
> +				unmap_irq.pirq);
> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
> +						  &unmap_irq) != 0)
> +				pr_warn("xen: failed to unmap pirq %d\n",
> +					unmap_irq.pirq);
> +		}
> +	}
> +}
> +
> +
>  void __init xen_init_IRQ(void)
>  {
>  	int ret = -EINVAL;
> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
>  		xen_callback_vector();
>  
>  	if (xen_hvm_domain()) {
> +		xen_startup_unmap_pirqs();
> +
>  		native_init_IRQ();
>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>  		 * __acpi_register_gsi can point at the right function */
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 8bee7a7..3f9f428 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
>  int xen_test_irq_shared(int irq);
>  
> +/* Unmap all PIRQs and close event channels */
> +void xen_unmap_all_pirqs(void);
> +
>  /* initialize Xen IRQ subsystem */
>  void xen_init_IRQ(void);
>  #endif	/* _XEN_EVENTS_H */
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
@ 2014-07-15 15:23   ` Konrad Rzeszutek Wilk
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:23 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
> When kexec is being run PIRQs from Qemu-emulated devices are still
> mapped to old event channels and new kernel has no information about
> that. Trying to map them twice results in the following in Xen's dmesg:
> 
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>  ...
> 
>  and the following in new kernel's dmesg:
> 
>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
> 
> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
> when kexec was requested and on every kernel startup. We need to do this
> twice to deal with the following issues:
> - startup-time unmapping is required to make kdump work;
> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>   is being performed).

How does this work when you boot the guest under Xen 4.4 where the FIFO events
are used? Does it still work correctly?

Thanks.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/smp.c               |  1 +
>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
>  include/xen/events.h             |  3 ++
>  3 files changed, 80 insertions(+)
> 
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 35dcf39..e2b4deb 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>  #ifdef CONFIG_KEXEC
>  	if (!kexec_in_progress)
>  		return;
> +	xen_unmap_all_pirqs();
>  #endif
>  }
>  
> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> index c919d3d..7701c7f 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>  static bool fifo_events = true;
>  module_param(fifo_events, bool, 0);
>  
> +void xen_unmap_all_pirqs(void)
> +{
> +	int pirq, rc, gsi, irq, evtchn;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct irq_info *info;
> +	struct evtchn_close close;
> +
> +	mutex_lock(&irq_mapping_update_lock);
> +
> +	list_for_each_entry(info, &xen_irq_list_head, list) {
> +		if (info->type != IRQT_PIRQ)
> +			continue;
> +
> +		pirq = info->u.pirq.pirq;
> +		gsi = info->u.pirq.gsi;
> +		evtchn = info->evtchn;
> +		irq = info->irq;
> +
> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
> +			gsi, pirq, irq, evtchn);
> +
> +		if (evtchn > 0) {
> +			close.port = evtchn;
> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> +							&close) != 0)
> +				pr_warn("close evtchn %d failed\n", evtchn);
> +		}
> +
> +		unmap_irq.pirq = pirq;
> +		unmap_irq.domid = DOMID_SELF;
> +
> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
> +		if (rc)
> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
> +				gsi, pirq, irq, rc);
> +	}
> +
> +	mutex_unlock(&irq_mapping_update_lock);
> +}
> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);

Why the EXPORT? Is this used by modules?
> +
> +static void xen_startup_unmap_pirqs(void)
> +{
> +	struct evtchn_status status;
> +	int port, rc = -ENOENT;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct evtchn_close close;
> +
> +	memset(&status, 0, sizeof(status));
> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
> +		status.dom = DOMID_SELF;
> +		status.port = port;
> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
> +		if (rc < 0)
> +			continue;
> +		if (status.status == EVTCHNSTAT_pirq) {
> +			close.port = port;
> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> +							&close) != 0)
> +				pr_warn("xen: failed to close evtchn %d\n",
> +					port);
> +			unmap_irq.pirq = status.u.pirq;
> +			unmap_irq.domid = DOMID_SELF;
> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
> +				unmap_irq.pirq);
> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
> +						  &unmap_irq) != 0)
> +				pr_warn("xen: failed to unmap pirq %d\n",
> +					unmap_irq.pirq);
> +		}
> +	}
> +}
> +
> +
>  void __init xen_init_IRQ(void)
>  {
>  	int ret = -EINVAL;
> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
>  		xen_callback_vector();
>  
>  	if (xen_hvm_domain()) {
> +		xen_startup_unmap_pirqs();
> +
>  		native_init_IRQ();
>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>  		 * __acpi_register_gsi can point at the right function */
> diff --git a/include/xen/events.h b/include/xen/events.h
> index 8bee7a7..3f9f428 100644
> --- a/include/xen/events.h
> +++ b/include/xen/events.h
> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
>  int xen_test_irq_shared(int irq);
>  
> +/* Unmap all PIRQs and close event channels */
> +void xen_unmap_all_pirqs(void);
> +
>  /* initialize Xen IRQ subsystem */
>  void xen_init_IRQ(void);
>  #endif	/* _XEN_EVENTS_H */
> -- 
> 1.9.3
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
  2014-07-15 15:43     ` Vitaly Kuznetsov
@ 2014-07-15 15:43     ` Vitaly Kuznetsov
  2014-07-15 15:50       ` Konrad Rzeszutek Wilk
  2014-07-15 15:50       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 15:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
>> From: Olaf Hering <olaf@aepfle.de>
>> 
>> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
>> ("xen PVonHVM: move shared_info to MMIO before kexec").
>> 
>> Currently kexec in a PVonHVM guest fails with a triple fault because the
>> new kernel overwrites the shared info page. The exact failure depends on
>> the size of the kernel image. This patch moves the pfn from RAM into an
>> E820 reserved memory area.
>> 
>> The pfn containing the shared_info is located somewhere in RAM. This will
>> cause trouble if the current kernel is doing a kexec boot into a new
>> kernel. The new kernel (and its startup code) can not know where the pfn
>> is, so it can not reserve the page. The hypervisor will continue to update
>> the pfn, and as a result memory corruption occours in the new kernel.
>> 
>> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
>> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
>> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
>> will usually not allocate areas up to FE700000, so FE700000 is expected
>> to work also with older toolstacks.
>> 
>> In Xen3 there is no reserved area at a fixed location. If the guest is
>> started on such old hosts the shared_info page will be placed in RAM. As
>> a result kexec can not be used.
>
> So this looks right, the one thing that we really need to check 
> is e9daff24a266307943457086533041bd971d0ef9
>
>    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
>
>     We are doing this b/c on 32-bit PVonHVM with older hypervisors
>     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
>     we use it for the time keeping, and the timekeeping code loops
>     forever - as the version field never changes. Olaf says to
>     revert it, so lets do that.
>
> Could you kindly test that the migration on 32-bit PVHVM guests
> on older hypervisors works?
>

Sure, will do! Was there anything special about the setup or any 32-bit
pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
try checking both current and old versions to make sure the issue was
acutually fixed.

>> 
>> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
>> 
>> [On resume we need to reset the xen_vcpu_info, which the original
>> patch did not do]
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
>>  arch/x86/xen/suspend.c   |  2 +-
>>  arch/x86/xen/xen-ops.h   |  2 +-
>>  3 files changed, 58 insertions(+), 20 deletions(-)
>> 
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index ffb101e..a11af62 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>  #endif
>>  }
>>  
>> -void __ref xen_hvm_init_shared_info(void)
>> +#ifdef CONFIG_XEN_PVHVM
>> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
>> +static struct shared_info *xen_hvm_shared_info;
>> +static unsigned long xen_hvm_sip_phys;
>> +static int xen_major, xen_minor;
>> +
>> +static void xen_hvm_connect_shared_info(unsigned long pfn)
>>  {
>> -	int cpu;
>>  	struct xen_add_to_physmap xatp;
>> -	static struct shared_info *shared_info_page = 0;
>>  
>> -	if (!shared_info_page)
>> -		shared_info_page = (struct shared_info *)
>> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>>  	xatp.domid = DOMID_SELF;
>>  	xatp.idx = 0;
>>  	xatp.space = XENMAPSPACE_shared_info;
>> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
>> +	xatp.gpfn = pfn;
>>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>>  		BUG();
>>  
>> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
>> +}
>> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
>> +{
>> +	int cpu;
>> +
>> +	HYPERVISOR_shared_info = sip;
>>  
>>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>>  	 * page, we use it in the event channel upcall and in some pvclock
>> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
>>  	}
>>  }
>>  
>> -#ifdef CONFIG_XEN_PVHVM
>> +/* Reconnect the shared_info pfn to a (new) mfn */
>> +void xen_hvm_resume_shared_info(void)
>> +{
>> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
>> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
>> +}
>> +
>> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
>> + * On these old tools the shared info page will be placed in E820_Ram.
>> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
>> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
>> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
>> + * here for the shared info page. */
>> +static void __init xen_hvm_init_shared_info(void)
>> +{
>> +	if (xen_major < 4) {
>> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
>> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
>> +	} else {
>> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
>> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
>> +		xen_hvm_shared_info =
>> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
>> +	}
>> +	xen_hvm_resume_shared_info();
>> +}
>> +
>>  static void __init init_hvm_pv_info(void)
>>  {
>> -	int major, minor;
>> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
>> +	uint32_t  ecx, edx, pages, msr, base;
>>  	u64 pfn;
>>  
>>  	base = xen_cpuid_base();
>> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> -
>> -	major = eax >> 16;
>> -	minor = eax & 0xffff;
>> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
>> -
>>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
>>  
>>  	pfn = __pa(hypercall_page);
>> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>>  
>>  static uint32_t __init xen_hvm_platform(void)
>>  {
>> +	uint32_t eax, ebx, ecx, edx, base;
>> +
>>  	if (xen_pv_domain())
>>  		return 0;
>>  
>> -	return xen_cpuid_base();
>> +	base = xen_cpuid_base();
>> +	if (!base)
>> +		return 0;
>> +
>> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> +
>> +	xen_major = eax >> 16;
>> +	xen_minor = eax & 0xffff;
>> +
>> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
>> +
>> +	return 1;
>>  }
>>  
>>  bool xen_hvm_need_lapic(void)
>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> index c4df9db..8d6793e 100644
>> --- a/arch/x86/xen/suspend.c
>> +++ b/arch/x86/xen/suspend.c
>> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>  {
>>  #ifdef CONFIG_XEN_PVHVM
>>  	int cpu;
>> -	xen_hvm_init_shared_info();
>> +	xen_hvm_resume_shared_info();
>>  	xen_callback_vector();
>>  	xen_unplug_emulated_devices();
>>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>> index 97d8765..d083e82 100644
>> --- a/arch/x86/xen/xen-ops.h
>> +++ b/arch/x86/xen/xen-ops.h
>> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
>>  void xen_vcpu_restore(void);
>>  
>>  void xen_callback_vector(void);
>> -void xen_hvm_init_shared_info(void);
>> +void xen_hvm_resume_shared_info(void);
>>  void xen_unplug_emulated_devices(void);
>>  
>>  void __init xen_build_dynamic_phys_to_machine(void);
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:43     ` Vitaly Kuznetsov
  2014-07-15 15:43     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 15:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
>> From: Olaf Hering <olaf@aepfle.de>
>> 
>> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
>> ("xen PVonHVM: move shared_info to MMIO before kexec").
>> 
>> Currently kexec in a PVonHVM guest fails with a triple fault because the
>> new kernel overwrites the shared info page. The exact failure depends on
>> the size of the kernel image. This patch moves the pfn from RAM into an
>> E820 reserved memory area.
>> 
>> The pfn containing the shared_info is located somewhere in RAM. This will
>> cause trouble if the current kernel is doing a kexec boot into a new
>> kernel. The new kernel (and its startup code) can not know where the pfn
>> is, so it can not reserve the page. The hypervisor will continue to update
>> the pfn, and as a result memory corruption occours in the new kernel.
>> 
>> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
>> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
>> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
>> will usually not allocate areas up to FE700000, so FE700000 is expected
>> to work also with older toolstacks.
>> 
>> In Xen3 there is no reserved area at a fixed location. If the guest is
>> started on such old hosts the shared_info page will be placed in RAM. As
>> a result kexec can not be used.
>
> So this looks right, the one thing that we really need to check 
> is e9daff24a266307943457086533041bd971d0ef9
>
>    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
>
>     We are doing this b/c on 32-bit PVonHVM with older hypervisors
>     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
>     we use it for the time keeping, and the timekeeping code loops
>     forever - as the version field never changes. Olaf says to
>     revert it, so lets do that.
>
> Could you kindly test that the migration on 32-bit PVHVM guests
> on older hypervisors works?
>

Sure, will do! Was there anything special about the setup or any 32-bit
pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
try checking both current and old versions to make sure the issue was
acutually fixed.

>> 
>> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
>> 
>> [On resume we need to reset the xen_vcpu_info, which the original
>> patch did not do]
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
>>  arch/x86/xen/suspend.c   |  2 +-
>>  arch/x86/xen/xen-ops.h   |  2 +-
>>  3 files changed, 58 insertions(+), 20 deletions(-)
>> 
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index ffb101e..a11af62 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>>  #endif
>>  }
>>  
>> -void __ref xen_hvm_init_shared_info(void)
>> +#ifdef CONFIG_XEN_PVHVM
>> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
>> +static struct shared_info *xen_hvm_shared_info;
>> +static unsigned long xen_hvm_sip_phys;
>> +static int xen_major, xen_minor;
>> +
>> +static void xen_hvm_connect_shared_info(unsigned long pfn)
>>  {
>> -	int cpu;
>>  	struct xen_add_to_physmap xatp;
>> -	static struct shared_info *shared_info_page = 0;
>>  
>> -	if (!shared_info_page)
>> -		shared_info_page = (struct shared_info *)
>> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>>  	xatp.domid = DOMID_SELF;
>>  	xatp.idx = 0;
>>  	xatp.space = XENMAPSPACE_shared_info;
>> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
>> +	xatp.gpfn = pfn;
>>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>>  		BUG();
>>  
>> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
>> +}
>> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
>> +{
>> +	int cpu;
>> +
>> +	HYPERVISOR_shared_info = sip;
>>  
>>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>>  	 * page, we use it in the event channel upcall and in some pvclock
>> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
>>  	}
>>  }
>>  
>> -#ifdef CONFIG_XEN_PVHVM
>> +/* Reconnect the shared_info pfn to a (new) mfn */
>> +void xen_hvm_resume_shared_info(void)
>> +{
>> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
>> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
>> +}
>> +
>> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
>> + * On these old tools the shared info page will be placed in E820_Ram.
>> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
>> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
>> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
>> + * here for the shared info page. */
>> +static void __init xen_hvm_init_shared_info(void)
>> +{
>> +	if (xen_major < 4) {
>> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
>> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
>> +	} else {
>> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
>> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
>> +		xen_hvm_shared_info =
>> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
>> +	}
>> +	xen_hvm_resume_shared_info();
>> +}
>> +
>>  static void __init init_hvm_pv_info(void)
>>  {
>> -	int major, minor;
>> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
>> +	uint32_t  ecx, edx, pages, msr, base;
>>  	u64 pfn;
>>  
>>  	base = xen_cpuid_base();
>> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> -
>> -	major = eax >> 16;
>> -	minor = eax & 0xffff;
>> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
>> -
>>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
>>  
>>  	pfn = __pa(hypercall_page);
>> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>>  
>>  static uint32_t __init xen_hvm_platform(void)
>>  {
>> +	uint32_t eax, ebx, ecx, edx, base;
>> +
>>  	if (xen_pv_domain())
>>  		return 0;
>>  
>> -	return xen_cpuid_base();
>> +	base = xen_cpuid_base();
>> +	if (!base)
>> +		return 0;
>> +
>> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> +
>> +	xen_major = eax >> 16;
>> +	xen_minor = eax & 0xffff;
>> +
>> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
>> +
>> +	return 1;
>>  }
>>  
>>  bool xen_hvm_need_lapic(void)
>> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> index c4df9db..8d6793e 100644
>> --- a/arch/x86/xen/suspend.c
>> +++ b/arch/x86/xen/suspend.c
>> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>>  {
>>  #ifdef CONFIG_XEN_PVHVM
>>  	int cpu;
>> -	xen_hvm_init_shared_info();
>> +	xen_hvm_resume_shared_info();
>>  	xen_callback_vector();
>>  	xen_unplug_emulated_devices();
>>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>> index 97d8765..d083e82 100644
>> --- a/arch/x86/xen/xen-ops.h
>> +++ b/arch/x86/xen/xen-ops.h
>> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
>>  void xen_vcpu_restore(void);
>>  
>>  void xen_callback_vector(void);
>> -void xen_hvm_init_shared_info(void);
>> +void xen_hvm_resume_shared_info(void);
>>  void xen_unplug_emulated_devices(void);
>>  
>>  void __init xen_build_dynamic_phys_to_machine(void);
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 15:43     ` Vitaly Kuznetsov
@ 2014-07-15 15:50       ` Konrad Rzeszutek Wilk
  2014-07-18 11:05         ` Vitaly Kuznetsov
  2014-07-18 11:05         ` Vitaly Kuznetsov
  2014-07-15 15:50       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
> >> From: Olaf Hering <olaf@aepfle.de>
> >> 
> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
> >> 
> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
> >> new kernel overwrites the shared info page. The exact failure depends on
> >> the size of the kernel image. This patch moves the pfn from RAM into an
> >> E820 reserved memory area.
> >> 
> >> The pfn containing the shared_info is located somewhere in RAM. This will
> >> cause trouble if the current kernel is doing a kexec boot into a new
> >> kernel. The new kernel (and its startup code) can not know where the pfn
> >> is, so it can not reserve the page. The hypervisor will continue to update
> >> the pfn, and as a result memory corruption occours in the new kernel.
> >> 
> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> >> will usually not allocate areas up to FE700000, so FE700000 is expected
> >> to work also with older toolstacks.
> >> 
> >> In Xen3 there is no reserved area at a fixed location. If the guest is
> >> started on such old hosts the shared_info page will be placed in RAM. As
> >> a result kexec can not be used.
> >
> > So this looks right, the one thing that we really need to check 
> > is e9daff24a266307943457086533041bd971d0ef9
> >
> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
> >
> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
> >     we use it for the time keeping, and the timekeeping code loops
> >     forever - as the version field never changes. Olaf says to
> >     revert it, so lets do that.
> >
> > Could you kindly test that the migration on 32-bit PVHVM guests
> > on older hypervisors works?
> >
> 
> Sure, will do! Was there anything special about the setup or any 32-bit
> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
> try checking both current and old versions to make sure the issue was
> acutually fixed.

Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
guest would not restore properly.

Thank you!
> 
> >> 
> >> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
> >> 
> >> [On resume we need to reset the xen_vcpu_info, which the original
> >> patch did not do]
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
> >>  arch/x86/xen/suspend.c   |  2 +-
> >>  arch/x86/xen/xen-ops.h   |  2 +-
> >>  3 files changed, 58 insertions(+), 20 deletions(-)
> >> 
> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> >> index ffb101e..a11af62 100644
> >> --- a/arch/x86/xen/enlighten.c
> >> +++ b/arch/x86/xen/enlighten.c
> >> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
> >>  #endif
> >>  }
> >>  
> >> -void __ref xen_hvm_init_shared_info(void)
> >> +#ifdef CONFIG_XEN_PVHVM
> >> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
> >> +static struct shared_info *xen_hvm_shared_info;
> >> +static unsigned long xen_hvm_sip_phys;
> >> +static int xen_major, xen_minor;
> >> +
> >> +static void xen_hvm_connect_shared_info(unsigned long pfn)
> >>  {
> >> -	int cpu;
> >>  	struct xen_add_to_physmap xatp;
> >> -	static struct shared_info *shared_info_page = 0;
> >>  
> >> -	if (!shared_info_page)
> >> -		shared_info_page = (struct shared_info *)
> >> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
> >>  	xatp.domid = DOMID_SELF;
> >>  	xatp.idx = 0;
> >>  	xatp.space = XENMAPSPACE_shared_info;
> >> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> >> +	xatp.gpfn = pfn;
> >>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> >>  		BUG();
> >>  
> >> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> >> +}
> >> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
> >> +{
> >> +	int cpu;
> >> +
> >> +	HYPERVISOR_shared_info = sip;
> >>  
> >>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> >>  	 * page, we use it in the event channel upcall and in some pvclock
> >> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
> >>  	}
> >>  }
> >>  
> >> -#ifdef CONFIG_XEN_PVHVM
> >> +/* Reconnect the shared_info pfn to a (new) mfn */
> >> +void xen_hvm_resume_shared_info(void)
> >> +{
> >> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
> >> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
> >> +}
> >> +
> >> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
> >> + * On these old tools the shared info page will be placed in E820_Ram.
> >> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
> >> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
> >> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
> >> + * here for the shared info page. */
> >> +static void __init xen_hvm_init_shared_info(void)
> >> +{
> >> +	if (xen_major < 4) {
> >> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> >> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
> >> +	} else {
> >> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
> >> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
> >> +		xen_hvm_shared_info =
> >> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
> >> +	}
> >> +	xen_hvm_resume_shared_info();
> >> +}
> >> +
> >>  static void __init init_hvm_pv_info(void)
> >>  {
> >> -	int major, minor;
> >> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
> >> +	uint32_t  ecx, edx, pages, msr, base;
> >>  	u64 pfn;
> >>  
> >>  	base = xen_cpuid_base();
> >> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> >> -
> >> -	major = eax >> 16;
> >> -	minor = eax & 0xffff;
> >> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
> >> -
> >>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
> >>  
> >>  	pfn = __pa(hypercall_page);
> >> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
> >>  
> >>  static uint32_t __init xen_hvm_platform(void)
> >>  {
> >> +	uint32_t eax, ebx, ecx, edx, base;
> >> +
> >>  	if (xen_pv_domain())
> >>  		return 0;
> >>  
> >> -	return xen_cpuid_base();
> >> +	base = xen_cpuid_base();
> >> +	if (!base)
> >> +		return 0;
> >> +
> >> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> >> +
> >> +	xen_major = eax >> 16;
> >> +	xen_minor = eax & 0xffff;
> >> +
> >> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
> >> +
> >> +	return 1;
> >>  }
> >>  
> >>  bool xen_hvm_need_lapic(void)
> >> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> >> index c4df9db..8d6793e 100644
> >> --- a/arch/x86/xen/suspend.c
> >> +++ b/arch/x86/xen/suspend.c
> >> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
> >>  {
> >>  #ifdef CONFIG_XEN_PVHVM
> >>  	int cpu;
> >> -	xen_hvm_init_shared_info();
> >> +	xen_hvm_resume_shared_info();
> >>  	xen_callback_vector();
> >>  	xen_unplug_emulated_devices();
> >>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> >> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> >> index 97d8765..d083e82 100644
> >> --- a/arch/x86/xen/xen-ops.h
> >> +++ b/arch/x86/xen/xen-ops.h
> >> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
> >>  void xen_vcpu_restore(void);
> >>  
> >>  void xen_callback_vector(void);
> >> -void xen_hvm_init_shared_info(void);
> >> +void xen_hvm_resume_shared_info(void);
> >>  void xen_unplug_emulated_devices(void);
> >>  
> >>  void __init xen_build_dynamic_phys_to_machine(void);
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 15:43     ` Vitaly Kuznetsov
  2014-07-15 15:50       ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:50       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:50 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
> >> From: Olaf Hering <olaf@aepfle.de>
> >> 
> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
> >> 
> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
> >> new kernel overwrites the shared info page. The exact failure depends on
> >> the size of the kernel image. This patch moves the pfn from RAM into an
> >> E820 reserved memory area.
> >> 
> >> The pfn containing the shared_info is located somewhere in RAM. This will
> >> cause trouble if the current kernel is doing a kexec boot into a new
> >> kernel. The new kernel (and its startup code) can not know where the pfn
> >> is, so it can not reserve the page. The hypervisor will continue to update
> >> the pfn, and as a result memory corruption occours in the new kernel.
> >> 
> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> >> will usually not allocate areas up to FE700000, so FE700000 is expected
> >> to work also with older toolstacks.
> >> 
> >> In Xen3 there is no reserved area at a fixed location. If the guest is
> >> started on such old hosts the shared_info page will be placed in RAM. As
> >> a result kexec can not be used.
> >
> > So this looks right, the one thing that we really need to check 
> > is e9daff24a266307943457086533041bd971d0ef9
> >
> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
> >
> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
> >     we use it for the time keeping, and the timekeeping code loops
> >     forever - as the version field never changes. Olaf says to
> >     revert it, so lets do that.
> >
> > Could you kindly test that the migration on 32-bit PVHVM guests
> > on older hypervisors works?
> >
> 
> Sure, will do! Was there anything special about the setup or any 32-bit
> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
> try checking both current and old versions to make sure the issue was
> acutually fixed.

Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
guest would not restore properly.

Thank you!
> 
> >> 
> >> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
> >> 
> >> [On resume we need to reset the xen_vcpu_info, which the original
> >> patch did not do]
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
> >>  arch/x86/xen/suspend.c   |  2 +-
> >>  arch/x86/xen/xen-ops.h   |  2 +-
> >>  3 files changed, 58 insertions(+), 20 deletions(-)
> >> 
> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> >> index ffb101e..a11af62 100644
> >> --- a/arch/x86/xen/enlighten.c
> >> +++ b/arch/x86/xen/enlighten.c
> >> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
> >>  #endif
> >>  }
> >>  
> >> -void __ref xen_hvm_init_shared_info(void)
> >> +#ifdef CONFIG_XEN_PVHVM
> >> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
> >> +static struct shared_info *xen_hvm_shared_info;
> >> +static unsigned long xen_hvm_sip_phys;
> >> +static int xen_major, xen_minor;
> >> +
> >> +static void xen_hvm_connect_shared_info(unsigned long pfn)
> >>  {
> >> -	int cpu;
> >>  	struct xen_add_to_physmap xatp;
> >> -	static struct shared_info *shared_info_page = 0;
> >>  
> >> -	if (!shared_info_page)
> >> -		shared_info_page = (struct shared_info *)
> >> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
> >>  	xatp.domid = DOMID_SELF;
> >>  	xatp.idx = 0;
> >>  	xatp.space = XENMAPSPACE_shared_info;
> >> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> >> +	xatp.gpfn = pfn;
> >>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
> >>  		BUG();
> >>  
> >> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
> >> +}
> >> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
> >> +{
> >> +	int cpu;
> >> +
> >> +	HYPERVISOR_shared_info = sip;
> >>  
> >>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
> >>  	 * page, we use it in the event channel upcall and in some pvclock
> >> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
> >>  	}
> >>  }
> >>  
> >> -#ifdef CONFIG_XEN_PVHVM
> >> +/* Reconnect the shared_info pfn to a (new) mfn */
> >> +void xen_hvm_resume_shared_info(void)
> >> +{
> >> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
> >> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
> >> +}
> >> +
> >> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
> >> + * On these old tools the shared info page will be placed in E820_Ram.
> >> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
> >> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
> >> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
> >> + * here for the shared info page. */
> >> +static void __init xen_hvm_init_shared_info(void)
> >> +{
> >> +	if (xen_major < 4) {
> >> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
> >> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
> >> +	} else {
> >> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
> >> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
> >> +		xen_hvm_shared_info =
> >> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
> >> +	}
> >> +	xen_hvm_resume_shared_info();
> >> +}
> >> +
> >>  static void __init init_hvm_pv_info(void)
> >>  {
> >> -	int major, minor;
> >> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
> >> +	uint32_t  ecx, edx, pages, msr, base;
> >>  	u64 pfn;
> >>  
> >>  	base = xen_cpuid_base();
> >> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> >> -
> >> -	major = eax >> 16;
> >> -	minor = eax & 0xffff;
> >> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
> >> -
> >>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
> >>  
> >>  	pfn = __pa(hypercall_page);
> >> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
> >>  
> >>  static uint32_t __init xen_hvm_platform(void)
> >>  {
> >> +	uint32_t eax, ebx, ecx, edx, base;
> >> +
> >>  	if (xen_pv_domain())
> >>  		return 0;
> >>  
> >> -	return xen_cpuid_base();
> >> +	base = xen_cpuid_base();
> >> +	if (!base)
> >> +		return 0;
> >> +
> >> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> >> +
> >> +	xen_major = eax >> 16;
> >> +	xen_minor = eax & 0xffff;
> >> +
> >> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
> >> +
> >> +	return 1;
> >>  }
> >>  
> >>  bool xen_hvm_need_lapic(void)
> >> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
> >> index c4df9db..8d6793e 100644
> >> --- a/arch/x86/xen/suspend.c
> >> +++ b/arch/x86/xen/suspend.c
> >> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
> >>  {
> >>  #ifdef CONFIG_XEN_PVHVM
> >>  	int cpu;
> >> -	xen_hvm_init_shared_info();
> >> +	xen_hvm_resume_shared_info();
> >>  	xen_callback_vector();
> >>  	xen_unplug_emulated_devices();
> >>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
> >> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> >> index 97d8765..d083e82 100644
> >> --- a/arch/x86/xen/xen-ops.h
> >> +++ b/arch/x86/xen/xen-ops.h
> >> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
> >>  void xen_vcpu_restore(void);
> >>  
> >>  void xen_callback_vector(void);
> >> -void xen_hvm_init_shared_info(void);
> >> +void xen_hvm_resume_shared_info(void);
> >>  void xen_unplug_emulated_devices(void);
> >>  
> >>  void __init xen_build_dynamic_phys_to_machine(void);
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
  2014-07-15 15:52     ` Vitaly Kuznetsov
@ 2014-07-15 15:52     ` Vitaly Kuznetsov
  2014-07-15 15:58       ` Konrad Rzeszutek Wilk
  2014-07-15 15:58       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 15:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:38PM +0200, Vitaly Kuznetsov wrote:
>> PVHVM guest requires special actions before kexec. Register specific
>> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
>> 
>
> This looks close to what I had sent as an RFC to you?
>

Yes, I stole that part from your "RFC: VCPU_reset_cpu_info" patch to
call xen_unmap_all_pirqs() on shutdown. I'm looking at SMP issues your
patches address as well.

>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/xen/enlighten.c | 9 +++++++++
>>  arch/x86/xen/smp.c       | 9 +++++++++
>>  arch/x86/xen/smp.h       | 1 +
>>  3 files changed, 19 insertions(+)
>> 
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index a11af62..8074e4a 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>>  	.notifier_call	= xen_hvm_cpu_notify,
>>  };
>>  
>> +static void xen_pvhvm_kexec_shutdown(void)
>> +{
>> +	xen_kexec_shutdown();
>> +	native_machine_shutdown();
>> +}
>> +
>>  static void __init xen_hvm_guest_init(void)
>>  {
>>  	init_hvm_pv_info();
>> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
>>  	x86_init.irqs.intr_init = xen_init_IRQ;
>>  	xen_hvm_init_time_ops();
>>  	xen_hvm_init_mmu_ops();
>> +#ifdef CONFIG_KEXEC
>> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
>> +#endif
>>  }
>>  
>>  static uint32_t __init xen_hvm_platform(void)
>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> index 7005974..35dcf39 100644
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/smp.h>
>>  #include <linux/irq_work.h>
>>  #include <linux/tick.h>
>> +#include <linux/kexec.h>
>>  
>>  #include <asm/paravirt.h>
>>  #include <asm/desc.h>
>> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
>>  	native_cpu_die(cpu);
>>  }
>>  
>> +void xen_kexec_shutdown(void)
>> +{
>> +#ifdef CONFIG_KEXEC
>> +	if (!kexec_in_progress)
>> +		return;
>> +#endif
>> +}
>> +
>>  void __init xen_hvm_smp_init(void)
>>  {
>>  	if (!xen_have_vector_callback)
>> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
>> index c7c2d89..1af0493 100644
>> --- a/arch/x86/xen/smp.h
>> +++ b/arch/x86/xen/smp.h
>> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
>>  extern void xen_send_IPI_all(int vector);
>>  extern void xen_send_IPI_self(int vector);
>>  
>> +extern void xen_kexec_shutdown(void);
>>  #endif
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:52     ` Vitaly Kuznetsov
  2014-07-15 15:52     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-15 15:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:38PM +0200, Vitaly Kuznetsov wrote:
>> PVHVM guest requires special actions before kexec. Register specific
>> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
>> 
>
> This looks close to what I had sent as an RFC to you?
>

Yes, I stole that part from your "RFC: VCPU_reset_cpu_info" patch to
call xen_unmap_all_pirqs() on shutdown. I'm looking at SMP issues your
patches address as well.

>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/xen/enlighten.c | 9 +++++++++
>>  arch/x86/xen/smp.c       | 9 +++++++++
>>  arch/x86/xen/smp.h       | 1 +
>>  3 files changed, 19 insertions(+)
>> 
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index a11af62..8074e4a 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>>  	.notifier_call	= xen_hvm_cpu_notify,
>>  };
>>  
>> +static void xen_pvhvm_kexec_shutdown(void)
>> +{
>> +	xen_kexec_shutdown();
>> +	native_machine_shutdown();
>> +}
>> +
>>  static void __init xen_hvm_guest_init(void)
>>  {
>>  	init_hvm_pv_info();
>> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
>>  	x86_init.irqs.intr_init = xen_init_IRQ;
>>  	xen_hvm_init_time_ops();
>>  	xen_hvm_init_mmu_ops();
>> +#ifdef CONFIG_KEXEC
>> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
>> +#endif
>>  }
>>  
>>  static uint32_t __init xen_hvm_platform(void)
>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> index 7005974..35dcf39 100644
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -18,6 +18,7 @@
>>  #include <linux/smp.h>
>>  #include <linux/irq_work.h>
>>  #include <linux/tick.h>
>> +#include <linux/kexec.h>
>>  
>>  #include <asm/paravirt.h>
>>  #include <asm/desc.h>
>> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
>>  	native_cpu_die(cpu);
>>  }
>>  
>> +void xen_kexec_shutdown(void)
>> +{
>> +#ifdef CONFIG_KEXEC
>> +	if (!kexec_in_progress)
>> +		return;
>> +#endif
>> +}
>> +
>>  void __init xen_hvm_smp_init(void)
>>  {
>>  	if (!xen_have_vector_callback)
>> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
>> index c7c2d89..1af0493 100644
>> --- a/arch/x86/xen/smp.h
>> +++ b/arch/x86/xen/smp.h
>> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
>>  extern void xen_send_IPI_all(int vector);
>>  extern void xen_send_IPI_self(int vector);
>>  
>> +extern void xen_kexec_shutdown(void);
>>  #endif
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 15:52     ` Vitaly Kuznetsov
  2014-07-15 15:58       ` Konrad Rzeszutek Wilk
@ 2014-07-15 15:58       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:58 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Tue, Jul 15, 2014 at 05:52:33PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:38PM +0200, Vitaly Kuznetsov wrote:
> >> PVHVM guest requires special actions before kexec. Register specific
> >> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
> >> 
> >
> > This looks close to what I had sent as an RFC to you?
> >
> 
> Yes, I stole that part from your "RFC: VCPU_reset_cpu_info" patch to
> call xen_unmap_all_pirqs() on shutdown. I'm looking at SMP issues your
> patches address as well.

It is customary to attribute that in the patch. Such as:

"Original idea/patch from Konrad Rzeszutek Wilk" is suffice.

Thanks!
> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/xen/enlighten.c | 9 +++++++++
> >>  arch/x86/xen/smp.c       | 9 +++++++++
> >>  arch/x86/xen/smp.h       | 1 +
> >>  3 files changed, 19 insertions(+)
> >> 
> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> >> index a11af62..8074e4a 100644
> >> --- a/arch/x86/xen/enlighten.c
> >> +++ b/arch/x86/xen/enlighten.c
> >> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
> >>  	.notifier_call	= xen_hvm_cpu_notify,
> >>  };
> >>  
> >> +static void xen_pvhvm_kexec_shutdown(void)
> >> +{
> >> +	xen_kexec_shutdown();
> >> +	native_machine_shutdown();
> >> +}
> >> +
> >>  static void __init xen_hvm_guest_init(void)
> >>  {
> >>  	init_hvm_pv_info();
> >> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
> >>  	x86_init.irqs.intr_init = xen_init_IRQ;
> >>  	xen_hvm_init_time_ops();
> >>  	xen_hvm_init_mmu_ops();
> >> +#ifdef CONFIG_KEXEC
> >> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> >> +#endif
> >>  }
> >>  
> >>  static uint32_t __init xen_hvm_platform(void)
> >> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >> index 7005974..35dcf39 100644
> >> --- a/arch/x86/xen/smp.c
> >> +++ b/arch/x86/xen/smp.c
> >> @@ -18,6 +18,7 @@
> >>  #include <linux/smp.h>
> >>  #include <linux/irq_work.h>
> >>  #include <linux/tick.h>
> >> +#include <linux/kexec.h>
> >>  
> >>  #include <asm/paravirt.h>
> >>  #include <asm/desc.h>
> >> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
> >>  	native_cpu_die(cpu);
> >>  }
> >>  
> >> +void xen_kexec_shutdown(void)
> >> +{
> >> +#ifdef CONFIG_KEXEC
> >> +	if (!kexec_in_progress)
> >> +		return;
> >> +#endif
> >> +}
> >> +
> >>  void __init xen_hvm_smp_init(void)
> >>  {
> >>  	if (!xen_have_vector_callback)
> >> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
> >> index c7c2d89..1af0493 100644
> >> --- a/arch/x86/xen/smp.h
> >> +++ b/arch/x86/xen/smp.h
> >> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
> >>  extern void xen_send_IPI_all(int vector);
> >>  extern void xen_send_IPI_self(int vector);
> >>  
> >> +extern void xen_kexec_shutdown(void);
> >>  #endif
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 15:52     ` Vitaly Kuznetsov
@ 2014-07-15 15:58       ` Konrad Rzeszutek Wilk
  2014-07-15 15:58       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-15 15:58 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Tue, Jul 15, 2014 at 05:52:33PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:38PM +0200, Vitaly Kuznetsov wrote:
> >> PVHVM guest requires special actions before kexec. Register specific
> >> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
> >> 
> >
> > This looks close to what I had sent as an RFC to you?
> >
> 
> Yes, I stole that part from your "RFC: VCPU_reset_cpu_info" patch to
> call xen_unmap_all_pirqs() on shutdown. I'm looking at SMP issues your
> patches address as well.

It is customary to attribute that in the patch. Such as:

"Original idea/patch from Konrad Rzeszutek Wilk" is suffice.

Thanks!
> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/xen/enlighten.c | 9 +++++++++
> >>  arch/x86/xen/smp.c       | 9 +++++++++
> >>  arch/x86/xen/smp.h       | 1 +
> >>  3 files changed, 19 insertions(+)
> >> 
> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> >> index a11af62..8074e4a 100644
> >> --- a/arch/x86/xen/enlighten.c
> >> +++ b/arch/x86/xen/enlighten.c
> >> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
> >>  	.notifier_call	= xen_hvm_cpu_notify,
> >>  };
> >>  
> >> +static void xen_pvhvm_kexec_shutdown(void)
> >> +{
> >> +	xen_kexec_shutdown();
> >> +	native_machine_shutdown();
> >> +}
> >> +
> >>  static void __init xen_hvm_guest_init(void)
> >>  {
> >>  	init_hvm_pv_info();
> >> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
> >>  	x86_init.irqs.intr_init = xen_init_IRQ;
> >>  	xen_hvm_init_time_ops();
> >>  	xen_hvm_init_mmu_ops();
> >> +#ifdef CONFIG_KEXEC
> >> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> >> +#endif
> >>  }
> >>  
> >>  static uint32_t __init xen_hvm_platform(void)
> >> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >> index 7005974..35dcf39 100644
> >> --- a/arch/x86/xen/smp.c
> >> +++ b/arch/x86/xen/smp.c
> >> @@ -18,6 +18,7 @@
> >>  #include <linux/smp.h>
> >>  #include <linux/irq_work.h>
> >>  #include <linux/tick.h>
> >> +#include <linux/kexec.h>
> >>  
> >>  #include <asm/paravirt.h>
> >>  #include <asm/desc.h>
> >> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
> >>  	native_cpu_die(cpu);
> >>  }
> >>  
> >> +void xen_kexec_shutdown(void)
> >> +{
> >> +#ifdef CONFIG_KEXEC
> >> +	if (!kexec_in_progress)
> >> +		return;
> >> +#endif
> >> +}
> >> +
> >>  void __init xen_hvm_smp_init(void)
> >>  {
> >>  	if (!xen_have_vector_callback)
> >> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
> >> index c7c2d89..1af0493 100644
> >> --- a/arch/x86/xen/smp.h
> >> +++ b/arch/x86/xen/smp.h
> >> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
> >>  extern void xen_send_IPI_all(int vector);
> >>  extern void xen_send_IPI_self(int vector);
> >>  
> >> +extern void xen_kexec_shutdown(void);
> >>  #endif
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
  2014-07-15 15:09   ` Konrad Rzeszutek Wilk
@ 2014-07-15 17:41   ` Boris Ostrovsky
  2014-07-15 17:41   ` Boris Ostrovsky
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Boris Ostrovsky @ 2014-07-15 17:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Konrad Rzeszutek Wilk, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On 07/15/2014 09:40 AM, Vitaly Kuznetsov wrote:
> PVHVM guest requires special actions before kexec. Register specific
> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/xen/enlighten.c | 9 +++++++++
>   arch/x86/xen/smp.c       | 9 +++++++++
>   arch/x86/xen/smp.h       | 1 +
>   3 files changed, 19 insertions(+)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a11af62..8074e4a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>   	.notifier_call	= xen_hvm_cpu_notify,
>   };
>   
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	xen_kexec_shutdown();
> +	native_machine_shutdown();
> +}
> +

This should probably be under ifdef CONFIG_KEXEC ?

>   static void __init xen_hvm_guest_init(void)
>   {
>   	init_hvm_pv_info();
> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
>   	x86_init.irqs.intr_init = xen_init_IRQ;
>   	xen_hvm_init_time_ops();
>   	xen_hvm_init_mmu_ops();
> +#ifdef CONFIG_KEXEC
> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> +#endif
>   }
>   
>   static uint32_t __init xen_hvm_platform(void)
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 7005974..35dcf39 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -18,6 +18,7 @@
>   #include <linux/smp.h>
>   #include <linux/irq_work.h>
>   #include <linux/tick.h>
> +#include <linux/kexec.h>
>   
>   #include <asm/paravirt.h>
>   #include <asm/desc.h>
> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
>   	native_cpu_die(cpu);
>   }
>   
> +void xen_kexec_shutdown(void)
> +{
> +#ifdef CONFIG_KEXEC
> +	if (!kexec_in_progress)
> +		return;
> +#endif
> +}
> +

and this one too (whole routine)? And "#include" above as well.

>   void __init xen_hvm_smp_init(void)
>   {
>   	if (!xen_have_vector_callback)
> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
> index c7c2d89..1af0493 100644
> --- a/arch/x86/xen/smp.h
> +++ b/arch/x86/xen/smp.h
> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
>   extern void xen_send_IPI_all(int vector);
>   extern void xen_send_IPI_self(int vector);
>   
> +extern void xen_kexec_shutdown(void);
>   #endif


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 ` Vitaly Kuznetsov
                     ` (2 preceding siblings ...)
  2014-07-15 17:41   ` Boris Ostrovsky
@ 2014-07-15 17:41   ` Boris Ostrovsky
  2014-07-28 13:36   ` David Vrabel
  2014-07-28 13:36   ` David Vrabel
  5 siblings, 0 replies; 77+ messages in thread
From: Boris Ostrovsky @ 2014-07-15 17:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: linux-kernel, Andrew Jones, Stefano Stabellini, David Vrabel

On 07/15/2014 09:40 AM, Vitaly Kuznetsov wrote:
> PVHVM guest requires special actions before kexec. Register specific
> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/xen/enlighten.c | 9 +++++++++
>   arch/x86/xen/smp.c       | 9 +++++++++
>   arch/x86/xen/smp.h       | 1 +
>   3 files changed, 19 insertions(+)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index a11af62..8074e4a 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>   	.notifier_call	= xen_hvm_cpu_notify,
>   };
>   
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	xen_kexec_shutdown();
> +	native_machine_shutdown();
> +}
> +

This should probably be under ifdef CONFIG_KEXEC ?

>   static void __init xen_hvm_guest_init(void)
>   {
>   	init_hvm_pv_info();
> @@ -1849,6 +1855,9 @@ static void __init xen_hvm_guest_init(void)
>   	x86_init.irqs.intr_init = xen_init_IRQ;
>   	xen_hvm_init_time_ops();
>   	xen_hvm_init_mmu_ops();
> +#ifdef CONFIG_KEXEC
> +	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
> +#endif
>   }
>   
>   static uint32_t __init xen_hvm_platform(void)
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 7005974..35dcf39 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -18,6 +18,7 @@
>   #include <linux/smp.h>
>   #include <linux/irq_work.h>
>   #include <linux/tick.h>
> +#include <linux/kexec.h>
>   
>   #include <asm/paravirt.h>
>   #include <asm/desc.h>
> @@ -762,6 +763,14 @@ static void xen_hvm_cpu_die(unsigned int cpu)
>   	native_cpu_die(cpu);
>   }
>   
> +void xen_kexec_shutdown(void)
> +{
> +#ifdef CONFIG_KEXEC
> +	if (!kexec_in_progress)
> +		return;
> +#endif
> +}
> +

and this one too (whole routine)? And "#include" above as well.

>   void __init xen_hvm_smp_init(void)
>   {
>   	if (!xen_have_vector_callback)
> diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
> index c7c2d89..1af0493 100644
> --- a/arch/x86/xen/smp.h
> +++ b/arch/x86/xen/smp.h
> @@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
>   extern void xen_send_IPI_all(int vector);
>   extern void xen_send_IPI_self(int vector);
>   
> +extern void xen_kexec_shutdown(void);
>   #endif

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-15 15:21   ` Konrad Rzeszutek Wilk
@ 2014-07-16  9:01     ` Vitaly Kuznetsov
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  2014-07-16  9:01     ` Vitaly Kuznetsov
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16  9:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, xen-devel, Boris Ostrovsky, David Vrabel,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
>> When kexec was peformed MSI IRQs for passthrough-ed devices were already
>> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
>> fails as we have no IRQ mapping information for that. Requesting for new
>> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
>> we don't recieve these IRQs.
>
> receive
>

Thanks for your comments!

> How come '__write_msi_msg' does not result in new MSI IRQs?
>

Actually that was the hidden question in my RFC :-)

Let me describe what I see. When normal boot is performed we have the
following in xen_hvm_setup_msi_irqs():

__read_msi_msg()
 pirq -> 0

then we allocate new pirq with
 pirq = xen_allocate_pirq_msi()
 pirq -> 54

and we have the following mapping:
xen: msi --> pirq=54 --> irq=72

in 'xl debug-keys i':
(XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),

After kexec we see the following:
__read_msi_msg()
 pirq -> 54

but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
 pirq = xen_allocate_pirq_msi()
 pirq -> 55

and we have the following mapping:
xen: msi --> pirq=55 --> irq=75

However (afaict) mapping in xen wasn't updated:

in 'xl debug-keys i':
(XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),

> Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> from the device and updating the internal PIRQ<->IRQ code to match
> with the reality?
>

Yea, 'always trust the device'.

>> 
>> RFC: I wasn't able to understand why commit af42b8d1 which introduced
>> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
>> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
>> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
>> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
>
> We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> At least that is how the commit you mentioned worked.
>

I meant to say that in case we have pirq > 0 from __read_msi_msg() but
xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
xen_allocate_pirq_msi() which brings us new pirq.

> In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> that we might be called twice by a buggy driver. As such we want to check
> our PIRQ<->IRQ to figure this out.

But if we're called twice we'll see the same pirq, right? Or there are
some cases when we see 'crap' instead of pirq here?

I think it would be nice to use the same pirq after kexec instead of
allocating a new one even in case we can make remapping work.

Thanks for your comments again!

>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/pci/xen.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index 905956f..685e8f1 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>  		__read_msi_msg(msidesc, &msg);
>>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
>> -		    xen_irq_from_pirq(pirq) < 0) {
>> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>>  			if (pirq < 0) {
>>  				irq = -ENODEV;
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-15 15:21   ` Konrad Rzeszutek Wilk
  2014-07-16  9:01     ` Vitaly Kuznetsov
@ 2014-07-16  9:01     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16  9:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, stefano.stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
>> When kexec was peformed MSI IRQs for passthrough-ed devices were already
>> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
>> fails as we have no IRQ mapping information for that. Requesting for new
>> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
>> we don't recieve these IRQs.
>
> receive
>

Thanks for your comments!

> How come '__write_msi_msg' does not result in new MSI IRQs?
>

Actually that was the hidden question in my RFC :-)

Let me describe what I see. When normal boot is performed we have the
following in xen_hvm_setup_msi_irqs():

__read_msi_msg()
 pirq -> 0

then we allocate new pirq with
 pirq = xen_allocate_pirq_msi()
 pirq -> 54

and we have the following mapping:
xen: msi --> pirq=54 --> irq=72

in 'xl debug-keys i':
(XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),

After kexec we see the following:
__read_msi_msg()
 pirq -> 54

but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
 pirq = xen_allocate_pirq_msi()
 pirq -> 55

and we have the following mapping:
xen: msi --> pirq=55 --> irq=75

However (afaict) mapping in xen wasn't updated:

in 'xl debug-keys i':
(XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),

> Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> from the device and updating the internal PIRQ<->IRQ code to match
> with the reality?
>

Yea, 'always trust the device'.

>> 
>> RFC: I wasn't able to understand why commit af42b8d1 which introduced
>> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
>> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
>> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
>> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
>
> We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> At least that is how the commit you mentioned worked.
>

I meant to say that in case we have pirq > 0 from __read_msi_msg() but
xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
xen_allocate_pirq_msi() which brings us new pirq.

> In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> that we might be called twice by a buggy driver. As such we want to check
> our PIRQ<->IRQ to figure this out.

But if we're called twice we'll see the same pirq, right? Or there are
some cases when we see 'crap' instead of pirq here?

I think it would be nice to use the same pirq after kexec instead of
allocating a new one even in case we can make remapping work.

Thanks for your comments again!

>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/pci/xen.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index 905956f..685e8f1 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>>  		__read_msi_msg(msidesc, &msg);
>>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
>> -		    xen_irq_from_pirq(pirq) < 0) {
>> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>>  			if (pirq < 0) {
>>  				irq = -ENODEV;
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
  2014-07-16  9:37     ` Vitaly Kuznetsov
@ 2014-07-16  9:37     ` Vitaly Kuznetsov
  2014-07-16 13:45       ` Konrad Rzeszutek Wilk
  2014-07-16 13:45       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16  9:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
>> When kexec is being run PIRQs from Qemu-emulated devices are still
>> mapped to old event channels and new kernel has no information about
>> that. Trying to map them twice results in the following in Xen's dmesg:
>> 
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>  ...
>> 
>>  and the following in new kernel's dmesg:
>> 
>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>> 
>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>> when kexec was requested and on every kernel startup. We need to do this
>> twice to deal with the following issues:
>> - startup-time unmapping is required to make kdump work;
>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>   is being performed).
>
> How does this work when you boot the guest under Xen 4.4 where the FIFO events
> are used? Does it still work correctly?

Thanks for pointing that out! I've checked and it doesn't. However
patches make no difference - guest kernel gets stuck on boot with and
without them. Will try to investigate...

>
> Thanks.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/xen/smp.c               |  1 +
>>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
>>  include/xen/events.h             |  3 ++
>>  3 files changed, 80 insertions(+)
>> 
>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> index 35dcf39..e2b4deb 100644
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>>  #ifdef CONFIG_KEXEC
>>  	if (!kexec_in_progress)
>>  		return;
>> +	xen_unmap_all_pirqs();
>>  #endif
>>  }
>>  
>> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> index c919d3d..7701c7f 100644
>> --- a/drivers/xen/events/events_base.c
>> +++ b/drivers/xen/events/events_base.c
>> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>>  static bool fifo_events = true;
>>  module_param(fifo_events, bool, 0);
>>  
>> +void xen_unmap_all_pirqs(void)
>> +{
>> +	int pirq, rc, gsi, irq, evtchn;
>> +	struct physdev_unmap_pirq unmap_irq;
>> +	struct irq_info *info;
>> +	struct evtchn_close close;
>> +
>> +	mutex_lock(&irq_mapping_update_lock);
>> +
>> +	list_for_each_entry(info, &xen_irq_list_head, list) {
>> +		if (info->type != IRQT_PIRQ)
>> +			continue;
>> +
>> +		pirq = info->u.pirq.pirq;
>> +		gsi = info->u.pirq.gsi;
>> +		evtchn = info->evtchn;
>> +		irq = info->irq;
>> +
>> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
>> +			gsi, pirq, irq, evtchn);
>> +
>> +		if (evtchn > 0) {
>> +			close.port = evtchn;
>> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> +							&close) != 0)
>> +				pr_warn("close evtchn %d failed\n", evtchn);
>> +		}
>> +
>> +		unmap_irq.pirq = pirq;
>> +		unmap_irq.domid = DOMID_SELF;
>> +
>> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
>> +		if (rc)
>> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
>> +				gsi, pirq, irq, rc);
>> +	}
>> +
>> +	mutex_unlock(&irq_mapping_update_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
>
> Why the EXPORT? Is this used by modules?
>> +
>> +static void xen_startup_unmap_pirqs(void)
>> +{
>> +	struct evtchn_status status;
>> +	int port, rc = -ENOENT;
>> +	struct physdev_unmap_pirq unmap_irq;
>> +	struct evtchn_close close;
>> +
>> +	memset(&status, 0, sizeof(status));
>> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
>> +		status.dom = DOMID_SELF;
>> +		status.port = port;
>> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
>> +		if (rc < 0)
>> +			continue;
>> +		if (status.status == EVTCHNSTAT_pirq) {
>> +			close.port = port;
>> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> +							&close) != 0)
>> +				pr_warn("xen: failed to close evtchn %d\n",
>> +					port);
>> +			unmap_irq.pirq = status.u.pirq;
>> +			unmap_irq.domid = DOMID_SELF;
>> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
>> +				unmap_irq.pirq);
>> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
>> +						  &unmap_irq) != 0)
>> +				pr_warn("xen: failed to unmap pirq %d\n",
>> +					unmap_irq.pirq);
>> +		}
>> +	}
>> +}
>> +
>> +
>>  void __init xen_init_IRQ(void)
>>  {
>>  	int ret = -EINVAL;
>> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
>>  		xen_callback_vector();
>>  
>>  	if (xen_hvm_domain()) {
>> +		xen_startup_unmap_pirqs();
>> +
>>  		native_init_IRQ();
>>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>>  		 * __acpi_register_gsi can point at the right function */
>> diff --git a/include/xen/events.h b/include/xen/events.h
>> index 8bee7a7..3f9f428 100644
>> --- a/include/xen/events.h
>> +++ b/include/xen/events.h
>> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
>>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
>>  int xen_test_irq_shared(int irq);
>>  
>> +/* Unmap all PIRQs and close event channels */
>> +void xen_unmap_all_pirqs(void);
>> +
>>  /* initialize Xen IRQ subsystem */
>>  void xen_init_IRQ(void);
>>  #endif	/* _XEN_EVENTS_H */
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
@ 2014-07-16  9:37     ` Vitaly Kuznetsov
  2014-07-16  9:37     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16  9:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
>> When kexec is being run PIRQs from Qemu-emulated devices are still
>> mapped to old event channels and new kernel has no information about
>> that. Trying to map them twice results in the following in Xen's dmesg:
>> 
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>  ...
>> 
>>  and the following in new kernel's dmesg:
>> 
>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>> 
>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>> when kexec was requested and on every kernel startup. We need to do this
>> twice to deal with the following issues:
>> - startup-time unmapping is required to make kdump work;
>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>   is being performed).
>
> How does this work when you boot the guest under Xen 4.4 where the FIFO events
> are used? Does it still work correctly?

Thanks for pointing that out! I've checked and it doesn't. However
patches make no difference - guest kernel gets stuck on boot with and
without them. Will try to investigate...

>
> Thanks.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/xen/smp.c               |  1 +
>>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
>>  include/xen/events.h             |  3 ++
>>  3 files changed, 80 insertions(+)
>> 
>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> index 35dcf39..e2b4deb 100644
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>>  #ifdef CONFIG_KEXEC
>>  	if (!kexec_in_progress)
>>  		return;
>> +	xen_unmap_all_pirqs();
>>  #endif
>>  }
>>  
>> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> index c919d3d..7701c7f 100644
>> --- a/drivers/xen/events/events_base.c
>> +++ b/drivers/xen/events/events_base.c
>> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>>  static bool fifo_events = true;
>>  module_param(fifo_events, bool, 0);
>>  
>> +void xen_unmap_all_pirqs(void)
>> +{
>> +	int pirq, rc, gsi, irq, evtchn;
>> +	struct physdev_unmap_pirq unmap_irq;
>> +	struct irq_info *info;
>> +	struct evtchn_close close;
>> +
>> +	mutex_lock(&irq_mapping_update_lock);
>> +
>> +	list_for_each_entry(info, &xen_irq_list_head, list) {
>> +		if (info->type != IRQT_PIRQ)
>> +			continue;
>> +
>> +		pirq = info->u.pirq.pirq;
>> +		gsi = info->u.pirq.gsi;
>> +		evtchn = info->evtchn;
>> +		irq = info->irq;
>> +
>> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
>> +			gsi, pirq, irq, evtchn);
>> +
>> +		if (evtchn > 0) {
>> +			close.port = evtchn;
>> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> +							&close) != 0)
>> +				pr_warn("close evtchn %d failed\n", evtchn);
>> +		}
>> +
>> +		unmap_irq.pirq = pirq;
>> +		unmap_irq.domid = DOMID_SELF;
>> +
>> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
>> +		if (rc)
>> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
>> +				gsi, pirq, irq, rc);
>> +	}
>> +
>> +	mutex_unlock(&irq_mapping_update_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
>
> Why the EXPORT? Is this used by modules?
>> +
>> +static void xen_startup_unmap_pirqs(void)
>> +{
>> +	struct evtchn_status status;
>> +	int port, rc = -ENOENT;
>> +	struct physdev_unmap_pirq unmap_irq;
>> +	struct evtchn_close close;
>> +
>> +	memset(&status, 0, sizeof(status));
>> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
>> +		status.dom = DOMID_SELF;
>> +		status.port = port;
>> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
>> +		if (rc < 0)
>> +			continue;
>> +		if (status.status == EVTCHNSTAT_pirq) {
>> +			close.port = port;
>> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> +							&close) != 0)
>> +				pr_warn("xen: failed to close evtchn %d\n",
>> +					port);
>> +			unmap_irq.pirq = status.u.pirq;
>> +			unmap_irq.domid = DOMID_SELF;
>> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
>> +				unmap_irq.pirq);
>> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
>> +						  &unmap_irq) != 0)
>> +				pr_warn("xen: failed to unmap pirq %d\n",
>> +					unmap_irq.pirq);
>> +		}
>> +	}
>> +}
>> +
>> +
>>  void __init xen_init_IRQ(void)
>>  {
>>  	int ret = -EINVAL;
>> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
>>  		xen_callback_vector();
>>  
>>  	if (xen_hvm_domain()) {
>> +		xen_startup_unmap_pirqs();
>> +
>>  		native_init_IRQ();
>>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>>  		 * __acpi_register_gsi can point at the right function */
>> diff --git a/include/xen/events.h b/include/xen/events.h
>> index 8bee7a7..3f9f428 100644
>> --- a/include/xen/events.h
>> +++ b/include/xen/events.h
>> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
>>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
>>  int xen_test_irq_shared(int irq);
>>  
>> +/* Unmap all PIRQs and close event channels */
>> +void xen_unmap_all_pirqs(void);
>> +
>>  /* initialize Xen IRQ subsystem */
>>  void xen_init_IRQ(void);
>>  #endif	/* _XEN_EVENTS_H */
>> -- 
>> 1.9.3
>> 

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16  9:01     ` Vitaly Kuznetsov
@ 2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  2014-07-16 17:20         ` Vitaly Kuznetsov
                           ` (3 more replies)
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  1 sibling, 4 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 13:40 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: stefano.stabellini, xen-devel, Boris Ostrovsky, David Vrabel,
	Andrew Jones, linux-kernel

On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> >> fails as we have no IRQ mapping information for that. Requesting for new
> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> >> we don't recieve these IRQs.
> >
> > receive
> >
> 
> Thanks for your comments!

Thank you for quick turnaround with the answers!
> 
> > How come '__write_msi_msg' does not result in new MSI IRQs?
> >
> 
> Actually that was the hidden question in my RFC :-)
> 
> Let me describe what I see. When normal boot is performed we have the
> following in xen_hvm_setup_msi_irqs():
> 
> __read_msi_msg()
>  pirq -> 0
> 
> then we allocate new pirq with
>  pirq = xen_allocate_pirq_msi()
>  pirq -> 54
> 
> and we have the following mapping:
> xen: msi --> pirq=54 --> irq=72
> 
> in 'xl debug-keys i':
> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> 
> After kexec we see the following:
> __read_msi_msg()
>  pirq -> 54
> 
> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
>  pirq = xen_allocate_pirq_msi()
>  pirq -> 55
> 
> and we have the following mapping:
> xen: msi --> pirq=55 --> irq=75
> 
> However (afaict) mapping in xen wasn't updated:
> 
> in 'xl debug-keys i':
> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),

I am wondering if that is related to in QEMU traditional:

    qemu-xen-trad: free all the pirqs for msi/msix when driver unloads

(which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)

If you have that patch in, is the PIRQ value correctly updated?

> 
> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> > from the device and updating the internal PIRQ<->IRQ code to match
> > with the reality?
> >
> 
> Yea, 'always trust the device'.
> 
> >> 
> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
> >
> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> > At least that is how the commit you mentioned worked.
> >
> 
> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> xen_allocate_pirq_msi() which brings us new pirq.
> 
> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> > that we might be called twice by a buggy driver. As such we want to check
> > our PIRQ<->IRQ to figure this out.
> 
> But if we're called twice we'll see the same pirq, right? Or there are

Good point.
> some cases when we see 'crap' instead of pirq here?

For PCI passthrough devices they will be zero until they are enabled.
But I am not sure about the emulated devices, such as e1000 or such, which
would also go through this path (I think - do we have MSI devices that
we emulate in QEMU?)

> 
> I think it would be nice to use the same pirq after kexec instead of
> allocating a new one even in case we can make remapping work.

I concur.

Stefano, do you recall why you used xen_irq_from_pirq instead of just
trusting the 'pirq' value? Was it to workaround broken QEMU?

> 
> Thanks for your comments again!
> 
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/pci/xen.c | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >> 
> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> >> index 905956f..685e8f1 100644
> >> --- a/arch/x86/pci/xen.c
> >> +++ b/arch/x86/pci/xen.c
> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> >>  		__read_msi_msg(msidesc, &msg);
> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> >> -		    xen_irq_from_pirq(pirq) < 0) {
> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
> >>  			if (pirq < 0) {
> >>  				irq = -ENODEV;
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16  9:01     ` Vitaly Kuznetsov
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
@ 2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 13:40 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, stefano.stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> >> fails as we have no IRQ mapping information for that. Requesting for new
> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> >> we don't recieve these IRQs.
> >
> > receive
> >
> 
> Thanks for your comments!

Thank you for quick turnaround with the answers!
> 
> > How come '__write_msi_msg' does not result in new MSI IRQs?
> >
> 
> Actually that was the hidden question in my RFC :-)
> 
> Let me describe what I see. When normal boot is performed we have the
> following in xen_hvm_setup_msi_irqs():
> 
> __read_msi_msg()
>  pirq -> 0
> 
> then we allocate new pirq with
>  pirq = xen_allocate_pirq_msi()
>  pirq -> 54
> 
> and we have the following mapping:
> xen: msi --> pirq=54 --> irq=72
> 
> in 'xl debug-keys i':
> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> 
> After kexec we see the following:
> __read_msi_msg()
>  pirq -> 54
> 
> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
>  pirq = xen_allocate_pirq_msi()
>  pirq -> 55
> 
> and we have the following mapping:
> xen: msi --> pirq=55 --> irq=75
> 
> However (afaict) mapping in xen wasn't updated:
> 
> in 'xl debug-keys i':
> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),

I am wondering if that is related to in QEMU traditional:

    qemu-xen-trad: free all the pirqs for msi/msix when driver unloads

(which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)

If you have that patch in, is the PIRQ value correctly updated?

> 
> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> > from the device and updating the internal PIRQ<->IRQ code to match
> > with the reality?
> >
> 
> Yea, 'always trust the device'.
> 
> >> 
> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
> >
> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> > At least that is how the commit you mentioned worked.
> >
> 
> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> xen_allocate_pirq_msi() which brings us new pirq.
> 
> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> > that we might be called twice by a buggy driver. As such we want to check
> > our PIRQ<->IRQ to figure this out.
> 
> But if we're called twice we'll see the same pirq, right? Or there are

Good point.
> some cases when we see 'crap' instead of pirq here?

For PCI passthrough devices they will be zero until they are enabled.
But I am not sure about the emulated devices, such as e1000 or such, which
would also go through this path (I think - do we have MSI devices that
we emulate in QEMU?)

> 
> I think it would be nice to use the same pirq after kexec instead of
> allocating a new one even in case we can make remapping work.

I concur.

Stefano, do you recall why you used xen_irq_from_pirq instead of just
trusting the 'pirq' value? Was it to workaround broken QEMU?

> 
> Thanks for your comments again!
> 
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/pci/xen.c | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >> 
> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> >> index 905956f..685e8f1 100644
> >> --- a/arch/x86/pci/xen.c
> >> +++ b/arch/x86/pci/xen.c
> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> >>  		__read_msi_msg(msidesc, &msg);
> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> >> -		    xen_irq_from_pirq(pirq) < 0) {
> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
> >>  			if (pirq < 0) {
> >>  				irq = -ENODEV;
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-16  9:37     ` Vitaly Kuznetsov
@ 2014-07-16 13:45       ` Konrad Rzeszutek Wilk
  2014-07-16 16:34         ` Vitaly Kuznetsov
  2014-07-16 16:34         ` Vitaly Kuznetsov
  2014-07-16 13:45       ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 13:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Wed, Jul 16, 2014 at 11:37:10AM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
> >> When kexec is being run PIRQs from Qemu-emulated devices are still
> >> mapped to old event channels and new kernel has no information about
> >> that. Trying to map them twice results in the following in Xen's dmesg:
> >> 
> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
> >>  ...
> >> 
> >>  and the following in new kernel's dmesg:
> >> 
> >>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
> >> 
> >> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
> >> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
> >> when kexec was requested and on every kernel startup. We need to do this
> >> twice to deal with the following issues:
> >> - startup-time unmapping is required to make kdump work;
> >> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
> >> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
> >>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
> >>   is being performed).
> >
> > How does this work when you boot the guest under Xen 4.4 where the FIFO events
> > are used? Does it still work correctly?
> 
> Thanks for pointing that out! I've checked and it doesn't. However
> patches make no difference - guest kernel gets stuck on boot with and
> without them. Will try to investigate...

I think for FIFO events we can't do much right now - it would need some
new hypercalls to de-allocate or such.

But I was thinking that your code logic could just return out when
it detects that it is running with FIFO events (with a TODO comment)
- and also spit out some information to this effect?

Say: "Use xen.fifo=0 in your launching kernel"

(don't know the right name for the kernel in which you do 'kexec -e' in ?
Is that launching? Original? Bootstrap kernel?)

> 
> >
> > Thanks.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/xen/smp.c               |  1 +
> >>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
> >>  include/xen/events.h             |  3 ++
> >>  3 files changed, 80 insertions(+)
> >> 
> >> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >> index 35dcf39..e2b4deb 100644
> >> --- a/arch/x86/xen/smp.c
> >> +++ b/arch/x86/xen/smp.c
> >> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
> >>  #ifdef CONFIG_KEXEC
> >>  	if (!kexec_in_progress)
> >>  		return;
> >> +	xen_unmap_all_pirqs();
> >>  #endif
> >>  }
> >>  
> >> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> >> index c919d3d..7701c7f 100644
> >> --- a/drivers/xen/events/events_base.c
> >> +++ b/drivers/xen/events/events_base.c
> >> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
> >>  static bool fifo_events = true;
> >>  module_param(fifo_events, bool, 0);
> >>  
> >> +void xen_unmap_all_pirqs(void)
> >> +{
> >> +	int pirq, rc, gsi, irq, evtchn;
> >> +	struct physdev_unmap_pirq unmap_irq;
> >> +	struct irq_info *info;
> >> +	struct evtchn_close close;
> >> +
> >> +	mutex_lock(&irq_mapping_update_lock);
> >> +
> >> +	list_for_each_entry(info, &xen_irq_list_head, list) {
> >> +		if (info->type != IRQT_PIRQ)
> >> +			continue;
> >> +
> >> +		pirq = info->u.pirq.pirq;
> >> +		gsi = info->u.pirq.gsi;
> >> +		evtchn = info->evtchn;
> >> +		irq = info->irq;
> >> +
> >> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
> >> +			gsi, pirq, irq, evtchn);
> >> +
> >> +		if (evtchn > 0) {
> >> +			close.port = evtchn;
> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> >> +							&close) != 0)
> >> +				pr_warn("close evtchn %d failed\n", evtchn);
> >> +		}
> >> +
> >> +		unmap_irq.pirq = pirq;
> >> +		unmap_irq.domid = DOMID_SELF;
> >> +
> >> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
> >> +		if (rc)
> >> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
> >> +				gsi, pirq, irq, rc);
> >> +	}
> >> +
> >> +	mutex_unlock(&irq_mapping_update_lock);
> >> +}
> >> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
> >
> > Why the EXPORT? Is this used by modules?
> >> +
> >> +static void xen_startup_unmap_pirqs(void)
> >> +{
> >> +	struct evtchn_status status;
> >> +	int port, rc = -ENOENT;
> >> +	struct physdev_unmap_pirq unmap_irq;
> >> +	struct evtchn_close close;
> >> +
> >> +	memset(&status, 0, sizeof(status));
> >> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
> >> +		status.dom = DOMID_SELF;
> >> +		status.port = port;
> >> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
> >> +		if (rc < 0)
> >> +			continue;
> >> +		if (status.status == EVTCHNSTAT_pirq) {
> >> +			close.port = port;
> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> >> +							&close) != 0)
> >> +				pr_warn("xen: failed to close evtchn %d\n",
> >> +					port);
> >> +			unmap_irq.pirq = status.u.pirq;
> >> +			unmap_irq.domid = DOMID_SELF;
> >> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
> >> +				unmap_irq.pirq);
> >> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
> >> +						  &unmap_irq) != 0)
> >> +				pr_warn("xen: failed to unmap pirq %d\n",
> >> +					unmap_irq.pirq);
> >> +		}
> >> +	}
> >> +}
> >> +
> >> +
> >>  void __init xen_init_IRQ(void)
> >>  {
> >>  	int ret = -EINVAL;
> >> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
> >>  		xen_callback_vector();
> >>  
> >>  	if (xen_hvm_domain()) {
> >> +		xen_startup_unmap_pirqs();
> >> +
> >>  		native_init_IRQ();
> >>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
> >>  		 * __acpi_register_gsi can point at the right function */
> >> diff --git a/include/xen/events.h b/include/xen/events.h
> >> index 8bee7a7..3f9f428 100644
> >> --- a/include/xen/events.h
> >> +++ b/include/xen/events.h
> >> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
> >>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
> >>  int xen_test_irq_shared(int irq);
> >>  
> >> +/* Unmap all PIRQs and close event channels */
> >> +void xen_unmap_all_pirqs(void);
> >> +
> >>  /* initialize Xen IRQ subsystem */
> >>  void xen_init_IRQ(void);
> >>  #endif	/* _XEN_EVENTS_H */
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-16  9:37     ` Vitaly Kuznetsov
  2014-07-16 13:45       ` Konrad Rzeszutek Wilk
@ 2014-07-16 13:45       ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 13:45 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Wed, Jul 16, 2014 at 11:37:10AM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
> >> When kexec is being run PIRQs from Qemu-emulated devices are still
> >> mapped to old event channels and new kernel has no information about
> >> that. Trying to map them twice results in the following in Xen's dmesg:
> >> 
> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
> >>  ...
> >> 
> >>  and the following in new kernel's dmesg:
> >> 
> >>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
> >> 
> >> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
> >> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
> >> when kexec was requested and on every kernel startup. We need to do this
> >> twice to deal with the following issues:
> >> - startup-time unmapping is required to make kdump work;
> >> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
> >> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
> >>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
> >>   is being performed).
> >
> > How does this work when you boot the guest under Xen 4.4 where the FIFO events
> > are used? Does it still work correctly?
> 
> Thanks for pointing that out! I've checked and it doesn't. However
> patches make no difference - guest kernel gets stuck on boot with and
> without them. Will try to investigate...

I think for FIFO events we can't do much right now - it would need some
new hypercalls to de-allocate or such.

But I was thinking that your code logic could just return out when
it detects that it is running with FIFO events (with a TODO comment)
- and also spit out some information to this effect?

Say: "Use xen.fifo=0 in your launching kernel"

(don't know the right name for the kernel in which you do 'kexec -e' in ?
Is that launching? Original? Bootstrap kernel?)

> 
> >
> > Thanks.
> >> 
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  arch/x86/xen/smp.c               |  1 +
> >>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
> >>  include/xen/events.h             |  3 ++
> >>  3 files changed, 80 insertions(+)
> >> 
> >> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> >> index 35dcf39..e2b4deb 100644
> >> --- a/arch/x86/xen/smp.c
> >> +++ b/arch/x86/xen/smp.c
> >> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
> >>  #ifdef CONFIG_KEXEC
> >>  	if (!kexec_in_progress)
> >>  		return;
> >> +	xen_unmap_all_pirqs();
> >>  #endif
> >>  }
> >>  
> >> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> >> index c919d3d..7701c7f 100644
> >> --- a/drivers/xen/events/events_base.c
> >> +++ b/drivers/xen/events/events_base.c
> >> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
> >>  static bool fifo_events = true;
> >>  module_param(fifo_events, bool, 0);
> >>  
> >> +void xen_unmap_all_pirqs(void)
> >> +{
> >> +	int pirq, rc, gsi, irq, evtchn;
> >> +	struct physdev_unmap_pirq unmap_irq;
> >> +	struct irq_info *info;
> >> +	struct evtchn_close close;
> >> +
> >> +	mutex_lock(&irq_mapping_update_lock);
> >> +
> >> +	list_for_each_entry(info, &xen_irq_list_head, list) {
> >> +		if (info->type != IRQT_PIRQ)
> >> +			continue;
> >> +
> >> +		pirq = info->u.pirq.pirq;
> >> +		gsi = info->u.pirq.gsi;
> >> +		evtchn = info->evtchn;
> >> +		irq = info->irq;
> >> +
> >> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
> >> +			gsi, pirq, irq, evtchn);
> >> +
> >> +		if (evtchn > 0) {
> >> +			close.port = evtchn;
> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> >> +							&close) != 0)
> >> +				pr_warn("close evtchn %d failed\n", evtchn);
> >> +		}
> >> +
> >> +		unmap_irq.pirq = pirq;
> >> +		unmap_irq.domid = DOMID_SELF;
> >> +
> >> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
> >> +		if (rc)
> >> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
> >> +				gsi, pirq, irq, rc);
> >> +	}
> >> +
> >> +	mutex_unlock(&irq_mapping_update_lock);
> >> +}
> >> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
> >
> > Why the EXPORT? Is this used by modules?
> >> +
> >> +static void xen_startup_unmap_pirqs(void)
> >> +{
> >> +	struct evtchn_status status;
> >> +	int port, rc = -ENOENT;
> >> +	struct physdev_unmap_pirq unmap_irq;
> >> +	struct evtchn_close close;
> >> +
> >> +	memset(&status, 0, sizeof(status));
> >> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
> >> +		status.dom = DOMID_SELF;
> >> +		status.port = port;
> >> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
> >> +		if (rc < 0)
> >> +			continue;
> >> +		if (status.status == EVTCHNSTAT_pirq) {
> >> +			close.port = port;
> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
> >> +							&close) != 0)
> >> +				pr_warn("xen: failed to close evtchn %d\n",
> >> +					port);
> >> +			unmap_irq.pirq = status.u.pirq;
> >> +			unmap_irq.domid = DOMID_SELF;
> >> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
> >> +				unmap_irq.pirq);
> >> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
> >> +						  &unmap_irq) != 0)
> >> +				pr_warn("xen: failed to unmap pirq %d\n",
> >> +					unmap_irq.pirq);
> >> +		}
> >> +	}
> >> +}
> >> +
> >> +
> >>  void __init xen_init_IRQ(void)
> >>  {
> >>  	int ret = -EINVAL;
> >> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
> >>  		xen_callback_vector();
> >>  
> >>  	if (xen_hvm_domain()) {
> >> +		xen_startup_unmap_pirqs();
> >> +
> >>  		native_init_IRQ();
> >>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
> >>  		 * __acpi_register_gsi can point at the right function */
> >> diff --git a/include/xen/events.h b/include/xen/events.h
> >> index 8bee7a7..3f9f428 100644
> >> --- a/include/xen/events.h
> >> +++ b/include/xen/events.h
> >> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
> >>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
> >>  int xen_test_irq_shared(int irq);
> >>  
> >> +/* Unmap all PIRQs and close event channels */
> >> +void xen_unmap_all_pirqs(void);
> >> +
> >>  /* initialize Xen IRQ subsystem */
> >>  void xen_init_IRQ(void);
> >>  #endif	/* _XEN_EVENTS_H */
> >> -- 
> >> 1.9.3
> >> 
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-16 13:45       ` Konrad Rzeszutek Wilk
@ 2014-07-16 16:34         ` Vitaly Kuznetsov
  2014-07-16 16:34         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16 16:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Jul 16, 2014 at 11:37:10AM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
>> >> When kexec is being run PIRQs from Qemu-emulated devices are still
>> >> mapped to old event channels and new kernel has no information about
>> >> that. Trying to map them twice results in the following in Xen's dmesg:
>> >> 
>> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>> >>  ...
>> >> 
>> >>  and the following in new kernel's dmesg:
>> >> 
>> >>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>> >> 
>> >> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>> >> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>> >> when kexec was requested and on every kernel startup. We need to do this
>> >> twice to deal with the following issues:
>> >> - startup-time unmapping is required to make kdump work;
>> >> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>> >> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>> >>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>> >>   is being performed).
>> >
>> > How does this work when you boot the guest under Xen 4.4 where the FIFO events
>> > are used? Does it still work correctly?
>> 
>> Thanks for pointing that out! I've checked and it doesn't. However
>> patches make no difference - guest kernel gets stuck on boot with and
>> without them. Will try to investigate...
>
> I think for FIFO events we can't do much right now - it would need some
> new hypercalls to de-allocate or such.
>

Yeah, you're probably right. I tried wrapping evtchn_fifo_destroy() into
'EVTCHNOP_fifo_destroy' hypercall but it seems some other actions are
required as well..

> But I was thinking that your code logic could just return out when
> it detects that it is running with FIFO events (with a TODO comment)
> - and also spit out some information to this effect?

Sure, having TODO here is a good idea.

>
> Say: "Use xen.fifo=0 in your launching kernel"

s,xen.fifo,xen.fifo_events,

>
> (don't know the right name for the kernel in which you do 'kexec -e' in ?
> Is that launching? Original? Bootstrap kernel?)

Yes, if under Xen-4.4 I boot original kernel with "xen.fifo_events=0"
I'm able to do kexec with "xen.fifo_events=0" and even without it (but
only once :-). Once kernel is booted with FIFO-based event channels
enabled no kexec is possible, new kernel gets stuck (I guess vcpuop
timer doesn't work..). My patch series brings no difference here..

Thanks,

>
>> 
>> >
>> > Thanks.
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> ---
>> >>  arch/x86/xen/smp.c               |  1 +
>> >>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
>> >>  include/xen/events.h             |  3 ++
>> >>  3 files changed, 80 insertions(+)
>> >> 
>> >> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> >> index 35dcf39..e2b4deb 100644
>> >> --- a/arch/x86/xen/smp.c
>> >> +++ b/arch/x86/xen/smp.c
>> >> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>> >>  #ifdef CONFIG_KEXEC
>> >>  	if (!kexec_in_progress)
>> >>  		return;
>> >> +	xen_unmap_all_pirqs();
>> >>  #endif
>> >>  }
>> >>  
>> >> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> >> index c919d3d..7701c7f 100644
>> >> --- a/drivers/xen/events/events_base.c
>> >> +++ b/drivers/xen/events/events_base.c
>> >> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>> >>  static bool fifo_events = true;
>> >>  module_param(fifo_events, bool, 0);
>> >>  
>> >> +void xen_unmap_all_pirqs(void)
>> >> +{
>> >> +	int pirq, rc, gsi, irq, evtchn;
>> >> +	struct physdev_unmap_pirq unmap_irq;
>> >> +	struct irq_info *info;
>> >> +	struct evtchn_close close;
>> >> +
>> >> +	mutex_lock(&irq_mapping_update_lock);
>> >> +
>> >> +	list_for_each_entry(info, &xen_irq_list_head, list) {
>> >> +		if (info->type != IRQT_PIRQ)
>> >> +			continue;
>> >> +
>> >> +		pirq = info->u.pirq.pirq;
>> >> +		gsi = info->u.pirq.gsi;
>> >> +		evtchn = info->evtchn;
>> >> +		irq = info->irq;
>> >> +
>> >> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
>> >> +			gsi, pirq, irq, evtchn);
>> >> +
>> >> +		if (evtchn > 0) {
>> >> +			close.port = evtchn;
>> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> >> +							&close) != 0)
>> >> +				pr_warn("close evtchn %d failed\n", evtchn);
>> >> +		}
>> >> +
>> >> +		unmap_irq.pirq = pirq;
>> >> +		unmap_irq.domid = DOMID_SELF;
>> >> +
>> >> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
>> >> +		if (rc)
>> >> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
>> >> +				gsi, pirq, irq, rc);
>> >> +	}
>> >> +
>> >> +	mutex_unlock(&irq_mapping_update_lock);
>> >> +}
>> >> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
>> >
>> > Why the EXPORT? Is this used by modules?
>> >> +
>> >> +static void xen_startup_unmap_pirqs(void)
>> >> +{
>> >> +	struct evtchn_status status;
>> >> +	int port, rc = -ENOENT;
>> >> +	struct physdev_unmap_pirq unmap_irq;
>> >> +	struct evtchn_close close;
>> >> +
>> >> +	memset(&status, 0, sizeof(status));
>> >> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
>> >> +		status.dom = DOMID_SELF;
>> >> +		status.port = port;
>> >> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
>> >> +		if (rc < 0)
>> >> +			continue;
>> >> +		if (status.status == EVTCHNSTAT_pirq) {
>> >> +			close.port = port;
>> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> >> +							&close) != 0)
>> >> +				pr_warn("xen: failed to close evtchn %d\n",
>> >> +					port);
>> >> +			unmap_irq.pirq = status.u.pirq;
>> >> +			unmap_irq.domid = DOMID_SELF;
>> >> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
>> >> +				unmap_irq.pirq);
>> >> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
>> >> +						  &unmap_irq) != 0)
>> >> +				pr_warn("xen: failed to unmap pirq %d\n",
>> >> +					unmap_irq.pirq);
>> >> +		}
>> >> +	}
>> >> +}
>> >> +
>> >> +
>> >>  void __init xen_init_IRQ(void)
>> >>  {
>> >>  	int ret = -EINVAL;
>> >> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
>> >>  		xen_callback_vector();
>> >>  
>> >>  	if (xen_hvm_domain()) {
>> >> +		xen_startup_unmap_pirqs();
>> >> +
>> >>  		native_init_IRQ();
>> >>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>> >>  		 * __acpi_register_gsi can point at the right function */
>> >> diff --git a/include/xen/events.h b/include/xen/events.h
>> >> index 8bee7a7..3f9f428 100644
>> >> --- a/include/xen/events.h
>> >> +++ b/include/xen/events.h
>> >> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
>> >>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
>> >>  int xen_test_irq_shared(int irq);
>> >>  
>> >> +/* Unmap all PIRQs and close event channels */
>> >> +void xen_unmap_all_pirqs(void);
>> >> +
>> >>  /* initialize Xen IRQ subsystem */
>> >>  void xen_init_IRQ(void);
>> >>  #endif	/* _XEN_EVENTS_H */
>> >> -- 
>> >> 1.9.3
>> >> 
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-16 13:45       ` Konrad Rzeszutek Wilk
  2014-07-16 16:34         ` Vitaly Kuznetsov
@ 2014-07-16 16:34         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16 16:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Jul 16, 2014 at 11:37:10AM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 03:40:39PM +0200, Vitaly Kuznetsov wrote:
>> >> When kexec is being run PIRQs from Qemu-emulated devices are still
>> >> mapped to old event channels and new kernel has no information about
>> >> that. Trying to map them twice results in the following in Xen's dmesg:
>> >> 
>> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>> >>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>> >>  ...
>> >> 
>> >>  and the following in new kernel's dmesg:
>> >> 
>> >>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>> >> 
>> >> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>> >> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>> >> when kexec was requested and on every kernel startup. We need to do this
>> >> twice to deal with the following issues:
>> >> - startup-time unmapping is required to make kdump work;
>> >> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>> >> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>> >>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>> >>   is being performed).
>> >
>> > How does this work when you boot the guest under Xen 4.4 where the FIFO events
>> > are used? Does it still work correctly?
>> 
>> Thanks for pointing that out! I've checked and it doesn't. However
>> patches make no difference - guest kernel gets stuck on boot with and
>> without them. Will try to investigate...
>
> I think for FIFO events we can't do much right now - it would need some
> new hypercalls to de-allocate or such.
>

Yeah, you're probably right. I tried wrapping evtchn_fifo_destroy() into
'EVTCHNOP_fifo_destroy' hypercall but it seems some other actions are
required as well..

> But I was thinking that your code logic could just return out when
> it detects that it is running with FIFO events (with a TODO comment)
> - and also spit out some information to this effect?

Sure, having TODO here is a good idea.

>
> Say: "Use xen.fifo=0 in your launching kernel"

s,xen.fifo,xen.fifo_events,

>
> (don't know the right name for the kernel in which you do 'kexec -e' in ?
> Is that launching? Original? Bootstrap kernel?)

Yes, if under Xen-4.4 I boot original kernel with "xen.fifo_events=0"
I'm able to do kexec with "xen.fifo_events=0" and even without it (but
only once :-). Once kernel is booted with FIFO-based event channels
enabled no kexec is possible, new kernel gets stuck (I guess vcpuop
timer doesn't work..). My patch series brings no difference here..

Thanks,

>
>> 
>> >
>> > Thanks.
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> ---
>> >>  arch/x86/xen/smp.c               |  1 +
>> >>  drivers/xen/events/events_base.c | 76 ++++++++++++++++++++++++++++++++++++++++
>> >>  include/xen/events.h             |  3 ++
>> >>  3 files changed, 80 insertions(+)
>> >> 
>> >> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> >> index 35dcf39..e2b4deb 100644
>> >> --- a/arch/x86/xen/smp.c
>> >> +++ b/arch/x86/xen/smp.c
>> >> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>> >>  #ifdef CONFIG_KEXEC
>> >>  	if (!kexec_in_progress)
>> >>  		return;
>> >> +	xen_unmap_all_pirqs();
>> >>  #endif
>> >>  }
>> >>  
>> >> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> >> index c919d3d..7701c7f 100644
>> >> --- a/drivers/xen/events/events_base.c
>> >> +++ b/drivers/xen/events/events_base.c
>> >> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>> >>  static bool fifo_events = true;
>> >>  module_param(fifo_events, bool, 0);
>> >>  
>> >> +void xen_unmap_all_pirqs(void)
>> >> +{
>> >> +	int pirq, rc, gsi, irq, evtchn;
>> >> +	struct physdev_unmap_pirq unmap_irq;
>> >> +	struct irq_info *info;
>> >> +	struct evtchn_close close;
>> >> +
>> >> +	mutex_lock(&irq_mapping_update_lock);
>> >> +
>> >> +	list_for_each_entry(info, &xen_irq_list_head, list) {
>> >> +		if (info->type != IRQT_PIRQ)
>> >> +			continue;
>> >> +
>> >> +		pirq = info->u.pirq.pirq;
>> >> +		gsi = info->u.pirq.gsi;
>> >> +		evtchn = info->evtchn;
>> >> +		irq = info->irq;
>> >> +
>> >> +		pr_debug("unmapping pirq gsi=%d pirq=%d irq=%d evtchn=%d\n",
>> >> +			gsi, pirq, irq, evtchn);
>> >> +
>> >> +		if (evtchn > 0) {
>> >> +			close.port = evtchn;
>> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> >> +							&close) != 0)
>> >> +				pr_warn("close evtchn %d failed\n", evtchn);
>> >> +		}
>> >> +
>> >> +		unmap_irq.pirq = pirq;
>> >> +		unmap_irq.domid = DOMID_SELF;
>> >> +
>> >> +		rc = HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq);
>> >> +		if (rc)
>> >> +			pr_warn("unmap pirq failed gsi=%d pirq=%d irq=%d rc=%d\n",
>> >> +				gsi, pirq, irq, rc);
>> >> +	}
>> >> +
>> >> +	mutex_unlock(&irq_mapping_update_lock);
>> >> +}
>> >> +EXPORT_SYMBOL_GPL(xen_unmap_all_pirqs);
>> >
>> > Why the EXPORT? Is this used by modules?
>> >> +
>> >> +static void xen_startup_unmap_pirqs(void)
>> >> +{
>> >> +	struct evtchn_status status;
>> >> +	int port, rc = -ENOENT;
>> >> +	struct physdev_unmap_pirq unmap_irq;
>> >> +	struct evtchn_close close;
>> >> +
>> >> +	memset(&status, 0, sizeof(status));
>> >> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
>> >> +		status.dom = DOMID_SELF;
>> >> +		status.port = port;
>> >> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
>> >> +		if (rc < 0)
>> >> +			continue;
>> >> +		if (status.status == EVTCHNSTAT_pirq) {
>> >> +			close.port = port;
>> >> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close,
>> >> +							&close) != 0)
>> >> +				pr_warn("xen: failed to close evtchn %d\n",
>> >> +					port);
>> >> +			unmap_irq.pirq = status.u.pirq;
>> >> +			unmap_irq.domid = DOMID_SELF;
>> >> +			pr_warn("xen: unmapping previously mapped pirq %d\n",
>> >> +				unmap_irq.pirq);
>> >> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq,
>> >> +						  &unmap_irq) != 0)
>> >> +				pr_warn("xen: failed to unmap pirq %d\n",
>> >> +					unmap_irq.pirq);
>> >> +		}
>> >> +	}
>> >> +}
>> >> +
>> >> +
>> >>  void __init xen_init_IRQ(void)
>> >>  {
>> >>  	int ret = -EINVAL;
>> >> @@ -1671,6 +1745,8 @@ void __init xen_init_IRQ(void)
>> >>  		xen_callback_vector();
>> >>  
>> >>  	if (xen_hvm_domain()) {
>> >> +		xen_startup_unmap_pirqs();
>> >> +
>> >>  		native_init_IRQ();
>> >>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>> >>  		 * __acpi_register_gsi can point at the right function */
>> >> diff --git a/include/xen/events.h b/include/xen/events.h
>> >> index 8bee7a7..3f9f428 100644
>> >> --- a/include/xen/events.h
>> >> +++ b/include/xen/events.h
>> >> @@ -122,6 +122,9 @@ int xen_irq_from_gsi(unsigned gsi);
>> >>  /* Determine whether to ignore this IRQ if it is passed to a guest. */
>> >>  int xen_test_irq_shared(int irq);
>> >>  
>> >> +/* Unmap all PIRQs and close event channels */
>> >> +void xen_unmap_all_pirqs(void);
>> >> +
>> >>  /* initialize Xen IRQ subsystem */
>> >>  void xen_init_IRQ(void);
>> >>  #endif	/* _XEN_EVENTS_H */
>> >> -- 
>> >> 1.9.3
>> >> 
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  2014-07-16 17:20         ` Vitaly Kuznetsov
@ 2014-07-16 17:20         ` Vitaly Kuznetsov
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
                             ` (3 more replies)
  2014-07-21 14:13         ` Stefano Stabellini
  2014-07-21 14:13         ` Stefano Stabellini
  3 siblings, 4 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16 17:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, xen-devel, Boris Ostrovsky, David Vrabel,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
>> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
>> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
>> >> fails as we have no IRQ mapping information for that. Requesting for new
>> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
>> >> we don't recieve these IRQs.
>> >
>> > receive
>> >
>> 
>> Thanks for your comments!
>
> Thank you for quick turnaround with the answers!
>> 
>> > How come '__write_msi_msg' does not result in new MSI IRQs?
>> >
>> 
>> Actually that was the hidden question in my RFC :-)
>> 
>> Let me describe what I see. When normal boot is performed we have the
>> following in xen_hvm_setup_msi_irqs():
>> 
>> __read_msi_msg()
>>  pirq -> 0
>> 
>> then we allocate new pirq with
>>  pirq = xen_allocate_pirq_msi()
>>  pirq -> 54
>> 
>> and we have the following mapping:
>> xen: msi --> pirq=54 --> irq=72
>> 
>> in 'xl debug-keys i':
>> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
>> 
>> After kexec we see the following:
>> __read_msi_msg()
>>  pirq -> 54
>> 
>> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
>>  pirq = xen_allocate_pirq_msi()
>>  pirq -> 55
>> 
>> and we have the following mapping:
>> xen: msi --> pirq=55 --> irq=75
>> 
>> However (afaict) mapping in xen wasn't updated:
>> 
>> in 'xl debug-keys i':
>> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
>
> I am wondering if that is related to in QEMU traditional:
>
>     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
>
> (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
>
> If you have that patch in, is the PIRQ value correctly updated?
>

Thanks, that really works! I tested both kexec -e / kdump cases. I'm
wondering if we although need my commit to workaround non-fixed qemus?

>> 
>> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
>> > from the device and updating the internal PIRQ<->IRQ code to match
>> > with the reality?
>> >
>> 
>> Yea, 'always trust the device'.
>> 
>> >> 
>> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
>> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
>> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
>> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
>> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
>> >
>> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
>> > At least that is how the commit you mentioned worked.
>> >
>> 
>> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
>> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
>> xen_allocate_pirq_msi() which brings us new pirq.
>> 
>> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
>> > that we might be called twice by a buggy driver. As such we want to check
>> > our PIRQ<->IRQ to figure this out.
>> 
>> But if we're called twice we'll see the same pirq, right? Or there are
>
> Good point.
>> some cases when we see 'crap' instead of pirq here?
>
> For PCI passthrough devices they will be zero until they are enabled.
> But I am not sure about the emulated devices, such as e1000 or such, which
> would also go through this path (I think - do we have MSI devices that
> we emulate in QEMU?)

AFAICT emulated e1000 doesn't use MSI (at least with qemu-tradidtional)
and with my patch series it works after kexec.

>
>> 
>> I think it would be nice to use the same pirq after kexec instead of
>> allocating a new one even in case we can make remapping work.
>
> I concur.
>
> Stefano, do you recall why you used xen_irq_from_pirq instead of just
> trusting the 'pirq' value? Was it to workaround broken QEMU?
>
>> 
>> Thanks for your comments again!
>> 
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> ---
>> >>  arch/x86/pci/xen.c | 3 +--
>> >>  1 file changed, 1 insertion(+), 2 deletions(-)
>> >> 
>> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> >> index 905956f..685e8f1 100644
>> >> --- a/arch/x86/pci/xen.c
>> >> +++ b/arch/x86/pci/xen.c
>> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>> >>  		__read_msi_msg(msidesc, &msg);
>> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
>> >> -		    xen_irq_from_pirq(pirq) < 0) {
>> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>> >>  			if (pirq < 0) {
>> >>  				irq = -ENODEV;
>> >> -- 
>> >> 1.9.3
>> >> 
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
@ 2014-07-16 17:20         ` Vitaly Kuznetsov
  2014-07-16 17:20         ` Vitaly Kuznetsov
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-16 17:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, stefano.stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
>> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
>> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
>> >> fails as we have no IRQ mapping information for that. Requesting for new
>> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
>> >> we don't recieve these IRQs.
>> >
>> > receive
>> >
>> 
>> Thanks for your comments!
>
> Thank you for quick turnaround with the answers!
>> 
>> > How come '__write_msi_msg' does not result in new MSI IRQs?
>> >
>> 
>> Actually that was the hidden question in my RFC :-)
>> 
>> Let me describe what I see. When normal boot is performed we have the
>> following in xen_hvm_setup_msi_irqs():
>> 
>> __read_msi_msg()
>>  pirq -> 0
>> 
>> then we allocate new pirq with
>>  pirq = xen_allocate_pirq_msi()
>>  pirq -> 54
>> 
>> and we have the following mapping:
>> xen: msi --> pirq=54 --> irq=72
>> 
>> in 'xl debug-keys i':
>> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
>> 
>> After kexec we see the following:
>> __read_msi_msg()
>>  pirq -> 54
>> 
>> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
>>  pirq = xen_allocate_pirq_msi()
>>  pirq -> 55
>> 
>> and we have the following mapping:
>> xen: msi --> pirq=55 --> irq=75
>> 
>> However (afaict) mapping in xen wasn't updated:
>> 
>> in 'xl debug-keys i':
>> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
>
> I am wondering if that is related to in QEMU traditional:
>
>     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
>
> (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
>
> If you have that patch in, is the PIRQ value correctly updated?
>

Thanks, that really works! I tested both kexec -e / kdump cases. I'm
wondering if we although need my commit to workaround non-fixed qemus?

>> 
>> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
>> > from the device and updating the internal PIRQ<->IRQ code to match
>> > with the reality?
>> >
>> 
>> Yea, 'always trust the device'.
>> 
>> >> 
>> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
>> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
>> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
>> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
>> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
>> >
>> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
>> > At least that is how the commit you mentioned worked.
>> >
>> 
>> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
>> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
>> xen_allocate_pirq_msi() which brings us new pirq.
>> 
>> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
>> > that we might be called twice by a buggy driver. As such we want to check
>> > our PIRQ<->IRQ to figure this out.
>> 
>> But if we're called twice we'll see the same pirq, right? Or there are
>
> Good point.
>> some cases when we see 'crap' instead of pirq here?
>
> For PCI passthrough devices they will be zero until they are enabled.
> But I am not sure about the emulated devices, such as e1000 or such, which
> would also go through this path (I think - do we have MSI devices that
> we emulate in QEMU?)

AFAICT emulated e1000 doesn't use MSI (at least with qemu-tradidtional)
and with my patch series it works after kexec.

>
>> 
>> I think it would be nice to use the same pirq after kexec instead of
>> allocating a new one even in case we can make remapping work.
>
> I concur.
>
> Stefano, do you recall why you used xen_irq_from_pirq instead of just
> trusting the 'pirq' value? Was it to workaround broken QEMU?
>
>> 
>> Thanks for your comments again!
>> 
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> ---
>> >>  arch/x86/pci/xen.c | 3 +--
>> >>  1 file changed, 1 insertion(+), 2 deletions(-)
>> >> 
>> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> >> index 905956f..685e8f1 100644
>> >> --- a/arch/x86/pci/xen.c
>> >> +++ b/arch/x86/pci/xen.c
>> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>> >>  		__read_msi_msg(msidesc, &msg);
>> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
>> >> -		    xen_irq_from_pirq(pirq) < 0) {
>> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>> >>  			if (pirq < 0) {
>> >>  				irq = -ENODEV;
>> >> -- 
>> >> 1.9.3
>> >> 
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 17:20         ` Vitaly Kuznetsov
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
@ 2014-07-16 17:30           ` Konrad Rzeszutek Wilk
  2014-07-17  8:12             ` Vitaly Kuznetsov
  2014-07-17  8:12             ` Vitaly Kuznetsov
  2014-07-28 13:47           ` David Vrabel
  2014-07-28 13:47           ` David Vrabel
  3 siblings, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 17:30 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: stefano.stabellini, xen-devel, Boris Ostrovsky, David Vrabel,
	Andrew Jones, linux-kernel

On Wed, Jul 16, 2014 at 07:20:39PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> >> 
> >> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> >> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> >> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> >> >> fails as we have no IRQ mapping information for that. Requesting for new
> >> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> >> >> we don't recieve these IRQs.
> >> >
> >> > receive
> >> >
> >> 
> >> Thanks for your comments!
> >
> > Thank you for quick turnaround with the answers!
> >> 
> >> > How come '__write_msi_msg' does not result in new MSI IRQs?
> >> >
> >> 
> >> Actually that was the hidden question in my RFC :-)
> >> 
> >> Let me describe what I see. When normal boot is performed we have the
> >> following in xen_hvm_setup_msi_irqs():
> >> 
> >> __read_msi_msg()
> >>  pirq -> 0
> >> 
> >> then we allocate new pirq with
> >>  pirq = xen_allocate_pirq_msi()
> >>  pirq -> 54
> >> 
> >> and we have the following mapping:
> >> xen: msi --> pirq=54 --> irq=72
> >> 
> >> in 'xl debug-keys i':
> >> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> >> 
> >> After kexec we see the following:
> >> __read_msi_msg()
> >>  pirq -> 54
> >> 
> >> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
> >>  pirq = xen_allocate_pirq_msi()
> >>  pirq -> 55
> >> 
> >> and we have the following mapping:
> >> xen: msi --> pirq=55 --> irq=75
> >> 
> >> However (afaict) mapping in xen wasn't updated:
> >> 
> >> in 'xl debug-keys i':
> >> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
> >
> > I am wondering if that is related to in QEMU traditional:
> >
> >     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
> >
> > (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
> >
> > If you have that patch in, is the PIRQ value correctly updated?
> >
> 
> Thanks, that really works! I tested both kexec -e / kdump cases. I'm
> wondering if we although need my commit to workaround non-fixed qemus?

Without your patch on older QEMU's with PCI passthrough we won't get
any more interrupts after we kexec in the guest right?

As in, this issue happens _only_ with PCI passthrough devices that use
MSI or MSI-X?


Still need to get Stefano's view on this.

> 
> >> 
> >> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> >> > from the device and updating the internal PIRQ<->IRQ code to match
> >> > with the reality?
> >> >
> >> 
> >> Yea, 'always trust the device'.
> >> 
> >> >> 
> >> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> >> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> >> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> >> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> >> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
> >> >
> >> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> >> > At least that is how the commit you mentioned worked.
> >> >
> >> 
> >> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> >> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> >> xen_allocate_pirq_msi() which brings us new pirq.
> >> 
> >> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> >> > that we might be called twice by a buggy driver. As such we want to check
> >> > our PIRQ<->IRQ to figure this out.
> >> 
> >> But if we're called twice we'll see the same pirq, right? Or there are
> >
> > Good point.
> >> some cases when we see 'crap' instead of pirq here?
> >
> > For PCI passthrough devices they will be zero until they are enabled.
> > But I am not sure about the emulated devices, such as e1000 or such, which
> > would also go through this path (I think - do we have MSI devices that
> > we emulate in QEMU?)
> 
> AFAICT emulated e1000 doesn't use MSI (at least with qemu-tradidtional)
> and with my patch series it works after kexec.
> 
> >
> >> 
> >> I think it would be nice to use the same pirq after kexec instead of
> >> allocating a new one even in case we can make remapping work.
> >
> > I concur.
> >
> > Stefano, do you recall why you used xen_irq_from_pirq instead of just
> > trusting the 'pirq' value? Was it to workaround broken QEMU?
> >
> >> 
> >> Thanks for your comments again!
> >> 
> >> >> 
> >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> >> ---
> >> >>  arch/x86/pci/xen.c | 3 +--
> >> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >> >> 
> >> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> >> >> index 905956f..685e8f1 100644
> >> >> --- a/arch/x86/pci/xen.c
> >> >> +++ b/arch/x86/pci/xen.c
> >> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> >> >>  		__read_msi_msg(msidesc, &msg);
> >> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
> >> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> >> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> >> >> -		    xen_irq_from_pirq(pirq) < 0) {
> >> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
> >> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
> >> >>  			if (pirq < 0) {
> >> >>  				irq = -ENODEV;
> >> >> -- 
> >> >> 1.9.3
> >> >> 
> >> 
> >> -- 
> >>   Vitaly
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 17:20         ` Vitaly Kuznetsov
@ 2014-07-16 17:30           ` Konrad Rzeszutek Wilk
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-16 17:30 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, stefano.stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Wed, Jul 16, 2014 at 07:20:39PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> >> 
> >> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> >> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> >> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> >> >> fails as we have no IRQ mapping information for that. Requesting for new
> >> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> >> >> we don't recieve these IRQs.
> >> >
> >> > receive
> >> >
> >> 
> >> Thanks for your comments!
> >
> > Thank you for quick turnaround with the answers!
> >> 
> >> > How come '__write_msi_msg' does not result in new MSI IRQs?
> >> >
> >> 
> >> Actually that was the hidden question in my RFC :-)
> >> 
> >> Let me describe what I see. When normal boot is performed we have the
> >> following in xen_hvm_setup_msi_irqs():
> >> 
> >> __read_msi_msg()
> >>  pirq -> 0
> >> 
> >> then we allocate new pirq with
> >>  pirq = xen_allocate_pirq_msi()
> >>  pirq -> 54
> >> 
> >> and we have the following mapping:
> >> xen: msi --> pirq=54 --> irq=72
> >> 
> >> in 'xl debug-keys i':
> >> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> >> 
> >> After kexec we see the following:
> >> __read_msi_msg()
> >>  pirq -> 54
> >> 
> >> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
> >>  pirq = xen_allocate_pirq_msi()
> >>  pirq -> 55
> >> 
> >> and we have the following mapping:
> >> xen: msi --> pirq=55 --> irq=75
> >> 
> >> However (afaict) mapping in xen wasn't updated:
> >> 
> >> in 'xl debug-keys i':
> >> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
> >
> > I am wondering if that is related to in QEMU traditional:
> >
> >     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
> >
> > (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
> >
> > If you have that patch in, is the PIRQ value correctly updated?
> >
> 
> Thanks, that really works! I tested both kexec -e / kdump cases. I'm
> wondering if we although need my commit to workaround non-fixed qemus?

Without your patch on older QEMU's with PCI passthrough we won't get
any more interrupts after we kexec in the guest right?

As in, this issue happens _only_ with PCI passthrough devices that use
MSI or MSI-X?


Still need to get Stefano's view on this.

> 
> >> 
> >> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> >> > from the device and updating the internal PIRQ<->IRQ code to match
> >> > with the reality?
> >> >
> >> 
> >> Yea, 'always trust the device'.
> >> 
> >> >> 
> >> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> >> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> >> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> >> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> >> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
> >> >
> >> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> >> > At least that is how the commit you mentioned worked.
> >> >
> >> 
> >> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> >> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> >> xen_allocate_pirq_msi() which brings us new pirq.
> >> 
> >> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> >> > that we might be called twice by a buggy driver. As such we want to check
> >> > our PIRQ<->IRQ to figure this out.
> >> 
> >> But if we're called twice we'll see the same pirq, right? Or there are
> >
> > Good point.
> >> some cases when we see 'crap' instead of pirq here?
> >
> > For PCI passthrough devices they will be zero until they are enabled.
> > But I am not sure about the emulated devices, such as e1000 or such, which
> > would also go through this path (I think - do we have MSI devices that
> > we emulate in QEMU?)
> 
> AFAICT emulated e1000 doesn't use MSI (at least with qemu-tradidtional)
> and with my patch series it works after kexec.
> 
> >
> >> 
> >> I think it would be nice to use the same pirq after kexec instead of
> >> allocating a new one even in case we can make remapping work.
> >
> > I concur.
> >
> > Stefano, do you recall why you used xen_irq_from_pirq instead of just
> > trusting the 'pirq' value? Was it to workaround broken QEMU?
> >
> >> 
> >> Thanks for your comments again!
> >> 
> >> >> 
> >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> >> ---
> >> >>  arch/x86/pci/xen.c | 3 +--
> >> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >> >> 
> >> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> >> >> index 905956f..685e8f1 100644
> >> >> --- a/arch/x86/pci/xen.c
> >> >> +++ b/arch/x86/pci/xen.c
> >> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> >> >>  		__read_msi_msg(msidesc, &msg);
> >> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
> >> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
> >> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
> >> >> -		    xen_irq_from_pirq(pirq) < 0) {
> >> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
> >> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
> >> >>  			if (pirq < 0) {
> >> >>  				irq = -ENODEV;
> >> >> -- 
> >> >> 1.9.3
> >> >> 
> >> 
> >> -- 
> >>   Vitaly
> 
> -- 
>   Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
@ 2014-07-17  8:12             ` Vitaly Kuznetsov
  2014-07-17  8:12             ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-17  8:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, xen-devel, Boris Ostrovsky, David Vrabel,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Jul 16, 2014 at 07:20:39PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
>> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> >> 
>> >> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
>> >> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
>> >> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
>> >> >> fails as we have no IRQ mapping information for that. Requesting for new
>> >> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
>> >> >> we don't recieve these IRQs.
>> >> >
>> >> > receive
>> >> >
>> >> 
>> >> Thanks for your comments!
>> >
>> > Thank you for quick turnaround with the answers!
>> >> 
>> >> > How come '__write_msi_msg' does not result in new MSI IRQs?
>> >> >
>> >> 
>> >> Actually that was the hidden question in my RFC :-)
>> >> 
>> >> Let me describe what I see. When normal boot is performed we have the
>> >> following in xen_hvm_setup_msi_irqs():
>> >> 
>> >> __read_msi_msg()
>> >>  pirq -> 0
>> >> 
>> >> then we allocate new pirq with
>> >>  pirq = xen_allocate_pirq_msi()
>> >>  pirq -> 54
>> >> 
>> >> and we have the following mapping:
>> >> xen: msi --> pirq=54 --> irq=72
>> >> 
>> >> in 'xl debug-keys i':
>> >> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
>> >> 
>> >> After kexec we see the following:
>> >> __read_msi_msg()
>> >>  pirq -> 54
>> >> 
>> >> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
>> >>  pirq = xen_allocate_pirq_msi()
>> >>  pirq -> 55
>> >> 
>> >> and we have the following mapping:
>> >> xen: msi --> pirq=55 --> irq=75
>> >> 
>> >> However (afaict) mapping in xen wasn't updated:
>> >> 
>> >> in 'xl debug-keys i':
>> >> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
>> >
>> > I am wondering if that is related to in QEMU traditional:
>> >
>> >     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
>> >
>> > (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
>> >
>> > If you have that patch in, is the PIRQ value correctly updated?
>> >
>> 
>> Thanks, that really works! I tested both kexec -e / kdump cases. I'm
>> wondering if we although need my commit to workaround non-fixed qemus?
>
> Without your patch on older QEMU's with PCI passthrough we won't get
> any more interrupts after we kexec in the guest right?
>

Correct.

> As in, this issue happens _only_ with PCI passthrough devices that use
> MSI or MSI-X?

I haven't tested MSI-X but in theory yes, only MSI and MSI-X
passthrough-ed devices are affected.

>
> Still need to get Stefano's view on this.
>

Sure, thanks!

>> 
>> >> 
>> >> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
>> >> > from the device and updating the internal PIRQ<->IRQ code to match
>> >> > with the reality?
>> >> >
>> >> 
>> >> Yea, 'always trust the device'.
>> >> 
>> >> >> 
>> >> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
>> >> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
>> >> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
>> >> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
>> >> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
>> >> >
>> >> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
>> >> > At least that is how the commit you mentioned worked.
>> >> >
>> >> 
>> >> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
>> >> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
>> >> xen_allocate_pirq_msi() which brings us new pirq.
>> >> 
>> >> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
>> >> > that we might be called twice by a buggy driver. As such we want to check
>> >> > our PIRQ<->IRQ to figure this out.
>> >> 
>> >> But if we're called twice we'll see the same pirq, right? Or there are
>> >
>> > Good point.
>> >> some cases when we see 'crap' instead of pirq here?
>> >
>> > For PCI passthrough devices they will be zero until they are enabled.
>> > But I am not sure about the emulated devices, such as e1000 or such, which
>> > would also go through this path (I think - do we have MSI devices that
>> > we emulate in QEMU?)
>> 
>> AFAICT emulated e1000 doesn't use MSI (at least with qemu-tradidtional)
>> and with my patch series it works after kexec.
>> 
>> >
>> >> 
>> >> I think it would be nice to use the same pirq after kexec instead of
>> >> allocating a new one even in case we can make remapping work.
>> >
>> > I concur.
>> >
>> > Stefano, do you recall why you used xen_irq_from_pirq instead of just
>> > trusting the 'pirq' value? Was it to workaround broken QEMU?
>> >
>> >> 
>> >> Thanks for your comments again!
>> >> 
>> >> >> 
>> >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> >> ---
>> >> >>  arch/x86/pci/xen.c | 3 +--
>> >> >>  1 file changed, 1 insertion(+), 2 deletions(-)
>> >> >> 
>> >> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> >> >> index 905956f..685e8f1 100644
>> >> >> --- a/arch/x86/pci/xen.c
>> >> >> +++ b/arch/x86/pci/xen.c
>> >> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>> >> >>  		__read_msi_msg(msidesc, &msg);
>> >> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> >> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> >> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
>> >> >> -		    xen_irq_from_pirq(pirq) < 0) {
>> >> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>> >> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>> >> >>  			if (pirq < 0) {
>> >> >>  				irq = -ENODEV;
>> >> >> -- 
>> >> >> 1.9.3
>> >> >> 
>> >> 
>> >> -- 
>> >>   Vitaly
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
  2014-07-17  8:12             ` Vitaly Kuznetsov
@ 2014-07-17  8:12             ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-17  8:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, stefano.stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Wed, Jul 16, 2014 at 07:20:39PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
>> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> >> 
>> >> > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
>> >> >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
>> >> >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
>> >> >> fails as we have no IRQ mapping information for that. Requesting for new
>> >> >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
>> >> >> we don't recieve these IRQs.
>> >> >
>> >> > receive
>> >> >
>> >> 
>> >> Thanks for your comments!
>> >
>> > Thank you for quick turnaround with the answers!
>> >> 
>> >> > How come '__write_msi_msg' does not result in new MSI IRQs?
>> >> >
>> >> 
>> >> Actually that was the hidden question in my RFC :-)
>> >> 
>> >> Let me describe what I see. When normal boot is performed we have the
>> >> following in xen_hvm_setup_msi_irqs():
>> >> 
>> >> __read_msi_msg()
>> >>  pirq -> 0
>> >> 
>> >> then we allocate new pirq with
>> >>  pirq = xen_allocate_pirq_msi()
>> >>  pirq -> 54
>> >> 
>> >> and we have the following mapping:
>> >> xen: msi --> pirq=54 --> irq=72
>> >> 
>> >> in 'xl debug-keys i':
>> >> (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
>> >> 
>> >> After kexec we see the following:
>> >> __read_msi_msg()
>> >>  pirq -> 54
>> >> 
>> >> but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
>> >>  pirq = xen_allocate_pirq_msi()
>> >>  pirq -> 55
>> >> 
>> >> and we have the following mapping:
>> >> xen: msi --> pirq=55 --> irq=75
>> >> 
>> >> However (afaict) mapping in xen wasn't updated:
>> >> 
>> >> in 'xl debug-keys i':
>> >> (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
>> >
>> > I am wondering if that is related to in QEMU traditional:
>> >
>> >     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
>> >
>> > (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
>> >
>> > If you have that patch in, is the PIRQ value correctly updated?
>> >
>> 
>> Thanks, that really works! I tested both kexec -e / kdump cases. I'm
>> wondering if we although need my commit to workaround non-fixed qemus?
>
> Without your patch on older QEMU's with PCI passthrough we won't get
> any more interrupts after we kexec in the guest right?
>

Correct.

> As in, this issue happens _only_ with PCI passthrough devices that use
> MSI or MSI-X?

I haven't tested MSI-X but in theory yes, only MSI and MSI-X
passthrough-ed devices are affected.

>
> Still need to get Stefano's view on this.
>

Sure, thanks!

>> 
>> >> 
>> >> > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
>> >> > from the device and updating the internal PIRQ<->IRQ code to match
>> >> > with the reality?
>> >> >
>> >> 
>> >> Yea, 'always trust the device'.
>> >> 
>> >> >> 
>> >> >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
>> >> >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
>> >> >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
>> >> >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
>> >> >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
>> >> >
>> >> > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
>> >> > At least that is how the commit you mentioned worked.
>> >> >
>> >> 
>> >> I meant to say that in case we have pirq > 0 from __read_msi_msg() but
>> >> xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
>> >> xen_allocate_pirq_msi() which brings us new pirq.
>> >> 
>> >> > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
>> >> > that we might be called twice by a buggy driver. As such we want to check
>> >> > our PIRQ<->IRQ to figure this out.
>> >> 
>> >> But if we're called twice we'll see the same pirq, right? Or there are
>> >
>> > Good point.
>> >> some cases when we see 'crap' instead of pirq here?
>> >
>> > For PCI passthrough devices they will be zero until they are enabled.
>> > But I am not sure about the emulated devices, such as e1000 or such, which
>> > would also go through this path (I think - do we have MSI devices that
>> > we emulate in QEMU?)
>> 
>> AFAICT emulated e1000 doesn't use MSI (at least with qemu-tradidtional)
>> and with my patch series it works after kexec.
>> 
>> >
>> >> 
>> >> I think it would be nice to use the same pirq after kexec instead of
>> >> allocating a new one even in case we can make remapping work.
>> >
>> > I concur.
>> >
>> > Stefano, do you recall why you used xen_irq_from_pirq instead of just
>> > trusting the 'pirq' value? Was it to workaround broken QEMU?
>> >
>> >> 
>> >> Thanks for your comments again!
>> >> 
>> >> >> 
>> >> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> >> ---
>> >> >>  arch/x86/pci/xen.c | 3 +--
>> >> >>  1 file changed, 1 insertion(+), 2 deletions(-)
>> >> >> 
>> >> >> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> >> >> index 905956f..685e8f1 100644
>> >> >> --- a/arch/x86/pci/xen.c
>> >> >> +++ b/arch/x86/pci/xen.c
>> >> >> @@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>> >> >>  		__read_msi_msg(msidesc, &msg);
>> >> >>  		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
>> >> >>  			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
>> >> >> -		if (msg.data != XEN_PIRQ_MSI_DATA ||
>> >> >> -		    xen_irq_from_pirq(pirq) < 0) {
>> >> >> +		if (msg.data != XEN_PIRQ_MSI_DATA || pirq <= 0) {
>> >> >>  			pirq = xen_allocate_pirq_msi(dev, msidesc);
>> >> >>  			if (pirq < 0) {
>> >> >>  				irq = -ENODEV;
>> >> >> -- 
>> >> >> 1.9.3
>> >> >> 
>> >> 
>> >> -- 
>> >>   Vitaly
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 15:50       ` Konrad Rzeszutek Wilk
@ 2014-07-18 11:05         ` Vitaly Kuznetsov
  2014-07-18 13:56           ` Konrad Rzeszutek Wilk
  2014-07-18 13:56           ` Konrad Rzeszutek Wilk
  2014-07-18 11:05         ` Vitaly Kuznetsov
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-18 11:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
>> >> From: Olaf Hering <olaf@aepfle.de>
>> >> 
>> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
>> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
>> >> 
>> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
>> >> new kernel overwrites the shared info page. The exact failure depends on
>> >> the size of the kernel image. This patch moves the pfn from RAM into an
>> >> E820 reserved memory area.
>> >> 
>> >> The pfn containing the shared_info is located somewhere in RAM. This will
>> >> cause trouble if the current kernel is doing a kexec boot into a new
>> >> kernel. The new kernel (and its startup code) can not know where the pfn
>> >> is, so it can not reserve the page. The hypervisor will continue to update
>> >> the pfn, and as a result memory corruption occours in the new kernel.
>> >> 
>> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
>> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
>> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
>> >> will usually not allocate areas up to FE700000, so FE700000 is expected
>> >> to work also with older toolstacks.
>> >> 
>> >> In Xen3 there is no reserved area at a fixed location. If the guest is
>> >> started on such old hosts the shared_info page will be placed in RAM. As
>> >> a result kexec can not be used.
>> >
>> > So this looks right, the one thing that we really need to check 
>> > is e9daff24a266307943457086533041bd971d0ef9
>> >
>> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
>> >
>> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
>> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
>> >     we use it for the time keeping, and the timekeeping code loops
>> >     forever - as the version field never changes. Olaf says to
>> >     revert it, so lets do that.
>> >
>> > Could you kindly test that the migration on 32-bit PVHVM guests
>> > on older hypervisors works?
>> >
>> 
>> Sure, will do! Was there anything special about the setup or any 32-bit
>> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
>> try checking both current and old versions to make sure the issue was
>> acutually fixed.
>
> Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
> guest would not restore properly.
>

The symptoms you saw were: after the resume guest appears to be frozen,
all vcpus except for the first one spin at 100%? I was able to reproduce
that on old patch version and everything works fine with your fix
(calling xen_hvm_set_shared_info() in addition to
xen_hvm_connect_shared_info() on resume). We're probably safe to apply
it now, thanks!

However I'd like to suggest we remove '__init' from
xen_hvm_set_shared_info() as now we call it on resume.

> Thank you!
>> 
>> >> 
>> >> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
>> >> 
>> >> [On resume we need to reset the xen_vcpu_info, which the original
>> >> patch did not do]
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> ---
>> >>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
>> >>  arch/x86/xen/suspend.c   |  2 +-
>> >>  arch/x86/xen/xen-ops.h   |  2 +-
>> >>  3 files changed, 58 insertions(+), 20 deletions(-)
>> >> 
>> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> >> index ffb101e..a11af62 100644
>> >> --- a/arch/x86/xen/enlighten.c
>> >> +++ b/arch/x86/xen/enlighten.c
>> >> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>> >>  #endif
>> >>  }
>> >>  
>> >> -void __ref xen_hvm_init_shared_info(void)
>> >> +#ifdef CONFIG_XEN_PVHVM
>> >> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
>> >> +static struct shared_info *xen_hvm_shared_info;
>> >> +static unsigned long xen_hvm_sip_phys;
>> >> +static int xen_major, xen_minor;
>> >> +
>> >> +static void xen_hvm_connect_shared_info(unsigned long pfn)
>> >>  {
>> >> -	int cpu;
>> >>  	struct xen_add_to_physmap xatp;
>> >> -	static struct shared_info *shared_info_page = 0;
>> >>  
>> >> -	if (!shared_info_page)
>> >> -		shared_info_page = (struct shared_info *)
>> >> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>> >>  	xatp.domid = DOMID_SELF;
>> >>  	xatp.idx = 0;
>> >>  	xatp.space = XENMAPSPACE_shared_info;
>> >> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
>> >> +	xatp.gpfn = pfn;
>> >>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>> >>  		BUG();
>> >>  
>> >> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
>> >> +}
>> >> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
>> >> +{
>> >> +	int cpu;
>> >> +
>> >> +	HYPERVISOR_shared_info = sip;
>> >>  
>> >>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>> >>  	 * page, we use it in the event channel upcall and in some pvclock
>> >> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
>> >>  	}
>> >>  }
>> >>  
>> >> -#ifdef CONFIG_XEN_PVHVM
>> >> +/* Reconnect the shared_info pfn to a (new) mfn */
>> >> +void xen_hvm_resume_shared_info(void)
>> >> +{
>> >> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
>> >> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
>> >> +}
>> >> +
>> >> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
>> >> + * On these old tools the shared info page will be placed in E820_Ram.
>> >> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
>> >> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
>> >> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
>> >> + * here for the shared info page. */
>> >> +static void __init xen_hvm_init_shared_info(void)
>> >> +{
>> >> +	if (xen_major < 4) {
>> >> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
>> >> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
>> >> +	} else {
>> >> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
>> >> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
>> >> +		xen_hvm_shared_info =
>> >> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
>> >> +	}
>> >> +	xen_hvm_resume_shared_info();
>> >> +}
>> >> +
>> >>  static void __init init_hvm_pv_info(void)
>> >>  {
>> >> -	int major, minor;
>> >> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
>> >> +	uint32_t  ecx, edx, pages, msr, base;
>> >>  	u64 pfn;
>> >>  
>> >>  	base = xen_cpuid_base();
>> >> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> >> -
>> >> -	major = eax >> 16;
>> >> -	minor = eax & 0xffff;
>> >> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
>> >> -
>> >>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
>> >>  
>> >>  	pfn = __pa(hypercall_page);
>> >> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>> >>  
>> >>  static uint32_t __init xen_hvm_platform(void)
>> >>  {
>> >> +	uint32_t eax, ebx, ecx, edx, base;
>> >> +
>> >>  	if (xen_pv_domain())
>> >>  		return 0;
>> >>  
>> >> -	return xen_cpuid_base();
>> >> +	base = xen_cpuid_base();
>> >> +	if (!base)
>> >> +		return 0;
>> >> +
>> >> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> >> +
>> >> +	xen_major = eax >> 16;
>> >> +	xen_minor = eax & 0xffff;
>> >> +
>> >> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
>> >> +
>> >> +	return 1;
>> >>  }
>> >>  
>> >>  bool xen_hvm_need_lapic(void)
>> >> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> >> index c4df9db..8d6793e 100644
>> >> --- a/arch/x86/xen/suspend.c
>> >> +++ b/arch/x86/xen/suspend.c
>> >> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>> >>  {
>> >>  #ifdef CONFIG_XEN_PVHVM
>> >>  	int cpu;
>> >> -	xen_hvm_init_shared_info();
>> >> +	xen_hvm_resume_shared_info();
>> >>  	xen_callback_vector();
>> >>  	xen_unplug_emulated_devices();
>> >>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>> >> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>> >> index 97d8765..d083e82 100644
>> >> --- a/arch/x86/xen/xen-ops.h
>> >> +++ b/arch/x86/xen/xen-ops.h
>> >> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
>> >>  void xen_vcpu_restore(void);
>> >>  
>> >>  void xen_callback_vector(void);
>> >> -void xen_hvm_init_shared_info(void);
>> >> +void xen_hvm_resume_shared_info(void);
>> >>  void xen_unplug_emulated_devices(void);
>> >>  
>> >>  void __init xen_build_dynamic_phys_to_machine(void);
>> >> -- 
>> >> 1.9.3
>> >> 
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 15:50       ` Konrad Rzeszutek Wilk
  2014-07-18 11:05         ` Vitaly Kuznetsov
@ 2014-07-18 11:05         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-18 11:05 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
>> >> From: Olaf Hering <olaf@aepfle.de>
>> >> 
>> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
>> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
>> >> 
>> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
>> >> new kernel overwrites the shared info page. The exact failure depends on
>> >> the size of the kernel image. This patch moves the pfn from RAM into an
>> >> E820 reserved memory area.
>> >> 
>> >> The pfn containing the shared_info is located somewhere in RAM. This will
>> >> cause trouble if the current kernel is doing a kexec boot into a new
>> >> kernel. The new kernel (and its startup code) can not know where the pfn
>> >> is, so it can not reserve the page. The hypervisor will continue to update
>> >> the pfn, and as a result memory corruption occours in the new kernel.
>> >> 
>> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
>> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
>> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
>> >> will usually not allocate areas up to FE700000, so FE700000 is expected
>> >> to work also with older toolstacks.
>> >> 
>> >> In Xen3 there is no reserved area at a fixed location. If the guest is
>> >> started on such old hosts the shared_info page will be placed in RAM. As
>> >> a result kexec can not be used.
>> >
>> > So this looks right, the one thing that we really need to check 
>> > is e9daff24a266307943457086533041bd971d0ef9
>> >
>> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
>> >
>> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
>> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
>> >     we use it for the time keeping, and the timekeeping code loops
>> >     forever - as the version field never changes. Olaf says to
>> >     revert it, so lets do that.
>> >
>> > Could you kindly test that the migration on 32-bit PVHVM guests
>> > on older hypervisors works?
>> >
>> 
>> Sure, will do! Was there anything special about the setup or any 32-bit
>> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
>> try checking both current and old versions to make sure the issue was
>> acutually fixed.
>
> Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
> guest would not restore properly.
>

The symptoms you saw were: after the resume guest appears to be frozen,
all vcpus except for the first one spin at 100%? I was able to reproduce
that on old patch version and everything works fine with your fix
(calling xen_hvm_set_shared_info() in addition to
xen_hvm_connect_shared_info() on resume). We're probably safe to apply
it now, thanks!

However I'd like to suggest we remove '__init' from
xen_hvm_set_shared_info() as now we call it on resume.

> Thank you!
>> 
>> >> 
>> >> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
>> >> 
>> >> [On resume we need to reset the xen_vcpu_info, which the original
>> >> patch did not do]
>> >> 
>> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> >> ---
>> >>  arch/x86/xen/enlighten.c | 74 ++++++++++++++++++++++++++++++++++++------------
>> >>  arch/x86/xen/suspend.c   |  2 +-
>> >>  arch/x86/xen/xen-ops.h   |  2 +-
>> >>  3 files changed, 58 insertions(+), 20 deletions(-)
>> >> 
>> >> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> >> index ffb101e..a11af62 100644
>> >> --- a/arch/x86/xen/enlighten.c
>> >> +++ b/arch/x86/xen/enlighten.c
>> >> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>> >>  #endif
>> >>  }
>> >>  
>> >> -void __ref xen_hvm_init_shared_info(void)
>> >> +#ifdef CONFIG_XEN_PVHVM
>> >> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL
>> >> +static struct shared_info *xen_hvm_shared_info;
>> >> +static unsigned long xen_hvm_sip_phys;
>> >> +static int xen_major, xen_minor;
>> >> +
>> >> +static void xen_hvm_connect_shared_info(unsigned long pfn)
>> >>  {
>> >> -	int cpu;
>> >>  	struct xen_add_to_physmap xatp;
>> >> -	static struct shared_info *shared_info_page = 0;
>> >>  
>> >> -	if (!shared_info_page)
>> >> -		shared_info_page = (struct shared_info *)
>> >> -			extend_brk(PAGE_SIZE, PAGE_SIZE);
>> >>  	xatp.domid = DOMID_SELF;
>> >>  	xatp.idx = 0;
>> >>  	xatp.space = XENMAPSPACE_shared_info;
>> >> -	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
>> >> +	xatp.gpfn = pfn;
>> >>  	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>> >>  		BUG();
>> >>  
>> >> -	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
>> >> +}
>> >> +static void __init xen_hvm_set_shared_info(struct shared_info *sip)
>> >> +{
>> >> +	int cpu;
>> >> +
>> >> +	HYPERVISOR_shared_info = sip;
>> >>  
>> >>  	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
>> >>  	 * page, we use it in the event channel upcall and in some pvclock
>> >> @@ -1760,20 +1766,39 @@ void __ref xen_hvm_init_shared_info(void)
>> >>  	}
>> >>  }
>> >>  
>> >> -#ifdef CONFIG_XEN_PVHVM
>> >> +/* Reconnect the shared_info pfn to a (new) mfn */
>> >> +void xen_hvm_resume_shared_info(void)
>> >> +{
>> >> +	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
>> >> +	xen_hvm_set_shared_info(xen_hvm_shared_info);
>> >> +}
>> >> +
>> >> +/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
>> >> + * On these old tools the shared info page will be placed in E820_Ram.
>> >> + * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
>> >> + * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
>> >> + * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
>> >> + * here for the shared info page. */
>> >> +static void __init xen_hvm_init_shared_info(void)
>> >> +{
>> >> +	if (xen_major < 4) {
>> >> +		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
>> >> +		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
>> >> +	} else {
>> >> +		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
>> >> +		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
>> >> +		xen_hvm_shared_info =
>> >> +		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
>> >> +	}
>> >> +	xen_hvm_resume_shared_info();
>> >> +}
>> >> +
>> >>  static void __init init_hvm_pv_info(void)
>> >>  {
>> >> -	int major, minor;
>> >> -	uint32_t eax, ebx, ecx, edx, pages, msr, base;
>> >> +	uint32_t  ecx, edx, pages, msr, base;
>> >>  	u64 pfn;
>> >>  
>> >>  	base = xen_cpuid_base();
>> >> -	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> >> -
>> >> -	major = eax >> 16;
>> >> -	minor = eax & 0xffff;
>> >> -	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
>> >> -
>> >>  	cpuid(base + 2, &pages, &msr, &ecx, &edx);
>> >>  
>> >>  	pfn = __pa(hypercall_page);
>> >> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>> >>  
>> >>  static uint32_t __init xen_hvm_platform(void)
>> >>  {
>> >> +	uint32_t eax, ebx, ecx, edx, base;
>> >> +
>> >>  	if (xen_pv_domain())
>> >>  		return 0;
>> >>  
>> >> -	return xen_cpuid_base();
>> >> +	base = xen_cpuid_base();
>> >> +	if (!base)
>> >> +		return 0;
>> >> +
>> >> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
>> >> +
>> >> +	xen_major = eax >> 16;
>> >> +	xen_minor = eax & 0xffff;
>> >> +
>> >> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
>> >> +
>> >> +	return 1;
>> >>  }
>> >>  
>> >>  bool xen_hvm_need_lapic(void)
>> >> diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
>> >> index c4df9db..8d6793e 100644
>> >> --- a/arch/x86/xen/suspend.c
>> >> +++ b/arch/x86/xen/suspend.c
>> >> @@ -32,7 +32,7 @@ static void xen_hvm_post_suspend(int suspend_cancelled)
>> >>  {
>> >>  #ifdef CONFIG_XEN_PVHVM
>> >>  	int cpu;
>> >> -	xen_hvm_init_shared_info();
>> >> +	xen_hvm_resume_shared_info();
>> >>  	xen_callback_vector();
>> >>  	xen_unplug_emulated_devices();
>> >>  	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
>> >> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
>> >> index 97d8765..d083e82 100644
>> >> --- a/arch/x86/xen/xen-ops.h
>> >> +++ b/arch/x86/xen/xen-ops.h
>> >> @@ -43,7 +43,7 @@ void xen_enable_syscall(void);
>> >>  void xen_vcpu_restore(void);
>> >>  
>> >>  void xen_callback_vector(void);
>> >> -void xen_hvm_init_shared_info(void);
>> >> +void xen_hvm_resume_shared_info(void);
>> >>  void xen_unplug_emulated_devices(void);
>> >>  
>> >>  void __init xen_build_dynamic_phys_to_machine(void);
>> >> -- 
>> >> 1.9.3
>> >> 
>> 
>> -- 
>>   Vitaly

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-18 11:05         ` Vitaly Kuznetsov
  2014-07-18 13:56           ` Konrad Rzeszutek Wilk
@ 2014-07-18 13:56           ` Konrad Rzeszutek Wilk
  2014-07-18 15:45             ` Vitaly Kuznetsov
  2014-07-18 15:45             ` Vitaly Kuznetsov
  1 sibling, 2 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-18 13:56 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Fri, Jul 18, 2014 at 01:05:46PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> >> 
> >> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
> >> >> From: Olaf Hering <olaf@aepfle.de>
> >> >> 
> >> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> >> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
> >> >> 
> >> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
> >> >> new kernel overwrites the shared info page. The exact failure depends on
> >> >> the size of the kernel image. This patch moves the pfn from RAM into an
> >> >> E820 reserved memory area.
> >> >> 
> >> >> The pfn containing the shared_info is located somewhere in RAM. This will
> >> >> cause trouble if the current kernel is doing a kexec boot into a new
> >> >> kernel. The new kernel (and its startup code) can not know where the pfn
> >> >> is, so it can not reserve the page. The hypervisor will continue to update
> >> >> the pfn, and as a result memory corruption occours in the new kernel.
> >> >> 
> >> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> >> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> >> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> >> >> will usually not allocate areas up to FE700000, so FE700000 is expected
> >> >> to work also with older toolstacks.
> >> >> 
> >> >> In Xen3 there is no reserved area at a fixed location. If the guest is
> >> >> started on such old hosts the shared_info page will be placed in RAM. As
> >> >> a result kexec can not be used.
> >> >
> >> > So this looks right, the one thing that we really need to check 
> >> > is e9daff24a266307943457086533041bd971d0ef9
> >> >
> >> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
> >> >
> >> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
> >> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
> >> >     we use it for the time keeping, and the timekeeping code loops
> >> >     forever - as the version field never changes. Olaf says to
> >> >     revert it, so lets do that.
> >> >
> >> > Could you kindly test that the migration on 32-bit PVHVM guests
> >> > on older hypervisors works?
> >> >
> >> 
> >> Sure, will do! Was there anything special about the setup or any 32-bit
> >> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
> >> try checking both current and old versions to make sure the issue was
> >> acutually fixed.
> >
> > Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
> > guest would not restore properly.
> >
> 
> The symptoms you saw were: after the resume guest appears to be frozen,
> all vcpus except for the first one spin at 100%? I was able to reproduce

Yes, that is it.
> that on old patch version and everything works fine with your fix
> (calling xen_hvm_set_shared_info() in addition to
> xen_hvm_connect_shared_info() on resume). We're probably safe to apply
> it now, thanks!

Woot!  Could you include that tidbit of information in
the commit please?

> 
> However I'd like to suggest we remove '__init' from
> xen_hvm_set_shared_info() as now we call it on resume.

Good idea.

Lets wait until Stefano responds (for the MSI PIRQ one), and
if he does not have anything special to say, then repost the
whole patchset including this tiny __init fix and the updated
comment?


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-18 11:05         ` Vitaly Kuznetsov
@ 2014-07-18 13:56           ` Konrad Rzeszutek Wilk
  2014-07-18 13:56           ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-18 13:56 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

On Fri, Jul 18, 2014 at 01:05:46PM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> >> 
> >> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
> >> >> From: Olaf Hering <olaf@aepfle.de>
> >> >> 
> >> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> >> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
> >> >> 
> >> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
> >> >> new kernel overwrites the shared info page. The exact failure depends on
> >> >> the size of the kernel image. This patch moves the pfn from RAM into an
> >> >> E820 reserved memory area.
> >> >> 
> >> >> The pfn containing the shared_info is located somewhere in RAM. This will
> >> >> cause trouble if the current kernel is doing a kexec boot into a new
> >> >> kernel. The new kernel (and its startup code) can not know where the pfn
> >> >> is, so it can not reserve the page. The hypervisor will continue to update
> >> >> the pfn, and as a result memory corruption occours in the new kernel.
> >> >> 
> >> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> >> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> >> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> >> >> will usually not allocate areas up to FE700000, so FE700000 is expected
> >> >> to work also with older toolstacks.
> >> >> 
> >> >> In Xen3 there is no reserved area at a fixed location. If the guest is
> >> >> started on such old hosts the shared_info page will be placed in RAM. As
> >> >> a result kexec can not be used.
> >> >
> >> > So this looks right, the one thing that we really need to check 
> >> > is e9daff24a266307943457086533041bd971d0ef9
> >> >
> >> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
> >> >
> >> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
> >> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
> >> >     we use it for the time keeping, and the timekeeping code loops
> >> >     forever - as the version field never changes. Olaf says to
> >> >     revert it, so lets do that.
> >> >
> >> > Could you kindly test that the migration on 32-bit PVHVM guests
> >> > on older hypervisors works?
> >> >
> >> 
> >> Sure, will do! Was there anything special about the setup or any 32-bit
> >> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
> >> try checking both current and old versions to make sure the issue was
> >> acutually fixed.
> >
> > Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
> > guest would not restore properly.
> >
> 
> The symptoms you saw were: after the resume guest appears to be frozen,
> all vcpus except for the first one spin at 100%? I was able to reproduce

Yes, that is it.
> that on old patch version and everything works fine with your fix
> (calling xen_hvm_set_shared_info() in addition to
> xen_hvm_connect_shared_info() on resume). We're probably safe to apply
> it now, thanks!

Woot!  Could you include that tidbit of information in
the commit please?

> 
> However I'd like to suggest we remove '__init' from
> xen_hvm_set_shared_info() as now we call it on resume.

Good idea.

Lets wait until Stefano responds (for the MSI PIRQ one), and
if he does not have anything special to say, then repost the
whole patchset including this tiny __init fix and the updated
comment?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-18 13:56           ` Konrad Rzeszutek Wilk
@ 2014-07-18 15:45             ` Vitaly Kuznetsov
  2014-07-18 15:45             ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-18 15:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, David Vrabel, Stefano Stabellini,
	Andrew Jones, linux-kernel

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Fri, Jul 18, 2014 at 01:05:46PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
>> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> >> 
>> >> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
>> >> >> From: Olaf Hering <olaf@aepfle.de>
>> >> >> 
>> >> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
>> >> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
>> >> >> 
>> >> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
>> >> >> new kernel overwrites the shared info page. The exact failure depends on
>> >> >> the size of the kernel image. This patch moves the pfn from RAM into an
>> >> >> E820 reserved memory area.
>> >> >> 
>> >> >> The pfn containing the shared_info is located somewhere in RAM. This will
>> >> >> cause trouble if the current kernel is doing a kexec boot into a new
>> >> >> kernel. The new kernel (and its startup code) can not know where the pfn
>> >> >> is, so it can not reserve the page. The hypervisor will continue to update
>> >> >> the pfn, and as a result memory corruption occours in the new kernel.
>> >> >> 
>> >> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
>> >> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
>> >> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
>> >> >> will usually not allocate areas up to FE700000, so FE700000 is expected
>> >> >> to work also with older toolstacks.
>> >> >> 
>> >> >> In Xen3 there is no reserved area at a fixed location. If the guest is
>> >> >> started on such old hosts the shared_info page will be placed in RAM. As
>> >> >> a result kexec can not be used.
>> >> >
>> >> > So this looks right, the one thing that we really need to check 
>> >> > is e9daff24a266307943457086533041bd971d0ef9
>> >> >
>> >> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
>> >> >
>> >> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
>> >> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
>> >> >     we use it for the time keeping, and the timekeeping code loops
>> >> >     forever - as the version field never changes. Olaf says to
>> >> >     revert it, so lets do that.
>> >> >
>> >> > Could you kindly test that the migration on 32-bit PVHVM guests
>> >> > on older hypervisors works?
>> >> >
>> >> 
>> >> Sure, will do! Was there anything special about the setup or any 32-bit
>> >> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
>> >> try checking both current and old versions to make sure the issue was
>> >> acutually fixed.
>> >
>> > Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
>> > guest would not restore properly.
>> >
>> 
>> The symptoms you saw were: after the resume guest appears to be frozen,
>> all vcpus except for the first one spin at 100%? I was able to reproduce
>
> Yes, that is it.
>> that on old patch version and everything works fine with your fix
>> (calling xen_hvm_set_shared_info() in addition to
>> xen_hvm_connect_shared_info() on resume). We're probably safe to apply
>> it now, thanks!
>
> Woot!  Could you include that tidbit of information in
> the commit please?
>

Sure,

>> 
>> However I'd like to suggest we remove '__init' from
>> xen_hvm_set_shared_info() as now we call it on resume.
>
> Good idea.
>
> Lets wait until Stefano responds (for the MSI PIRQ one), and
> if he does not have anything special to say, then repost the
> whole patchset including this tiny __init fix and the updated
> comment?

Deal :-) Please take a look at my '[PATCH RFC] evtchn: introduce
EVTCHNOP_fifo_destroy hypercall'. In case that works we can fix FIFO
case at the same time, no TODO required.

I'll be able to return to this work at the end of next week.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-18 13:56           ` Konrad Rzeszutek Wilk
  2014-07-18 15:45             ` Vitaly Kuznetsov
@ 2014-07-18 15:45             ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-18 15:45 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, David Vrabel,
	xen-devel, Boris Ostrovsky

Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:

> On Fri, Jul 18, 2014 at 01:05:46PM +0200, Vitaly Kuznetsov wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> 
>> > On Tue, Jul 15, 2014 at 05:43:17PM +0200, Vitaly Kuznetsov wrote:
>> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
>> >> 
>> >> > On Tue, Jul 15, 2014 at 03:40:37PM +0200, Vitaly Kuznetsov wrote:
>> >> >> From: Olaf Hering <olaf@aepfle.de>
>> >> >> 
>> >> >> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
>> >> >> ("xen PVonHVM: move shared_info to MMIO before kexec").
>> >> >> 
>> >> >> Currently kexec in a PVonHVM guest fails with a triple fault because the
>> >> >> new kernel overwrites the shared info page. The exact failure depends on
>> >> >> the size of the kernel image. This patch moves the pfn from RAM into an
>> >> >> E820 reserved memory area.
>> >> >> 
>> >> >> The pfn containing the shared_info is located somewhere in RAM. This will
>> >> >> cause trouble if the current kernel is doing a kexec boot into a new
>> >> >> kernel. The new kernel (and its startup code) can not know where the pfn
>> >> >> is, so it can not reserve the page. The hypervisor will continue to update
>> >> >> the pfn, and as a result memory corruption occours in the new kernel.
>> >> >> 
>> >> >> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
>> >> >> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
>> >> >> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
>> >> >> will usually not allocate areas up to FE700000, so FE700000 is expected
>> >> >> to work also with older toolstacks.
>> >> >> 
>> >> >> In Xen3 there is no reserved area at a fixed location. If the guest is
>> >> >> started on such old hosts the shared_info page will be placed in RAM. As
>> >> >> a result kexec can not be used.
>> >> >
>> >> > So this looks right, the one thing that we really need to check 
>> >> > is e9daff24a266307943457086533041bd971d0ef9
>> >> >
>> >> >    This reverts commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f.
>> >> >
>> >> >     We are doing this b/c on 32-bit PVonHVM with older hypervisors
>> >> >     (Xen 4.1) it ends up bothing up the start_info. This is bad b/c
>> >> >     we use it for the time keeping, and the timekeeping code loops
>> >> >     forever - as the version field never changes. Olaf says to
>> >> >     revert it, so lets do that.
>> >> >
>> >> > Could you kindly test that the migration on 32-bit PVHVM guests
>> >> > on older hypervisors works?
>> >> >
>> >> 
>> >> Sure, will do! Was there anything special about the setup or any 32-bit
>> >> pvhvm guest migration (on 64-bit hypervisor I suppose) would fail? I can
>> >> try checking both current and old versions to make sure the issue was
>> >> acutually fixed.
>> >
>> > Nothing fancy (well, it was SMP, so 4 CPUs). I did the 'save'/'restore' and the
>> > guest would not restore properly.
>> >
>> 
>> The symptoms you saw were: after the resume guest appears to be frozen,
>> all vcpus except for the first one spin at 100%? I was able to reproduce
>
> Yes, that is it.
>> that on old patch version and everything works fine with your fix
>> (calling xen_hvm_set_shared_info() in addition to
>> xen_hvm_connect_shared_info() on resume). We're probably safe to apply
>> it now, thanks!
>
> Woot!  Could you include that tidbit of information in
> the commit please?
>

Sure,

>> 
>> However I'd like to suggest we remove '__init' from
>> xen_hvm_set_shared_info() as now we call it on resume.
>
> Good idea.
>
> Lets wait until Stefano responds (for the MSI PIRQ one), and
> if he does not have anything special to say, then repost the
> whole patchset including this tiny __init fix and the updated
> comment?

Deal :-) Please take a look at my '[PATCH RFC] evtchn: introduce
EVTCHNOP_fifo_destroy hypercall'. In case that works we can fix FIFO
case at the same time, no TODO required.

I'll be able to return to this work at the end of next week.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
  2014-07-16 17:20         ` Vitaly Kuznetsov
  2014-07-16 17:20         ` Vitaly Kuznetsov
@ 2014-07-21 14:13         ` Stefano Stabellini
  2014-07-21 14:13         ` Stefano Stabellini
  3 siblings, 0 replies; 77+ messages in thread
From: Stefano Stabellini @ 2014-07-21 14:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Vitaly Kuznetsov, stefano.stabellini, xen-devel, Boris Ostrovsky,
	David Vrabel, Andrew Jones, linux-kernel

On Wed, 16 Jul 2014, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> > 
> > > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> > >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> > >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> > >> fails as we have no IRQ mapping information for that. Requesting for new
> > >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> > >> we don't recieve these IRQs.
> > >
> > > receive
> > >
> > 
> > Thanks for your comments!
> 
> Thank you for quick turnaround with the answers!
> > 
> > > How come '__write_msi_msg' does not result in new MSI IRQs?
> > >
> > 
> > Actually that was the hidden question in my RFC :-)
> > 
> > Let me describe what I see. When normal boot is performed we have the
> > following in xen_hvm_setup_msi_irqs():
> > 
> > __read_msi_msg()
> >  pirq -> 0
> > 
> > then we allocate new pirq with
> >  pirq = xen_allocate_pirq_msi()
> >  pirq -> 54
> > 
> > and we have the following mapping:
> > xen: msi --> pirq=54 --> irq=72
> > 
> > in 'xl debug-keys i':
> > (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> > 
> > After kexec we see the following:
> > __read_msi_msg()
> >  pirq -> 54
> > 
> > but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
> >  pirq = xen_allocate_pirq_msi()
> >  pirq -> 55
> > 
> > and we have the following mapping:
> > xen: msi --> pirq=55 --> irq=75
> > 
> > However (afaict) mapping in xen wasn't updated:
> > 
> > in 'xl debug-keys i':
> > (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
> 
> I am wondering if that is related to in QEMU traditional:
> 
>     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
> 
> (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
> 
> If you have that patch in, is the PIRQ value correctly updated?
> 
> > 
> > > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> > > from the device and updating the internal PIRQ<->IRQ code to match
> > > with the reality?
> > >
> > 
> > Yea, 'always trust the device'.
> > 
> > >> 
> > >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> > >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> > >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> > >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> > >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
> > >
> > > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> > > At least that is how the commit you mentioned worked.
> > >
> > 
> > I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> > xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> > xen_allocate_pirq_msi() which brings us new pirq.
> > 
> > > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> > > that we might be called twice by a buggy driver. As such we want to check
> > > our PIRQ<->IRQ to figure this out.
> > 
> > But if we're called twice we'll see the same pirq, right? Or there are
> 
> Good point.
> > some cases when we see 'crap' instead of pirq here?
> 
> For PCI passthrough devices they will be zero until they are enabled.
> But I am not sure about the emulated devices, such as e1000 or such, which
> would also go through this path (I think - do we have MSI devices that
> we emulate in QEMU?)
> 
> > 
> > I think it would be nice to use the same pirq after kexec instead of
> > allocating a new one even in case we can make remapping work.
> 
> I concur.
> 
> Stefano, do you recall why you used xen_irq_from_pirq instead of just
> trusting the 'pirq' value? Was it to workaround broken QEMU?

If I recall correctly the problem is that pirq == 0 is a valid pirq
number.  So the check pirq <= 0 is wrong.

Can we rely on the fact that msg.data is always 0 on first read?  If so,
then we could simply:

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 905956f..d824743 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		__read_msi_msg(msidesc, &msg);
 		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
 			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
+		if (msg.data != XEN_PIRQ_MSI_DATA) {
 			pirq = xen_allocate_pirq_msi(dev, msidesc);
 			if (pirq < 0) {
 				irq = -ENODEV;

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 13:40       ` Konrad Rzeszutek Wilk
                           ` (2 preceding siblings ...)
  2014-07-21 14:13         ` Stefano Stabellini
@ 2014-07-21 14:13         ` Stefano Stabellini
  3 siblings, 0 replies; 77+ messages in thread
From: Stefano Stabellini @ 2014-07-21 14:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Andrew Jones, stefano.stabellini, linux-kernel, David Vrabel,
	xen-devel, Vitaly Kuznetsov, Boris Ostrovsky

On Wed, 16 Jul 2014, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 16, 2014 at 11:01:55AM +0200, Vitaly Kuznetsov wrote:
> > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> > 
> > > On Tue, Jul 15, 2014 at 03:40:40PM +0200, Vitaly Kuznetsov wrote:
> > >> When kexec was peformed MSI IRQs for passthrough-ed devices were already
> > >> mapped and we see non-zero pirq extracted from MSI msg. xen_irq_from_pirq()
> > >> fails as we have no IRQ mapping information for that. Requesting for new
> > >> mapping with __write_msi_msg() does not result in MSI IRQ being remapped so
> > >> we don't recieve these IRQs.
> > >
> > > receive
> > >
> > 
> > Thanks for your comments!
> 
> Thank you for quick turnaround with the answers!
> > 
> > > How come '__write_msi_msg' does not result in new MSI IRQs?
> > >
> > 
> > Actually that was the hidden question in my RFC :-)
> > 
> > Let me describe what I see. When normal boot is performed we have the
> > following in xen_hvm_setup_msi_irqs():
> > 
> > __read_msi_msg()
> >  pirq -> 0
> > 
> > then we allocate new pirq with
> >  pirq = xen_allocate_pirq_msi()
> >  pirq -> 54
> > 
> > and we have the following mapping:
> > xen: msi --> pirq=54 --> irq=72
> > 
> > in 'xl debug-keys i':
> > (XEN)    IRQ:  29 affinity:04 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(----),
> > 
> > After kexec we see the following:
> > __read_msi_msg()
> >  pirq -> 54
> > 
> > but as xen_irq_from_pirq() fails we follow the same path allocating new pirq:
> >  pirq = xen_allocate_pirq_msi()
> >  pirq -> 55
> > 
> > and we have the following mapping:
> > xen: msi --> pirq=55 --> irq=75
> > 
> > However (afaict) mapping in xen wasn't updated:
> > 
> > in 'xl debug-keys i':
> > (XEN)    IRQ:  29 affinity:02 vec:b9 type=PCI-MSI status=00000030 in-flight=0 domain-list=7: 54(--M-),
> 
> I am wondering if that is related to in QEMU traditional:
> 
>     qemu-xen-trad: free all the pirqs for msi/msix when driver unloads
> 
> (which in the upstream QEMU is 1d4fd4f0e2fc5dcae0c60e00cc9af95f52988050)
> 
> If you have that patch in, is the PIRQ value correctly updated?
> 
> > 
> > > Is it fair to state that your code ends up reading the MSI IRQ (PIRQ)
> > > from the device and updating the internal PIRQ<->IRQ code to match
> > > with the reality?
> > >
> > 
> > Yea, 'always trust the device'.
> > 
> > >> 
> > >> RFC: I wasn't able to understand why commit af42b8d1 which introduced
> > >> xen_irq_from_pirq() check in xen_hvm_setup_msi_irqs() is checking that instead
> > >> of checking pirq > 0 as if the mapping was already done (and we have pirq>0 here)
> > >> we don't need to request for a new pirq. We're loosing existing PIRQ and I'm also
> > >> not sure when __write_msi_msg() with new PIRQ will result in new mapping.
> > >
> > > We don't request a new pirq. We end up returning before we call xen_allocate_pirq_msi.
> > > At least that is how the commit you mentioned worked.
> > >
> > 
> > I meant to say that in case we have pirq > 0 from __read_msi_msg() but
> > xen_irq_from_pirq(pirq) fails (kexec-only case?) we always do
> > xen_allocate_pirq_msi() which brings us new pirq.
> > 
> > > In regards to why using 'xen_irq_from_pirq' instead of just checking the PIRQ - is
> > > that we might be called twice by a buggy driver. As such we want to check
> > > our PIRQ<->IRQ to figure this out.
> > 
> > But if we're called twice we'll see the same pirq, right? Or there are
> 
> Good point.
> > some cases when we see 'crap' instead of pirq here?
> 
> For PCI passthrough devices they will be zero until they are enabled.
> But I am not sure about the emulated devices, such as e1000 or such, which
> would also go through this path (I think - do we have MSI devices that
> we emulate in QEMU?)
> 
> > 
> > I think it would be nice to use the same pirq after kexec instead of
> > allocating a new one even in case we can make remapping work.
> 
> I concur.
> 
> Stefano, do you recall why you used xen_irq_from_pirq instead of just
> trusting the 'pirq' value? Was it to workaround broken QEMU?

If I recall correctly the problem is that pirq == 0 is a valid pirq
number.  So the check pirq <= 0 is wrong.

Can we rely on the fact that msg.data is always 0 on first read?  If so,
then we could simply:

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 905956f..d824743 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -231,8 +231,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 		__read_msi_msg(msidesc, &msg);
 		pirq = MSI_ADDR_EXT_DEST_ID(msg.address_hi) |
 			((msg.address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff);
-		if (msg.data != XEN_PIRQ_MSI_DATA ||
-		    xen_irq_from_pirq(pirq) < 0) {
+		if (msg.data != XEN_PIRQ_MSI_DATA) {
 			pirq = xen_allocate_pirq_msi(dev, msidesc);
 			if (pirq < 0) {
 				irq = -ENODEV;

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2014-07-15 13:40 ` Vitaly Kuznetsov
@ 2014-07-28 13:24 ` David Vrabel
  2014-08-01 12:21   ` Vitaly Kuznetsov
  2014-08-01 12:21   ` Vitaly Kuznetsov
  2014-07-28 13:24 ` David Vrabel
  9 siblings, 2 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Stefano Stabellini,
	Andrew Jones, linux-kernel

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> With this patch series I'm trying to address several issues with kexec on pvhvm:
> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
> - create specific pvhvm shutdown handler for kexec (2nd patch)
> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
> 
> This patch series can be tested on single vCPU guest. We still have SMP issues with
> pvhvm guests and kexec which require additional fixes.

In addition to the fixes for multi-VCPU guests, what else remains?

What's the plan for handling granted pages?

I don't think we want to accept a partial solution unless the known
non-working configurations fail fast on kexec load.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2014-07-28 13:24 ` [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec David Vrabel
@ 2014-07-28 13:24 ` David Vrabel
  9 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: linux-kernel, Boris Ostrovsky, Stefano Stabellini, Andrew Jones

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> With this patch series I'm trying to address several issues with kexec on pvhvm:
> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
> - create specific pvhvm shutdown handler for kexec (2nd patch)
> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
> 
> This patch series can be tested on single vCPU guest. We still have SMP issues with
> pvhvm guests and kexec which require additional fixes.

In addition to the fixes for multi-VCPU guests, what else remains?

What's the plan for handling granted pages?

I don't think we want to accept a partial solution unless the known
non-working configurations fail fast on kexec load.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 13:40 ` Vitaly Kuznetsov
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
  2014-07-15 15:06   ` Konrad Rzeszutek Wilk
@ 2014-07-28 13:33   ` David Vrabel
  2014-08-04 15:15     ` Konrad Rzeszutek Wilk
  2014-08-04 15:15     ` Konrad Rzeszutek Wilk
  2014-07-28 13:33   ` David Vrabel
  3 siblings, 2 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Stefano Stabellini,
	Andrew Jones, linux-kernel

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> From: Olaf Hering <olaf@aepfle.de>
> 
> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> ("xen PVonHVM: move shared_info to MMIO before kexec").
> 
> Currently kexec in a PVonHVM guest fails with a triple fault because the
> new kernel overwrites the shared info page. The exact failure depends on
> the size of the kernel image. This patch moves the pfn from RAM into an
> E820 reserved memory area.
> 
> The pfn containing the shared_info is located somewhere in RAM. This will
> cause trouble if the current kernel is doing a kexec boot into a new
> kernel. The new kernel (and its startup code) can not know where the pfn
> is, so it can not reserve the page. The hypervisor will continue to update
> the pfn, and as a result memory corruption occours in the new kernel.
> 
> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> will usually not allocate areas up to FE700000, so FE700000 is expected
> to work also with older toolstacks.
> 
> In Xen3 there is no reserved area at a fixed location. If the guest is
> started on such old hosts the shared_info page will be placed in RAM. As
> a result kexec can not be used.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)

Is this a useful commit to give?  Isn't it one of Konrad's random trees?


> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  #endif
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +#ifdef CONFIG_XEN_PVHVM
> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL

I would like clearer documentation for this magic address.  Perhaps
referring to some toolstack documentation?

> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>  
>  static uint32_t __init xen_hvm_platform(void)
>  {
> +	uint32_t eax, ebx, ecx, edx, base;
> +
>  	if (xen_pv_domain())
>  		return 0;
>  
> -	return xen_cpuid_base();
> +	base = xen_cpuid_base();
> +	if (!base)
> +		return 0;
> +
> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> +
> +	xen_major = eax >> 16;
> +	xen_minor = eax & 0xffff;
> +
> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
> +
> +	return 1;

You appear to have changed to return value here?  Did you mean to return
base here?

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-15 13:40 ` Vitaly Kuznetsov
                     ` (2 preceding siblings ...)
  2014-07-28 13:33   ` David Vrabel
@ 2014-07-28 13:33   ` David Vrabel
  3 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: linux-kernel, Boris Ostrovsky, Stefano Stabellini, Andrew Jones

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> From: Olaf Hering <olaf@aepfle.de>
> 
> This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> ("xen PVonHVM: move shared_info to MMIO before kexec").
> 
> Currently kexec in a PVonHVM guest fails with a triple fault because the
> new kernel overwrites the shared info page. The exact failure depends on
> the size of the kernel image. This patch moves the pfn from RAM into an
> E820 reserved memory area.
> 
> The pfn containing the shared_info is located somewhere in RAM. This will
> cause trouble if the current kernel is doing a kexec boot into a new
> kernel. The new kernel (and its startup code) can not know where the pfn
> is, so it can not reserve the page. The hypervisor will continue to update
> the pfn, and as a result memory corruption occours in the new kernel.
> 
> The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> will usually not allocate areas up to FE700000, so FE700000 is expected
> to work also with older toolstacks.
> 
> In Xen3 there is no reserved area at a fixed location. If the guest is
> started on such old hosts the shared_info page will be placed in RAM. As
> a result kexec can not be used.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)

Is this a useful commit to give?  Isn't it one of Konrad's random trees?


> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1726,23 +1726,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
>  #endif
>  }
>  
> -void __ref xen_hvm_init_shared_info(void)
> +#ifdef CONFIG_XEN_PVHVM
> +#define HVM_SHARED_INFO_ADDR 0xFE700000UL

I would like clearer documentation for this magic address.  Perhaps
referring to some toolstack documentation?

> @@ -1828,10 +1853,23 @@ static void __init xen_hvm_guest_init(void)
>  
>  static uint32_t __init xen_hvm_platform(void)
>  {
> +	uint32_t eax, ebx, ecx, edx, base;
> +
>  	if (xen_pv_domain())
>  		return 0;
>  
> -	return xen_cpuid_base();
> +	base = xen_cpuid_base();
> +	if (!base)
> +		return 0;
> +
> +	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
> +
> +	xen_major = eax >> 16;
> +	xen_minor = eax & 0xffff;
> +
> +	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
> +
> +	return 1;

You appear to have changed to return value here?  Did you mean to return
base here?

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 ` Vitaly Kuznetsov
                     ` (4 preceding siblings ...)
  2014-07-28 13:36   ` David Vrabel
@ 2014-07-28 13:36   ` David Vrabel
  5 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:36 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Stefano Stabellini,
	Andrew Jones, linux-kernel

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> PVHVM guest requires special actions before kexec. Register specific
> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>  	.notifier_call	= xen_hvm_cpu_notify,
>  };
>  
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	xen_kexec_shutdown();

I think it would be preferrable to have

    if (kexec_in_progress)
        xen_kexec_shutdown();

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown()
  2014-07-15 13:40 ` Vitaly Kuznetsov
                     ` (3 preceding siblings ...)
  2014-07-15 17:41   ` Boris Ostrovsky
@ 2014-07-28 13:36   ` David Vrabel
  2014-07-28 13:36   ` David Vrabel
  5 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:36 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: linux-kernel, Boris Ostrovsky, Stefano Stabellini, Andrew Jones

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> PVHVM guest requires special actions before kexec. Register specific
> xen_pvhvm_kexec_shutdown() handler for machine_ops.shutdown().
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1833,6 +1833,12 @@ static struct notifier_block xen_hvm_cpu_notifier = {
>  	.notifier_call	= xen_hvm_cpu_notify,
>  };
>  
> +static void xen_pvhvm_kexec_shutdown(void)
> +{
> +	xen_kexec_shutdown();

I think it would be preferrable to have

    if (kexec_in_progress)
        xen_kexec_shutdown();

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
  2014-07-15 15:23   ` Konrad Rzeszutek Wilk
@ 2014-07-28 13:43   ` David Vrabel
  2014-07-29 13:50     ` Vitaly Kuznetsov
  2014-07-29 13:50     ` Vitaly Kuznetsov
  2014-07-28 13:43   ` David Vrabel
  3 siblings, 2 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:43 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: Konrad Rzeszutek Wilk, Boris Ostrovsky, Stefano Stabellini,
	Andrew Jones, linux-kernel

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> When kexec is being run PIRQs from Qemu-emulated devices are still
> mapped to old event channels and new kernel has no information about
> that. Trying to map them twice results in the following in Xen's dmesg:
> 
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>  ...
> 
>  and the following in new kernel's dmesg:
> 
>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
> 
> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
> when kexec was requested and on every kernel startup. We need to do this
> twice to deal with the following issues:
> - startup-time unmapping is required to make kdump work;
> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>   is being performed).

I think this should be done only in one place -- just prior to exec'ing
the new kernel (including kdump kernels).

> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>  #ifdef CONFIG_KEXEC
>  	if (!kexec_in_progress)
>  		return;
> +	xen_unmap_all_pirqs();
>  #endif
>  }
>  
> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> index c919d3d..7701c7f 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>  static bool fifo_events = true;
>  module_param(fifo_events, bool, 0);
>  
> +void xen_unmap_all_pirqs(void)
> +{
> +	int pirq, rc, gsi, irq, evtchn;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct irq_info *info;
> +	struct evtchn_close close;
> +
> +	mutex_lock(&irq_mapping_update_lock);
> +
> +	list_for_each_entry(info, &xen_irq_list_head, list) {
> +		if (info->type != IRQT_PIRQ)
> +			continue;

I think you need to do this by querying Xen state rather than relying on
potentially bad guest state.  Particularly since you may crash while
holding irq_mapping_update_lock.

EVTCHNOP_status gets you the info you need I think.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
                     ` (2 preceding siblings ...)
  2014-07-28 13:43   ` David Vrabel
@ 2014-07-28 13:43   ` David Vrabel
  3 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:43 UTC (permalink / raw)
  To: Vitaly Kuznetsov, xen-devel
  Cc: linux-kernel, Boris Ostrovsky, Stefano Stabellini, Andrew Jones

On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> When kexec is being run PIRQs from Qemu-emulated devices are still
> mapped to old event channels and new kernel has no information about
> that. Trying to map them twice results in the following in Xen's dmesg:
> 
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>  ...
> 
>  and the following in new kernel's dmesg:
> 
>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
> 
> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
> when kexec was requested and on every kernel startup. We need to do this
> twice to deal with the following issues:
> - startup-time unmapping is required to make kdump work;
> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>   is being performed).

I think this should be done only in one place -- just prior to exec'ing
the new kernel (including kdump kernels).

> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>  #ifdef CONFIG_KEXEC
>  	if (!kexec_in_progress)
>  		return;
> +	xen_unmap_all_pirqs();
>  #endif
>  }
>  
> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> index c919d3d..7701c7f 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>  static bool fifo_events = true;
>  module_param(fifo_events, bool, 0);
>  
> +void xen_unmap_all_pirqs(void)
> +{
> +	int pirq, rc, gsi, irq, evtchn;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct irq_info *info;
> +	struct evtchn_close close;
> +
> +	mutex_lock(&irq_mapping_update_lock);
> +
> +	list_for_each_entry(info, &xen_irq_list_head, list) {
> +		if (info->type != IRQT_PIRQ)
> +			continue;

I think you need to do this by querying Xen state rather than relying on
potentially bad guest state.  Particularly since you may crash while
holding irq_mapping_update_lock.

EVTCHNOP_status gets you the info you need I think.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 17:20         ` Vitaly Kuznetsov
                             ` (2 preceding siblings ...)
  2014-07-28 13:47           ` David Vrabel
@ 2014-07-28 13:47           ` David Vrabel
  3 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:47 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Konrad Rzeszutek Wilk
  Cc: stefano.stabellini, xen-devel, Boris Ostrovsky, Andrew Jones,
	linux-kernel

On 16/07/14 18:20, Vitaly Kuznetsov wrote:
> 
> Thanks, that really works! I tested both kexec -e / kdump cases. I'm
> wondering if we although need my commit to workaround non-fixed qemus?

I don't think so.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec
  2014-07-16 17:20         ` Vitaly Kuznetsov
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
  2014-07-16 17:30           ` Konrad Rzeszutek Wilk
@ 2014-07-28 13:47           ` David Vrabel
  2014-07-28 13:47           ` David Vrabel
  3 siblings, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-28 13:47 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Konrad Rzeszutek Wilk
  Cc: xen-devel, Boris Ostrovsky, Andrew Jones, linux-kernel,
	stefano.stabellini

On 16/07/14 18:20, Vitaly Kuznetsov wrote:
> 
> Thanks, that really works! I tested both kexec -e / kdump cases. I'm
> wondering if we although need my commit to workaround non-fixed qemus?

I don't think so.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-28 13:43   ` David Vrabel
  2014-07-29 13:50     ` Vitaly Kuznetsov
@ 2014-07-29 13:50     ` Vitaly Kuznetsov
  2014-07-29 15:25       ` David Vrabel
  2014-07-29 15:25       ` [Xen-devel] " David Vrabel
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-29 13:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Stefano Stabellini, Andrew Jones, linux-kernel

David Vrabel <david.vrabel@citrix.com> writes:

> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>> When kexec is being run PIRQs from Qemu-emulated devices are still
>> mapped to old event channels and new kernel has no information about
>> that. Trying to map them twice results in the following in Xen's dmesg:
>> 
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>  ...
>> 
>>  and the following in new kernel's dmesg:
>> 
>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>> 
>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>> when kexec was requested and on every kernel startup. We need to do this
>> twice to deal with the following issues:
>> - startup-time unmapping is required to make kdump work;
>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>   is being performed).
>
> I think this should be done only in one place -- just prior to exec'ing
> the new kernel (including kdump kernels).
>

Thank you for your comments!

The problem I'm fighting wiht atm is: with FIFO-based event channels we
need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
fail. I was intended to put evtchn_fifo_destroy() in
EVTCHNOP_reset. That introduces a problem: we need to deal with
store/console channels. It is possible to remap those from guest with
EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
but we can't do it after we did evtchn_fifo_destroy() and we can't
rebind them after kexec and performing EVTCHNOP_init_control as
we can't remember where these channels were mapped to after kexec/kdump.

I see the following possible solutions:
1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
in that case.

2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
evtchn_fifo_destroy() without closing all channels. Alternatively we can
avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
(as this mode is not being used atm) -- but that would look unobvious.

3) Keep evtchn_fifo_destroy() in EVTCHNOP_reset but keep console/store
channels -- I saw your concerns it is not safe, some sort of additional
blocking will be required.

4) Do the remapping boot time (query for store/console channels ->
perform EVTCHNOP_reset -> rebind with EVTCHNOP_bind_interdomain).

There is an additional problem: EVTCHNOP_bind_interdomain operation has
local port as OUT parameter so we can't guarantee that remapping
store/console channels will remap them to the same local channel they
were mapped before EVTCHNOP_reset (and we have this information in hvm
info: HVM_PARAM_CONSOLE_EVTCHN/HVM_PARAM_STORE_EVTCHN, ...). Not sure
how to deal with that in case we go with remapping.

Your thoughts would be very appreciated. Thank you again,

>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>>  #ifdef CONFIG_KEXEC
>>  	if (!kexec_in_progress)
>>  		return;
>> +	xen_unmap_all_pirqs();
>>  #endif
>>  }
>>  
>> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> index c919d3d..7701c7f 100644
>> --- a/drivers/xen/events/events_base.c
>> +++ b/drivers/xen/events/events_base.c
>> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>>  static bool fifo_events = true;
>>  module_param(fifo_events, bool, 0);
>>  
>> +void xen_unmap_all_pirqs(void)
>> +{
>> +	int pirq, rc, gsi, irq, evtchn;
>> +	struct physdev_unmap_pirq unmap_irq;
>> +	struct irq_info *info;
>> +	struct evtchn_close close;
>> +
>> +	mutex_lock(&irq_mapping_update_lock);
>> +
>> +	list_for_each_entry(info, &xen_irq_list_head, list) {
>> +		if (info->type != IRQT_PIRQ)
>> +			continue;
>
> I think you need to do this by querying Xen state rather than relying on
> potentially bad guest state.  Particularly since you may crash while
> holding irq_mapping_update_lock.
>
> EVTCHNOP_status gets you the info you need I think.
>
> David

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-28 13:43   ` David Vrabel
@ 2014-07-29 13:50     ` Vitaly Kuznetsov
  2014-07-29 13:50     ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-29 13:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

David Vrabel <david.vrabel@citrix.com> writes:

> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>> When kexec is being run PIRQs from Qemu-emulated devices are still
>> mapped to old event channels and new kernel has no information about
>> that. Trying to map them twice results in the following in Xen's dmesg:
>> 
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>  ...
>> 
>>  and the following in new kernel's dmesg:
>> 
>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>> 
>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>> when kexec was requested and on every kernel startup. We need to do this
>> twice to deal with the following issues:
>> - startup-time unmapping is required to make kdump work;
>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>   is being performed).
>
> I think this should be done only in one place -- just prior to exec'ing
> the new kernel (including kdump kernels).
>

Thank you for your comments!

The problem I'm fighting wiht atm is: with FIFO-based event channels we
need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
fail. I was intended to put evtchn_fifo_destroy() in
EVTCHNOP_reset. That introduces a problem: we need to deal with
store/console channels. It is possible to remap those from guest with
EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
but we can't do it after we did evtchn_fifo_destroy() and we can't
rebind them after kexec and performing EVTCHNOP_init_control as
we can't remember where these channels were mapped to after kexec/kdump.

I see the following possible solutions:
1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
in that case.

2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
evtchn_fifo_destroy() without closing all channels. Alternatively we can
avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
(as this mode is not being used atm) -- but that would look unobvious.

3) Keep evtchn_fifo_destroy() in EVTCHNOP_reset but keep console/store
channels -- I saw your concerns it is not safe, some sort of additional
blocking will be required.

4) Do the remapping boot time (query for store/console channels ->
perform EVTCHNOP_reset -> rebind with EVTCHNOP_bind_interdomain).

There is an additional problem: EVTCHNOP_bind_interdomain operation has
local port as OUT parameter so we can't guarantee that remapping
store/console channels will remap them to the same local channel they
were mapped before EVTCHNOP_reset (and we have this information in hvm
info: HVM_PARAM_CONSOLE_EVTCHN/HVM_PARAM_STORE_EVTCHN, ...). Not sure
how to deal with that in case we go with remapping.

Your thoughts would be very appreciated. Thank you again,

>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -768,6 +768,7 @@ void xen_kexec_shutdown(void)
>>  #ifdef CONFIG_KEXEC
>>  	if (!kexec_in_progress)
>>  		return;
>> +	xen_unmap_all_pirqs();
>>  #endif
>>  }
>>  
>> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
>> index c919d3d..7701c7f 100644
>> --- a/drivers/xen/events/events_base.c
>> +++ b/drivers/xen/events/events_base.c
>> @@ -1643,6 +1643,80 @@ void xen_callback_vector(void) {}
>>  static bool fifo_events = true;
>>  module_param(fifo_events, bool, 0);
>>  
>> +void xen_unmap_all_pirqs(void)
>> +{
>> +	int pirq, rc, gsi, irq, evtchn;
>> +	struct physdev_unmap_pirq unmap_irq;
>> +	struct irq_info *info;
>> +	struct evtchn_close close;
>> +
>> +	mutex_lock(&irq_mapping_update_lock);
>> +
>> +	list_for_each_entry(info, &xen_irq_list_head, list) {
>> +		if (info->type != IRQT_PIRQ)
>> +			continue;
>
> I think you need to do this by querying Xen state rather than relying on
> potentially bad guest state.  Particularly since you may crash while
> holding irq_mapping_update_lock.
>
> EVTCHNOP_status gets you the info you need I think.
>
> David

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Xen-devel] [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-29 13:50     ` Vitaly Kuznetsov
  2014-07-29 15:25       ` David Vrabel
@ 2014-07-29 15:25       ` David Vrabel
  2014-07-29 17:06         ` Vitaly Kuznetsov
  2014-07-29 17:06         ` Vitaly Kuznetsov
  1 sibling, 2 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-29 15:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov, David Vrabel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

On 29/07/14 14:50, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>> When kexec is being run PIRQs from Qemu-emulated devices are still
>>> mapped to old event channels and new kernel has no information about
>>> that. Trying to map them twice results in the following in Xen's dmesg:
>>>
>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>>  ...
>>>
>>>  and the following in new kernel's dmesg:
>>>
>>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>>>
>>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>>> when kexec was requested and on every kernel startup. We need to do this
>>> twice to deal with the following issues:
>>> - startup-time unmapping is required to make kdump work;
>>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>>   is being performed).
>>
>> I think this should be done only in one place -- just prior to exec'ing
>> the new kernel (including kdump kernels).
>>
> 
> Thank you for your comments!
> 
> The problem I'm fighting wiht atm is: with FIFO-based event channels we
> need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
> fail. I was intended to put evtchn_fifo_destroy() in
> EVTCHNOP_reset. That introduces a problem: we need to deal with
> store/console channels. It is possible to remap those from guest with
> EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
> but we can't do it after we did evtchn_fifo_destroy() and we can't
> rebind them after kexec and performing EVTCHNOP_init_control as
> we can't remember where these channels were mapped to after kexec/kdump.
> 
> I see the following possible solutions:
> 1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
> EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
> in that case.

EVTCHNOP_init_control is called for each VCPU so I can't see how this
would work.

> 2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
> evtchn_fifo_destroy() without closing all channels. Alternatively we can
> avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
> (as this mode is not being used atm) -- but that would look unobvious.

I would try this.  The guest prior to kexec would then:

1. Use EVTCHNOP_status to query remote end of console and xenstore event
channels.

2. Loop for all event channels:

a. unmap pirq (if required)
b. EVTCHNOP_close

3. EVTCHNOP_fifo_destroy (the implementation of which must verify that
no channels are bound).

4. EVTCHNOP_bind_interdomain to rebind the console and xenstore channels.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-29 13:50     ` Vitaly Kuznetsov
@ 2014-07-29 15:25       ` David Vrabel
  2014-07-29 15:25       ` [Xen-devel] " David Vrabel
  1 sibling, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-29 15:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov, David Vrabel
  Cc: Boris Ostrovsky, xen-devel, Andrew Jones, linux-kernel,
	Stefano Stabellini

On 29/07/14 14:50, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>> When kexec is being run PIRQs from Qemu-emulated devices are still
>>> mapped to old event channels and new kernel has no information about
>>> that. Trying to map them twice results in the following in Xen's dmesg:
>>>
>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>>  ...
>>>
>>>  and the following in new kernel's dmesg:
>>>
>>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>>>
>>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>>> when kexec was requested and on every kernel startup. We need to do this
>>> twice to deal with the following issues:
>>> - startup-time unmapping is required to make kdump work;
>>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>>   is being performed).
>>
>> I think this should be done only in one place -- just prior to exec'ing
>> the new kernel (including kdump kernels).
>>
> 
> Thank you for your comments!
> 
> The problem I'm fighting wiht atm is: with FIFO-based event channels we
> need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
> fail. I was intended to put evtchn_fifo_destroy() in
> EVTCHNOP_reset. That introduces a problem: we need to deal with
> store/console channels. It is possible to remap those from guest with
> EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
> but we can't do it after we did evtchn_fifo_destroy() and we can't
> rebind them after kexec and performing EVTCHNOP_init_control as
> we can't remember where these channels were mapped to after kexec/kdump.
> 
> I see the following possible solutions:
> 1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
> EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
> in that case.

EVTCHNOP_init_control is called for each VCPU so I can't see how this
would work.

> 2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
> evtchn_fifo_destroy() without closing all channels. Alternatively we can
> avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
> (as this mode is not being used atm) -- but that would look unobvious.

I would try this.  The guest prior to kexec would then:

1. Use EVTCHNOP_status to query remote end of console and xenstore event
channels.

2. Loop for all event channels:

a. unmap pirq (if required)
b. EVTCHNOP_close

3. EVTCHNOP_fifo_destroy (the implementation of which must verify that
no channels are bound).

4. EVTCHNOP_bind_interdomain to rebind the console and xenstore channels.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Xen-devel] [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-29 15:25       ` [Xen-devel] " David Vrabel
@ 2014-07-29 17:06         ` Vitaly Kuznetsov
  2014-07-29 17:12           ` David Vrabel
  2014-07-29 17:12           ` David Vrabel
  2014-07-29 17:06         ` Vitaly Kuznetsov
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-29 17:06 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

David Vrabel <david.vrabel@citrix.com> writes:

> On 29/07/14 14:50, Vitaly Kuznetsov wrote:
>> David Vrabel <david.vrabel@citrix.com> writes:
>> 
>>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>>> When kexec is being run PIRQs from Qemu-emulated devices are still
>>>> mapped to old event channels and new kernel has no information about
>>>> that. Trying to map them twice results in the following in Xen's dmesg:
>>>>
>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>>>  ...
>>>>
>>>>  and the following in new kernel's dmesg:
>>>>
>>>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>>>>
>>>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>>>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>>>> when kexec was requested and on every kernel startup. We need to do this
>>>> twice to deal with the following issues:
>>>> - startup-time unmapping is required to make kdump work;
>>>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>>>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>>>   is being performed).
>>>
>>> I think this should be done only in one place -- just prior to exec'ing
>>> the new kernel (including kdump kernels).
>>>
>> 
>> Thank you for your comments!
>> 
>> The problem I'm fighting wiht atm is: with FIFO-based event channels we
>> need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
>> fail. I was intended to put evtchn_fifo_destroy() in
>> EVTCHNOP_reset. That introduces a problem: we need to deal with
>> store/console channels. It is possible to remap those from guest with
>> EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
>> but we can't do it after we did evtchn_fifo_destroy() and we can't
>> rebind them after kexec and performing EVTCHNOP_init_control as
>> we can't remember where these channels were mapped to after kexec/kdump.
>> 
>> I see the following possible solutions:
>> 1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
>> EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
>> in that case.
>
> EVTCHNOP_init_control is called for each VCPU so I can't see how this
> would work.

Right, forgot about that...

>
>> 2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
>> evtchn_fifo_destroy() without closing all channels. Alternatively we can
>> avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
>> (as this mode is not being used atm) -- but that would look unobvious.
>
> I would try this.  The guest prior to kexec would then:
>
> 1. Use EVTCHNOP_status to query remote end of console and xenstore event
> channels.
>
> 2. Loop for all event channels:
>
> a. unmap pirq (if required)
> b. EVTCHNOP_close
>
> 3. EVTCHNOP_fifo_destroy (the implementation of which must verify that
> no channels are bound).
>
> 4. EVTCHNOP_bind_interdomain to rebind the console and xenstore channels.
>

Yea, that's what I have now when I put evtchn_fifo_destroy() in
EVTCHNOP_reset. The problem here is: we can't do
EVTCHNOP_bind_interdomain after we did evtchn_fifo_destroy(), we need to
call EVTCHNOP_init_control first. And we'll do that only after kexec so
we won't remember what we need to remap.. The second issue is the fact
that EVTCHNOP_bind_interdomain will remap store/storage channels to
*some* local ports, not necessary matching hvm info
(HVM_PARAM_CONSOLE_EVTCHN/HVM_PARAM_STORE_EVTCHN)..

Would it be safe is instead of closing interdomain channels on
EVTCHNOP_fifo_destroy we switch evtchn_port_ops to evtchn_port_ops_2l
(so on EVTCHNOP_init_control after kexec we switch back)? I'll try
prototyping this.

Thank you for your comments,

> David

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-29 15:25       ` [Xen-devel] " David Vrabel
  2014-07-29 17:06         ` Vitaly Kuznetsov
@ 2014-07-29 17:06         ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-07-29 17:06 UTC (permalink / raw)
  To: David Vrabel
  Cc: Boris Ostrovsky, xen-devel, Andrew Jones, linux-kernel,
	Stefano Stabellini

David Vrabel <david.vrabel@citrix.com> writes:

> On 29/07/14 14:50, Vitaly Kuznetsov wrote:
>> David Vrabel <david.vrabel@citrix.com> writes:
>> 
>>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>>> When kexec is being run PIRQs from Qemu-emulated devices are still
>>>> mapped to old event channels and new kernel has no information about
>>>> that. Trying to map them twice results in the following in Xen's dmesg:
>>>>
>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>>>  ...
>>>>
>>>>  and the following in new kernel's dmesg:
>>>>
>>>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>>>>
>>>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>>>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>>>> when kexec was requested and on every kernel startup. We need to do this
>>>> twice to deal with the following issues:
>>>> - startup-time unmapping is required to make kdump work;
>>>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>>>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>>>   is being performed).
>>>
>>> I think this should be done only in one place -- just prior to exec'ing
>>> the new kernel (including kdump kernels).
>>>
>> 
>> Thank you for your comments!
>> 
>> The problem I'm fighting wiht atm is: with FIFO-based event channels we
>> need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
>> fail. I was intended to put evtchn_fifo_destroy() in
>> EVTCHNOP_reset. That introduces a problem: we need to deal with
>> store/console channels. It is possible to remap those from guest with
>> EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
>> but we can't do it after we did evtchn_fifo_destroy() and we can't
>> rebind them after kexec and performing EVTCHNOP_init_control as
>> we can't remember where these channels were mapped to after kexec/kdump.
>> 
>> I see the following possible solutions:
>> 1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
>> EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
>> in that case.
>
> EVTCHNOP_init_control is called for each VCPU so I can't see how this
> would work.

Right, forgot about that...

>
>> 2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
>> evtchn_fifo_destroy() without closing all channels. Alternatively we can
>> avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
>> (as this mode is not being used atm) -- but that would look unobvious.
>
> I would try this.  The guest prior to kexec would then:
>
> 1. Use EVTCHNOP_status to query remote end of console and xenstore event
> channels.
>
> 2. Loop for all event channels:
>
> a. unmap pirq (if required)
> b. EVTCHNOP_close
>
> 3. EVTCHNOP_fifo_destroy (the implementation of which must verify that
> no channels are bound).
>
> 4. EVTCHNOP_bind_interdomain to rebind the console and xenstore channels.
>

Yea, that's what I have now when I put evtchn_fifo_destroy() in
EVTCHNOP_reset. The problem here is: we can't do
EVTCHNOP_bind_interdomain after we did evtchn_fifo_destroy(), we need to
call EVTCHNOP_init_control first. And we'll do that only after kexec so
we won't remember what we need to remap.. The second issue is the fact
that EVTCHNOP_bind_interdomain will remap store/storage channels to
*some* local ports, not necessary matching hvm info
(HVM_PARAM_CONSOLE_EVTCHN/HVM_PARAM_STORE_EVTCHN)..

Would it be safe is instead of closing interdomain channels on
EVTCHNOP_fifo_destroy we switch evtchn_port_ops to evtchn_port_ops_2l
(so on EVTCHNOP_init_control after kexec we switch back)? I'll try
prototyping this.

Thank you for your comments,

> David

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Xen-devel] [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-29 17:06         ` Vitaly Kuznetsov
@ 2014-07-29 17:12           ` David Vrabel
  2014-07-29 17:12           ` David Vrabel
  1 sibling, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-29 17:12 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

On 29/07/14 18:06, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 29/07/14 14:50, Vitaly Kuznetsov wrote:
>>> David Vrabel <david.vrabel@citrix.com> writes:
>>>
>>>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>>>> When kexec is being run PIRQs from Qemu-emulated devices are still
>>>>> mapped to old event channels and new kernel has no information about
>>>>> that. Trying to map them twice results in the following in Xen's dmesg:
>>>>>
>>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>>>>  ...
>>>>>
>>>>>  and the following in new kernel's dmesg:
>>>>>
>>>>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>>>>>
>>>>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>>>>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>>>>> when kexec was requested and on every kernel startup. We need to do this
>>>>> twice to deal with the following issues:
>>>>> - startup-time unmapping is required to make kdump work;
>>>>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>>>>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>>>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>>>>   is being performed).
>>>>
>>>> I think this should be done only in one place -- just prior to exec'ing
>>>> the new kernel (including kdump kernels).
>>>>
>>>
>>> Thank you for your comments!
>>>
>>> The problem I'm fighting wiht atm is: with FIFO-based event channels we
>>> need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
>>> fail. I was intended to put evtchn_fifo_destroy() in
>>> EVTCHNOP_reset. That introduces a problem: we need to deal with
>>> store/console channels. It is possible to remap those from guest with
>>> EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
>>> but we can't do it after we did evtchn_fifo_destroy() and we can't
>>> rebind them after kexec and performing EVTCHNOP_init_control as
>>> we can't remember where these channels were mapped to after kexec/kdump.
>>>
>>> I see the following possible solutions:
>>> 1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
>>> EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
>>> in that case.
>>
>> EVTCHNOP_init_control is called for each VCPU so I can't see how this
>> would work.
> 
> Right, forgot about that...
> 
>>
>>> 2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
>>> evtchn_fifo_destroy() without closing all channels. Alternatively we can
>>> avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
>>> (as this mode is not being used atm) -- but that would look unobvious.
>>
>> I would try this.  The guest prior to kexec would then:
>>
>> 1. Use EVTCHNOP_status to query remote end of console and xenstore event
>> channels.
>>
>> 2. Loop for all event channels:
>>
>> a. unmap pirq (if required)
>> b. EVTCHNOP_close
>>
>> 3. EVTCHNOP_fifo_destroy (the implementation of which must verify that
>> no channels are bound).
>>
>> 4. EVTCHNOP_bind_interdomain to rebind the console and xenstore channels.
>>
> 
> Yea, that's what I have now when I put evtchn_fifo_destroy() in
> EVTCHNOP_reset. The problem here is: we can't do
> EVTCHNOP_bind_interdomain after we did evtchn_fifo_destroy(), we need to
> call EVTCHNOP_init_control first. And we'll do that only after kexec so
> we won't remember what we need to remap.. The second issue is the fact
> that EVTCHNOP_bind_interdomain will remap store/storage channels to
> *some* local ports, not necessary matching hvm info
> (HVM_PARAM_CONSOLE_EVTCHN/HVM_PARAM_STORE_EVTCHN)..

You can set the HVM params to match.

> Would it be safe is instead of closing interdomain channels on
> EVTCHNOP_fifo_destroy we switch evtchn_port_ops to evtchn_port_ops_2l
> (so on EVTCHNOP_init_control after kexec we switch back)? I'll try
> prototyping this.

Switching the ops back to the 2l ones is an essential part of
fifo_destroy (the execed kernel might not have fifo support).  I assumed
you were already doing this.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown
  2014-07-29 17:06         ` Vitaly Kuznetsov
  2014-07-29 17:12           ` David Vrabel
@ 2014-07-29 17:12           ` David Vrabel
  1 sibling, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-07-29 17:12 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Boris Ostrovsky, xen-devel, Andrew Jones, linux-kernel,
	Stefano Stabellini

On 29/07/14 18:06, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 29/07/14 14:50, Vitaly Kuznetsov wrote:
>>> David Vrabel <david.vrabel@citrix.com> writes:
>>>
>>>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>>>> When kexec is being run PIRQs from Qemu-emulated devices are still
>>>>> mapped to old event channels and new kernel has no information about
>>>>> that. Trying to map them twice results in the following in Xen's dmesg:
>>>>>
>>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 8 already mapped
>>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 12 already mapped
>>>>>  (XEN) irq.c:2278: dom7: pirq 24 or emuirq 1 already mapped
>>>>>  ...
>>>>>
>>>>>  and the following in new kernel's dmesg:
>>>>>
>>>>>  [   92.286796] xen:events: Failed to obtain physical IRQ 4
>>>>>
>>>>> The result is that the new kernel doesn't recieve IRQs for Qemu-emulated
>>>>> devices. Address the issue by unmapping all mapped PIRQs on kernel shutdown
>>>>> when kexec was requested and on every kernel startup. We need to do this
>>>>> twice to deal with the following issues:
>>>>> - startup-time unmapping is required to make kdump work;
>>>>> - shutdown-time unmapping is required to support kexec-ing non-fixed kernels;
>>>>> - shutdown-time unmapping is required to make Qemu-emulated NICs work after
>>>>>   kexec (event channel is being closed on shutdown but no PHYSDEVOP_unmap_pirq
>>>>>   is being performed).
>>>>
>>>> I think this should be done only in one place -- just prior to exec'ing
>>>> the new kernel (including kdump kernels).
>>>>
>>>
>>> Thank you for your comments!
>>>
>>> The problem I'm fighting wiht atm is: with FIFO-based event channels we
>>> need to call evtchn_fifo_destroy() so next EVTCHNOP_init_control won't
>>> fail. I was intended to put evtchn_fifo_destroy() in
>>> EVTCHNOP_reset. That introduces a problem: we need to deal with
>>> store/console channels. It is possible to remap those from guest with
>>> EVTCHNOP_bind_interdomain (if we remember where they were mapped before)
>>> but we can't do it after we did evtchn_fifo_destroy() and we can't
>>> rebind them after kexec and performing EVTCHNOP_init_control as
>>> we can't remember where these channels were mapped to after kexec/kdump.
>>>
>>> I see the following possible solutions:
>>> 1) We put evtchn_fifo_destroy() in EVTCHNOP_init_control so
>>> EVTCHNOP_init_control can be called twice. No EVTCHNOP_reset is required
>>> in that case.
>>
>> EVTCHNOP_init_control is called for each VCPU so I can't see how this
>> would work.
> 
> Right, forgot about that...
> 
>>
>>> 2) Introduce special (e.g. 'EVTCHNOP_fifo_destroy') hypercall to do
>>> evtchn_fifo_destroy() without closing all channels. Alternatively we can
>>> avoid closing all channels in EVTCHNOP_reset when called with DOMID_SELF
>>> (as this mode is not being used atm) -- but that would look unobvious.
>>
>> I would try this.  The guest prior to kexec would then:
>>
>> 1. Use EVTCHNOP_status to query remote end of console and xenstore event
>> channels.
>>
>> 2. Loop for all event channels:
>>
>> a. unmap pirq (if required)
>> b. EVTCHNOP_close
>>
>> 3. EVTCHNOP_fifo_destroy (the implementation of which must verify that
>> no channels are bound).
>>
>> 4. EVTCHNOP_bind_interdomain to rebind the console and xenstore channels.
>>
> 
> Yea, that's what I have now when I put evtchn_fifo_destroy() in
> EVTCHNOP_reset. The problem here is: we can't do
> EVTCHNOP_bind_interdomain after we did evtchn_fifo_destroy(), we need to
> call EVTCHNOP_init_control first. And we'll do that only after kexec so
> we won't remember what we need to remap.. The second issue is the fact
> that EVTCHNOP_bind_interdomain will remap store/storage channels to
> *some* local ports, not necessary matching hvm info
> (HVM_PARAM_CONSOLE_EVTCHN/HVM_PARAM_STORE_EVTCHN)..

You can set the HVM params to match.

> Would it be safe is instead of closing interdomain channels on
> EVTCHNOP_fifo_destroy we switch evtchn_port_ops to evtchn_port_ops_2l
> (so on EVTCHNOP_init_control after kexec we switch back)? I'll try
> prototyping this.

Switching the ops back to the 2l ones is an essential part of
fifo_destroy (the execed kernel might not have fifo support).  I assumed
you were already doing this.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-07-28 13:24 ` [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec David Vrabel
  2014-08-01 12:21   ` Vitaly Kuznetsov
@ 2014-08-01 12:21   ` Vitaly Kuznetsov
  2014-08-01 13:00     ` David Vrabel
  2014-08-01 13:00     ` David Vrabel
  1 sibling, 2 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-01 12:21 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Stefano Stabellini, Andrew Jones, linux-kernel

David Vrabel <david.vrabel@citrix.com> writes:

> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>> With this patch series I'm trying to address several issues with kexec on pvhvm:
>> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
>> - create specific pvhvm shutdown handler for kexec (2nd patch)
>> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
>> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
>> 
>> This patch series can be tested on single vCPU guest. We still have SMP issues with
>> pvhvm guests and kexec which require additional fixes.
>
> In addition to the fixes for multi-VCPU guests, what else remains?
>

I'm aware of grants and ballooned out pages.

> What's the plan for handling granted pages?
>

(if I got the design right) we have two issues:

1) Pages we grant access to other domains. We have the list so we can
try doing gnttab_end_foreign_access for all unmapped grants but there is
nothing we can do with mapped ones from guest. We can either assume that
all such usages are short-term and try waiting for them to finish or we
need to do something like force-unmap from hypervisor side.

2) Pages we mapped from other domains. There is no easy way to collect
all grant handles from different places in kernel atm so I can see two
possible solutions:
- we keep track of all handles with new kernel structure in guest and
unmap them all on kexec/kdump.
- we introduce new GNTTABOP_reset which does something similar to
gnttab_release_mappings().

There is nothing we need to do with transferred grants (and I don't see
transfer usages in kernel).

Please correct me if I'm wrong.

> I don't think we want to accept a partial solution unless the known
> non-working configurations fail fast on kexec load.

*I think* we can leave ballooned out pages out of scope for now.

Thanks,

>
> David

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-07-28 13:24 ` [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec David Vrabel
@ 2014-08-01 12:21   ` Vitaly Kuznetsov
  2014-08-01 12:21   ` Vitaly Kuznetsov
  1 sibling, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-01 12:21 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

David Vrabel <david.vrabel@citrix.com> writes:

> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>> With this patch series I'm trying to address several issues with kexec on pvhvm:
>> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
>> - create specific pvhvm shutdown handler for kexec (2nd patch)
>> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
>> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
>> 
>> This patch series can be tested on single vCPU guest. We still have SMP issues with
>> pvhvm guests and kexec which require additional fixes.
>
> In addition to the fixes for multi-VCPU guests, what else remains?
>

I'm aware of grants and ballooned out pages.

> What's the plan for handling granted pages?
>

(if I got the design right) we have two issues:

1) Pages we grant access to other domains. We have the list so we can
try doing gnttab_end_foreign_access for all unmapped grants but there is
nothing we can do with mapped ones from guest. We can either assume that
all such usages are short-term and try waiting for them to finish or we
need to do something like force-unmap from hypervisor side.

2) Pages we mapped from other domains. There is no easy way to collect
all grant handles from different places in kernel atm so I can see two
possible solutions:
- we keep track of all handles with new kernel structure in guest and
unmap them all on kexec/kdump.
- we introduce new GNTTABOP_reset which does something similar to
gnttab_release_mappings().

There is nothing we need to do with transferred grants (and I don't see
transfer usages in kernel).

Please correct me if I'm wrong.

> I don't think we want to accept a partial solution unless the known
> non-working configurations fail fast on kexec load.

*I think* we can leave ballooned out pages out of scope for now.

Thanks,

>
> David

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-08-01 12:21   ` Vitaly Kuznetsov
@ 2014-08-01 13:00     ` David Vrabel
  2014-08-04 15:44         ` Vitaly Kuznetsov
  2014-08-01 13:00     ` David Vrabel
  1 sibling, 1 reply; 77+ messages in thread
From: David Vrabel @ 2014-08-01 13:00 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Stefano Stabellini, Andrew Jones, linux-kernel

On 01/08/14 13:21, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>> With this patch series I'm trying to address several issues with kexec on pvhvm:
>>> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
>>> - create specific pvhvm shutdown handler for kexec (2nd patch)
>>> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
>>> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
>>>
>>> This patch series can be tested on single vCPU guest. We still have SMP issues with
>>> pvhvm guests and kexec which require additional fixes.
>>
>> In addition to the fixes for multi-VCPU guests, what else remains?
>>
> 
> I'm aware of grants and ballooned out pages.
> 
>> What's the plan for handling granted pages?
>>
> 
> (if I got the design right) we have two issues:
> 
> 1) Pages we grant access to other domains. We have the list so we can
> try doing gnttab_end_foreign_access for all unmapped grants but there is
> nothing we can do with mapped ones from guest. We can either assume that
> all such usages are short-term and try waiting for them to finish or we
> need to do something like force-unmap from hypervisor side.

Shared rings and persistent grants (used in blkfront) remain mapped for
long periods so just waiting won't work.

Force unmap by the hypervisor might be a possibility but the hypervisor
needs to atomically replace the grant mapping with a different valid
mapping, or the force unmap would cause the backend that was accesses
the pages to fault.

Every writable mapping would have to be replaced with a mapping to a
unique page (to prevent information leaking between different granted
pages).  Read-only mappings could be replaces with a read-only mapping
to shared zero page safely.

The only way I can see how to do this requires co-operation from the
backend kernel -- it would need to provide replacement frames for every
grant map.  Xen would then use this frame when force-unmapping
(revoking) the mapping.

> 2) Pages we mapped from other domains. There is no easy way to collect
> all grant handles from different places in kernel atm so I can see two
> possible solutions:
> - we keep track of all handles with new kernel structure in guest and
> unmap them all on kexec/kdump.
> - we introduce new GNTTABOP_reset which does something similar to
> gnttab_release_mappings().

I think you can ignore this for now -- frontend drivers do not grant
map, but see suggestion about kexec_is_safe below.

> There is nothing we need to do with transferred grants (and I don't see
> transfer usages in kernel).

Agreed.

>> I don't think we want to accept a partial solution unless the known
>> non-working configurations fail fast on kexec load.
> 
> *I think* we can leave ballooned out pages out of scope for now.

I'm thinking we want a global flag (kexec_is_safe) that is cleared even
any unsafe operation (e.g., ballooning, grant mapping) is done by the
guest.  kexec load and exec can then be made to fail if this flag is clear.

This would allow you to fix only the use cases you're interested in
without introducing subtle failures if someone tries an unsupported
configurations.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-08-01 12:21   ` Vitaly Kuznetsov
  2014-08-01 13:00     ` David Vrabel
@ 2014-08-01 13:00     ` David Vrabel
  1 sibling, 0 replies; 77+ messages in thread
From: David Vrabel @ 2014-08-01 13:00 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

On 01/08/14 13:21, Vitaly Kuznetsov wrote:
> David Vrabel <david.vrabel@citrix.com> writes:
> 
>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>> With this patch series I'm trying to address several issues with kexec on pvhvm:
>>> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
>>> - create specific pvhvm shutdown handler for kexec (2nd patch)
>>> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
>>> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
>>>
>>> This patch series can be tested on single vCPU guest. We still have SMP issues with
>>> pvhvm guests and kexec which require additional fixes.
>>
>> In addition to the fixes for multi-VCPU guests, what else remains?
>>
> 
> I'm aware of grants and ballooned out pages.
> 
>> What's the plan for handling granted pages?
>>
> 
> (if I got the design right) we have two issues:
> 
> 1) Pages we grant access to other domains. We have the list so we can
> try doing gnttab_end_foreign_access for all unmapped grants but there is
> nothing we can do with mapped ones from guest. We can either assume that
> all such usages are short-term and try waiting for them to finish or we
> need to do something like force-unmap from hypervisor side.

Shared rings and persistent grants (used in blkfront) remain mapped for
long periods so just waiting won't work.

Force unmap by the hypervisor might be a possibility but the hypervisor
needs to atomically replace the grant mapping with a different valid
mapping, or the force unmap would cause the backend that was accesses
the pages to fault.

Every writable mapping would have to be replaced with a mapping to a
unique page (to prevent information leaking between different granted
pages).  Read-only mappings could be replaces with a read-only mapping
to shared zero page safely.

The only way I can see how to do this requires co-operation from the
backend kernel -- it would need to provide replacement frames for every
grant map.  Xen would then use this frame when force-unmapping
(revoking) the mapping.

> 2) Pages we mapped from other domains. There is no easy way to collect
> all grant handles from different places in kernel atm so I can see two
> possible solutions:
> - we keep track of all handles with new kernel structure in guest and
> unmap them all on kexec/kdump.
> - we introduce new GNTTABOP_reset which does something similar to
> gnttab_release_mappings().

I think you can ignore this for now -- frontend drivers do not grant
map, but see suggestion about kexec_is_safe below.

> There is nothing we need to do with transferred grants (and I don't see
> transfer usages in kernel).

Agreed.

>> I don't think we want to accept a partial solution unless the known
>> non-working configurations fail fast on kexec load.
> 
> *I think* we can leave ballooned out pages out of scope for now.

I'm thinking we want a global flag (kexec_is_safe) that is cleared even
any unsafe operation (e.g., ballooning, grant mapping) is done by the
guest.  kexec load and exec can then be made to fail if this flag is clear.

This would allow you to fix only the use cases you're interested in
without introducing subtle failures if someone tries an unsupported
configurations.

David

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-28 13:33   ` David Vrabel
@ 2014-08-04 15:15     ` Konrad Rzeszutek Wilk
  2014-08-04 15:15     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-08-04 15:15 UTC (permalink / raw)
  To: David Vrabel
  Cc: Vitaly Kuznetsov, xen-devel, Boris Ostrovsky, Stefano Stabellini,
	Andrew Jones, linux-kernel

On Mon, Jul 28, 2014 at 02:33:06PM +0100, David Vrabel wrote:
> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> > From: Olaf Hering <olaf@aepfle.de>
> > 
> > This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> > ("xen PVonHVM: move shared_info to MMIO before kexec").
> > 
> > Currently kexec in a PVonHVM guest fails with a triple fault because the
> > new kernel overwrites the shared info page. The exact failure depends on
> > the size of the kernel image. This patch moves the pfn from RAM into an
> > E820 reserved memory area.
> > 
> > The pfn containing the shared_info is located somewhere in RAM. This will
> > cause trouble if the current kernel is doing a kexec boot into a new
> > kernel. The new kernel (and its startup code) can not know where the pfn
> > is, so it can not reserve the page. The hypervisor will continue to update
> > the pfn, and as a result memory corruption occours in the new kernel.
> > 
> > The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> > E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> > starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> > will usually not allocate areas up to FE700000, so FE700000 is expected
> > to work also with older toolstacks.
> > 
> > In Xen3 there is no reserved area at a fixed location. If the guest is
> > started on such old hosts the shared_info page will be placed in RAM. As
> > a result kexec can not be used.
> > 
> > Signed-off-by: Olaf Hering <olaf@aepfle.de>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
> 
> Is this a useful commit to give?  Isn't it one of Konrad's random trees?

Actually it is from upstream. That patch was reverted at certain point.

Perhaps just mention that it was added in, then reverted, and now after
a year or so, added back in with the fix.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info
  2014-07-28 13:33   ` David Vrabel
  2014-08-04 15:15     ` Konrad Rzeszutek Wilk
@ 2014-08-04 15:15     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 77+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-08-04 15:15 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Vitaly Kuznetsov, Boris Ostrovsky

On Mon, Jul 28, 2014 at 02:33:06PM +0100, David Vrabel wrote:
> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
> > From: Olaf Hering <olaf@aepfle.de>
> > 
> > This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
> > ("xen PVonHVM: move shared_info to MMIO before kexec").
> > 
> > Currently kexec in a PVonHVM guest fails with a triple fault because the
> > new kernel overwrites the shared info page. The exact failure depends on
> > the size of the kernel image. This patch moves the pfn from RAM into an
> > E820 reserved memory area.
> > 
> > The pfn containing the shared_info is located somewhere in RAM. This will
> > cause trouble if the current kernel is doing a kexec boot into a new
> > kernel. The new kernel (and its startup code) can not know where the pfn
> > is, so it can not reserve the page. The hypervisor will continue to update
> > the pfn, and as a result memory corruption occours in the new kernel.
> > 
> > The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
> > E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
> > starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
> > will usually not allocate areas up to FE700000, so FE700000 is expected
> > to work also with older toolstacks.
> > 
> > In Xen3 there is no reserved area at a fixed location. If the guest is
> > started on such old hosts the shared_info page will be placed in RAM. As
> > a result kexec can not be used.
> > 
> > Signed-off-by: Olaf Hering <olaf@aepfle.de>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > (cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)
> 
> Is this a useful commit to give?  Isn't it one of Konrad's random trees?

Actually it is from upstream. That patch was reverted at certain point.

Perhaps just mention that it was added in, then reverted, and now after
a year or so, added back in with the fix.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
  2014-08-01 13:00     ` David Vrabel
@ 2014-08-04 15:44         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-04 15:44 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
	Stefano Stabellini, Andrew Jones, linux-kernel

David Vrabel <david.vrabel@citrix.com> writes:

> On 01/08/14 13:21, Vitaly Kuznetsov wrote:
>> David Vrabel <david.vrabel@citrix.com> writes:
>> 
>>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>>> With this patch series I'm trying to address several issues with kexec on pvhvm:
>>>> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
>>>> - create specific pvhvm shutdown handler for kexec (2nd patch)
>>>> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
>>>> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
>>>>
>>>> This patch series can be tested on single vCPU guest. We still have SMP issues with
>>>> pvhvm guests and kexec which require additional fixes.
>>>
>>> In addition to the fixes for multi-VCPU guests, what else remains?
>>>
>> 
>> I'm aware of grants and ballooned out pages.
>> 
>>> What's the plan for handling granted pages?
>>>
>> 
>> (if I got the design right) we have two issues:
>> 
>> 1) Pages we grant access to other domains. We have the list so we can
>> try doing gnttab_end_foreign_access for all unmapped grants but there is
>> nothing we can do with mapped ones from guest. We can either assume that
>> all such usages are short-term and try waiting for them to finish or we
>> need to do something like force-unmap from hypervisor side.
>
> Shared rings and persistent grants (used in blkfront) remain mapped for
> long periods so just waiting won't work.
>
> Force unmap by the hypervisor might be a possibility but the hypervisor
> needs to atomically replace the grant mapping with a different valid
> mapping, or the force unmap would cause the backend that was accesses
> the pages to fault.
>
> Every writable mapping would have to be replaced with a mapping to a
> unique page (to prevent information leaking between different granted
> pages).  Read-only mappings could be replaces with a read-only mapping
> to shared zero page safely.
>
> The only way I can see how to do this requires co-operation from the
> backend kernel -- it would need to provide replacement frames for every
> grant map.  Xen would then use this frame when force-unmapping
> (revoking) the mapping.

We can introduce something like GNTTABOP_safe_map_grant_ref op with such
replacement frames and use then in 'force unmap' but to be honest I'd
like to avoid that.. What if xen could allocate and provide replacement
pages on its own? I understand we need to take some precautions against
malicious domains hogging all memory with such approach..

Actually if we narrow down our case to backend/frontend usage only we
can think that grants do not introduce any issue for kexec/kdump because:
- frontends are being closed on kexec so backends unmap corresponding
grants. That's not true for persistent grant case as persistent grants
are being unmaped only in xen_blkif_free() but not in
xen_blkif_disconnect() atm. The following patch helps:

diff --git a/drivers/block/xen-blkback/xenbus.c
b/drivers/block/xen-blkback/xenbus.c
index 3a8b810..54f4089 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -270,6 +270,9 @@ static int xen_blkif_disconnect(struct xen_blkif
*blkif)
                blkif->blk_rings.common.sring = NULL;
        }
 
+       /* Remove all persistent grants and the cache of ballooned
pages. */
+       xen_blkbk_free_caches(blkif);
+
        return 0;
 }
 
@@ -281,9 +284,6 @@ static void xen_blkif_free(struct xen_blkif *blkif)
        xen_blkif_disconnect(blkif);
        xen_vbd_free(&blkif->vbd);
 
-       /* Remove all persistent grants and the cache of ballooned
        pages. */
-       xen_blkbk_free_caches(blkif);
-
        /* Make sure everything is drained before shutting down */
        BUG_ON(blkif->persistent_gnt_c != 0);
        BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);


- in kdump case our new kernel lives in its own space so we won't hit
granted pages (we can read some crap when collecting memory dump but
it's not any better in case these pages were unmapped)

- frontends are being reinitialized on new kernel startup, that also
causes the unmap (with the patch from above applied).

So I think we need to do something with grants in two cases: they were
used outside frontends/backends and we have no idea if they are going to
be unmapped (can be solved with kexec_is_safe flag) and when backend got
stuck and refused to close/unmap (should be rare).

>
>> 2) Pages we mapped from other domains. There is no easy way to collect
>> all grant handles from different places in kernel atm so I can see two
>> possible solutions:
>> - we keep track of all handles with new kernel structure in guest and
>> unmap them all on kexec/kdump.
>> - we introduce new GNTTABOP_reset which does something similar to
>> gnttab_release_mappings().
>
> I think you can ignore this for now -- frontend drivers do not grant
> map, but see suggestion about kexec_is_safe below.
>
>> There is nothing we need to do with transferred grants (and I don't see
>> transfer usages in kernel).
>
> Agreed.
>
>>> I don't think we want to accept a partial solution unless the known
>>> non-working configurations fail fast on kexec load.
>> 
>> *I think* we can leave ballooned out pages out of scope for now.
>
> I'm thinking we want a global flag (kexec_is_safe) that is cleared even
> any unsafe operation (e.g., ballooning, grant mapping) is done by the
> guest.  kexec load and exec can then be made to fail if this flag is clear.
>
> This would allow you to fix only the use cases you're interested in
> without introducing subtle failures if someone tries an unsupported
> configurations.

Makes sense but it would also be nice to have this flag supported in
kexec utility itself (so it will report something like "kexec is not
safe, use '-f' to override...).

Thanks,

>
> David

-- 
  Vitaly

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec
@ 2014-08-04 15:44         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 77+ messages in thread
From: Vitaly Kuznetsov @ 2014-08-04 15:44 UTC (permalink / raw)
  To: David Vrabel
  Cc: Andrew Jones, Stefano Stabellini, linux-kernel, xen-devel,
	Boris Ostrovsky

David Vrabel <david.vrabel@citrix.com> writes:

> On 01/08/14 13:21, Vitaly Kuznetsov wrote:
>> David Vrabel <david.vrabel@citrix.com> writes:
>> 
>>> On 15/07/14 14:40, Vitaly Kuznetsov wrote:
>>>> With this patch series I'm trying to address several issues with kexec on pvhvm:
>>>> - shared_info issue (1st patch, just sending Olaf's work with Konrad's fix)
>>>> - create specific pvhvm shutdown handler for kexec (2nd patch)
>>>> - GSI PIRQ issue (3rd patch, I'm pretty confident that it does the right thing)
>>>> - MSI PIRQ issue (4th patch, and I'm not sure it doesn't break anything -> RFC)
>>>>
>>>> This patch series can be tested on single vCPU guest. We still have SMP issues with
>>>> pvhvm guests and kexec which require additional fixes.
>>>
>>> In addition to the fixes for multi-VCPU guests, what else remains?
>>>
>> 
>> I'm aware of grants and ballooned out pages.
>> 
>>> What's the plan for handling granted pages?
>>>
>> 
>> (if I got the design right) we have two issues:
>> 
>> 1) Pages we grant access to other domains. We have the list so we can
>> try doing gnttab_end_foreign_access for all unmapped grants but there is
>> nothing we can do with mapped ones from guest. We can either assume that
>> all such usages are short-term and try waiting for them to finish or we
>> need to do something like force-unmap from hypervisor side.
>
> Shared rings and persistent grants (used in blkfront) remain mapped for
> long periods so just waiting won't work.
>
> Force unmap by the hypervisor might be a possibility but the hypervisor
> needs to atomically replace the grant mapping with a different valid
> mapping, or the force unmap would cause the backend that was accesses
> the pages to fault.
>
> Every writable mapping would have to be replaced with a mapping to a
> unique page (to prevent information leaking between different granted
> pages).  Read-only mappings could be replaces with a read-only mapping
> to shared zero page safely.
>
> The only way I can see how to do this requires co-operation from the
> backend kernel -- it would need to provide replacement frames for every
> grant map.  Xen would then use this frame when force-unmapping
> (revoking) the mapping.

We can introduce something like GNTTABOP_safe_map_grant_ref op with such
replacement frames and use then in 'force unmap' but to be honest I'd
like to avoid that.. What if xen could allocate and provide replacement
pages on its own? I understand we need to take some precautions against
malicious domains hogging all memory with such approach..

Actually if we narrow down our case to backend/frontend usage only we
can think that grants do not introduce any issue for kexec/kdump because:
- frontends are being closed on kexec so backends unmap corresponding
grants. That's not true for persistent grant case as persistent grants
are being unmaped only in xen_blkif_free() but not in
xen_blkif_disconnect() atm. The following patch helps:

diff --git a/drivers/block/xen-blkback/xenbus.c
b/drivers/block/xen-blkback/xenbus.c
index 3a8b810..54f4089 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -270,6 +270,9 @@ static int xen_blkif_disconnect(struct xen_blkif
*blkif)
                blkif->blk_rings.common.sring = NULL;
        }
 
+       /* Remove all persistent grants and the cache of ballooned
pages. */
+       xen_blkbk_free_caches(blkif);
+
        return 0;
 }
 
@@ -281,9 +284,6 @@ static void xen_blkif_free(struct xen_blkif *blkif)
        xen_blkif_disconnect(blkif);
        xen_vbd_free(&blkif->vbd);
 
-       /* Remove all persistent grants and the cache of ballooned
        pages. */
-       xen_blkbk_free_caches(blkif);
-
        /* Make sure everything is drained before shutting down */
        BUG_ON(blkif->persistent_gnt_c != 0);
        BUG_ON(atomic_read(&blkif->persistent_gnt_in_use) != 0);


- in kdump case our new kernel lives in its own space so we won't hit
granted pages (we can read some crap when collecting memory dump but
it's not any better in case these pages were unmapped)

- frontends are being reinitialized on new kernel startup, that also
causes the unmap (with the patch from above applied).

So I think we need to do something with grants in two cases: they were
used outside frontends/backends and we have no idea if they are going to
be unmapped (can be solved with kexec_is_safe flag) and when backend got
stuck and refused to close/unmap (should be rare).

>
>> 2) Pages we mapped from other domains. There is no easy way to collect
>> all grant handles from different places in kernel atm so I can see two
>> possible solutions:
>> - we keep track of all handles with new kernel structure in guest and
>> unmap them all on kexec/kdump.
>> - we introduce new GNTTABOP_reset which does something similar to
>> gnttab_release_mappings().
>
> I think you can ignore this for now -- frontend drivers do not grant
> map, but see suggestion about kexec_is_safe below.
>
>> There is nothing we need to do with transferred grants (and I don't see
>> transfer usages in kernel).
>
> Agreed.
>
>>> I don't think we want to accept a partial solution unless the known
>>> non-working configurations fail fast on kexec load.
>> 
>> *I think* we can leave ballooned out pages out of scope for now.
>
> I'm thinking we want a global flag (kexec_is_safe) that is cleared even
> any unsafe operation (e.g., ballooning, grant mapping) is done by the
> guest.  kexec load and exec can then be made to fail if this flag is clear.
>
> This would allow you to fix only the use cases you're interested in
> without introducing subtle failures if someone tries an unsupported
> configurations.

Makes sense but it would also be nice to have this flag supported in
kexec utility itself (so it will report something like "kexec is not
safe, use '-f' to override...).

Thanks,

>
> David

-- 
  Vitaly

^ permalink raw reply related	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2014-08-04 15:49 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-15 13:40 [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec Vitaly Kuznetsov
2014-07-15 13:40 ` [PATCH RFC 1/4] xen PVonHVM: use E820_Reserved area for shared_info Vitaly Kuznetsov
2014-07-15 13:40 ` Vitaly Kuznetsov
2014-07-15 15:06   ` Konrad Rzeszutek Wilk
2014-07-15 15:06   ` Konrad Rzeszutek Wilk
2014-07-15 15:43     ` Vitaly Kuznetsov
2014-07-15 15:43     ` Vitaly Kuznetsov
2014-07-15 15:50       ` Konrad Rzeszutek Wilk
2014-07-18 11:05         ` Vitaly Kuznetsov
2014-07-18 13:56           ` Konrad Rzeszutek Wilk
2014-07-18 13:56           ` Konrad Rzeszutek Wilk
2014-07-18 15:45             ` Vitaly Kuznetsov
2014-07-18 15:45             ` Vitaly Kuznetsov
2014-07-18 11:05         ` Vitaly Kuznetsov
2014-07-15 15:50       ` Konrad Rzeszutek Wilk
2014-07-28 13:33   ` David Vrabel
2014-08-04 15:15     ` Konrad Rzeszutek Wilk
2014-08-04 15:15     ` Konrad Rzeszutek Wilk
2014-07-28 13:33   ` David Vrabel
2014-07-15 13:40 ` [PATCH RFC 2/4] xen/pvhvm: Introduce xen_pvhvm_kexec_shutdown() Vitaly Kuznetsov
2014-07-15 13:40 ` Vitaly Kuznetsov
2014-07-15 15:09   ` Konrad Rzeszutek Wilk
2014-07-15 15:52     ` Vitaly Kuznetsov
2014-07-15 15:52     ` Vitaly Kuznetsov
2014-07-15 15:58       ` Konrad Rzeszutek Wilk
2014-07-15 15:58       ` Konrad Rzeszutek Wilk
2014-07-15 15:09   ` Konrad Rzeszutek Wilk
2014-07-15 17:41   ` Boris Ostrovsky
2014-07-15 17:41   ` Boris Ostrovsky
2014-07-28 13:36   ` David Vrabel
2014-07-28 13:36   ` David Vrabel
2014-07-15 13:40 ` [PATCH RFC 3/4] xen/pvhvm: Unmap all PIRQs on startup and shutdown Vitaly Kuznetsov
2014-07-15 15:23   ` Konrad Rzeszutek Wilk
2014-07-15 15:23   ` Konrad Rzeszutek Wilk
2014-07-16  9:37     ` Vitaly Kuznetsov
2014-07-16  9:37     ` Vitaly Kuznetsov
2014-07-16 13:45       ` Konrad Rzeszutek Wilk
2014-07-16 16:34         ` Vitaly Kuznetsov
2014-07-16 16:34         ` Vitaly Kuznetsov
2014-07-16 13:45       ` Konrad Rzeszutek Wilk
2014-07-28 13:43   ` David Vrabel
2014-07-29 13:50     ` Vitaly Kuznetsov
2014-07-29 13:50     ` Vitaly Kuznetsov
2014-07-29 15:25       ` David Vrabel
2014-07-29 15:25       ` [Xen-devel] " David Vrabel
2014-07-29 17:06         ` Vitaly Kuznetsov
2014-07-29 17:12           ` David Vrabel
2014-07-29 17:12           ` David Vrabel
2014-07-29 17:06         ` Vitaly Kuznetsov
2014-07-28 13:43   ` David Vrabel
2014-07-15 13:40 ` Vitaly Kuznetsov
2014-07-15 13:40 ` [PATCH RFC 4/4] xen/pvhvm: Make MSI IRQs work after kexec Vitaly Kuznetsov
2014-07-15 15:21   ` Konrad Rzeszutek Wilk
2014-07-15 15:21   ` Konrad Rzeszutek Wilk
2014-07-16  9:01     ` Vitaly Kuznetsov
2014-07-16 13:40       ` Konrad Rzeszutek Wilk
2014-07-16 17:20         ` Vitaly Kuznetsov
2014-07-16 17:20         ` Vitaly Kuznetsov
2014-07-16 17:30           ` Konrad Rzeszutek Wilk
2014-07-16 17:30           ` Konrad Rzeszutek Wilk
2014-07-17  8:12             ` Vitaly Kuznetsov
2014-07-17  8:12             ` Vitaly Kuznetsov
2014-07-28 13:47           ` David Vrabel
2014-07-28 13:47           ` David Vrabel
2014-07-21 14:13         ` Stefano Stabellini
2014-07-21 14:13         ` Stefano Stabellini
2014-07-16 13:40       ` Konrad Rzeszutek Wilk
2014-07-16  9:01     ` Vitaly Kuznetsov
2014-07-15 13:40 ` Vitaly Kuznetsov
2014-07-28 13:24 ` [PATCH RFC 0/4] xen/pvhvm: fix shared_info and pirq issues with kexec David Vrabel
2014-08-01 12:21   ` Vitaly Kuznetsov
2014-08-01 12:21   ` Vitaly Kuznetsov
2014-08-01 13:00     ` David Vrabel
2014-08-04 15:44       ` Vitaly Kuznetsov
2014-08-04 15:44         ` Vitaly Kuznetsov
2014-08-01 13:00     ` David Vrabel
2014-07-28 13:24 ` David Vrabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.