All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests
@ 2011-08-04 16:20 Olaf Hering
  2011-08-04 16:20 ` [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel Olaf Hering
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Olaf Hering @ 2011-08-04 16:20 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, Konrad; +Cc: xen-devel



The following series implements kexec and kdump in a Xen PVonHVM guest.

It is (should be) available via git:

 git://github.com/olafhering/linux.git xen-kexec-3.0

The kexec or kdump kernel has to take care of already allocated virqs,
PV devices in Closed or Connected state, and of registered watches in
the old kernel. With the three patches these conditions are checked
during boot of the new kernel rather than in the reboot/crash path.

A fixed kexec-tools-2.0.2 package is required:
http://lists.infradead.org/pipermail/kexec/2011-May/005026.html
http://lists.infradead.org/pipermail/kexec/2011-August/005339.html

Another fix is for xenstored, it has to accept the XS_INTRODUCE from a guest:
http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00007.html

One open issue is the balloon driver. It removes pages from the guest
and gives them back to the hypervisor. The kexec kernel is not aware of
the fact that some pages are unavailable, and hangs or crashes.
The workaround for the time being is:

if test -f /sys/devices/system/xen_memory/xen_memory0/target -a \
        -f /sys/devices/system/xen_memory/xen_memory0/target_kb
then
    cat /sys/devices/system/xen_memory/xen_memory0/target > \
        /sys/devices/system/xen_memory/xen_memory0/target_kb
fi
kexec -e

This has to be resolved with another series of changes.

Olaf



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel
  2011-08-04 16:20 [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Olaf Hering
@ 2011-08-04 16:20 ` Olaf Hering
  2011-08-09  9:15     ` Ian Campbell
  2011-08-04 16:20 ` [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports Olaf Hering
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Olaf Hering @ 2011-08-04 16:20 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, Konrad; +Cc: xen-devel

[-- Attachment #1: xen.kexec.xs_introduce.patch --]
[-- Type: text/plain, Size: 3593 bytes --]

Add new xs_introduce function to shutdown watches from old kernel after
kexec boot.  The old kernel does not unregister all watches in the
shutdown path.  They are still active, the double registration can not
be detected by the new kernel.  When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL).  An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.

With this change the xenstored is instructed to wipe all active watches
for the guest.  However, a patch for xenstored is required so that it
accepts the XS_INTRODUCE request from a guest. Up to now such a request
was only allowed from the xen tool stack in dom0.

http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00007.html

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 drivers/xen/xenbus/xenbus_comms.h |    2 +-
 drivers/xen/xenbus/xenbus_probe.c |    2 +-
 drivers/xen/xenbus/xenbus_xs.c    |   27 ++++++++++++++++++++++++++-
 3 files changed, 28 insertions(+), 3 deletions(-)

Index: linux-3.0/drivers/xen/xenbus/xenbus_comms.h
===================================================================
--- linux-3.0.orig/drivers/xen/xenbus/xenbus_comms.h
+++ linux-3.0/drivers/xen/xenbus/xenbus_comms.h
@@ -31,7 +31,7 @@
 #ifndef _XENBUS_COMMS_H
 #define _XENBUS_COMMS_H
 
-int xs_init(void);
+int xs_init(unsigned long xen_store_mfn);
 int xb_init_comms(void);
 
 /* Low level routines. */
Index: linux-3.0/drivers/xen/xenbus/xenbus_probe.c
===================================================================
--- linux-3.0.orig/drivers/xen/xenbus/xenbus_probe.c
+++ linux-3.0/drivers/xen/xenbus/xenbus_probe.c
@@ -757,7 +757,7 @@ static int __init xenbus_init(void)
 	}
 
 	/* Initialize the interface to xenstore. */
-	err = xs_init();
+	err = xs_init(xen_store_mfn);
 	if (err) {
 		printk(KERN_WARNING
 		       "XENBUS: Error initializing xenstore comms: %i\n", err);
Index: linux-3.0/drivers/xen/xenbus/xenbus_xs.c
===================================================================
--- linux-3.0.orig/drivers/xen/xenbus/xenbus_xs.c
+++ linux-3.0/drivers/xen/xenbus/xenbus_xs.c
@@ -620,6 +620,20 @@ static struct xenbus_watch *find_watch(c
 	return NULL;
 }
 
+static int xs_introduce(const char *domid, const char *mfn, const char *port)
+{
+	struct kvec iov[3];
+
+	iov[0].iov_base = (char *)domid;
+	iov[0].iov_len = strlen(domid) + 1;
+	iov[1].iov_base = (char *)mfn;
+	iov[1].iov_len = strlen(mfn) + 1;
+	iov[2].iov_base = (char *)port;
+	iov[2].iov_len = strlen(port) + 1;
+
+	return xs_error(xs_talkv(XBT_NIL, XS_INTRODUCE, iov,
+				 ARRAY_SIZE(iov), NULL));
+}
 /* Register callback to watch this node. */
 int register_xenbus_watch(struct xenbus_watch *watch)
 {
@@ -867,10 +881,11 @@ static int xenbus_thread(void *unused)
 	return 0;
 }
 
-int xs_init(void)
+int xs_init(unsigned long xen_store_mfn)
 {
 	int err;
 	struct task_struct *task;
+	char domid[12], mfn[24], port[24];
 
 	INIT_LIST_HEAD(&xs_state.reply_list);
 	spin_lock_init(&xs_state.reply_lock);
@@ -897,5 +912,15 @@ int xs_init(void)
 	if (IS_ERR(task))
 		return PTR_ERR(task);
 
+	snprintf(domid, sizeof(domid), "%u", DOMID_SELF);
+	snprintf(mfn, sizeof(mfn), "%lu", xen_store_mfn);
+	snprintf(port, sizeof(port), "%d", xen_store_evtchn);
+	if (xen_hvm_domain()) {
+		/* shutdown watches for kexec boot */
+		err = xs_introduce(domid, mfn, port);
+		if (err)
+			printk(KERN_WARNING "xs_introduce failed: %d\n", err);
+	}
+
 	return 0;
 }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  2011-08-04 16:20 [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Olaf Hering
  2011-08-04 16:20 ` [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel Olaf Hering
@ 2011-08-04 16:20 ` Olaf Hering
  2011-08-09  9:17   ` [Xen-devel] " Ian Campbell
  2011-08-04 16:20 ` [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel Olaf Hering
  2011-08-09 19:02 ` [Xen-devel] [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 18+ messages in thread
From: Olaf Hering @ 2011-08-04 16:20 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, Konrad; +Cc: xen-devel

[-- Attachment #1: xen.kexec.find_virq.patch --]
[-- Type: text/plain, Size: 2109 bytes --]

During a kexec boot some virqs such as timer and debugirq were already
registered by the old kernel.  The hypervisor will return -EEXISTS from
the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
triggers.  Catch the -EEXISTS error and loop through all possible ports to find
what port belongs to the virq/cpu combo.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 drivers/xen/events.c |   40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

Index: linux-3.0/drivers/xen/events.c
===================================================================
--- linux-3.0.orig/drivers/xen/events.c
+++ linux-3.0/drivers/xen/events.c
@@ -877,11 +877,35 @@ static int bind_interdomain_evtchn_to_ir
 	return err ? : bind_evtchn_to_irq(bind_interdomain.local_port);
 }
 
+/* BITS_PER_LONG is used in Xen */
+#define MAX_EVTCHNS (64 * 64)
+
+static int find_virq(unsigned int virq, unsigned int cpu)
+{
+	struct evtchn_status status;
+	int port, rc = -ENOENT;
+
+	memset(&status, 0, sizeof(status));
+	for (port = 0; port <= MAX_EVTCHNS; port++) {
+		status.dom = DOMID_SELF;
+		status.port = port;
+		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
+		if (rc < 0)
+			continue;
+		if (status.status != EVTCHNSTAT_virq)
+			continue;
+		if (status.u.virq == virq && status.vcpu == cpu) {
+			rc = port;
+			break;
+		}
+	}
+	return rc;
+}
 
 int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
 {
 	struct evtchn_bind_virq bind_virq;
-	int evtchn, irq;
+	int evtchn, irq, ret;
 
 	spin_lock(&irq_mapping_update_lock);
 
@@ -897,10 +921,16 @@ int bind_virq_to_irq(unsigned int virq,
 
 		bind_virq.virq = virq;
 		bind_virq.vcpu = cpu;
-		if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
-						&bind_virq) != 0)
-			BUG();
-		evtchn = bind_virq.port;
+		ret = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+						&bind_virq);
+		if (ret == 0)
+			evtchn = bind_virq.port;
+		else {
+			if (ret == -EEXIST)
+				ret = find_virq(virq, cpu);
+			BUG_ON(ret < 0);
+			evtchn = ret;
+		}
 
 		xen_irq_info_virq_init(cpu, irq, evtchn, virq);
 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  2011-08-04 16:20 [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Olaf Hering
  2011-08-04 16:20 ` [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel Olaf Hering
  2011-08-04 16:20 ` [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports Olaf Hering
@ 2011-08-04 16:20 ` Olaf Hering
  2011-08-09  9:23   ` [Xen-devel] " Ian Campbell
  2011-08-09 19:02 ` [Xen-devel] [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Konrad Rzeszutek Wilk
  3 siblings, 1 reply; 18+ messages in thread
From: Olaf Hering @ 2011-08-04 16:20 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, Konrad; +Cc: xen-devel

[-- Attachment #1: xen.kexec-kdump.reconnect-devices.patch --]
[-- Type: text/plain, Size: 5291 bytes --]

After triggering a crash dump in a HVM guest, the PV backend drivers
will remain in Connected state. When the kdump kernel starts the PV
drivers will skip such devices. As a result, no root device is found and
the vmcore cant be saved.  

A similar situation happens after a kexec boot, here the devices will be
in the Closed state.

With this change all frontend devices with state XenbusStateConnected or
XenbusStateClosed will be reset by changing the state file to
Closing/Closed/Initializing.  This will trigger a disconnect in the
backend drivers. Now the frontend drivers will find the backend drivers
in state Initwait and can connect.

Signed-off-by: Olaf Hering <olaf@aepfle.de>

---
 drivers/xen/xenbus/xenbus_comms.c          |    4 -
 drivers/xen/xenbus/xenbus_probe_frontend.c |  116 +++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+), 1 deletion(-)

Index: linux-3.0/drivers/xen/xenbus/xenbus_comms.c
===================================================================
--- linux-3.0.orig/drivers/xen/xenbus/xenbus_comms.c
+++ linux-3.0/drivers/xen/xenbus/xenbus_comms.c
@@ -212,7 +212,9 @@ int xb_init_comms(void)
 		printk(KERN_WARNING "XENBUS response ring is not quiescent "
 		       "(%08x:%08x): fixing up\n",
 		       intf->rsp_cons, intf->rsp_prod);
-		intf->rsp_cons = intf->rsp_prod;
+		/* breaks kdump */
+		if (!reset_devices)
+			intf->rsp_cons = intf->rsp_prod;
 	}
 
 	if (xenbus_irq) {
Index: linux-3.0/drivers/xen/xenbus/xenbus_probe_frontend.c
===================================================================
--- linux-3.0.orig/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ linux-3.0/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -252,10 +252,126 @@ int __xenbus_register_frontend(struct xe
 }
 EXPORT_SYMBOL_GPL(__xenbus_register_frontend);
 
+static DECLARE_WAIT_QUEUE_HEAD(backend_state_wq);
+static int backend_state;
+
+static void xenbus_reset_backend_state_changed(struct xenbus_watch *w,
+					const char **v, unsigned int l)
+{
+	xenbus_scanf(XBT_NIL, v[XS_WATCH_PATH], "", "%i", &backend_state);
+	printk(KERN_DEBUG "XENBUS: %s %s\n",
+			v[XS_WATCH_PATH], xenbus_strstate(backend_state));
+	wake_up(&backend_state_wq);
+}
+
+static void xenbus_reset_wait_for_backend(int expected)
+{
+	wait_event_interruptible(backend_state_wq, backend_state == expected);
+}
+
+/*
+ * Reset frontend if it is in Connected or Closed state.
+ * Wait for backend to catch up.
+ * State Connected happens during kdump, Closed after kexec.
+ */
+static void xenbus_reset_frontend(char *fe, char *be, int be_state)
+{
+	struct xenbus_watch be_watch;
+
+	printk(KERN_DEBUG "XENBUS: backend %s %s\n",
+			be, xenbus_strstate(be_state));
+
+	memset(&be_watch, 0, sizeof(be_watch));
+	be_watch.node = kasprintf(GFP_NOIO | __GFP_HIGH, "%s/state", be);
+	if (!be_watch.node)
+		return;
+
+	be_watch.callback = xenbus_reset_backend_state_changed;
+	backend_state = XenbusStateUnknown;
+
+	printk(KERN_INFO "XENBUS: triggering reconnect on %s\n", be);
+	register_xenbus_watch(&be_watch);
+
+	switch (be_state) {
+	case XenbusStateConnected:
+		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosing);
+		xenbus_reset_wait_for_backend(XenbusStateClosing);
+
+	case XenbusStateClosing:
+		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosed);
+		xenbus_reset_wait_for_backend(XenbusStateClosed);
+
+	case XenbusStateClosed:
+		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateInitialising);
+		xenbus_reset_wait_for_backend(XenbusStateInitWait);
+	}
+
+	unregister_xenbus_watch(&be_watch);
+	printk(KERN_INFO "XENBUS: reconnect done on %s\n", be);
+	kfree(be_watch.node);
+}
+
+static void xenbus_check_frontend(char *class, char *dev)
+{
+	int be_state, fe_state, err;
+	char *backend, *frontend;
+
+	frontend = kasprintf(GFP_NOIO | __GFP_HIGH, "device/%s/%s", class, dev);
+	if (!frontend)
+		return;
+
+	err = xenbus_scanf(XBT_NIL, frontend, "state", "%i", &fe_state);
+	if (err != 1)
+		goto out;
+
+	switch (fe_state) {
+	case XenbusStateConnected:
+	case XenbusStateClosed:
+		printk(KERN_DEBUG "XENBUS: frontend %s %s\n",
+				frontend, xenbus_strstate(fe_state));
+		backend = xenbus_read(XBT_NIL, frontend, "backend", NULL);
+		if (!backend || IS_ERR(backend))
+			goto out;
+		err = xenbus_scanf(XBT_NIL, backend, "state", "%i", &be_state);
+		if (err == 1)
+			xenbus_reset_frontend(frontend, backend, be_state);
+		kfree(backend);
+		break;
+	default:
+		break;
+	}
+out:
+	kfree(frontend);
+}
+
+static void xenbus_reset_state(void)
+{
+	char **devclass, **dev;
+	int devclass_n, dev_n;
+	int i, j;
+
+	devclass = xenbus_directory(XBT_NIL, "device", "", &devclass_n);
+	if (IS_ERR(devclass))
+		return;
+
+	for (i = 0; i < devclass_n; i++) {
+		dev = xenbus_directory(XBT_NIL, "device", devclass[i], &dev_n);
+		if (IS_ERR(dev))
+			continue;
+		for (j = 0; j < dev_n; j++)
+			xenbus_check_frontend(devclass[i], dev[j]);
+		kfree(dev);
+	}
+	kfree(devclass);
+}
+
 static int frontend_probe_and_watch(struct notifier_block *notifier,
 				   unsigned long event,
 				   void *data)
 {
+	/* reset devices in Connected or Closed state */
+	if (xen_hvm_domain())
+		xenbus_reset_state();
 	/* Enumerate devices in xenstore and watch for changes. */
 	xenbus_probe_devices(&xenbus_frontend);
 	register_xenbus_watch(&fe_watch);


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel
  2011-08-04 16:20 ` [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel Olaf Hering
@ 2011-08-09  9:15     ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2011-08-09  9:15 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

> Index: linux-3.0/drivers/xen/xenbus/xenbus_xs.c
> ===================================================================
> --- linux-3.0.orig/drivers/xen/xenbus/xenbus_xs.c
> +++ linux-3.0/drivers/xen/xenbus/xenbus_xs.c
> @@ -620,6 +620,20 @@ static struct xenbus_watch *find_watch(c
>  	return NULL;
>  }
>  
> +static int xs_introduce(const char *domid, const char *mfn, const char *port)
> +{
> +	struct kvec iov[3];
> +
> +	iov[0].iov_base = (char *)domid;
> +	iov[0].iov_len = strlen(domid) + 1;
> +	iov[1].iov_base = (char *)mfn;
> +	iov[1].iov_len = strlen(mfn) + 1;
> +	iov[2].iov_base = (char *)port;
> +	iov[2].iov_len = strlen(port) + 1;
> +
> +	return xs_error(xs_talkv(XBT_NIL, XS_INTRODUCE, iov,
> +				 ARRAY_SIZE(iov), NULL));

What should we do if this fails?

> +}
>  /* Register callback to watch this node. */
>  int register_xenbus_watch(struct xenbus_watch *watch)
>  {
> @@ -867,10 +881,11 @@ static int xenbus_thread(void *unused)
>  	return 0;
>  }
>  
> -int xs_init(void)
> +int xs_init(unsigned long xen_store_mfn)
>  {
>  	int err;
>  	struct task_struct *task;
> +	char domid[12], mfn[24], port[24];
>  
>  	INIT_LIST_HEAD(&xs_state.reply_list);
>  	spin_lock_init(&xs_state.reply_lock);
> @@ -897,5 +912,15 @@ int xs_init(void)
>  	if (IS_ERR(task))
>  		return PTR_ERR(task);
>  
> +	snprintf(domid, sizeof(domid), "%u", DOMID_SELF);
> +	snprintf(mfn, sizeof(mfn), "%lu", xen_store_mfn);
> +	snprintf(port, sizeof(port), "%d", xen_store_evtchn);

These can be within the if, or better within the xs_introduce function
itself.

> +	if (xen_hvm_domain()) {
> +		/* shutdown watches for kexec boot */
> +		err = xs_introduce(domid, mfn, port);
> +		if (err)
> +			printk(KERN_WARNING "xs_introduce failed: %d\n", err);
> +	}
> +
>  	return 0;
>  }
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel
@ 2011-08-09  9:15     ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2011-08-09  9:15 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Jeremy Fitzhardinge, xen-devel, linux-kernel, Konrad

> Index: linux-3.0/drivers/xen/xenbus/xenbus_xs.c
> ===================================================================
> --- linux-3.0.orig/drivers/xen/xenbus/xenbus_xs.c
> +++ linux-3.0/drivers/xen/xenbus/xenbus_xs.c
> @@ -620,6 +620,20 @@ static struct xenbus_watch *find_watch(c
>  	return NULL;
>  }
>  
> +static int xs_introduce(const char *domid, const char *mfn, const char *port)
> +{
> +	struct kvec iov[3];
> +
> +	iov[0].iov_base = (char *)domid;
> +	iov[0].iov_len = strlen(domid) + 1;
> +	iov[1].iov_base = (char *)mfn;
> +	iov[1].iov_len = strlen(mfn) + 1;
> +	iov[2].iov_base = (char *)port;
> +	iov[2].iov_len = strlen(port) + 1;
> +
> +	return xs_error(xs_talkv(XBT_NIL, XS_INTRODUCE, iov,
> +				 ARRAY_SIZE(iov), NULL));

What should we do if this fails?

> +}
>  /* Register callback to watch this node. */
>  int register_xenbus_watch(struct xenbus_watch *watch)
>  {
> @@ -867,10 +881,11 @@ static int xenbus_thread(void *unused)
>  	return 0;
>  }
>  
> -int xs_init(void)
> +int xs_init(unsigned long xen_store_mfn)
>  {
>  	int err;
>  	struct task_struct *task;
> +	char domid[12], mfn[24], port[24];
>  
>  	INIT_LIST_HEAD(&xs_state.reply_list);
>  	spin_lock_init(&xs_state.reply_lock);
> @@ -897,5 +912,15 @@ int xs_init(void)
>  	if (IS_ERR(task))
>  		return PTR_ERR(task);
>  
> +	snprintf(domid, sizeof(domid), "%u", DOMID_SELF);
> +	snprintf(mfn, sizeof(mfn), "%lu", xen_store_mfn);
> +	snprintf(port, sizeof(port), "%d", xen_store_evtchn);

These can be within the if, or better within the xs_introduce function
itself.

> +	if (xen_hvm_domain()) {
> +		/* shutdown watches for kexec boot */
> +		err = xs_introduce(domid, mfn, port);
> +		if (err)
> +			printk(KERN_WARNING "xs_introduce failed: %d\n", err);
> +	}
> +
>  	return 0;
>  }
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  2011-08-04 16:20 ` [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports Olaf Hering
@ 2011-08-09  9:17   ` Ian Campbell
  2011-08-09  9:29     ` Olaf Hering
  2011-08-09 15:28     ` Olaf Hering
  0 siblings, 2 replies; 18+ messages in thread
From: Ian Campbell @ 2011-08-09  9:17 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Thu, 2011-08-04 at 17:20 +0100, Olaf Hering wrote:
> During a kexec boot some virqs such as timer and debugirq were already
> registered by the old kernel.  The hypervisor will return -EEXISTS from
> the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
> triggers.  Catch the -EEXISTS error and loop through all possible ports to find
> what port belongs to the virq/cpu combo.

Would it be better to proactively just query the status of all event
channels early on, like you do in find_virq, and setup the irq info
structures as appropriate? Rather than waiting for an -EEXISTS I mean.

> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> 
> ---
>  drivers/xen/events.c |   40 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 35 insertions(+), 5 deletions(-)
> 
> Index: linux-3.0/drivers/xen/events.c
> ===================================================================
> --- linux-3.0.orig/drivers/xen/events.c
> +++ linux-3.0/drivers/xen/events.c
> @@ -877,11 +877,35 @@ static int bind_interdomain_evtchn_to_ir
>  	return err ? : bind_evtchn_to_irq(bind_interdomain.local_port);
>  }
>  
> +/* BITS_PER_LONG is used in Xen */
> +#define MAX_EVTCHNS (64 * 64)
> +
> +static int find_virq(unsigned int virq, unsigned int cpu)
> +{
> +	struct evtchn_status status;
> +	int port, rc = -ENOENT;
> +
> +	memset(&status, 0, sizeof(status));
> +	for (port = 0; port <= MAX_EVTCHNS; port++) {
> +		status.dom = DOMID_SELF;
> +		status.port = port;
> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
> +		if (rc < 0)
> +			continue;
> +		if (status.status != EVTCHNSTAT_virq)
> +			continue;
> +		if (status.u.virq == virq && status.vcpu == cpu) {
> +			rc = port;
> +			break;
> +		}
> +	}
> +	return rc;
> +}
>  
>  int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
>  {
>  	struct evtchn_bind_virq bind_virq;
> -	int evtchn, irq;
> +	int evtchn, irq, ret;
>  
>  	spin_lock(&irq_mapping_update_lock);
>  
> @@ -897,10 +921,16 @@ int bind_virq_to_irq(unsigned int virq,
>  
>  		bind_virq.virq = virq;
>  		bind_virq.vcpu = cpu;
> -		if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
> -						&bind_virq) != 0)
> -			BUG();
> -		evtchn = bind_virq.port;
> +		ret = HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
> +						&bind_virq);
> +		if (ret == 0)
> +			evtchn = bind_virq.port;
> +		else {
> +			if (ret == -EEXIST)
> +				ret = find_virq(virq, cpu);
> +			BUG_ON(ret < 0);
> +			evtchn = ret;
> +		}
>  
>  		xen_irq_info_virq_init(cpu, irq, evtchn, virq);
>  
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  2011-08-04 16:20 ` [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel Olaf Hering
@ 2011-08-09  9:23   ` Ian Campbell
  2011-08-09  9:44       ` Olaf Hering
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Campbell @ 2011-08-09  9:23 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Thu, 2011-08-04 at 17:20 +0100, Olaf Hering wrote:
> After triggering a crash dump in a HVM guest, the PV backend drivers
> will remain in Connected state. When the kdump kernel starts the PV
> drivers will skip such devices. As a result, no root device is found and
> the vmcore cant be saved.  
> 
> A similar situation happens after a kexec boot, here the devices will be
> in the Closed state.
> 
> With this change all frontend devices with state XenbusStateConnected or
> XenbusStateClosed will be reset by changing the state file to
> Closing/Closed/Initializing.  This will trigger a disconnect in the
> backend drivers. Now the frontend drivers will find the backend drivers
> in state Initwait and can connect.
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> 
> ---
>  drivers/xen/xenbus/xenbus_comms.c          |    4 -
>  drivers/xen/xenbus/xenbus_probe_frontend.c |  116 +++++++++++++++++++++++++++++
>  2 files changed, 119 insertions(+), 1 deletion(-)
> 
> Index: linux-3.0/drivers/xen/xenbus/xenbus_comms.c
> ===================================================================
> --- linux-3.0.orig/drivers/xen/xenbus/xenbus_comms.c
> +++ linux-3.0/drivers/xen/xenbus/xenbus_comms.c
> @@ -212,7 +212,9 @@ int xb_init_comms(void)
>  		printk(KERN_WARNING "XENBUS response ring is not quiescent "
>  		       "(%08x:%08x): fixing up\n",
>  		       intf->rsp_cons, intf->rsp_prod);
> -		intf->rsp_cons = intf->rsp_prod;
> +		/* breaks kdump */
> +		if (!reset_devices)
> +			intf->rsp_cons = intf->rsp_prod;
>  	}
>  
>  	if (xenbus_irq) {
> Index: linux-3.0/drivers/xen/xenbus/xenbus_probe_frontend.c
> ===================================================================
> --- linux-3.0.orig/drivers/xen/xenbus/xenbus_probe_frontend.c
> +++ linux-3.0/drivers/xen/xenbus/xenbus_probe_frontend.c
> @@ -252,10 +252,126 @@ int __xenbus_register_frontend(struct xe
>  }
>  EXPORT_SYMBOL_GPL(__xenbus_register_frontend);
>  
> +static DECLARE_WAIT_QUEUE_HEAD(backend_state_wq);
> +static int backend_state;
> +
> +static void xenbus_reset_backend_state_changed(struct xenbus_watch *w,
> +					const char **v, unsigned int l)
> +{
> +	xenbus_scanf(XBT_NIL, v[XS_WATCH_PATH], "", "%i", &backend_state);
> +	printk(KERN_DEBUG "XENBUS: %s %s\n",
> +			v[XS_WATCH_PATH], xenbus_strstate(backend_state));
> +	wake_up(&backend_state_wq);
> +}
> +
> +static void xenbus_reset_wait_for_backend(int expected)
> +{
> +	wait_event_interruptible(backend_state_wq, backend_state == expected);
> +}
> +
> +/*
> + * Reset frontend if it is in Connected or Closed state.
> + * Wait for backend to catch up.
> + * State Connected happens during kdump, Closed after kexec.
> + */
> +static void xenbus_reset_frontend(char *fe, char *be, int be_state)
> +{
> +	struct xenbus_watch be_watch;
> +
> +	printk(KERN_DEBUG "XENBUS: backend %s %s\n",
> +			be, xenbus_strstate(be_state));
> +
> +	memset(&be_watch, 0, sizeof(be_watch));
> +	be_watch.node = kasprintf(GFP_NOIO | __GFP_HIGH, "%s/state", be);
> +	if (!be_watch.node)
> +		return;
> +
> +	be_watch.callback = xenbus_reset_backend_state_changed;
> +	backend_state = XenbusStateUnknown;
> +
> +	printk(KERN_INFO "XENBUS: triggering reconnect on %s\n", be);
> +	register_xenbus_watch(&be_watch);
> +
> +	switch (be_state) {
> +	case XenbusStateConnected:
> +		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosing);
> +		xenbus_reset_wait_for_backend(XenbusStateClosing);
> +
> +	case XenbusStateClosing:
> +		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosed);
> +		xenbus_reset_wait_for_backend(XenbusStateClosed);
> +
> +	case XenbusStateClosed:
> +		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateInitialising);
> +		xenbus_reset_wait_for_backend(XenbusStateInitWait);
> +	}
> +
> +	unregister_xenbus_watch(&be_watch);
> +	printk(KERN_INFO "XENBUS: reconnect done on %s\n", be);
> +	kfree(be_watch.node);
> +}
> +
> +static void xenbus_check_frontend(char *class, char *dev)
> +{
> +	int be_state, fe_state, err;
> +	char *backend, *frontend;
> +
> +	frontend = kasprintf(GFP_NOIO | __GFP_HIGH, "device/%s/%s", class, dev);
> +	if (!frontend)
> +		return;
> +
> +	err = xenbus_scanf(XBT_NIL, frontend, "state", "%i", &fe_state);
> +	if (err != 1)
> +		goto out;
> +
> +	switch (fe_state) {
> +	case XenbusStateConnected:
> +	case XenbusStateClosed:
> +		printk(KERN_DEBUG "XENBUS: frontend %s %s\n",
> +				frontend, xenbus_strstate(fe_state));
> +		backend = xenbus_read(XBT_NIL, frontend, "backend", NULL);
> +		if (!backend || IS_ERR(backend))
> +			goto out;
> +		err = xenbus_scanf(XBT_NIL, backend, "state", "%i", &be_state);
> +		if (err == 1)
> +			xenbus_reset_frontend(frontend, backend, be_state);
> +		kfree(backend);
> +		break;
> +	default:
> +		break;
> +	}
> +out:
> +	kfree(frontend);
> +}
> +
> +static void xenbus_reset_state(void)
> +{
> +	char **devclass, **dev;
> +	int devclass_n, dev_n;
> +	int i, j;
> +
> +	devclass = xenbus_directory(XBT_NIL, "device", "", &devclass_n);
> +	if (IS_ERR(devclass))
> +		return;
> +
> +	for (i = 0; i < devclass_n; i++) {
> +		dev = xenbus_directory(XBT_NIL, "device", devclass[i], &dev_n);
> +		if (IS_ERR(dev))
> +			continue;
> +		for (j = 0; j < dev_n; j++)
> +			xenbus_check_frontend(devclass[i], dev[j]);
> +		kfree(dev);
> +	}
> +	kfree(devclass);
> +}
> +
>  static int frontend_probe_and_watch(struct notifier_block *notifier,
>  				   unsigned long event,
>  				   void *data)
>  {
> +	/* reset devices in Connected or Closed state */
> +	if (xen_hvm_domain())
                               && reset_devices ??

How long should we wait for the backend to respond? Should we add a
timeout and countdown similar to wait_for_devices?

It's unfortunate that this code is effectively serialising on each
device. It would be much preferable to kick off all the resets and then
wait for them to occur. You could probably do this by incrementing a
counter for each device you reset and decrementing it each time a watch
triggers then wait for the counter to hit zero.

> +		xenbus_reset_state();
>  	/* Enumerate devices in xenstore and watch for changes. */
>  	xenbus_probe_devices(&xenbus_frontend);
>  	register_xenbus_watch(&fe_watch);
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel
  2011-08-09  9:15     ` Ian Campbell
  (?)
@ 2011-08-09  9:28     ` Olaf Hering
  -1 siblings, 0 replies; 18+ messages in thread
From: Olaf Hering @ 2011-08-09  9:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Tue, Aug 09, Ian Campbell wrote:

> > +++ linux-3.0/drivers/xen/xenbus/xenbus_xs.c

> > +	return xs_error(xs_talkv(XBT_NIL, XS_INTRODUCE, iov,
> > +				 ARRAY_SIZE(iov), NULL));
> 
> What should we do if this fails?

There is not much other than printk and move on.
A kexec boot cant be detected from withing kexec.

> > +	snprintf(domid, sizeof(domid), "%u", DOMID_SELF);
> > +	snprintf(mfn, sizeof(mfn), "%lu", xen_store_mfn);
> > +	snprintf(port, sizeof(port), "%d", xen_store_evtchn);
> 
> These can be within the if, or better within the xs_introduce function
> itself.

Yes, thats true. I will rearrange the code.

Olaf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  2011-08-09  9:17   ` [Xen-devel] " Ian Campbell
@ 2011-08-09  9:29     ` Olaf Hering
  2011-08-09 11:25       ` Ian Campbell
  2011-08-09 15:28     ` Olaf Hering
  1 sibling, 1 reply; 18+ messages in thread
From: Olaf Hering @ 2011-08-09  9:29 UTC (permalink / raw)
  To: Ian Campbell; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Tue, Aug 09, Ian Campbell wrote:

> On Thu, 2011-08-04 at 17:20 +0100, Olaf Hering wrote:
> > During a kexec boot some virqs such as timer and debugirq were already
> > registered by the old kernel.  The hypervisor will return -EEXISTS from
> > the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
> > triggers.  Catch the -EEXISTS error and loop through all possible ports to find
> > what port belongs to the virq/cpu combo.
> 
> Would it be better to proactively just query the status of all event
> channels early on, like you do in find_virq, and setup the irq info
> structures as appropriate? Rather than waiting for an -EEXISTS I mean.

Doing one hypercall in the common case is cheaper than doing a dozen in
the kexec case.

If you prefer I will rearrange this part and query first.

Olaf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  2011-08-09  9:23   ` [Xen-devel] " Ian Campbell
@ 2011-08-09  9:44       ` Olaf Hering
  0 siblings, 0 replies; 18+ messages in thread
From: Olaf Hering @ 2011-08-09  9:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Tue, Aug 09, Ian Campbell wrote:

> >  static int frontend_probe_and_watch(struct notifier_block *notifier,
> >  				   unsigned long event,
> >  				   void *data)
> >  {
> > +	/* reset devices in Connected or Closed state */
> > +	if (xen_hvm_domain())
>                                && reset_devices ??

No, reset_devices is passed as kernel cmdline option to a kdump boot.
But its not part of a kexec boot.

> How long should we wait for the backend to respond? Should we add a
> timeout and countdown similar to wait_for_devices?

Adding a timeout to catch a confused backend is a good idea. That would
give one at least a chance to poke around in a rescue shell.

> It's unfortunate that this code is effectively serialising on each
> device. It would be much preferable to kick off all the resets and then
> wait for them to occur. You could probably do this by incrementing a
> counter for each device you reset and decrementing it each time a watch
> triggers then wait for the counter to hit zero.

That feature needs more thought. Since xenbus_reset_state() is only
executed in the kexec/kdump case, the average use case is not slowed
down.

Olaf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
@ 2011-08-09  9:44       ` Olaf Hering
  0 siblings, 0 replies; 18+ messages in thread
From: Olaf Hering @ 2011-08-09  9:44 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jeremy Fitzhardinge, xen-devel, linux-kernel, Konrad

On Tue, Aug 09, Ian Campbell wrote:

> >  static int frontend_probe_and_watch(struct notifier_block *notifier,
> >  				   unsigned long event,
> >  				   void *data)
> >  {
> > +	/* reset devices in Connected or Closed state */
> > +	if (xen_hvm_domain())
>                                && reset_devices ??

No, reset_devices is passed as kernel cmdline option to a kdump boot.
But its not part of a kexec boot.

> How long should we wait for the backend to respond? Should we add a
> timeout and countdown similar to wait_for_devices?

Adding a timeout to catch a confused backend is a good idea. That would
give one at least a chance to poke around in a rescue shell.

> It's unfortunate that this code is effectively serialising on each
> device. It would be much preferable to kick off all the resets and then
> wait for them to occur. You could probably do this by incrementing a
> counter for each device you reset and decrementing it each time a watch
> triggers then wait for the counter to hit zero.

That feature needs more thought. Since xenbus_reset_state() is only
executed in the kexec/kdump case, the average use case is not slowed
down.

Olaf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  2011-08-09  9:29     ` Olaf Hering
@ 2011-08-09 11:25       ` Ian Campbell
  0 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2011-08-09 11:25 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Tue, 2011-08-09 at 10:29 +0100, Olaf Hering wrote:
> On Tue, Aug 09, Ian Campbell wrote:
> 
> > On Thu, 2011-08-04 at 17:20 +0100, Olaf Hering wrote:
> > > During a kexec boot some virqs such as timer and debugirq were already
> > > registered by the old kernel.  The hypervisor will return -EEXISTS from
> > > the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
> > > triggers.  Catch the -EEXISTS error and loop through all possible ports to find
> > > what port belongs to the virq/cpu combo.
> > 
> > Would it be better to proactively just query the status of all event
> > channels early on, like you do in find_virq, and setup the irq info
> > structures as appropriate? Rather than waiting for an -EEXISTS I mean.
> 
> Doing one hypercall in the common case is cheaper than doing a dozen in
> the kexec case.

It's actually up to NR_EVENT_CHANNELS*NR_VIRQS (since you will do this
each time you find a VIRQ which is already bound) which is 24,576 on 32
bit and 98,304 hypercalls on 64 bit. If you just do it in the common
case (i.e. query each VIRQ once) then it is "only" 1024 in the 32 bit
case and 4096 in the 64 bit case.

I guess there's probably no way to detect if/when we need to do this? I
suppose we could scan all VIRQs on on the first -EEXISTS?

> If you prefer I will rearrange this part and query first.
> 
> Olaf



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  2011-08-09  9:44       ` Olaf Hering
  (?)
@ 2011-08-09 11:26       ` Ian Campbell
  -1 siblings, 0 replies; 18+ messages in thread
From: Ian Campbell @ 2011-08-09 11:26 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Tue, 2011-08-09 at 10:44 +0100, Olaf Hering wrote:
> On Tue, Aug 09, Ian Campbell wrote:
> 
> > >  static int frontend_probe_and_watch(struct notifier_block *notifier,
> > >  				   unsigned long event,
> > >  				   void *data)
> > >  {
> > > +	/* reset devices in Connected or Closed state */
> > > +	if (xen_hvm_domain())
> >                                && reset_devices ??
> 
> No, reset_devices is passed as kernel cmdline option to a kdump boot.
> But its not part of a kexec boot.
> 
> > How long should we wait for the backend to respond? Should we add a
> > timeout and countdown similar to wait_for_devices?
> 
> Adding a timeout to catch a confused backend is a good idea. That would
> give one at least a chance to poke around in a rescue shell.
> 
> > It's unfortunate that this code is effectively serialising on each
> > device. It would be much preferable to kick off all the resets and then
> > wait for them to occur. You could probably do this by incrementing a
> > counter for each device you reset and decrementing it each time a watch
> > triggers then wait for the counter to hit zero.
> 
> That feature needs more thought. Since xenbus_reset_state() is only
> executed in the kexec/kdump case, the average use case is not slowed
> down.

I was thinking more of avoiding slowing down the kexec/kdump case
unnecessarily.

Ian.
> 
> Olaf



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  2011-08-09  9:17   ` [Xen-devel] " Ian Campbell
  2011-08-09  9:29     ` Olaf Hering
@ 2011-08-09 15:28     ` Olaf Hering
  2011-08-09 18:51       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 18+ messages in thread
From: Olaf Hering @ 2011-08-09 15:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: linux-kernel, Jeremy Fitzhardinge, Konrad, xen-devel

On Tue, Aug 09, Ian Campbell wrote:

> On Thu, 2011-08-04 at 17:20 +0100, Olaf Hering wrote:
> > During a kexec boot some virqs such as timer and debugirq were already
> > registered by the old kernel.  The hypervisor will return -EEXISTS from
> > the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
> > triggers.  Catch the -EEXISTS error and loop through all possible ports to find
> > what port belongs to the virq/cpu combo.
> 
> Would it be better to proactively just query the status of all event
> channels early on, like you do in find_virq, and setup the irq info
> structures as appropriate? Rather than waiting for an -EEXISTS I mean.

Now that I read that again more carefully: 
No idea if thats possible, I will leave that answer to Jeremy/Konrad.

Olaf

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports
  2011-08-09 15:28     ` Olaf Hering
@ 2011-08-09 18:51       ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-09 18:51 UTC (permalink / raw)
  To: Olaf Hering; +Cc: Ian Campbell, linux-kernel, Jeremy Fitzhardinge, xen-devel

On Tue, Aug 09, 2011 at 05:28:06PM +0200, Olaf Hering wrote:
> On Tue, Aug 09, Ian Campbell wrote:
> 
> > On Thu, 2011-08-04 at 17:20 +0100, Olaf Hering wrote:
> > > During a kexec boot some virqs such as timer and debugirq were already
> > > registered by the old kernel.  The hypervisor will return -EEXISTS from
> > > the new EVTCHNOP_bind_virq request and the BUG in bind_virq_to_irq()
> > > triggers.  Catch the -EEXISTS error and loop through all possible ports to find
> > > what port belongs to the virq/cpu combo.
> > 
> > Would it be better to proactively just query the status of all event
> > channels early on, like you do in find_virq, and setup the irq info
> > structures as appropriate? Rather than waiting for an -EEXISTS I mean.

We only create those structures when the IRQ handler is setup. And since
this is a new kernel, the irq_get_handler_data(irq) won't be present.
> 
> Now that I read that again more carefully: 
> No idea if thats possible, I will leave that answer to Jeremy/Konrad.

But we could an optimization. The find_virq does a search every time
from 0->NR_EVENTS. Perhaps we can also check the xen_irq_list_head
to skip over the event channels we have already created?

Something like this:

   bool skip = false;
   list_for_each_entry(info, &xen_irq_list_head, list)
   if (info->evtchn == port && info->cpu == cpu) {
        skip=true;
        break;
    }
    if (skip)
        continue

    .. snip..
    and here is the EVTCHNOP_status hypercall.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xen-devel] [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests
  2011-08-04 16:20 [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Olaf Hering
                   ` (2 preceding siblings ...)
  2011-08-04 16:20 ` [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel Olaf Hering
@ 2011-08-09 19:02 ` Konrad Rzeszutek Wilk
  3 siblings, 0 replies; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-09 19:02 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linux-kernel, Jeremy Fitzhardinge, xen-devel

On Thu, Aug 04, 2011 at 06:20:53PM +0200, Olaf Hering wrote:
> 
> 
> The following series implements kexec and kdump in a Xen PVonHVM guest.
> 
> It is (should be) available via git:
> 
>  git://github.com/olafhering/linux.git xen-kexec-3.0
> 
> The kexec or kdump kernel has to take care of already allocated virqs,
> PV devices in Closed or Connected state, and of registered watches in
> the old kernel. With the three patches these conditions are checked
> during boot of the new kernel rather than in the reboot/crash path.
> 
> A fixed kexec-tools-2.0.2 package is required:
> http://lists.infradead.org/pipermail/kexec/2011-May/005026.html
> http://lists.infradead.org/pipermail/kexec/2011-August/005339.html
> 
> Another fix is for xenstored, it has to accept the XS_INTRODUCE from a guest:
> http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00007.html
> 
> One open issue is the balloon driver. It removes pages from the guest
> and gives them back to the hypervisor. The kexec kernel is not aware of
> the fact that some pages are unavailable, and hangs or crashes.
> The workaround for the time being is:
> 
> if test -f /sys/devices/system/xen_memory/xen_memory0/target -a \
>         -f /sys/devices/system/xen_memory/xen_memory0/target_kb
> then
>     cat /sys/devices/system/xen_memory/xen_memory0/target > \
>         /sys/devices/system/xen_memory/xen_memory0/target_kb
> fi
> kexec -e
> 
> This has to be resolved with another series of changes.

You are just plowing through these bugs and fixing. Thanks for doing it!

I think one more revision and they will be ready?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel
  2011-08-16 13:16 [PATCH 0/3] [v5] " Olaf Hering
@ 2011-08-16 13:16 ` Olaf Hering
  0 siblings, 0 replies; 18+ messages in thread
From: Olaf Hering @ 2011-08-16 13:16 UTC (permalink / raw)
  To: linux-kernel, Jeremy Fitzhardinge, Konrad; +Cc: xen-devel

After triggering a crash dump in a HVM guest, the PV backend drivers
will remain in Connected state. When the kdump kernel starts the PV
drivers will skip such devices. As a result, no root device is found and
the vmcore cant be saved.

A similar situation happens after a kexec boot, here the devices will be
in the Closed state.

With this change all frontend devices with state XenbusStateConnected or
XenbusStateClosed will be reset by changing the state file to Closing ->
Closed -> Initializing.  This will trigger a disconnect in the backend
drivers. Now the frontend drivers will find the backend drivers in state
Initwait and can connect.

v2:
  - add timeout when waiting for backend state change
  (based on feedback from Ian Campell)
  - extent printk message to include backend string
  - add comment to fall-through case in xenbus_reset_frontend

Signed-off-by: Olaf Hering <olaf@aepfle.de>
---
 drivers/xen/xenbus/xenbus_comms.c          |    4 +-
 drivers/xen/xenbus/xenbus_probe_frontend.c |  121 ++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index 090c61e..2eff7a6 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -212,7 +212,9 @@ int xb_init_comms(void)
 		printk(KERN_WARNING "XENBUS response ring is not quiescent "
 		       "(%08x:%08x): fixing up\n",
 		       intf->rsp_cons, intf->rsp_prod);
-		intf->rsp_cons = intf->rsp_prod;
+		/* breaks kdump */
+		if (!reset_devices)
+			intf->rsp_cons = intf->rsp_prod;
 	}
 
 	if (xenbus_irq) {
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index b6a2690..b521ce4 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -252,10 +252,131 @@ int __xenbus_register_frontend(struct xenbus_driver *drv,
 }
 EXPORT_SYMBOL_GPL(__xenbus_register_frontend);
 
+static DECLARE_WAIT_QUEUE_HEAD(backend_state_wq);
+static int backend_state;
+
+static void xenbus_reset_backend_state_changed(struct xenbus_watch *w,
+					const char **v, unsigned int l)
+{
+	xenbus_scanf(XBT_NIL, v[XS_WATCH_PATH], "", "%i", &backend_state);
+	printk(KERN_DEBUG "XENBUS: backend %s %s\n",
+			v[XS_WATCH_PATH], xenbus_strstate(backend_state));
+	wake_up(&backend_state_wq);
+}
+
+static void xenbus_reset_wait_for_backend(char *be, int expected)
+{
+	long timeout;
+	timeout = wait_event_interruptible_timeout(backend_state_wq,
+			backend_state == expected, 5 * HZ);
+	if (timeout <= 0)
+		printk(KERN_INFO "XENBUS: backend %s timed out.\n", be);
+}
+
+/*
+ * Reset frontend if it is in Connected or Closed state.
+ * Wait for backend to catch up.
+ * State Connected happens during kdump, Closed after kexec.
+ */
+static void xenbus_reset_frontend(char *fe, char *be, int be_state)
+{
+	struct xenbus_watch be_watch;
+
+	printk(KERN_DEBUG "XENBUS: backend %s %s\n",
+			be, xenbus_strstate(be_state));
+
+	memset(&be_watch, 0, sizeof(be_watch));
+	be_watch.node = kasprintf(GFP_NOIO | __GFP_HIGH, "%s/state", be);
+	if (!be_watch.node)
+		return;
+
+	be_watch.callback = xenbus_reset_backend_state_changed;
+	backend_state = XenbusStateUnknown;
+
+	printk(KERN_INFO "XENBUS: triggering reconnect on %s\n", be);
+	register_xenbus_watch(&be_watch);
+
+	/* fall through to forward backend to state XenbusStateInitialising */
+	switch (be_state) {
+	case XenbusStateConnected:
+		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosing);
+		xenbus_reset_wait_for_backend(be, XenbusStateClosing);
+
+	case XenbusStateClosing:
+		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateClosed);
+		xenbus_reset_wait_for_backend(be, XenbusStateClosed);
+
+	case XenbusStateClosed:
+		xenbus_printf(XBT_NIL, fe, "state", "%d", XenbusStateInitialising);
+		xenbus_reset_wait_for_backend(be, XenbusStateInitWait);
+	}
+
+	unregister_xenbus_watch(&be_watch);
+	printk(KERN_INFO "XENBUS: reconnect done on %s\n", be);
+	kfree(be_watch.node);
+}
+
+static void xenbus_check_frontend(char *class, char *dev)
+{
+	int be_state, fe_state, err;
+	char *backend, *frontend;
+
+	frontend = kasprintf(GFP_NOIO | __GFP_HIGH, "device/%s/%s", class, dev);
+	if (!frontend)
+		return;
+
+	err = xenbus_scanf(XBT_NIL, frontend, "state", "%i", &fe_state);
+	if (err != 1)
+		goto out;
+
+	switch (fe_state) {
+	case XenbusStateConnected:
+	case XenbusStateClosed:
+		printk(KERN_DEBUG "XENBUS: frontend %s %s\n",
+				frontend, xenbus_strstate(fe_state));
+		backend = xenbus_read(XBT_NIL, frontend, "backend", NULL);
+		if (!backend || IS_ERR(backend))
+			goto out;
+		err = xenbus_scanf(XBT_NIL, backend, "state", "%i", &be_state);
+		if (err == 1)
+			xenbus_reset_frontend(frontend, backend, be_state);
+		kfree(backend);
+		break;
+	default:
+		break;
+	}
+out:
+	kfree(frontend);
+}
+
+static void xenbus_reset_state(void)
+{
+	char **devclass, **dev;
+	int devclass_n, dev_n;
+	int i, j;
+
+	devclass = xenbus_directory(XBT_NIL, "device", "", &devclass_n);
+	if (IS_ERR(devclass))
+		return;
+
+	for (i = 0; i < devclass_n; i++) {
+		dev = xenbus_directory(XBT_NIL, "device", devclass[i], &dev_n);
+		if (IS_ERR(dev))
+			continue;
+		for (j = 0; j < dev_n; j++)
+			xenbus_check_frontend(devclass[i], dev[j]);
+		kfree(dev);
+	}
+	kfree(devclass);
+}
+
 static int frontend_probe_and_watch(struct notifier_block *notifier,
 				   unsigned long event,
 				   void *data)
 {
+	/* reset devices in Connected or Closed state */
+	if (xen_hvm_domain())
+		xenbus_reset_state();
 	/* Enumerate devices in xenstore and watch for changes. */
 	xenbus_probe_devices(&xenbus_frontend);
 	register_xenbus_watch(&fe_watch);
-- 
1.7.3.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-08-16 13:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-04 16:20 [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Olaf Hering
2011-08-04 16:20 ` [PATCH 1/3] xen/pv-on-hvm kexec: add xs_introduce to shutdown watches from old kernel Olaf Hering
2011-08-09  9:15   ` [Xen-devel] " Ian Campbell
2011-08-09  9:15     ` Ian Campbell
2011-08-09  9:28     ` [Xen-devel] " Olaf Hering
2011-08-04 16:20 ` [PATCH 2/3] xen/pv-on-hvm kexec: rebind virqs to existing eventchannel ports Olaf Hering
2011-08-09  9:17   ` [Xen-devel] " Ian Campbell
2011-08-09  9:29     ` Olaf Hering
2011-08-09 11:25       ` Ian Campbell
2011-08-09 15:28     ` Olaf Hering
2011-08-09 18:51       ` Konrad Rzeszutek Wilk
2011-08-04 16:20 ` [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel Olaf Hering
2011-08-09  9:23   ` [Xen-devel] " Ian Campbell
2011-08-09  9:44     ` Olaf Hering
2011-08-09  9:44       ` Olaf Hering
2011-08-09 11:26       ` [Xen-devel] " Ian Campbell
2011-08-09 19:02 ` [Xen-devel] [PATCH 0/3] [v4] kexec and kdump for Xen PVonHVM guests Konrad Rzeszutek Wilk
2011-08-16 13:16 [PATCH 0/3] [v5] " Olaf Hering
2011-08-16 13:16 ` [PATCH 3/3] xen/pv-on-hvm kexec+kdump: reset PV devices in kexec or crash kernel Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.