linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1).
@ 2013-11-08 17:38 Konrad Rzeszutek Wilk
  2013-11-08 17:38 ` [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas Konrad Rzeszutek Wilk
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-08 17:38 UTC (permalink / raw)
  To: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel,
	boris.ostrovsky

Hey,

Two of these:
 [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet
 [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling

fix the bugs.xenproject.org outstanding bugs.

The other ones are that were discovered on xen-devel and discussed.
They should go in v3.13 and as the merge window is next week I am
hoping they can be squeezed in then.


 drivers/xen/manage.c            | 64 ++++++++++++++++++++++++++++++++++-------
 drivers/xen/pci.c               | 47 ++++++++++++++++++++++++++++++
 drivers/xen/xenbus/xenbus_xs.c  | 24 ++++++++++++++--
 include/xen/interface/physdev.h | 11 +++++++
 
Konrad Rzeszutek Wilk (4):
      xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
      xen/manage: Poweroff forcefully if user-space is not yet up.
      xen/manage: Guard against user-space initiated poweroff and XenBus.
      xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
  2013-11-08 17:38 [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) Konrad Rzeszutek Wilk
@ 2013-11-08 17:38 ` Konrad Rzeszutek Wilk
  2013-11-21 10:37   ` David Vrabel
  2013-11-08 17:38 ` [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up Konrad Rzeszutek Wilk
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-08 17:38 UTC (permalink / raw)
  To: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel,
	boris.ostrovsky
  Cc: Konrad Rzeszutek Wilk, Mukesh Rathor

The PCI MMCONFIG area is usually reserved via the E820 so the Xen hypervisor
is aware of these regions. But they can also be enumerated in the ACPI
DSDT which means the hypervisor won't know of them until the initial
domain informs it of via PHYSDEVOP_pci_mmcfg_reserved.

This is what this patch does for all of the MCFG regions that the
initial domain is aware of (E820 enumerated and ACPI).

Reported-by:  Santosh Jodh <Santosh.Jodh@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: David Vrabel <david.vrabel@citrix.com>
CC: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Redid it a bit]
[v2: Dropped the P2M 1-1 setting]
[v3: Check for Xen in-case we are running under baremetal]
[v4: Wrap with CONFIG_PCI_MMCONFIG]
---
 drivers/xen/pci.c               | 47 +++++++++++++++++++++++++++++++++++++++++
 include/xen/interface/physdev.h | 11 ++++++++++
 2 files changed, 58 insertions(+)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 18fff88..d15f6e8 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -26,6 +26,7 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 #include "../pci/pci.h"
+#include <asm/pci_x86.h>
 
 static bool __read_mostly pci_seg_supported = true;
 
@@ -192,3 +193,49 @@ static int __init register_xen_pci_notifier(void)
 }
 
 arch_initcall(register_xen_pci_notifier);
+
+#ifdef CONFIG_PCI_MMCONFIG
+static int __init xen_mcfg_late(void)
+{
+	struct pci_mmcfg_region *cfg;
+	int rc;
+
+	if (!xen_initial_domain())
+		return 0;
+
+	if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+		return 0;
+
+	if (list_empty(&pci_mmcfg_list))
+		return 0;
+
+	/* Check whether they are in the right area. */
+	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
+		struct physdev_pci_mmcfg_reserved r;
+
+		r.address = cfg->address;
+		r.segment = cfg->segment;
+		r.start_bus = cfg->start_bus;
+		r.end_bus = cfg->end_bus;
+		r.flags = XEN_PCI_MMCFG_RESERVED;
+
+		rc = HYPERVISOR_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
+		switch (rc) {
+		case 0:
+		case -ENOSYS:
+			continue;
+
+		default:
+			pr_warn("Failed to report MMCONFIG reservation"
+				" state for %s to hypervisor"
+				" (%d)\n",
+				cfg->name, rc);
+		}
+	}
+	return 0;
+}
+/*
+ * Needs to be done after acpi_init which are subsys_initcall.
+ */
+subsys_initcall_sync(xen_mcfg_late);
+#endif
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index 7000bb1..42721d1 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -231,6 +231,17 @@ struct physdev_get_free_pirq {
 #define XEN_PCI_DEV_VIRTFN             0x2
 #define XEN_PCI_DEV_PXM                0x4
 
+#define XEN_PCI_MMCFG_RESERVED         0x1
+
+#define PHYSDEVOP_pci_mmcfg_reserved    24
+struct physdev_pci_mmcfg_reserved {
+    uint64_t address;
+    uint16_t segment;
+    uint8_t start_bus;
+    uint8_t end_bus;
+    uint32_t flags;
+};
+
 #define PHYSDEVOP_pci_device_add        25
 struct physdev_pci_device_add {
     /* IN */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up.
  2013-11-08 17:38 [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) Konrad Rzeszutek Wilk
  2013-11-08 17:38 ` [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas Konrad Rzeszutek Wilk
@ 2013-11-08 17:38 ` Konrad Rzeszutek Wilk
  2013-11-20 21:11   ` Boris Ostrovsky
  2013-11-21 11:33   ` David Vrabel
  2013-11-08 17:38 ` [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-08 17:38 UTC (permalink / raw)
  To: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel,
	boris.ostrovsky
  Cc: Konrad Rzeszutek Wilk

The user can launch the guest in this sequence:

xl create -p /vm.cfg	[launch, but pause it]
xl shutdown latest	[sets control/shutdown=poweroff]
xl unpause latest
xl console latest	[and see that the guest has completely
ignored the shutdown request]

In reality the guest hasn't ignored it. It registers a watch
and gets a notification that there is value. It then calls
the shutdown_handler which ends up calling orderly_shutdown.

Unfortunately that is so early in the bootup that there
are no user-space. Which means that the orderly_shutdown fails.
But since the force flag was set to false it continues on without
reporting.

We check if the system is still in the booting stage and if so
enable the force option (which will shutdown in early bootup
process). If in normal running case we don't force it.

Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
Reported-by:  Alex Bligh <alex@alex.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Add switch statement]
---
 drivers/xen/manage.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 624e8dc..3f8496c 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -185,7 +185,18 @@ struct shutdown_handler {
 static void do_poweroff(void)
 {
 	shutting_down = SHUTDOWN_POWEROFF;
-	orderly_poweroff(false);
+	switch (system_state) {
+	case SYSTEM_BOOTING:
+		orderly_poweroff(true);
+		break;
+	case SYSTEM_RUNNING:
+		orderly_poweroff(false);
+		break;
+	default:
+		/* Don't do it when we are halting/rebooting. */
+		pr_info("Ignoring Xen toolstack shutdown.\n");
+		break;
+	}
 }
 
 static void do_reboot(void)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-11-08 17:38 [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) Konrad Rzeszutek Wilk
  2013-11-08 17:38 ` [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas Konrad Rzeszutek Wilk
  2013-11-08 17:38 ` [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up Konrad Rzeszutek Wilk
@ 2013-11-08 17:38 ` Konrad Rzeszutek Wilk
  2013-11-20 21:40   ` Boris Ostrovsky
                     ` (2 more replies)
  2013-11-08 17:38 ` [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart Konrad Rzeszutek Wilk
  2014-04-03 11:59 ` [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) David Vrabel
  4 siblings, 3 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-08 17:38 UTC (permalink / raw)
  To: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel,
	boris.ostrovsky
  Cc: Konrad Rzeszutek Wilk

There is a race case where the user does 'poweroff'
and at the same time the system admin does 'xl shutdown'.

Depending on the race, the system_state will be SYSTEM_RUNNING or
SYSTEM_POWER_OFF. If SYSTEM_RUNNING we just end up making
a duplicate call to 'poweroff' (while it is running).

That will fail or execute (And if executed then it will be
stuck in the reboot_mutex mutex). But nobody will care b/c the
machine is in poweroff sequence.

If the system_state is SYSTEM_POWER_OFF then we end up making
a duplicate call to kernel_power_off. There is no locking
there so we walk in the same steps as what 'poweroff'
has been doing.

The code in kernel/reboot.c has a mutex guarding against multiple
'poweroff' operations. But not against the kernel 'orderly_poweroff'.

As such, lets detect this so that we don't invoke orderly_poweroff
if the user had initiated a poweroff.

This is code by changing the 'shutting_down' to an atomic and
having a reboot notifier. If the 'shutting_down' is set to anything
but SHUTDOWN_INVALID the XenBus handler will not run.

That is exactly what we do in the reboot notifier - we set the
'shutting_down' to SHUTDOWN_POWEROFF.

The reason we change the 'shutting_down' to an atomic is that
the changes to said variable were normally guarded by the XenBus
mutex - "xenwatch_mutex" - guarantting only one caller changing
shutting_down. Since we have now the reboot notifier we have
another user of this variable. Surfacing the 'xenwatch_mutex'
out of XenBus is not a nice way of doing it. Having the
variable however be atomic solves the problem easily.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Don't expose xenwatch_mutex, add comments]
---
 drivers/xen/manage.c | 51 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 3f8496c..323703a 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -36,8 +36,16 @@ enum shutdown_state {
 	 SHUTDOWN_HALT = 4,
 };
 
-/* Ignore multiple shutdown requests. */
-static enum shutdown_state shutting_down = SHUTDOWN_INVALID;
+/* Ignore multiple shutdown requests. There are two potential race conditions:
+ *  - Multiple XenStore 'shutdown' requests. We don't want to run any off
+ *    the callbacks in parallel.
+ *  - In progress 'poweroff' (initiated inside the guest) and a XenStore
+ *     'shutdown' request. If the poweroff has transitioned 'system_state' to
+ *    SYSTEM_POWER_OFF we do not want to call orderly_poweroff. 'system_state'
+ *    is not SMP safe so we depend on reboot notifiers to set 'shutting_down'
+ *    so that we will ignore XenBus shutdown requests.
+ */
+static atomic_t shutting_down = ATOMIC_INIT(SHUTDOWN_INVALID);
 
 struct suspend_info {
 	int cancelled;
@@ -109,7 +117,7 @@ static void do_suspend(void)
 	int err;
 	struct suspend_info si;
 
-	shutting_down = SHUTDOWN_SUSPEND;
+	atomic_set(&shutting_down, SHUTDOWN_SUSPEND);
 
 #ifdef CONFIG_PREEMPT
 	/* If the kernel is preemptible, we need to freeze all the processes
@@ -173,7 +181,7 @@ out_thaw:
 	thaw_processes();
 out:
 #endif
-	shutting_down = SHUTDOWN_INVALID;
+	atomic_set(&shutting_down, SHUTDOWN_INVALID);
 }
 #endif	/* CONFIG_HIBERNATE_CALLBACKS */
 
@@ -184,7 +192,7 @@ struct shutdown_handler {
 
 static void do_poweroff(void)
 {
-	shutting_down = SHUTDOWN_POWEROFF;
+	atomic_set(&shutting_down, SHUTDOWN_POWEROFF);
 	switch (system_state) {
 	case SYSTEM_BOOTING:
 		orderly_poweroff(true);
@@ -201,7 +209,7 @@ static void do_poweroff(void)
 
 static void do_reboot(void)
 {
-	shutting_down = SHUTDOWN_POWEROFF; /* ? */
+	atomic_set(&shutting_down, SHUTDOWN_POWEROFF); /* ? */
 	ctrl_alt_del();
 }
 
@@ -222,7 +230,7 @@ static void shutdown_handler(struct xenbus_watch *watch,
 	};
 	static struct shutdown_handler *handler;
 
-	if (shutting_down != SHUTDOWN_INVALID)
+	if (atomic_read(&shutting_down) != SHUTDOWN_INVALID)
 		return;
 
  again:
@@ -256,12 +264,29 @@ static void shutdown_handler(struct xenbus_watch *watch,
 		handler->cb();
 	} else {
 		pr_info("Ignoring shutdown request: %s\n", str);
-		shutting_down = SHUTDOWN_INVALID;
+		atomic_set(&shutting_down, SHUTDOWN_INVALID);
 	}
 
 	kfree(str);
 }
+/*
+ * This function is called when the system is being rebooted.
+ */
+static int
+xen_system_reboot(struct notifier_block *nb, unsigned long event, void *unused)
+{
+	switch (event) {
+	case SYS_RESTART:
+	case SYS_HALT:
+	case SYS_POWER_OFF:
+		atomic_set(&shutting_down, SHUTDOWN_POWEROFF);
+		break;
+	default:
+		break;
+	}
 
+	return NOTIFY_DONE;
+}
 #ifdef CONFIG_MAGIC_SYSRQ
 static void sysrq_handler(struct xenbus_watch *watch, const char **vec,
 			  unsigned int len)
@@ -302,6 +327,10 @@ static struct xenbus_watch shutdown_watch = {
 	.callback = shutdown_handler
 };
 
+static struct notifier_block xen_shutdown_notifier = {
+	.notifier_call = xen_system_reboot,
+};
+
 static int setup_shutdown_watcher(void)
 {
 	int err;
@@ -319,7 +348,11 @@ static int setup_shutdown_watcher(void)
 		return err;
 	}
 #endif
-
+	err = register_reboot_notifier(&xen_shutdown_notifier);
+	if (err) {
+		pr_warn("Failed to register shutdown notifier\n");
+		return err;
+	}
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-11-08 17:38 [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2013-11-08 17:38 ` [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus Konrad Rzeszutek Wilk
@ 2013-11-08 17:38 ` Konrad Rzeszutek Wilk
  2013-11-21 17:52   ` [Xen-devel] " David Vrabel
  2014-01-26  1:13   ` Zhang, Yang Z
  2014-04-03 11:59 ` [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) David Vrabel
  4 siblings, 2 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-08 17:38 UTC (permalink / raw)
  To: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel,
	boris.ostrovsky
  Cc: Konrad Rzeszutek Wilk

The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
'process_msg' is running from within the 'xenbus' thread. Whenever
a message shows up in XenBus it is put on a xs_state.reply_list list
and 'read_reply' picks it up.

The problem is if the backend domain or the xenstored process is killed.
In which case 'xenbus' is still awaiting - and 'read_reply' if called -
stuck forever waiting for the reply_list to have some contents.

This is normally not a problem - as the backend domain can come back
or the xenstored process can be restarted. However if the domain
is in process of being powered off/restarted/halted - there is no
point of waiting on it coming back - as we are effectively being
terminated and should not impede the progress.

This patch solves this problem by checking the 'system_state' value
to see if we are in heading towards death. We also make the wait
mechanism a bit more asynchronous.

Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/xenbus/xenbus_xs.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index b6d5fff..4f22706 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
 
 	while (list_empty(&xs_state.reply_list)) {
 		spin_unlock(&xs_state.reply_lock);
-		/* XXX FIXME: Avoid synchronous wait for response here. */
-		wait_event(xs_state.reply_waitq,
-			   !list_empty(&xs_state.reply_list));
+		wait_event_timeout(xs_state.reply_waitq,
+				   !list_empty(&xs_state.reply_list),
+				   msecs_to_jiffies(500));
+
+		/*
+		 * If we are in the process of being shut-down there is
+		 * no point of trying to contact XenBus - it is either
+		 * killed (xenstored application) or the other domain
+		 * has been killed or is unreachable.
+		 */
+		switch (system_state) {
+		case SYSTEM_POWER_OFF:
+		case SYSTEM_RESTART:
+		case SYSTEM_HALT:
+			return ERR_PTR(-EIO);
+		default:
+			break;
+		}
 		spin_lock(&xs_state.reply_lock);
 	}
 
@@ -215,6 +230,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
 
 	mutex_unlock(&xs_state.request_mutex);
 
+	if (IS_ERR(ret))
+		return ret;
+
 	if ((msg->type == XS_TRANSACTION_END) ||
 	    ((req_msg.type == XS_TRANSACTION_START) &&
 	     (msg->type == XS_ERROR)))
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up.
  2013-11-08 17:38 ` [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up Konrad Rzeszutek Wilk
@ 2013-11-20 21:11   ` Boris Ostrovsky
  2013-11-21 11:33   ` David Vrabel
  1 sibling, 0 replies; 27+ messages in thread
From: Boris Ostrovsky @ 2013-11-20 21:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel

On 11/08/2013 12:38 PM, Konrad Rzeszutek Wilk wrote:
> The user can launch the guest in this sequence:
>
> xl create -p /vm.cfg	[launch, but pause it]
> xl shutdown latest	[sets control/shutdown=poweroff]
> xl unpause latest
> xl console latest	[and see that the guest has completely
> ignored the shutdown request]
>
> In reality the guest hasn't ignored it. It registers a watch
> and gets a notification that there is value. It then calls
> the shutdown_handler which ends up calling orderly_shutdown.
>
> Unfortunately that is so early in the bootup that there
> are no user-space. Which means that the orderly_shutdown fails.
> But since the force flag was set to false it continues on without
> reporting.
>
> We check if the system is still in the booting stage and if so
> enable the force option (which will shutdown in early bootup
> process). If in normal running case we don't force it.
>
> Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
> Reported-by:  Alex Bligh <alex@alex.org.uk>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> [v2: Add switch statement]

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

> ---
>   drivers/xen/manage.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index 624e8dc..3f8496c 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -185,7 +185,18 @@ struct shutdown_handler {
>   static void do_poweroff(void)
>   {
>   	shutting_down = SHUTDOWN_POWEROFF;
> -	orderly_poweroff(false);
> +	switch (system_state) {
> +	case SYSTEM_BOOTING:
> +		orderly_poweroff(true);
> +		break;
> +	case SYSTEM_RUNNING:
> +		orderly_poweroff(false);
> +		break;
> +	default:
> +		/* Don't do it when we are halting/rebooting. */
> +		pr_info("Ignoring Xen toolstack shutdown.\n");
> +		break;
> +	}
>   }
>   
>   static void do_reboot(void)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-11-08 17:38 ` [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus Konrad Rzeszutek Wilk
@ 2013-11-20 21:40   ` Boris Ostrovsky
  2013-11-21 11:09   ` David Vrabel
  2014-04-01 13:18   ` David Vrabel
  2 siblings, 0 replies; 27+ messages in thread
From: Boris Ostrovsky @ 2013-11-20 21:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel

On 11/08/2013 12:38 PM, Konrad Rzeszutek Wilk wrote:
> There is a race case where the user does 'poweroff'
> and at the same time the system admin does 'xl shutdown'.
>
> Depending on the race, the system_state will be SYSTEM_RUNNING or
> SYSTEM_POWER_OFF. If SYSTEM_RUNNING we just end up making
> a duplicate call to 'poweroff' (while it is running).
>
> That will fail or execute (And if executed then it will be
> stuck in the reboot_mutex mutex). But nobody will care b/c the
> machine is in poweroff sequence.
>
> If the system_state is SYSTEM_POWER_OFF then we end up making
> a duplicate call to kernel_power_off. There is no locking
> there so we walk in the same steps as what 'poweroff'
> has been doing.
>
> The code in kernel/reboot.c has a mutex guarding against multiple
> 'poweroff' operations. But not against the kernel 'orderly_poweroff'.
>
> As such, lets detect this so that we don't invoke orderly_poweroff
> if the user had initiated a poweroff.
>
> This is code by changing the 'shutting_down' to an atomic and
> having a reboot notifier. If the 'shutting_down' is set to anything
> but SHUTDOWN_INVALID the XenBus handler will not run.
>
> That is exactly what we do in the reboot notifier - we set the
> 'shutting_down' to SHUTDOWN_POWEROFF.
>
> The reason we change the 'shutting_down' to an atomic is that
> the changes to said variable were normally guarded by the XenBus
> mutex - "xenwatch_mutex" - guarantting only one caller changing
> shutting_down. Since we have now the reboot notifier we have
> another user of this variable. Surfacing the 'xenwatch_mutex'
> out of XenBus is not a nice way of doing it. Having the
> variable however be atomic solves the problem easily.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> [v2: Don't expose xenwatch_mutex, add comments]

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

> ---
>   drivers/xen/manage.c | 51 ++++++++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index 3f8496c..323703a 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -36,8 +36,16 @@ enum shutdown_state {
>   	 SHUTDOWN_HALT = 4,
>   };
>   
> -/* Ignore multiple shutdown requests. */
> -static enum shutdown_state shutting_down = SHUTDOWN_INVALID;
> +/* Ignore multiple shutdown requests. There are two potential race conditions:
> + *  - Multiple XenStore 'shutdown' requests. We don't want to run any off
> + *    the callbacks in parallel.
> + *  - In progress 'poweroff' (initiated inside the guest) and a XenStore
> + *     'shutdown' request. If the poweroff has transitioned 'system_state' to
> + *    SYSTEM_POWER_OFF we do not want to call orderly_poweroff. 'system_state'
> + *    is not SMP safe so we depend on reboot notifiers to set 'shutting_down'
> + *    so that we will ignore XenBus shutdown requests.
> + */
> +static atomic_t shutting_down = ATOMIC_INIT(SHUTDOWN_INVALID);
>   
>   struct suspend_info {
>   	int cancelled;
> @@ -109,7 +117,7 @@ static void do_suspend(void)
>   	int err;
>   	struct suspend_info si;
>   
> -	shutting_down = SHUTDOWN_SUSPEND;
> +	atomic_set(&shutting_down, SHUTDOWN_SUSPEND);
>   
>   #ifdef CONFIG_PREEMPT
>   	/* If the kernel is preemptible, we need to freeze all the processes
> @@ -173,7 +181,7 @@ out_thaw:
>   	thaw_processes();
>   out:
>   #endif
> -	shutting_down = SHUTDOWN_INVALID;
> +	atomic_set(&shutting_down, SHUTDOWN_INVALID);
>   }
>   #endif	/* CONFIG_HIBERNATE_CALLBACKS */
>   
> @@ -184,7 +192,7 @@ struct shutdown_handler {
>   
>   static void do_poweroff(void)
>   {
> -	shutting_down = SHUTDOWN_POWEROFF;
> +	atomic_set(&shutting_down, SHUTDOWN_POWEROFF);
>   	switch (system_state) {
>   	case SYSTEM_BOOTING:
>   		orderly_poweroff(true);
> @@ -201,7 +209,7 @@ static void do_poweroff(void)
>   
>   static void do_reboot(void)
>   {
> -	shutting_down = SHUTDOWN_POWEROFF; /* ? */
> +	atomic_set(&shutting_down, SHUTDOWN_POWEROFF); /* ? */
>   	ctrl_alt_del();
>   }
>   
> @@ -222,7 +230,7 @@ static void shutdown_handler(struct xenbus_watch *watch,
>   	};
>   	static struct shutdown_handler *handler;
>   
> -	if (shutting_down != SHUTDOWN_INVALID)
> +	if (atomic_read(&shutting_down) != SHUTDOWN_INVALID)
>   		return;
>   
>    again:
> @@ -256,12 +264,29 @@ static void shutdown_handler(struct xenbus_watch *watch,
>   		handler->cb();
>   	} else {
>   		pr_info("Ignoring shutdown request: %s\n", str);
> -		shutting_down = SHUTDOWN_INVALID;
> +		atomic_set(&shutting_down, SHUTDOWN_INVALID);
>   	}
>   
>   	kfree(str);
>   }
> +/*
> + * This function is called when the system is being rebooted.
> + */
> +static int
> +xen_system_reboot(struct notifier_block *nb, unsigned long event, void *unused)
> +{
> +	switch (event) {
> +	case SYS_RESTART:
> +	case SYS_HALT:
> +	case SYS_POWER_OFF:
> +		atomic_set(&shutting_down, SHUTDOWN_POWEROFF);
> +		break;
> +	default:
> +		break;
> +	}
>   
> +	return NOTIFY_DONE;
> +}
>   #ifdef CONFIG_MAGIC_SYSRQ
>   static void sysrq_handler(struct xenbus_watch *watch, const char **vec,
>   			  unsigned int len)
> @@ -302,6 +327,10 @@ static struct xenbus_watch shutdown_watch = {
>   	.callback = shutdown_handler
>   };
>   
> +static struct notifier_block xen_shutdown_notifier = {
> +	.notifier_call = xen_system_reboot,
> +};
> +
>   static int setup_shutdown_watcher(void)
>   {
>   	int err;
> @@ -319,7 +348,11 @@ static int setup_shutdown_watcher(void)
>   		return err;
>   	}
>   #endif
> -
> +	err = register_reboot_notifier(&xen_shutdown_notifier);
> +	if (err) {
> +		pr_warn("Failed to register shutdown notifier\n");
> +		return err;
> +	}
>   	return 0;
>   }
>   


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
  2013-11-08 17:38 ` [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas Konrad Rzeszutek Wilk
@ 2013-11-21 10:37   ` David Vrabel
  0 siblings, 0 replies; 27+ messages in thread
From: David Vrabel @ 2013-11-21 10:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky,
	Mukesh Rathor

On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> The PCI MMCONFIG area is usually reserved via the E820 so the Xen hypervisor
> is aware of these regions. But they can also be enumerated in the ACPI
> DSDT which means the hypervisor won't know of them until the initial
> domain informs it of via PHYSDEVOP_pci_mmcfg_reserved.
> 
> This is what this patch does for all of the MCFG regions that the
> initial domain is aware of (E820 enumerated and ACPI).
> 
> Reported-by:  Santosh Jodh <Santosh.Jodh@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> CC: David Vrabel <david.vrabel@citrix.com>
> CC: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

> [v1: Redid it a bit]
> [v2: Dropped the P2M 1-1 setting]
> [v3: Check for Xen in-case we are running under baremetal]
> [v4: Wrap with CONFIG_PCI_MMCONFIG]

This sort of change list isn't useful in the long term. I'd prefer it if
this was after the --- divider.

> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -26,6 +26,7 @@
> +		default:
> +			pr_warn("Failed to report MMCONFIG reservation"
> +				" state for %s to hypervisor"
> +				" (%d)\n",

Message strings shouldn't be split like this.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-11-08 17:38 ` [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus Konrad Rzeszutek Wilk
  2013-11-20 21:40   ` Boris Ostrovsky
@ 2013-11-21 11:09   ` David Vrabel
  2013-11-26 16:45     ` Konrad Rzeszutek Wilk
  2014-04-01 13:18   ` David Vrabel
  2 siblings, 1 reply; 27+ messages in thread
From: David Vrabel @ 2013-11-21 11:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> There is a race case where the user does 'poweroff'
> and at the same time the system admin does 'xl shutdown'.

This isn't a Xen-specific problem is it?  Wouldn't it be better to fix
this in generic code?

Especially since I don't think this patch actually fixes the race
completely.

> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
[...]
> @@ -222,7 +230,7 @@ static void shutdown_handler(struct xenbus_watch *watch,
>  	};
>  	static struct shutdown_handler *handler;
>  
> -	if (shutting_down != SHUTDOWN_INVALID)
> +	if (atomic_read(&shutting_down) != SHUTDOWN_INVALID)
>  		return;

In guest initiated poweroff at this time will still race with this
toolstack initiated poweroff.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up.
  2013-11-08 17:38 ` [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up Konrad Rzeszutek Wilk
  2013-11-20 21:11   ` Boris Ostrovsky
@ 2013-11-21 11:33   ` David Vrabel
  2013-11-26 16:47     ` Konrad Rzeszutek Wilk
  2014-04-01 15:43     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 27+ messages in thread
From: David Vrabel @ 2013-11-21 11:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> The user can launch the guest in this sequence:
> 
> xl create -p /vm.cfg	[launch, but pause it]
> xl shutdown latest	[sets control/shutdown=poweroff]
> xl unpause latest
> xl console latest	[and see that the guest has completely
> ignored the shutdown request]
> 
> In reality the guest hasn't ignored it. It registers a watch
> and gets a notification that there is value. It then calls
> the shutdown_handler which ends up calling orderly_shutdown.

Is this really a bug?.

>From the xl(1) man page.

  shutdown [OPTIONS] -a|domain-id
     Gracefully shuts down a domain.  This coordinates with the
     domain OS to perform graceful shutdown, so there is no guarantee
     that it will succeed, and may take a variable length of time
     depending on what services must be shutdown in the domain.

Seems like ignoring a shutdown request when the guest cannot yet
shutdown gracefully is the expected behaviour.

This also doesn't seem sufficient.  SYSTEM_RUNNING is set prior to
starting init in an initramfs and orderly_power_off(false) will still
likely fail at this point.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-11-08 17:38 ` [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart Konrad Rzeszutek Wilk
@ 2013-11-21 17:52   ` David Vrabel
  2013-11-22  9:30     ` Ian Campbell
  2013-11-26 16:50     ` Konrad Rzeszutek Wilk
  2014-01-26  1:13   ` Zhang, Yang Z
  1 sibling, 2 replies; 27+ messages in thread
From: David Vrabel @ 2013-11-21 17:52 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, david.vrabel,
	boris.ostrovsky

On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> 'process_msg' is running from within the 'xenbus' thread. Whenever
> a message shows up in XenBus it is put on a xs_state.reply_list list
> and 'read_reply' picks it up.
> 
> The problem is if the backend domain or the xenstored process is killed.
> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> stuck forever waiting for the reply_list to have some contents.
> 
> This is normally not a problem - as the backend domain can come back
> or the xenstored process can be restarted. However if the domain
> is in process of being powered off/restarted/halted - there is no
> point of waiting on it coming back - as we are effectively being
> terminated and should not impede the progress.
> 
> This patch solves this problem by checking the 'system_state' value
> to see if we are in heading towards death. We also make the wait
> mechanism a bit more asynchronous.

This seems to be checking the wrong thing conceptually.  We should abort
the wait if xenstored is dead not if our domain is dying.

I think you can consider xenstored as dead if:

a) it's local and we're dying.
b) it's remote and the remote domain is dead.

> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8

This bug link has no useful information in it.

> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
>  
>  	while (list_empty(&xs_state.reply_list)) {
>  		spin_unlock(&xs_state.reply_lock);
> -		/* XXX FIXME: Avoid synchronous wait for response here. */
> -		wait_event(xs_state.reply_waitq,
> -			   !list_empty(&xs_state.reply_list));
> +		wait_event_timeout(xs_state.reply_waitq,
> +				   !list_empty(&xs_state.reply_list),
> +				   msecs_to_jiffies(500));

This is still a synchronous wait.  Is the removal of the FIXME comment
correct?

> +
> +		/*
> +		 * If we are in the process of being shut-down there is
> +		 * no point of trying to contact XenBus - it is either
> +		 * killed (xenstored application) or the other domain
> +		 * has been killed or is unreachable.

Not necessarily, xenstore could just be slow.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-11-21 17:52   ` [Xen-devel] " David Vrabel
@ 2013-11-22  9:30     ` Ian Campbell
  2013-11-26 16:50     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 27+ messages in thread
From: Ian Campbell @ 2013-11-22  9:30 UTC (permalink / raw)
  To: David Vrabel
  Cc: Konrad Rzeszutek Wilk, xen-devel, linux-kernel, JBeulich,
	boris.ostrovsky

graft 8 <20130528152156.GB3027@phenom.dumpdata.com>
prune 8 <20130528181149.GA27718@phenom.dumpdata.com>
thanks

On Thu, 2013-11-21 at 17:52 +0000, David Vrabel wrote:
> > Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
> 
> This bug link has no useful information in it.

Looks like the intention was for it to reference this mail:
http://thread.gmane.org/gmane.comp.emulators.xen.devel/160720/focus=160828
this has the exact same contents as the control mail that created this
bug which I dug out of the bug trackers spool. The probles was that in
the original the commands were appended instead of at the front of the
message, so they got ignored. Then when the commands were correctly sent
the mail in question used "!" (meaning this mail) but didn't go to
xen-devel, so it didn't actually refer to a known thread. The correct
thing to do in that resend would have been to reference the relevant
message id directly.

I think I've fixed it up with the above commands.

Ian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-11-21 11:09   ` David Vrabel
@ 2013-11-26 16:45     ` Konrad Rzeszutek Wilk
  2013-12-02 11:27       ` David Vrabel
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-26 16:45 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Thu, Nov 21, 2013 at 11:09:52AM +0000, David Vrabel wrote:
> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> > There is a race case where the user does 'poweroff'
> > and at the same time the system admin does 'xl shutdown'.
> 
> This isn't a Xen-specific problem is it?  Wouldn't it be better to fix
> this in generic code?

Possibly. I believe the reason for the reboot_notifier to exist is
to provide a means to fix the race.

> 
> Especially since I don't think this patch actually fixes the race
> completely.
> 
> > --- a/drivers/xen/manage.c
> > +++ b/drivers/xen/manage.c
> [...]
> > @@ -222,7 +230,7 @@ static void shutdown_handler(struct xenbus_watch *watch,
> >  	};
> >  	static struct shutdown_handler *handler;
> >  
> > -	if (shutting_down != SHUTDOWN_INVALID)
> > +	if (atomic_read(&shutting_down) != SHUTDOWN_INVALID)
> >  		return;
> 
> In guest initiated poweroff at this time will still race with this
> toolstack initiated poweroff.

No, b/c the reboot notifier would have set 'shutting_down' already.

> 
> David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up.
  2013-11-21 11:33   ` David Vrabel
@ 2013-11-26 16:47     ` Konrad Rzeszutek Wilk
  2014-04-01 15:43     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-26 16:47 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Thu, Nov 21, 2013 at 11:33:49AM +0000, David Vrabel wrote:
> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> > The user can launch the guest in this sequence:
> > 
> > xl create -p /vm.cfg	[launch, but pause it]
> > xl shutdown latest	[sets control/shutdown=poweroff]
> > xl unpause latest
> > xl console latest	[and see that the guest has completely
> > ignored the shutdown request]
> > 
> > In reality the guest hasn't ignored it. It registers a watch
> > and gets a notification that there is value. It then calls
> > the shutdown_handler which ends up calling orderly_shutdown.
> 
> Is this really a bug?.

Yes. We did get the action, we just did not properly act on it.
> 
> >From the xl(1) man page.
> 
>   shutdown [OPTIONS] -a|domain-id
>      Gracefully shuts down a domain.  This coordinates with the
>      domain OS to perform graceful shutdown, so there is no guarantee
>      that it will succeed, and may take a variable length of time
>      depending on what services must be shutdown in the domain.
> 
> Seems like ignoring a shutdown request when the guest cannot yet
> shutdown gracefully is the expected behaviour.
> 
> This also doesn't seem sufficient.  SYSTEM_RUNNING is set prior to
> starting init in an initramfs and orderly_power_off(false) will still
> likely fail at this point.

Ugh, will have to figure out how else to realize when it the user-space
is ready to launch programs.
> 
> David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-11-21 17:52   ` [Xen-devel] " David Vrabel
  2013-11-22  9:30     ` Ian Campbell
@ 2013-11-26 16:50     ` Konrad Rzeszutek Wilk
  2013-12-02 11:41       ` David Vrabel
  1 sibling, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-11-26 16:50 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Thu, Nov 21, 2013 at 05:52:28PM +0000, David Vrabel wrote:
> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> > The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> > 'process_msg' is running from within the 'xenbus' thread. Whenever
> > a message shows up in XenBus it is put on a xs_state.reply_list list
> > and 'read_reply' picks it up.
> > 
> > The problem is if the backend domain or the xenstored process is killed.
> > In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> > stuck forever waiting for the reply_list to have some contents.
> > 
> > This is normally not a problem - as the backend domain can come back
> > or the xenstored process can be restarted. However if the domain
> > is in process of being powered off/restarted/halted - there is no
> > point of waiting on it coming back - as we are effectively being
> > terminated and should not impede the progress.
> > 
> > This patch solves this problem by checking the 'system_state' value
> > to see if we are in heading towards death. We also make the wait
> > mechanism a bit more asynchronous.
> 
> This seems to be checking the wrong thing conceptually.  We should abort
> the wait if xenstored is dead not if our domain is dying.
> 
> I think you can consider xenstored as dead if:
> 
> a) it's local and we're dying.

OK. Not sure exactly how to do that but that should be possible.

> b) it's remote and the remote domain is dead.

OK, any idea how to do that? As in check if a remote domain is dead?

> 
> > Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
> 
> This bug link has no useful information in it.
> 
> > --- a/drivers/xen/xenbus/xenbus_xs.c
> > +++ b/drivers/xen/xenbus/xenbus_xs.c
> > @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
> >  
> >  	while (list_empty(&xs_state.reply_list)) {
> >  		spin_unlock(&xs_state.reply_lock);
> > -		/* XXX FIXME: Avoid synchronous wait for response here. */
> > -		wait_event(xs_state.reply_waitq,
> > -			   !list_empty(&xs_state.reply_list));
> > +		wait_event_timeout(xs_state.reply_waitq,
> > +				   !list_empty(&xs_state.reply_list),
> > +				   msecs_to_jiffies(500));
> 
> This is still a synchronous wait.  Is the removal of the FIXME comment
> correct?

I thought that the comment was meant in terms of it blocking forever.
But perhaps that was not the intent of the comment?
> 
> > +
> > +		/*
> > +		 * If we are in the process of being shut-down there is
> > +		 * no point of trying to contact XenBus - it is either
> > +		 * killed (xenstored application) or the other domain
> > +		 * has been killed or is unreachable.
> 
> Not necessarily, xenstore could just be slow.

That is true. Your suggestion would help in evaluating when XenBus
end point is kaput.
> 
> David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-11-26 16:45     ` Konrad Rzeszutek Wilk
@ 2013-12-02 11:27       ` David Vrabel
  2014-03-31 19:09         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: David Vrabel @ 2013-12-02 11:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On 26/11/13 16:45, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 21, 2013 at 11:09:52AM +0000, David Vrabel wrote:
>> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
>>> There is a race case where the user does 'poweroff'
>>> and at the same time the system admin does 'xl shutdown'.
>>
>> This isn't a Xen-specific problem is it?  Wouldn't it be better to fix
>> this in generic code?
> 
> Possibly. I believe the reason for the reboot_notifier to exist is
> to provide a means to fix the race.
> 
>>
>> Especially since I don't think this patch actually fixes the race
>> completely.
>>
>>> --- a/drivers/xen/manage.c
>>> +++ b/drivers/xen/manage.c
>> [...]
>>> @@ -222,7 +230,7 @@ static void shutdown_handler(struct xenbus_watch *watch,
>>>  	};
>>>  	static struct shutdown_handler *handler;
>>>  
>>> -	if (shutting_down != SHUTDOWN_INVALID)
>>> +	if (atomic_read(&shutting_down) != SHUTDOWN_INVALID)
>>>  		return;
>>
>> In guest initiated poweroff at this time will still race with this
>> toolstack initiated poweroff.
> 
> No, b/c the reboot notifier would have set 'shutting_down' already.

If the guest initiated power off is started here, the reboot notifier
won't have run yet.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-11-26 16:50     ` Konrad Rzeszutek Wilk
@ 2013-12-02 11:41       ` David Vrabel
  2014-03-31 20:33         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 27+ messages in thread
From: David Vrabel @ 2013-12-02 11:41 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On 26/11/13 16:50, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 21, 2013 at 05:52:28PM +0000, David Vrabel wrote:
>> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
>>> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
>>> 'process_msg' is running from within the 'xenbus' thread. Whenever
>>> a message shows up in XenBus it is put on a xs_state.reply_list list
>>> and 'read_reply' picks it up.
>>>
>>> The problem is if the backend domain or the xenstored process is killed.
>>> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
>>> stuck forever waiting for the reply_list to have some contents.
>>>
>>> This is normally not a problem - as the backend domain can come back
>>> or the xenstored process can be restarted. However if the domain
>>> is in process of being powered off/restarted/halted - there is no
>>> point of waiting on it coming back - as we are effectively being
>>> terminated and should not impede the progress.
>>>
>>> This patch solves this problem by checking the 'system_state' value
>>> to see if we are in heading towards death. We also make the wait
>>> mechanism a bit more asynchronous.
>>
>> This seems to be checking the wrong thing conceptually.  We should abort
>> the wait if xenstored is dead not if our domain is dying.
>>
>> I think you can consider xenstored as dead if:
>>
>> a) it's local and we're dying.
> 
> OK. Not sure exactly how to do that but that should be possible.

xen_store_domain_type == XS_LOCAL and looking at system_state?

>> b) it's remote and the remote domain is dead.
> 
> OK, any idea how to do that? As in check if a remote domain is dead?

Let someone who cares about xenstore domains fix this -- this is not the
most common use case.

I'd be happy to have some thing like:

bool xenbus_ok(void)
{
    switch (xen_store_domain_type) {
    case XS_LOCAL:
         return system_state != dying;
    case XS_PV:
    case XS_HVM;
         /* FIXME: could check remote domain is alive, but it's
            normally dom0. */
         return true;
    // ...
    default:
         return true;
    }
}

>>> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
>>
>> This bug link has no useful information in it.

And it now does, thanks Ian.

>>> --- a/drivers/xen/xenbus/xenbus_xs.c
>>> +++ b/drivers/xen/xenbus/xenbus_xs.c
>>> @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
>>>  
>>>  	while (list_empty(&xs_state.reply_list)) {
>>>  		spin_unlock(&xs_state.reply_lock);
>>> -		/* XXX FIXME: Avoid synchronous wait for response here. */
>>> -		wait_event(xs_state.reply_waitq,
>>> -			   !list_empty(&xs_state.reply_list));
>>> +		wait_event_timeout(xs_state.reply_waitq,
>>> +				   !list_empty(&xs_state.reply_list),
>>> +				   msecs_to_jiffies(500));
>>
>> This is still a synchronous wait.  Is the removal of the FIXME comment
>> correct?
> 
> I thought that the comment was meant in terms of it blocking forever.
> But perhaps that was not the intent of the comment?

Ok. I don't anticipate a fully async interface here being sensible anyway.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-11-08 17:38 ` [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart Konrad Rzeszutek Wilk
  2013-11-21 17:52   ` [Xen-devel] " David Vrabel
@ 2014-01-26  1:13   ` Zhang, Yang Z
  2014-01-26  3:44     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 27+ messages in thread
From: Zhang, Yang Z @ 2014-01-26  1:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Ian.Campbell, xen-devel, linux-kernel,
	JBeulich, david.vrabel, boris.ostrovsky

Konrad Rzeszutek Wilk wrote on 2013-11-09:
> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> 'process_msg' is running from within the 'xenbus' thread. Whenever a
> message shows up in XenBus it is put on a xs_state.reply_list list and
> 'read_reply' picks it up.
> 
> The problem is if the backend domain or the xenstored process is killed.
> In which case 'xenbus' is still awaiting - and 'read_reply' if called
> - stuck forever waiting for the reply_list to have some contents.
> 
> This is normally not a problem - as the backend domain can come back
> or the xenstored process can be restarted. However if the domain is in
> process of being powered off/restarted/halted - there is no point of
> waiting on it coming back - as we are effectively being terminated and
> should not impede the progress.
> 

Hi, Konrad,

Is this patch applied in Linux upstream tree? I didn't find it with latest Linux upstream source.

> This patch solves this problem by checking the 'system_state' value to
> see if we are in heading towards death. We also make the wait
> mechanism a bit more asynchronous.
> 
> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/xen/xenbus/xenbus_xs.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> diff --git a/drivers/xen/xenbus/xenbus_xs.c
> b/drivers/xen/xenbus/xenbus_xs.c index b6d5fff..4f22706 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type
> *type, unsigned int *len)
> 
>  	while (list_empty(&xs_state.reply_list)) {
>  		spin_unlock(&xs_state.reply_lock);
> -		/* XXX FIXME: Avoid synchronous wait for response here. */
> -		wait_event(xs_state.reply_waitq, -			  
> !list_empty(&xs_state.reply_list));
> +		wait_event_timeout(xs_state.reply_waitq, +				  
> !list_empty(&xs_state.reply_list), +				   msecs_to_jiffies(500)); +
> +		/* +		 * If we are in the process of being shut-down there is +		 *
> no point of trying to contact XenBus - it is either +		 * killed
> (xenstored application) or the other domain +		 * has been killed or is
> unreachable. +		 */ +		switch (system_state) { +		case SYSTEM_POWER_OFF:
> +		case SYSTEM_RESTART: +		case SYSTEM_HALT: +			return ERR_PTR(-EIO);
> +		default: +			break; +		}
>  		spin_lock(&xs_state.reply_lock);
>  	}
> @@ -215,6 +230,9 @@ void *xenbus_dev_request_and_reply(struct
> xsd_sockmsg *msg)
> 
>  	mutex_unlock(&xs_state.request_mutex);
> +	if (IS_ERR(ret))
> +		return ret;
> +
>  	if ((msg->type == XS_TRANSACTION_END) ||
>  	    ((req_msg.type == XS_TRANSACTION_START) &&
>  	     (msg->type == XS_ERROR)))


Best regards,
Yang



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2014-01-26  1:13   ` Zhang, Yang Z
@ 2014-01-26  3:44     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-26  3:44 UTC (permalink / raw)
  To: Zhang, Yang Z, Ian.Campbell, xen-devel, linux-kernel, JBeulich,
	david.vrabel, boris.ostrovsky

"Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
>Konrad Rzeszutek Wilk wrote on 2013-11-09:
>> The 'read_reply' works with 'process_msg' to read of a reply in
>XenBus.
>> 'process_msg' is running from within the 'xenbus' thread. Whenever a
>> message shows up in XenBus it is put on a xs_state.reply_list list
>and
>> 'read_reply' picks it up.
>> 
>> The problem is if the backend domain or the xenstored process is
>killed.
>> In which case 'xenbus' is still awaiting - and 'read_reply' if called
>> - stuck forever waiting for the reply_list to have some contents.
>> 
>> This is normally not a problem - as the backend domain can come back
>> or the xenstored process can be restarted. However if the domain is
>in
>> process of being powered off/restarted/halted - there is no point of
>> waiting on it coming back - as we are effectively being terminated
>and
>> should not impede the progress.
>> 
>
>Hi, Konrad,
>
>Is this patch applied in Linux upstream tree? I didn't find it with
>latest Linux upstream source.
>

No. It needs rework.
>> This patch solves this problem by checking the 'system_state' value
>to
>> see if we are in heading towards death. We also make the wait
>> mechanism a bit more asynchronous.
>> 
>> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  drivers/xen/xenbus/xenbus_xs.c | 24 +++++++++++++++++++++---
>>  1 file changed, 21 insertions(+), 3 deletions(-)
>> diff --git a/drivers/xen/xenbus/xenbus_xs.c
>> b/drivers/xen/xenbus/xenbus_xs.c index b6d5fff..4f22706 100644
>> --- a/drivers/xen/xenbus/xenbus_xs.c
>> +++ b/drivers/xen/xenbus/xenbus_xs.c
>> @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type
>> *type, unsigned int *len)
>> 
>>  	while (list_empty(&xs_state.reply_list)) {
>>  		spin_unlock(&xs_state.reply_lock);
>> -		/* XXX FIXME: Avoid synchronous wait for response here. */
>> -		wait_event(xs_state.reply_waitq, -			  
>> !list_empty(&xs_state.reply_list));
>> +		wait_event_timeout(xs_state.reply_waitq, +				  
>> !list_empty(&xs_state.reply_list), +				   msecs_to_jiffies(500)); +
>> +		/* +		 * If we are in the process of being shut-down there is +		
>*
>> no point of trying to contact XenBus - it is either +		 * killed
>> (xenstored application) or the other domain +		 * has been killed or
>is
>> unreachable. +		 */ +		switch (system_state) { +		case
>SYSTEM_POWER_OFF:
>> +		case SYSTEM_RESTART: +		case SYSTEM_HALT: +			return
>ERR_PTR(-EIO);
>> +		default: +			break; +		}
>>  		spin_lock(&xs_state.reply_lock);
>>  	}
>> @@ -215,6 +230,9 @@ void *xenbus_dev_request_and_reply(struct
>> xsd_sockmsg *msg)
>> 
>>  	mutex_unlock(&xs_state.request_mutex);
>> +	if (IS_ERR(ret))
>> +		return ret;
>> +
>>  	if ((msg->type == XS_TRANSACTION_END) ||
>>  	    ((req_msg.type == XS_TRANSACTION_START) &&
>>  	     (msg->type == XS_ERROR)))
>
>
>Best regards,
>Yang



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-12-02 11:27       ` David Vrabel
@ 2014-03-31 19:09         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-31 19:09 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Mon, Dec 02, 2013 at 11:27:40AM +0000, David Vrabel wrote:
> On 26/11/13 16:45, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 21, 2013 at 11:09:52AM +0000, David Vrabel wrote:
> >> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> >>> There is a race case where the user does 'poweroff'
> >>> and at the same time the system admin does 'xl shutdown'.
> >>
> >> This isn't a Xen-specific problem is it?  Wouldn't it be better to fix
> >> this in generic code?
> > 
> > Possibly. I believe the reason for the reboot_notifier to exist is
> > to provide a means to fix the race.
> > 
> >>
> >> Especially since I don't think this patch actually fixes the race
> >> completely.
> >>
> >>> --- a/drivers/xen/manage.c
> >>> +++ b/drivers/xen/manage.c
> >> [...]
> >>> @@ -222,7 +230,7 @@ static void shutdown_handler(struct xenbus_watch *watch,
> >>>  	};
> >>>  	static struct shutdown_handler *handler;
> >>>  
> >>> -	if (shutting_down != SHUTDOWN_INVALID)
> >>> +	if (atomic_read(&shutting_down) != SHUTDOWN_INVALID)
> >>>  		return;
> >>
> >> In guest initiated poweroff at this time will still race with this
> >> toolstack initiated poweroff.
> > 
> > No, b/c the reboot notifier would have set 'shutting_down' already.
> 
> If the guest initiated power off is started here, the reboot notifier
> won't have run yet.

This is what I think you are saying:

CPU0                                                 CPU1

'poweroff'						'shutdown_handler'
->SYSCALL_DEFINE4(reboot)				 -> atomic_read(&shutting_down) == SHUTDOWN_INVALID
  mutex_lock(&reboot_mutex)			 	 -> do_poweroff
  kernel_power_off()
    -> kernel_shutdown_prepare

         -> blocking_notifier_call_chain()
		\- xen_system_reboot
			\- atomic_set(&shutting_down, SHUTDOWN_POWEROFF);

                                                         -> atomic_set(&shutting_down, SHUTDOWN_POWEROFF);
                                                         -> orderly_poweroff(false)
								-> 'poweroff' called
									->SYSCALL_DEFINE4(reboot)
									     -> mutex_lock(&reboot_mutex)
          -> system_state = SYSTEM_HALT
     -> machine_halt().

What you are describing was outlined in the commit description:

"
   'poweroff' and 'xl shutdown'..

    Depending on the race, the system_state will be SYSTEM_RUNNING or
    SYSTEM_POWER_OFF. If SYSTEM_RUNNING we just end up making
    a duplicate call to 'poweroff' (while it is running).

    That will fail or execute (And if executed then it will be
    stuck in the reboot_mutex mutex). But nobody will care b/c the
    machine is in poweroff sequence.
"

which means that this code does guard.. but not that well :-(

> 
> David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2013-12-02 11:41       ` David Vrabel
@ 2014-03-31 20:33         ` Konrad Rzeszutek Wilk
  2014-04-01 12:53           ` David Vrabel
  0 siblings, 1 reply; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-03-31 20:33 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Mon, Dec 02, 2013 at 11:41:57AM +0000, David Vrabel wrote:
> On 26/11/13 16:50, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 21, 2013 at 05:52:28PM +0000, David Vrabel wrote:
> >> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> >>> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> >>> 'process_msg' is running from within the 'xenbus' thread. Whenever
> >>> a message shows up in XenBus it is put on a xs_state.reply_list list
> >>> and 'read_reply' picks it up.
> >>>
> >>> The problem is if the backend domain or the xenstored process is killed.
> >>> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> >>> stuck forever waiting for the reply_list to have some contents.
> >>>
> >>> This is normally not a problem - as the backend domain can come back
> >>> or the xenstored process can be restarted. However if the domain
> >>> is in process of being powered off/restarted/halted - there is no
> >>> point of waiting on it coming back - as we are effectively being
> >>> terminated and should not impede the progress.
> >>>
> >>> This patch solves this problem by checking the 'system_state' value
> >>> to see if we are in heading towards death. We also make the wait
> >>> mechanism a bit more asynchronous.
> >>
> >> This seems to be checking the wrong thing conceptually.  We should abort
> >> the wait if xenstored is dead not if our domain is dying.
> >>
> >> I think you can consider xenstored as dead if:
> >>
> >> a) it's local and we're dying.
> > 
> > OK. Not sure exactly how to do that but that should be possible.
> 
> xen_store_domain_type == XS_LOCAL and looking at system_state?
> 
> >> b) it's remote and the remote domain is dead.
> > 
> > OK, any idea how to do that? As in check if a remote domain is dead?
> 
> Let someone who cares about xenstore domains fix this -- this is not the
> most common use case.
> 
> I'd be happy to have some thing like:
> 
> bool xenbus_ok(void)
> {
>     switch (xen_store_domain_type) {
>     case XS_LOCAL:
>          return system_state != dying;
>     case XS_PV:
>     case XS_HVM;
>          /* FIXME: could check remote domain is alive, but it's
>             normally dom0. */
>          return true;
>     // ...
>     default:
>          return true;
>     }
> }

>From 227d72806311694ced6cdedfd61a05f5bb1893f7 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 8 Nov 2013 10:48:58 -0500
Subject: [PATCH] xen/xenbus: Avoid synchronous wait on XenBus stalling
 shutdown/restart.

The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
'process_msg' is running from within the 'xenbus' thread. Whenever
a message shows up in XenBus it is put on a xs_state.reply_list list
and 'read_reply' picks it up.

The problem is if the backend domain or the xenstored process is killed.
In which case 'xenbus' is still awaiting - and 'read_reply' if called -
stuck forever waiting for the reply_list to have some contents.

This is normally not a problem - as the backend domain can come back
or the xenstored process can be restarted. However if the domain
is in process of being powered off/restarted/halted - there is no
point of waiting on it coming back - as we are effectively being
terminated and should not impede the progress.

This patch solves this problem by checking whether the guest is
the right domain. If it is an initial domain and hurtling towards
death - there is no point of continuing the wait. All other type
of guests continue with their behavior.
mechanism a bit more asynchronous.

Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fixed it up per David's suggestions]
---
 drivers/xen/xenbus/xenbus_xs.c | 44 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index b6d5fff..ba804f3 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -50,6 +50,7 @@
 #include <xen/xenbus.h>
 #include <xen/xen.h>
 #include "xenbus_comms.h"
+#include "xenbus_probe.h"
 
 struct xs_stored_msg {
 	struct list_head list;
@@ -139,6 +140,29 @@ static int get_error(const char *errorstring)
 	return xsd_errors[i].errnum;
 }
 
+static bool xenbus_ok(void)
+{
+	switch (xen_store_domain_type) {
+	case XS_LOCAL:
+		switch (system_state) {
+		case SYSTEM_POWER_OFF:
+		case SYSTEM_RESTART:
+		case SYSTEM_HALT:
+			return false;
+		default:
+			break;
+		}
+		return true;
+	case XS_PV:
+	case XS_HVM:
+		/* FIXME: Could check that the remote domain is alive,
+		 * but it is normally initial domain. */
+		return true;
+	default:
+		break;
+	}
+	return false;
+}
 static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
 {
 	struct xs_stored_msg *msg;
@@ -148,9 +172,20 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
 
 	while (list_empty(&xs_state.reply_list)) {
 		spin_unlock(&xs_state.reply_lock);
-		/* XXX FIXME: Avoid synchronous wait for response here. */
-		wait_event(xs_state.reply_waitq,
-			   !list_empty(&xs_state.reply_list));
+		if (xenbus_ok())
+			/* XXX FIXME: Avoid synchronous wait for response here. */
+			wait_event_timeout(xs_state.reply_waitq,
+					   !list_empty(&xs_state.reply_list),
+					   msecs_to_jiffies(500));
+		else {
+			/*
+			 * If we are in the process of being shut-down there is
+			 * no point of trying to contact XenBus - it is either
+			 * killed (xenstored application) or the other domain
+			 * has been killed or is unreachable.
+			 */
+			return ERR_PTR(-EIO);
+		}
 		spin_lock(&xs_state.reply_lock);
 	}
 
@@ -215,6 +250,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
 
 	mutex_unlock(&xs_state.request_mutex);
 
+	if (IS_ERR(ret))
+		return ret;
+
 	if ((msg->type == XS_TRANSACTION_END) ||
 	    ((req_msg.type == XS_TRANSACTION_START) &&
 	     (msg->type == XS_ERROR)))
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
  2014-03-31 20:33         ` Konrad Rzeszutek Wilk
@ 2014-04-01 12:53           ` David Vrabel
  0 siblings, 0 replies; 27+ messages in thread
From: David Vrabel @ 2014-04-01 12:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, boris.ostrovsky, Ian.Campbell, JBeulich,
	linux-kernel

On 31/03/14 21:33, Konrad Rzeszutek Wilk wrote:
> 
> Subject: [PATCH] xen/xenbus: Avoid synchronous wait on XenBus stalling
>  shutdown/restart.
> 
> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> 'process_msg' is running from within the 'xenbus' thread. Whenever
> a message shows up in XenBus it is put on a xs_state.reply_list list
> and 'read_reply' picks it up.
> 
> The problem is if the backend domain or the xenstored process is killed.
> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> stuck forever waiting for the reply_list to have some contents.
> 
> This is normally not a problem - as the backend domain can come back
> or the xenstored process can be restarted. However if the domain
> is in process of being powered off/restarted/halted - there is no
> point of waiting on it coming back - as we are effectively being
> terminated and should not impede the progress.
> 
> This patch solves this problem by checking whether the guest is
> the right domain. If it is an initial domain and hurtling towards
> death - there is no point of continuing the wait. All other type
> of guests continue with their behavior.
> mechanism a bit more asynchronous.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2013-11-08 17:38 ` [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus Konrad Rzeszutek Wilk
  2013-11-20 21:40   ` Boris Ostrovsky
  2013-11-21 11:09   ` David Vrabel
@ 2014-04-01 13:18   ` David Vrabel
  2014-04-01 14:03     ` Konrad Rzeszutek Wilk
  2 siblings, 1 reply; 27+ messages in thread
From: David Vrabel @ 2014-04-01 13:18 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> There is a race case where the user does 'poweroff'
> and at the same time the system admin does 'xl shutdown'.
> 
> Depending on the race, the system_state will be SYSTEM_RUNNING or
> SYSTEM_POWER_OFF. If SYSTEM_RUNNING we just end up making
> a duplicate call to 'poweroff' (while it is running).
> 
> That will fail or execute (And if executed then it will be
> stuck in the reboot_mutex mutex). But nobody will care b/c the
> machine is in poweroff sequence.

If this race isn't a problem...

> If the system_state is SYSTEM_POWER_OFF then we end up making
> a duplicate call to kernel_power_off. There is no locking
> there so we walk in the same steps as what 'poweroff'
> has been doing.

... and this one doesn't happen because do_power_off() calls
orderly_poweroff(false) so does not call kernel_power_off().

Then isn't the patch unnecessary?

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus.
  2014-04-01 13:18   ` David Vrabel
@ 2014-04-01 14:03     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-01 14:03 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Tue, Apr 01, 2014 at 02:18:02PM +0100, David Vrabel wrote:
> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> > There is a race case where the user does 'poweroff'
> > and at the same time the system admin does 'xl shutdown'.
> > 
> > Depending on the race, the system_state will be SYSTEM_RUNNING or
> > SYSTEM_POWER_OFF. If SYSTEM_RUNNING we just end up making
> > a duplicate call to 'poweroff' (while it is running).
> > 
> > That will fail or execute (And if executed then it will be
> > stuck in the reboot_mutex mutex). But nobody will care b/c the
> > machine is in poweroff sequence.
> 
> If this race isn't a problem...
> 
> > If the system_state is SYSTEM_POWER_OFF then we end up making
> > a duplicate call to kernel_power_off. There is no locking
> > there so we walk in the same steps as what 'poweroff'
> > has been doing.
> 
> ... and this one doesn't happen because do_power_off() calls
> orderly_poweroff(false) so does not call kernel_power_off().
> 
> Then isn't the patch unnecessary?

Yup :-)

I realized that once I wrote up the race condition.
> 
> David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up.
  2013-11-21 11:33   ` David Vrabel
  2013-11-26 16:47     ` Konrad Rzeszutek Wilk
@ 2014-04-01 15:43     ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-01 15:43 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Thu, Nov 21, 2013 at 11:33:49AM +0000, David Vrabel wrote:
> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> > The user can launch the guest in this sequence:
> > 
> > xl create -p /vm.cfg	[launch, but pause it]
> > xl shutdown latest	[sets control/shutdown=poweroff]
> > xl unpause latest
> > xl console latest	[and see that the guest has completely
> > ignored the shutdown request]
> > 
> > In reality the guest hasn't ignored it. It registers a watch
> > and gets a notification that there is value. It then calls
> > the shutdown_handler which ends up calling orderly_shutdown.
> 
> Is this really a bug?.
> 
> >From the xl(1) man page.
> 
>   shutdown [OPTIONS] -a|domain-id
>      Gracefully shuts down a domain.  This coordinates with the
>      domain OS to perform graceful shutdown, so there is no guarantee
>      that it will succeed, and may take a variable length of time
>      depending on what services must be shutdown in the domain.
> 
> Seems like ignoring a shutdown request when the guest cannot yet
> shutdown gracefully is the expected behaviour.
> 
> This also doesn't seem sufficient.  SYSTEM_RUNNING is set prior to
> starting init in an initramfs and orderly_power_off(false) will still
> likely fail at this point.

And I found out that there is not much I can do about it. I tried
setting up workqueues, using usermodehelper_read_trylock - none
of those helped when SYSTEM_RUNNING is set and prior to /sbin/init
being started.

The best I could do was relax the gate - meaning that the
  system_state = SYSTEM_RUNNING

we set unconditionally is removed. Instead we only set that when
the shutdown process is actually in progress.

This patch does this:

>From cc3c611cf30f30d435bd5ea80649aae53e490175 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Wed, 6 Nov 2013 10:57:56 -0500
Subject: [PATCH] xen/manage: Poweroff forcefully if user-space is not yet up.

The user can launch the guest in this sequence:

xl create -p /vm.cfg	[launch, but pause it]
xl shutdown latest	[sets control/shutdown=poweroff]
xl unpause latest
xl console latest	[and see that the guest has completely
ignored the shutdown request]

In reality the guest hasn't ignored it. It registers a watch
and gets a notification that there is value. It then calls
the shutdown_handler which ends up calling orderly_shutdown.

Unfortunately that is so early in the bootup that there
are no user-space. Which means that the orderly_shutdown fails.
But since the force flag was set to false it continues on without
reporting.

What we really want to is to use the force when we are in the
SYSTEM_BOOTING state and not use the 'force' when SYSTEM_RUNNING.

However, if we are in the running state - and the shutdown command
has been given before the user-space has been setup, there is nothing
we can do. Worst yet, we stop ignoring the 'xl shutdown' requests!

As such, the other part of this patch is to only stop ignoring
the 'xl shutdown' when we are truly in the power off sequence.

That means the user can do multiple 'xl shutdown' and we will try
to act on them instead of ignoring them.

Fixes-Bug: http://bugs.xenproject.org/xen/bug/6
Reported-by:  Alex Bligh <alex@alex.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Add switch statement]
[v3: Add a reboot notifier]
---
 drivers/xen/manage.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 624e8dc..0cf7fe1 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -182,10 +182,32 @@ struct shutdown_handler {
 	void (*cb)(void);
 };
 
+static int poweroff_nb(struct notifier_block *cb, unsigned long code, void *unused)
+{
+	switch (code) {
+	case SYS_DOWN:
+	case SYS_HALT:
+	case SYS_POWER_OFF:
+		shutting_down = SHUTDOWN_POWEROFF;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
 static void do_poweroff(void)
 {
-	shutting_down = SHUTDOWN_POWEROFF;
-	orderly_poweroff(false);
+	switch (system_state) {
+	case SYSTEM_BOOTING:
+		orderly_poweroff(true);
+		break;
+	case SYSTEM_RUNNING:
+		orderly_poweroff(false);
+		break;
+	default:
+		/* Don't do it when we are halting/rebooting. */
+		pr_info("Ignoring Xen toolstack shutdown.\n");
+		break;
+	}
 }
 
 static void do_reboot(void)
@@ -291,6 +313,10 @@ static struct xenbus_watch shutdown_watch = {
 	.callback = shutdown_handler
 };
 
+static struct notifier_block xen_reboot_nb = {
+	.notifier_call = poweroff_nb,
+};
+
 static int setup_shutdown_watcher(void)
 {
 	int err;
@@ -301,6 +327,7 @@ static int setup_shutdown_watcher(void)
 		return err;
 	}
 
+
 #ifdef CONFIG_MAGIC_SYSRQ
 	err = register_xenbus_watch(&sysrq_watch);
 	if (err) {
@@ -329,6 +356,7 @@ int xen_setup_shutdown_event(void)
 	if (!xen_domain())
 		return -ENODEV;
 	register_xenstore_notifier(&xenstore_notifier);
+	register_reboot_notifier(&xen_reboot_nb);
 
 	return 0;
 }
-- 
1.8.5.3


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1).
  2013-11-08 17:38 [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) Konrad Rzeszutek Wilk
                   ` (3 preceding siblings ...)
  2013-11-08 17:38 ` [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart Konrad Rzeszutek Wilk
@ 2014-04-03 11:59 ` David Vrabel
  2014-04-03 18:07   ` Konrad Rzeszutek Wilk
  4 siblings, 1 reply; 27+ messages in thread
From: David Vrabel @ 2014-04-03 11:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> Hey,
> 
> Two of these:
>  [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet
>  [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling
> 
> fix the bugs.xenproject.org outstanding bugs.
> 
> The other ones are that were discovered on xen-devel and discussed.
> They should go in v3.13 and as the merge window is next week I am
> hoping they can be squeezed in then.
> 
> 
>  drivers/xen/manage.c            | 64 ++++++++++++++++++++++++++++++++++-------
>  drivers/xen/pci.c               | 47 ++++++++++++++++++++++++++++++
>  drivers/xen/xenbus/xenbus_xs.c  | 24 ++++++++++++++--
>  include/xen/interface/physdev.h | 11 +++++++
>  
> Konrad Rzeszutek Wilk (4):
>       xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
>       xen/manage: Poweroff forcefully if user-space is not yet up.
>       xen/manage: Guard against user-space initiated poweroff and XenBus.
>       xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.

Konrad, can you repost a final series please?  I'm not 100% which
patches should be applied.

David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1).
  2014-04-03 11:59 ` [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) David Vrabel
@ 2014-04-03 18:07   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 27+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-03 18:07 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian.Campbell, xen-devel, linux-kernel, JBeulich, boris.ostrovsky

On Thu, Apr 03, 2014 at 12:59:00PM +0100, David Vrabel wrote:
> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> > Hey,
> > 
> > Two of these:
> >  [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet
> >  [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling
> > 
> > fix the bugs.xenproject.org outstanding bugs.
> > 
> > The other ones are that were discovered on xen-devel and discussed.
> > They should go in v3.13 and as the merge window is next week I am
> > hoping they can be squeezed in then.
> > 
> > 
> >  drivers/xen/manage.c            | 64 ++++++++++++++++++++++++++++++++++-------
> >  drivers/xen/pci.c               | 47 ++++++++++++++++++++++++++++++
> >  drivers/xen/xenbus/xenbus_xs.c  | 24 ++++++++++++++--
> >  include/xen/interface/physdev.h | 11 +++++++
> >  
> > Konrad Rzeszutek Wilk (4):
> >       xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas.
> >       xen/manage: Poweroff forcefully if user-space is not yet up.
> >       xen/manage: Guard against user-space initiated poweroff and XenBus.
> >       xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
> 
> Konrad, can you repost a final series please?  I'm not 100% which
> patches should be applied.

Sure thing. Will do that on tomorrow.
> 
> David

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-04-03 18:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-08 17:38 [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) Konrad Rzeszutek Wilk
2013-11-08 17:38 ` [PATCH 1/4] xen/mcfg: Call PHYSDEVOP_pci_mmcfg_reserved for MCFG areas Konrad Rzeszutek Wilk
2013-11-21 10:37   ` David Vrabel
2013-11-08 17:38 ` [PATCH 2/4] xen/manage: Poweroff forcefully if user-space is not yet up Konrad Rzeszutek Wilk
2013-11-20 21:11   ` Boris Ostrovsky
2013-11-21 11:33   ` David Vrabel
2013-11-26 16:47     ` Konrad Rzeszutek Wilk
2014-04-01 15:43     ` Konrad Rzeszutek Wilk
2013-11-08 17:38 ` [PATCH 3/4] xen/manage: Guard against user-space initiated poweroff and XenBus Konrad Rzeszutek Wilk
2013-11-20 21:40   ` Boris Ostrovsky
2013-11-21 11:09   ` David Vrabel
2013-11-26 16:45     ` Konrad Rzeszutek Wilk
2013-12-02 11:27       ` David Vrabel
2014-03-31 19:09         ` Konrad Rzeszutek Wilk
2014-04-01 13:18   ` David Vrabel
2014-04-01 14:03     ` Konrad Rzeszutek Wilk
2013-11-08 17:38 ` [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart Konrad Rzeszutek Wilk
2013-11-21 17:52   ` [Xen-devel] " David Vrabel
2013-11-22  9:30     ` Ian Campbell
2013-11-26 16:50     ` Konrad Rzeszutek Wilk
2013-12-02 11:41       ` David Vrabel
2014-03-31 20:33         ` Konrad Rzeszutek Wilk
2014-04-01 12:53           ` David Vrabel
2014-01-26  1:13   ` Zhang, Yang Z
2014-01-26  3:44     ` Konrad Rzeszutek Wilk
2014-04-03 11:59 ` [PATCH] Fixes to Linux v3.13 - bugs.xenproject.org ones. (v1) David Vrabel
2014-04-03 18:07   ` Konrad Rzeszutek Wilk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).