All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] xen: cpupools: avoid crashing if shutting down with free CPUs
@ 2015-05-12 13:03 Dario Faggioli
  2015-05-28  9:03 ` Jan Beulich
  0 siblings, 1 reply; 3+ messages in thread
From: Dario Faggioli @ 2015-05-12 13:03 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, George Dunlap, Keir Fraser, Jan Beulich, Tim Deegan

in fact, before this change, shutting down or suspending the
system with some CPUs not assigned to any cpupool, would
crash as follows:

  (XEN) Xen call trace:
  (XEN)    [<ffff82d080101757>] disable_nonboot_cpus+0xb5/0x138
  (XEN)    [<ffff82d0801a8824>] enter_state_helper+0xbd/0x369
  (XEN)    [<ffff82d08010614a>] continue_hypercall_tasklet_handler+0x4a/0xb1
  (XEN)    [<ffff82d0801320bd>] do_tasklet_work+0x78/0xab
  (XEN)    [<ffff82d0801323f3>] do_tasklet+0x5e/0x8a
  (XEN)    [<ffff82d080163cb6>] idle_loop+0x56/0x6b
  (XEN)
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Xen BUG at cpu.c:191
  (XEN) ****************************************

This is because, for free CPUs, -EBUSY were being returned
when trying to tear them down, making cpu_down() unhappy.

It is certainly unpractical to forbid shutting down or
suspenging if there are unassigned CPUs, so this change
fixes the above by just avoiding returning -EBUSY for those
CPUs. If shutting off, that does not matter much anyway. If
suspending, we make sure that the CPUs remain unassigned
when resuming.

While there, take the chance to:
 - fix the doc comment of cpupool_cpu_remove() (it was
   wrong);
 - improve comments in general around and in cpupool_cpu_remove()
   and cpupool_cpu_add();
 - add a couple of ASSERT()-s for checking consistency.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Tim Deegan <tim@xen.org>
Cc: Keir Fraser <keir@xen.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
---
Changes from v1:
 * ret initialized to 0 in cpupool_cpu_add(), to fix a
   bug in the suspend/resume case, identified by Juergen.
---
 xen/common/cpupool.c |   82 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 57 insertions(+), 25 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index cabaccd..563864d 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -453,14 +453,16 @@ void cpupool_rm_domain(struct domain *d)
 }
 
 /*
- * called to add a new cpu to pool admin
- * we add a hotplugged cpu to the cpupool0 to be able to add it to dom0,
- * unless we are resuming from S3, in which case we put the cpu back
- * in the cpupool it was in prior to suspend.
+ * Called to add a cpu to a pool. CPUs being hot-plugged are added to pool0,
+ * as they must have been in there when unplugged.
+ *
+ * If, on the other hand, we are adding CPUs because we are resuming (e.g.,
+ * after ACPI S3) we put the cpu back in the pool where it was in prior when
+ * we suspended.
  */
 static int cpupool_cpu_add(unsigned int cpu)
 {
-    int ret = -EINVAL;
+    int ret = 0;
 
     spin_lock(&cpupool_lock);
     cpumask_clear_cpu(cpu, &cpupool_locked_cpus);
@@ -478,12 +480,28 @@ static int cpupool_cpu_add(unsigned int cpu)
                 if ( ret )
                     goto out;
                 cpumask_clear_cpu(cpu, (*c)->cpu_suspended);
+                break;
             }
         }
-    }
 
-    if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        /*
+         * Either cpu has been found as suspended in a pool, and added back
+         * there, or it stayed free (if it did not belong to any pool when
+         * suspending), and we don't want to do anything.
+         */
+        ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
+               cpumask_test_cpu(cpu, (*c)->cpu_valid));
+    }
+    else
+    {
+        /*
+         * If we are not resuming, we are hot-plugging cpu, and in which case
+         * we add it to pool0, as it certainly was there when hot-unplagged
+         * (or unplugging would have failed) and that is the default behavior
+         * anyway.
+         */
         ret = cpupool_assign_cpu_locked(cpupool0, cpu);
+    }
  out:
     spin_unlock(&cpupool_lock);
 
@@ -491,29 +509,52 @@ static int cpupool_cpu_add(unsigned int cpu)
 }
 
 /*
- * called to remove a cpu from pool admin
- * the cpu to be removed is locked to avoid removing it from dom0
- * returns failure if not in pool0
+ * Called to remove a CPU from a pool. The CPU is locked, to forbid removing
+ * it from pool0. In fact, if we want to hot-unplug a CPU, it must belong to
+ * pool0, or we fail.
+ *
+ * However, if we are suspending (e.g., to ACPI S3), we mark the CPU in such
+ * a way that it can be put back in its pool when resuming.
  */
 static int cpupool_cpu_remove(unsigned int cpu)
 {
     int ret = -EBUSY;
-    struct cpupool **c;
 
     spin_lock(&cpupool_lock);
-    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
-        ret = 0;
-    else
+    if ( system_state == SYS_STATE_suspend )
     {
+        struct cpupool **c;
+
         for_each_cpupool(c)
         {
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_suspended ) )
+            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
             {
-                ret = 0;
+                cpumask_set_cpu(cpu, (*c)->cpu_suspended);
                 break;
             }
         }
+
+        /*
+         * Either we found cpu in a pool, or it must be free (if it has been
+         * hot-unplagged, then we must have found it in pool0). It is, of
+         * course, fine to suspend or shutdown with CPUs not assigned to a
+         * pool, and (in case of suspend) they will stay free when resuming.
+         */
+        ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
+               cpumask_test_cpu(cpu, (*c)->cpu_suspended));
+        ASSERT(cpumask_test_cpu(cpu, &cpu_online_map) ||
+               cpumask_test_cpu(cpu, cpupool0->cpu_suspended));
+        ret = 0;
+    }
+    else if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    {
+        /*
+         * If we are not suspending, we are hot-unplugging cpu, and that is
+         * allowed only for CPUs in pool0.
+         */
+        ret = 0;
     }
+
     if ( !ret )
         cpumask_set_cpu(cpu, &cpupool_locked_cpus);
     spin_unlock(&cpupool_lock);
@@ -709,15 +750,6 @@ static int cpu_callback(
     unsigned int cpu = (unsigned long)hcpu;
     int rc = 0;
 
-    if ( system_state == SYS_STATE_suspend )
-    {
-        struct cpupool **c;
-
-        for_each_cpupool(c)
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
-                cpumask_set_cpu(cpu, (*c)->cpu_suspended);
-    }
-
     switch ( action )
     {
     case CPU_DOWN_FAILED:

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] xen: cpupools: avoid crashing if shutting down with free CPUs
  2015-05-12 13:03 [PATCH v2] xen: cpupools: avoid crashing if shutting down with free CPUs Dario Faggioli
@ 2015-05-28  9:03 ` Jan Beulich
  2015-05-28 11:27   ` Dario Faggioli
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2015-05-28  9:03 UTC (permalink / raw)
  To: Dario Faggioli, Juergen Gross
  Cc: George Dunlap, Tim Deegan, Keir Fraser, xen-devel

>>> On 12.05.15 at 15:03, <dario.faggioli@citrix.com> wrote:
> in fact, before this change, shutting down or suspending the
> system with some CPUs not assigned to any cpupool, would
> crash as follows:
> 
>   (XEN) Xen call trace:
>   (XEN)    [<ffff82d080101757>] disable_nonboot_cpus+0xb5/0x138
>   (XEN)    [<ffff82d0801a8824>] enter_state_helper+0xbd/0x369
>   (XEN)    [<ffff82d08010614a>] continue_hypercall_tasklet_handler+0x4a/0xb1
>   (XEN)    [<ffff82d0801320bd>] do_tasklet_work+0x78/0xab
>   (XEN)    [<ffff82d0801323f3>] do_tasklet+0x5e/0x8a
>   (XEN)    [<ffff82d080163cb6>] idle_loop+0x56/0x6b
>   (XEN)
>   (XEN)
>   (XEN) ****************************************
>   (XEN) Panic on CPU 0:
>   (XEN) Xen BUG at cpu.c:191
>   (XEN) ****************************************
> 
> This is because, for free CPUs, -EBUSY were being returned
> when trying to tear them down, making cpu_down() unhappy.
> 
> It is certainly unpractical to forbid shutting down or
> suspenging if there are unassigned CPUs, so this change
> fixes the above by just avoiding returning -EBUSY for those
> CPUs. If shutting off, that does not matter much anyway. If
> suspending, we make sure that the CPUs remain unassigned
> when resuming.
> 
> While there, take the chance to:
>  - fix the doc comment of cpupool_cpu_remove() (it was
>    wrong);
>  - improve comments in general around and in cpupool_cpu_remove()
>    and cpupool_cpu_add();
>  - add a couple of ASSERT()-s for checking consistency.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Keir Fraser <keir@xen.org>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Tested-by: Juergen Gross <jgross@suse.com>

Could one or both of you confirm that this as well as the other patch
it was originally paired with ("cpupool: assigning a CPU to a pool can
fail") ought to be backported (presumably to both 4.5 and 4.4)
please?

Jan

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] xen: cpupools: avoid crashing if shutting down with free CPUs
  2015-05-28  9:03 ` Jan Beulich
@ 2015-05-28 11:27   ` Dario Faggioli
  0 siblings, 0 replies; 3+ messages in thread
From: Dario Faggioli @ 2015-05-28 11:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, George Dunlap, Tim Deegan, Keir Fraser, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 790 bytes --]

On Thu, 2015-05-28 at 10:03 +0100, Jan Beulich wrote:

> Could one or both of you confirm that this as well as the other patch
> it was originally paired with ("cpupool: assigning a CPU to a pool can
> fail") ought to be backported (presumably to both 4.5 and 4.4)
> please?
> 
Right, I was about to ask for this.

In fact, I've just checked, and both 4.5 and 4.4 are affected by the
issue fixed by this patch.

"cpupool: assigning a CPU to a pool can fail" should be backported as
well, IMO.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-05-28 11:27 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-12 13:03 [PATCH v2] xen: cpupools: avoid crashing if shutting down with free CPUs Dario Faggioli
2015-05-28  9:03 ` Jan Beulich
2015-05-28 11:27   ` Dario Faggioli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.