* [PATCH v4 0/6] xen: simplify suspend/resume handling
@ 2019-04-02 16:19 Juergen Gross
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich, Roger Pau Monné

Suspend and resume currently involve rather complex handling,
especially in the scheduler area (schedule.c, cpupool.c).

This can be simplified a lot by not performing a complete cpu down and
up cycle for the non-boot cpus: instead, keep the purely
software-related state and free it only in case a cpu didn't come up
again during resume.

In summary, this series not only reduces complexity, but also improves
failure tolerance: with a dedicated hook for cpus failing to come up
when resuming, it is now possible to survive e.g. a cpupool being left
without any cpu after resume, by moving its domains to cpupool0.
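
In code terms, the resume flow established by this series boils down
to the following sketch (simplified from the enable_nonboot_cpus()
changes in patch 3; cpu_notifier_call_chain() is the helper introduced
in patch 2):

    /* Try to bring every frozen cpu back up. */
    for_each_cpu ( cpu, &frozen_cpus )
        if ( cpu_up(cpu) == 0 )
            __cpumask_clear_cpu(cpu, &frozen_cpus);

    /* Only cpus still marked frozen failed to resume: tear down their
     * software state and fix up their cpupools just for those. */
    for_each_cpu ( cpu, &frozen_cpus )
        cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL, true);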

Juergen Gross (6):
  xen/sched: call cpu_disable_scheduler() via cpu notifier
  xen: add helper for calling notifier_call_chain() to common/cpu.c
  xen: add new cpu notifier action CPU_RESUME_FAILED
  xen: don't free percpu areas during suspend
  xen/cpupool: simplify suspend/resume handling
  xen/sched: don't disable scheduler on cpus during suspend

 xen/arch/arm/smpboot.c     |   2 -
 xen/arch/x86/percpu.c      |   3 +-
 xen/arch/x86/smpboot.c     |   3 -
 xen/common/cpu.c           |  61 +++++++-------
 xen/common/cpupool.c       | 131 ++++++++++++------------------
 xen/common/schedule.c      | 197 +++++++++++++++++++--------------------------
 xen/include/xen/cpu.h      |  29 ++++---
 xen/include/xen/sched-if.h |   1 -
 8 files changed, 184 insertions(+), 243 deletions(-)

-- 
2.16.4



* [PATCH v4 1/6] xen/sched: call cpu_disable_scheduler() via cpu notifier
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu, George Dunlap,
	Andrew Cooper, Dario Faggioli, Julien Grall, Jan Beulich,
	Roger Pau Monné

cpu_disable_scheduler() is being called from __cpu_disable() today.
There is no need to execute it on the cpu just being disabled, so use
the CPU_DEAD case of the cpu notifier chain instead. Moving the call
out of stop_machine() context is fine, as we only need to hold the
domain RCU lock and need the scheduler percpu data to still be
allocated.

Add another hook for CPU_DOWN_PREPARE to bail out early in case
cpu_disable_scheduler() would fail. This avoids crashes in rare cpu
hotplug or suspend cases.
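
For reference, a notifier vetoes an offline operation purely via its
return value. A minimal sketch of such a handler (illustrative only:
some_vcpu_is_pinned() is a made-up predicate; the real handler in this
patch uses cpu_disable_scheduler_check() instead):

    static int example_callback(struct notifier_block *nfb,
                                unsigned long action, void *hcpu)
    {
        unsigned int cpu = (unsigned long)hcpu;
        int rc = 0;

        /* Returning an errno from CPU_DOWN_PREPARE aborts the down. */
        if ( action == CPU_DOWN_PREPARE && some_vcpu_is_pinned(cpu) )
            rc = -EADDRINUSE;

        return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
    }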

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
V4:
- correct cpu_disable_scheduler_check() for hotplug case (Andrew Cooper)

V3:
- restore smp_mb() (Julien Grall)

V2:
- add CPU_DOWN_PREPARE hook
- BUG() in case of cpu_disable_scheduler() failing in CPU_DEAD
  (Jan Beulich)
- modify ARM __cpu_disable(), too (Andrew Cooper)
---
 xen/arch/arm/smpboot.c |  2 --
 xen/arch/x86/smpboot.c |  3 ---
 xen/common/schedule.c  | 36 +++++++++++++++++++++++++++++-------
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 25cd44549c..f756444362 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -386,8 +386,6 @@ void __cpu_disable(void)
     /* It's now safe to remove this processor from the online map */
     cpumask_clear_cpu(cpu, &cpu_online_map);
 
-    if ( cpu_disable_scheduler(cpu) )
-        BUG();
     smp_mb();
 
     /* Return to caller; eventually the IPI mechanism will unwind and the 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 7d1226d7bc..b7a0a4a419 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1221,9 +1221,6 @@ void __cpu_disable(void)
     cpumask_clear_cpu(cpu, &cpu_online_map);
     fixup_irqs(&cpu_online_map, 1);
     fixup_eoi();
-
-    if ( cpu_disable_scheduler(cpu) )
-        BUG();
 }
 
 void __cpu_die(unsigned int cpu)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 76d60785e2..86bd10945d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -773,8 +773,9 @@ void restore_vcpu_affinity(struct domain *d)
 }
 
 /*
- * This function is used by cpu_hotplug code from stop_machine context
+ * This function is used by cpu_hotplug code via cpu notifier chain
  * and from cpupools to switch schedulers on a cpu.
+ * The caller must hold the domain RCU lock (domlist_read_lock).
  */
 int cpu_disable_scheduler(unsigned int cpu)
 {
@@ -789,12 +790,6 @@ int cpu_disable_scheduler(unsigned int cpu)
     if ( c == NULL )
         return ret;
 
-    /*
-     * We'd need the domain RCU lock, but:
-     *  - when we are called from cpupool code, it's acquired there already;
-     *  - when we are called for CPU teardown, we're in stop-machine context,
-     *    so that's not be a problem.
-     */
     for_each_domain_in_cpupool ( d, c )
     {
         for_each_vcpu ( d, v )
@@ -893,6 +888,24 @@ int cpu_disable_scheduler(unsigned int cpu)
     return ret;
 }
 
+static int cpu_disable_scheduler_check(unsigned int cpu)
+{
+    struct domain *d;
+    struct vcpu *v;
+    struct cpupool *c;
+
+    c = per_cpu(cpupool, cpu);
+    if ( c == NULL )
+        return 0;
+
+    for_each_domain_in_cpupool ( d, c )
+        for_each_vcpu ( d, v )
+            if ( v->affinity_broken )
+                return -EADDRINUSE;
+
+    return 0;
+}
+
 /*
  * In general, this must be called with the scheduler lock held, because the
  * adjust_affinity hook may want to modify the vCPU state. However, when the
@@ -1734,7 +1747,16 @@ static int cpu_schedule_callback(
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;
+    case CPU_DOWN_PREPARE:
+        rcu_read_lock(&domlist_read_lock);
+        rc = cpu_disable_scheduler_check(cpu);
+        rcu_read_unlock(&domlist_read_lock);
+        break;
     case CPU_DEAD:
+        rcu_read_lock(&domlist_read_lock);
+        rc = cpu_disable_scheduler(cpu);
+        BUG_ON(rc);
+        rcu_read_unlock(&domlist_read_lock);
         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
         /* Fallthrough */
     case CPU_UP_CANCELED:
-- 
2.16.4



* [PATCH v4 2/6] xen: add helper for calling notifier_call_chain() to common/cpu.c
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich

Add a helper cpu_notifier_call_chain() to call notifier_call_chain()
for a cpu with a specified action, returning an errno value.

This avoids coding the same pattern multiple times.

While at it, avoid potential side effects of the BUG_ON() macro by not
passing cpu_online(cpu) as its argument.
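
To illustrate that last point (a sketch restating the pattern the
patch applies below): the expression is evaluated into a plain
variable first, so the assertion macro itself never evaluates an
expression which might have side effects:

    err = cpu_online(cpu);   /* evaluate the expression exactly once */
    BUG_ON(err);             /* assert on the plain value */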

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
V2:
- add nofail parameter to cpu_notifier_call_chain()
- avoid side effects from using BUG_ON() macro (Andrew Cooper)
---
 xen/common/cpu.c | 56 ++++++++++++++++++++++++++------------------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 836c62f97f..8bf69600a6 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -71,11 +71,21 @@ void __init register_cpu_notifier(struct notifier_block *nb)
     spin_unlock(&cpu_add_remove_lock);
 }
 
+static int cpu_notifier_call_chain(unsigned int cpu, unsigned long action,
+                                   struct notifier_block **nb, bool nofail)
+{
+    void *hcpu = (void *)(long)cpu;
+    int notifier_rc = notifier_call_chain(&cpu_chain, action, hcpu, nb);
+    int ret = (notifier_rc == NOTIFY_DONE) ? 0 : notifier_to_errno(notifier_rc);
+
+    BUG_ON(ret && nofail);
+
+    return ret;
+}
+
 static void _take_cpu_down(void *unused)
 {
-    void *hcpu = (void *)(long)smp_processor_id();
-    int notifier_rc = notifier_call_chain(&cpu_chain, CPU_DYING, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(smp_processor_id(), CPU_DYING, NULL, true);
     __cpu_disable();
 }
 
@@ -87,8 +97,7 @@ static int take_cpu_down(void *arg)
 
 int cpu_down(unsigned int cpu)
 {
-    int err, notifier_rc;
-    void *hcpu = (void *)(long)cpu;
+    int err;
     struct notifier_block *nb = NULL;
 
     if ( !cpu_hotplug_begin() )
@@ -100,12 +109,9 @@ int cpu_down(unsigned int cpu)
         return -EINVAL;
     }
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE, hcpu, &nb);
-    if ( notifier_rc != NOTIFY_DONE )
-    {
-        err = notifier_to_errno(notifier_rc);
+    err = cpu_notifier_call_chain(cpu, CPU_DOWN_PREPARE, &nb, false);
+    if ( err )
         goto fail;
-    }
 
     if ( unlikely(system_state < SYS_STATE_active) )
         on_selected_cpus(cpumask_of(cpu), _take_cpu_down, NULL, true);
@@ -113,26 +119,24 @@ int cpu_down(unsigned int cpu)
         goto fail;
 
     __cpu_die(cpu);
-    BUG_ON(cpu_online(cpu));
+    err = cpu_online(cpu);
+    BUG_ON(err);
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DEAD, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_DEAD, NULL, true);
 
     send_global_virq(VIRQ_PCPU_STATE);
     cpu_hotplug_done();
     return 0;
 
  fail:
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_DOWN_FAILED, hcpu, &nb);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_DOWN_FAILED, &nb, true);
     cpu_hotplug_done();
     return err;
 }
 
 int cpu_up(unsigned int cpu)
 {
-    int notifier_rc, err = 0;
-    void *hcpu = (void *)(long)cpu;
+    int err;
     struct notifier_block *nb = NULL;
 
     if ( !cpu_hotplug_begin() )
@@ -144,19 +148,15 @@ int cpu_up(unsigned int cpu)
         return -EINVAL;
     }
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_UP_PREPARE, hcpu, &nb);
-    if ( notifier_rc != NOTIFY_DONE )
-    {
-        err = notifier_to_errno(notifier_rc);
+    err = cpu_notifier_call_chain(cpu, CPU_UP_PREPARE, &nb, false);
+    if ( err )
         goto fail;
-    }
 
     err = __cpu_up(cpu);
     if ( err < 0 )
         goto fail;
 
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_ONLINE, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_ONLINE, NULL, true);
 
     send_global_virq(VIRQ_PCPU_STATE);
 
@@ -164,18 +164,14 @@ int cpu_up(unsigned int cpu)
     return 0;
 
  fail:
-    notifier_rc = notifier_call_chain(&cpu_chain, CPU_UP_CANCELED, hcpu, &nb);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_UP_CANCELED, &nb, true);
     cpu_hotplug_done();
     return err;
 }
 
 void notify_cpu_starting(unsigned int cpu)
 {
-    void *hcpu = (void *)(long)cpu;
-    int notifier_rc = notifier_call_chain(
-        &cpu_chain, CPU_STARTING, hcpu, NULL);
-    BUG_ON(notifier_rc != NOTIFY_DONE);
+    cpu_notifier_call_chain(cpu, CPU_STARTING, NULL, true);
 }
 
 static cpumask_t frozen_cpus;
-- 
2.16.4



* [PATCH v4 3/6] xen: add new cpu notifier action CPU_RESUME_FAILED
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich

Add a new cpu notifier action CPU_RESUME_FAILED, which is called for
each cpu that failed to come up at resume. The calls are made only
after all other cpus are up again, so the handlers know which
resources are available at that point.
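
A hypothetical consumer of the new action could look like the
following sketch (redistribute_state_of() is made up for
illustration):

    static int example_callback(struct notifier_block *nfb,
                                unsigned long action, void *hcpu)
    {
        unsigned int cpu = (unsigned long)hcpu;

        if ( action == CPU_RESUME_FAILED )
        {
            /* All surviving cpus are online again at this point, so
             * the handler knows which resources remain available. */
            redistribute_state_of(cpu);
        }

        return NOTIFY_DONE;
    }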

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
---
V2:
- added comment in xen/include/xen/cpu.h (Dario Faggioli)
---
 xen/common/cpu.c      |  5 +++++
 xen/include/xen/cpu.h | 29 ++++++++++++++++++-----------
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 8bf69600a6..a6efc5e604 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -218,7 +218,12 @@ void enable_nonboot_cpus(void)
             printk("Error bringing CPU%d up: %d\n", cpu, error);
             BUG_ON(error == -EBUSY);
         }
+        else
+            __cpumask_clear_cpu(cpu, &frozen_cpus);
     }
 
+    for_each_cpu ( cpu, &frozen_cpus )
+        cpu_notifier_call_chain(cpu, CPU_RESUME_FAILED, NULL, true);
+
     cpumask_clear(&frozen_cpus);
 }
diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h
index 2fe3ec05d8..4638c509e2 100644
--- a/xen/include/xen/cpu.h
+++ b/xen/include/xen/cpu.h
@@ -22,33 +22,40 @@ void register_cpu_notifier(struct notifier_block *nb);
  *  CPU_UP_PREPARE -> CPU_STARTING -> CPU_ONLINE -- successful CPU up
  *  CPU_DOWN_PREPARE -> CPU_DOWN_FAILED          -- failed CPU down
  *  CPU_DOWN_PREPARE -> CPU_DYING -> CPU_DEAD    -- successful CPU down
- * 
+ * In the resume case we additionally have:
+ *  CPU_UP_PREPARE -> CPU_UP_CANCELED -> CPU_RESUME_FAILED -- CPU not resumed
+ *  with the CPU_RESUME_FAILED handler called only after bringing all CPUs
+ *  online has been attempted again, in order to know which CPUs did restart
+ *  successfully.
+ *
  * Hence note that only CPU_*_PREPARE handlers are allowed to fail. Also note
  * that once CPU_DYING is delivered, an offline action can no longer fail.
- * 
+ *
  * Notifiers are called highest-priority-first when:
  *  (a) A CPU is coming up; or (b) CPU_DOWN_FAILED
  * Notifiers are called lowest-priority-first when:
  *  (a) A CPU is going down; or (b) CPU_UP_CANCELED
  */
 /* CPU_UP_PREPARE: Preparing to bring CPU online. */
-#define CPU_UP_PREPARE   (0x0001 | NOTIFY_FORWARD)
+#define CPU_UP_PREPARE    (0x0001 | NOTIFY_FORWARD)
 /* CPU_UP_CANCELED: CPU is no longer being brought online. */
-#define CPU_UP_CANCELED  (0x0002 | NOTIFY_REVERSE)
+#define CPU_UP_CANCELED   (0x0002 | NOTIFY_REVERSE)
 /* CPU_STARTING: CPU nearly online. Runs on new CPU, irqs still disabled. */
-#define CPU_STARTING     (0x0003 | NOTIFY_FORWARD)
+#define CPU_STARTING      (0x0003 | NOTIFY_FORWARD)
 /* CPU_ONLINE: CPU is up. */
-#define CPU_ONLINE       (0x0004 | NOTIFY_FORWARD)
+#define CPU_ONLINE        (0x0004 | NOTIFY_FORWARD)
 /* CPU_DOWN_PREPARE: CPU is going down. */
-#define CPU_DOWN_PREPARE (0x0005 | NOTIFY_REVERSE)
+#define CPU_DOWN_PREPARE  (0x0005 | NOTIFY_REVERSE)
 /* CPU_DOWN_FAILED: CPU is no longer going down. */
-#define CPU_DOWN_FAILED  (0x0006 | NOTIFY_FORWARD)
+#define CPU_DOWN_FAILED   (0x0006 | NOTIFY_FORWARD)
 /* CPU_DYING: CPU is nearly dead (in stop_machine context). */
-#define CPU_DYING        (0x0007 | NOTIFY_REVERSE)
+#define CPU_DYING         (0x0007 | NOTIFY_REVERSE)
 /* CPU_DEAD: CPU is dead. */
-#define CPU_DEAD         (0x0008 | NOTIFY_REVERSE)
+#define CPU_DEAD          (0x0008 | NOTIFY_REVERSE)
 /* CPU_REMOVE: CPU was removed. */
-#define CPU_REMOVE       (0x0009 | NOTIFY_REVERSE)
+#define CPU_REMOVE        (0x0009 | NOTIFY_REVERSE)
+/* CPU_RESUME_FAILED: CPU failed to come up in resume, all other CPUs up. */
+#define CPU_RESUME_FAILED (0x000a | NOTIFY_REVERSE)
 
 /* Perform CPU hotplug. May return -EAGAIN. */
 int cpu_down(unsigned int cpu);
-- 
2.16.4



* [PATCH v4 4/6] xen: don't free percpu areas during suspend
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Instead of freeing percpu areas during suspend and allocating them
again when resuming, keep them. Only free an area in case a cpu didn't
come up again when resuming.

It should be noted that this is a potential change in behaviour, as
the percpu areas are no longer zeroed out during suspend/resume. While
I have checked that the called cpu notifier hooks cope with that,
there might be some well hidden dependency on the previous behaviour.
OTOH a component not registering itself for cpu down/up yet expecting
to see a zeroed percpu variable after suspend/resume is kind of broken
already. And the opposite case, where a component is not registered to
be called for cpu down/up and does not expect a percpu variable to
suddenly be zero due to suspend/resume, is much more probable,
especially as the suspend/resume functionality seems not to be tested
that often.
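
As a sketch of the well-behaved case (struct my_state and
my_state_init() are hypothetical): a component registered in the cpu
notifier chain re-initializes its percpu data on each bring-up and
thus never relies on the area having been zeroed:

    static DEFINE_PER_CPU(struct my_state, my_state);

    static int my_cpu_callback(struct notifier_block *nfb,
                               unsigned long action, void *hcpu)
    {
        unsigned int cpu = (unsigned long)hcpu;

        /* Runs on every bring-up, including resume, so stale data
         * from before the suspend can never be observed. */
        if ( action == CPU_UP_PREPARE )
            my_state_init(&per_cpu(my_state, cpu));

        return NOTIFY_DONE;
    }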

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/percpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/percpu.c b/xen/arch/x86/percpu.c
index 8be4ebddf4..5ea14b6ec3 100644
--- a/xen/arch/x86/percpu.c
+++ b/xen/arch/x86/percpu.c
@@ -76,7 +76,8 @@ static int cpu_percpu_callback(
         break;
     case CPU_UP_CANCELED:
     case CPU_DEAD:
-        if ( !park_offline_cpus )
+    case CPU_RESUME_FAILED:
+        if ( !park_offline_cpus && system_state != SYS_STATE_suspend )
             free_percpu_area(cpu);
         break;
     case CPU_REMOVE:
-- 
2.16.4



* [PATCH v4 5/6] xen/cpupool: simplify suspend/resume handling
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Dario Faggioli, Julien Grall, Jan Beulich

Instead of temporarily removing cpus from their cpupools during
suspend/resume, permanently remove only those cpus which didn't come
up again when resuming.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
---
V2:
- add comment (George Dunlap)
---
 xen/common/cpupool.c       | 131 ++++++++++++++++++---------------------------
 xen/include/xen/sched-if.h |   1 -
 2 files changed, 52 insertions(+), 80 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index e89bb67e71..31ac323e40 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -47,12 +47,6 @@ static struct cpupool *alloc_cpupool_struct(void)
         xfree(c);
         c = NULL;
     }
-    else if ( !zalloc_cpumask_var(&c->cpu_suspended) )
-    {
-        free_cpumask_var(c->cpu_valid);
-        xfree(c);
-        c = NULL;
-    }
 
     return c;
 }
@@ -60,10 +54,7 @@ static struct cpupool *alloc_cpupool_struct(void)
 static void free_cpupool_struct(struct cpupool *c)
 {
     if ( c )
-    {
-        free_cpumask_var(c->cpu_suspended);
         free_cpumask_var(c->cpu_valid);
-    }
     xfree(c);
 }
 
@@ -477,10 +468,6 @@ void cpupool_rm_domain(struct domain *d)
 /*
  * Called to add a cpu to a pool. CPUs being hot-plugged are added to pool0,
  * as they must have been in there when unplugged.
- *
- * If, on the other hand, we are adding CPUs because we are resuming (e.g.,
- * after ACPI S3) we put the cpu back in the pool where it was in prior when
- * we suspended.
  */
 static int cpupool_cpu_add(unsigned int cpu)
 {
@@ -490,42 +477,15 @@ static int cpupool_cpu_add(unsigned int cpu)
     cpumask_clear_cpu(cpu, &cpupool_locked_cpus);
     cpumask_set_cpu(cpu, &cpupool_free_cpus);
 
-    if ( system_state == SYS_STATE_suspend || system_state == SYS_STATE_resume )
-    {
-        struct cpupool **c;
-
-        for_each_cpupool(c)
-        {
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_suspended ) )
-            {
-                ret = cpupool_assign_cpu_locked(*c, cpu);
-                if ( ret )
-                    goto out;
-                cpumask_clear_cpu(cpu, (*c)->cpu_suspended);
-                break;
-            }
-        }
+    /*
+     * If we are not resuming, we are hot-plugging the cpu, in which case
+     * we add it to pool0, as it certainly was there when hot-unplugged
+     * (or unplugging would have failed) and that is the default behavior
+     * anyway.
+     */
+    per_cpu(cpupool, cpu) = NULL;
+    ret = cpupool_assign_cpu_locked(cpupool0, cpu);
 
-        /*
-         * Either cpu has been found as suspended in a pool, and added back
-         * there, or it stayed free (if it did not belong to any pool when
-         * suspending), and we don't want to do anything.
-         */
-        ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
-               cpumask_test_cpu(cpu, (*c)->cpu_valid));
-    }
-    else
-    {
-        /*
-         * If we are not resuming, we are hot-plugging cpu, and in which case
-         * we add it to pool0, as it certainly was there when hot-unplagged
-         * (or unplugging would have failed) and that is the default behavior
-         * anyway.
-         */
-        per_cpu(cpupool, cpu) = NULL;
-        ret = cpupool_assign_cpu_locked(cpupool0, cpu);
-    }
- out:
     spin_unlock(&cpupool_lock);
 
     return ret;
@@ -535,42 +495,14 @@ static int cpupool_cpu_add(unsigned int cpu)
  * Called to remove a CPU from a pool. The CPU is locked, to forbid removing
  * it from pool0. In fact, if we want to hot-unplug a CPU, it must belong to
  * pool0, or we fail.
- *
- * However, if we are suspending (e.g., to ACPI S3), we mark the CPU in such
- * a way that it can be put back in its pool when resuming.
  */
 static int cpupool_cpu_remove(unsigned int cpu)
 {
     int ret = -ENODEV;
 
     spin_lock(&cpupool_lock);
-    if ( system_state == SYS_STATE_suspend )
-    {
-        struct cpupool **c;
-
-        for_each_cpupool(c)
-        {
-            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid ) )
-            {
-                cpumask_set_cpu(cpu, (*c)->cpu_suspended);
-                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
-                break;
-            }
-        }
 
-        /*
-         * Either we found cpu in a pool, or it must be free (if it has been
-         * hot-unplagged, then we must have found it in pool0). It is, of
-         * course, fine to suspend or shutdown with CPUs not assigned to a
-         * pool, and (in case of suspend) they will stay free when resuming.
-         */
-        ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus) ||
-               cpumask_test_cpu(cpu, (*c)->cpu_suspended));
-        ASSERT(cpumask_test_cpu(cpu, &cpu_online_map) ||
-               cpumask_test_cpu(cpu, cpupool0->cpu_suspended));
-        ret = 0;
-    }
-    else if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
+    if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) )
     {
         /*
          * If we are not suspending, we are hot-unplugging cpu, and that is
@@ -587,6 +519,41 @@ static int cpupool_cpu_remove(unsigned int cpu)
     return ret;
 }
 
+/*
+ * Called during resume for all cpus which didn't come up again. The cpu must
+ * be removed from the cpupool it is assigned to. In case a cpupool will be
+ * left without cpu we move all domains of that cpupool to cpupool0.
+ */
+static void cpupool_cpu_remove_forced(unsigned int cpu)
+{
+    struct cpupool **c;
+    struct domain *d;
+
+    spin_lock(&cpupool_lock);
+
+    if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+        cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+    else
+    {
+        for_each_cpupool(c)
+        {
+            if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
+            {
+                cpumask_clear_cpu(cpu, (*c)->cpu_valid);
+                if ( cpumask_weight((*c)->cpu_valid) == 0 )
+                {
+                    if ( *c == cpupool0 )
+                        panic("No cpu left in cpupool0\n");
+                    for_each_domain_in_cpupool(d, *c)
+                        cpupool_move_domain_locked(d, cpupool0);
+                }
+            }
+        }
+    }
+
+    spin_unlock(&cpupool_lock);
+}
+
 /*
  * do cpupool related sysctl operations
  */
@@ -774,10 +741,16 @@ static int cpu_callback(
     {
     case CPU_DOWN_FAILED:
     case CPU_ONLINE:
-        rc = cpupool_cpu_add(cpu);
+        if ( system_state <= SYS_STATE_active )
+            rc = cpupool_cpu_add(cpu);
         break;
     case CPU_DOWN_PREPARE:
-        rc = cpupool_cpu_remove(cpu);
+        /* Suspend/Resume don't change assignments of cpus to cpupools. */
+        if ( system_state <= SYS_STATE_active )
+            rc = cpupool_cpu_remove(cpu);
+        break;
+    case CPU_RESUME_FAILED:
+        cpupool_cpu_remove_forced(cpu);
         break;
     default:
         break;
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 9596eae1e2..92bc7a0365 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -214,7 +214,6 @@ struct cpupool
 {
     int              cpupool_id;
     cpumask_var_t    cpu_valid;      /* all cpus assigned to pool */
-    cpumask_var_t    cpu_suspended;  /* cpus in S3 that should be in this pool */
     struct cpupool   *next;
     unsigned int     n_dom;
     struct scheduler *sched;
-- 
2.16.4



* [PATCH v4 6/6] xen/sched: don't disable scheduler on cpus during suspend
From: Juergen Gross @ 2019-04-02 16:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Dario Faggioli

Today there is special handling in cpu_disable_scheduler() for
suspend: all vcpus are forced to the boot cpu. In fact there is no
need for that, as during resume the vcpus are put on the correct cpus
again anyway.

So we can just omit the call of cpu_disable_scheduler() when offlining
a cpu due to suspend, and on resume we can omit taking the schedule
lock for selecting the new processor.

In restore_vcpu_affinity() we need to be careful when applying
affinity, as the cpu might not have come back to life. This in turn
enables us to even support affinity_broken across suspend/resume.

Avoid the remaining scheduler dealloc/alloc dance when doing suspend
and resume, too. It is enough to react to cpus failing to come up
again on resume.
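
The processor re-selection in restore_vcpu_affinity() now follows a
three-step fallback, summarized in clean form below (a sketch of the
hunk further down; scratch stands for cpumask_scratch_cpu(cpu), pool
for cpupool_domain_cpumask(d)):

    /* Step 1: the vcpu's hard affinity within its cpupool. */
    cpumask_and(scratch, v->cpu_hard_affinity, pool);
    if ( cpumask_empty(scratch) && v->affinity_broken )
    {
        /* Step 2: restore the affinity saved before it was broken. */
        sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
        v->affinity_broken = 0;
        cpumask_and(scratch, v->cpu_hard_affinity, pool);
    }
    if ( cpumask_empty(scratch) )
    {
        /* Step 3: last resort - allow all cpus. */
        sched_set_affinity(v, &cpumask_all, NULL);
        cpumask_and(scratch, v->cpu_hard_affinity, pool);
    }
    v->processor = cpumask_any(scratch);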

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
---
 xen/common/schedule.c | 161 ++++++++++++++++----------------------------------
 1 file changed, 52 insertions(+), 109 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 86bd10945d..a418506c6b 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -560,33 +560,6 @@ static void vcpu_move_locked(struct vcpu *v, unsigned int new_cpu)
         v->processor = new_cpu;
 }
 
-/*
- * Move a vcpu from its current processor to a target new processor,
- * without asking the scheduler to do any placement. This is intended
- * for being called from special contexts, where things are quiet
- * enough that no contention is supposed to happen (i.e., during
- * shutdown or software suspend, like ACPI S3).
- */
-static void vcpu_move_nosched(struct vcpu *v, unsigned int new_cpu)
-{
-    unsigned long flags;
-    spinlock_t *lock, *new_lock;
-
-    ASSERT(system_state == SYS_STATE_suspend);
-    ASSERT(!vcpu_runnable(v) && (atomic_read(&v->pause_count) ||
-                                 atomic_read(&v->domain->pause_count)));
-
-    lock = per_cpu(schedule_data, v->processor).schedule_lock;
-    new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
-
-    sched_spin_lock_double(lock, new_lock, &flags);
-    ASSERT(new_cpu != v->processor);
-    vcpu_move_locked(v, new_cpu);
-    sched_spin_unlock_double(lock, new_lock, flags);
-
-    sched_move_irqs(v);
-}
-
 /*
  * Initiating migration
  *
@@ -735,31 +708,36 @@ void restore_vcpu_affinity(struct domain *d)
 
         ASSERT(!vcpu_runnable(v));
 
-        lock = vcpu_schedule_lock_irq(v);
-
-        if ( v->affinity_broken )
-        {
-            sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
-            v->affinity_broken = 0;
-
-        }
-
         /*
-         * During suspend (in cpu_disable_scheduler()), we moved every vCPU
-         * to BSP (which, as of now, is pCPU 0), as a temporary measure to
-         * allow the nonboot processors to have their data structure freed
-         * and go to sleep. But nothing guardantees that the BSP is a valid
-         * pCPU for a particular domain.
+         * Re-assign the initial processor as after resume we have no
+         * guarantee the old processor has come back to life again.
          *
          * Therefore, here, before actually unpausing the domains, we should
          * set v->processor of each of their vCPUs to something that will
          * make sense for the scheduler of the cpupool in which they are in.
          */
         cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
-                    cpupool_domain_cpumask(v->domain));
-        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+                    cpupool_domain_cpumask(d));
+        if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+        {
+            if ( v->affinity_broken )
+            {
+                sched_set_affinity(v, v->cpu_hard_affinity_saved, NULL);
+                v->affinity_broken = 0;
+                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                            cpupool_domain_cpumask(d));
+            }
 
-        spin_unlock_irq(lock);
+            if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+            {
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                sched_set_affinity(v, &cpumask_all, NULL);
+                cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+                            cpupool_domain_cpumask(d));
+            }
+        }
+
+        v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
 
         lock = vcpu_schedule_lock_irq(v);
         v->processor = SCHED_OP(vcpu_scheduler(v), pick_cpu, v);
@@ -783,7 +761,6 @@ int cpu_disable_scheduler(unsigned int cpu)
     struct vcpu *v;
     struct cpupool *c;
     cpumask_t online_affinity;
-    unsigned int new_cpu;
     int ret = 0;
 
     c = per_cpu(cpupool, cpu);
@@ -809,14 +786,7 @@ int cpu_disable_scheduler(unsigned int cpu)
                     break;
                 }
 
-                if (system_state == SYS_STATE_suspend)
-                {
-                    cpumask_copy(v->cpu_hard_affinity_saved,
-                                 v->cpu_hard_affinity);
-                    v->affinity_broken = 1;
-                }
-                else
-                    printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
+                printk(XENLOG_DEBUG "Breaking affinity for %pv\n", v);
 
                 sched_set_affinity(v, &cpumask_all, NULL);
             }
@@ -828,60 +798,26 @@ int cpu_disable_scheduler(unsigned int cpu)
                 continue;
             }
 
-            /* If it is on this cpu, we must send it away. */
-            if ( unlikely(system_state == SYS_STATE_suspend) )
-            {
-                vcpu_schedule_unlock_irqrestore(lock, flags, v);
-
-                /*
-                 * If we are doing a shutdown/suspend, it is not necessary to
-                 * ask the scheduler to chime in. In fact:
-                 *  * there is no reason for it: the end result we are after
-                 *    is just 'all the vcpus on the boot pcpu, and no vcpu
-                 *    anywhere else', so let's just go for it;
-                 *  * it's wrong, for cpupools with only non-boot pcpus, as
-                 *    the scheduler would always fail to send the vcpus away
-                 *    from the last online (non boot) pcpu!
-                 *
-                 * Therefore, in the shutdown/suspend case, we just pick up
-                 * one (still) online pcpu. Note that, at this stage, all
-                 * domains (including dom0) have been paused already, so we
-                 * do not expect any vcpu activity at all.
-                 */
-                cpumask_andnot(&online_affinity, &cpu_online_map,
-                               cpumask_of(cpu));
-                BUG_ON(cpumask_empty(&online_affinity));
-                /*
-                 * As boot cpu is, usually, pcpu #0, using cpumask_first()
-                 * will make us converge quicker.
-                 */
-                new_cpu = cpumask_first(&online_affinity);
-                vcpu_move_nosched(v, new_cpu);
-            }
-            else
-            {
-                /*
-                 * OTOH, if the system is still live, and we are here because
-                 * we are doing some cpupool manipulations:
-                 *  * we want to call the scheduler, and let it re-evaluation
-                 *    the placement of the vcpu, taking into account the new
-                 *    cpupool configuration;
-                 *  * the scheduler will always fine a suitable solution, or
-                 *    things would have failed before getting in here.
-                 */
-                vcpu_migrate_start(v);
-                vcpu_schedule_unlock_irqrestore(lock, flags, v);
+            /* If it is on this cpu, we must send it away. Here we are
+             * doing some cpupool manipulations:
+             *  * we want to call the scheduler, and let it re-evaluate
+             *    the placement of the vcpu, taking into account the new
+             *    cpupool configuration;
+             *  * the scheduler will always find a suitable solution, or
+             *    things would have failed before getting in here.
+             */
+            vcpu_migrate_start(v);
+            vcpu_schedule_unlock_irqrestore(lock, flags, v);
 
-                vcpu_migrate_finish(v);
+            vcpu_migrate_finish(v);
 
-                /*
-                 * The only caveat, in this case, is that if a vcpu active in
-                 * the hypervisor isn't migratable. In this case, the caller
-                 * should try again after releasing and reaquiring all locks.
-                 */
-                if ( v->processor == cpu )
-                    ret = -EAGAIN;
-            }
+            /*
+             * The only caveat, in this case, is that a vcpu active in
+             * the hypervisor may not be migratable. In that case, the
+             * caller should try again after releasing and reacquiring
+             * all locks.
+             */
+            if ( v->processor == cpu )
+                ret = -EAGAIN;
         }
     }
 
@@ -1742,26 +1678,33 @@ static int cpu_schedule_callback(
     switch ( action )
     {
     case CPU_STARTING:
-        SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
+        if ( system_state != SYS_STATE_resume )
+            SCHED_OP(sched, init_pdata, sd->sched_priv, cpu);
         break;
     case CPU_UP_PREPARE:
-        rc = cpu_schedule_up(cpu);
+        if ( system_state != SYS_STATE_resume )
+            rc = cpu_schedule_up(cpu);
         break;
     case CPU_DOWN_PREPARE:
         rcu_read_lock(&domlist_read_lock);
         rc = cpu_disable_scheduler_check(cpu);
         rcu_read_unlock(&domlist_read_lock);
         break;
+    case CPU_RESUME_FAILED:
     case CPU_DEAD:
+        if ( system_state == SYS_STATE_suspend )
+            break;
         rcu_read_lock(&domlist_read_lock);
         rc = cpu_disable_scheduler(cpu);
         BUG_ON(rc);
         rcu_read_unlock(&domlist_read_lock);
         SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
-        /* Fallthrough */
-    case CPU_UP_CANCELED:
         cpu_schedule_down(cpu);
         break;
+    case CPU_UP_CANCELED:
+        if ( system_state != SYS_STATE_resume )
+            cpu_schedule_down(cpu);
+        break;
     default:
         break;
     }
-- 
2.16.4



* Re: [PATCH v4 1/6] xen/sched: call cpu_disable_scheduler() via cpu notifier
From: Dario Faggioli @ 2019-04-04 13:06 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Wei Liu, George Dunlap, Andrew Cooper,
	Julien Grall, Jan Beulich, Roger Pau Monné


On Tue, 2019-04-02 at 18:19 +0200, Juergen Gross wrote:
> cpu_disable_scheduler() is being called from __cpu_disable() today.
> There is no need to execute it on the cpu just being disabled, so use
> the CPU_DEAD case of the cpu notifier chain. Moving the call out of
> stop_machine() context is fine, as we just need to hold the domain RCU
> lock and need the scheduler percpu data to be still allocated.
> 
> Add another hook for CPU_DOWN_PREPARE to bail out early in case
> cpu_disable_scheduler() would fail. This will avoid crashes in rare
> cases for cpu hotplug or suspend.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Julien Grall <julien.grall@arm.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards,
Dario

-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


