From: George Dunlap <george.dunlap@citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH v3 1/2] xen: sched: reorganize cpu_disable_scheduler()
Date: Wed, 22 Jul 2015 15:48:29 +0100 [thread overview]
Message-ID: <55AFAD3D.40900@citrix.com> (raw)
In-Reply-To: <20150721160612.14003.85926.stgit@Solace.station>
On 07/21/2015 05:06 PM, Dario Faggioli wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> time) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
> (XEN) [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
> (XEN) [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
> (XEN) [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 15:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
> (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
> root@Zhaman:~# xl create /etc/xen/test.cfg
> root@Zhaman:~# xl cpupool-migrate test Pool-1
> root@Zhaman:~# xl cpupool-list -c
> Name CPU list
> Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
> Pool-1 12
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 12
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Xen BUG at smpboot.c:895
> (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
> - move all the vcpus to the boot cpu (as the boot cpu is
> not in the cpupool), in the former;
> - move the vcpus away from a cpu at all (as that is the
> only one cpu in the cpupool), in the latter.
>
> Solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situations (like there not
> being any cpu in the pool to which vcpus can be sent) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Changes from v2:
> * add a missing spin_unlock, most likely eaten by a
> forgotten `stg refresh' (sorry!)
> * fix a typo
>
> Changes from v1:
> * BUG_ON() if, in the suspend/shutdown case, the mask of
> online pCPUs ever becomes empty, as suggested during
> review;
> * reorganize and improve comments inside cpu_disable_scheduler()
> as suggested during review;
> * make it more clear that vcpu_move_nosched() (name
> changed, as suggested during review), should only be
> called from "quiet contexts", such as during suspend
> or shutdown. Do that via both comments and asserts,
> as requested during review;
> * reorganize cpu_disable_scheduler() and vcpu_move_nosched()
> so that the sleep and wakeup functions are only
> called when necessary (i.e., *not* when we are
> suspending/shutting down), as requested during review.
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
Dario,
Did you not get my response to v2 of this, with the language nits in the
comments?
It's in my sent-mail folder from my corporate account, but I don't see
it anywhere in my gmail. Anyway, let me re-send:
> +/*
> + * Move a vcpu from it's current processor to a target new processor,
*its
> + * without asking the scheduler to do any placement. This is intended
> + * for being called from special contextes, where things are quiet
*contexts
> + * enough that no contention is supposed to happen (i.e., during
> + * shutdown or software suspend, like ACPI S3).
> + */
> +static void vcpu_move_nosched(struct vcpu *v, unsigned int new_cpu)
> +{
> + unsigned long flags;
> + spinlock_t *lock, *new_lock;
> +
> + ASSERT(system_state == SYS_STATE_suspend);
> + ASSERT(!vcpu_runnable(v) && (atomic_read(&v->pause_count) ||
> + atomic_read(&v->domain->pause_count)));
> +
> + lock = per_cpu(schedule_data, v->processor).schedule_lock;
> + new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> + sched_spin_lock_double(lock, new_lock, &flags);
> + ASSERT(new_cpu != v->processor);
> + vcpu_move_locked(v, new_cpu);
> + sched_spin_unlock_double(lock, new_lock, flags);
> +
> + sched_move_irqs(v);
> +}
> +
> static void vcpu_migrate(struct vcpu *v)
> {
> unsigned long flags;
> @@ -542,7 +570,7 @@ static void vcpu_migrate(struct vcpu *v)
> return;
> }
>
> - vcpu_move(v, old_cpu, new_cpu);
> + vcpu_move_locked(v, new_cpu);
>
> sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -615,7 +643,8 @@ int cpu_disable_scheduler(unsigned int cpu)
> struct vcpu *v;
> struct cpupool *c;
> cpumask_t online_affinity;
> - int ret = 0;
> + unsigned int new_cpu;
> + int ret = 0;
>
> c = per_cpu(cpupool, cpu);
> if ( c == NULL )
> @@ -644,25 +673,68 @@ int cpu_disable_scheduler(unsigned int cpu)
> cpumask_setall(v->cpu_hard_affinity);
> }
>
> - if ( v->processor == cpu )
> + if ( v->processor != cpu )
> {
> - set_bit(_VPF_migrating, &v->pause_flags);
> + /* This vcpu is not on cpu, so we can move on. */
"This vcpu is not on this cpu..."?
> vcpu_schedule_unlock_irqrestore(lock, flags, v);
> - vcpu_sleep_nosync(v);
> - vcpu_migrate(v);
> + continue;
> + }
> +
> + /* If it is on cpu, we must send it away. */
Again, "this cpu"?
Other than that:
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>