Re: [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()

From: Juergen Gross <jgross@suse.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Subject: Re: [PATCH v2 1/2] xen: sched: reorganize cpu_disable_scheduler()
Date: Mon, 20 Jul 2015 13:41:49 +0200	[thread overview]
Message-ID: <55ACDE7D.50108@suse.com> (raw)
In-Reply-To: <20150717133554.29612.14835.stgit@Solace.station>

On 07/17/2015 03:35 PM, Dario Faggioli wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> times) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
>    root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
>    root@Zhaman:~# shutdown -h now
>    (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>    ...
>    (XEN) Xen call trace:
>    (XEN)    [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
>    (XEN)    [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
>    (XEN)    [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
>    (XEN)    [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
>    (XEN)    [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
>    (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
>    (XEN)    [<ffff82d08013097a>] stopmachine_action+0x70/0x99
>    (XEN)    [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
>    (XEN)    [<ffff82d080132926>] do_tasklet+0x5e/0x8a
>    (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
>    (XEN)
>    (XEN)
>    (XEN) ****************************************
>    (XEN) Panic on CPU 15:
>    (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
>    (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
>    root@Zhaman:~# xl create /etc/xen/test.cfg
>    root@Zhaman:~# xl cpupool-migrate test Pool-1
>    root@Zhaman:~# xl cpupool-list -c
>    Name               CPU list
>    Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
>    Pool-1             12
>    root@Zhaman:~# shutdown -h now
>    (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>    (XEN) CPU:    12
>    ...
>    (XEN) Xen call trace:
>    (XEN)    [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
>    (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
>    (XEN)    [<ffff82d08013097a>] stopmachine_action+0x70/0x99
>    (XEN)    [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
>    (XEN)    [<ffff82d080132926>] do_tasklet+0x5e/0x8a
>    (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
>    (XEN)
>    (XEN)
>    (XEN) ****************************************
>    (XEN) Panic on CPU 12:
>    (XEN) Xen BUG at smpboot.c:895
>    (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
>   - move all the vcpus to the boot cpu (as the boot cpu is
>     not in the cpupool), in the former;
>   - move the vcpus away from a cpu at all (as that is the
>     only one cpu in the cpupool), in the latter.
>
> Solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situation (like there not
> being any cpu in the pool where to send vcpus) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Changes from v1:
>   * BUG_ON() if, in the suspend/shutdown case, the mask of
>     online pCPUs will ever get empty, as suggested during
>     review;
>   * reorganize and improve comments inside cpu_disable_scheduler()
>     as suggested during review;
>   * make it more clear that vcpu_move_nosched() (name
>     changed, as suggested during review), should only be
>     called from "quite contextes", such us, during suspend

s/quite/quiet/

>     or shutdown. Do that via both comments and asserts,
>     as requested during review;
>   * reorganize cpu_disable_scheduler() and vcpu_move_nosched()
>     so that calling to sleep and wakeup functions are only
>     called when necessary (i.e., *not* in case we are
>     suspending/shutting down, as requested during review.
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
>   xen/common/schedule.c |  102 +++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 86 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index df8c1d0..ed0f356 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c

...

> @@ -644,25 +673,66 @@ int cpu_disable_scheduler(unsigned int cpu)
>                   cpumask_setall(v->cpu_hard_affinity);
>               }
>
> -            if ( v->processor == cpu )
> +            if ( v->processor != cpu )
>               {
> -                set_bit(_VPF_migrating, &v->pause_flags);
> +                /* This vcpu is not on cpu, so we can move on. */
>                   vcpu_schedule_unlock_irqrestore(lock, flags, v);
> -                vcpu_sleep_nosync(v);
> -                vcpu_migrate(v);
> +                continue;
> +            }
> +
> +            /* If it is on cpu, we must send it away. */
> +            if ( unlikely(system_state == SYS_STATE_suspend) )
> +            {
> +                /*
> +                 * If we are doing a shutdown/suspend, it is not necessary to
> +                 * ask the scheduler to chime in. In fact:
> +                 *  * there is no reason for it: the end result we are after
> +                 *    is just 'all the vcpus on the boot pcpu, and no vcpu
> +                 *    anywhere else', so let's just go for it;
> +                 *  * it's wrong, for cpupools with only non-boot pcpus, as
> +                 *    the scheduler would always fail to send the vcpus away
> +                 *    from the last online (non boot) pcpu!
> +                 *
> +                 * Therefore, in the shutdown/suspend case, we just pick up
> +                 * one (still) online pcpu. Note that, at this stage, all
> +                 * domains (including dom0) have been paused already, so we
> +                 * do not expect any vcpu activity at all.
> +                 */
> +                cpumask_andnot(&online_affinity, &cpu_online_map,
> +                               cpumask_of(cpu));
> +                BUG_ON(cpumask_empty(&online_affinity));
> +                /*
> +                 * As boot cpu is, usually, pcpu #0, using cpumask_first()
> +                 * will make us converge quicker.
> +                 */
> +                new_cpu = cpumask_first(&online_affinity);
> +                vcpu_move_nosched(v, new_cpu);

Shouldn't there be a vcpu_schedule_unlock_irqrestore() ?

Juergen