From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>,
	Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: Scheduler regression in 4.7
Date: Thu, 11 Aug 2016 16:42:13 +0100
Message-ID: <bbc5e619-4ef5-16fd-bb4c-782ffd9b0988@citrix.com>
In-Reply-To: <1470925692.6250.20.camel@citrix.com>

On 11/08/16 15:28, Dario Faggioli wrote:
> On Thu, 2016-08-11 at 14:39 +0100, Andrew Cooper wrote:
>> On 11/08/16 14:24, George Dunlap wrote:
>>> On 11/08/16 12:35, Andrew Cooper wrote:
>>>> The actual cause is _csched_cpu_pick() falling over LIST_POISON,
>>>> which happened to occur at the same time as a domain was shutting
>>>> down.  The instruction in question is `mov 0x10(%rax),%rax` which
>>>> looks like reverse list traversal.
> Thanks for the report.
>
>>> Could you use addr2line or objdump -dl to get a better idea where
>>> the #GP is happening?
>> addr2line -e xen-syms-4.7.0-xs127493 ffff82d08012944f
>> /obj/RPM_BUILD_DIRECTORY/xen-4.7.0/xen/common/sched_credit.c:775
>> (discriminator 1)
>>
>> It will be IS_RUNQ_IDLE() which is the problem.
>>
> Ok, that does one step of list traversal (of the runq). What I didn't
> understand from your report is what crashed, and when.

IS_RUNQ_IDLE() was traversing a list, and it encountered an element
which was being concurrently deleted on a different pcpu.
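
For reference, a minimal sketch (simplified, not the actual Xen sources;
names and poison constants are illustrative) of the shape of that race:
a Linux/Xen-style list_del() writes poison values into the entry it
removes, and because those values are non-canonical on x86-64, a later
dereference raises #GP rather than a page fault.

    struct list_head { struct list_head *next, *prev; };

    /* Illustrative poison constants (non-canonical addresses). */
    #define LIST_POISON1 ((struct list_head *)0x0100100100100100UL)
    #define LIST_POISON2 ((struct list_head *)0x0200200200200200UL)

    /* pcpu A: removes an element under the runqueue lock, then poisons it. */
    static void list_del_poison(struct list_head *entry)
    {
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
    }

    /* pcpu B: peeks at the same runqueue *without* taking that lock. */
    static int runq_walk_racy(struct list_head *runq)
    {
        struct list_head *first = runq->next; /* old head element            */
        /* ... pcpu A runs list_del_poison(first) at this point ...          */
        struct list_head *next = first->next; /* now loads LIST_POISON1      */
        return next->prev == runq;            /* dereferencing poison => #GP */
    }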

>
> IS_RUNQ_IDLE() was introduced a while back and nothing like that has
> ever been caught so far. George's patch makes _csched_cpu_pick() be
> called during insert_vcpu()-->csched_vcpu_insert() which, in 4.7, is
> called:
>  1) during domain (well, vcpu) creation,
>  2) when a domain is moved among cpupools.
>
> AFAICR, during domain destruction we basically move the domain to
> cpupool0, and without a patch that I sent recently, that is always done
> as a full-fledged cpupool movement, even if the domain is _already_ in
> cpupool0. So, even if you are not using cpupools, and since you mention
> domain shutdown, we are probably looking at 2).

XenServer doesn't use any cpupools, so all pcpus and vcpus are in cpupool0.

>
> But this is the part I'm not sure I understood... Do you have enough
> info to tell precisely when the crash manifests? Is it indeed during a
> domain shutdown, or was it during a domain creation (sched_init_vcpu()
> is in the stack trace... although I've read it's a non-debug one)? And
> is it a 'regular' domain or dom0 that is shutting down/coming up?

It is a VM reboot of an HVM domU (CentOS 7 64-bit, although I doubt
that is relevant).

The testcase is VM lifecycle ops on a 32-vcpu VM, on a host which
happens to have 32 pcpus.

>
> The idea behind IS_RUNQ_IDLE() is that we need to know whether there is
> someone in the runq of a cpu or not, to correctly initialize --and
> hence avoid biasing-- some load balancing calculations. I've never
> liked the idea (let alone the code), but it's necessary (or, at
> least, I don't see a sensible alternative).
>
> The questions I'm asking above have the aim of figuring out what the
> status of the runq could be, and why adding a call to csched_cpu_pick()
> from insert_vcpu() is making things explode...

It turns out that the stack trace is rather less stack rubble than I
first thought.  We are in domain construction, and specifically the
XEN_DOMCTL_max_vcpus hypercall.  All other pcpus are in idle.

    for ( i = 0; i < max; i++ )
    {
        if ( d->vcpu[i] != NULL )
            continue;

        cpu = (i == 0) ?
            cpumask_any(online) :
            cpumask_cycle(d->vcpu[i-1]->processor, online);

        if ( alloc_vcpu(d, i, cpu) == NULL )
            goto maxvcpu_out;
    }

The cpumask_cycle() call is complete and execution has moved into
alloc_vcpu().

Unfortunately, none of the code around here spills i or cpu onto the
stack, so I can't see which values they have from the stack dump.

However, I see that csched_vcpu_insert() plays with vc->processor, which
surely invalidates the cycle logic behind this loop?
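
For what it's worth, a toy illustration (hypothetical names, not the
real call chain) of why re-picking v->processor inside the insert hook
would defeat the cycling: the loop cycles from d->vcpu[i-1]->processor,
so if the insert path overrides the processor the caller asked for,
every later iteration cycles from the scheduler's choice instead, and
the intended round-robin spread over the pcpus collapses.

    #include <stdio.h>

    #define NR_PCPUS 4

    /* Toy stand-in for cpumask_cycle(): next pcpu after 'prev', wrapping. */
    static int cycle(int prev) { return (prev + 1) % NR_PCPUS; }

    /* Hypothetical stand-in for the insert path re-picking the processor
     * (alloc_vcpu() -> sched_init_vcpu() -> insert_vcpu()).  Here it always
     * lands on pcpu0, e.g. because that pcpu currently looks idle. */
    static int insert_repick(int requested_cpu)
    {
        (void)requested_cpu;
        return 0;
    }

    int main(void)
    {
        int processor[8]; /* plays the role of d->vcpu[i]->processor */
        int i, cpu;

        for ( i = 0; i < 8; i++ )
        {
            cpu = (i == 0) ? 0 : cycle(processor[i - 1]);
            processor[i] = insert_repick(cpu);
            printf("vcpu%d: asked for pcpu%d, placed on pcpu%d\n",
                   i, cpu, processor[i]);
        }
        return 0;
    }

Every vcpu after the first asks for pcpu1 and ends up on pcpu0, rather
than the vcpus being spread over pcpu0..pcpu3 as the loop intends.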

~Andrew

