From mboxrd@z Thu Jan  1 00:00:00 1970
From: George Dunlap
Subject: Re: Xen crashing when killing a domain with no VCPUs allocated
Date: Mon, 21 Jul 2014 11:33:08 +0100
Message-ID: <53CCEC64.7040304@eu.citrix.com>
References: <53C920DD.6060300@linaro.org> <1405701560.14973.1.camel@kazak.uk.xensource.com> <53C982FF.7070608@linaro.org>
In-Reply-To: <53C982FF.7070608@linaro.org>
To: Julien Grall, Ian Campbell
Cc: jgross@suse.com, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap@citrix.com, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 07/18/2014 09:26 PM, Julien Grall wrote:
>
> On 18/07/14 17:39, Ian Campbell wrote:
>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>> Hi all,
>>>
>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>> case where this function can fail.
>>>
>>> During domain creation, the toolstack will call DOMCTL_max_vcpus,
>>> which may fail, for instance because alloc_vcpu didn't succeed. In
>>> this case, the toolstack will call DOMCTL_domaindestroy, and I got
>>> the stack trace below.
>>>
>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by
>>> returning an error in vcpu_initialize.
>>>
>>> I'm not sure how to fix it correctly.
>> I think a simple check at the head of the function would be ok.
>>
>> Alternatively perhaps in sched_move_domain, which could either detect
>> this or could detect a domain in pool0 being moved to pool0 and short
>> circuit.
> I was thinking about the small fix below. If it's fine for everyone,
> I can send a patch next week.
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e9eb0bc..c44d047 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      }
>
>      /* Do we have vcpus already? If not, no need to update node-affinity */
> -    if ( d->vcpu )
> +    if ( d->vcpu && d->vcpu[0] != NULL )
>          domain_update_node_affinity(d);

So is the problem that we're allocating the vcpu array, but not putting
any vcpus in it?

Overall it seems like those checks for the existence of vcpus should be
moved into domain_update_node_affinity().  The ASSERT() there, I think,
is just a sanity check to make sure we're not getting a ridiculous
result out of our calculation; but of course if there actually are no
vcpus, it's not ridiculous at all.

One solution might be to change the ASSERT to
ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we
could probably even remove the d->vcpu conditional when calling it.

 -George