From mboxrd@z Thu Jan  1 00:00:00 1970
From: George Dunlap
Subject: Re: Xen crashing when killing a domain with no VCPUs allocated
Date: Mon, 21 Jul 2014 11:33:08 +0100
Message-ID: <53CCEC64.7040304@eu.citrix.com>
References: <53C920DD.6060300@linaro.org> <1405701560.14973.1.camel@kazak.uk.xensource.com> <53C982FF.7070608@linaro.org>
In-Reply-To: <53C982FF.7070608@linaro.org>
To: Julien Grall, Ian Campbell
Cc: jgross@suse.com, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap@citrix.com, xen-devel
List-Id: xen-devel@lists.xenproject.org

On 07/18/2014 09:26 PM, Julien Grall wrote:
>
> On 18/07/14 17:39, Ian Campbell wrote:
>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>> Hi all,
>>>
>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>> case where this function can fail.
>>>
>>> During domain creation, the toolstack will call DOMCTL_max_vcpus,
>>> which may fail, for instance because alloc_vcpu didn't succeed. In
>>> this case, the toolstack will call DOMCTL_domaindestroy, and I got
>>> the stack trace below.
>>>
>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by
>>> returning an error in vcpu_initialize.
>>>
>>> I'm not sure how to fix it correctly.
>> I think a simple check at the head of the function would be ok.
>>
>> Alternatively perhaps in sched_move_domain, which could either detect
>> this or could detect a domain in pool0 being moved to pool0 and short
>> circuit.
> I was thinking about the small fix below. If it's fine for everyone,
> I can send a patch next week.
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e9eb0bc..c44d047 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      }
>
>      /* Do we have vcpus already? If not, no need to update node-affinity */
> -    if ( d->vcpu )
> +    if ( d->vcpu && d->vcpu[0] != NULL )
>          domain_update_node_affinity(d);

So is the problem that we're allocating the vcpu array, but not putting
any vcpus in it?

Overall it seems like those checks for the existence of vcpus should be
moved into domain_update_node_affinity().  The ASSERT() there, I think,
is just a sanity check to make sure we're not getting a ridiculous
result out of our calculation; but of course if there actually are no
vcpus, it's not ridiculous at all.

One solution might be to change the ASSERT to
ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we
could probably even remove the d->vcpu conditional when calling it.

 -George