* Xen crashing when killing a domain with no VCPUs allocated
@ 2014-07-18 13:27 Julien Grall
  2014-07-18 16:39 ` Ian Campbell
  0 siblings, 1 reply; 11+ messages in thread
From: Julien Grall @ 2014-07-18 13:27 UTC (permalink / raw)
  To: xen-devel, george.dunlap, Dario Faggioli, Ian Campbell, jgross
  Cc: Tim Deegan, Stefano Stabellini

Hi all,

I've been playing with the function alloc_vcpu on ARM, and I hit one case
where this function can fail.

During domain creation, the toolstack calls DOMCTL_max_vcpus, which may
fail, for instance because alloc_vcpu didn't succeed. In this case, the
toolstack calls DOMCTL_domaindestroy, and I get the stack trace below.

It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
returning an error in vcpu_initialize.

I'm not sure how to fix it correctly.

Regards,

(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:452
(XEN) ----[ Xen-4.5-unstable  arm32  debug=y  Tainted:    C ]----
(XEN) CPU:    1
(XEN) PC:     00207bd8 domain_update_node_affinity+0x10c/0x238
(XEN) CPSR:   2007015a MODE:Hypervisor
(XEN)      R0: 00000001 R1: 00000080 R2: 7ffce7c4 R3: 00000004
(XEN)      R4: 00000000 R5: 7ffce7b8 R6: 7ffce7a8 R7: 7ffce7d0
(XEN)      R8: 7ffce7b8 R9: 7ffcf000 R10:4000ed08 R11:7ffd7d64 R12:00000000
(XEN) HYP: SP: 7ffd7d34 LR: 00000004
(XEN) 
(XEN)   VTCR_EL2: 80003558
(XEN)  VTTBR_EL2: 00010002f9ffc000
(XEN) 
(XEN)  SCTLR_EL2: 30cd187f
(XEN)    HCR_EL2: 000000000038643f
(XEN)  TTBR0_EL2: 00000000fdfe5000
(XEN) 
(XEN)    ESR_EL2: 00000000
(XEN)  HPFAR_EL2: 0000000000fff110
(XEN)      HDFAR: a0800f00
(XEN)      HIFAR: 00000000
(XEN) 
(XEN) Xen stack trace from sp=7ffd7d34:
(XEN)    00207bd0 00000004 7ffcf6a8 4000ece0 00000000 00000000 7ffce7a8 4000ece0
(XEN)    00000ea1 9cd1e000 7ecc0ff8 7ffd7dac 00226870 00305000 7ffce758 9cd1e000
(XEN)    7ecc0ff8 00270e00 0024ec44 00000000 7ffcf000 fffff000 7ffcf000 76efb004
(XEN)    00000000 00305000 00000ea1 9cd1e000 7ecc0ff8 7ffd7dc4 0020925c 7ffcf000
(XEN)    76efb004 00000000 00305000 7ffd7edc 00206a0c 7ffd7e0c 7ffd7e50 00000004
(XEN)    7ffd7dec 7ffd7e0c 00001800 774623f9 00000000 00000000 00000000 00007747
(XEN)    0000bc4e 7ffd7ea4 00000000 00000000 00008f0d 00001800 00000000 40021578
(XEN)    40021000 40021560 400218e4 40021948 00000001 00000002 0000000a ffff0001
(XEN)    00000000 76e32110 76f01680 7ecc10dc 76eeea11 76efd140 00000001 00000001
(XEN)    00000000 00000000 76d24228 00000000 00031dd8 00037840 00000000 76efc000
(XEN)    76e57000 0006f73c 00000000 00000001 00031330 7ecc111c 76eeea11 00000000
(XEN)    00000001 00000001 00000000 7ecc10cc 76e32110 00000001 00037840 00030030
(XEN)    00000001 7ffd7f58 7ffd7f58 8000dcb4 00000005 00305000 00000ea1 9cd1e000
(XEN)    7ecc0ff8 7ffd7f54 002529c0 5a962879 002f7594 7ffd7f3c 7ffd7f3c 002c1ff4
(XEN)    200e01da 00000004 00270b80 002c1ff0 002f4244 7ffd7f3c 7ffd7f3c 00000004
(XEN)    40021000 1f680000 002be000 002f7594 30c23c7d 80599e7c 20003010 413fc0f2
(XEN)    5a962879 9e9ac400 00000005 00305000 00000005 9cd1e000 7ecc0ff8 7ffd7f58
(XEN)    002559f0 76efb004 00000001 000000f6 00000000 5a962879 9e9ac400 00000005
(XEN)    00305000 00000005 9cd1e000 7ecc0ff8 9f1309b8 00000024 ffffffff 76e4b130
(XEN)    8000dcb4 60070013 00000000 7ecc0fc4 80599d40 8001ebe0 9cd1feb4 80210604
(XEN) Xen call trace:
(XEN)    [<00207bd8>] domain_update_node_affinity+0x10c/0x238 (PC)
(XEN)    [<00000004>] 00000004 (LR)
(XEN)    [<00226870>] sched_move_domain+0x3cc/0x42c
(XEN)    [<0020925c>] domain_kill+0xc8/0x178
(XEN)    [<00206a0c>] do_domctl+0xaac/0x15e4
(XEN)    [<002529c0>] do_trap_hypervisor+0xc5c/0xf94
(XEN)    [<002559f0>] return_from_trap+0/0x4
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:452
(XEN) ****************************************

-- 
Julien Grall


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-18 13:27 Xen crashing when killing a domain with no VCPUs allocated Julien Grall
@ 2014-07-18 16:39 ` Ian Campbell
  2014-07-18 20:26   ` Julien Grall
  2014-07-21 10:12   ` George Dunlap
  0 siblings, 2 replies; 11+ messages in thread
From: Ian Campbell @ 2014-07-18 16:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel

On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
> Hi all,
> 
> I've been playing with the function alloc_vcpu on ARM, and I hit one case
> where this function can fail.
> 
> During domain creation, the toolstack calls DOMCTL_max_vcpus, which may
> fail, for instance because alloc_vcpu didn't succeed. In this case, the
> toolstack calls DOMCTL_domaindestroy, and I get the stack trace below.
> 
> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
> returning an error in vcpu_initialize.
> 
> I'm not sure how to fix it correctly.

I think a simple check at the head of the function would be ok.

Alternatively perhaps in sched_move_domain, which could either detect
this or could detect a domain in pool0 being moved to pool0 and short
circuit.

[...]
> (XEN)    [<00226870>] sched_move_domain+0x3cc/0x42c
> (XEN)    [<0020925c>] domain_kill+0xc8/0x178

This call path surprised me but it is from:

commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
Author: Juergen Gross <juergen.gross@ts.fujitsu.com>
Date:   Tue May 20 15:55:42 2014 +0200

    move domain to cpupool0 before destroying it
    
    Currently when a domain is destroyed it is removed from the domain_list
    before all of its resources, including the cpupool membership, are freed.
    This can lead to a situation where the domain is still member of a cpupool
    without for_each_domain_in_cpupool() (or even for_each_domain()) being
    able to find it any more. This in turn can result in rejection of removing
    the last cpu from a cpupool, because there seems to be still a domain in
    the cpupool, even if it can't be found by scanning through all domains.
    
    This situation can be avoided by moving the domain to be destroyed to
    cpupool0 first and then remove it from this cpupool BEFORE deleting it from
    the domain_list. As cpupool0 is always active and a domain without any cpupool
    membership is implicitly regarded as belonging to cpupool0, this poses no
    problem.


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-18 16:39 ` Ian Campbell
@ 2014-07-18 20:26   ` Julien Grall
  2014-07-21 10:33     ` George Dunlap
  2014-07-21 10:12   ` George Dunlap
  1 sibling, 1 reply; 11+ messages in thread
From: Julien Grall @ 2014-07-18 20:26 UTC (permalink / raw)
  To: Ian Campbell
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel



On 18/07/14 17:39, Ian Campbell wrote:
> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>> Hi all,
>>
>> I've been playing with the function alloc_vcpu on ARM, and I hit one case
>> where this function can fail.
>>
>> During domain creation, the toolstack calls DOMCTL_max_vcpus, which may
>> fail, for instance because alloc_vcpu didn't succeed. In this case, the
>> toolstack calls DOMCTL_domaindestroy, and I get the stack trace below.
>>
>> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
>> returning an error in vcpu_initialize.
>>
>> I'm not sure how to fix it correctly.
> 
> I think a simple check at the head of the function would be ok.
> 
> Alternatively perhaps in sched_move_domain, which could either detect
> this or could detect a domain in pool0 being moved to pool0 and short
> circuit.

I was thinking about the small fix below. If it's fine for everyone, I can
send a patch next week.

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e9eb0bc..c44d047 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     }
 
     /* Do we have vcpus already? If not, no need to update node-affinity */
-    if ( d->vcpu )
+    if ( d->vcpu && d->vcpu[0] != NULL )
         domain_update_node_affinity(d);
 
     domain_unpause(d);


Regards,

-- 
Julien Grall


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-18 16:39 ` Ian Campbell
  2014-07-18 20:26   ` Julien Grall
@ 2014-07-21 10:12   ` George Dunlap
  1 sibling, 0 replies; 11+ messages in thread
From: George Dunlap @ 2014-07-21 10:12 UTC (permalink / raw)
  To: Ian Campbell, Julien Grall
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel, jbeulich

On 07/18/2014 05:39 PM, Ian Campbell wrote:
> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>> Hi all,
>>
>> I've been playing with the function alloc_vcpu on ARM, and I hit one case
>> where this function can fail.
>>
>> During domain creation, the toolstack calls DOMCTL_max_vcpus, which may
>> fail, for instance because alloc_vcpu didn't succeed. In this case, the
>> toolstack calls DOMCTL_domaindestroy, and I get the stack trace below.
>>
>> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
>> returning an error in vcpu_initialize.
>>
>> I'm not sure how to fix it correctly.
> I think a simple check at the head of the function would be ok.
>
> Alternatively perhaps in sched_move_domain, which could either detect
> this or could detect a domain in pool0 being moved to pool0 and short
> circuit.
>
> [...]
>> (XEN)    [<00226870>] sched_move_domain+0x3cc/0x42c
>> (XEN)    [<0020925c>] domain_kill+0xc8/0x178
> This call path surprised me but it is from:
>
> commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
> Author: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Date:   Tue May 20 15:55:42 2014 +0200
>
>      move domain to cpupool0 before destroying it

And this is the second unforeseen hypervisor crash that happened as a
side effect of this patch.  Obviously I need to be more of a bastard as
a maintainer.

  -George


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-18 20:26   ` Julien Grall
@ 2014-07-21 10:33     ` George Dunlap
  2014-07-21 10:42       ` Andrew Cooper
  2014-07-21 11:46       ` Julien Grall
  0 siblings, 2 replies; 11+ messages in thread
From: George Dunlap @ 2014-07-21 10:33 UTC (permalink / raw)
  To: Julien Grall, Ian Campbell
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel

On 07/18/2014 09:26 PM, Julien Grall wrote:
>
> On 18/07/14 17:39, Ian Campbell wrote:
>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>> Hi all,
>>>
>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>> case where this function can fail.
>>>
>>> During domain creation, the toolstack calls DOMCTL_max_vcpus, which may
>>> fail, for instance because alloc_vcpu didn't succeed. In this case, the
>>> toolstack calls DOMCTL_domaindestroy, and I get the stack trace below.
>>>
>>> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
>>> returning an error in vcpu_initialize.
>>>
>>> I'm not sure how to fix it correctly.
>> I think a simple check at the head of the function would be ok.
>>
>> Alternatively perhaps in sched_move_domain, which could either detect
>> this or could detect a domain in pool0 being moved to pool0 and short
>> circuit.
> I was thinking about the small fix below. If it's fine for everyone, I can
> send a patch next week.
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e9eb0bc..c44d047 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>       }
>   
>       /* Do we have vcpus already? If not, no need to update node-affinity */
> -    if ( d->vcpu )
> +    if ( d->vcpu && d->vcpu[0] != NULL )
>           domain_update_node_affinity(d);

So is the problem that we're allocating the vcpu array area, but not 
putting any vcpus in it?

Overall it seems like those checks for the existence of cpus should be 
moved into domain_update_node_affinity().  The ASSERT() there I think is 
just a sanity check to make sure we're not getting a ridiculous result 
out of our calculation; but of course if there actually are no vcpus, 
it's not ridiculous at all.

One solution might be to change the ASSERT to 
ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we 
could probably even remove the d->vcpu conditional when calling it.

  -George


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-21 10:33     ` George Dunlap
@ 2014-07-21 10:42       ` Andrew Cooper
  2014-07-21 10:49         ` George Dunlap
  2014-07-21 11:46       ` Julien Grall
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Cooper @ 2014-07-21 10:42 UTC (permalink / raw)
  To: George Dunlap, Julien Grall, Ian Campbell
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel

On 21/07/14 11:33, George Dunlap wrote:
> On 07/18/2014 09:26 PM, Julien Grall wrote:
>>
>> On 18/07/14 17:39, Ian Campbell wrote:
>>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>>> Hi all,
>>>>
>>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>>> case where this function can fail.
>>>>
>>>> During domain creation, the toolstack calls DOMCTL_max_vcpus, which
>>>> may fail, for instance because alloc_vcpu didn't succeed. In this
>>>> case, the toolstack calls DOMCTL_domaindestroy, and I get the stack
>>>> trace below.
>>>>
>>>> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
>>>> returning an error in vcpu_initialize.
>>>>
>>>> I'm not sure how to fix it correctly.
>>> I think a simple check at the head of the function would be ok.
>>>
>>> Alternatively perhaps in sched_move_domain, which could either detect
>>> this or could detect a domain in pool0 being moved to pool0 and short
>>> circuit.
>> I was thinking about the small fix below. If it's fine for everyone,
>> I can
>> send a patch next week.
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index e9eb0bc..c44d047 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
>> cpupool *c)
>>       }
>>         /* Do we have vcpus already? If not, no need to update
>> node-affinity */
>> -    if ( d->vcpu )
>> +    if ( d->vcpu && d->vcpu[0] != NULL )
>>           domain_update_node_affinity(d);
>
> So is the problem that we're allocating the vcpu array area, but not
> putting any vcpus in it?

The problem (as I recall) was that domain_create() got midway through
and alloc_vcpu(0) failed with -ENOMEM.  Following that failure, the
toolstack called domain_destroy().

Having d->vcpu properly allocated and containing fully NULL pointers is
a valid position to be in, especially in error or teardown paths.

>
> Overall it seems like those checks for the existence of cpus should be
> moved into domain_update_node_affinity().  The ASSERT() there I think
> is just a sanity check to make sure we're not getting a ridiculous
> result out of our calculation; but of course if there actually are no
> vcpus, it's not ridiculous at all.
>
> One solution might be to change the ASSERT to
> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then
> we could probably even remove the d->vcpu conditional when calling it.

If you were going along this line, the pointer checks are substantially
less expensive than cpumask_empty(), so the ||'s should be reordered. 
However, I am not convinced that it is necessarily the best solution,
given my previous observation.

~Andrew


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-21 10:42       ` Andrew Cooper
@ 2014-07-21 10:49         ` George Dunlap
  0 siblings, 0 replies; 11+ messages in thread
From: George Dunlap @ 2014-07-21 10:49 UTC (permalink / raw)
  To: Andrew Cooper, Julien Grall, Ian Campbell
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel

On 07/21/2014 11:42 AM, Andrew Cooper wrote:
> On 21/07/14 11:33, George Dunlap wrote:
>> On 07/18/2014 09:26 PM, Julien Grall wrote:
>>> On 18/07/14 17:39, Ian Campbell wrote:
>>>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>>>> Hi all,
>>>>>
>>>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>>>> case where this function can fail.
>>>>>
>>>>> During domain creation, the toolstack calls DOMCTL_max_vcpus, which
>>>>> may fail, for instance because alloc_vcpu didn't succeed. In this
>>>>> case, the toolstack calls DOMCTL_domaindestroy, and I get the stack
>>>>> trace below.
>>>>>
>>>>> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
>>>>> returning an error in vcpu_initialize.
>>>>>
>>>>> I'm not sure how to fix it correctly.
>>>> I think a simple check at the head of the function would be ok.
>>>>
>>>> Alternatively perhaps in sched_move_domain, which could either detect
>>>> this or could detect a domain in pool0 being moved to pool0 and short
>>>> circuit.
>>> I was thinking about the small fix below. If it's fine for everyone,
>>> I can
>>> send a patch next week.
>>>
>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>>> index e9eb0bc..c44d047 100644
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
>>> cpupool *c)
>>>        }
>>>          /* Do we have vcpus already? If not, no need to update
>>> node-affinity */
>>> -    if ( d->vcpu )
>>> +    if ( d->vcpu && d->vcpu[0] != NULL )
>>>            domain_update_node_affinity(d);
>> So is the problem that we're allocating the vcpu array area, but not
>> putting any vcpus in it?
> The problem (as I recall) was that domain_create() got midway through
> and alloc_vcpu(0) failed with -ENOMEM.  Following that failure, the
> toolstack called domain_destroy().
>
> Having d->vcpu properly allocated and containing fully NULL pointers is
> a valid position to be in, especial in error or teardown paths.
>
>> Overall it seems like those checks for the existence of cpus should be
>> moved into domain_update_node_affinity().  The ASSERT() there I think
>> is just a sanity check to make sure we're not getting a ridiculous
>> result out of our calculation; but of course if there actually are no
>> vcpus, it's not ridiculous at all.
>>
>> One solution might be to change the ASSERT to
>> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then
>> we could probably even remove the d->vcpu conditional when calling it.
> If you were going along this line, the pointer checks are substantially
> less expensive than cpumask_empty(), so the ||'s should be reordered.
> However, I am not convinced that it is necessarily the best solution,
> given my previous observation.

Er, I was with you until the last part.  What's wrong with changing the 
assert from "Make sure I have *something* in there" to "Make sure I have 
*something* in there *if I have any vcpus*"?  That seems to be accepting 
that having d->vcpu allocated but full of null pointers is a valid 
condition.

  -George


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-21 10:33     ` George Dunlap
  2014-07-21 10:42       ` Andrew Cooper
@ 2014-07-21 11:46       ` Julien Grall
  2014-07-21 12:57         ` Dario Faggioli
  1 sibling, 1 reply; 11+ messages in thread
From: Julien Grall @ 2014-07-21 11:46 UTC (permalink / raw)
  To: George Dunlap, Ian Campbell
  Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan,
	george.dunlap, xen-devel

Hi George,

On 07/21/2014 11:33 AM, George Dunlap wrote:
> On 07/18/2014 09:26 PM, Julien Grall wrote:
>>
>> On 18/07/14 17:39, Ian Campbell wrote:
>>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>>> Hi all,
>>>>
>>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>>> case where this function can fail.
>>>>
>>>> During domain creation, the toolstack calls DOMCTL_max_vcpus, which
>>>> may fail, for instance because alloc_vcpu didn't succeed. In this
>>>> case, the toolstack calls DOMCTL_domaindestroy, and I get the stack
>>>> trace below.
>>>>
>>>> It can be reproduced on Xen 4.5 (and I suspect Xen 4.4 as well) by
>>>> returning an error in vcpu_initialize.
>>>>
>>>> I'm not sure how to fix it correctly.
>>> I think a simple check at the head of the function would be ok.
>>>
>>> Alternatively perhaps in sched_move_domain, which could either detect
>>> this or could detect a domain in pool0 being moved to pool0 and short
>>> circuit.
>> I was thinking about the small fix below. If it's fine for everyone, I
>> can
>> send a patch next week.
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index e9eb0bc..c44d047 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
>> cpupool *c)
>>       }
>>         /* Do we have vcpus already? If not, no need to update
>> node-affinity */
>> -    if ( d->vcpu )
>> +    if ( d->vcpu && d->vcpu[0] != NULL )
>>           domain_update_node_affinity(d);
> 
> So is the problem that we're allocating the vcpu array area, but not
> putting any vcpus in it?

Yes.


> Overall it seems like those checks for the existence of cpus should be
> moved into domain_update_node_affinity().  The ASSERT() there I think is
> just a sanity check to make sure we're not getting a ridiculous result
> out of our calculation; but of course if there actually are no vcpus,
> it's not ridiculous at all.
> 
> One solution might be to change the ASSERT to
> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we
> could probably even remove the d->vcpu conditional when calling it.

This solution also works for me. Which change do you prefer?

Regards,

-- 
Julien Grall


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-21 11:46       ` Julien Grall
@ 2014-07-21 12:57         ` Dario Faggioli
  2014-07-23 15:31           ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Dario Faggioli @ 2014-07-21 12:57 UTC (permalink / raw)
  To: Julien Grall
  Cc: jgross, Ian Campbell, Stefano Stabellini, George Dunlap,
	Tim Deegan, george.dunlap, xen-devel



On lun, 2014-07-21 at 12:46 +0100, Julien Grall wrote:
> On 07/21/2014 11:33 AM, George Dunlap wrote:
> > On 07/18/2014 09:26 PM, Julien Grall wrote:

> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> >> index e9eb0bc..c44d047 100644
> >> --- a/xen/common/schedule.c
> >> +++ b/xen/common/schedule.c
> >> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
> >> cpupool *c)
> >>       }
> >>         /* Do we have vcpus already? If not, no need to update
> >> node-affinity */
> >> -    if ( d->vcpu )
> >> +    if ( d->vcpu && d->vcpu[0] != NULL )
> >>           domain_update_node_affinity(d);
> > 

> > Overall it seems like those checks for the existence of cpus should be
> > moved into domain_update_node_affinity().  The ASSERT() there I think is
> > just a sanity check to make sure we're not getting a ridiculous result
> > out of our calculation; but of course if there actually are no vcpus,
> > it's not ridiculous at all.
> > 
> > One solution might be to change the ASSERT to
> > ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we
> > could probably even remove the d->vcpu conditional when calling it.
> 
> This solution also works for me. Which change do you prefer?
> 
FWIW, I prefer changing the ASSERT() in
domain_update_node_affinity(), as George suggested (perhaps with the
reordering Andrew suggested).

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-21 12:57         ` Dario Faggioli
@ 2014-07-23 15:31           ` Jan Beulich
  2014-07-24 14:04             ` Julien Grall
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2014-07-23 15:31 UTC (permalink / raw)
  To: Dario Faggioli, Julien Grall
  Cc: Juergen Gross, Ian Campbell, Stefano Stabellini, George Dunlap,
	Tim Deegan, george.dunlap, xen-devel

>>> On 21.07.14 at 14:57, <dario.faggioli@citrix.com> wrote:
> On lun, 2014-07-21 at 12:46 +0100, Julien Grall wrote:
>> On 07/21/2014 11:33 AM, George Dunlap wrote:
>> > On 07/18/2014 09:26 PM, Julien Grall wrote:
> 
>> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> >> index e9eb0bc..c44d047 100644
>> >> --- a/xen/common/schedule.c
>> >> +++ b/xen/common/schedule.c
>> >> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
>> >> cpupool *c)
>> >>       }
>> >>         /* Do we have vcpus already? If not, no need to update
>> >> node-affinity */
>> >> -    if ( d->vcpu )
>> >> +    if ( d->vcpu && d->vcpu[0] != NULL )
>> >>           domain_update_node_affinity(d);
>> > 
> 
>> > Overall it seems like those checks for the existence of cpus should be
>> > moved into domain_update_node_affinity().  The ASSERT() there I think is
>> > just a sanity check to make sure we're not getting a ridiculous result
>> > out of our calculation; but of course if there actually are no vcpus,
>> > it's not ridiculous at all.
>> > 
>> > One solution might be to change the ASSERT to
>> > ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we
>> > could probably even remove the d->vcpu conditional when calling it.
>> 
>> This solution also works for me. Which change do you prefer?
>> 
> FWIW, I think I like changing the ASSERT() in
> domain_update_node_affinity(), as George suggested (and perhaps with the
> reordering Andrew suggested) better.

+1

Jan


* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-23 15:31           ` Jan Beulich
@ 2014-07-24 14:04             ` Julien Grall
  0 siblings, 0 replies; 11+ messages in thread
From: Julien Grall @ 2014-07-24 14:04 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: Juergen Gross, Ian Campbell, Stefano Stabellini, George Dunlap,
	Tim Deegan, george.dunlap, xen-devel

Hi,

On 07/23/2014 04:31 PM, Jan Beulich wrote:
>>>> On 21.07.14 at 14:57, <dario.faggioli@citrix.com> wrote:
>> On lun, 2014-07-21 at 12:46 +0100, Julien Grall wrote:
>>> On 07/21/2014 11:33 AM, George Dunlap wrote:
>>>> On 07/18/2014 09:26 PM, Julien Grall wrote:
>>
>>>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>>>>> index e9eb0bc..c44d047 100644
>>>>> --- a/xen/common/schedule.c
>>>>> +++ b/xen/common/schedule.c
>>>>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
>>>>> cpupool *c)
>>>>>       }
>>>>>         /* Do we have vcpus already? If not, no need to update
>>>>> node-affinity */
>>>>> -    if ( d->vcpu )
>>>>> +    if ( d->vcpu && d->vcpu[0] != NULL )
>>>>>           domain_update_node_affinity(d);
>>>>
>>
>>>> Overall it seems like those checks for the existence of cpus should be
>>>> moved into domain_update_node_affinity().  The ASSERT() there I think is
>>>> just a sanity check to make sure we're not getting a ridiculous result
>>>> out of our calculation; but of course if there actually are no vcpus,
>>>> it's not ridiculous at all.
>>>>
>>>> One solution might be to change the ASSERT to
>>>> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]).  Then we
>>>> could probably even remove the d->vcpu conditional when calling it.
>>>
>>> This solution also works for me. Which change do you prefer?
>>>
>> FWIW, I think I like changing the ASSERT() in
>> domain_update_node_affinity(), as George suggested (and perhaps with the
>> reordering Andrew suggested) better.
> 
> +1

Thanks. I will send a patch within the next couple of days to fix this issue.

Regards,

-- 
Julien Grall


Thread overview: 11+ messages
2014-07-18 13:27 Xen crashing when killing a domain with no VCPUs allocated Julien Grall
2014-07-18 16:39 ` Ian Campbell
2014-07-18 20:26   ` Julien Grall
2014-07-21 10:33     ` George Dunlap
2014-07-21 10:42       ` Andrew Cooper
2014-07-21 10:49         ` George Dunlap
2014-07-21 11:46       ` Julien Grall
2014-07-21 12:57         ` Dario Faggioli
2014-07-23 15:31           ` Jan Beulich
2014-07-24 14:04             ` Julien Grall
2014-07-21 10:12   ` George Dunlap
