* Xen crashing when killing a domain with no VCPUs allocated
@ 2014-07-18 13:27 Julien Grall
  2014-07-18 16:39 ` Ian Campbell
  0 siblings, 1 reply; 11+ messages in thread

From: Julien Grall @ 2014-07-18 13:27 UTC (permalink / raw)
To: xen-devel, george.dunlap, Dario Faggioli, Ian Campbell, jgross
Cc: Tim Deegan, Stefano Stabellini

Hi all,

I've been playing with the function alloc_vcpu on ARM, and I hit one case
where this function can fail.

During domain creation, the toolstack will call DOMCTL_max_vcpus, which may
fail, for instance because alloc_vcpu didn't succeed. In this case, the
toolstack will call DOMCTL_domaindestroy, and I got the stack trace below.

It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by returning
an error in vcpu_initialize.

I'm not sure how to correctly fix it.

Regards,

(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:452
(XEN) ----[ Xen-4.5-unstable arm32 debug=y Tainted: C ]----
(XEN) CPU:    1
(XEN) PC:     00207bd8 domain_update_node_affinity+0x10c/0x238
(XEN) CPSR:   2007015a MODE:Hypervisor
(XEN)      R0: 00000001 R1: 00000080 R2: 7ffce7c4 R3: 00000004
(XEN)      R4: 00000000 R5: 7ffce7b8 R6: 7ffce7a8 R7: 7ffce7d0
(XEN)      R8: 7ffce7b8 R9: 7ffcf000 R10:4000ed08 R11:7ffd7d64 R12:00000000
(XEN) HYP: SP: 7ffd7d34 LR: 00000004
(XEN)
(XEN)   VTCR_EL2: 80003558
(XEN)  VTTBR_EL2: 00010002f9ffc000
(XEN)
(XEN)  SCTLR_EL2: 30cd187f
(XEN)    HCR_EL2: 000000000038643f
(XEN)  TTBR0_EL2: 00000000fdfe5000
(XEN)
(XEN)    ESR_EL2: 00000000
(XEN)  HPFAR_EL2: 0000000000fff110
(XEN)      HDFAR: a0800f00
(XEN)      HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=7ffd7d34:
(XEN)    00207bd0 00000004 7ffcf6a8 4000ece0 00000000 00000000 7ffce7a8 4000ece0
(XEN)    00000ea1 9cd1e000 7ecc0ff8 7ffd7dac 00226870 00305000 7ffce758 9cd1e000
(XEN)    7ecc0ff8 00270e00 0024ec44 00000000 7ffcf000 fffff000 7ffcf000 76efb004
(XEN)    00000000 00305000 00000ea1 9cd1e000 7ecc0ff8 7ffd7dc4 0020925c 7ffcf000
(XEN)    76efb004 00000000 00305000 7ffd7edc 00206a0c 7ffd7e0c 7ffd7e50 00000004
(XEN)    7ffd7dec 7ffd7e0c 00001800 774623f9 00000000 00000000 00000000 00007747
(XEN)    0000bc4e 7ffd7ea4 00000000 00000000 00008f0d 00001800 00000000 40021578
(XEN)    40021000 40021560 400218e4 40021948 00000001 00000002 0000000a ffff0001
(XEN)    00000000 76e32110 76f01680 7ecc10dc 76eeea11 76efd140 00000001 00000001
(XEN)    00000000 00000000 76d24228 00000000 00031dd8 00037840 00000000 76efc000
(XEN)    76e57000 0006f73c 00000000 00000001 00031330 7ecc111c 76eeea11 00000000
(XEN)    00000001 00000001 00000000 7ecc10cc 76e32110 00000001 00037840 00030030
(XEN)    00000001 7ffd7f58 7ffd7f58 8000dcb4 00000005 00305000 00000ea1 9cd1e000
(XEN)    7ecc0ff8 7ffd7f54 002529c0 5a962879 002f7594 7ffd7f3c 7ffd7f3c 002c1ff4
(XEN)    200e01da 00000004 00270b80 002c1ff0 002f4244 7ffd7f3c 7ffd7f3c 00000004
(XEN)    40021000 1f680000 002be000 002f7594 30c23c7d 80599e7c 20003010 413fc0f2
(XEN)    5a962879 9e9ac400 00000005 00305000 00000005 9cd1e000 7ecc0ff8 7ffd7f58
(XEN)    002559f0 76efb004 00000001 000000f6 00000000 5a962879 9e9ac400 00000005
(XEN)    00305000 00000005 9cd1e000 7ecc0ff8 9f1309b8 00000024 ffffffff 76e4b130
(XEN)    8000dcb4 60070013 00000000 7ecc0fc4 80599d40 8001ebe0 9cd1feb4 80210604
(XEN) Xen call trace:
(XEN)    [<00207bd8>] domain_update_node_affinity+0x10c/0x238 (PC)
(XEN)    [<00000004>] 00000004 (LR)
(XEN)    [<00226870>] sched_move_domain+0x3cc/0x42c
(XEN)    [<0020925c>] domain_kill+0xc8/0x178
(XEN)    [<00206a0c>] do_domctl+0xaac/0x15e4
(XEN)    [<002529c0>] do_trap_hypervisor+0xc5c/0xf94
(XEN)    [<002559f0>] return_from_trap+0/0x4
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:452
(XEN) ****************************************

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-18 13:27 Xen crashing when killing a domain with no VCPUs allocated Julien Grall @ 2014-07-18 16:39 ` Ian Campbell 2014-07-18 20:26 ` Julien Grall 2014-07-21 10:12 ` George Dunlap 0 siblings, 2 replies; 11+ messages in thread From: Ian Campbell @ 2014-07-18 16:39 UTC (permalink / raw) To: Julien Grall Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote: > Hi all, > > I've been played with the function alloc_vcpu on ARM. And I hit one case > where this function can failed. > > During domain creation, the toolstack will call DOMCTL_max_vcpus which may > fail, for instance because alloc_vcpu didn't succeed. In this case, the > toolstack will call DOMCTL_domaindestroy. And I got the below stack trace. > > It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by returning > in an error in vcpu_initialize. > > I'm not sure how to correctly fix it. I think a simple check at the head of the function would be ok. Alternatively perhaps in sched_mode_domain, which could either detect this or could detect a domain in pool0 being moved to pool0 and short circuit. [...] > (XEN) [<00226870>] sched_move_domain+0x3cc/0x42c > (XEN) [<0020925c>] domain_kill+0xc8/0x178 This call path surprised me but it is from: commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a Author: Juergen Gross <juergen.gross@ts.fujitsu.com> Date: Tue May 20 15:55:42 2014 +0200 move domain to cpupool0 before destroying it Currently when a domain is destroyed it is removed from the domain_list before all of it's resources, including the cpupool membership, are freed. This can lead to a situation where the domain is still member of a cpupool without for_each_domain_in_cpupool() (or even for_each_domain()) being able to find it any more. 
This in turn can result in rejection of removing the last cpu from a cpupool, because there seems to be still a domain in the cpupool, even if it can't be found by scanning through all domains. This situation can be avoided by moving the domain to be destroyed to cpupool0 first and then remove it from this cpupool BEFORE deleting it from the domain_list. As cpupool0 is always active and a domain without any cpupool membership is implicitly regarded as belonging to cpupool0, this poses no problem. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-18 16:39 ` Ian Campbell
@ 2014-07-18 20:26 ` Julien Grall
  2014-07-21 10:33 ` George Dunlap
  2014-07-21 10:12 ` George Dunlap
  1 sibling, 1 reply; 11+ messages in thread

From: Julien Grall @ 2014-07-18 20:26 UTC (permalink / raw)
To: Ian Campbell
Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel

On 18/07/14 17:39, Ian Campbell wrote:
> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>> Hi all,
>>
>> I've been played with the function alloc_vcpu on ARM. And I hit one case
>> where this function can failed.
>>
>> During domain creation, the toolstack will call DOMCTL_max_vcpus which may
>> fail, for instance because alloc_vcpu didn't succeed. In this case, the
>> toolstack will call DOMCTL_domaindestroy. And I got the below stack trace.
>>
>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by returning
>> in an error in vcpu_initialize.
>>
>> I'm not sure how to correctly fix it.
>
> I think a simple check at the head of the function would be ok.
>
> Alternatively perhaps in sched_mode_domain, which could either detect
> this or could detect a domain in pool0 being moved to pool0 and short
> circuit.

I was thinking about the small fix below. If it's fine for everyone, I can
send a patch next week.

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e9eb0bc..c44d047 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     }
 
     /* Do we have vcpus already? If not, no need to update node-affinity */
-    if ( d->vcpu )
+    if ( d->vcpu && d->vcpu[0] != NULL )
         domain_update_node_affinity(d);
 
     domain_unpause(d);

Regards,

-- 
Julien Grall

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-18 20:26 ` Julien Grall @ 2014-07-21 10:33 ` George Dunlap 2014-07-21 10:42 ` Andrew Cooper 2014-07-21 11:46 ` Julien Grall 0 siblings, 2 replies; 11+ messages in thread From: George Dunlap @ 2014-07-21 10:33 UTC (permalink / raw) To: Julien Grall, Ian Campbell Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel On 07/18/2014 09:26 PM, Julien Grall wrote: > > On 18/07/14 17:39, Ian Campbell wrote: >> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote: >>> Hi all, >>> >>> I've been played with the function alloc_vcpu on ARM. And I hit one case >>> where this function can failed. >>> >>> During domain creation, the toolstack will call DOMCTL_max_vcpus which may >>> fail, for instance because alloc_vcpu didn't succeed. In this case, the >>> toolstack will call DOMCTL_domaindestroy. And I got the below stack trace. >>> >>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by returning >>> in an error in vcpu_initialize. >>> >>> I'm not sure how to correctly fix it. >> I think a simple check at the head of the function would be ok. >> >> Alternatively perhaps in sched_mode_domain, which could either detect >> this or could detect a domain in pool0 being moved to pool0 and short >> circuit. > I was thinking about the small fix below. If it's fine for everyone, I can > send a patch next week. > > diff --git a/xen/common/schedule.c b/xen/common/schedule.c > index e9eb0bc..c44d047 100644 > --- a/xen/common/schedule.c > +++ b/xen/common/schedule.c > @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) > } > > /* Do we have vcpus already? If not, no need to update node-affinity */ > - if ( d->vcpu ) > + if ( d->vcpu && d->vcpu[0] != NULL ) > domain_update_node_affinity(d); So is the problem that we're allocating the vcpu array area, but not putting any vcpus in it? 
Overall it seems like those checks for the existence of cpus should be
moved into domain_update_node_affinity(). The ASSERT() there I think is
just a sanity check to make sure we're not getting a ridiculous result
out of our calculation; but of course if there actually are no vcpus,
it's not ridiculous at all.

One solution might be to change the ASSERT to
ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then we
could probably even remove the d->vcpu conditional when calling it.

 -George

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-21 10:33 ` George Dunlap @ 2014-07-21 10:42 ` Andrew Cooper 2014-07-21 10:49 ` George Dunlap 2014-07-21 11:46 ` Julien Grall 1 sibling, 1 reply; 11+ messages in thread From: Andrew Cooper @ 2014-07-21 10:42 UTC (permalink / raw) To: George Dunlap, Julien Grall, Ian Campbell Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel On 21/07/14 11:33, George Dunlap wrote: > On 07/18/2014 09:26 PM, Julien Grall wrote: >> >> On 18/07/14 17:39, Ian Campbell wrote: >>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote: >>>> Hi all, >>>> >>>> I've been played with the function alloc_vcpu on ARM. And I hit one >>>> case >>>> where this function can failed. >>>> >>>> During domain creation, the toolstack will call DOMCTL_max_vcpus >>>> which may >>>> fail, for instance because alloc_vcpu didn't succeed. In this case, >>>> the >>>> toolstack will call DOMCTL_domaindestroy. And I got the below stack >>>> trace. >>>> >>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by >>>> returning >>>> in an error in vcpu_initialize. >>>> >>>> I'm not sure how to correctly fix it. >>> I think a simple check at the head of the function would be ok. >>> >>> Alternatively perhaps in sched_mode_domain, which could either detect >>> this or could detect a domain in pool0 being moved to pool0 and short >>> circuit. >> I was thinking about the small fix below. If it's fine for everyone, >> I can >> send a patch next week. >> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >> index e9eb0bc..c44d047 100644 >> --- a/xen/common/schedule.c >> +++ b/xen/common/schedule.c >> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct >> cpupool *c) >> } >> /* Do we have vcpus already? 
If not, no need to update >> node-affinity */ >> - if ( d->vcpu ) >> + if ( d->vcpu && d->vcpu[0] != NULL ) >> domain_update_node_affinity(d); > > So is the problem that we're allocating the vcpu array area, but not > putting any vcpus in it? The problem (as I recall) was that domain_create() got midway through and alloc_vcpu(0) failed with -ENOMEM. Following that failure, the toolstack called domain_destroy(). Having d->vcpu properly allocated and containing fully NULL pointers is a valid position to be in, especial in error or teardown paths. > > Overall it seems like those checks for the existence of cpus should be > moved into domain_update_node_affinity(). The ASSERT() there I think > is just a sanity check to make sure we're not getting a ridiculous > result out of our calculation; but of course if there actually are no > vcpus, it's not ridiculous at all. > > One solution might be to change the ASSERT to > ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then > we could probably even remove the d->vcpu conditional when calling it. If you were going along this line, the pointer checks are substantially less expensive than cpumask_empty(), so the ||'s should be reordered. However, I am not convinced that it is necessarily the best solution, given my previous observation. ~Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-21 10:42 ` Andrew Cooper @ 2014-07-21 10:49 ` George Dunlap 0 siblings, 0 replies; 11+ messages in thread From: George Dunlap @ 2014-07-21 10:49 UTC (permalink / raw) To: Andrew Cooper, Julien Grall, Ian Campbell Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel On 07/21/2014 11:42 AM, Andrew Cooper wrote: > On 21/07/14 11:33, George Dunlap wrote: >> On 07/18/2014 09:26 PM, Julien Grall wrote: >>> On 18/07/14 17:39, Ian Campbell wrote: >>>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote: >>>>> Hi all, >>>>> >>>>> I've been played with the function alloc_vcpu on ARM. And I hit one >>>>> case >>>>> where this function can failed. >>>>> >>>>> During domain creation, the toolstack will call DOMCTL_max_vcpus >>>>> which may >>>>> fail, for instance because alloc_vcpu didn't succeed. In this case, >>>>> the >>>>> toolstack will call DOMCTL_domaindestroy. And I got the below stack >>>>> trace. >>>>> >>>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by >>>>> returning >>>>> in an error in vcpu_initialize. >>>>> >>>>> I'm not sure how to correctly fix it. >>>> I think a simple check at the head of the function would be ok. >>>> >>>> Alternatively perhaps in sched_mode_domain, which could either detect >>>> this or could detect a domain in pool0 being moved to pool0 and short >>>> circuit. >>> I was thinking about the small fix below. If it's fine for everyone, >>> I can >>> send a patch next week. >>> >>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >>> index e9eb0bc..c44d047 100644 >>> --- a/xen/common/schedule.c >>> +++ b/xen/common/schedule.c >>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct >>> cpupool *c) >>> } >>> /* Do we have vcpus already? 
If not, no need to update >>> node-affinity */ >>> - if ( d->vcpu ) >>> + if ( d->vcpu && d->vcpu[0] != NULL ) >>> domain_update_node_affinity(d); >> So is the problem that we're allocating the vcpu array area, but not >> putting any vcpus in it? > The problem (as I recall) was that domain_create() got midway through > and alloc_vcpu(0) failed with -ENOMEM. Following that failure, the > toolstack called domain_destroy(). > > Having d->vcpu properly allocated and containing fully NULL pointers is > a valid position to be in, especial in error or teardown paths. > >> Overall it seems like those checks for the existence of cpus should be >> moved into domain_update_node_affinity(). The ASSERT() there I think >> is just a sanity check to make sure we're not getting a ridiculous >> result out of our calculation; but of course if there actually are no >> vcpus, it's not ridiculous at all. >> >> One solution might be to change the ASSERT to >> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then >> we could probably even remove the d->vcpu conditional when calling it. > If you were going along this line, the pointer checks are substantially > less expensive than cpumask_empty(), so the ||'s should be reordered. > However, I am not convinced that it is necessarily the best solution, > given my previous observation. Er, I was with you until the last part. What's wrong with changing the assert from "Make sure I have *something* in there" to "Make sure I have *something* in there *if I have any vcpus*"? That seems to be accepting that having d->vcpu allocated but full of null pointers is a valid condition. -George ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-21 10:33 ` George Dunlap 2014-07-21 10:42 ` Andrew Cooper @ 2014-07-21 11:46 ` Julien Grall 2014-07-21 12:57 ` Dario Faggioli 1 sibling, 1 reply; 11+ messages in thread From: Julien Grall @ 2014-07-21 11:46 UTC (permalink / raw) To: George Dunlap, Ian Campbell Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel Hi George, On 07/21/2014 11:33 AM, George Dunlap wrote: > On 07/18/2014 09:26 PM, Julien Grall wrote: >> >> On 18/07/14 17:39, Ian Campbell wrote: >>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote: >>>> Hi all, >>>> >>>> I've been played with the function alloc_vcpu on ARM. And I hit one >>>> case >>>> where this function can failed. >>>> >>>> During domain creation, the toolstack will call DOMCTL_max_vcpus >>>> which may >>>> fail, for instance because alloc_vcpu didn't succeed. In this case, the >>>> toolstack will call DOMCTL_domaindestroy. And I got the below stack >>>> trace. >>>> >>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by >>>> returning >>>> in an error in vcpu_initialize. >>>> >>>> I'm not sure how to correctly fix it. >>> I think a simple check at the head of the function would be ok. >>> >>> Alternatively perhaps in sched_mode_domain, which could either detect >>> this or could detect a domain in pool0 being moved to pool0 and short >>> circuit. >> I was thinking about the small fix below. If it's fine for everyone, I >> can >> send a patch next week. >> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >> index e9eb0bc..c44d047 100644 >> --- a/xen/common/schedule.c >> +++ b/xen/common/schedule.c >> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct >> cpupool *c) >> } >> /* Do we have vcpus already? 
If not, no need to update >> node-affinity */ >> - if ( d->vcpu ) >> + if ( d->vcpu && d->vcpu[0] != NULL ) >> domain_update_node_affinity(d); > > So is the problem that we're allocating the vcpu array area, but not > putting any vcpus in it? Yes. > Overall it seems like those checks for the existence of cpus should be > moved into domain_update_node_affinity(). The ASSERT() there I think is > just a sanity check to make sure we're not getting a ridiculous result > out of our calculation; but of course if there actually are no vcpus, > it's not ridiculous at all. > > One solution might be to change the ASSERT to > ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then we > could probably even remove the d->vcpu conditional when calling it. This solution also works for me. Which change do you prefer? Regards, -- Julien Grall ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-21 11:46 ` Julien Grall
@ 2014-07-21 12:57 ` Dario Faggioli
  2014-07-23 15:31 ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread

From: Dario Faggioli @ 2014-07-21 12:57 UTC (permalink / raw)
To: Julien Grall
Cc: jgross, Ian Campbell, Stefano Stabellini, George Dunlap, Tim Deegan, george.dunlap, xen-devel

On Mon, 2014-07-21 at 12:46 +0100, Julien Grall wrote:
> On 07/21/2014 11:33 AM, George Dunlap wrote:
> > On 07/18/2014 09:26 PM, Julien Grall wrote:
> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> >> index e9eb0bc..c44d047 100644
> >> --- a/xen/common/schedule.c
> >> +++ b/xen/common/schedule.c
> >> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct
> >> cpupool *c)
> >>      }
> >>      /* Do we have vcpus already? If not, no need to update
> >> node-affinity */
> >> -    if ( d->vcpu )
> >> +    if ( d->vcpu && d->vcpu[0] != NULL )
> >>          domain_update_node_affinity(d);
> >
> > Overall it seems like those checks for the existence of cpus should be
> > moved into domain_update_node_affinity(). The ASSERT() there I think is
> > just a sanity check to make sure we're not getting a ridiculous result
> > out of our calculation; but of course if there actually are no vcpus,
> > it's not ridiculous at all.
> >
> > One solution might be to change the ASSERT to
> > ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then we
> > could probably even remove the d->vcpu conditional when calling it.
>
> This solution also works for me. Which change do you prefer?
>
FWIW, I think I like changing the ASSERT() in
domain_update_node_affinity(), as George suggested (and perhaps with the
reordering Andrew suggested) better.
Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-21 12:57 ` Dario Faggioli @ 2014-07-23 15:31 ` Jan Beulich 2014-07-24 14:04 ` Julien Grall 0 siblings, 1 reply; 11+ messages in thread From: Jan Beulich @ 2014-07-23 15:31 UTC (permalink / raw) To: Dario Faggioli, Julien Grall Cc: Juergen Gross, Ian Campbell, Stefano Stabellini, George Dunlap, Tim Deegan, george.dunlap, xen-devel >>> On 21.07.14 at 14:57, <dario.faggioli@citrix.com> wrote: > On lun, 2014-07-21 at 12:46 +0100, Julien Grall wrote: >> On 07/21/2014 11:33 AM, George Dunlap wrote: >> > On 07/18/2014 09:26 PM, Julien Grall wrote: > >> >> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >> >> index e9eb0bc..c44d047 100644 >> >> --- a/xen/common/schedule.c >> >> +++ b/xen/common/schedule.c >> >> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct >> >> cpupool *c) >> >> } >> >> /* Do we have vcpus already? If not, no need to update >> >> node-affinity */ >> >> - if ( d->vcpu ) >> >> + if ( d->vcpu && d->vcpu[0] != NULL ) >> >> domain_update_node_affinity(d); >> > > >> > Overall it seems like those checks for the existence of cpus should be >> > moved into domain_update_node_affinity(). The ASSERT() there I think is >> > just a sanity check to make sure we're not getting a ridiculous result >> > out of our calculation; but of course if there actually are no vcpus, >> > it's not ridiculous at all. >> > >> > One solution might be to change the ASSERT to >> > ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then we >> > could probably even remove the d->vcpu conditional when calling it. >> >> This solution also works for me. Which change do you prefer? >> > FWIW, I think I like changing the ASSERT() in > domain_update_node_affinity(), as George suggested (and perhaps with the > reordering Andrew suggested) better. +1 Jan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated 2014-07-23 15:31 ` Jan Beulich @ 2014-07-24 14:04 ` Julien Grall 0 siblings, 0 replies; 11+ messages in thread From: Julien Grall @ 2014-07-24 14:04 UTC (permalink / raw) To: Jan Beulich, Dario Faggioli Cc: Juergen Gross, Ian Campbell, Stefano Stabellini, George Dunlap, Tim Deegan, george.dunlap, xen-devel Hi, On 07/23/2014 04:31 PM, Jan Beulich wrote: >>>> On 21.07.14 at 14:57, <dario.faggioli@citrix.com> wrote: >> On lun, 2014-07-21 at 12:46 +0100, Julien Grall wrote: >>> On 07/21/2014 11:33 AM, George Dunlap wrote: >>>> On 07/18/2014 09:26 PM, Julien Grall wrote: >> >>>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c >>>>> index e9eb0bc..c44d047 100644 >>>>> --- a/xen/common/schedule.c >>>>> +++ b/xen/common/schedule.c >>>>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct >>>>> cpupool *c) >>>>> } >>>>> /* Do we have vcpus already? If not, no need to update >>>>> node-affinity */ >>>>> - if ( d->vcpu ) >>>>> + if ( d->vcpu && d->vcpu[0] != NULL ) >>>>> domain_update_node_affinity(d); >>>> >> >>>> Overall it seems like those checks for the existence of cpus should be >>>> moved into domain_update_node_affinity(). The ASSERT() there I think is >>>> just a sanity check to make sure we're not getting a ridiculous result >>>> out of our calculation; but of course if there actually are no vcpus, >>>> it's not ridiculous at all. >>>> >>>> One solution might be to change the ASSERT to >>>> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then we >>>> could probably even remove the d->vcpu conditional when calling it. >>> >>> This solution also works for me. Which change do you prefer? >>> >> FWIW, I think I like changing the ASSERT() in >> domain_update_node_affinity(), as George suggested (and perhaps with the >> reordering Andrew suggested) better. > > +1 Thanks. I will send a patch during the next couple days to fix this issue. 
Regards, -- Julien Grall ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Xen crashing when killing a domain with no VCPUs allocated
  2014-07-18 16:39 ` Ian Campbell
  2014-07-18 20:26 ` Julien Grall
@ 2014-07-21 10:12 ` George Dunlap
  1 sibling, 0 replies; 11+ messages in thread

From: George Dunlap @ 2014-07-21 10:12 UTC (permalink / raw)
To: Ian Campbell, Julien Grall
Cc: jgross, Stefano Stabellini, Dario Faggioli, Tim Deegan, george.dunlap, xen-devel, jbeulich

On 07/18/2014 05:39 PM, Ian Campbell wrote:
> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>> Hi all,
>>
>> I've been played with the function alloc_vcpu on ARM. And I hit one case
>> where this function can failed.
>>
>> During domain creation, the toolstack will call DOMCTL_max_vcpus which may
>> fail, for instance because alloc_vcpu didn't succeed. In this case, the
>> toolstack will call DOMCTL_domaindestroy. And I got the below stack trace.
>>
>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by returning
>> in an error in vcpu_initialize.
>>
>> I'm not sure how to correctly fix it.
> I think a simple check at the head of the function would be ok.
>
> Alternatively perhaps in sched_mode_domain, which could either detect
> this or could detect a domain in pool0 being moved to pool0 and short
> circuit.
>
> [...]
>> (XEN) [<00226870>] sched_move_domain+0x3cc/0x42c
>> (XEN) [<0020925c>] domain_kill+0xc8/0x178
> This call path surprised me but it is from:
>
> commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
> Author: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Date:   Tue May 20 15:55:42 2014 +0200
>
>     move domain to cpupool0 before destroying it

And this is the second unforeseen hypervisor crash that happened as a
side effect of this patch. Obviously I need to be more of a bastard as
a maintainer.

 -George

^ permalink raw reply	[flat|nested] 11+ messages in thread