From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Fri, 28 Jan 2011 12:07:09 +0100
Message-ID: <4D42A35D.3050507@amd.com>
References: <4D41FD3A.5090506@amd.com> <4D426673.7020200@ts.fujitsu.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4D426673.7020200@ts.fujitsu.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Juergen Gross
Cc: "xen-devel@lists.xensource.com", Ian Jackson, Keir Fraser
List-Id: xen-devel@lists.xenproject.org

Juergen Gross wrote:
> On 01/28/11 00:18, Andre Przywara wrote:
>> Hi,
>>
>> when I boot my machine without restricting Dom0 (dom0_mem=
>> dom0_max_vcpus=) I get an _hypervisor_ crash when I run
>> # xl cpupool-numa-split
>> If Dom0's resources are limited on the Xen cmdline, everything works fine.
>> The crashdump points to a scheduling problem with weights, so I assume
>> the NUMA distribution algorithm some fools the hypervisor completely.
>>
>> I will investigate this further tomorrow, but maybe someone has some
>> good idea.
>
> I've seen this once with an older cpupool version on a 24 processor machine.
> It was NOT related to NUMA, but did occur only on reboot after a Dom0 panic.
> The machine had an init script creating a cpupool and populating it with
> cpus. The machine was in a panic loop due to the BUG in sched_acct then until
> it was resetted manually. After the reset the problem was gone.
>
> As I was never able to reproduce the problem later (the same software is
> running on dozens of machines!), I assumed there was a problem related to
> the first Dom0 panic, may be some destroyed BIOS tables.
>
> Can the crash be reproduced easily?

Yes.
If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
can reliably trigger the crash with xl cpupool-numa-split.
Omitting dom0_max_vcpus only does not suffice.

Will continue after lunch-break ;-)

Regards,
Andre.

>
>
> Juergen
>
>> Regards,
>> Andre.
>>
>> root@dosorca:/data/images# xl cpupool-numa-split
>> (XEN) Xen BUG at sched_credit.c:990
>> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[] csched_acct+0x11f/0x419
>> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
>> (XEN) rax: 0000000000000010 rbx: 0000000000000f00 rcx: 0000000000000100
>> (XEN) rdx: 0000000000001000 rsi: ffff830437ffa600 rdi: 0000000000000010
>> (XEN) rbp: ffff82c480297e10 rsp: ffff82c480297d80 r8: 0000000000000100
>> (XEN) r9: 0000000000000006 r10: ffff82c4802d4100 r11: 000000afc7df0edf
>> (XEN) r12: ffff830437ffa5e0 r13: ffff82c480117fd9 r14: ffff830437f9f2e8
>> (XEN) r15: ffff830434321ec0 cr0: 000000008005003b cr4: 00000000000006f0
>> (XEN) cr3: 000000080df4e000 cr2: ffff88179af79618
>> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c480297d80:
>> (XEN) 0000000000000282 fffffed4802d3f80 0000000000000eff ffff830437ffa5e0
>> (XEN) ffff830437ffa5e8 ffff830437ffa870 ffff830437ffa5e0 0000000000000282
>> (XEN) ffff830437ffa5e8 00002a3037ffa870 00000f0000000f00 0000000000000000
>> (XEN) ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c480117fd9
>> (XEN) ffff830437f9f2e8 ffff830437f9f2e0 ffff82c480297e40 ffff82c480125f34
>> (XEN) 0000000000000002 ffff830437ffa600 ffff82c4802d3f80 000000afb6f8667f
>> (XEN) ffff82c480297e90 ffff82c480126259 ffff82c48024ae20 ffff82c4802d3f80
>> (XEN) ffff830437f9f2e0 0000000000000000 0000000000000000 ffff82c4802b0880
>> (XEN) ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123327
>> (XEN) ffff82c4802d4a00 ffff82c480297f18 ffff82c48024ae20 ffff82c480297f18
>> (XEN) 000000afb6abd652 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801233a2
>> (XEN) ffff82c480297f10 ffff82c4801563f5 0000000000000000 ffff8300c7cd6000
>> (XEN) 0000000000000000 ffff8300c7ad4000 ffff82c480297d48 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8503f10
>> (XEN) ffff8817a8503fd8 0000000000000246 ffff8817a8503e80 ffff880000000001
>> (XEN) 0000000000000000 0000000000000000 ffffffff810093aa 000000aafab2f86e
>> (XEN) 00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa
>> (XEN) 000000000000e033 0000000000000246 ffff8817a8503ef8 000000000000e02b
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [] csched_acct+0x11f/0x419
>> (XEN) [] execute_timer+0x4e/0x6c
>> (XEN) [] timer_softirq_action+0xf2/0x245
>> (XEN) [] __do_softirq+0x88/0x99
>> (XEN) [] do_softirq+0x6a/0x7a
>> (XEN) [] idle_loop+0x6a/0x6f
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Xen BUG at sched_credit.c:990
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>
>
>

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712