* Arbitrary reboot with xen 3.4.x
From: Guillaume Rousse @ 2009-11-19 18:06 UTC
  To: xen-devel

Hello.

I have a dom0 working perfectly under xen 3.3.x, with about 15 HVM domUs.
When migrating to xen 3.4.1, with the same dom0 kernel (2.6.27.37),
everything seems fine at first: I can launch the various hosts, but 5 to
10 minutes later the host violently reboots... I can't find any trace in
the logs. I have a second host with the same configuration and setup, and
the result is similar. It seems to be linked to domU activity, because
without any domU, or without any domU with actual activity, I don't get
any reboot. I had to roll back to xen 3.3.0.

I already attempted such an upgrade, to xen 3.4.0, this summer, with
exactly the same result.

It looks like either a hardware issue (but it doesn't appear with 3.3.0)
or a crash in the hypervisor that syslog is unable to catch when it
happens. How can I get a trace?
-- 
BOFH excuse #248:

Too much radiation coming from the soil.


* Re: Arbitrary reboot with xen 3.4.x
From: Pasi Kärkkäinen @ 2009-11-19 18:09 UTC
  To: Guillaume Rousse; +Cc: xen-devel

On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
> Hello.
> 
> I have a dom0 working perfectly under xen 3.3.x, with about 15 HVM domUs.
> When migrating to xen 3.4.1, with the same dom0 kernel (2.6.27.37),
> everything seems fine at first: I can launch the various hosts, but 5 to
> 10 minutes later the host violently reboots... I can't find any trace in
> the logs. I have a second host with the same configuration and setup, and
> the result is similar. It seems to be linked to domU activity, because
> without any domU, or without any domU with actual activity, I don't get
> any reboot. I had to roll back to xen 3.3.0.
> 

Did you try the new Xen 3.4.2?

> I already attempted such an upgrade, to xen 3.4.0, this summer, with
> exactly the same result.
> 

Ok..

> It looks like either a hardware issue (but it doesn't appear with 3.3.0)
> or a crash in the hypervisor that syslog is unable to catch when it
> happens. How can I get a trace?
>

You should set up a serial console, so you can capture and log the full
console output (Xen + dom0 kernel) to another computer..
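A minimal sketch, assuming GRUB legacy and a null-modem cable on COM1 at
115200 baud (paths and version numbers below are just placeholders):

    title Xen (serial console)
        root (hd0,0)
        kernel /boot/xen.gz com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all
        module /boot/vmlinuz-2.6.27.37 console=ttyS0,115200 console=tty0
        module /boot/initrd-2.6.27.37.img

On the receiving machine, something like 'screen -L /dev/ttyS0 115200'
(or minicom) will then log everything that comes over the serial line.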

-- Pasi


* Re: Arbitrary reboot with xen 3.4.x
From: Guillaume Rousse @ 2009-11-20 10:42 UTC
  To: Pasi Kärkkäinen; +Cc: xen-devel

Pasi Kärkkäinen wrote:
> On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
>> Hello.
>>
>> I have a dom0 working perfectly under xen 3.3.x, with about 15 HVM domUs.
>> When migrating to xen 3.4.1, with the same dom0 kernel (2.6.27.37),
>> everything seems fine at first: I can launch the various hosts, but 5 to
>> 10 minutes later the host violently reboots... I can't find any trace in
>> the logs. I have a second host with the same configuration and setup, and
>> the result is similar. It seems to be linked to domU activity, because
>> without any domU, or without any domU with actual activity, I don't get
>> any reboot. I had to roll back to xen 3.3.0.
>>
> 
> Did you try the new Xen 3.4.2?
I just did, this morning. Without any changelog, it's a bit 'upgrade and
pray'...

>> It looks like either a hardware issue (but it doesn't appear with 3.3.0)
>> or a crash in the hypervisor that syslog is unable to catch when it
>> happens. How can I get a trace?
>>
> 
> You should set up a serial console, so you can capture and log the full
> console output (Xen + dom0 kernel) to another computer..
Indeed.

Here is the output. The first domU crash, caused by a memory ballooning
issue, is not fatal. The second crash, however, is. I don't know whether
that is due to incorrect state left after the initial crash, or to the
additional domUs launched in the interim.

(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Domain 1 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-3.4.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    0010:[<ffffffff811ed7ed>]
(XEN) RFLAGS: 0000000000010246   CONTEXT: hvm guest
(XEN) rax: 00000000007028b8   rbx: 0000000000001000   rcx: 0000000000000200
(XEN) rdx: 0000000000000000   rsi: 00000000007028b8   rdi: ffff8800123a0000
(XEN) rbp: ffff88001a119b68   rsp: ffff88001a119b50   r8:  ffffea00003fcb00
(XEN) r9:  000000000001050f   r10: 0000000000000000   r11: 0000000000000001
(XEN) r12: 0000000000001000   r13: 0000000000000000   r14: ffff88001796aea8
(XEN) r15: 0000000000001000   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 000000001a079000   cr2: 00007fc176c772e8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) Domain 2 reported crashed by domain 0 on cpu#0:
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) p2m_pod_demand_populate: Out of populate-on-demand memory!
(XEN) domain_crash called from p2m.c:1091
(XEN) ----[ Xen-3.4.1  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff828c801aab29>] hash_foreach+0x59/0xe0
(XEN) RFLAGS: 0000000000010296   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff8284000c1780   rcx: 00000000000060bc
(XEN) rdx: ffff83041f98c000   rsi: 0000000000000336   rdi: ffff8300be7c0000
(XEN) rbp: 0000000000000336   rsp: ffff828c80257848   r8:  0000000000200c00
(XEN) r9:  0000000000000001   r10: ffff83041f98c000   r11: ffff828c801b10e0
(XEN) r12: 0000000000000001   r13: 0000000000000000   r14: 00000000000060bc
(XEN) r15: ffff828c80205f80   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000021759000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff828c80257848:
(XEN)    0000000000000000 ffff8300be7c0000 ffff83041f98c000 ffff8284000c1780
(XEN)    ffff8300be7c0000 00000000000060bc 0000000000000000 00000000000144bc
(XEN)    ffff8300be7c0000 ffff828c801aae4d ffff828c80257960 00000000000060bc
(XEN)    ffff828c80257960 ffff83041f98c000 ffff83041f98c000 ffff828c801b13bf
(XEN)    00000000000144bc 0000000000200c00 ffff83041f4ed5e0 ffff83041f98d130
(XEN)    ffff828c80284d24 ffff83041f4ed5e0 ffff828c80257960 ffff828c80257968
(XEN)    ffff83041f98c000 00000000000144bc 0000000000000000 ffff828c801a96d4
(XEN)    0000000000000200 2000000000000000 ffff828c80257a80 000000061f98c000
(XEN)    0000000000000200 007fffffffffffff 0000000000000000 ffff83041f4ed000
(XEN)    000000000041f4ed 0000000000000001 0000000000000001 0000000000000200
(XEN)    00000000000144bc ffff83041f98c000 0000000000000006 ffff828c801a5991
(XEN)    ffff828c80257abc 0000000000000001 ffff828c80257ba8 007fffffffffffff
(XEN)    ffff828c802579f0 ffff83041f98c000 ffff828c80257a80 ffff828c801a6efb
(XEN)    0000000400000000 0000000000000000 ffff8300060bc000 ffff8300060bb000
(XEN)    ffff8300060ba000 ffff8300060b9000 ffff8300060b8000 ffff8300060b7000
(XEN)    ffff8300060b6000 ffff8300060b5000 ffff8300060b4000 ffff8300060b3000
(XEN)    ffff8300060b2000 ffff8300060b1000 ffff8300060b0000 ffff8300060af000
(XEN)    ffff8300060ae000 ffff828c801f16dc 0000000000000082 0000000100000001
(XEN)    0000000100000001 0000000100000001 0000000100000001 0000000100000001
(XEN)    0000000100000001 0000000100000001 0000000100000001 0000000000000286
(XEN) Xen call trace:
(XEN)    [<ffff828c801aab29>] hash_foreach+0x59/0xe0
(XEN)    [<ffff828c801aae4d>] sh_remove_all_mappings+0x8d/0x200
(XEN)    [<ffff828c801b13bf>] shadow_write_p2m_entry+0x2df/0x330
(XEN)    [<ffff828c801a96d4>] p2m_set_entry+0x344/0x430
(XEN)    [<ffff828c801a5991>] set_p2m_entry+0x71/0xa0
(XEN)    [<ffff828c801a6efb>] p2m_pod_zero_check+0x1db/0x310
(XEN)    [<ffff828c801a8a20>] p2m_pod_demand_populate+0x830/0xa40
(XEN)    [<ffff828c801a90b4>] p2m_gfn_to_mfn+0x224/0x260
(XEN)    [<ffff828c80151fd5>] mod_l1_entry+0x6e5/0x7b0
(XEN)    [<ffff828c80153067>] do_mmu_update+0x937/0x16e0
(XEN)    [<ffff828c8014df0b>] get_page_type+0xb/0x20
(XEN)    [<ffff828c801112b4>] do_multicall+0x164/0x370
(XEN)    [<ffff828c801c8169>] syscall_enter+0xa9/0xae
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000001cb48067 00000000003d6ca9
(XEN)  L3[0x000] = 000000000c58b067 00000000003e72ec
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...


My domUs all have this configuration:
memory = 256
maxmem = 512

Or different values, but always with the same ratio between memory and
maxmem. This seems quite useless for HVM domUs anyway, as memory
ballooning is not supported AFAIK, unless you use PV drivers (which I
can't manage to build).

With identical values, the issue doesn't appear.
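
For reference, a minimal sketch of the working configuration, i.e. an HVM
config where memory and maxmem are equal (the name, paths and devices
below are just placeholders):

    # hypothetical /etc/xen/example-hvm
    kernel  = '/usr/lib/xen/boot/hvmloader'
    builder = 'hvm'
    name    = 'example-hvm'
    memory  = 256
    maxmem  = 256   # equal to 'memory': no populate-on-demand involved
    disk    = [ 'tap:aio:/var/lib/xen/images/example-hvm.img,hda,w' ]
    vif     = [ 'bridge=xenbr0' ]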

With Xen 3.4.2, the domUs still crash, but at least dom0 does not reboot.
So it's just less bad :)
-- 
BOFH excuse #426:

internet is needed to catch the etherbunny


* Re: Arbitrary reboot with xen 3.4.x
From: Pasi Kärkkäinen @ 2009-11-20 13:35 UTC
  To: Guillaume Rousse; +Cc: xen-devel

On Fri, Nov 20, 2009 at 11:42:23AM +0100, Guillaume Rousse wrote:
> Pasi Kärkkäinen wrote:
> >On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
> >>Hello.
> >>
> >>I have a dom0 working perfectly under xen 3.3.x, with about 15 HVM domUs.
> >>When migrating to xen 3.4.1, with the same dom0 kernel (2.6.27.37),
> >>everything seems fine at first: I can launch the various hosts, but 5 to
> >>10 minutes later the host violently reboots... I can't find any trace in
> >>the logs. I have a second host with the same configuration and setup, and
> >>the result is similar. It seems to be linked to domU activity, because
> >>without any domU, or without any domU with actual activity, I don't get
> >>any reboot. I had to roll back to xen 3.3.0.
> >>
> >
> >Did you try the new Xen 3.4.2?
> I just did, this morning. Without any changelog, it's a bit 'upgrade and
> pray'...
> 

Changelog is here: http://xenbits.xen.org/xen-3.4-testing.hg

> >>It looks like either a hardware issue (but it doesn't appear with 3.3.0)
> >>or a crash in the hypervisor that syslog is unable to catch when it
> >>happens. How can I get a trace?
> >>
> >
> >You should set up a serial console, so you can capture and log the full
> >console output (Xen + dom0 kernel) to another computer..
> Indeed.
> 
> Here is the output. The first domU crash, caused by a memory ballooning
> issue, is not fatal. The second crash, however, is. I don't know whether
> that is due to incorrect state left after the initial crash, or to the
> additional domUs launched in the interim.
> 

<snip>

> 
> 
> My domUs all have this configuration:
> memory = 256
> maxmem = 512
> 
> Or different values, but always with the same ratio between memory and
> maxmem. This seems quite useless for HVM domUs anyway, as memory
> ballooning is not supported AFAIK, unless you use PV drivers (which I
> can't manage to build).
> 
> With identical values, the issue doesn't appear.
> 

Hmm.. so it's definitely related to ballooning.

> With Xen 3.4.2, the domUs still crash, but at least dom0 does not reboot.
> So it's just less bad :)
>

So 3.4.2 fixes the hypervisor crash. That's good.

-- Pasi


* Re: Blktap device monitoring code!
From: Daniel Stodden @ 2009-11-20 22:48 UTC
  To: Ata E Husain; +Cc: Xen developer

On Fri, 2009-11-20 at 20:29 -0500, Ata E Husain wrote:
> Dear All,

Hi.

> I am trying to get details about the disk accesses (reads/writes) done
> by domUs by putting some monitoring code in dom0. One way to achieve this
> is to put each domU on a separate LVM volume and use some disk monitoring
> utility such as iostat to get the details, but is it possible to get the
> details without running any utility program, for instance by putting some
> monitoring code in the hypervisor?

The hypervisor is quite definitely the wrong place, as it doesn't do
peripheral I/O virtualization.

> I am using a tap:aio configuration to mount my domU image, and am going
> through the blktap device driver files to get some details, but so far
> no luck.

You could hook into the dom0 kernel. There are ways to do so with or
without Xen, but since I/O virtualization can forward block I/O requests
to userspace, blktap is probably exactly what you want.

Look into tools/blktap2 and the block-log driver. IIRC it's only logging
writes right now (?). Anyway, it should get you started.
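
As one concrete way of hooking in at the dom0 kernel level, a sketch
assuming blktrace support is enabled in your dom0 kernel and debugfs is
mounted (the device name is just a placeholder):

    # trace all block I/O issued against one domU's backing device,
    # decoding events live on stdout
    blktrace -d /dev/mapper/vg0-domu1 -o - | blkparse -i -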

Daniel


* Blktap device monitoring code!
From: Ata E Husain @ 2009-11-21  1:29 UTC
  To: Xen developer

Dear All,

I am trying to get details about the disk accesses (reads/writes) done by
domUs by putting some monitoring code in dom0. One way to achieve this is
to put each domU on a separate LVM volume and use some disk monitoring
utility such as iostat to get the details, but is it possible to get the
details without running any utility program, for instance by putting some
monitoring code in the hypervisor?
I am using a tap:aio configuration to mount my domU image, and am going
through the blktap device driver files to get some details, but so far no
luck.
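
In case it is useful, the LVM approach I mention would look something like
this (the volume names here are hypothetical):

    # per-volume read/write statistics, refreshed every 5 seconds
    iostat -x -d 5 /dev/mapper/vg0-domu1 /dev/mapper/vg0-domu2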

I have written mails to this list before but have never got a single
reply. I am not sure whether I am unable to put the question in a proper
manner, or whether it simply does not fall within anybody's interest. If
my questions are not clear, please do reply so that I get a chance to
rephrase them.

It would be a great help if someone could provide some useful pointers
for performing the above-mentioned tasks.

Thanks!
Ata


* Re: Arbitrary reboot with xen 3.4.x
From: Guillaume Rousse @ 2009-11-21 13:49 UTC
  To: Pasi Kärkkäinen; +Cc: xen-devel

Pasi Kärkkäinen wrote:
> On Fri, Nov 20, 2009 at 11:42:23AM +0100, Guillaume Rousse wrote:
>> Pasi Kärkkäinen wrote:
>>> On Thu, Nov 19, 2009 at 07:06:56PM +0100, Guillaume Rousse wrote:
>>>> Hello.
>>>>
>>>> I have a dom0 working perfectly under xen 3.3.x, with about 15 HVM domUs.
>>>> When migrating to xen 3.4.1, with the same dom0 kernel (2.6.27.37),
>>>> everything seems fine at first: I can launch the various hosts, but 5 to
>>>> 10 minutes later the host violently reboots... I can't find any trace in
>>>> the logs. I have a second host with the same configuration and setup, and
>>>> the result is similar. It seems to be linked to domU activity, because
>>>> without any domU, or without any domU with actual activity, I don't get
>>>> any reboot. I had to roll back to xen 3.3.0.
>>>>
>>> Did you try the new Xen 3.4.2?
>> I just did, this morning. Without any changelog, it's a bit 'upgrade and
>> pray'...
>>
> 
> Changelog is here: http://xenbits.xen.org/xen-3.4-testing.hg
An exhaustive list of all file modifications is totally useless for an
admin trying to decide whether the risk of updating a working system is
worth the attempt. What is missing from all Xen releases is a
comprehensive, user-targeted list of bugs fixed and behavior changes.

[..]
>> With Xen 3.4.2, the domUs still crash, but at least dom0 does not reboot.
>> So it's just less bad :)
>>
> 
> So 3.4.2 fixes the hypervisor crash. That's good.
Yes, that's the lesser evil :)

What is really strange, though, is the different behaviour this new
feature exhibits from one system to another: on other systems, I just
can't launch any HVM domU at all with different values for 'memory' and
'maxmem'.
-- 
BOFH excuse #179:

multicasts on broken packets

