From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Liu
Subject: Re: netback Oops then xenwatch stuck in D state
Date: Sat, 2 Feb 2013 01:01:48 +0000
Message-ID:
References: <510C3AA3.2090508@theshore.net>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3213613126076694081=="
Return-path:
In-Reply-To: <510C3AA3.2090508@theshore.net>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Christopher S. Aker"
Cc: xen devel
List-Id: xen-devel@lists.xenproject.org

--===============3213613126076694081==
Content-Type: multipart/alternative; boundary=f46d043c064469976304d4b3694b

--f46d043c064469976304d4b3694b
Content-Type: text/plain; charset=UTF-8

Does your Dom0 have very limited RAM?

I just happened to fix a bug related to OOM not being handled correctly.

http://lists.xen.org/archives/html/xen-devel/2013-01/msg02549.html


Wei.

On Fri, Feb 1, 2013 at 9:58 PM, Christopher S. Aker <caker@theshore.net> wrote:

> We've been hitting the following issue on a variety of hosts and recent
> Xen/dom0 version combinations.  Here's an excerpt from our latest:
>
> Xen: 4.1.4 (xenbits @ 23432)
> Dom0: 3.7.1-x86_64
>
> BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
> IP: [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6 ebt_ip
> ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter igb
> CPU 0
> Pid: 1636, comm: netback/0 Not tainted 3.7.1-x86_64 #1 Supermicro
> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+
> RIP: e030:[<ffffffff8141a301>]  [<ffffffff8141a301>]
> evtchn_from_irq+0x11/0x40
> RSP: e02b:ffff88004334fc98  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff880004964700 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 00000000000001dc RDI: 000000000000001c
> RBP: ffff88004334fc98 R08: ffffea00010bf818 R09: 0000000000000000
> R10: 0000000000000001 R11: ffff880000000000 R12: ffff880004964720
> R13: ffff88002d34d700 R14: 00000000ffffffff R15: ffff88004334fd84
> FS:  00007f8939347700(0000) GS:ffff880101e00000(0000)
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000000001c CR3: 0000000001c0b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process netback/0 (pid: 1636, threadinfo ffff88004334e000, task
> ffff880043fd5fe0)
> Stack:
>  ffff88004334fcb8 ffffffff8141b06d ffff880000000218 ffff880042fe1200
>  ffff88004334fdb8 ffffffff81543b9b ffff88004334fd84 ffff880042c59040
>  ffff88004334fd68 ffff88004334fd48 ffff880000000cc0 ffffc900106c7ac0
> Call Trace:
>  [<ffffffff8141b06d>] notify_remote_via_irq+0xd/0x40
>  [<ffffffff81543b9b>] xen_netbk_rx_action+0x73b/0x800
>  [<ffffffff81544c25>] xen_netbk_kthread+0xb5/0xa60
>  [<ffffffff81080050>] ? finish_task_switch+0x60/0xd0
>  [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
>  [<ffffffff81544b70>] ? xen_netbk_tx_build_gops+0xa10/0xa10
>  [<ffffffff81071926>] kthread+0xc6/0xd0
>  [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
>  [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
>  [<ffffffff81767c7c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
> Code: be f5 01 00 00 48 c7 c7 12 e2 99 81 e8 d9 4c c3 ff eb cd 0f 1f 80 00
> 00 00 00 55 48 89 e5 39 3d c6 fd 80 00 76 0b e8 df fa ff ff <0f> b7 40 1c
> c9 c3 89 f9 31 c0 48 c7 c2 27 e2 99 81 be db 00 00
> RIP  [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
>  RSP <ffff88004334fc98>
> CR2: 000000000000001c
> ---[ end trace 1b5f6b359343fcfe ]---
>
>
> Which leads to xenwatch being stuck in D state, which then requires us to
> reboot the host.
>
> SysRq : Show Blocked State
>   task                        PC stack   pid father
> xenwatch        D ffff880101f938c0  5056    49      2 0x00000000
>  ffff880101305cb8 0000000000000246 ffff8801012a0760 00000000000138c0
>  ffff880101305fd8 ffff880101304010 00000000000138c0 00000000000138c0
>  ffff880101305fd8 00000000000138c0 ffff8800349224e0 ffff8801012a0760
> Call Trace:
>  [<ffffffff8175f444>] schedule+0x24/0x70
>  [<ffffffff8154698d>] xenvif_disconnect+0x7d/0x130
>  [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
>  [<ffffffff81545ac4>] frontend_changed+0x214/0x660
>  [<ffffffff81080050>] ? finish_task_switch+0x60/0xd0
>  [<ffffffff8141fb22>] xenbus_otherend_changed+0xb2/0xc0
>  [<ffffffff8175fe39>] ? _raw_spin_unlock_irqrestore+0x19/0x20
>  [<ffffffff8141fd3b>] frontend_changed+0xb/0x10
>  [<ffffffff8141da3a>] xenwatch_thread+0xba/0x180
>  [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
>  [<ffffffff8141d980>] ? xs_watch+0x60/0x60
>  [<ffffffff81071926>] kthread+0xc6/0xd0
>  [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
>  [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
>  [<ffffffff81767c7c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
>
> I'll give building an updated dom0 kernel a shot, but was hoping this rang
> a bell or two.
>
> Thanks,
> -Chris
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

--f46d043c064469976304d4b3694b
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Does your Dom0 have very limited RAM?

I just happened to fix a bug related to OOM not being handled correctly.

http://lists.xen.org/archives/html/xen-devel/2013-01/msg02549.html

Wei.

On Fri, Feb 1, 2013 at 9:58 PM, Christopher S. Aker <caker@theshore.net> wrote:
We've been hitting the following issue on a variety of hosts and recent Xen/dom0 version combinations.  Here's an excerpt from our latest:

Xen: 4.1.4 (xenbits @ 23432)
Dom0: 3.7.1-x86_64

BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
IP: [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter igb
CPU 0
Pid: 1636, comm: netback/0 Not tainted 3.7.1-x86_64 #1 Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+
RIP: e030:[<ffffffff8141a301>]  [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
RSP: e02b:ffff88004334fc98  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880004964700 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000001dc RDI: 000000000000001c
RBP: ffff88004334fc98 R08: ffffea00010bf818 R09: 0000000000000000
R10: 0000000000000001 R11: ffff880000000000 R12: ffff880004964720
R13: ffff88002d34d700 R14: 00000000ffffffff R15: ffff88004334fd84
FS:  00007f8939347700(0000) GS:ffff880101e00000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000001c CR3: 0000000001c0b000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netback/0 (pid: 1636, threadinfo ffff88004334e000, task ffff880043fd5fe0)
Stack:
 ffff88004334fcb8 ffffffff8141b06d ffff880000000218 ffff880042fe1200
 ffff88004334fdb8 ffffffff81543b9b ffff88004334fd84 ffff880042c59040
 ffff88004334fd68 ffff88004334fd48 ffff880000000cc0 ffffc900106c7ac0
Call Trace:
 [<ffffffff8141b06d>] notify_remote_via_irq+0xd/0x40
 [<ffffffff81543b9b>] xen_netbk_rx_action+0x73b/0x800
 [<ffffffff81544c25>] xen_netbk_kthread+0xb5/0xa60
 [<ffffffff81080050>] ? finish_task_switch+0x60/0xd0
 [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
 [<ffffffff81544b70>] ? xen_netbk_tx_build_gops+0xa10/0xa10
 [<ffffffff81071926>] kthread+0xc6/0xd0
 [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
 [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81767c7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
Code: be f5 01 00 00 48 c7 c7 12 e2 99 81 e8 d9 4c c3 ff eb cd 0f 1f 80 00 00 00 00 55 48 89 e5 39 3d c6 fd 80 00 76 0b e8 df fa ff ff <0f> b7 40 1c c9 c3 89 f9 31 c0 48 c7 c2 27 e2 99 81 be db 00 00
RIP  [<ffffffff8141a301>] evtchn_from_irq+0x11/0x40
 RSP <ffff88004334fc98>
CR2: 000000000000001c
---[ end trace 1b5f6b359343fcfe ]---


Which leads to xenwatch being stuck in D state, which then requires us to reboot the host.

SysRq : Show Blocked State
  task                        PC stack   pid father
xenwatch        D ffff880101f938c0  5056    49      2 0x00000000
 ffff880101305cb8 0000000000000246 ffff8801012a0760 00000000000138c0
 ffff880101305fd8 ffff880101304010 00000000000138c0 00000000000138c0
 ffff880101305fd8 00000000000138c0 ffff8800349224e0 ffff8801012a0760
Call Trace:
 [<ffffffff8175f444>] schedule+0x24/0x70
 [<ffffffff8154698d>] xenvif_disconnect+0x7d/0x130
 [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
 [<ffffffff81545ac4>] frontend_changed+0x214/0x660
 [<ffffffff81080050>] ? finish_task_switch+0x60/0xd0
 [<ffffffff8141fb22>] xenbus_otherend_changed+0xb2/0xc0
 [<ffffffff8175fe39>] ? _raw_spin_unlock_irqrestore+0x19/0x20
 [<ffffffff8141fd3b>] frontend_changed+0xb/0x10
 [<ffffffff8141da3a>] xenwatch_thread+0xba/0x180
 [<ffffffff81071fe0>] ? wake_up_bit+0x40/0x40
 [<ffffffff8141d980>] ? xs_watch+0x60/0x60
 [<ffffffff81071926>] kthread+0xc6/0xd0
 [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
 [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81767c7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81071860>] ? kthread_freezable_should_stop+0x70/0x70

I'll give building an updated dom0 kernel a shot, but was hoping this rang a bell or two.

Thanks,
-Chris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--f46d043c064469976304d4b3694b--

--===============3213613126076694081==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============3213613126076694081==--