From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Christopher S. Aker" <caker@theshore.net>
Cc: xen devel <xen-devel@lists.xensource.com>
Subject: Re: LVM userspace causing dom0 crash
Date: Fri, 11 May 2012 11:39:31 -0400
Message-ID: <20120511153931.GA21486@phenom.dumpdata.com>
In-Reply-To: <4FA7EBF6.6040204@theshore.net>

On Mon, May 07, 2012 at 11:36:22AM -0400, Christopher S. Aker wrote:
> Xen: 4.1.3-rc1-pre (xenbits @ 23285)
> Dom0: 3.2.6 PAE and 3.3.4 PAE
> 
> We're seeing the below crash on 3.x dom0s.  A simple lvcreate/lvremove
> loop deployed to a few dozen boxes will hit it quite reliably within
> a short time.  This happens with both an older LVM userspace and the
> newest, and in production we have seen it hit on lvremove,
> lvrename, and lvdelete.
> 
> #!/bin/bash
> while true; do
>    lvcreate -L 256M -n test1 vg1; lvremove -f vg1/test1
> done

So I tried this with 3.4-rc6 and didn't see the crash.
The machine isn't that powerful - just an Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz,
so four CPUs are visible.

Let me try with 3.2.x shortly.
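
In the meantime, two sketches that might help narrow this down (neither is
from this thread; both are illustrative only):

The trace below ends up in sys_munlockall, which fits LVM2's habit of pinning
its memory while it manipulates volumes and dropping the pin on exit. A quick
way to check that on an affected box (assuming strace is installed and vg1
exists) would be something like:

  # Watch only the memory-locking syscalls that lvremove issues.
  strace -f -e trace=mlockall,munlockall lvremove -f vg1/test1

And since a single serial loop may not generate much pressure on a small box,
a variant of the reproducer above that runs several loops in parallel (the
test1..test4 volume names are made up for illustration):

  #!/bin/bash
  # Run several lvcreate/lvremove loops in parallel to put more
  # concurrent pressure on the create/remove (and munlockall) path.
  for i in 1 2 3 4; do
      (
          while true; do
              lvcreate -L 256M -n test$i vg1
              lvremove -f vg1/test$i
          done
      ) &
  done
  wait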
> 
> BUG: unable to handle kernel paging request at bffff628
> IP: [<c10ebc58>] __page_check_address+0xb8/0x170
> *pdpt = 0000000003cfb027 *pde = 0000000013873067 *pte = 0000000000000000
> Oops: 0000 [#1] SMP
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6
> ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev e1000e
> Pid: 27902, comm: lvremove Not tainted 3.2.6-1 #1 Supermicro X8DT6/X8DT6
> EIP: 0061:[<c10ebc58>] EFLAGS: 00010246 CPU: 6
> EIP is at __page_check_address+0xb8/0x170
> EAX: bffff000 EBX: cbf76dd8 ECX: 00000000 EDX: 00000000
> ESI: bffff628 EDI: e49ed900 EBP: c80ffe60 ESP: c80ffe4c
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> Process lvremove (pid: 27902, ti=c80fe000 task=d29adca0 task.ti=c80fe000)
> Stack:
>  e4205000 00000fff da9b6bc0 d0068dc0 e49ed900 c80ffe94 c10ec769 c80ffe84
>  00000000 00000129 00000125 b76c5000 00000001 00000000 d0068c08 d0068dc0
>  b76c5000 e49ed900 c80fff24 c10ecb73 00000002 00000005 35448025 c80ffec4
> Call Trace:
>  [<c10ec769>] try_to_unmap_one+0x29/0x310
>  [<c10ecb73>] try_to_unmap_file+0x83/0x560
>  [<c1005829>] ? xen_pte_val+0xb9/0x140
>  [<c1004116>] ? __raw_callee_save_xen_pte_val+0x6/0x8
>  [<c10e1bf8>] ? vm_normal_page+0x28/0xc0
>  [<c1038e95>] ? kmap_atomic_prot+0x45/0x110
>  [<c10ed13c>] try_to_munlock+0x1c/0x40
>  [<c10e7109>] munlock_vma_page+0x49/0x90
>  [<c10e7247>] munlock_vma_pages_range+0x57/0xa0
>  [<c10e7352>] mlock_fixup+0xc2/0x130
>  [<c10e742c>] do_mlockall+0x6c/0x80
>  [<c10e7469>] sys_munlockall+0x29/0x50
>  [<c166f1d8>] sysenter_do_call+0x12/0x28
> Code: ff c1 ee 09 81 e6 f8 0f 00 00 81 e1 ff 0f 00 00 0f ac ca 0c c1
> e2 05 03 55 ec 89 d0 e8 12 d3 f4 ff 8b 4d 0c 85 c9 8d 34 30 75 0c
> <f7> 06 01 01 00 00 0f 84 84 00 00 00 8b 0d 00 0e 9b c1 89 4d f0
> EIP: [<c10ebc58>] __page_check_address+0xb8/0x170 SS:ESP 0069:c80ffe4c
> CR2: 00000000bffff628
> ---[ end trace 8039aeca9c19f5ab ]---
> note: lvremove[27902] exited with preempt_count 1
> BUG: scheduling while atomic: lvremove/27902/0x00000001
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6
> ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev e1000e
> Pid: 27902, comm: lvremove Tainted: G      D      3.2.6-1 #1
> Call Trace:
>  [<c1040fcd>] __schedule_bug+0x5d/0x70
>  [<c1666fb9>] __schedule+0x679/0x830
>  [<c100828b>] ? xen_restore_fl_direct_reloc+0x4/0x4
>  [<c10a05fc>] ? rcu_enter_nohz+0x3c/0x60
>  [<c13b2070>] ? xen_evtchn_do_upcall+0x20/0x30
>  [<c1001227>] ? hypercall_page+0x227/0x1000
>  [<c10079ea>] ? xen_force_evtchn_callback+0x1a/0x30
>  [<c1667250>] schedule+0x30/0x50
>  [<c166890d>] rwsem_down_failed_common+0x9d/0xf0
>  [<c1668992>] rwsem_down_read_failed+0x12/0x14
>  [<c1346b63>] call_rwsem_down_read_failed+0x7/0xc
>  [<c166814d>] ? down_read+0xd/0x10
>  [<c1086f9a>] acct_collect+0x3a/0x170
>  [<c105028a>] do_exit+0x62a/0x7d0
>  [<c104cb37>] ? kmsg_dump+0x37/0xc0
>  [<c1669ac0>] oops_end+0x90/0xd0
>  [<c1032dbe>] no_context+0xbe/0x190
>  [<c1032f28>] __bad_area_nosemaphore+0x98/0x140
>  [<c1008089>] ? xen_clocksource_read+0x19/0x20
>  [<c10081f7>] ? xen_vcpuop_set_next_event+0x47/0x80
>  [<c1032fe2>] bad_area_nosemaphore+0x12/0x20
>  [<c166bc12>] do_page_fault+0x2d2/0x3f0
>  [<c106e389>] ? hrtimer_interrupt+0x1a9/0x2b0
>  [<c10079ea>] ? xen_force_evtchn_callback+0x1a/0x30
>  [<c1008294>] ? check_events+0x8/0xc
>  [<c100828b>] ? xen_restore_fl_direct_reloc+0x4/0x4
>  [<c1668a44>] ? _raw_spin_unlock_irqrestore+0x14/0x20
>  [<c166b940>] ? spurious_fault+0x130/0x130
>  [<c166932e>] error_code+0x5a/0x60
>  [<c166b940>] ? spurious_fault+0x130/0x130
>  [<c10ebc58>] ? __page_check_address+0xb8/0x170
>  [<c10ec769>] try_to_unmap_one+0x29/0x310
>  [<c10ecb73>] try_to_unmap_file+0x83/0x560
>  [<c1005829>] ? xen_pte_val+0xb9/0x140
>  [<c1004116>] ? __raw_callee_save_xen_pte_val+0x6/0x8
>  [<c10e1bf8>] ? vm_normal_page+0x28/0xc0
>  [<c1038e95>] ? kmap_atomic_prot+0x45/0x110
>  [<c10ed13c>] try_to_munlock+0x1c/0x40
>  [<c10e7109>] munlock_vma_page+0x49/0x90
>  [<c10e7247>] munlock_vma_pages_range+0x57/0xa0
>  [<c10e7352>] mlock_fixup+0xc2/0x130
>  [<c10e742c>] do_mlockall+0x6c/0x80
>  [<c10e7469>] sys_munlockall+0x29/0x50
>  [<c166f1d8>] sysenter_do_call+0x12/0x28
> 
> Thanks,
> -Chris
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

