From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: LVM userspace causing dom0 crash
Date: Mon, 7 May 2012 13:17:03 -0400
Message-ID: <20120507171703.GA5746@phenom.dumpdata.com>
References: <4FA7EBF6.6040204@theshore.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <4FA7EBF6.6040204@theshore.net>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Christopher S. Aker"
Cc: xen devel
List-Id: xen-devel@lists.xenproject.org

On Mon, May 07, 2012 at 11:36:22AM -0400, Christopher S. Aker wrote:
> Xen: 4.1.3-rc1-pre (xenbits @ 23285)
> Dom0: 3.2.6 PAE and 3.3.4 PAE

This looks suspiciously like a fix that went in some time ago, ah:

2cd1c8d x86/paravirt: PTE updates in k(un)map_atomic need to be
        synchronous, regardless of lazy_mmu mode

but that went in 3.2, so that can't be it.

Hm, can you give more details on what parameters you are passing to
dom0 and the hypervisor so I can reproduce it? Also, could you send me
your .config file?

Is the underlying storage SCSI? And is this only happening on these
SuperMicro boxes, or are you seeing this on other hardware as well?

>
> We are seeing the below crash on 3.x dom0s. A simple
> lvcreate/lvremove loop deployed to a few dozen boxes will hit it
> quite reliably within a short time. This happens on both an older
> LVM userspace and the newest, and in production we have seen this
> hit on lvremove, lvrename, and lvdelete.
>
> #!/bin/bash
> while true; do
>     lvcreate -L 256M -n test1 vg1; lvremove -f vg1/test1
> done
>
> BUG: unable to handle kernel paging request at bffff628
> IP: [] __page_check_address+0xb8/0x170
> *pdpt = 0000000003cfb027 *pde = 0000000013873067 *pte = 0000000000000000
> Oops: 0000 [#1] SMP
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6
>  ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev e1000e
> Pid: 27902, comm: lvremove Not tainted 3.2.6-1 #1 Supermicro X8DT6/X8DT6
> EIP: 0061:[] EFLAGS: 00010246 CPU: 6
> EIP is at __page_check_address+0xb8/0x170
> EAX: bffff000 EBX: cbf76dd8 ECX: 00000000 EDX: 00000000
> ESI: bffff628 EDI: e49ed900 EBP: c80ffe60 ESP: c80ffe4c
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
> Process lvremove (pid: 27902, ti=c80fe000 task=d29adca0 task.ti=c80fe000)
> Stack:
>  e4205000 00000fff da9b6bc0 d0068dc0 e49ed900 c80ffe94 c10ec769 c80ffe84
>  00000000 00000129 00000125 b76c5000 00000001 00000000 d0068c08 d0068dc0
>  b76c5000 e49ed900 c80fff24 c10ecb73 00000002 00000005 35448025 c80ffec4
> Call Trace:
>  [] try_to_unmap_one+0x29/0x310
>  [] try_to_unmap_file+0x83/0x560
>  [] ? xen_pte_val+0xb9/0x140
>  [] ? __raw_callee_save_xen_pte_val+0x6/0x8
>  [] ? vm_normal_page+0x28/0xc0
>  [] ? kmap_atomic_prot+0x45/0x110
>  [] try_to_munlock+0x1c/0x40
>  [] munlock_vma_page+0x49/0x90
>  [] munlock_vma_pages_range+0x57/0xa0
>  [] mlock_fixup+0xc2/0x130
>  [] do_mlockall+0x6c/0x80
>  [] sys_munlockall+0x29/0x50
>  [] sysenter_do_call+0x12/0x28
> Code: ff c1 ee 09 81 e6 f8 0f 00 00 81 e1 ff 0f 00 00 0f ac ca 0c c1
>  e2 05 03 55 ec 89 d0 e8 12 d3 f4 ff 8b 4d 0c 85 c9 8d 34 30 75 0c
>  06 01 01 00 00 0f 84 84 00 00 00 8b 0d 00 0e 9b c1 89 4d f0
> EIP: [] __page_check_address+0xb8/0x170 SS:ESP 0069:c80ffe4c
> CR2: 00000000bffff628
> ---[ end trace 8039aeca9c19f5ab ]---
> note: lvremove[27902] exited with preempt_count 1
> BUG: scheduling while atomic: lvremove/27902/0x00000001
> Modules linked in: ebt_comment ebt_arp ebt_set ebt_limit ebt_ip6
>  ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev e1000e
> Pid: 27902, comm: lvremove Tainted: G D 3.2.6-1 #1
> Call Trace:
>  [] __schedule_bug+0x5d/0x70
>  [] __schedule+0x679/0x830
>  [] ? xen_restore_fl_direct_reloc+0x4/0x4
>  [] ? rcu_enter_nohz+0x3c/0x60
>  [] ? xen_evtchn_do_upcall+0x20/0x30
>  [] ? hypercall_page+0x227/0x1000
>  [] ? xen_force_evtchn_callback+0x1a/0x30
>  [] schedule+0x30/0x50
>  [] rwsem_down_failed_common+0x9d/0xf0
>  [] rwsem_down_read_failed+0x12/0x14
>  [] call_rwsem_down_read_failed+0x7/0xc
>  [] ? down_read+0xd/0x10
>  [] acct_collect+0x3a/0x170
>  [] do_exit+0x62a/0x7d0
>  [] ? kmsg_dump+0x37/0xc0
>  [] oops_end+0x90/0xd0
>  [] no_context+0xbe/0x190
>  [] __bad_area_nosemaphore+0x98/0x140
>  [] ? xen_clocksource_read+0x19/0x20
>  [] ? xen_vcpuop_set_next_event+0x47/0x80
>  [] bad_area_nosemaphore+0x12/0x20
>  [] do_page_fault+0x2d2/0x3f0
>  [] ? hrtimer_interrupt+0x1a9/0x2b0
>  [] ? xen_force_evtchn_callback+0x1a/0x30
>  [] ? check_events+0x8/0xc
>  [] ? xen_restore_fl_direct_reloc+0x4/0x4
>  [] ? _raw_spin_unlock_irqrestore+0x14/0x20
>  [] ? spurious_fault+0x130/0x130
>  [] error_code+0x5a/0x60
>  [] ? spurious_fault+0x130/0x130
>  [] ? __page_check_address+0xb8/0x170
>  [] try_to_unmap_one+0x29/0x310
>  [] try_to_unmap_file+0x83/0x560
>  [] ? xen_pte_val+0xb9/0x140
>  [] ? __raw_callee_save_xen_pte_val+0x6/0x8
>  [] ? vm_normal_page+0x28/0xc0
>  [] ? kmap_atomic_prot+0x45/0x110
>  [] try_to_munlock+0x1c/0x40
>  [] munlock_vma_page+0x49/0x90
>  [] munlock_vma_pages_range+0x57/0xa0
>  [] mlock_fixup+0xc2/0x130
>  [] do_mlockall+0x6c/0x80
>  [] sys_munlockall+0x29/0x50
>  [] sysenter_do_call+0x12/0x28
>
> Thanks,
> -Chris
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel