From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arnd Hannemann
Subject: Re: xen dom0 2.6.32.15 kernel BUG at drivers/xen/grant-table.c:583
Date: Mon, 14 Jun 2010 14:26:39 +0200
Message-ID: <4C161FFF.4050102@nets.rwth-aachen.de>
References: <4C15E000.7060509@nets.rwth-aachen.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7BIT
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Stefano Stabellini
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

Hi,

On 14.06.2010 12:57, Stefano Stabellini wrote:
> On Mon, 14 Jun 2010, Arnd Hannemann wrote:
>> Hi,
>>
>> we have regular but hard to reproduce (wait for a day or two starting domUs) kernel panics (see below) with latest
>> "xen/stable-2.6.32.x" git tree.
>>
>> Any idea, anyone?
>>
>
> this CS from origin/xen/dom0/gntdev should fix your problem:
>
> sstabellini@kaball-desktop:~/xensource/linux-pvops-latest$ git show ad469f0da31bc16b945f9a06710b9d45434d0091
> commit ad469f0da31bc16b945f9a06710b9d45434d0091
> Author: Stefano Stabellini
> Date: Wed Jun 9 12:34:02 2010 -0700
>
>     xen/gntdev: use spinlocks rather than rwsem for locking
>
>     The mmu notifier mechanism calls its callbacks with an rcu lock,
>     which disables preemption. This means we cannot use any blocking
>     synchronization for locking.
>
>     Convert all the rwsemas to plain spinlocks. This requires that
>     the memory allocation and copying to/from userspace be split
>     from the actual datastructure updates since they can't be done
>     under spinlock.
>
>     Signed-off-by: Stefano Stabellini
>     Signed-off-by: Jeremy Fitzhardinge
>

Unfortunately, this patch does not seem to help. We get a very similar
backtrace after one hour of stress testing with a script that starts and
stops domUs in a loop. Maybe the problem is the hypervisor itself?
We are currently using 4.0.1-rc2-pre (we updated from 4.0.0 because of
what we believed was the same problem, although we had no working
netconsole back then).

Jun 14 14:07:22 vmhost2 [ 2418.542425] ------------[ cut here ]------------
Jun 14 14:07:22 vmhost2 [ 2418.542475] kernel BUG at drivers/xen/grant-table.c:583!
Jun 14 14:07:22 vmhost2 [ 2418.542515] invalid opcode: 0000 [#1] SMP
Jun 14 14:07:22 vmhost2 [ 2418.542574] last sysfs file: /sys/devices/virtual/net/br0/bridge/topology_change_detected
Jun 14 14:07:22 vmhost2 [ 2418.542640] Modules linked in: netconsole raid0 md_mod rtc_cmos rtc_core rtc_lib ipv6 thermal processor thermal_sys hwmon pl2303 button acpi_processor usbserial sr_mod evdev cdrom
Jun 14 14:07:22 vmhost2 [ 2418.542937]
Jun 14 14:07:22 vmhost2 [ 2418.542970] Pid: 0, comm: swapper Not tainted (2.6.32.15-xen4.0.0-dom0-stefano #2) System Product Name
Jun 14 14:07:22 vmhost2 [ 2418.543034] EIP: 0061:[] EFLAGS: 00010282 CPU: 0
Jun 14 14:07:22 vmhost2 [ 2418.543077] EIP is at gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.543117] EAX: ffffffea EBX: c153be84 ECX: 00000001 EDX: 00000000
Jun 14 14:07:22 vmhost2 [ 2418.543158] ESI: 00007ff0 EDI: 00000013 EBP: c290e660 ESP: c153be50
Jun 14 14:07:22 vmhost2 [ 2418.543199] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Jun 14 14:07:22 vmhost2 [ 2418.543239] Process swapper (pid: 0, ti=c153a000 task=c1543760 task.ti=c153a000)
Jun 14 14:07:22 vmhost2 [ 2418.543297] Stack:
Jun 14 14:07:22 vmhost2 [ 2418.543329] 00000000 00213784 c2904dc0 0002c233 ec233000 ecf85bec 00000013 ec233000
Jun 14 14:07:22 vmhost2 [ 2418.543461] <0> 00000000 ebd6e000 00000000 00000013 c1350000 13784001 00000000 0002c233
Jun 14 14:07:22 vmhost2 [ 2418.543616] <0> 00000000 c1628284 c155b978 c1628284 00560014 c12200c1 00000001 00000000
Jun 14 14:07:22 vmhost2 [ 2418.543797] Call Trace:
Jun 14 14:07:22 vmhost2 [ 2418.543838] [] ? sock_release+0x10/0x80
Jun 14 14:07:22 vmhost2 [ 2418.543882] [] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:07:22 vmhost2 [ 2418.543925] [] ? tasklet_action+0x9e/0xb0
Jun 14 14:07:22 vmhost2 [ 2418.543967] [] ? __do_softirq+0x88/0x110
Jun 14 14:07:22 vmhost2 [ 2418.544009] [] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:07:22 vmhost2 [ 2418.544053] [] ? do_softirq+0x3d/0x40
Jun 14 14:07:22 vmhost2 [ 2418.544094] [] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:07:22 vmhost2 [ 2418.544147] [] ? xen_do_upcall+0x7/0xc
Jun 14 14:07:22 vmhost2 [ 2418.544190] [] ? hypercall_page+0x3a7/0x1010
Jun 14 14:07:22 vmhost2 [ 2418.544234] [] ? xen_safe_halt+0xf/0x20
Jun 14 14:07:22 vmhost2 [ 2418.544275] [] ? xen_idle+0x1c/0x30
Jun 14 14:07:22 vmhost2 [ 2418.544316] [] ? cpu_idle+0x3a/0x60
Jun 14 14:07:22 vmhost2 [ 2418.544359] [] ? start_kernel+0x2c6/0x2cb
Jun 14 14:07:22 vmhost2 [ 2418.544401] [] ? unknown_bootoption+0x0/0x190
Jun 14 14:07:22 vmhost2 [ 2418.544444] [] ? xen_start_kernel+0x624/0x62c
Jun 14 14:07:22 vmhost2 [ 2418.544483] Code: 8d 5c 24 34 c1 e0 0c 83 c8 01 89 44 24 34 8b 44 24 0c c7 44 24 40 00 00 00 00 89 44 24 3c e8 b8 1e df ff 85 c0 0f 84 2c ff ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 8b 54 24 04 8b 44 24 0c e8
Jun 14 14:07:22 vmhost2 [ 2418.545277] EIP: [] gnttab_copy_grant_page+0x1f0/0x260 SS:ESP 0069:c153be50
Jun 14 14:07:22 vmhost2 [ 2418.545597] ---[ end trace f877a40240218318 ]---
Jun 14 14:07:22 vmhost2 [ 2418.545669] Kernel panic - not syncing: Fatal exception in interrupt
Jun 14 14:07:22 vmhost2 [ 2418.545746] Pid: 0, comm: swapper Tainted: G D 2.6.32.15-xen4.0.0-dom0-stefano #2
Jun 14 14:07:22 vmhost2 [ 2418.545840] Call Trace:
Jun 14 14:07:22 vmhost2 [ 2418.545912] [] ? panic+0x42/0xe1
Jun 14 14:07:22 vmhost2 [ 2418.545986] [] ? oops_end+0x96/0xa0
Jun 14 14:07:22 vmhost2 [ 2418.546060] [] ? do_invalid_op+0x7f/0x90
Jun 14 14:07:22 vmhost2 [ 2418.546135] [] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.546223] [] ? __alloc_pages_nodemask+0xe4/0x5b0
Jun 14 14:07:22 vmhost2 [ 2418.546303] [] ? xen_force_evtchn_callback+0x17/0x30
Jun 14 14:07:22 vmhost2 [ 2418.546380] [] ? check_events+0x8/0xc
Jun 14 14:07:22 vmhost2 [ 2418.546455] [] ? error_code+0x66/0x6c
Jun 14 14:07:22 vmhost2 [ 2418.546530] [] ? do_invalid_op+0x0/0x90
Jun 14 14:07:22 vmhost2 [ 2418.546606] [] ? gnttab_copy_grant_page+0x1f0/0x260
Jun 14 14:07:22 vmhost2 [ 2418.546687] [] ? sock_release+0x10/0x80
Jun 14 14:07:22 vmhost2 [ 2418.546763] [] ? net_tx_action+0x1d1/0x9b0
Jun 14 14:07:22 vmhost2 [ 2418.546839] [] ? tasklet_action+0x9e/0xb0
Jun 14 14:07:22 vmhost2 [ 2418.546915] [] ? __do_softirq+0x88/0x110
Jun 14 14:07:22 vmhost2 [ 2418.546993] [] ? __xen_evtchn_do_upcall+0xd7/0x160
Jun 14 14:07:22 vmhost2 [ 2418.547070] [] ? do_softirq+0x3d/0x40
Jun 14 14:07:22 vmhost2 [ 2418.547145] [] ? xen_evtchn_do_upcall+0x2a/0x40
Jun 14 14:07:22 vmhost2 [ 2418.547222] [] ? xen_do_upcall+0x7/0xc
Jun 14 14:07:22 vmhost2 [ 2418.547299] [] ? hypercall_page+0x3a7/0x1010
Jun 14 14:07:22 vmhost2 [ 2418.547385] [] ? xen_safe_halt+0xf/0x20
Jun 14 14:07:22 vmhost2 [ 2418.547463] [] ? xen_idle+0x1c/0x30
Jun 14 14:07:22 vmhost2 [ 2418.547537] [] ? cpu_idle+0x3a/0x60
Jun 14 14:07:22 vmhost2 [ 2418.547615] [] ? start_kernel+0x2c6/0x2cb
Jun 14 14:07:22 vmhost2 [ 2418.547690] [] ? unknown_bootoption+0x0/0x190
Jun 14 14:07:22 vmhost2 [ 2418.547766] [] ? xen_start_kernel+0x624/0x62c

Best regards,
Arnd
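
When netconsole output is relayed through syslog, as with the trace above, the continuation pieces of one kernel message (module names, stack words, code bytes) can each end up as a separate syslog record. The following is only a rough sketch of how such a capture might be stitched back together for reading. It assumes one syslog record per input line, a prefix of the form "Jun 14 14:07:22 vmhost2 ", and that every new kernel message starts with a "[ seconds.micros]" timestamp. The function name `merge_fragments` is made up for this illustration.

```python
import re

# Assumed syslog prefix shape: "Jun 14 14:07:22 vmhost2 " (month, day, time, host).
SYSLOG_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ ?")
# Kernel printk timestamp such as "[ 2418.542425]".
KERNEL_TS = re.compile(r"^\[\s*\d+\.\d+\]")


def merge_fragments(lines):
    """Join syslog records that look like continuations of one kernel message."""
    merged = []
    for raw in lines:
        # Drop the syslog prefix and surrounding whitespace.
        body = SYSLOG_PREFIX.sub("", raw.rstrip("\n"), count=1).strip()
        if not body:
            continue  # the record carried nothing but the syslog prefix
        if KERNEL_TS.match(body) or not merged:
            merged.append(body)  # a new kernel message starts here
        else:
            merged[-1] += " " + body  # continuation of the previous message
    return merged
```

Feeding it the records of a captured log (for example `merge_fragments(open(path))`, with `path` pointing at a saved copy) yields one string per kernel message. Records that syslog itself rewrote, such as "unparseable log message" notices, are simply appended to the preceding message and would still need manual cleanup.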