Message-ID: <5339B077.4040707@redhat.com>
Date: Mon, 31 Mar 2014 14:14:15 -0400
From: Cole Robinson
References: <53387E35.3010909@redhat.com> <533899F1.1030808@suse.de>
In-Reply-To: <533899F1.1030808@suse.de>
Subject: Re: [Qemu-devel] 2.0 regression: loadvm assertion with ehci + tablet
To: Andreas Färber
Cc: Paolo Bonzini, "Michael S. Tsirkin", qemu-devel, Gerd Hoffmann

On 03/30/2014 06:25 PM, Andreas Färber wrote:
> Hi,
>
> On 30.03.2014 22:27, Cole Robinson wrote:
>> With git master, loadvm hits an assert failure if using ehci and usb tablet.
>> Steps to reproduce:
>>
>> $ qemu-img create -f qcow2 foo.qcow2 10G
>> $ ./x86_64-softmmu/qemu-system-x86_64 \
>>   -enable-kvm -m 4096 \
>>   -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
>>   -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
>>   -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
>>   -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
>>   -device usb-tablet,id=input0 \
>>   -hda foo.qcow2 \
>>   -cdrom Fedora-20-x86_64-Live-Desktop.iso \
>>   -boot d -monitor stdio
>>
>>
>> (qemu) savevm foo
>> (qemu) loadvm foo
>> qemu-system-x86_64: hw/pci/pci.c:250: pcibus_reset: Assertion
>> `bus->irq_count[i] == 0' failed.
>>
>> The relevant backtrace bits for the assertion:
>>
>> #4  0x00007f8f7241971e in pcibus_reset (qbus=0x7f8f74082fd0)
>>     at hw/pci/pci.c:250
>> #5  0x00007f8f723bd36d in qbus_reset_one (bus=0x7f8f74082fd0,
>>     opaque=) at hw/core/qdev.c:249
>> #6  0x00007f8f723bec88 in qdev_walk_children (dev=0x7f8f73efb320,
>>     pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x7f8f723bf4f0,
>>     post_busfn=0x7f8f723bd320, opaque=0x0)
>>     at hw/core/qdev.c:403
>> #7  0x00007f8f723bedb8 in qbus_walk_children (bus=0x7f8f740706e0,
>>     pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x7f8f723bf4f0,
>>     post_busfn=0x7f8f723bd320, opaque=0x0)
>>     at hw/core/qdev.c:369
>> #8  0x00007f8f724f5c5d in qemu_devices_reset () at vl.c:1867
>> #9  qemu_system_reset (report=report@entry=false) at vl.c:1880
>> #10 0x00007f8f7256dba2 in load_vmstate (name=name@entry=0x7f8f7417a160 "foo")
>>     at /home/crobinso/src/qemu/savevm.c:1098
>>
>> The 'cause' is this:
>>
>> #0  ehci_detach (port=0x555556436968) at hw/usb/hcd-ehci.c:810
>> #1  0x0000555555727b5e in usb_detach (port=port@entry=0x555556436968)
>>     at hw/usb/core.c:49
>> #2  0x0000555555736bf3 in ehci_reset (opaque=0x5555564364d8)
>>     at hw/usb/hcd-ehci.c:941
>> #3  0x00005555557e1fcd in qemu_devices_reset () at vl.c:1867
>> #4  qemu_system_reset (report=report@entry=false) at vl.c:1880
>> #5  0x0000555555859f12 in load_vmstate (name=name@entry=0x555556458210 "foo")
>>     at /home/crobinso/src/qemu/savevm.c:1098
>>
>> ehci_reset calls usb_detach which sets pcibus->irq_count[3] = 1. pcibus_reset
>> runs and hits the assertion. But I don't understand this stuff enough to
>> determine what's actually wrong here :)
>>
>> I bisected the issue to:
>>
>> commit 31b030d4abc5bea89c2b33b39d3b302836f6b6ee
>> Author: Andreas Färber
>> Date:   Wed Sep 4 01:29:02 2013 +0200
>>
>>     cputlb: Change tlb_flush_page() argument to CPUState
>>
>>     Signed-off-by: Andreas Färber
>>
>> ...and then I double checked it since that sounds unrelated. Same result.
>
> You are running into an unrelated migration bug:
> http://git.qemu.org/?p=qemu.git;a=commit;h=c01a71c1a56fa27f43449ff59e5d03b2483658a2
>
> Sorry about that. You'll need to patch -p1 the above commit on top of
> each git-bisect commit to find the actual breakage if the above commit
> is already bad (can't test right now).
>

Indeed, that seemed to be messing up my search, thanks. So the real culprit is:

commit 9bdbbfc3a04c28dc43af5afffb32066623cb0022
Author: Paolo Bonzini
Date:   Fri Dec 6 17:54:25 2013 +0100

    pci: clean up resetting of IRQs

    pci_device_reset will deassert the INTX pins, and this will make the
    irq_count array all-zeroes.  Check that this is the case, and remove
    the existing loop which might even unsync irq_count and irq_state.

Which is what adds the assert. Looking at pci_device_reset, there is an issue:

    dev->irq_state = 0;
    pci_update_irq_status(dev);
    pci_device_deassert_intx(dev);

irq_state is cleared before pci_device_deassert_intx is called.
pci_device_deassert_intx tries to clear all irqs via pci_irq_handler, but
that function exits without taking any action if the requested irq level
matches what we already track in irq_state.
Since irq_state is 0, pci_device_deassert_intx is basically a no-op. Any
interrupts with level=1 will not be cleared, which is the case with the usb
tablet after usb_detach.

This fixes things for me, but I have no idea if it's the proper fix:

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8f722dd..1912dfb 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -189,9 +189,9 @@ static void pci_do_device_reset(PCIDevice *dev)
 {
     int r;
 
+    pci_device_deassert_intx(dev);
     dev->irq_state = 0;
     pci_update_irq_status(dev);
-    pci_device_deassert_intx(dev);
     /* Clear all writable bits */
     pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
                                  pci_get_word(dev->wmask + PCI_COMMAND) |

- Cole
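
P.S. To make the ordering problem easier to see outside of QEMU, here is a
small standalone C model. It is only a sketch and not QEMU code: every
model_* name, the struct layouts and the four-pin constant are made up for
illustration. The one behaviour taken from the description above is that the
handler returns early when the requested level matches what irq_state already
records.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_PINS 4  /* assumption: four INTx pins per device */

/* Hypothetical stand-ins for the bus and device; not QEMU's structs. */
struct model_bus { int irq_count[MODEL_NUM_PINS]; };
struct model_dev { uint8_t irq_state; struct model_bus *bus; };

/* Only touches the bus when the requested level differs from the
 * level recorded for that pin in dev->irq_state. */
static void model_irq_handler(struct model_dev *dev, int pin, int level)
{
    assert(pin >= 0 && pin < MODEL_NUM_PINS);
    int current = (dev->irq_state >> pin) & 1;
    int change = level - current;
    if (change == 0) {
        return;  /* early exit: nothing to do as far as the device knows */
    }
    dev->irq_state = (uint8_t)((dev->irq_state & ~(1u << pin)) |
                               ((unsigned)level << pin));
    dev->bus->irq_count[pin] += change;
}

/* Lower every pin through the handler. */
static void model_deassert_intx(struct model_dev *dev)
{
    for (int pin = 0; pin < MODEL_NUM_PINS; pin++) {
        model_irq_handler(dev, pin, 0);
    }
}

/* Raise pin 3, reset in the chosen order, and return what is left in
 * the bus counter for that pin (0 means the reset cleaned up). */
int model_leftover_after_reset(int deassert_first)
{
    struct model_bus bus = { {0, 0, 0, 0} };
    struct model_dev dev = { 0, &bus };

    model_irq_handler(&dev, 3, 1);   /* device raises its interrupt */

    if (deassert_first) {
        model_deassert_intx(&dev);   /* ordering from the patch above */
        dev.irq_state = 0;
    } else {
        dev.irq_state = 0;           /* ordering currently in the tree */
        model_deassert_intx(&dev);   /* every call exits early */
    }
    return bus.irq_count[3];
}
```

In this model, clearing irq_state first leaves the bus counter for pin 3
nonzero, which corresponds to the condition the pcibus_reset assertion
rejects. Deasserting first brings it back to zero.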