From: Chris Wright
To: James Neave
Cc: Chris Wright, Alex Williamson, kvm@vger.kernel.org, "Roedel, Joerg"
Subject: Re: PCI Passthrough, error: The driver 'pci-stub' is occupying your device 0000:08:06.2
Date: Fri, 25 Feb 2011 15:14:03 -0800
Message-ID: <20110225231403.GE4988@sequoia.sous-sol.org>
References: <1298322078.5764.45.camel@x201> <20110222015119.GY9869@sequoia.sous-sol.org> <20110223001103.GZ9869@sequoia.sous-sol.org> <20110225000622.GB4988@sequoia.sous-sol.org>
Sender: kvm-owner@vger.kernel.org

* James Neave (roboj1m@gmail.com) wrote:
> On Fri, Feb 25, 2011 at 12:06 AM, Chris Wright wrote:
> > * James Neave (roboj1m@gmail.com) wrote:
> >> OK, here's my latest dmesg with amd_iommu_dump and debug with no quiet
> >> http://pastebin.com/JxEwvqRA
> >
> > Yeah, that's what I expected:
> >
> > [    0.724403] AMD-Vi:   DEV_ALIAS_RANGE                 devid: 08:00.0 flags: 00 devid_to: 00:14.4
> > [    0.724439] AMD-Vi:   DEV_RANGE_END           devid: 08:1f.7
> >
> > That basically says 08:00.0 - 08:1f.7 will show up as 00:14.4 (and
> > should all go into same iommu domain).
> >
> >> I've just figured out a sequence of "echo DEV > PATH" commands to call
> >> for 14.4 gets me past the "claimed by pci-stub" error and gets me to
> >> the "failed to assign IRQ" error.
> >> I'm going to narrow down the required sequence and then post it.
> >
> > Kind of afraid to ask, but does it include:
> >
> > (assuming 1002 4384 is the pci to pci bridge)
> > echo 1002 4384 > /sys/bus/pci/drivers/pci-stub/new_id
> > echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/unbind
> >
> > (this has the side effect of detaching the bridge from its domain)
>
> Exact sequence is:
>
> echo "1002 4384" > /sys/bus/pci/drivers/pci-stub/new_id
> echo "0000:00:14.4" > /sys/bus/pci/devices/0000:00:14.4/driver/unbind

OK, same, since driver is a symlink to pci-stub.

> I take it this is a bad thing then?

It just means the amd iommu driver might be susceptible to a refcounting
issue.  Indeed, here's what I get after assigning a device below the
PCI-PCI bridge, then shutting down the guest:

[  406.535873] ------------[ cut here ]------------
[  406.536864] kernel BUG at arch/x86/kernel/amd_iommu.c:2460!
[  406.536864] invalid opcode: 0000 [#1] SMP
[  406.536864] last sysfs file: /sys/devices/pci0000:00/0000:00:14.4/0000:03:06.0/device
[  406.536864] CPU 0
[  406.536864] Modules linked in: kvm_amd kvm e1000e bnx2
[  406.536864]
[  406.536864] Pid: 4265, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #61 Toonie/Toonie
[  406.536864] RIP: 0010:[] [] amd_iommu_domain_destroy+0x75/0x9d
[  406.536864] RSP: 0018:ffff88013507fb78 EFLAGS: 00010202
[  406.536864] RAX: ffff8801346ebeb8 RBX: ffff8801346ebeb8 RCX: 0000000000014f67
[  406.536864] RDX: 0000000000000202 RSI: 0000000000000202 RDI: ffffffff81a118a0
[  406.536864] RBP: ffff88013507fba8 R08: 0000000000000000 R09: ffff88007900f8e8
[  406.536864] R10: ffff88013507f8d8 R11: 0000000000000006 R12: ffff8801346ebea8
[  406.536864] R13: ffff8800783b73a8 R14: 0000000000000202 R15: ffff880135089570
[  406.536864] FS:  00007fe794db76e0(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[  406.536864] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  406.536864] CR2: 0000000000000000 CR3: 000000007c6fb000 CR4: 00000000000006f0
[  406.536864] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  406.536864] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  406.536864] Process qemu-system-x86 (pid: 4265, threadinfo ffff88013507e000, task ffff88013496b090)
[  406.536864] Stack:
[  406.536864]  0000000000000009 ffff880135089570 ffff88007c734ca0 0000000000000001
[  406.536864]  ffff88007c74e3c8 0000000000000002 ffff88013507fbc8 ffffffff813013b7
[  406.536864]  0000000000000001 ffff880135089570 ffff88013507fbe8 ffffffffa003fd81
[  406.536864] Call Trace:
[  406.536864]  [] iommu_domain_free+0x16/0x22
[  406.536864]  [] kvm_iommu_unmap_guest+0x22/0x28 [kvm]
[  406.536864]  [] kvm_arch_destroy_vm+0x15/0x119 [kvm]
[  406.536864]  [] kvm_put_kvm+0xde/0x103 [kvm]
[  406.536864]  [] kvm_vcpu_release+0x13/0x17 [kvm]
[  406.536864]  [] fput+0x11b/0x1bc
[  406.536864]  [] filp_close+0x67/0x72
[  406.536864]  [] put_files_struct+0x70/0xc3
[  406.536864]  [] exit_files+0x34/0x39
[  406.536864]  [] do_exit+0x267/0x72e
[  406.536864]  [] ? lock_timer_base+0x26/0x4a
[  406.536864]  [] ? freezing+0xe/0x10
[  406.536864]  [] sys_exit_group+0x0/0x16
[  406.536864]  [] get_signal_to_deliver+0x31c/0x33b
[  406.536864]  [] do_notify_resume+0x8b/0x6c3
[  406.536864]  [] ? set_tsk_thread_flag+0xd/0xf
[  406.536864]  [] ? sys_rt_sigtimedwait+0x18e/0x208
[  406.536864]  [] ? path_put+0x1d/0x22
[  406.536864]  [] int_signal+0x12/0x17
[  406.536864] Code: 00 00 00 4c 89 eb 4d 8b 6d 00 49 8d 44 24 10 48 39 c3 75 df 4c 89 f6 48 c7 c7 a0 18 a1 81 e8 fa b5 56 00 41 83 7c 24 64 00 74 04 <0f> 0b eb fe 4c 89 e7 e8 2c f5 ff ff 4c 89 e7 e8 9e e2 ff ff 49
[  406.536864] RIP  [] amd_iommu_domain_destroy+0x75/0x9d
[  406.536864]  RSP
[  406.854138] ---[ end trace 13c9f9241c8b376b ]---
[  406.859182] Fixing recursive fault but reboot is needed!

> > I assume this means that 00:14.4 is still left claimed by pci-stub?
>
> Yes
>
> > How are you determining this?  The lspci paste above has pci-stub for all
> > of them.  The easiest thing might be to start with manually disabling
> > host driver and reassigning pci-stub to: 00:14.4, 08:06.2,3 and 0e.0
> > Then giving the guest only 08:06.1.
>
> I determined it by being half asleep and not reading it properly... >.<
> You're right, all 5 devices were using pci-stub

Heh, ok ;)

> >> libvirtError: this function is not supported by the connection driver:
> >> Unable to reset PCI device 0000:00:14.4: no FLR, PM reset or bus reset
> >> available
>
> > Right, libvirt is more restrictive than qemu-kvm (forgot you were using
> > libvirt here).
>
> What does that libvirt error mean? I can't find a definition.

libvirt is being very strict and erring on the side of safety.  It's
looking at each device and verifying that there is some method for
resetting it.  Not resetting the device between users can lead to
information leakage or robustness issues (worst cases would be a guest
learning host or other guest secrets, or a guest maliciously leaving the
device in an unusual state that causes the next user -- host or another
guest -- to crash, etc).  It is not seeing the ability to do FLR
(Function Level Reset) or PM (Power Management) reset (switch to Power
state off, then back on basically), and it can't do a secondary bus
reset because the device 00:14.4 is on the root bus.

> Am I limiting myself by using libvirt? Would not using it help and how
> would I go about not using it?

I think if you can do just a single device, and ensure that it's not
sharing an IRQ with any other device, then

> > Trouble now is that
> > with shared IRQ we don't have a good way to handle that right now.
>
> Game over then?

Yeah, unless you can do just a single device (and don't need the other
devices in the host).

> I've tried assigning the USB devices before, I couldn't do it because
> qemu doesn't support USB2 devices.
> I don't really understand where this IRQ conflict is, the firewire and
> the USB2 device share IRQ22 but I'm assigning them both to the VM?
> Is that still a problem?

You'll see them on the host in /proc/interrupts.  The problem is that if
you have two devices sharing an interrupt and they are each driven by a
different guest (or one guest and one host), you subject a system that
relied on cooperating drivers to properly ack the interrupt to abuse.

> I don't suppose there's any way to change which IRQ they use in the
> BIOS or with a command is there?
>
> I don't know if it means anything but this page:
>
> http://linuxtv.org/wiki/index.php/Hauppauge_WinTV-HVR-2200
>
> Has the lspci output for the HVR-2200 which mentions MSI and IRQ255.
> My knowledge is very limited on this subject so I don't know if that's
> meaningless looking at the output from another person's lspci.

Yeah, that suggests the device will use MSI interrupts.

> Anything left to try?

Just the idea of reducing the problem to a single device for the guest
(and you may have to do the sysfs black magic above).

thanks.
-chris
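
The /proc/interrupts check mentioned above can be sketched as a small
shell helper.  This is only a rough illustration, assuming the usual
layout in which handlers sharing an IRQ line appear comma-separated at
the end of that line; shared_irqs is a hypothetical name, not part of
any existing tool.

```shell
#!/bin/sh
# shared_irqs (hypothetical helper): print the number of every IRQ line
# that lists more than one handler.  Assumes handlers sharing a line are
# separated by ", " at the end of the line.
#   $1: file to read (defaults to /proc/interrupts)
shared_irqs() {
    # Only consider numbered IRQ lines; skip the CPU header and summary
    # rows such as NMI or LOC.  The IRQ number is the first ':' field.
    awk -F: '/^ *[0-9]+:/ && /, / { gsub(/ /, "", $1); print $1 }' \
        "${1:-/proc/interrupts}"
}
```

Running shared_irqs on the host with no argument prints one IRQ number
per shared line.  If the IRQ of the device you want to assign shows up
in that list, it is sharing its interrupt with something else.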