From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Williamson
Subject: Re: AMD KVM Pci Passthrough reports device busy
Date: Mon, 04 Jun 2012 21:44:05 -0600
Message-ID: <1338867845.23475.168.camel@bling.home>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Chris Sanders
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:7570 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1761418Ab2FEDoJ (ORCPT ); Mon, 4 Jun 2012 23:44:09 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Mon, 2012-06-04 at 16:11 -0500, Chris Sanders wrote:
> Hello, I've been working for several days trying to get Pci
> Passthrough to work. So far the #kvm IRC channel has helped me with a
> few suggestions, though that hasn't yet solved the problem. I'm
> running CentOS 6.2 and was suggested I try compiling 3.2.18 kernel
> from kernel.org. This has changed a few of the messages but the guest
> still fails to start.
>
> Grepping for AMD-VI produces:
> # dmesg | grep AMD-Vi
> AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
> AMD-Vi: Lazy IO/TLB flushing enabled
>
> After boot I'm running the following script
> echo "unbind pci-pci bridge"
> echo "1002 4383" > /sys/bus/pci/drivers/pci-stub/new_id
> echo "unbind pci device"
> echo "0000:03:07.0" > /sys/bus/pci/drivers/ivtv/unbind
> echo "4444 0803" > /sys/bus/pci/drivers/pci-stub/new_id
> echo 0000:03:07.0 > /sys/bus/pci/devices/0000\:03\:07.0/driver/unbind
> echo 0000:03:07.0 > /sys/bus/pci/drivers/pci-stub/bind
>
> This is lspci -n showing my device behind the Pci-Pci bridge
> -[0000:00]-+-00.0
>            +-00.2
>            +-02.0-[01]--+-00.0
>            |            \-00.1
>            +-09.0-[02]----00.0
>            +-11.0
>            +-12.0
>            +-12.2
>            +-13.0
>            +-13.2
>            +-14.0
>            +-14.1
>            +-14.3
>            +-14.4-[03]----07.0
>            +-14.5
>            +-15.0-[04]--
>            +-16.0
>            +-16.2
>            +-18.0
>            +-18.1
>            +-18.2
>            +-18.3
>            +-18.4
>            \-18.5
>
> My kvm command and error are:
> # /usr/libexec/qemu-kvm -m 3048 -net none -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
> file=/dev/vg_hdd/lv_sagetv,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
> -device pci-assign,host=03:07.0
> Failed to assign device "(null)" : Device or resource busy
> qemu-kvm: -device pci-assign,host=03:07.0: Device 'pci-assign' could
> not be initialized

I have a setup with an AMD 990FX system and a spare PVR-350 card that I
installed to reproduce. The sad answer is that it's nearly impossible
to assign PCI devices on these systems due to the aliasing of devices
below the PCIe-to-PCI bridge (PCIe devices are much, much easier to
assign).

If you boot with amd_iommu_dump, you'll see some output like this:

AMD-Vi:   DEV_ALIAS_RANGE  devid: 05:00.0 flags: 00 devid_to: 00:14.4
AMD-Vi:   DEV_RANGE_END    devid: 05:1f.7

This says the devices on bus 5 (my bus 5 is equivalent to your bus 3)
are all aliased to device 14.4:

00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI Bridge (rev 40) (prog-if 01 [Subtractive decode])

What that means is that the IOMMU can't distinguish devices behind the
PCI-to-PCI bridge, so all of them are grouped as an alias to device
14.4. Ideally you wouldn't care about this, since you don't have any
other devices anyway. Unfortunately amd_iommu pre-allocates IOMMU
domains for every device, so it has already allocated a domain for
device 14.4 and added device 03:07.0 to it. Unbinding 03:07.0 from the
ivtv driver detaches that device from the domain, but when we go to
assign it to a guest we create a new domain. Assigning 03:07.0 into
that new domain fails because the device is an alias for 00:14.4, which
is still attached to its original, pre-allocated domain. One way to get
around this would be to also assign the bridge to the guest, but we
don't support and actually reject assigning bridges :(

This works a bit better on Intel VT-d systems because domains are
dynamically allocated. Thus for streaming DMA, the domain is only
created when the driver attempts to set up a DMA transaction. When the
driver is unbound, the domain is destroyed, which allows us to set up a
new domain for device assignment.

If you don't mind running non-upstream code, VFIO is a rewrite of
device assignment for Qemu that is aware of such alias problems and
actually works in this case. The downside is that VFIO is strict about
multifunction devices supporting ACS to prevent peer-to-peer between
domains, so it will require all of the 14.x devices to be bound to
pci-stub as well. On my system, this includes an smbus controller,
audio device, lpc controller, and usb device.
If AMD could confirm this device doesn't allow peer-to-peer between
functions, we could relax this requirement a bit.

VFIO kernel and qemu can be found here:

git://github.com/awilliam/linux-vfio.git (iommu-group-vfio-next-20120529)
git://github.com/awilliam/qemu-vfio.git (iommu-group-vfio)

See Documentation/vfio.txt for a description. The major difference is
the setup of drivers. At a minimum, the device you want to assign to
the guest needs to be bound to the vfio-pci driver, much in the same
way you bind devices to pci-stub. All other devices in the group should
be bound to pci-stub. You can find the other devices in the group by
following the iommu_group link in sysfs, e.g.:

/sys/bus/pci/devices/0000:03:07.0/iommu_group/devices

Once you have that, simply replace pci-assign with vfio-pci on the qemu
command line. I'm actively trying to get this code upstream as soon as
possible, so unless it gets rejected, it should eventually make it into
mainline. Thanks,

Alex
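
The group walk described above can be sketched as a small shell helper.
This is only an illustration, not part of the VFIO tree: the function
name group_plan and the SYSFS_PCI override are made up for the example,
and it only prints which driver each device in the group would be bound
to (vfio-pci for the device to assign, pci-stub for the rest). It does
not write to sysfs, and it assumes a kernel that exposes the
iommu_group link mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not from the VFIO tree): print a binding plan for
# the IOMMU group that contains one PCI device.
# SYSFS_PCI is an assumed override so the function can be pointed at a
# fake directory tree for testing; by default it uses the real sysfs path.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

group_plan() {
    target=$1
    group_dir="$SYSFS_PCI/$target/iommu_group/devices"

    # Without the iommu_group link there is nothing to walk.
    if [ ! -d "$group_dir" ]; then
        echo "no iommu_group found for $target" >&2
        return 1
    fi

    for path in "$group_dir"/*; do
        [ -e "$path" ] || continue      # empty group directory
        dev=${path##*/}                 # strip the directory part
        if [ "$dev" = "$target" ]; then
            echo "$dev vfio-pci"        # the device handed to the guest
        else
            echo "$dev pci-stub"        # every other member of the group
        fi
    done
}
```

Running "group_plan 0000:03:07.0" on a host with such a kernel would
list one line per group member; the actual binding still has to be done
by hand through the drivers' new_id, unbind and bind files, as in the
pci-stub script quoted earlier.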