From mboxrd@z Thu Jan  1 00:00:00 1970
From: Leo Yan
Subject: Re: Question: KVM: Failed to bind vfio with PCI-e / SMMU on Juno-r2
Date: Wed, 13 Mar 2019 19:35:49 +0800
Message-ID: <20190313113549.GK13422@leoy-ThinkPad-X240s>
References: <20190311064248.GC13422@leoy-ThinkPad-X240s>
 <20190311093958.GF13422@leoy-ThinkPad-X240s>
 <762d54fb-b146-e591-d544-676cb5606837@redhat.com>
 <20190311143501.GH13422@leoy-ThinkPad-X240s>
 <20190313080048.GI13422@leoy-ThinkPad-X240s>
 <35c22d0c-7da5-4e68-effb-05c8571d8b63@redhat.com>
In-Reply-To: <35c22d0c-7da5-4e68-effb-05c8571d8b63@redhat.com>
To: Auger Eric
Cc: Daniel Thompson, Robin Murphy, kvmarm@lists.cs.columbia.edu
List-Id: kvmarm@lists.cs.columbia.edu

Hi Eric,

On Wed, Mar 13, 2019 at 11:01:33AM +0100, Auger Eric wrote:

[...]

> > I want to confirm: is the recommended mode for a passthrough PCI-e
> > device to use MSI in both the host OS and the guest OS?  Or is it
> > fine for the host OS to use MSI while the guest OS uses INTx mode?
>
> If the NIC supports MSIs they logically are used. This can easily be
> checked on the host by issuing "cat /proc/interrupts | grep vfio". Can
> you check whether the guest received any interrupt? I remember Robin
> said in the past that on Juno the MSI doorbell was in the PCI host
> bridge window, and transactions towards the doorbell possibly could
> not reach it since they were considered peer-to-peer. Using GICv2M
> should not bring any performance issue. I tested that in the past
> with a Seattle board.

I can see the info below on the host after launching KVM:

root@debian:~# cat /proc/interrupts | grep vfio
 46:    0    0    0    0    0    0   MSI 4194304 Edge   vfio-msi[0](0000:08:00.0)

And below are the interrupts in the guest:

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
  3:        506        400        281        403        298        330   GIC-0  27 Level   arch_timer
  5:        768          0          0          0          0          0   GIC-0 101 Edge    virtio0
  6:        246          0          0          0          0          0   GIC-0 102 Edge    virtio1
  7:          2          0          0          0          0          0   GIC-0 103 Edge    virtio2
  8:        210          0          0          0          0          0   GIC-0  97 Level   ttyS0
 13:          0          0          0          0          0          0   MSI     0 Edge    eth1

> > - The second question is about GICv2m.  If I understand correctly,
> >   when passing through a PCI-e device to the guest OS, in the guest
> >   OS we should create the data path below for the PCI-e device:
> >
> >                                                        +--------+
> >                                                     -> | Memory |
> >   +-----------+    +------------------+    +-------+ / +--------+
> >   | Net card  | -> | PCI-e controller | -> | IOMMU | -
> >   +-----------+    +------------------+    +-------+ \ +--------+
> >                                                     -> | MSI    |
> >                                                        | frame  |
> >                                                        +--------+
> >
> >   Since the master is now the network card / PCI-e controller rather
> >   than the CPU, there are no two stages of memory accessing
> >   (VA->IPA->PA).  In this case, we configure the IOMMU (SMMU) for
> >   the guest OS's address translation before switching from host to
> >   guest, right?  Or does the SMMU also have two-stage memory mapping?
>
> In your use case you don't have any virtual IOMMU. So the guest
> programs the assigned device with guest physical addresses, and the
> virtualizer uses the physical IOMMU to translate this GPA into the
> host physical address backing the guest RAM and the MSI frame. A
> single stage of the physical IOMMU is used (stage 1).

Thanks a lot for the explanation.
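(Side note, mostly to check my own understanding of the userspace side:
below is a minimal sketch, written against the VFIO type1 uapi, of how I
believe the GPA-as-IOVA mapping gets installed so that a single SMMU
stage is enough.  container_fd, host_va, gpa and size are placeholders,
and the group/container setup is assumed to have been done already; this
is not lifted from QEMU or kvmtool, so please correct me if they do it
differently.)

#include <linux/vfio.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Map one chunk of guest RAM for DMA.  The IOVA programmed into the
 * physical SMMU is the guest physical address itself, so the assigned
 * device can be handed GPAs directly and one translation stage
 * (GPA -> host PA) is sufficient.
 */
static int map_guest_ram(int container_fd, void *host_va,
                         unsigned long long gpa, unsigned long long size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)host_va;   /* host VA backing guest RAM */
        map.iova  = gpa;                  /* IOVA == guest PA */
        map.size  = size;

        /* Installs the GPA -> host PA mapping in the physical IOMMU. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}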
> > Another thing that confuses me: I can see the MSI frame is mapped
> > to the GIC's physical address in the host OS, so the PCI-e device
> > can send messages correctly to the MSI frame.  But for the guest OS,
> > the MSI frame is mapped to an IPA memory region, and this region is
> > used to emulate the GICv2 MSI frame rather than the hardware MSI
> > frame; so will any access from PCI-e to this region trap to the
> > hypervisor on the CPU side, so that the KVM hypervisor can help
> > emulate (and inject) the interrupt for the guest OS?
>
> When the device sends an MSI it uses a host-allocated IOVA for the
> physical MSI doorbell. This gets translated by the physical IOMMU and
> reaches the physical doorbell. The physical GICv2m triggers the
> associated physical SPI -> kvm irqfd -> virtual IRQ.
> With GICv2M we have direct GSI mapping on the guest.

Just want to confirm: in your elaborated flow, the virtual IRQ will be
injected by qemu (or kvmtool) every time, but there is no need to
interfere with the IRQ's deactivation, right?

> > Essentially, I want to check what the expected behaviour is for the
> > GICv2 MSI frame working mode when we want to pass through one PCI-e
> > device to the guest OS and the PCI-e device has one static MSI frame
> > for it.
>
> Your config was tested in the past with Seattle (not with the sky2 NIC
> though). Adding Robin for the potential peer-to-peer concern.

I really appreciate your help.

Thanks,
Leo Yan
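P.S. To double check my understanding of the "physical SPI -> kvm irqfd
-> virtual IRQ" flow you describe, below is a rough sketch of how I
would expect userspace to wire one MSI vector of the passthrough device
to a guest SPI.  It is only based on my reading of the VFIO and KVM uapi
headers (device_fd, vm_fd and gsi are placeholders), not on what QEMU or
kvmtool actually do, so please correct me if it is wrong.

#include <linux/kvm.h>
#include <linux/vfio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

/*
 * Device MSI write -> physical GICv2m -> physical SPI -> VFIO signals
 * the eventfd -> KVM irqfd injects the virtual IRQ (gsi), so there is
 * no per-interrupt exit to QEMU/kvmtool on the hot path.
 */
static int route_msi_to_guest(int device_fd, int vm_fd, unsigned int gsi)
{
        char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
        struct vfio_irq_set *set = (struct vfio_irq_set *)buf;
        struct kvm_irqfd irqfd;
        int efd = eventfd(0, EFD_CLOEXEC);

        if (efd < 0)
                return -1;

        /* 1) Have VFIO signal the eventfd when MSI vector 0 fires. */
        memset(buf, 0, sizeof(buf));
        set->argsz = sizeof(buf);
        set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        set->index = VFIO_PCI_MSI_IRQ_INDEX;
        set->start = 0;
        set->count = 1;
        memcpy(set->data, &efd, sizeof(int));
        if (ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set) < 0)
                return -1;

        /* 2) Have KVM inject the virtual IRQ whenever the eventfd fires. */
        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd = efd;
        irqfd.gsi = gsi;
        return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}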