On Thu, Jan 27, 2022 at 02:22:53PM -0700, Alex Williamson wrote:
> On Thu, 27 Jan 2022 08:30:13 +0000
> Stefan Hajnoczi wrote:
>
> > On Wed, Jan 26, 2022 at 04:13:33PM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Jan 26, 2022 at 08:07:36PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > On Wed, Jan 26, 2022 at 05:27:32AM +0000, Jag Raman wrote:
> > > > > >
> > > > > >
> > > > > > > On Jan 25, 2022, at 1:38 PM, Dr. David Alan Gilbert wrote:
> > > > > > >
> > > > > > > * Jag Raman (jag.raman@oracle.com) wrote:
> > > > > > >>
> > > > > > >>
> > > > > > >>> On Jan 19, 2022, at 7:12 PM, Michael S. Tsirkin wrote:
> > > > > > >>>
> > > > > > >>> On Wed, Jan 19, 2022 at 04:41:52PM -0500, Jagannathan Raman wrote:
> > > > > > >>>> Allow PCI buses to be part of isolated CPU address spaces. This has a
> > > > > > >>>> niche usage.
> > > > > > >>>>
> > > > > > >>>> TYPE_REMOTE_MACHINE allows multiple VMs to house their PCI devices in
> > > > > > >>>> the same machine/server. This would cause address space collision as
> > > > > > >>>> well as be a security vulnerability. Having separate address spaces for
> > > > > > >>>> each PCI bus would solve this problem.
> > > > > > >>>
> > > > > > >>> Fascinating, but I am not sure I understand. any examples?
> > > > > > >>
> > > > > > >> Hi Michael!
> > > > > > >>
> > > > > > >> multiprocess QEMU and vfio-user implement a client-server model to allow
> > > > > > >> out-of-process emulation of devices. The client QEMU, which makes ioctls
> > > > > > >> to the kernel and runs VCPUs, could attach devices running in a server
> > > > > > >> QEMU. The server QEMU needs access to parts of the client’s RAM to
> > > > > > >> perform DMA.
> > > > > > >
> > > > > > > Do you ever have the opposite problem? i.e. when an emulated PCI device
> > > > > >
> > > > > > That’s an interesting question.
> > > > > >
> > > > > > > exposes a chunk of RAM-like space (frame buffer, or maybe a mapped file)
> > > > > > > that the client can see. What happens if two emulated devices need to
> > > > > > > access each others emulated address space?
> > > > > >
> > > > > > In this case, the kernel driver would map the destination’s chunk of internal RAM into
> > > > > > the DMA space of the source device. Then the source device could write to that
> > > > > > mapped address range, and the IOMMU should direct those writes to the
> > > > > > destination device.
> > > > > >
> > > > > > I would like to take a closer look at the IOMMU implementation on how to achieve
> > > > > > this, and get back to you. I think the IOMMU would handle this. Could you please
> > > > > > point me to the IOMMU implementation you have in mind?
> > > > >
> > > > > I don't know if the current vfio-user client/server patches already
> > > > > implement device-to-device DMA, but the functionality is supported by
> > > > > the vfio-user protocol.
> > > > >
> > > > > Basically: if the DMA regions lookup inside the vfio-user server fails,
> > > > > fall back to VFIO_USER_DMA_READ/WRITE messages instead.
> > > > > https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst#vfio-user-dma-read
> > > > >
> > > > > Here is the flow:
> > > > > 1. The vfio-user server with device A sends a DMA read to QEMU.
> > > > > 2. QEMU finds the MemoryRegion associated with the DMA address and sees
> > > > >    it's a device.
> > > > >    a. If it's emulated inside the QEMU process then the normal
> > > > >       device emulation code kicks in.
> > > > >    b. If it's another vfio-user PCI device then the vfio-user PCI proxy
> > > > >       device forwards the DMA to the second vfio-user server's device B.
> > > >
> > > > I'm starting to be curious if there's a way to persuade the guest kernel
> > > > to do it for us; in general is there a way to say to PCI devices that
> > > > they can only DMA to the host and not other PCI devices?
> > >
> > >
> > > But of course - this is how e.g. VFIO protects host PCI devices from
> > > each other when one of them is passed through to a VM.
> >
> > Michael: Are you saying just turn on vIOMMU? :)
> >
> > Devices in different VFIO groups have their own IOMMU context, so their
> > IOVA space is isolated. Just don't map other devices into the IOVA space
> > and those other devices will be inaccessible.
>
> Devices in different VFIO *containers* have their own IOMMU context.
> Based on the group attachment to a container, groups can either have
> shared or isolated IOVA space. That determination is made by looking
> at the address space of the bus, which is governed by the presence of a
> vIOMMU.

Oops, thank you for pointing that out!

Stefan
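
For anyone following the device-to-device DMA flow quoted earlier in this thread, here is a rough toy model of the fallback. It is only a sketch in Python, not QEMU or libvfio-user code: Region, Client, Server and build_demo are hypothetical names made up for illustration, and the addresses are arbitrary. The only name taken from the thread is VFIO_USER_DMA_READ, which the client call stands in for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    """Hypothetical stand-in for a MemoryRegion covering [base, base + size)."""
    base: int
    size: int
    kind: str                           # "ram", "emulated" or "vfio-user"
    read: Callable[[int, int], bytes]   # read(offset, length)

    def contains(self, addr: int, length: int) -> bool:
        return self.base <= addr and addr + length <= self.base + self.size


class Client:
    """Toy model of the QEMU side: resolve a DMA address to a region (step 2)."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = regions
        self.log: List[str] = []        # kinds of regions that served a fallback

    def dma_read(self, addr: int, length: int) -> bytes:
        for region in self.regions:
            if region.contains(addr, length):
                self.log.append(region.kind)
                # 2a/2b: in-process emulation or a proxy forwarding to server B;
                # both are modelled here by the region's read callback.
                return region.read(addr - region.base, length)
        raise LookupError(f"no region covers {addr:#x}")


class Server:
    """Toy model of a vfio-user server hosting one device (device A)."""

    def __init__(self, client: Client, mapped: Dict[int, bytearray]) -> None:
        self.client = client
        self.mapped = mapped            # DMA regions shared by the client

    def dma_read(self, addr: int, length: int) -> bytes:
        for base, buf in self.mapped.items():
            if base <= addr and addr + length <= base + len(buf):
                off = addr - base
                return bytes(buf[off:off + length])
        # Local lookup failed: fall back to asking the client (step 1),
        # which is the role VFIO_USER_DMA_READ plays in the protocol.
        return self.client.dma_read(addr, length)


def build_demo():
    """Wire up guest RAM plus a second device's memory behind the client."""
    guest_ram = bytearray(b"R" * 0x1000)
    dev_b_mem = bytearray(b"B" * 0x100)   # memory exposed by device B
    client = Client([
        Region(0x0, 0x1000, "ram",
               lambda off, n: bytes(guest_ram[off:off + n])),
        Region(0x8000, 0x100, "vfio-user",
               lambda off, n: bytes(dev_b_mem[off:off + n])),
    ])
    server_a = Server(client, {0x0: guest_ram})
    return client, server_a
```

With build_demo(), a read at 0x10 is served from the region shared with server A and never reaches the client, while a read at 0x8000 misses locally and is routed through the client to device B's memory. Note that this model gives the client a single flat list of regions; it says nothing about per-container IOMMU contexts, which is the isolation point Alex raises.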