From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "eduardo@habkost.net" <eduardo@habkost.net>,
	"Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
	"Jag Raman" <jag.raman@oracle.com>,
	"Beraldo Leal" <bleal@redhat.com>,
	"John Johnson" <john.g.johnson@oracle.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@gmail.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"thanos.makatos@nutanix.com" <thanos.makatos@nutanix.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"john.levon@nutanix.com" <john.levon@nutanix.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>
Subject: Re: [PATCH v5 03/18] pci: isolated address space for PCI bus
Date: Thu, 10 Feb 2022 19:26:55 -0500
Message-ID: <20220210190654-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220210164933.53b32a1e.alex.williamson@redhat.com>

On Thu, Feb 10, 2022 at 04:49:33PM -0700, Alex Williamson wrote:
> On Thu, 10 Feb 2022 18:28:56 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Feb 10, 2022 at 04:17:34PM -0700, Alex Williamson wrote:
> > > On Thu, 10 Feb 2022 22:23:01 +0000
> > > Jag Raman <jag.raman@oracle.com> wrote:
> > >   
> > > > > On Feb 10, 2022, at 3:02 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > 
> > > > > On Thu, Feb 10, 2022 at 12:08:27AM +0000, Jag Raman wrote:    
> > > > >> 
> > > > >> Thanks for the explanation, Alex. Thanks to everyone else in the thread who
> > > > >> helped to clarify this problem.
> > > > >> 
> > > > >> We have implemented the memory isolation based on the discussion in the
> > > > >> thread. We will send the patches out shortly.
> > > > >> 
> > > > >> Devices such as “name” and “e1000” worked fine. But I’d like to note that
> > > > >> the LSI device (TYPE_LSI53C895A) had some problems - it doesn’t seem
> > > > >> to be IOMMU aware. In LSI’s case, the kernel driver asks the device to
> > > > >> read instructions from the CPU VA (lsi_execute_script() -> read_dword()),
> > > > >> which is forbidden when the IOMMU is enabled. Specifically, the driver asks
> > > > >> the device to access other BAR regions by using the BAR address programmed
> > > > >> in the PCI config space. This happens even without the vfio-user patches. For example,
> > > > >> we could enable the IOMMU using the “-device intel-iommu” QEMU option and
> > > > >> add the following to the kernel command line: “intel_iommu=on iommu=nopt”.
> > > > >> In this case, we could see an IOMMU fault.
> > > > > 
> > > > > So, a device accessing its own BAR is different. Basically, these
> > > > > transactions never go on the bus at all, never mind reach the IOMMU.
> > > > 
> > > > Hi Michael,
> > > > 
> > > > In the LSI case, I did notice that it went to the IOMMU. The device reads from the BAR
> > > > address as if it were a DMA address.
> > > >   
> > > > > I think it's just used as a handle to address internal device memory.
> > > > > This kind of trick is not universal, but not terribly unusual.
> > > > > 
> > > > >     
> > > > >> Unfortunately, we started our project with the LSI device, which led to all the
> > > > >> confusion about what is expected at the server end in terms of
> > > > >> vectoring/address translation. It gave the impression that the request was still on
> > > > >> the CPU side of the PCI root complex, when the actual problem was with the
> > > > >> device driver itself.
> > > > >> 
> > > > >> I’m wondering how to deal with this problem. Would it be OK if we mapped the
> > > > >> device’s BAR into the IOVA, at the same CPU VA programmed in the BAR registers?
> > > > >> This would help devices such as the LSI circumvent the problem. One concern
> > > > >> with this approach is that it has the potential to collide with another legitimate
> > > > >> IOVA address. Kindly share your thoughts on this.
> > > > >> 
> > > > >> Thank you!    
> > > > > 
> > > > > I am not 100% sure what you plan to do, but it sounds fine, since even
> > > > > if it collides, with traditional PCI a device must never initiate cycles
> > > > 
> > > > OK sounds good, I’ll create a mapping of the device BARs in the IOVA.  
> > > 
> > > I don't think this is correct.  Look for instance at ACPI _TRA support
> > > where a system can specify a translation offset such that, for example,
> > > a CPU access to a device is required to add the provided offset to the
> > > bus address of the device.  A system using this could have multiple
> > > root bridges, where each is given the same, overlapping MMIO aperture.  
> > > From the processor perspective, each MMIO range is unique, and possibly
> > > none of those devices have a zero _TRA; there could be system memory at
> > > the equivalent flat memory address.
> > 
> > I am guessing there are reasons to have these in ACPI besides firmware
> > vendors wanting to find corner cases in device implementations though
> > :). E.g. it's possible something else is tweaking DMA in similar ways. I
> > can't say for sure, and I wonder why we care as long as QEMU does not
> > have _TRA.
> 
> How many complaints do we get about running out of I/O port space on
> q35 because we allow an arbitrary number of root ports?  What if we
> used _TRA to provide the full I/O port range per root port?  32-bit
> MMIO could be duplicated as well.

It's an interesting idea. To clarify what I said: I suspect some devices
are broken in the presence of translating bridges unless DMA
is also translated to match.
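
To illustrate the _TRA arithmetic Alex is referring to, here is a toy
sketch in plain C (not QEMU code; all numbers and names are made up).
Two root bridges expose the same 64K bus-local range at distinct CPU
addresses:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct root_bridge {
    uint64_t cpu_base;   /* window base in CPU address space       */
    uint64_t bus_base;   /* window base in bus (PCI) address space */
    uint64_t size;
};

/* _TRA is the offset added to a bus address to produce the CPU
 * address, so going the other way we subtract it. */
static uint64_t cpu_to_bus(const struct root_bridge *rb, uint64_t cpu_addr)
{
    uint64_t tra = rb->cpu_base - rb->bus_base;
    return cpu_addr - tra;
}

int main(void)
{
    /* both bridges map the same bus-local range [0x0, 0xffff],
     * overlapping in bus space but disjoint in CPU space */
    struct root_bridge rb0 = { 0x10000, 0x0, 0x10000 };
    struct root_bridge rb1 = { 0x20000, 0x0, 0x10000 };

    printf("CPU 0x10060 -> bus 0x%" PRIx64 " behind bridge 0\n",
           cpu_to_bus(&rb0, 0x10060));
    printf("CPU 0x20060 -> bus 0x%" PRIx64 " behind bridge 1\n",
           cpu_to_bus(&rb1, 0x20060));
    return 0;
}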

I agree it's a mess though, in that some devices, when given their own
BAR address to DMA to, will probably just satisfy the access from internal
memory, while others will ignore that and send it upstream as DMA;
both types are probably out there in the field.
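
Something like this toy sketch (plain C again, hypothetical helper
names, no relation to any real device model) captures the two
behaviours:

#include <stdint.h>
#include <string.h>

struct dev {
    uint64_t bar_base;          /* address programmed into the BAR */
    uint64_t bar_size;
    uint8_t  internal[4096];    /* memory backing the BAR */
    int    (*upstream_read)(uint64_t addr, void *buf, size_t len);
};

/* Type 1: satisfy accesses that fall inside the device's own BAR from
 * internal memory; the transaction never goes on the bus, so the IOMMU
 * never sees it. */
static int dma_read_shortcut(struct dev *d, uint64_t addr,
                             void *buf, size_t len)
{
    if (addr >= d->bar_base && addr + len <= d->bar_base + d->bar_size) {
        memcpy(buf, d->internal + (addr - d->bar_base), len);
        return 0;
    }
    return d->upstream_read(addr, buf, len);
}

/* Type 2: always issue the access upstream as DMA; with an IOMMU in
 * the path this faults unless the BAR address also happens to be a
 * valid IOVA. */
static int dma_read_upstream(struct dev *d, uint64_t addr,
                             void *buf, size_t len)
{
    return d->upstream_read(addr, buf, len);
}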


> > > So if the transaction actually hits this bus, which I think is what
> > > making use of the device AddressSpace implies, I don't think it can
> > > assume that it's simply reflected back at itself.  Conventional PCI and
> > > PCI Express may be software compatible, but there's a reason we don't
> > > see IOMMUs that provide both translation and isolation in conventional
> > > topologies.
> > > 
> > > Is this more a bug in the LSI device emulation model?  For instance in
> > > vfio-pci, if I want to access an offset into a BAR from within QEMU, I
> > > don't care what address is programmed into that BAR, I perform an
> > > access relative to the vfio file descriptor region representing that
> > > BAR space.  I'd expect that any viable device emulation model does the
> > > same, an access to device memory uses an offset from an internal
> > > resource, irrespective of the BAR address.  
> > 
> > However, using the BAR seems like a reasonable shortcut, allowing the
> > device to use the same 64-bit address to refer to system
> > and device RAM interchangeably.
> 
> A shortcut that breaks when an IOMMU is involved.

Maybe. But if that's how hardware behaves, we have little choice but to
emulate it.
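
For comparison, the offset-relative access Alex describes for vfio-pci
looks roughly like the sketch below (error handling omitted; the ioctl,
region index and struct layout are from linux/vfio.h, the helper itself
is made up). The point is that the region is addressed through the
device fd, so the guest-programmed BAR value never enters the picture:

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Read a dword at bar_offset within BAR0, relative to the region's
 * offset in the vfio device fd. */
static uint32_t read_bar0_dword(int device_fd, uint64_t bar_offset)
{
    struct vfio_region_info info = {
        .argsz = sizeof(info),
        .index = VFIO_PCI_BAR0_REGION_INDEX,
    };
    uint32_t val = 0;

    ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info);
    pread(device_fd, &val, sizeof(val), info.offset + bar_offset);
    return val;
}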

> > > It would seem strange if the driver is actually programming the device
> > > to DMA to itself, and if that's actually happening, I'd wonder whether this
> > > driver is actually compatible with an IOMMU on bare metal.
> > 
> > You know, it's hardware after all. As long as things work, vendors
> > happily ship the device. They don't really worry about theoretical issues ;).
> 
> We're talking about a 32-bit conventional PCI device from the previous
> century.  IOMMUs are no longer theoretical, but not all drivers have
> kept up.  It's maybe not the best choice as the de facto standard
> device, imo.  Thanks,
> 
> Alex

More importantly, lots of devices most likely don't support arbitrary
configurations and will break if you try to create something that matches the
spec but rarely or never occurs on bare metal.

-- 
MST


