kvm.vger.kernel.org archive mirror
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	linux-s390@vger.kernel.org, cohuck@redhat.com,
	farman@linux.ibm.com, pmorel@linux.ibm.com,
	borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com,
	gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com,
	frankja@linux.ibm.com, david@redhat.com, imbrenda@linux.ibm.com,
	vneethv@linux.ibm.com, oberpar@linux.ibm.com,
	freude@linux.ibm.com, thuth@redhat.com, pasic@linux.ibm.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 24/30] vfio-pci/zdev: wire up group notifier
Date: Thu, 10 Feb 2022 12:15:58 +0100
Message-ID: <13cf51210d125d48a47d55d9c6a20c93f5a2b78b.camel@linux.ibm.com>
In-Reply-To: <20220208204041.GK4160@nvidia.com>

On Tue, 2022-02-08 at 16:40 -0400, Jason Gunthorpe wrote:
> On Tue, Feb 08, 2022 at 03:33:58PM -0500, Matthew Rosato wrote:
> 
> > > Is the purpose of IOAT to associate the device to a set of KVM page
> > > tables?  That seems like a container or future iommufd operation.  I
> > 
> > Yes, here we are establishing a relationship with the DMA table in the guest
> > so that once mappings are established guest PCI operations (handled via
> > special instructions in s390) don't need to go through the host but can be
> > directly handled by firmware (so, effectively guest can keep running on its
> > vcpu vs breaking out).
> 
> Oh, well, certainly sounds like a NAK on that - anything to do with
> the DMA translation of a PCI device must go through the iommu layer,
> not here.
> 
> Let's not repeat the iommu subsystem bypass mess power made please.

Maybe some context on all of this helps. First, it's important to note
that on s390x the PCI IOMMU hardware is controlled with special
instructions. For pass-through this is actually quite nice, as it makes
it relatively simple for us to always run with an IOMMU in the guest: we
simply need to provide the instructions. This means we get full IOMMU
protection for pass-through devices on KVM guests, guests with
pass-through remain pageable, and we can even support nested
pass-through.

This is possible with relatively little overhead because we can do all
of the per map/unmap guest IOMMU operations with a single instruction
intercept. The instruction we need to intercept is called Refresh PCI
Translations (RPCIT). Its job is twofold.

For an OS running directly on our machine hypervisor (LPAR), it flushes
the IOMMU's TLB by informing it which pages have been invalidated, while
the hardware walks the page tables and fills the TLB on its own when
establishing a mapping for previously invalid IOVAs.

In a KVM or z/VM guest, the guest is informed that IOMMU translations
need to be refreshed even for previously invalid IOVAs. With this, the
guest builds its IOMMU translation tables as normal but then does an
RPCIT for the IOVA range it touched. In the hypervisor we can then
simply walk the translation tables, pin the guest pages and map them in
the host IOMMU. Prior to this series this happened in QEMU, which does
the map via vfio-iommu-type1 from user-space. This works and will
remain as a fallback. Sadly it is quite slow and has a large impact on
performance, as we need to do a lot of mapping operations because the
DMA API of the guest goes through the virtual IOMMU. This series thus
adds the same functionality but as a KVM intercept of RPCIT. Now, I
think this neatly fits into KVM: we're emulating an instruction after
all, and most of its work is KVM-specific pinning of guest pages.
Importantly, all other handling like IOMMU domain attachment still goes
through vfio-iommu-type1 and we just fast path the map/unmap
operations.
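
To make the flow above concrete, here is a minimal toy sketch of what
an RPCIT intercept conceptually does for a guest: walk the guest's
entries for the refreshed IOVA range, "pin" each valid guest page, and
mirror the result into a host shadow table. This is not the code from
the series. The single-level table, the TOY_VALID bit, the fixed
guest-to-host offset and all toy_* names are assumptions made up for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_PAGES   16       /* toy IOVA space: 16 pages */
#define TOY_VALID      0x1ULL   /* hypothetical "entry valid" bit */

/* Toy single-level translation table: one entry per IOVA page. */
struct toy_table {
	uint64_t pte[TOY_NR_PAGES];
};

/* Stand-in for pinning a guest page and returning its host address. */
static uint64_t toy_pin_guest_page(uint64_t guest_addr)
{
	return guest_addr + 0x100000000ULL; /* made-up guest-to-host offset */
}

/*
 * Toy "RPCIT intercept": for every page in [iova, iova + len), mirror
 * the guest entry into the host shadow table. Valid guest entries are
 * pinned and translated; invalid ones clear the host entry (unmap).
 * Returns the number of pages mapped in the range, or -1 if the range
 * is empty or outside the toy IOVA space.
 */
static int toy_rpcit_shadow(const struct toy_table *guest,
			    struct toy_table *host,
			    uint64_t iova, uint64_t len)
{
	uint64_t first = iova >> TOY_PAGE_SHIFT;
	uint64_t last = (iova + len - 1) >> TOY_PAGE_SHIFT;
	int mapped = 0;

	assert(guest != NULL && host != NULL);

	if (len == 0 || last < first || last >= TOY_NR_PAGES)
		return -1;

	for (uint64_t i = first; i <= last; i++) {
		uint64_t gpte = guest->pte[i];

		if (gpte & TOY_VALID) {
			uint64_t gaddr = gpte & ~TOY_VALID;

			host->pte[i] = toy_pin_guest_page(gaddr) | TOY_VALID;
			mapped++;
		} else {
			host->pte[i] = 0;
		}
	}
	return mapped;
}
```

The point of the sketch is only that a single intercept covers both
map and unmap for the touched range, so the guest never has to leave
its vcpu for anything other than the RPCIT itself.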

In the code, the map/unmap boils down to dma_walk_cpu_trans() and parts
of dma_shadow_cpu_trans(), both called in dma_table_shadow(). The former
is a function already shared between our DMA API and IOMMU API
implementations and the only code that walks the host translation
tables. So in a way we're sidestepping the IOMMU API ops, that is true,
but we do not sidestep the IOMMU host table access code paths. Notice
how our IOMMU API is also < 400 LOC because both the DMA and IOMMU APIs
share code.
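
As a rough illustration of that code sharing, the sketch below has one
walker that is the only function touching the table layout, with two
thin front ends on top of it. It is a toy under stated assumptions: the
two-level layout, the entry format and every toy_* name are
hypothetical and do not reflect the real s390 tables or the signature
of dma_walk_cpu_trans().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SHIFT 12
#define TOY_L1_ENTRIES 4
#define TOY_L2_ENTRIES 4

struct toy_l2 { uint64_t pte[TOY_L2_ENTRIES]; };
struct toy_root { struct toy_l2 *l2[TOY_L1_ENTRIES]; };

/*
 * The single walker: returns the entry for iova, allocating the second
 * level on demand if alloc is set. Returns NULL if iova is out of range
 * or the second level is missing and may not be allocated.
 */
static uint64_t *toy_walk(struct toy_root *root, uint64_t iova, int alloc)
{
	uint64_t pfn = iova >> TOY_PAGE_SHIFT;
	uint64_t i1 = pfn / TOY_L2_ENTRIES;
	uint64_t i2 = pfn % TOY_L2_ENTRIES;

	assert(root != NULL);

	if (i1 >= TOY_L1_ENTRIES)
		return NULL;
	if (!root->l2[i1]) {
		if (!alloc)
			return NULL;
		root->l2[i1] = calloc(1, sizeof(*root->l2[i1]));
		if (!root->l2[i1])
			return NULL;
	}
	return &root->l2[i1]->pte[i2];
}

/* Front end 1: DMA-API-style caller mapping a single page. */
static int toy_dma_map_page(struct toy_root *root, uint64_t iova, uint64_t pa)
{
	uint64_t *entry = toy_walk(root, iova, 1);

	if (!entry)
		return -1;
	*entry = pa | 1;	/* bit 0 marks the entry valid in this toy */
	return 0;
}

/* Front end 2: IOMMU-API-style caller mapping a contiguous range. */
static int toy_iommu_map(struct toy_root *root, uint64_t iova,
			 uint64_t pa, size_t npages)
{
	for (size_t i = 0; i < npages; i++) {
		uint64_t off = (uint64_t)i << TOY_PAGE_SHIFT;
		uint64_t *entry = toy_walk(root, iova + off, 1);

		if (!entry)
			return -1;
		*entry = (pa + off) | 1;
	}
	return 0;
}

/* Read back an entry without allocating; 0 means "not mapped". */
static uint64_t toy_lookup(struct toy_root *root, uint64_t iova)
{
	uint64_t *entry = toy_walk(root, iova, 0);

	return entry ? *entry : 0;
}
```

A third caller, such as an intercept handler, could sit next to the
two front ends and reuse toy_walk() in the same way, which is the
shape of the argument made above.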

That said, I believe we should still be able to do the mapping in a KVM
RPCIT intercept but going through IOMMU API ops, if this sidestepping
is truly unacceptable. It definitely adds overhead though, and I'm not
sure what we gain in clarity or maintainability since we already share
the actual host table access code and there is only one PCI IOMMU and
that is part of the architecture. Also, either KVM or QEMU needs to know
about the same details for looking at guest IOMMU translation tables /
emulating the guest IOMMU. It's also clear that the IOMMU API will
remain functional on its own, as it is necessary for any non-KVM use
case, which of course can't intercept RPCIT but on the other hand can
also keep mappings much longer, significantly reducing overhead.


Thread overview: 62+ messages
2022-02-04 21:15 [PATCH v3 00/30] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 01/30] s390/sclp: detect the zPCI load/store interpretation facility Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 02/30] s390/sclp: detect the AISII facility Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 03/30] s390/sclp: detect the AENI facility Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 04/30] s390/sclp: detect the AISI facility Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 05/30] s390/airq: pass more TPI info to airq handlers Matthew Rosato
2022-02-07  8:28   ` Cornelia Huck
2022-02-04 21:15 ` [PATCH v3 06/30] s390/airq: allow for airq structure that uses an input vector Matthew Rosato
2022-02-07  8:29   ` Cornelia Huck
2022-02-07  8:42   ` Claudio Imbrenda
2022-02-04 21:15 ` [PATCH v3 07/30] s390/pci: externalize the SIC operation controls and routine Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 08/30] s390/pci: stash associated GISA designation Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 09/30] s390/pci: export some routines related to RPCIT processing Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 10/30] s390/pci: stash dtsm and maxstbl Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 11/30] s390/pci: add helper function to find device by handle Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 12/30] s390/pci: get SHM information from list pci Matthew Rosato
2022-02-07 10:08   ` Pierre Morel
2022-02-04 21:15 ` [PATCH v3 13/30] s390/pci: return status from zpci_refresh_trans Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 14/30] vfio/pci: re-introduce CONFIG_VFIO_PCI_ZDEV Matthew Rosato
2022-02-07  8:35   ` Cornelia Huck
2022-02-07 15:43     ` Matthew Rosato
2022-02-07 17:59       ` Cornelia Huck
2022-02-07 20:09         ` Matthew Rosato
2022-02-10 10:07           ` Cornelia Huck
2022-02-04 21:15 ` [PATCH v3 15/30] KVM: s390: pci: add basic kvm_zdev structure Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 16/30] KVM: s390: pci: do initial setup for AEN interpretation Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 17/30] KVM: s390: pci: enable host forwarding of Adapter Event Notifications Matthew Rosato
2022-02-14 12:59   ` Pierre Morel
2022-02-14 20:35     ` Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 18/30] KVM: s390: mechanism to enable guest zPCI Interpretation Matthew Rosato
2022-02-14 13:06   ` Pierre Morel
2022-02-04 21:15 ` [PATCH v3 19/30] KVM: s390: pci: provide routines for enabling/disabling interpretation Matthew Rosato
2022-02-14 13:22   ` Pierre Morel
2022-02-04 21:15 ` [PATCH v3 20/30] KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 21/30] KVM: s390: pci: provide routines for enabling/disabling IOAT assist Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 22/30] KVM: s390: pci: handle refresh of PCI translations Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 23/30] KVM: s390: intercept the rpcit instruction Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 24/30] vfio-pci/zdev: wire up group notifier Matthew Rosato
2022-02-08 17:43   ` Alex Williamson
2022-02-08 18:51     ` Jason Gunthorpe
2022-02-08 19:26       ` Alex Williamson
2022-02-08 19:51         ` Jason Gunthorpe
2022-02-08 20:33         ` Matthew Rosato
2022-02-08 20:40           ` Jason Gunthorpe
2022-02-08 21:37             ` Matthew Rosato
2022-02-10 11:15             ` Niklas Schnelle [this message]
2022-02-10 13:01               ` Jason Gunthorpe
2022-02-10 14:06                 ` Niklas Schnelle
2022-02-10 15:23                   ` Jason Gunthorpe
2022-02-10 18:59                     ` Matthew Rosato
2022-02-10 23:45                       ` Jason Gunthorpe
2022-02-04 21:15 ` [PATCH v3 25/30] vfio-pci/zdev: wire up zPCI interpretive execution support Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 26/30] vfio-pci/zdev: wire up zPCI adapter interrupt forwarding support Matthew Rosato
2022-02-07 16:38   ` Pierre Morel
2022-02-04 21:15 ` [PATCH v3 27/30] vfio-pci/zdev: wire up zPCI IOAT assist support Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 28/30] vfio-pci/zdev: add DTSM to clp group capability Matthew Rosato
2022-02-04 21:15 ` [PATCH v3 29/30] KVM: s390: introduce CPU feature for zPCI Interpretation Matthew Rosato
2022-02-07 16:36   ` Pierre Morel
2022-02-04 21:15 ` [PATCH v3 30/30] MAINTAINERS: additional files related kvm s390 pci passthrough Matthew Rosato
2022-02-07 13:04   ` Christian Borntraeger
2022-02-07 15:44     ` Matthew Rosato
2022-02-04 21:33 ` [PATCH v3 00/30] KVM: s390: enable zPCI for interpretive execution Matthew Rosato
