From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Shenming Lu <lushenming@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Eric Auger <eric.auger@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	"wanghaibin.wang@huawei.com" <wanghaibin.wang@huawei.com>,
	"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>
Subject: RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
Date: Sun, 7 Feb 2021 08:20:06 +0000	[thread overview]
Message-ID: <MWHPR11MB1886D07D927A77DCB2FF74D48CB09@MWHPR11MB1886.namprd11.prod.outlook.com> (raw)
In-Reply-To: <YB0f5Yno9frihQq4@myrica>

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, February 5, 2021 6:37 PM
> 
> Hi,
> 
> On Thu, Feb 04, 2021 at 06:52:10AM +0000, Tian, Kevin wrote:
> > > >>> The static pinning and mapping problem in VFIO and possible
> > > >>> solutions have been discussed a lot [1, 2]. One of the solutions is
> > > >>> to add I/O page fault support for VFIO devices. Different from those
> > > >>> relatively complicated software approaches such as presenting a
> > > >>> vIOMMU that provides the DMA buffer information (might include
> > > >>> para-virtualized optimizations),
> 
> I'm curious about the performance difference between this and the
> map/unmap vIOMMU, as well as the coIOMMU. This is probably a lot faster
> but those don't depend on IOPF which is a pretty rare feature at the
> moment.
> 
> [...]
> > > > In reality, many
> > > > devices allow I/O faulting only in selective contexts. However, there
> > > > is no standard way (e.g. PCISIG) for the device to report whether
> > > > arbitrary I/O faults are allowed. Then we may have to maintain device
> > > > specific knowledge in software, e.g. in an opt-in table listing devices
> > > > which allow arbitrary faults. For devices which only support selective
> > > > faulting, a mediator (either through vendor extensions on vfio-pci-core
> > > > or an mdev wrapper) might be necessary to help lock down
> > > > non-faultable mappings and then enable faulting on the remaining
> > > > mappings.
> > >
> > > For devices which only support selective faulting, they could tell it to the
> > > IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> >
> > Not exactly to the IOMMU driver. There is already a vfio_pin_pages() for
> > selective page-pinning. The matter is that 'they' implies some device
> > specific logic to decide which pages must be pinned, and such knowledge
> > is outside of VFIO.
> >
> > From an enabling p.o.v. we could possibly do it in a phased approach:
> > first handle devices which tolerate arbitrary DMA faults, and then extend
> > to devices with selective-faulting. The former is simpler, but with one
> > main open question: whether we want to maintain such device IDs in a
> > static table in VFIO or rely on hints from other components (e.g. the PF
> > driver in the VF assignment case). Let's see how Alex thinks about it.
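
To make the selective-faulting mediator idea above a bit more concrete, here
is a rough sketch (hypothetical mediator code, not part of this series; the
vfio_pin_pages() signature is the one mdev drivers use around the 5.11
timeframe and may differ in other kernel versions): pin the window the device
cannot fault on up front, and let IOPF populate the rest on demand.

/*
 * Hypothetical mediator helper, illustrative names only: lock down the
 * non-faultable range by pinning it, leaving the remaining mappings to
 * be populated on demand via IOPF.
 */
static int mediator_lock_down_range(struct device *dev,
				    unsigned long start_pfn, int npage)
{
	unsigned long *user_pfns, *phys_pfns;
	int i, ret = -ENOMEM;

	user_pfns = kcalloc(npage, sizeof(*user_pfns), GFP_KERNEL);
	phys_pfns = kcalloc(npage, sizeof(*phys_pfns), GFP_KERNEL);
	if (!user_pfns || !phys_pfns)
		goto out;

	for (i = 0; i < npage; i++)
		user_pfns[i] = start_pfn + i;

	/* pin only the non-faultable window; IOPF covers everything else */
	ret = vfio_pin_pages(dev, user_pfns, npage,
			     IOMMU_READ | IOMMU_WRITE, phys_pfns);
out:
	kfree(user_pfns);
	kfree(phys_pfns);
	return ret < 0 ? ret : 0;
}
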
> 
> Do you think selective-faulting will be the norm, or only a problem for
> initial IOPF implementations?  To me it's the selective-faulting kind of
> device that will be the odd one out, but that's pure speculation. Either
> way maintaining a device list seems like a pain.

I would expect it to be the norm for quite some time (e.g. multiple years).
From what I have learned, turning a complex accelerator into an
implementation that tolerates arbitrary DMA faults is very complex (it
touches every critical path) and not cost effective (e.g. tracking in-flight
requests). It might be OK for some purpose-built devices in specific usages,
but for most devices it will be an evolving path toward the 100%-faultable
goal...
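
For reference, the static table option could be as simple as the sketch below
(purely hypothetical names and layout, not an existing VFIO structure); the
pain Jean mentions is less the code than keeping such a list accurate.

/*
 * Hypothetical opt-in table of devices known to tolerate arbitrary DMA
 * faults, keyed by PCI vendor/device ID.  Illustrative only.
 */
struct vfio_full_iopf_id {
	u16 vendor;
	u16 device;
};

static const struct vfio_full_iopf_id vfio_full_iopf_table[] = {
	/* { 0x8086, 0x.... },  devices validated for arbitrary faults */
};

static bool vfio_dev_tolerates_arbitrary_faults(struct pci_dev *pdev)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(vfio_full_iopf_table); i++) {
		if (pdev->vendor == vfio_full_iopf_table[i].vendor &&
		    pdev->device == vfio_full_iopf_table[i].device)
			return true;
	}

	return false;
}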

> 
> [...]
> > Yes, it's planned but just hasn't happened yet. We are still focusing on
> > the guest SVA part, thus only the 1st-level page fault (+Yi/Jacob). You
> > are always welcome to collaborate/help if you have time. 😊
> 
> By the way the current fault report API is missing a way to invalidate
> partial faults: when the IOMMU device's PRI queue overflows, it may
> auto-respond to page request groups that were already partially reported
> by the IOMMU driver. Upon detecting an overflow, the IOMMU driver needs
> to tell all fault consumers to discard their partial groups.
> iopf_queue_discard_partial() [1] does this for the internal IOPF handler
> but we have nothing for the lower-level fault handler at the moment. And
> it gets more complicated when injecting IOPFs to guests, we'd need a
> mechanism to recall partial groups all the way through kernel->userspace
> and userspace->guest.

I don't know how to recall partial groups through emulated vIOMMUs (at least
not for virtual VT-d). Possibly it could be supported by virtio-iommu. In any
case I consider it more of an optimization than a functional requirement (and
it could be avoided with Shenming's suggestion below).
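
To illustrate what "discarding partial groups" would mean on the consumer
side, here is a rough sketch (hypothetical structures, loosely mirroring what
iopf_queue_discard_partial() does for the internal handler): accumulate page
requests per (pasid, group) until the one flagged "last" arrives, and drop
whatever has accumulated when an overflow is signalled.

struct fault_consumer {		/* hypothetical per-device consumer state */
	struct mutex lock;
	struct list_head partials;
};

struct partial_group {
	struct list_head node;
	u32 pasid;
	u32 grpid;
	struct list_head faults;	/* iommu_fault entries seen so far */
};

/* called when the IOMMU driver signals a PRI queue overflow */
static void consumer_discard_partials(struct fault_consumer *fc)
{
	struct partial_group *pg, *next;

	mutex_lock(&fc->lock);
	list_for_each_entry_safe(pg, next, &fc->partials, node) {
		list_del(&pg->node);
		free_partial_group(pg);		/* hypothetical helper */
	}
	mutex_unlock(&fc->lock);
}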

> 
> Shenming suggests [2] to also use the IOPF handler for IOPFs managed by
> device drivers. It's worth considering in my opinion because we could hold
> partial groups within the kernel and only report full groups to device
> drivers (and guests). In addition we'd consolidate tracking of IOPFs,
> since they're done both by iommu_report_device_fault() and the IOPF
> handler at the moment.

I also think it's the right thing to do. Conceptually, w/ or w/o
DEV_FEAT_IOPF just reflects how IOPFs are delivered to system software. In
the end IOPFs are all about permission violations against the IOMMU page
tables, so we should try to reuse/consolidate the IOMMU fault reporting
stack as much as possible.
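
As a rough sketch of that consolidated path (based on the fault reporting API
around the 5.11/5.12 timeframe; exact structures and flags may differ in
other versions), the consumer side would boil down to something like:

/* Simplified: one handler on the common IOMMU fault reporting path;
 * page requests are completed through iommu_page_response() once the
 * mapping has been fixed up (or the fault has been forwarded to
 * userspace/guest and answered from there).
 */
static int vfio_iommu_fault_handler(struct iommu_fault *fault, void *data)
{
	struct device *dev = data;
	struct iommu_page_response resp = {
		.version = IOMMU_PAGE_RESP_VERSION_1,
		.pasid	 = fault->prm.pasid,
		.grpid	 = fault->prm.grpid,
		.code	 = IOMMU_PAGE_RESP_SUCCESS,
	};

	if (fault->type != IOMMU_FAULT_PAGE_REQ)
		return -EOPNOTSUPP;

	/* ... resolve the fault: pin/map the page, or forward it ... */

	return iommu_page_response(dev, &resp);
}

static int vfio_dev_enable_iopf(struct device *dev)
{
	/* register on the common fault reporting path */
	return iommu_register_device_fault_handler(dev,
						   vfio_iommu_fault_handler,
						   dev);
}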

> 
> Note that I plan to upstream the IOPF patch [1] as is because it was
> already in good shape for 5.12, and consolidating the fault handler will
> require some thinking.

This plan makes sense.

> 
> Thanks,
> Jean
> 
> 
> [1] https://lore.kernel.org/linux-iommu/20210127154322.3959196-7-jean-philippe@linaro.org/
> [2] https://lore.kernel.org/linux-iommu/f79f06be-e46b-a65a-3951-3e7dbfa66b4a@huawei.com/

Thanks
Kevin

Thread overview: 29+ messages
2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
2021-01-29 22:58   ` Alex Williamson
2021-01-30  9:31     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 2/4] vfio: Add a page fault handler Shenming Lu
2021-01-27 17:42   ` Christoph Hellwig
2021-01-28  6:10     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
2021-01-29 22:42   ` Alex Williamson
2021-01-30  9:31     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 4/4] vfio: Allow to pin and map dynamically Shenming Lu
2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
2021-01-30  9:30   ` Shenming Lu
2021-02-01  7:56   ` Tian, Kevin
2021-02-02  6:41     ` Shenming Lu
2021-02-04  6:52       ` Tian, Kevin
2021-02-05 10:37         ` Jean-Philippe Brucker
2021-02-07  8:20           ` Tian, Kevin [this message]
2021-02-07 11:47             ` Shenming Lu
2021-02-09 11:06         ` Liu, Yi L
2021-02-10  8:02           ` Shenming Lu
2021-03-18  7:53         ` Shenming Lu
2021-03-18  9:07           ` Tian, Kevin
2021-03-18 11:53             ` Shenming Lu
2021-03-18 12:32               ` Tian, Kevin
2021-03-18 12:47                 ` Shenming Lu
2021-03-19  0:33               ` Lu Baolu
2021-03-19  1:30                 ` Keqian Zhu
2021-03-20  1:35                   ` Lu Baolu
