kvm.vger.kernel.org archive mirror
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	Shenming Lu <lushenming@huawei.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Eric Auger <eric.auger@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	"wanghaibin.wang@huawei.com" <wanghaibin.wang@huawei.com>,
	"yuzenghui@huawei.com" <yuzenghui@huawei.com>
Subject: RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
Date: Mon, 1 Feb 2021 07:56:45 +0000
Message-ID: <MWHPR11MB188684B42632FD0B9B5CA1C08CB69@MWHPR11MB1886.namprd11.prod.outlook.com>
In-Reply-To: <20210129155730.3a1d49c5@omen.home.shazbot.org>

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, January 30, 2021 6:58 AM
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> > Hi,
> >
> > The static pinning and mapping problem in VFIO and possible solutions
> > have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > page fault support for VFIO devices. Different from those relatively
> > complicated software approaches such as presenting a vIOMMU that provides
> > the DMA buffer information (might include para-virtualized optimizations),
> > IOPF mainly depends on the hardware faulting capability, such as the PCIe
> > PRI extension or Arm SMMU stall model. What's more, the IOPF support in
> > the IOMMU driver is being implemented in SVA [3]. So do we consider to
> > add IOPF support for VFIO passthrough based on the IOPF part of SVA at
> > present?
> >
> > We have implemented a basic demo only for one stage of translation (GPA
> > -> HPA in virtualization, note that it can be configured at either stage),
> > and tested on Hisilicon Kunpeng920 board. The nested mode is more complicated
> > since VFIO only handles the second stage page faults (same as the non-nested
> > case), while the first stage page faults need to be further delivered to
> > the guest, which is being implemented in [4] on ARM. My thought on this
> > is to report the page faults to VFIO regardless of the occurred stage (try
> > to carry the stage information), and handle respectively according to the
> > configured mode in VFIO. Or the IOMMU driver might evolve to support more...
> >
> > Might TODO:
> >  - Optimize the faulting path, and measure the performance (it might still
> >    be a big issue).
> >  - Add support for PRI.
> >  - Add a MMU notifier to avoid pinning.
> >  - Add support for the nested mode.
> > ...
> >
> > Any comments and suggestions are very welcome. :-)
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive.  There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface.  Our page mapping also only

There is another factor to consider. The presence of
IOMMU_DEV_FEAT_IOPF only indicates that the device is capable of
triggering I/O page faults through the IOMMU; it does not necessarily
mean that the device can tolerate I/O page faults for arbitrary DMA
requests. In reality, many devices allow I/O faulting only in selective
contexts. However, there is no standard way (e.g. defined by PCI-SIG)
for a device to report whether arbitrary I/O faults are allowed. We may
therefore have to maintain device-specific knowledge in software, e.g.
an opt-in table listing the devices that allow arbitrary faults. For
devices that support only selective faulting, a mediator (either
through vendor extensions on vfio-pci-core or an mdev wrapper) might be
necessary to help lock down the non-faultable mappings and then enable
faulting on the remaining mappings.
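
To make the opt-in idea a bit more concrete, here is a rough sketch in
plain userspace C. It is only an illustration under my own assumptions:
every sketch_* name is made up (none of this is an existing kernel
interface), and the vendor/device IDs are placeholders that are not
meant to identify real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of how much I/O page faulting a device tolerates. */
enum sketch_iopf_tolerance {
	SKETCH_IOPF_NONE,      /* not listed: keep every mapping pinned */
	SKETCH_IOPF_SELECTIVE, /* faults only in some contexts: needs a mediator */
	SKETCH_IOPF_ARBITRARY, /* faults tolerated on any DMA request */
};

struct sketch_iopf_entry {
	uint16_t vendor;
	uint16_t device;
	enum sketch_iopf_tolerance tolerance;
};

/* Placeholder entries for illustration only. */
static const struct sketch_iopf_entry sketch_optin_table[] = {
	{ 0xabcd, 0x0001, SKETCH_IOPF_ARBITRARY },
	{ 0xabcd, 0x0002, SKETCH_IOPF_SELECTIVE },
};

#define SKETCH_OPTIN_COUNT \
	(sizeof(sketch_optin_table) / sizeof(sketch_optin_table[0]))

/* Return the recorded tolerance, or SKETCH_IOPF_NONE if the device is not listed. */
static enum sketch_iopf_tolerance
sketch_iopf_lookup(const struct sketch_iopf_entry *table, size_t count,
		   uint16_t vendor, uint16_t device)
{
	size_t i;

	assert(table != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (table[i].vendor == vendor && table[i].device == device)
			return table[i].tolerance;
	}
	return SKETCH_IOPF_NONE;
}
```

The default is deliberately conservative: a device that is not in the
table is treated as unable to tolerate faults, so it would stay on the
existing static pinning path.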

> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit?  How
> does a user know what to set for a locked memory limit?  The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective.  Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work.  Thanks,

The main overlap between this feature and SVA is the IOPF reporting
framework, which currently still has a gap in supporting both in nested
mode, as discussed here:


Once that gap is resolved in the future, the VFIO fault handler just
takes different actions according to the fault level: 1st-level faults
are forwarded to userspace through the vSVA path, while 2nd-level faults
are fixed (or warned about if not intended) by VFIO itself through the
IOMMU mapping interface.
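
The per-level dispatch described above could look roughly like the
following. Again this is a userspace sketch under my own assumptions,
not the actual handler: the sketch_* types and the host_iopf_enabled
flag are hypothetical, and the real fault record would come from the
IOMMU fault reporting framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: translation level at which the fault was raised. */
enum sketch_fault_level {
	SKETCH_FAULT_LEVEL_1, /* guest-managed translation */
	SKETCH_FAULT_LEVEL_2, /* host-managed translation (GPA -> HPA) */
};

/* Hypothetical: what the handler decides to do with a fault. */
enum sketch_fault_action {
	SKETCH_FORWARD_TO_USERSPACE, /* deliver through the vSVA path */
	SKETCH_FIX_IN_HOST,          /* pin and map through the IOMMU mapping interface */
	SKETCH_WARN_UNEXPECTED,      /* 2nd-level fault that was not intended */
};

struct sketch_io_fault {
	enum sketch_fault_level level;
	uint64_t iova;
};

/* Pick an action from the fault level and whether host-side IOPF is in use. */
static enum sketch_fault_action
sketch_fault_dispatch(const struct sketch_io_fault *fault, bool host_iopf_enabled)
{
	assert(fault != NULL);

	switch (fault->level) {
	case SKETCH_FAULT_LEVEL_1:
		return SKETCH_FORWARD_TO_USERSPACE;
	case SKETCH_FAULT_LEVEL_2:
		return host_iopf_enabled ? SKETCH_FIX_IN_HOST
					 : SKETCH_WARN_UNEXPECTED;
	}
	return SKETCH_WARN_UNEXPECTED;
}
```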

