From: "Tian, Kevin" <kevin.tian@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>,
Shenming Lu <lushenming@huawei.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Eric Auger <eric.auger@redhat.com>,
Lu Baolu <baolu.lu@linux.intel.com>,
"wanghaibin.wang@huawei.com" <wanghaibin.wang@huawei.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>
Subject: RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
Date: Mon, 1 Feb 2021 07:56:45 +0000
Message-ID: <MWHPR11MB188684B42632FD0B9B5CA1C08CB69@MWHPR11MB1886.namprd11.prod.outlook.com>
In-Reply-To: <20210129155730.3a1d49c5@omen.home.shazbot.org>
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, January 30, 2021 6:58 AM
>
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
>
> > Hi,
> >
> > The static pinning and mapping problem in VFIO and possible solutions
> > have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > page fault support for VFIO devices. Unlike relatively complicated
> > software approaches, such as presenting a vIOMMU that provides the DMA
> > buffer information (which might include para-virtualized optimizations),
> > IOPF mainly depends on the hardware faulting capability, such as the PCIe
> > PRI extension or the Arm SMMU stall model. What's more, IOPF support in
> > the IOMMU driver is being implemented in SVA [3]. So should we consider
> > adding IOPF support for VFIO passthrough based on the IOPF part of SVA
> > at present?
> >
> > We have implemented a basic demo only for one stage of translation (GPA
> > -> HPA in virtualization; note that it can be configured at either stage),
> > and tested it on a Hisilicon Kunpeng920 board. The nested mode is more
> > complicated, since VFIO only handles the second-stage page faults (same
> > as the non-nested case), while the first-stage page faults need to be
> > further delivered to the guest, which is being implemented in [4] on ARM.
> > My thought on this is to report the page faults to VFIO regardless of the
> > stage at which they occurred (trying to carry the stage information), and
> > handle them according to the configured mode in VFIO. Or the IOMMU driver
> > might evolve to support more...
> >
> > Possible TODOs:
> > - Optimize the faulting path, and measure the performance (it might still
> > be a big issue).
> > - Add support for PRI.
> > - Add an MMU notifier to avoid pinning.
> > - Add support for the nested mode.
> > ...
> >
> > Any comments and suggestions are very welcome. :-)
>
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive. There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface. Our page mapping also only
There is another factor to consider. The presence of
IOMMU_DEV_FEAT_IOPF just indicates that the device is capable of
triggering I/O page faults through the IOMMU; it does not necessarily
mean that the device can tolerate I/O page faults for arbitrary DMA
requests. In reality, many devices allow I/O faulting only in selective
contexts. However, there is no standard way (e.g. from PCISIG) for a
device to report whether arbitrary I/O faults are allowed. So we may
have to maintain device-specific knowledge in software, e.g. an opt-in
table listing the devices which allow arbitrary faults. For devices
which only support selective faulting, a mediator (either through
vendor extensions on vfio-pci-core or an mdev wrapper) might be
necessary to help lock down the non-faultable mappings and then enable
faulting on the remaining mappings.
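To make the opt-in idea concrete, here is a minimal userspace sketch,
not actual VFIO code. All names (iopf_optin_entry, iopf_choose_policy,
the policy enum) and the vendor/device IDs are hypothetical and chosen
only for illustration. It shows that the IOPF capability alone is not
sufficient: a device gets full faulting only if it is explicitly
listed, falls back to a mediator if one exists, and otherwise stays
with static pinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a PCI vendor/device pair that is known to
 * tolerate I/O page faults on arbitrary DMA requests. */
struct iopf_optin_entry {
    uint16_t vendor;
    uint16_t device;
};

/* Placeholder IDs for illustration only; they do not name real devices. */
static const struct iopf_optin_entry iopf_optin_table[] = {
    { 0x1234, 0x0001 },
    { 0x1234, 0x0002 },
};
#define IOPF_OPTIN_COUNT \
    (sizeof(iopf_optin_table) / sizeof(iopf_optin_table[0]))

/* Hypothetical policies a driver could pick per device. */
enum iopf_policy {
    IOPF_PIN_ALL,   /* static pinning, no faulting at all */
    IOPF_MEDIATED,  /* mediator pins non-faultable mappings, faults the rest */
    IOPF_FAULT_ALL, /* every mapping may be faulted in on demand */
};

/* Return true only if the device is explicitly listed in the table. */
bool iopf_optin_lookup(const struct iopf_optin_entry *table, size_t count,
                       uint16_t vendor, uint16_t device)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].vendor == vendor && table[i].device == device)
            return true;
    }
    return false;
}

/* The IOPF capability is necessary but not sufficient for full faulting. */
enum iopf_policy iopf_choose_policy(bool has_iopf_feat, bool has_mediator,
                                    uint16_t vendor, uint16_t device)
{
    if (!has_iopf_feat)
        return IOPF_PIN_ALL;
    if (iopf_optin_lookup(iopf_optin_table, IOPF_OPTIN_COUNT, vendor, device))
        return IOPF_FAULT_ALL;
    return has_mediator ? IOPF_MEDIATED : IOPF_PIN_ALL;
}
```

An allow-list is used rather than a deny-list because a wrong guess in
the permissive direction could mean a device faulting on a DMA request
it cannot recover from, whereas a wrong guess in the conservative
direction only costs the pinning we already do today.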
> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit? How
> does a user know what to set for a locked memory limit? The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective. Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work. Thanks,
>
The main overlap between this feature and SVA is the IOPF reporting
framework, which currently still has a gap in supporting both in nested
mode, as discussed here:
https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
Once that gap is resolved in the future, the VFIO fault handler just
takes different actions according to the fault level: 1st-level faults
are forwarded to userspace through the vSVA path, while 2nd-level faults
are fixed (or warned about if not intended) by VFIO itself through the
IOMMU mapping interface.
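A rough sketch of that dispatch, again as standalone userspace C
rather than kernel code. The types and names here (sketch_fault,
sketch_fault_cfg, sketch_dispatch_fault) are hypothetical, and the
fault report is assumed to already carry the level at which the fault
occurred, which is exactly the gap mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the translation level at which the fault occurred. */
enum sketch_fault_level {
    SKETCH_FAULT_LEVEL_1, /* guest-owned stage (vSVA) */
    SKETCH_FAULT_LEVEL_2, /* host-owned stage (GPA -> HPA) */
};

/* Hypothetical outcomes of the handler. */
enum sketch_fault_action {
    SKETCH_FORWARD_TO_USER, /* deliver to userspace through the vSVA path */
    SKETCH_FIX_IN_VFIO,     /* pin and map through the IOMMU mapping interface */
    SKETCH_WARN_UNEXPECTED, /* 2nd-level fault although faulting was not intended */
};

struct sketch_fault {
    enum sketch_fault_level level;
    uint64_t iova;
};

struct sketch_fault_cfg {
    bool l2_on_demand; /* true if 2nd-level mappings are populated on fault */
};

/* Pick an action based only on the fault level and the configured mode. */
enum sketch_fault_action
sketch_dispatch_fault(const struct sketch_fault_cfg *cfg,
                      const struct sketch_fault *fault)
{
    assert(cfg != NULL && fault != NULL);

    if (fault->level == SKETCH_FAULT_LEVEL_1)
        return SKETCH_FORWARD_TO_USER;

    return cfg->l2_on_demand ? SKETCH_FIX_IN_VFIO : SKETCH_WARN_UNEXPECTED;
}
```

The point is only that the handler needs the level in the fault report
to choose between the two paths; how the level is actually conveyed is
left to the reporting framework discussion linked above.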
Thanks
Kevin