From: "Liu, Yi L" <yi.l.liu@intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
	Shenming Lu <lushenming@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Eric Auger <eric.auger@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	"wanghaibin.wang@huawei.com" <wanghaibin.wang@huawei.com>,
	"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>
Subject: RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
Date: Tue, 9 Feb 2021 11:06:09 +0000	[thread overview]
Message-ID: <CY4PR1101MB232892F595111A53803F6DBFC38E9@CY4PR1101MB2328.namprd11.prod.outlook.com> (raw)
In-Reply-To: <MWHPR11MB1886C71A751B48EF626CAC938CB39@MWHPR11MB1886.namprd11.prod.outlook.com>

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, February 4, 2021 2:52 PM
> 
> > From: Shenming Lu <lushenming@huawei.com>
> > Sent: Tuesday, February 2, 2021 2:42 PM
> >
> > On 2021/2/1 15:56, Tian, Kevin wrote:
> > >> From: Alex Williamson <alex.williamson@redhat.com>
> > >> Sent: Saturday, January 30, 2021 6:58 AM
> > >>
> > >> On Mon, 25 Jan 2021 17:03:58 +0800
> > >> Shenming Lu <lushenming@huawei.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> The static pinning and mapping problem in VFIO and possible solutions
> > >>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > >>> page fault support for VFIO devices. Different from those relatively
> > >>> complicated software approaches such as presenting a vIOMMU that provides
> > >>> the DMA buffer information (might include para-virtualized optimizations),
> > >>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
> > >>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
> > >>> the IOMMU driver is being implemented in SVA [3]. So do we consider to
> > >>> add IOPF support for VFIO passthrough based on the IOPF part of SVA at
> > >>> present?
> > >>>
> > >>> We have implemented a basic demo only for one stage of translation (GPA
> > >>> -> HPA in virtualization, note that it can be configured at either stage),
> > >>> and tested on Hisilicon Kunpeng920 board. The nested mode is more complicated
> > >>> since VFIO only handles the second stage page faults (same as the non-nested
> > >>> case), while the first stage page faults need to be further delivered to
> > >>> the guest, which is being implemented in [4] on ARM. My thought on this
> > >>> is to report the page faults to VFIO regardless of the stage at which they
> > >>> occurred (try to carry the stage information), and handle them respectively
> > >>> according to the configured mode in VFIO. Or the IOMMU driver might evolve
> > >>> to support more...
> > >>>
> > >>> Might TODO:
> > >>>  - Optimize the faulting path, and measure the performance (it might still
> > >>>    be a big issue).
> > >>>  - Add support for PRI.
> > >>>  - Add a MMU notifier to avoid pinning.
> > >>>  - Add support for the nested mode.
> > >>> ...
> > >>>
> > >>> Any comments and suggestions are very welcome. :-)
> > >>
> > >> I expect performance to be pretty bad here, the lookup involved per
> > >> fault is excessive.  There are cases where a user is not going to be
> > >> willing to have a slow ramp up of performance for their devices as they
> > >> fault in pages, so we might need to consider making this
> > >> configurable through the vfio interface.  Our page mapping also only
> > >
> > > There is another factor to be considered. The presence of IOMMU_
> > > DEV_FEAT_IOPF just indicates the device capability of triggering I/O
> > > page fault through the IOMMU, but not exactly means that the device
> > > can tolerate I/O page fault for arbitrary DMA requests.
> >
> > Yes, so I add an iopf_enabled field in VFIO to indicate the whole path faulting
> > capability and set it to true after registering a VFIO page fault handler.
> >
> > > In reality, many
> > > devices allow I/O faulting only in selective contexts. However, there
> > > is no standard way (e.g. PCISIG) for the device to report whether
> > > arbitrary I/O fault is allowed. Then we may have to maintain device
> > > specific knowledge in software, e.g. in an opt-in table to list devices
> > > which allows arbitrary faults. For devices which only support selective
> > > faulting, a mediator (either through vendor extensions on vfio-pci-core
> > > or a mdev wrapper) might be necessary to help lock down non-faultable
> > > mappings and then enable faulting on the rest mappings.
> >
> > For devices which only support selective faulting, they could tell it to the
> > IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> 
> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> selectively page-pinning. The matter is that 'they' imply some device
> specific logic to decide which pages must be pinned and such knowledge
> is outside of VFIO.
> 
> From an enabling p.o.v. we could possibly do it in a phased approach: first
> handle devices which tolerate arbitrary DMA faults, and then extend
> to devices with selective faulting. The former is simpler, but with one
> main open: whether we want to maintain such device IDs in a static
> table in VFIO or rely on some hints from other components (e.g. PF
> driver in VF assignment case). Let's see how Alex thinks about it.
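
[A minimal sketch of the static opt-in table option mentioned above, offered
only as an illustration. The struct, the table, both function names, and the
vendor/device IDs are all hypothetical placeholders; nothing here is an
existing VFIO or IOMMU interface. It assumes the IOMMU-side capability
(IOMMU_DEV_FEAT_IOPF) has already been queried and is passed in as a bool.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ID pair identifying a device known to tolerate arbitrary faults. */
struct iopf_dev_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder entries, invented for illustration only. */
static const struct iopf_dev_id iopf_optin_table[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
};

/* Return true if the device is listed as tolerating arbitrary DMA faults. */
static bool dev_tolerates_arbitrary_iopf(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(iopf_optin_table) / sizeof(iopf_optin_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (iopf_optin_table[i].vendor == vendor &&
		    iopf_optin_table[i].device == device)
			return true;
	}
	return false;
}

/*
 * The IOMMU capability alone is not sufficient: the device must also be
 * opted in before dynamic (fault-driven) mapping is enabled for it.
 */
static bool should_enable_iopf(bool iommu_has_iopf,
			       uint16_t vendor, uint16_t device)
{
	return iommu_has_iopf && dev_tolerates_arbitrary_iopf(vendor, device);
}
```

[With this shape, a device that is absent from the table keeps the existing
static pinning behaviour even when the IOMMU reports the IOPF feature. The
alternative of taking hints from another component (e.g. the PF driver) would
replace the table lookup but not the final check.]
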
> 
> >
> > >
> > >> grows here, should mappings expire or do we need a least recently
> > >> mapped tracker to avoid exceeding the user's locked memory limit?  How
> > >> does a user know what to set for a locked memory limit?  The behavior
> > >> here would lead to cases where an idle system might be ok, but as soon
> > >> as load increases with more inflight DMA, we start seeing
> > >> "unpredictable" I/O faults from the user perspective.  Seems like there
> > >> are lots of outstanding considerations and I'd also like to hear from
> > >> the SVA folks about how this meshes with their work.  Thanks,
> > >>
> > >
> > > The main overlap between this feature and SVA is the IOPF reporting
> > > framework, which currently still has gap to support both in nested
> > > mode, as discussed here:
> > >
> > > https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
> > >
> > > Once that gap is resolved in the future, the VFIO fault handler just
> > > adopts different actions according to the fault-level: 1st level faults
> > > are forwarded to userspace thru the vSVA path while 2nd-level faults
> > > are fixed (or warned if not intended) by VFIO itself thru the IOMMU
> > > mapping interface.
> >
> > I understand what you mean is:
> > From the perspective of VFIO, first, we need to set FEAT_IOPF, and then
> > register its own handler with a flag to indicate FLAT or NESTED and which
> > level is concerned, thus the VFIO handler can handle the page faults
> > directly according to the carried level information.
> >
> > Is there any plan for evolving (implementing) the IOMMU driver to support
> > this? Or could we help with this?  :-)
> >
> 
> Yes, it's in plan but just not happened yet. We are still focusing on guest
> SVA part thus only the 1st-level page fault (+Yi/Jacob). It's always welcomed
> to collaborate/help if you have time.

Yeah, I saw that Eric's page fault support patch is listed as a reference.
BTW, one thing needs to be clarified: currently only one IOMMU fault handler
is supported per device. So the fault handler added in this series should be
consolidated with the one added in Eric's series.
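
To illustrate what such a consolidation might look like, here is a minimal
sketch of a single per-device handler that dispatches on the faulting stage,
following the split Kevin described (1st-level faults forwarded to the guest,
2nd-level faults fixed by VFIO). All types and function names are hypothetical
stand-ins, not the kernel's iommu_fault structures or handler signature, and
host_map_page() is only a placeholder for the real pin-and-map path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's fault reporting types. */
enum fault_stage { STAGE_1, STAGE_2 };

struct dev_fault {
	enum fault_stage stage;	/* translation stage that faulted */
	unsigned long addr;	/* faulting I/O address */
};

enum fault_action { FORWARD_TO_GUEST, FIXED_BY_HOST, FAULT_INVALID };

/* Placeholder for the host-side pin-and-map path: fails only for address 0. */
static int host_map_page(unsigned long addr)
{
	return addr != 0 ? 0 : -1;
}

/*
 * The one handler registered for the device. Stage-1 faults belong to the
 * guest-managed tables and are forwarded; stage-2 faults are resolved by
 * the host, or reported as invalid if mapping fails.
 */
static enum fault_action consolidated_fault_handler(const struct dev_fault *f)
{
	assert(f != NULL);

	switch (f->stage) {
	case STAGE_1:
		return FORWARD_TO_GUEST;
	case STAGE_2:
		return host_map_page(f->addr) == 0 ? FIXED_BY_HOST
						   : FAULT_INVALID;
	}
	return FAULT_INVALID;
}
```

This relies on the fault report carrying the stage information, which is the
gap in the IOPF reporting framework mentioned earlier in the thread.
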

Regards,
Yi Liu

> Thanks
> Kevin

Thread overview: 29+ messages
2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
2021-01-29 22:58   ` Alex Williamson
2021-01-30  9:31     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 2/4] vfio: Add a page fault handler Shenming Lu
2021-01-27 17:42   ` Christoph Hellwig
2021-01-28  6:10     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
2021-01-29 22:42   ` Alex Williamson
2021-01-30  9:31     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 4/4] vfio: Allow to pin and map dynamically Shenming Lu
2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
2021-01-30  9:30   ` Shenming Lu
2021-02-01  7:56   ` Tian, Kevin
2021-02-02  6:41     ` Shenming Lu
2021-02-04  6:52       ` Tian, Kevin
2021-02-05 10:37         ` Jean-Philippe Brucker
2021-02-07  8:20           ` Tian, Kevin
2021-02-07 11:47             ` Shenming Lu
2021-02-09 11:06         ` Liu, Yi L [this message]
2021-02-10  8:02           ` Shenming Lu
2021-03-18  7:53         ` Shenming Lu
2021-03-18  9:07           ` Tian, Kevin
2021-03-18 11:53             ` Shenming Lu
2021-03-18 12:32               ` Tian, Kevin
2021-03-18 12:47                 ` Shenming Lu
2021-03-19  0:33               ` Lu Baolu
2021-03-19  1:30                 ` Keqian Zhu
2021-03-20  1:35                   ` Lu Baolu
