From: Shenming Lu <lushenming@huawei.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
	Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Eric Auger <eric.auger@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	"wanghaibin.wang@huawei.com" <wanghaibin.wang@huawei.com>,
	"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>
Subject: Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
Date: Sun, 7 Feb 2021 19:47:18 +0800
Message-ID: <722bf67a-0f59-6335-77ab-095e971a6355@huawei.com>
In-Reply-To: <MWHPR11MB1886D07D927A77DCB2FF74D48CB09@MWHPR11MB1886.namprd11.prod.outlook.com>

On 2021/2/7 16:20, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> Sent: Friday, February 5, 2021 6:37 PM
>>
>> Hi,
>>
>> On Thu, Feb 04, 2021 at 06:52:10AM +0000, Tian, Kevin wrote:
>>>>>>> The static pinning and mapping problem in VFIO and possible
>>>>>>> solutions have been discussed a lot [1, 2]. One of the solutions is
>>>>>>> to add I/O page fault support for VFIO devices. Different from those
>>>>>>> relatively complicated software approaches such as presenting a
>>>>>>> vIOMMU that provides the DMA buffer information (might include
>>>>>>> para-virtualized optimizations),
>>
>> I'm curious about the performance difference between this and the
>> map/unmap vIOMMU, as well as the coIOMMU. This is probably a lot faster,
>> but those don't depend on IOPF, which is a pretty rare feature at the
>> moment.

Yeah, I will give the performance data later.

>>
>> [...]
>>>>> In reality, many
>>>>> devices allow I/O faulting only in selective contexts. However, there
>>>>> is no standard way (e.g. PCISIG) for the device to report whether
>>>>> arbitrary I/O faults are allowed. Then we may have to maintain device-
>>>>> specific knowledge in software, e.g. in an opt-in table listing devices
>>>>> which allow arbitrary faults. For devices which only support selective
>>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>>>> or an mdev wrapper) might be necessary to help lock down non-faultable
>>>>> mappings and then enable faulting on the remaining mappings.
>>>>
>>>> For devices which only support selective faulting, couldn't they tell
>>>> the IOMMU driver and let it filter out non-faultable faults? Or did I
>>>> get it wrong?
>>>
>>> Not exactly to the IOMMU driver. There is already a vfio_pin_pages() for
>>> selective page-pinning. The matter is that 'they' implies some device-
>>> specific logic to decide which pages must be pinned, and such knowledge
>>> is outside of VFIO.
>>>
>>> From an enabling p.o.v. we could possibly do it in a phased approach:
>>> first handle devices which tolerate arbitrary DMA faults, then extend
>>> to devices with selective faulting. The former is simpler, but with one
>>> main open question: whether we want to maintain such device IDs in a
>>> static table in VFIO or rely on some hints from other components (e.g.
>>> the PF driver in the VF assignment case). Let's see what Alex thinks
>>> about it.
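
To make the selective-faulting case concrete, below is a minimal sketch of
what such a mediator could do with the existing vfio_pin_pages() interface:
pin the ranges the device cannot fault on up front and leave everything else
to IOPF. The dma_range structure, the PIN_BATCH size and mediator_lock_down()
are all hypothetical; only vfio_pin_pages() itself is the existing API.

#include <linux/device.h>
#include <linux/iommu.h>
#include <linux/kernel.h>
#include <linux/vfio.h>

#define PIN_BATCH 32	/* arbitrary batch size, for illustration only */

/* hypothetical: an IOVA range the device cannot tolerate faults on */
struct dma_range {
	unsigned long iova_pfn;
	unsigned long npages;
};

/*
 * Pin down the non-faultable ranges (e.g. descriptor rings); everything
 * else stays unpinned and is handled on demand through IOPF.
 */
static int mediator_lock_down(struct device *dev,
			      const struct dma_range *ranges, int nr)
{
	unsigned long user_pfn[PIN_BATCH], phys_pfn[PIN_BATCH];
	int i, j, n, ret;

	for (i = 0; i < nr; i++) {
		unsigned long done = 0;

		while (done < ranges[i].npages) {
			n = min_t(unsigned long, PIN_BATCH,
				  ranges[i].npages - done);
			for (j = 0; j < n; j++)
				user_pfn[j] = ranges[i].iova_pfn + done + j;

			ret = vfio_pin_pages(dev, user_pfn, n,
					     IOMMU_READ | IOMMU_WRITE,
					     phys_pfn);
			if (ret != n)
				return ret < 0 ? ret : -EFAULT;
			done += n;
		}
	}
	return 0;
}
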
>>
>> Do you think selective-faulting will be the norm, or only a problem for
>> initial IOPF implementations?  To me it's the selective-faulting kind of
>> device that will be the odd one out, but that's pure speculation. Either
>> way maintaining a device list seems like a pain.
> 
> I would think it will be the norm for quite some time (e.g. multiple
> years), as from what I have learned, turning a complex accelerator into
> an implementation that tolerates arbitrary DMA faults is very complex
> (in every critical path) and not cost-effective (tracking in-flight
> requests). It might be OK for some purpose-built devices in specific
> usages, but for most it has to be an evolving path toward the
> 100%-faultable goal...
> 
>>
>> [...]
>>> Yes, it's in the plan but just hasn't happened yet. We are still
>>> focusing on the guest SVA part, thus only the 1st-level page fault
>>> (+Yi/Jacob). You are always welcome to collaborate/help if you have
>>> time. 😊
>>
>> By the way, the current fault report API is missing a way to invalidate
>> partial faults: when the IOMMU device's PRI queue overflows, it may
>> auto-respond to page request groups that were already partially reported
>> by the IOMMU driver. Upon detecting an overflow, the IOMMU driver needs
>> to tell all fault consumers to discard their partial groups.
>> iopf_queue_discard_partial() [1] does this for the internal IOPF handler,
>> but we have nothing for the lower-level fault handler at the moment. And
>> it gets more complicated when injecting IOPFs to guests: we'd need a
>> mechanism to recall partial groups all the way through kernel->userspace
>> and userspace->guest.
> 
> I don't know how to recall partial groups through emulated vIOMMUs
> (at least for virtual VT-d). Possibly it could be supported by
> virtio-iommu. But in any case I consider it more an optimization than a
> functional requirement (and it could be avoided with Shenming's
> suggestion below).
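
For reference, below is a rough sketch of what discarding partial groups
means for a lower-level fault consumer. The partial_group bookkeeping and
both functions are made up (a real consumer would also accumulate the
request addresses); only struct iommu_fault_page_request and its LAST_PAGE
flag come from the existing fault report API.

#include <linux/iommu.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* hypothetical bookkeeping for groups we have only seen part of */
struct partial_group {
	struct list_head node;
	u32 pasid;
	u32 grpid;
};

static LIST_HEAD(partial_groups);
static DEFINE_SPINLOCK(partial_lock);

/* Returns true once the request flagged "last in group" arrives. */
static bool consumer_handle_prq(struct iommu_fault_page_request *prm)
{
	struct partial_group *grp;

	if (prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)
		return true;	/* complete group, safe to respond to */

	grp = kzalloc(sizeof(*grp), GFP_ATOMIC);
	if (grp) {
		grp->pasid = prm->pasid;
		grp->grpid = prm->grpid;
		spin_lock(&partial_lock);
		list_add(&grp->node, &partial_groups);
		spin_unlock(&partial_lock);
	}
	return false;		/* wait for the rest of the group */
}

/*
 * Called on PRI queue overflow: the hardware may have auto-responded to
 * groups we only saw part of, so drop the partial state rather than
 * responding to those groups later.
 */
static void consumer_discard_partial(void)
{
	struct partial_group *grp, *next;

	spin_lock(&partial_lock);
	list_for_each_entry_safe(grp, next, &partial_groups, node) {
		list_del(&grp->node);
		kfree(grp);
	}
	spin_unlock(&partial_lock);
}
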
> 
>>
>> Shenming suggests [2] to also use the IOPF handler for IOPFs managed by
>> device drivers. It's worth considering in my opinion, because we could
>> hold partial groups within the kernel and only report full groups to
>> device drivers (and guests). In addition we'd consolidate tracking of
>> IOPFs, since it's currently done both by iommu_report_device_fault()
>> and the IOPF handler.
> 
> I also think it's the right thing to do. Conceptually, w/ or w/o
> DEV_FEAT_IOPF just reflects how IOPFs are delivered to the system
> software. In the end IOPFs are all about permission violations against
> the IOMMU page tables, so we should try to reuse/consolidate the IOMMU
> fault reporting stack as much as possible.
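
As far as I understand [1], that delivery split shows up at enable time: a
driver that wants the IOPF-handler path turns on IOMMU_DEV_FEAT_IOPF first
and IOMMU_DEV_FEAT_SVA on top of it. A minimal sketch (the wrapper function
is made up; error handling trimmed):

#include <linux/device.h>
#include <linux/iommu.h>

/* hypothetical wrapper showing the opt-in order per [1] */
static int example_enable_iopf_then_sva(struct device *dev)
{
	int ret;

	/* the IOPF capability must be enabled before SVA can rely on it */
	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
	if (ret)
		return ret;

	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
	if (ret)
		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
	return ret;
}
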
> 
>>
>> Note that I plan to upstream the IOPF patch [1] as is because it was
>> already in good shape for 5.12, and consolidating the fault handler will
>> require some thinking.
> 
> This plan makes sense.

Yeah, maybe this problem could be left to the implementation of a
device (VFIO)-specific fault handler... :-)
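
For the record, a rough shape such a VFIO-specific handler could take with
the current fault report API is sketched below. The handler body is a
placeholder (the actual pinning and mapping of the faulting page is elided);
iommu_register_device_fault_handler() and iommu_page_response() are the
existing interfaces.

#include <linux/device.h>
#include <linux/iommu.h>

/*
 * Placeholder handler: a real implementation would pin and map the page
 * at fault->prm.addr, then report IOMMU_PAGE_RESP_SUCCESS.
 */
static int vfio_iommu_fault(struct iommu_fault *fault, void *data)
{
	struct device *dev = data;
	struct iommu_page_response resp = {
		.argsz   = sizeof(resp),
		.version = IOMMU_PAGE_RESP_VERSION_1,
		.pasid   = fault->prm.pasid,
		.grpid   = fault->prm.grpid,
		.code    = IOMMU_PAGE_RESP_INVALID,
	};

	if (fault->type != IOMMU_FAULT_PAGE_REQ)
		return -EOPNOTSUPP;

	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
		resp.flags = IOMMU_PAGE_RESP_PASID_VALID;

	/* ... handle the fault here and set resp.code on success ... */

	return iommu_page_response(dev, &resp);
}

/* registration, e.g. when the device is opened for passthrough */
static int vfio_register_fault_handler(struct device *dev)
{
	return iommu_register_device_fault_handler(dev, vfio_iommu_fault,
						   dev);
}
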

Thanks,
Shenming

> 
>>
>> Thanks,
>> Jean
>>
>>
>> [1] https://lore.kernel.org/linux-iommu/20210127154322.3959196-7-jean-philippe@linaro.org/
>> [2] https://lore.kernel.org/linux-iommu/f79f06be-e46b-a65a-3951-3e7dbfa66b4a@huawei.com/
> 
> Thanks
> Kevin
> 
