linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>,
	"Tian, Kevin" <kevin.tian@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"vivek.gautam@arm.com" <vivek.gautam@arm.com>,
	"guohanjun@huawei.com" <guohanjun@huawei.com>,
	"will@kernel.org" <will@kernel.org>,
	"lorenzo.pieralisi@arm.com" <lorenzo.pieralisi@arm.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	Zhou Wang <wangzhou1@hisilicon.com>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"zhangfei.gao@linaro.org" <zhangfei.gao@linaro.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"devicetree@vger.kernel.org" <devicetree@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"robh+dt@kernel.org" <robh+dt@kernel.org>,
	"Jonathan.Cameron@huawei.com" <Jonathan.Cameron@huawei.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	David Woodhouse <dwmw2@infradead.org>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"shameerali.kolothum.thodi@huawei.com"
	<shameerali.kolothum.thodi@huawei.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"sudeep.holla@arm.com" <sudeep.holla@arm.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"linux-accelerators@lists.ozlabs.org"
	<linux-accelerators@lists.ozlabs.org>,
	baolu.lu@linux.intel.com
Subject: Re: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA
Date: Wed, 20 Jan 2021 09:57:42 +0800	[thread overview]
Message-ID: <5e563a0b-e853-bb72-2cec-13c94ab4d554@linux.intel.com> (raw)
In-Reply-To: <YAaxjmJW+ZMvrhac@myrica>

Hi Jean,

On 1/19/21 6:16 PM, Jean-Philippe Brucker wrote:
> On Mon, Jan 18, 2021 at 06:54:28AM +0000, Tian, Kevin wrote:
>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>> Sent: Saturday, January 16, 2021 11:54 AM
>>>
>>> Hi Jean,
>>>
>>> On 2021/1/15 0:41, Jean-Philippe Brucker wrote:
>>>> I guess detailing what's needed for nested IOPF can help the discussion,
>>>> although I haven't seen any concrete plan about implementing it, and it
>>>> still seems a couple of years away. There are two important steps with
>>>> nested IOPF:
>>>>
>>>> (1) Figuring out whether a fault comes from L1 or L2. A SMMU stall event
>>>>       comes with this information, but a PRI page request doesn't. The
>>> IOMMU
>>>>       driver has to first translate the IOVA to a GPA, injecting the fault
>>>>       into the guest if this translation fails by using the usual
>>>>       iommu_report_device_fault().
>>
>> The IOMMU driver can walk the page tables to find out the level information.
>> If the walk terminates at the 1st level, inject to the guest. Otherwise fix the
>> mm fault at 2nd level. It's not efficient compared to hardware-provided info,
>> but it's doable and actual overhead needs to be measured (optimization exists
>> e.g. having fault client to hint no 2nd level fault expected when registering fault
>> handler in pinned case).
>>
>>>>
>>>> (2) Translating the faulting GPA to a HVA that can be fed to
>>>>       handle_mm_fault(). That requires help from KVM, so another interface -
>>>>       either KVM registering GPA->HVA translation tables or IOMMU driver
>>>>       querying each translation. Either way it should be reusable by device
>>>>       drivers that implement IOPF themselves.
>>
>> Or just leave to the fault client (say VFIO here) to figure it out. VFIO has the
>> information about GPA->HPA and can then call handle_mm_fault to fix the
>> received fault. The IOMMU driver just exports an interface for the device drivers
>> which implement IOPF themselves to report a fault which is then handled by
>> the IOMMU core by reusing the same faulting path.
>>
>>>>
>>>> (1) could be enabled with iommu_dev_enable_feature(). (2) requires a
>>> more
>>>> complex interface. (2) alone might also be desirable - demand-paging for
>>>> level 2 only, no SVA for level 1.
>>
>> Yes, this is what we want to point out. A general FEAT_IOPF implies more than
>> what this patch intended to address.
>>
>>>>
>>>> Anyway, back to this patch. What I'm trying to convey is "can the IOMMU
>>>> receive incoming I/O page faults for this device and, when SVA is enabled,
>>>> feed them to the mm subsystem?  Enable that or return an error." I'm stuck
>>>> on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1
>>>> is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2)
>>>> above. IOMMU_DEV_FEAT_IOPF_FLAT?
>>>>
>>>> That leaves space for the nested extensions. (1) above could be
>>>> IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with
>>> KVM (or
>>>> just an external fault handler) and could be used with either IOPF_FLAT or
>>>> IOPF_NESTED. We can figure out the details later. What do you think?
>>>
>>> I agree that we can define IOPF_ for current usage and leave space for
>>> future extensions.
>>>
>>> IOPF_FLAT represents IOPF on first-level translation, currently first
>>> level translation could be used in below cases.
>>>
>>> 1) FL w/ internal Page Table: Kernel IOVA;
>>> 2) FL w/ external Page Table: VFIO passthrough;
>>> 3) FL w/ shared CPU page table: SVA
>>>
>>> We don't need to support IOPF for case 1). Let's put it aside.
>>>
>>> IOPF handling of 2) and 3) are different. Do we need to define different
>>> names to distinguish these two cases?
>>>
>>
>> Defining feature names according to various use cases does not sound a
>> clean way. In an ideal way we should have just a general FEAT_IOPF since
>> the hardware (at least VT-d) does support fault in either 1st-level, 2nd-
>> level or nested configurations. We are entering this trouble just because
>> there is difficulty for the software evolving to enable full hardware cap
>> in one batch. My last proposal was sort of keeping FEAT_IOPF as a general
>> capability for whether delivering fault through the IOMMU or the ad-hoc
>> device, and then having a separate interface for whether IOPF reporting
>> is available under a specific configuration. The former is about the path
>> between the IOMMU and the device, while the latter is about the interface
>> between the IOMMU driver and its faulting client.
>>
>> The reporting capability can be checked when the fault client is registering
>> its fault handler, and at this time the IOMMU driver knows how the related
>> mapping is configured (1st, 2nd, or nested) and whether fault reporting is
>> supported in such configuration. We may introduce IOPF_REPORT_FLAT and
>> IOPF_REPORT_NESTED respectively. while IOPF_REPORT_FLAT detection is
>> straightforward (2 and 3 can be differentiated internally based on configured
>> level), IOPF_REPORT_NESTED needs additional info to indicate which level is
>> concerned since the vendor driver may not support fault reporting in both
>> levels or the fault client may be interested in only one level (e.g. with 2nd
>> level pinned).
> 
> I agree with this plan (provided I understood it correctly this time):
> have IOMMU_DEV_FEAT_IOPF describing the IOPF interface between device and
> IOMMU. Enabling it on its own doesn't do anything visible to the driver,
> it just probes for capabilities and enables PRI if necessary. For host
> SVA, since there is no additional communication between IOMMU and device
> driver, enabling IOMMU_DEV_FEAT_SVA in addition to IOPF is sufficient.
> Then when implementing nested we'll extend iommu_register_fault_handler()
> with flags and parameters. That will also enable advanced dispatching (1).
> 
> Will it be necessary to enable FEAT_IOPF when doing VFIO passthrough
> (injecting to the guest or handling it with external page tables)?
> I think that would be better. Currently a device driver registering a
> fault handler doesn't know if it will get recoverable page faults or only
> unrecoverable ones.
> 
> So I don't think this patch needs any change. Baolu, are you ok with
> keeping this and patch 4?

It sounds good to me. Keep FEAT_IOPF as the IOMMU capability of
generating I/O page fault and differentiate different I/O page faults by
extending the fault handler register interface.

> 
> Thanks,
> Jean
> 

Best regards,
baolu

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-01-20  2:10 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08 14:52 [PATCH v9 00/10] iommu: I/O page faults for SMMUv3 Jean-Philippe Brucker
2021-01-08 14:52 ` [PATCH v9 01/10] iommu: Remove obsolete comment Jean-Philippe Brucker
2021-01-19 11:11   ` Jonathan Cameron
2021-01-20 17:41     ` Jean-Philippe Brucker
2021-01-08 14:52 ` [PATCH v9 02/10] iommu/arm-smmu-v3: Use device properties for pasid-num-bits Jean-Philippe Brucker
2021-01-19 11:22   ` Jonathan Cameron
2021-01-08 14:52 ` [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA Jean-Philippe Brucker
2021-01-12  4:31   ` Lu Baolu
2021-01-12  9:16     ` Jean-Philippe Brucker
2021-01-13  2:49       ` Lu Baolu
2021-01-13  8:10         ` Tian, Kevin
2021-01-14 16:41           ` Jean-Philippe Brucker
2021-01-16  3:54             ` Lu Baolu
2021-01-18  6:54               ` Tian, Kevin
2021-01-19 10:16                 ` Jean-Philippe Brucker
2021-01-20  1:57                   ` Lu Baolu [this message]
2021-01-08 14:52 ` [PATCH v9 04/10] iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF Jean-Philippe Brucker
2021-01-08 14:52 ` [PATCH v9 05/10] uacce: Enable IOMMU_DEV_FEAT_IOPF Jean-Philippe Brucker
2021-01-11  3:29   ` Zhangfei Gao
2021-01-19 12:27   ` Jonathan Cameron
2021-01-20 17:42     ` Jean-Philippe Brucker
2021-01-20 20:47   ` Dave Jiang
2021-01-22 11:53     ` Zhou Wang
2021-01-22 15:43       ` Dave Jiang
2021-01-08 14:52 ` [PATCH v9 06/10] iommu: Add a page fault handler Jean-Philippe Brucker
2021-01-19 13:38   ` Jonathan Cameron
2021-01-20 17:43     ` Jean-Philippe Brucker
2021-01-08 14:52 ` [PATCH v9 07/10] iommu/arm-smmu-v3: Maintain a SID->device structure Jean-Philippe Brucker
2021-01-19 13:51   ` Jonathan Cameron
2021-01-08 14:52 ` [PATCH v9 08/10] dt-bindings: document stall property for IOMMU masters Jean-Philippe Brucker
2021-01-08 14:52 ` [PATCH v9 09/10] ACPI/IORT: Enable stall support for platform devices Jean-Philippe Brucker
2021-01-08 14:52 ` [PATCH v9 10/10] iommu/arm-smmu-v3: Add " Jean-Philippe Brucker
2021-01-19 17:28   ` Robin Murphy
2021-01-20 17:55     ` Jean-Philippe Brucker
2021-01-11  3:26 ` [PATCH v9 00/10] iommu: I/O page faults for SMMUv3 Zhangfei Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e563a0b-e853-bb72-2cec-13c94ab4d554@linux.intel.com \
    --to=baolu.lu@linux.intel.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=arnd@arndb.de \
    --cc=devicetree@vger.kernel.org \
    --cc=dwmw2@infradead.org \
    --cc=eric.auger@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=guohanjun@huawei.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe@linaro.org \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-accelerators@lists.ozlabs.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=rjw@rjwysocki.net \
    --cc=robh+dt@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=shameerali.kolothum.thodi@huawei.com \
    --cc=sudeep.holla@arm.com \
    --cc=vivek.gautam@arm.com \
    --cc=wangzhou1@hisilicon.com \
    --cc=will@kernel.org \
    --cc=zhangfei.gao@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).