All of lore.kernel.org
 help / color / mirror / Atom feed
From: Auger Eric <eric.auger@redhat.com>
To: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: iommu@lists.linux-foundation.org,
	LKML <linux-kernel@vger.kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jean-Philippe Brucker <jean-philippe.brucker@arm.com>,
	Yi L <yi.l.liu@linux.intel.com>, Raj Ashok <ashok.raj@intel.com>,
	Rafael Wysocki <rafael.j.wysocki@intel.com>,
	Liu@mail.linuxfoundation.org, Jean Delvare <khali@linux-fr.org>
Subject: Re: [PATCH v5 10/23] iommu: introduce device fault data
Date: Wed, 26 Sep 2018 12:20:12 +0200	[thread overview]
Message-ID: <1d02837a-d74a-9c34-6211-c8bc4c423672@redhat.com> (raw)
In-Reply-To: <20180921100550.51847a02@jacob-builder>

Hi Jacob,

On 9/21/18 7:05 PM, Jacob Pan wrote:
> On Fri, 21 Sep 2018 12:07:09 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Jacob,
>>
>> On 5/11/18 10:54 PM, Jacob Pan wrote:
>>> Device faults detected by IOMMU can be reported outside IOMMU
>>> subsystem for further processing. This patch intends to provide
>>> a generic device fault data such that device drivers can be
>>> communicated with IOMMU faults without model specific knowledge.
>>>
>>> The proposed format is the result of discussion at:
>>> https://lkml.org/lkml/2017/11/10/291
>>> Part of the code is based on Jean-Philippe Brucker's patchset
>>> (https://patchwork.kernel.org/patch/9989315/).
>>>
>>> The assumption is that model specific IOMMU driver can filter and
>>> handle most of the internal faults if the cause is within IOMMU
>>> driver control. Therefore, the fault reasons can be reported are
>>> grouped and generalized based common specifications such as PCI ATS.
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>>> ---
>>>  include/linux/iommu.h | 101
>>> +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed,
>>> 99 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>>> index e8cadb6..aeadb4f 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -49,13 +49,17 @@ struct bus_type;
>>>  struct device;
>>>  struct iommu_domain;
>>>  struct notifier_block;
>>> +struct iommu_fault_event;
>>>  
>>>  /* iommu fault flags */
>>> -#define IOMMU_FAULT_READ	0x0
>>> -#define IOMMU_FAULT_WRITE	0x1
>>> +#define IOMMU_FAULT_READ		(1 << 0)
>>> +#define IOMMU_FAULT_WRITE		(1 << 1)
>>> +#define IOMMU_FAULT_EXEC		(1 << 2)
>>> +#define IOMMU_FAULT_PRIV		(1 << 3)
>>>  
>>>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>>>  			struct device *, unsigned long, int, void
>>> *); +typedef int (*iommu_dev_fault_handler_t)(struct
>>> iommu_fault_event *, void *); 
>>>  struct iommu_domain_geometry {
>>>  	dma_addr_t aperture_start; /* First address that can be
>>> mapped    */ @@ -264,6 +268,98 @@ struct iommu_device {
>>>  	struct device *dev;
>>>  };
>>>  
>>> +/*  Generic fault types, can be expanded IRQ remapping fault */
>>> +enum iommu_fault_type {
>>> +	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault
>>> */
>>> +	IOMMU_FAULT_PAGE_REQ,		/* page request fault
>>> */ +};  
>>
>> While doing the exercise of mapping the SMMUv3 events to this, I
>> failed to map some event types to iommu_fault_reason enum values.
> I am not surprised :), this list is intended to grow as we add support
> more IOMMU models. I was thinking of these guidelines when adding to
> this list
> - model agnostic
> - needs to be reported outside iommu subsystem
> - per device identifiable
May be worth adding in the kernel doc comments.
device may need to be clarified as end point device?
> 
>>> +
>>> +enum iommu_fault_reason {
>>> +	IOMMU_FAULT_REASON_UNKNOWN = 0,
>>> +
>>> +	/* IOMMU internal error, no specific reason to report out
>>> */
>>> +	IOMMU_FAULT_REASON_INTERNAL,
>>> +
>>> +	/* Could not access the PASID table */
>>> +	IOMMU_FAULT_REASON_PASID_FETCH,  
>>
>> Would it be possible to add
>>  /* could not access the device context (fetch caused external abort)
>> */ IOMMU_FAULT_REASON_DEVICE_CONTEXT_FETCH,
>>
> sounds reasonable.
>>> +
>>> +	/*
>>> +	 * PASID is out of range (e.g. exceeds the maximum PASID
>>> +	 * supported by the IOMMU) or disabled.
>>> +	 */
>>> +	IOMMU_FAULT_REASON_PASID_INVALID,  
>> Would it be possible to add
>> /* source id is out of range */
>> IOMMU_FAULT_REASON_SOURCEID_INVALID,
>>
> hmm, the fault here should be per device. I guess source ID is PCI dev
> requester ID eqivalent. If the source id is invalid, how could it be
> reported to the right device in the vIOMMU? Should it be handled by the
> host IOMMU itself?
hum you're right. The source id is not under control of the guest so
this should not be exposed to the guest.
>> or alike
>> on ARM the sourceid matches the streamid and pasid matches the
>> substreamid.
>>
>> It would be useful to have:
>> /* pasid entry is invalid or has configuration errors */
>> IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
>>
>> /* device context entry is invalid or has configuration errors */
>> IOMMU_FAULT_REASON_BAD_DEVICE_CONTEXT_ENTRY,
>>
>> This typically allows to return information to the guest about fields
>> in device context entry or pasid entry that are incorrect, not
>> matching the physical IOMMU capability
> Sounds good.
>>> +
>>> +	/* Could not access the page directory (Invalid PASID
>>> entry) */
>>> +	IOMMU_FAULT_REASON_PGD_FETCH,  
>> I was unsure about this one. On my side I needed something more
>> general such as:
>> /*
>> * An external abort occurred fetching (or updating) a translation
>> * table descriptor
>> */
>> IOMMU_FAULT_REASON_WALK_EABT,
>>
>>> +
>>> +	/* Could not access the page table entry (Bad address) */
>>> +	IOMMU_FAULT_REASON_PTE_FETCH,  
>> I interpreted this one as the actual translation failure but that's
>> not obvious to me either. Is it a fetch abort or is it that the PTE is
>> marked invalid. Maybe if we have the former we can just have a
>> translation fault reason instead.
> How about these two?
> IOMMU_FAULT_REASON_TRANSL_TBL
as long as you confirm the semantic matches the one proposed for above
IOMMU_FAULT_REASON_WALK_EABT one, that's OK, ie. abort while fetching
one table base address.
> IOMMU_FAULT_REASON_TRANSL
means there is no translation for the input address?
> 
>>> +
>>> +	/* Protection flag check failed */
>>> +	IOMMU_FAULT_REASON_PERMISSION,  
>> On ARM we also have:
>>
>> /* access flag check failed */
>> IOMMU_FAULT_REASON_ACCESS,
>>
>> and
>>
>> /* Output address of a translation stage caused Address Size fault */
>>  IOMMU_FAULT_REASON_OOR_ADDRESS
>>
> is that for nested translation where stage 1 result is invalid for
> stage 2? I am thinking for any misconfiguration, it should be handled
> by host iommu driver locally.
not necessarily. Each level has its own max output size
level 1 has a max intermediate physical address (configured in the PASID
entry)
level 2 has a max PA (configured in the device context entry, host owned
part).
While decoding either level, each time you decode an address in an
intermediate or leaf TT descripror which is outside of the authorized
output range of this level, this triggers the error.

I understand a level 1 address size fault may stem from inconsistency
between guest PASID entry programming (max IPA range) and actual TT content.

For instance PASID entry says you support a 40b IPA output range at
level1. While decoding level 1 PMD you find next table has an IPA beyond
40b, this error gets triggered

Hope it clarifies

Thanks

Eric

>> I am aware all those suggestions do not match the original goal of
>> your series, mostly targeted at SVA support. However in the prospect
>> to make those APIs as generic as possible it may be useful to take
>> those requirements as well.
>>
>> Hope it does not bring extra noise to the topic ;-)
>>
> not at all. appreciate the effort to make it generally useful.
> 
>> Thanks
>>
>> Eric
>>
>>
>>
>>
>>> +};
>>> +
>>> +/**
>>> + * struct iommu_fault_event - Generic per device fault data
>>> + *
>>> + * - PCI and non-PCI devices
>>> + * - Recoverable faults (e.g. page request), information based on
>>> PCI ATS
>>> + * and PASID spec.
>>> + * - Un-recoverable faults of device interest
>>> + * - DMA remapping and IRQ remapping faults
>>> +
>>> + * @type contains fault type.
>>> + * @reason fault reasons if relevant outside IOMMU driver, IOMMU
>>> driver internal
>>> + *         faults are not reported
>>> + * @addr: tells the offending page address
>>> + * @pasid: contains process address space ID, used in shared
>>> virtual memory(SVM)
>>> + * @page_req_group_id: page request group index
>>> + * @last_req: last request in a page request group
>>> + * @pasid_valid: indicates if the PRQ has a valid PASID
>>> + * @prot: page access protection flag, e.g. IOMMU_FAULT_READ,
>>> IOMMU_FAULT_WRITE
>>> + * @device_private: if present, uniquely identify device-specific
>>> + *                  private data for an individual page request.
>>> + * @iommu_private: used by the IOMMU driver for storing
>>> fault-specific
>>> + *                 data. Users should not modify this field before
>>> + *                 sending the fault response.
>>> + */
>>> +struct iommu_fault_event {
>>> +	enum iommu_fault_type type;
>>> +	enum iommu_fault_reason reason;
>>> +	u64 addr;
>>> +	u32 pasid;
>>> +	u32 page_req_group_id;
>>> +	u32 last_req : 1;
>>> +	u32 pasid_valid : 1;
>>> +	u32 prot;
>>> +	u64 device_private;
>>> +	u64 iommu_private;
>>> +};
>>> +
>>> +/**
>>> + * struct iommu_fault_param - per-device IOMMU fault data
>>> + * @dev_fault_handler: Callback function to handle IOMMU faults at
>>> device level
>>> + * @data: handler private data
>>> + *
>>> + */
>>> +struct iommu_fault_param {
>>> +	iommu_dev_fault_handler_t handler;
>>> +	void *data;
>>> +};
>>> +
>>> +/**
>>> + * struct iommu_param - collection of per-device IOMMU data
>>> + *
>>> + * @fault_param: IOMMU detected device fault reporting data
>>> + *
>>> + * TODO: migrate other per device data pointers under
>>> iommu_dev_data, e.g.
>>> + *	struct iommu_group	*iommu_group;
>>> + *	struct iommu_fwspec	*iommu_fwspec;
>>> + */
>>> +struct iommu_param {
>>> +	struct iommu_fault_param *fault_param;
>>> +};
>>> +
>>>  int  iommu_device_register(struct iommu_device *iommu);
>>>  void iommu_device_unregister(struct iommu_device *iommu);
>>>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
>>> @@ -437,6 +533,7 @@ struct iommu_ops {};
>>>  struct iommu_group {};
>>>  struct iommu_fwspec {};
>>>  struct iommu_device {};
>>> +struct iommu_fault_param {};
>>>  
>>>  static inline bool iommu_present(struct bus_type *bus)
>>>  {
>>>   
> 
> [Jacob Pan]
> 

  reply	other threads:[~2018-09-26 10:20 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-11 20:53 [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual Address (SVA) Jacob Pan
2018-05-11 20:53 ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 01/23] iommu: introduce bind_pasid_table API function Jacob Pan
2018-05-11 20:53   ` Jacob Pan
2018-08-23 16:34   ` Auger Eric
2018-08-24 12:47     ` Liu, Yi L
2018-08-24 12:47       ` Liu, Yi L
2018-08-24 13:20       ` Auger Eric
2018-08-28 17:04         ` Jacob Pan
2018-08-24 15:00   ` Auger Eric
2018-08-28  5:14     ` Jacob Pan
2018-08-28  8:34       ` Auger Eric
2018-08-28  8:34         ` Auger Eric
2018-08-28 16:36         ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 02/23] iommu/vt-d: move device_domain_info to header Jacob Pan
2018-05-11 20:53   ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 03/23] iommu/vt-d: add a flag for pasid table bound status Jacob Pan
2018-05-11 20:53   ` Jacob Pan
2018-05-13  7:33   ` Lu Baolu
2018-05-13  7:33     ` Lu Baolu
2018-05-14 18:51     ` Jacob Pan
2018-05-14 18:51       ` Jacob Pan
2018-05-13  8:01   ` Lu Baolu
2018-05-13  8:01     ` Lu Baolu
2018-05-14 18:52     ` Jacob Pan
2018-05-14 18:52       ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 04/23] iommu/vt-d: add bind_pasid_table function Jacob Pan
2018-05-11 20:53   ` Jacob Pan
2018-05-13  9:29   ` Lu Baolu
2018-05-13  9:29     ` Lu Baolu
2018-05-14 20:22     ` Jacob Pan
2018-05-14 20:22       ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 05/23] iommu: introduce iommu invalidate API function Jacob Pan
2018-05-11 20:53   ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 06/23] iommu/vt-d: add definitions for PFSID Jacob Pan
2018-05-11 20:53   ` Jacob Pan
2018-05-14  1:36   ` Lu Baolu
2018-05-14  1:36     ` Lu Baolu
2018-05-14 20:30     ` Jacob Pan
2018-05-14 20:30       ` Jacob Pan
2018-05-11 20:53 ` [PATCH v5 07/23] iommu/vt-d: fix dev iotlb pfsid use Jacob Pan
2018-05-14  1:52   ` Lu Baolu
2018-05-14  1:52     ` Lu Baolu
2018-05-14 20:38     ` Jacob Pan
2018-05-14 20:38       ` Jacob Pan
2018-05-11 20:54 ` [PATCH v5 08/23] iommu/vt-d: support flushing more translation cache types Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-05-14  2:18   ` Lu Baolu
2018-05-14  2:18     ` Lu Baolu
2018-05-14 20:46     ` Jacob Pan
2018-05-14 20:46       ` Jacob Pan
2018-05-17  8:44   ` kbuild test robot
2018-05-17  8:44     ` kbuild test robot
2018-05-11 20:54 ` [PATCH v5 09/23] iommu/vt-d: add svm/sva invalidate function Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-05-14  3:35   ` Lu Baolu
2018-05-14  3:35     ` Lu Baolu
2018-05-14 20:49     ` Jacob Pan
2018-05-11 20:54 ` [PATCH v5 10/23] iommu: introduce device fault data Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-09-21 10:07   ` Auger Eric
2018-09-21 17:05     ` Jacob Pan
2018-09-26 10:20       ` Auger Eric [this message]
2018-05-11 20:54 ` [PATCH v5 11/23] driver core: add per device iommu param Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-05-14  5:27   ` Lu Baolu
2018-05-14  5:27     ` Lu Baolu
2018-05-14 20:52     ` Jacob Pan
2018-05-14 20:52       ` Jacob Pan
2018-05-11 20:54 ` [PATCH v5 12/23] iommu: add a timeout parameter for prq response Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-05-11 20:54 ` [PATCH v5 13/23] iommu: introduce device fault report API Jacob Pan
2018-05-14  6:01   ` Lu Baolu
2018-05-14  6:01     ` Lu Baolu
2018-05-14 20:55     ` Jacob Pan
2018-05-14 20:55       ` Jacob Pan
2018-05-15  6:52       ` Lu Baolu
2018-05-15  6:52         ` Lu Baolu
2018-05-17 11:41   ` Liu, Yi L
2018-05-17 11:41     ` Liu, Yi L
2018-05-17 15:59     ` Jacob Pan
2018-05-17 15:59       ` Jacob Pan
2018-05-17 23:22       ` Liu, Yi L
2018-05-21 23:03         ` Jacob Pan
2018-09-06  9:25   ` Auger Eric
2018-09-06 12:42     ` Jean-Philippe Brucker
2018-09-06 13:14       ` Auger Eric
2018-09-06 17:06         ` Jean-Philippe Brucker
2018-09-06 17:06           ` Jean-Philippe Brucker
2018-09-07  7:11           ` Auger Eric
2018-09-07 11:23             ` Jean-Philippe Brucker
2018-09-07 11:23               ` Jean-Philippe Brucker
2018-09-14 13:24   ` Auger Eric
2018-09-17 16:57     ` Jacob Pan
2018-09-25 14:58   ` Jean-Philippe Brucker
2018-09-25 14:58     ` Jean-Philippe Brucker
2018-09-25 22:17     ` Jacob Pan
2018-09-26 10:14       ` Jean-Philippe Brucker
2018-05-11 20:54 ` [PATCH v5 14/23] iommu: introduce page response function Jacob Pan
2018-05-14  6:39   ` Lu Baolu
2018-05-14  6:39     ` Lu Baolu
2018-05-29 16:13     ` Jacob Pan
2018-05-29 16:13       ` Jacob Pan
2018-09-10 14:52   ` Auger Eric
2018-09-10 17:50     ` Jacob Pan
2018-09-10 19:06       ` Auger Eric
2018-09-10 19:06         ` Auger Eric
2018-05-11 20:54 ` [PATCH v5 15/23] iommu: handle page response timeout Jacob Pan
2018-05-14  7:43   ` Lu Baolu
2018-05-14  7:43     ` Lu Baolu
2018-05-29 16:20     ` Jacob Pan
2018-05-30  7:46       ` Lu Baolu
2018-05-11 20:54 ` [PATCH v5 16/23] iommu/config: add build dependency for dmar Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-05-11 20:54 ` [PATCH v5 17/23] iommu/vt-d: report non-recoverable faults to device Jacob Pan
2018-05-11 20:54   ` Jacob Pan
2018-05-14  8:17   ` Lu Baolu
2018-05-14  8:17     ` Lu Baolu
2018-05-29 17:33     ` Jacob Pan
2018-05-29 17:33       ` Jacob Pan
2018-05-11 20:54 ` [PATCH v5 18/23] iommu/intel-svm: report device page request Jacob Pan
2018-05-11 20:54 ` [PATCH v5 19/23] iommu/intel-svm: replace dev ops with fault report API Jacob Pan
2018-05-11 20:54 ` [PATCH v5 20/23] iommu/intel-svm: do not flush iotlb for viommu Jacob Pan
2018-05-11 20:54 ` [PATCH v5 21/23] iommu/vt-d: add intel iommu page response function Jacob Pan
2018-05-11 20:54 ` [PATCH v5 22/23] trace/iommu: add sva trace events Jacob Pan
2018-05-11 20:54 ` [PATCH v5 23/23] iommu: use sva invalidate and device fault trace event Jacob Pan
2018-05-29 15:54 ` [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual Address (SVA) Jacob Pan
2018-05-29 15:54   ` Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1d02837a-d74a-9c34-6211-c8bc4c423672@redhat.com \
    --to=eric.auger@redhat.com \
    --cc=Liu@mail.linuxfoundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=dwmw2@infradead.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=khali@linux-fr.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=yi.l.liu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.