linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
To: "Liu, Yi L" <yi.l.liu@intel.com>,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	David Woodhouse <dwmw2@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>
Cc: "Lan, Tianyu" <tianyu.lan@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 08/16] iommu: introduce device fault data
Date: Mon, 6 Nov 2017 19:01:54 +0000	[thread overview]
Message-ID: <09d451dc-c0e9-1fa2-8f85-45a9b1185d48@arm.com> (raw)
In-Reply-To: <A2975661238FB949B60364EF0F2C257439AFC86D@SHSMSX104.ccr.corp.intel.com>

Hi Yi,

Sorry for the late reply, I seem to have missed this.

On 20/10/17 11:07, Liu, Yi L wrote:
[...]
>>> +
>>> +/*  Generic fault types, can be expanded IRQ remapping fault */ enum
>>> +iommu_fault_type {
>>> +	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
>>> +	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
>>> +};
>>> +
>>> +enum iommu_fault_reason {
>>> +	IOMMU_FAULT_REASON_CTX = 1,
>>
>> If I read the VT-d spec right, this is a fault encountered while fetching the PASID table
>> pointer?
>>
>>> +	IOMMU_FAULT_REASON_ACCESS,
>>
>> And this a pgd or pte access fault?
>>
>>> +	IOMMU_FAULT_REASON_INVALIDATE,
>>
>> What would this be?
>>
>>> +	IOMMU_FAULT_REASON_UNKNOWN,
>>> +};
>>
>> I'm currently doing the same exploratory work for virtio-iommu, and I'd be tempted
>> to report reasons as detailed as possible to guest or device driver, but it's not clear
>> what they need, how they would use this information. I'd like to discuss this some
>> more.
> 
> [Liu, Yi L] In fact, it's not necessary to pass the detailed unrecoverable fault to guest in
> virtualization case. Unrecoverable fault happened on native indicates fault during native
> IOMMU address translation. If the fault is not due to guest IOMMU page table setting,
> then it is not necessary to inject the fault to guest. And hypervisor should be able to
> deduce it by walking the guest IOMMU page table with the fault address.

I'm not sure the hypervisor should go and inspect the guest's page tables.
The pIOMMU already did the walk and reported the fault, so the hypervisor
knows that they are invalid. I thought VT-d and other pIOMMUs provide
enough information in the fault report to tell if the error was due to
invalid page tables?

> So I think for
> virtualization case, pass the fault address is enough. If hypervisor doesn't see any issue
> after checking the guest IOMMU translation hierarchy, no use to let guest know it. Hypervisor
> can either throw error log or stop the guest. If hypervisor see any error in the guest
> iommu translation hierarchy, then inject the error to guest with a proper fault type.>
> But for device driver or other user-space driver, I'm not sure if they need detailed fault
> info. In fact, it is enough to pass the possible info which would help them to deduce whether
> the unrecoverable fault is due to them. This need more inputs from device driver reviewers.

Agreed, though I'm not sure how to reach them.

At the moment, the only users of report_iommu_fault, the existing fault
reporting mechanism, are ARM-based IOMMU drivers and there are only four
device drivers that register a handler with iommu_set_fault_handler. Two
of them simply print the fault, one resets the offending device, and the
last one (msm GPU) wants to provide more detailed debugging information
about the device state.

>> For unrecoverable faults I guess CTX means "the host IOMMU driver is broken", since
>> the device tables are invalid. In which case there is no use continuing, trying to
>> shutdown the device cleanly is really all the guest/device driver can do.
> 
> [Liu, Yi L] Not sure about what device table mean here. But I agree that if host IOMMU
> driver has no valid CTX for the device, then this kind of error should result in a shutdown to
> the device.

Yes by device table I meant VT-d's root table and context tables.
>> For ACCESS the error is the device driver's or guest's fault, since the device driver
>> triggered DMA on unmapped buffers, or the guest didn't install the right page tables.
>> This can be repaired without shutting down, it may even just be one execution
>> stream that failed in the device while the others continued normally. It's not as
>> recoverable as a PRI Page Request, but the device driver may still be able to isolate
>> the problem (e.g. by killing the process responsible) and the device to recover from it.
>>
>> So maybe ACCESS would benefit from more details, for example differentiating
>> faults encountered while fetching the pgd from those encountered while fetching a
>> second-level table or pte. The former is a lot less recoverable than the latter (bug in
>> the guest IOMMU driver vs.
>> bug in the device driver).
>>
>> Generalizing this maybe we should differentiate each step of the translation in
>> fault_reason:
>>
>> * Device entry (context) fetch -> host IOMMU driver's fault
>> * PASID table fetch -> guest IOMMU driver or host userspace's fault
>> * pgd fetch -> guest IOMMU driver's fault
>> * pte fetch, including validity and access check -> device driver's fault
> 
> [Liu, Yi L] It's a good summary here. BTW. why pte fetch is due to device driver's fault?

Mmh, not necessarily the device driver's fault, but the most likely cause
is that the device driver didn't call map() before triggering the DMA.
Another less likely cause is a programming error in the vIOMMU driver,
where it failed to perform the map() or populate the page tables properly.

Thanks,
Jean

  reply	other threads:[~2017-11-06 18:59 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-05 23:03 [PATCH v2 00/16] IOMMU driver support for SVM virtualization Jacob Pan
2017-10-05 23:03 ` [PATCH v2 01/16] iommu: introduce bind_pasid_table API function Jacob Pan
2017-10-10 13:14   ` Joerg Roedel
2017-10-10 21:32     ` Jacob Pan
2017-10-10 16:45   ` Jean-Philippe Brucker
2017-10-10 21:42     ` Jacob Pan
2017-10-11  9:17       ` Jean-Philippe Brucker
2017-10-05 23:03 ` [PATCH v2 02/16] iommu/vt-d: add bind_pasid_table function Jacob Pan
2017-10-10 13:21   ` Joerg Roedel
2017-10-12 11:12   ` Liu, Yi L
2017-10-12 17:38     ` Jacob Pan
2017-10-05 23:03 ` [PATCH v2 03/16] iommu: introduce iommu invalidate API function Jacob Pan
2017-10-10 13:35   ` Joerg Roedel
2017-10-10 22:09     ` Jacob Pan
2017-10-11  7:54       ` Liu, Yi L
2017-10-11  9:51         ` Joerg Roedel
2017-10-11 11:54           ` Liu, Yi L
2017-10-11 12:15             ` Joerg Roedel
2017-10-11 12:48               ` Jean-Philippe Brucker
2017-10-12  7:43                 ` Joerg Roedel
2017-10-12  9:38                 ` Bob Liu
2017-10-12  9:50                   ` Liu, Yi L
2017-10-12 10:07                     ` Bob Liu
2017-10-12 10:26                       ` Jean-Philippe Brucker
2017-10-12 10:33                       ` Liu, Yi L
2017-10-05 23:03 ` [PATCH v2 04/16] iommu/vt-d: support flushing more TLB types Jacob Pan
2017-10-26 13:02   ` [v2,04/16] " Lukoshkov, Maksim
2017-10-31 20:39     ` Jacob Pan
2017-10-05 23:03 ` [PATCH v2 05/16] iommu/vt-d: add iommu invalidate function Jacob Pan
2017-10-05 23:03 ` [PATCH v2 06/16] iommu/vt-d: move device_domain_info to header Jacob Pan
2017-10-05 23:03 ` [PATCH v2 07/16] iommu/vt-d: assign PFSID in device TLB invalidation Jacob Pan
2017-10-05 23:03 ` [PATCH v2 08/16] iommu: introduce device fault data Jacob Pan
2017-10-10 19:29   ` Jean-Philippe Brucker
2017-10-10 21:43     ` Jacob Pan
2017-10-20 10:07     ` Liu, Yi L
2017-11-06 19:01       ` Jean-Philippe Brucker [this message]
2017-11-07  8:40         ` Liu, Yi L
2017-11-07 11:38           ` Jean-Philippe Brucker
2017-11-09 19:36             ` Jacob Pan
2017-11-10 13:54               ` Jean-Philippe Brucker
2017-11-10 22:18                 ` Jacob Pan
2017-11-13 13:06                   ` Jean-Philippe Brucker
2017-11-13 16:57                     ` Jacob Pan
2017-11-13 17:23                       ` Jean-Philippe Brucker
2017-11-11  0:00                 ` Jacob Pan
2017-11-13 13:19                   ` Jean-Philippe Brucker
2017-11-13 16:12                     ` Jacob Pan
2017-10-05 23:03 ` [PATCH v2 09/16] driver core: add iommu device fault reporting data Jacob Pan
2017-10-06  5:43   ` Greg Kroah-Hartman
2017-10-06  7:11   ` Christoph Hellwig
2017-10-06  8:26     ` Greg Kroah-Hartman
2017-10-06  8:39     ` Joerg Roedel
2017-10-06 16:22       ` Jacob Pan
2017-10-05 23:03 ` [PATCH v2 10/16] iommu: introduce device fault report API Jacob Pan
2017-10-06  9:36   ` Jean-Philippe Brucker
2017-10-09 18:50     ` Jacob Pan
2017-10-10 13:40   ` Joerg Roedel
2017-10-11 17:21     ` Jacob Pan
2017-10-05 23:03 ` [PATCH v2 11/16] iommu/vt-d: use threaded irq for dmar_fault Jacob Pan
2017-10-05 23:03 ` [PATCH v2 12/16] iommu/vt-d: report unrecoverable device faults Jacob Pan
2017-10-05 23:03 ` [PATCH v2 13/16] iommu/intel-svm: notify page request to guest Jacob Pan
2017-10-05 23:03 ` [PATCH v2 14/16] iommu/intel-svm: replace dev ops with fault report API Jacob Pan
2017-10-05 23:03 ` [PATCH v2 15/16] iommu: introduce page response function Jacob Pan
2017-10-05 23:03 ` [PATCH v2 16/16] iommu/vt-d: add intel iommu " Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=09d451dc-c0e9-1fa2-8f85-45a9b1185d48@arm.com \
    --to=jean-philippe.brucker@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=dwmw2@infradead.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@linux.intel.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=tianyu.lan@intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).