KVM ARM Archive on lore.kernel.org
 help / color / Atom feed
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Cc: "kevin.tian@intel.com" <kevin.tian@intel.com>,
	Vincent Stehle <Vincent.Stehle@arm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"ashok.raj@intel.com" <ashok.raj@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Marc Zyngier <Marc.Zyngier@arm.com>,
	jacob.jun.pan@linux.intel.com, Will Deacon <Will.Deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	Robin Murphy <Robin.Murphy@arm.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"eric.auger.pro@gmail.com" <eric.auger.pro@gmail.com>
Subject: Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler
Date: Wed, 12 Jun 2019 11:53:58 -0700
Message-ID: <20190612115358.0d90b322@jacob-builder> (raw)
In-Reply-To: <905f130b-02dc-6971-8d5b-ce87d9bc96a4@arm.com>

On Tue, 11 Jun 2019 14:14:33 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 10/06/2019 22:31, Jacob Pan wrote:
> > On Mon, 10 Jun 2019 13:45:02 +0100
> > Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:
> >   
> >> On 07/06/2019 18:43, Jacob Pan wrote:  
> >>>>> So it seems we agree on the following:
> >>>>> - iommu_unregister_device_fault_handler() will never fail
> >>>>> - iommu driver cleans up all pending faults when handler is
> >>>>> unregistered
> >>>>> - assume device driver or guest not sending more page response
> >>>>> _after_ handler is unregistered.
> >>>>> - system will tolerate rare spurious response
> >>>>>
> >>>>> Sounds right?      
> >>>>
> >>>> Yes, I'll add that to the fault series    
> >>> Hold on a second please, I think we need more clarifications.
> >>> Ashok pointed out to me that the spurious response can be harmful
> >>> to other devices when it comes to mdev, where PRQ group id is not
> >>> per PASID, device may reuse the group number and receiving
> >>> spurious page response can confuse the entire PF.     
> >>
> >> I don't understand how mdev differs from the non-mdev situation
> >> (but I also still don't fully get how mdev+PASID will be
> >> implemented). Is the following the case you're worried about?
> >>
> >>   M#: mdev #
> >>
> >> # Dev         Host        mdev drv       VFIO/QEMU        Guest
> >> ====================================================================
> >> 1                     <- reg(handler)
> >> 2 PR1 G1 P1    ->         M1 PR1 G1        inject ->     M1 PR1 G1
> >> 3                     <- unreg(handler)
> >> 4       <- PS1 G1 P1 (F)      |
> >> 5                        unreg(handler)
> >> 6                     <- reg(handler)
> >> 7 PR2 G1 P1    ->         M2 PR2 G1        inject ->     M2 PR2 G1
> >> 8                                                     <- M1 PS1 G1
> >> 9         accept ??    <- PS1 G1 P1
> >> 10                                                    <- M2 PS2 G1
> >> 11        accept       <- PS2 G1 P1
> >>  
> > Not really. I am not worried about PASID reuse or unbind. Just
> > within the same PASID bind lifetime of a single mdev, back to back
> > register/unregister fault handler.
> > After Step 4, device will think G1 is done. Device could reuse G1
> > for the next PR, if we accept PS1 in step 9, device will terminate
> > G1 before the real G1 PS arrives in Step 11. The real G1 PS might
> > have a different response code. Then we just drop the PS in Step
> > 11?  
> 
> Yes, I think we do. Two possibilities:
> 
> * G1 is reused at step 7 for the same PASID context, which means that
> it is for the same mdev. The problem is then identical to the non-mdev
> case, new page faults and old page response may cross:
> 
> # Dev         Host        mdev drv       VFIO/QEMU        Guest
> ====================================================================
> 7 PR2 G1 P1  --.
> 8               \                         .------------- M1 PS1 G1
> 9                '----->  PR2 G1 P1  ->  /   inject  --> M1 PR2 G1
> 10           accept <---  PS1 G1 P1  <--'
> 11           reject <---  PS2 G1 P1  <------------------ M1 PS2 G1
> 
> And the incorrect page response is returned to the guest. However it
> affects a single mdev/guest context, it doesn't affect other mdevs.
> 
> * Or G1 is reused at step 7 for a different PASID. At step 10 the
> fault handler rejects the page response because the PASID is
> different, and step 11 is accepted.
> 
> 
> >>> Having spurious page response is also not
> >>> abiding the PCIe spec. exactly.    
> >>
> >> We are following the PCI spec though, in that we don't send page
> >> responses for PRGIs that aren't in flight.
> >>  
> > You are right, the worst case of the spurious PS is to terminate the
> > group prematurely. Need to know the scope of the HW damage in case
> > of mdev where group IDs can be shared among mdevs belong to the
> > same PF.  
> 
> But from the IOMMU fault API point of view, the full page request is
> identified by both PRGI and PASID. Given that each mdev has its own
> set of PASIDs, it should be easy to isolate page responses per mdev.
> 
On Intel platform, devices sending page request with private data must
receive page response with matching private data. If we solely depend
on PRGI and PASID, we may send stale private data to the device in
those incorrect page response. Since private data may represent PF
device wide contexts, the consequence of sending page response with
wrong private data may affect other mdev/PASID.

One solution we are thinking to do is to inject the sequence #(e.g.
ktime raw mono clock) as vIOMMU private data into to the guest. Guest
would return this fake private data in page response, then host will
send page response back to the device that matches PRG1 and PASID and
private_data.

This solution does not expose HW context related private data to the
guest but need to extend page response in iommu uapi.

/**
 * struct iommu_page_response - Generic page response information
 * @version: API version of this structure
 * @flags: encodes whether the corresponding fields are valid
 *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
 * @pasid: Process Address Space ID
 * @grpid: Page Request Group Index
 * @code: response code from &enum iommu_page_response_code
 * @private_data: private data for the matching page request
 */
struct iommu_page_response {
#define IOMMU_PAGE_RESP_VERSION_1	1
	__u32	version;
#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
#define IOMMU_PAGE_RESP_PRIVATE_DATA	(1 << 1)
	__u32	flags;
	__u32	pasid;
	__u32	grpid;
	__u32	code;
	__u32	padding;
	__u64	private_data[2];
};

There is also the change needed for separating storage for the real and
fake private data.

Sorry for the last minute change, did not realize the HW implications.

I see this as a future extension due to limited testing, perhaps for
now, can you add paddings similar to page request? Make it 64B as well.

struct iommu_page_response {
#define IOMMU_PAGE_RESP_VERSION_1	1
	__u32	version;
#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
	__u32	flags;
	__u32	pasid;
	__u32	grpid;
	__u32	code;
	__u8	padding[44];
};

Thanks!

Jacob
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

  reply index

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-26 16:09 [PATCH v8 00/29] SMMUv3 Nested Stage Setup Eric Auger
2019-05-26 16:09 ` [PATCH v8 01/29] driver core: Add per device iommu param Eric Auger
2019-05-26 16:09 ` [PATCH v8 02/29] iommu: Introduce device fault data Eric Auger
2019-05-26 16:09 ` [PATCH v8 03/29] iommu: Introduce device fault report API Eric Auger
2019-05-26 16:09 ` [PATCH v8 04/29] iommu: Add recoverable fault reporting Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 15:48     ` Jacob Pan
2019-05-26 16:09 ` [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-06-04 10:52     ` Jean-Philippe Brucker
2019-06-04 15:50       ` Jacob Pan
2019-05-26 16:09 ` [PATCH v8 06/29] trace/iommu: Add sva trace events Eric Auger
2019-05-26 16:09 ` [PATCH v8 07/29] iommu: Use device fault trace event Eric Auger
2019-05-26 16:09 ` [PATCH v8 08/29] iommu: Introduce attach/detach_pasid_table API Eric Auger
2019-05-26 16:09 ` [PATCH v8 09/29] iommu: Introduce cache_invalidate API Eric Auger
2019-05-26 16:09 ` [PATCH v8 10/29] iommu: Introduce bind/unbind_guest_msi Eric Auger
2019-05-26 16:09 ` [PATCH v8 11/29] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
2019-05-26 16:09 ` [PATCH v8 12/29] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg Eric Auger
2019-05-26 16:09 ` [PATCH v8 13/29] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2019-05-26 16:09 ` [PATCH v8 14/29] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2019-05-26 16:09 ` [PATCH v8 15/29] iommu/smmuv3: Introduce __arm_smmu_tlb_inv_asid/s1_range_nosync Eric Auger
2019-05-26 16:09 ` [PATCH v8 16/29] iommu/smmuv3: Implement cache_invalidate Eric Auger
2019-05-26 16:09 ` [PATCH v8 17/29] dma-iommu: Implement NESTED_MSI cookie Eric Auger
2019-05-26 16:09 ` [PATCH v8 18/29] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
2019-05-26 16:09 ` [PATCH v8 19/29] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
2019-05-26 16:09 ` [PATCH v8 20/29] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
2019-05-26 16:09 ` [PATCH v8 21/29] iommu/smmuv3: Report non recoverable faults Eric Auger
2019-05-26 16:09 ` [PATCH v8 22/29] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-05-26 16:09 ` [PATCH v8 23/29] vfio: VFIO_IOMMU_CACHE_INVALIDATE Eric Auger
2019-06-14 12:38   ` Liu, Yi L
2019-06-14 13:17     ` Auger Eric
2019-05-26 16:09 ` [PATCH v8 24/29] vfio: VFIO_IOMMU_BIND/UNBIND_MSI Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-06-07  8:30     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 25/29] vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-07  8:28     ` Auger Eric
2019-06-07 12:47       ` Jean-Philippe Brucker
2019-06-07 16:29       ` Alex Williamson
2019-05-26 16:10 ` [PATCH v8 26/29] vfio-pci: Register an iommu fault handler Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 16:11     ` Auger Eric
2019-06-05 22:45       ` Jacob Pan
2019-06-06 18:54         ` Jean-Philippe Brucker
2019-06-06 20:29           ` Jacob Pan
2019-06-07  7:02             ` Auger Eric
2019-06-07 10:28             ` Jean-Philippe Brucker
2019-06-07 17:43               ` Jacob Pan
2019-06-10 12:45                 ` Jean-Philippe Brucker
2019-06-10 21:31                   ` Jacob Pan
2019-06-11 13:14                     ` Jean-Philippe Brucker
2019-06-12 18:53                       ` Jacob Pan [this message]
2019-06-18 14:04                         ` Jean-Philippe Brucker
2019-06-19  0:19                           ` Jacob Pan
2019-06-19 11:44                             ` Jean-Philippe Brucker
2019-07-11 13:07                           ` Auger Eric
2019-06-07 12:48   ` Jean-Philippe Brucker
2019-06-07 14:18     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 27/29] vfio_pci: Allow to mmap the fault queue Eric Auger
2019-05-26 16:10 ` [PATCH v8 28/29] vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 16:11     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 29/29] vfio: Document nested stage control Eric Auger

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190612115358.0d90b322@jacob-builder \
    --to=jacob.jun.pan@linux.intel.com \
    --cc=Marc.Zyngier@arm.com \
    --cc=Robin.Murphy@arm.com \
    --cc=Vincent.Stehle@arm.com \
    --cc=Will.Deacon@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=eric.auger.pro@gmail.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe.brucker@arm.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM ARM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvmarm/0 kvmarm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvmarm kvmarm/ https://lore.kernel.org/kvmarm \
		kvmarm@lists.cs.columbia.edu kvmarm@archiver.kernel.org
	public-inbox-index kvmarm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/edu.columbia.cs.lists.kvmarm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox