KVM ARM Archive on lore.kernel.org
 help / color / Atom feed
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Cc: jacob.jun.pan@linux.intel.com,
	"kevin.tian@intel.com" <kevin.tian@intel.com>,
	"yi.l.liu@intel.com" <yi.l.liu@intel.com>,
	"ashok.raj@intel.com" <ashok.raj@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Marc Zyngier <Marc.Zyngier@arm.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	Will Deacon <Will.Deacon@arm.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Vincent Stehle <Vincent.Stehle@arm.com>,
	Robin Murphy <Robin.Murphy@arm.com>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"eric.auger.pro@gmail.com" <eric.auger.pro@gmail.com>
Subject: Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler
Date: Fri, 7 Jun 2019 10:43:01 -0700
Message-ID: <20190607104301.6b1bbd74@jacob-builder> (raw)
In-Reply-To: <dc051424-67d7-02ff-9b8e-0d7a8a4e59eb@arm.com>

On Fri, 7 Jun 2019 11:28:13 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 06/06/2019 21:29, Jacob Pan wrote:
> >>>>>> iommu_unregister_device_fault_handler(&vdev->pdev->dev);      
> >>>>>
> >>>>>
> >>>>> But this can fail if there are pending faults which leaves a
> >>>>> device reference and then the system is broken :(      
> >>>> This series only features unrecoverable errors and for those the
> >>>> unregistration cannot fail. Now unrecoverable errors were added I
> >>>> admit this is confusing. We need to sort this out or clean the
> >>>> dependencies.    
> >>> As Alex pointed out in 4/29, we can make
> >>> iommu_unregister_device_fault_handler() never fail and clean up
> >>> all the pending faults in the host IOMMU belong to that device.
> >>> But the problem is that if a fault, such as PRQ, has already been
> >>> injected into the guest, the page response may come back after
> >>> handler is unregistered and registered again.    
> >>
> >> I'm trying to figure out if that would be harmful in any way. I
> >> guess it can be a bit nasty if we handle the page response right
> >> after having injected a new page request that uses the same PRGI.
> >> In any other case we discard the page response, but here we
> >> forward it to the endpoint and:
> >>
> >> * If the response status is success, endpoint retries the
> >> translation. The guest probably hasn't had time to handle the new
> >> page request and translation will fail, which may lead the endpoint
> >> to give up (two unsuccessful translation requests). Or send a new
> >> request
> >>  
> > Good point, there shouldn't be any harm if the page response is a
> > "fake" success. In fact it could happen in the normal operation when
> > PRQs to two devices share the same non-leaf translation structure.
> > The worst case is just a retry. I am not aware of the retry limit,
> > is it in the PCIe spec? I cannot find it.  
> 
> I don't think so, it's the implementation's choice. In general I don't
> think devices will have a retry limit, but it doesn't seem like the
> PCI spec prevents them from implementing one either. It could be
> useful to stop retrying after a certain number of faults, for
> preventing livelocks when the OS doesn't fix up the page tables and
> the device would just repeat the fault indefinitely.
> 
> > I think we should just document it, similar to having a spurious
> > interrupt. The PRQ trace event should capture that as well.
> >   
> >> * otherwise the endpoint won't retry the access, and could also
> >> disable PRI if the status is failure.
> >>  
> > That would be true regardless this race condition with handler
> > registration. So should be fine.  
> 
> We do give an invalid response for the old PRG (because of
> unregistering), but also for the new one, which has a different
> address that the guest might be able to page in and would normally
> return success.
> 
> >>> We need a way to reject such page response belong
> >>> to the previous life of the handler. Perhaps a sync call to the
> >>> guest with your fault queue eventfd? I am not sure.    
> >>
> >> We could simply expect the device driver not to send any page
> >> response after unregistering the fault handler. Is there any
> >> reason VFIO would need to unregister and re-register the fault
> >> handler on a live guest? 
> > There is no reason for VFIO to unregister and register again, I was
> > just thinking from security perspective. Someone could write a VFIO
> > app do this attack. But I agree the damage is within the device,
> > may get PRI disabled as a result.  
> 
> Yes I think the damage would always be contained within the
> misbehaving software
> 
> > So it seems we agree on the following:
> > - iommu_unregister_device_fault_handler() will never fail
> > - iommu driver cleans up all pending faults when handler is
> > unregistered
> > - assume device driver or guest not sending more page response
> > _after_ handler is unregistered.
> > - system will tolerate rare spurious response
> > 
> > Sounds right?  
> 
> Yes, I'll add that to the fault series
Hold on a second please, I think we need more clarifications. Ashok
pointed out to me that the spurious response can be harmful to other
devices when it comes to mdev, where PRQ group id is not per PASID,
device may reuse the group number and receiving spurious page response
can confuse the entire PF. Having spurious page response is also not
abiding the PCIe spec. exactly.

We have two options here:
1. unregister handler will get -EBUSY if outstanding fault exists.
	-PROs: block offending device unbind only, eventually timeout
	will clear.
	-CONs: flooded faults can prevent clearing
2. unregister handle will block until all faults are clear in the host.
   Never fails unregistration
	-PROs: simple flow for VFIO, no need to worry about device
	holding reference.
	-CONs: spurious page response may come from
	misbehaving/malicious guest if guest does unregister and
	register back to back.
It seems the only way to prevent spurious page response is to introduce
a SW token or sequence# for each PRQ that needs a response. I still
think option 2 is good.

Consider the following time line:
decoding
 PR#: page request
 G#:  group #
 P#:  PASID
 S#:  sequence #
 A#:  address
 PS#: page response
 (F): Fail
 (S): Success

# Dev		Host		VFIO/QEMU	Guest
===========================================================	
1				<-reg(handler)
2 PR1G1S1A1	->		inject	->	PR1G1S1A1
3 PR2G1S2A2	->		inject	->	PR2G1S2A2
4.				<-unreg(handler)
5.	<-PR1G1S1A1(F)			| 
6.	<-PR2G1S2A2(F)			V
7.				<-unreg(handler)
8.				<-reg(handler)
9 PR3G1S3A1	->		inject	->	PR3G1S3A1
10.						<-PS1G1S1A1
11.		<reject S1>
11.		<accept S3>			<-PS3G1S3A1
12.PS3G1S3A1(S)

The spurious page response comes in at step 10 where the guest sends
response for the request in step 1. But since the sequence # is 1, host
IOMMU driver will reject it. At step 11, we accept page response for
the matching sequence # then respond SUCCESS to the device.

So would it be OK to add this sequence# to iommu_fault and page
response, or could event reuse the time stamp for that purpose.


Jacob
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

  reply index

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-26 16:09 [PATCH v8 00/29] SMMUv3 Nested Stage Setup Eric Auger
2019-05-26 16:09 ` [PATCH v8 01/29] driver core: Add per device iommu param Eric Auger
2019-05-26 16:09 ` [PATCH v8 02/29] iommu: Introduce device fault data Eric Auger
2019-05-26 16:09 ` [PATCH v8 03/29] iommu: Introduce device fault report API Eric Auger
2019-05-26 16:09 ` [PATCH v8 04/29] iommu: Add recoverable fault reporting Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 15:48     ` Jacob Pan
2019-05-26 16:09 ` [PATCH v8 05/29] iommu: Add a timeout parameter for PRQ response Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-06-04 10:52     ` Jean-Philippe Brucker
2019-06-04 15:50       ` Jacob Pan
2019-05-26 16:09 ` [PATCH v8 06/29] trace/iommu: Add sva trace events Eric Auger
2019-05-26 16:09 ` [PATCH v8 07/29] iommu: Use device fault trace event Eric Auger
2019-05-26 16:09 ` [PATCH v8 08/29] iommu: Introduce attach/detach_pasid_table API Eric Auger
2019-05-26 16:09 ` [PATCH v8 09/29] iommu: Introduce cache_invalidate API Eric Auger
2019-05-26 16:09 ` [PATCH v8 10/29] iommu: Introduce bind/unbind_guest_msi Eric Auger
2019-05-26 16:09 ` [PATCH v8 11/29] iommu/arm-smmu-v3: Maintain a SID->device structure Eric Auger
2019-05-26 16:09 ` [PATCH v8 12/29] iommu/smmuv3: Dynamically allocate s1_cfg and s2_cfg Eric Auger
2019-05-26 16:09 ` [PATCH v8 13/29] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2019-05-26 16:09 ` [PATCH v8 14/29] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2019-05-26 16:09 ` [PATCH v8 15/29] iommu/smmuv3: Introduce __arm_smmu_tlb_inv_asid/s1_range_nosync Eric Auger
2019-05-26 16:09 ` [PATCH v8 16/29] iommu/smmuv3: Implement cache_invalidate Eric Auger
2019-05-26 16:09 ` [PATCH v8 17/29] dma-iommu: Implement NESTED_MSI cookie Eric Auger
2019-05-26 16:09 ` [PATCH v8 18/29] iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement Eric Auger
2019-05-26 16:09 ` [PATCH v8 19/29] iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions Eric Auger
2019-05-26 16:09 ` [PATCH v8 20/29] iommu/smmuv3: Implement bind/unbind_guest_msi Eric Auger
2019-05-26 16:09 ` [PATCH v8 21/29] iommu/smmuv3: Report non recoverable faults Eric Auger
2019-05-26 16:09 ` [PATCH v8 22/29] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-05-26 16:09 ` [PATCH v8 23/29] vfio: VFIO_IOMMU_CACHE_INVALIDATE Eric Auger
2019-06-14 12:38   ` Liu, Yi L
2019-06-14 13:17     ` Auger Eric
2019-05-26 16:09 ` [PATCH v8 24/29] vfio: VFIO_IOMMU_BIND/UNBIND_MSI Eric Auger
2019-06-03 22:32   ` Alex Williamson
2019-06-07  8:30     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 25/29] vfio-pci: Add a new VFIO_REGION_TYPE_NESTED region type Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-07  8:28     ` Auger Eric
2019-06-07 12:47       ` Jean-Philippe Brucker
2019-06-07 16:29       ` Alex Williamson
2019-05-26 16:10 ` [PATCH v8 26/29] vfio-pci: Register an iommu fault handler Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 16:11     ` Auger Eric
2019-06-05 22:45       ` Jacob Pan
2019-06-06 18:54         ` Jean-Philippe Brucker
2019-06-06 20:29           ` Jacob Pan
2019-06-07  7:02             ` Auger Eric
2019-06-07 10:28             ` Jean-Philippe Brucker
2019-06-07 17:43               ` Jacob Pan [this message]
2019-06-10 12:45                 ` Jean-Philippe Brucker
2019-06-10 21:31                   ` Jacob Pan
2019-06-11 13:14                     ` Jean-Philippe Brucker
2019-06-12 18:53                       ` Jacob Pan
2019-06-18 14:04                         ` Jean-Philippe Brucker
2019-06-19  0:19                           ` Jacob Pan
2019-06-19 11:44                             ` Jean-Philippe Brucker
2019-07-11 13:07                           ` Auger Eric
2019-06-07 12:48   ` Jean-Philippe Brucker
2019-06-07 14:18     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 27/29] vfio_pci: Allow to mmap the fault queue Eric Auger
2019-05-26 16:10 ` [PATCH v8 28/29] vfio-pci: Add VFIO_PCI_DMA_FAULT_IRQ_INDEX Eric Auger
2019-06-03 22:31   ` Alex Williamson
2019-06-04 16:11     ` Auger Eric
2019-05-26 16:10 ` [PATCH v8 29/29] vfio: Document nested stage control Eric Auger

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190607104301.6b1bbd74@jacob-builder \
    --to=jacob.jun.pan@linux.intel.com \
    --cc=Marc.Zyngier@arm.com \
    --cc=Robin.Murphy@arm.com \
    --cc=Vincent.Stehle@arm.com \
    --cc=Will.Deacon@arm.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=eric.auger.pro@gmail.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jean-philippe.brucker@arm.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM ARM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvmarm/0 kvmarm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvmarm kvmarm/ https://lore.kernel.org/kvmarm \
		kvmarm@lists.cs.columbia.edu kvmarm@archiver.kernel.org
	public-inbox-index kvmarm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/edu.columbia.cs.lists.kvmarm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox