From: Alex Williamson <alex.williamson@redhat.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: RMRR device on non-Intel platform
Date: Thu, 20 Apr 2023 15:49:33 -0600	[thread overview]
Message-ID: <20230420154933.1a79de4e.alex.williamson@redhat.com> (raw)
In-Reply-To: <fd324213-8d77-cb67-1c52-01cd0997a92c@arm.com>

On Thu, 20 Apr 2023 17:55:22 +0100
Robin Murphy <robin.murphy@arm.com> wrote:

> On 20/04/2023 3:49 pm, Alex Williamson wrote:
> > On Thu, 20 Apr 2023 15:19:55 +0100
> > Robin Murphy <robin.murphy@arm.com> wrote:
> >   
> >> On 2023-04-20 15:15, Alex Williamson wrote:  
> >>> On Thu, 20 Apr 2023 06:52:01 +0000
> >>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >>>      
> >>>> Hi, Alex,
> >>>>
> >>>> Happen to see that we may have inconsistent policy about RMRR devices cross
> >>>> different vendors.
> >>>>
> >>>> Previously only Intel supports RMRR. Now both AMD/ARM have similar thing,
> >>>> AMD IVMD and ARM RMR.  
> >>>
> >>> Any similar requirement imposed by system firmware that the operating
> >>> system must perpetually maintain a specific IOVA mapping for the device
> >>> should impose similar restrictions as we've implemented for VT-d
> >>> RMRR[1].  Thanks,  
> >>
> >> Hmm, does that mean that vfio_iommu_resv_exclude() going to the trouble
> >> of punching out all the reserved region holes isn't really needed?  
> > 
> > While "Reserved Memory Region Reporting" might suggest that the
> > ranges are simply excluded, RMRR actually requires that specific
> > mappings be maintained for ongoing, side-band activity, which is not
> > compatible with the idea that userspace owns the IOVA address space
> > for the device, or with the separation of host vs userspace control
> > of the device.  Such mappings suggest things like system health
> > monitoring, where the influence of a user-owned device can easily
> > extend to system-wide scope if the user is able to manipulate the
> > device to deny that interaction or report bad data.
> > 
> > If these ARM and AMD tables impose similar requirements, we should
> > really be restricting devices encumbered by such requirements from
> > userspace access as well.  Thanks,  
> 
> Indeed the primary use-case behind Arm's RMRs was certain devices like 
> big complex RAID controllers which have already been started by UEFI 
> firmware at boot and have live in-memory data which needs to be preserved.
> 
> However, my point was more that if it's a VFIO policy that any device 
> with an IOMMU_RESV_DIRECT reservation is not suitable for userspace 
> assignment, then vfio_iommu_type1_attach_group() already has everything 
> it would need to robustly enforce that policy itself. It seems silly to 
> me for it to expect the IOMMU driver to fail the attach, then go ahead 
> and dutifully punch out direct regions if it happened not to. A couple 
> of obvious trivial tweaks and there could be no dependency on driver 
> behaviour at all, other than correctly reporting resv_regions to begin with.
> 
> If we think this policy deserves to go beyond VFIO and userspace, and 
> it's reasonable that such devices should never be allowed to attach to 
> any other kind of kernel-owned unmanaged domain either, then we can 
> still trivially enforce that in core IOMMU code. I really see no need 
> for it to be in drivers at all.

It seems like a reasonable choice to me that any mixing of unmanaged
domains with IOMMU_RESV_DIRECT could be restricted globally.  Do we
even have infrastructure for a driver to honor the necessary mapping
requirements?

It looks pretty easy to do as well, something like this (untested):

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10db680acaed..521f9a731ce9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2012,11 +2012,29 @@ static void __iommu_group_set_core_domain(struct iommu_group *group)
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev)
 {
-	int ret;
+	int ret = 0;
 
 	if (unlikely(domain->ops->attach_dev == NULL))
 		return -ENODEV;
 
+	if (domain->type == IOMMU_DOMAIN_UNMANAGED) {
+		struct iommu_resv_region *region;
+		LIST_HEAD(resv_regions);
+
+		iommu_get_resv_regions(dev, &resv_regions);
+		list_for_each_entry(region, &resv_regions, list) {
+			if (region->type == IOMMU_RESV_DIRECT) {
+				ret = -EPERM;
+				break;
+			}
+		}
+		iommu_put_resv_regions(dev, &resv_regions);
+		if (ret) {
+			dev_warn(dev, "Device may not be used with an unmanaged IOMMU domain due to reserved direct mapping requirement.\n");
+			return ret;
+		}
+	}
+
 	ret = domain->ops->attach_dev(domain, dev);
 	if (ret)
 		return ret;
 
Restrictions in either type1 or iommufd would be pretty trivial as well,
but centralizing it in core IOMMU code would do a better job of covering
all use cases.

This effectively makes the VT-d code further down the same path
redundant, so no new restrictions there.

What sort of fall-out should we expect on ARM or AMD?  This was a pretty
painful restriction to add on Intel.  Thanks,

Alex

> >>> [1]https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf
> >>>      
> >>>> RMRR identity mapping was considered unsafe (except for USB/GPU) for
> >>>> device assignment:
> >>>>
> >>>> /*
> >>>>    * There are a couple cases where we need to restrict the functionality of
> >>>>    * devices associated with RMRRs.  The first is when evaluating a device for
> >>>>    * identity mapping because problems exist when devices are moved in and out
> >>>>    * of domains and their respective RMRR information is lost.  This means that
> >>>>    * a device with associated RMRRs will never be in a "passthrough" domain.
> >>>>    * The second is use of the device through the IOMMU API.  This interface
> >>>>    * expects to have full control of the IOVA space for the device.  We cannot
> >>>>    * satisfy both the requirement that RMRR access is maintained and have an
> >>>>    * unencumbered IOVA space.  We also have no ability to quiesce the device's
> >>>>    * use of the RMRR space or even inform the IOMMU API user of the restriction.
> >>>>    * We therefore prevent devices associated with an RMRR from participating in
> >>>>    * the IOMMU API, which eliminates them from device assignment.
> >>>>    *
> >>>>    * In both cases, devices which have relaxable RMRRs are not concerned by this
> >>>>    * restriction. See device_rmrr_is_relaxable comment.
> >>>>    */
> >>>> static bool device_is_rmrr_locked(struct device *dev)
> >>>> {
> >>>> 	if (!device_has_rmrr(dev))
> >>>> 		return false;
> >>>>
> >>>> 	if (device_rmrr_is_relaxable(dev))
> >>>> 		return false;
> >>>>
> >>>> 	return true;
> >>>> }
> >>>>
> >>>> Then non-relaxable RMRR device is rejected when doing attach:
> >>>>
> >>>> static int intel_iommu_attach_device(struct iommu_domain *domain,
> >>>>                                        struct device *dev)
> >>>> {
> >>>> 	struct device_domain_info *info = dev_iommu_priv_get(dev);
> >>>> 	int ret;
> >>>>
> >>>> 	if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
> >>>> 	    device_is_rmrr_locked(dev)) {
> >>>> 		dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.\n");
> >>>> 		return -EPERM;
> >>>> 	}
> >>>> 	...
> >>>> }
> >>>>
> >>>> But I didn't find the same check in AMD/ARM driver at a glance.
> >>>>
> >>>> Did I overlook some arch difference which makes RMRR device safe in
> >>>> those platforms or is it a gap to be fixed?
> >>>>
> >>>> Thanks
> >>>> Kevin
> >>>>     
> >>>      
> >>  
> >   
> 


