Re: RMRR device on non-Intel platform

From: Robin Murphy <robin.murphy@arm.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: RMRR device on non-Intel platform
Date: Fri, 21 Apr 2023 12:34:35 +0100	[thread overview]
Message-ID: <b2725362-aede-44dd-76ce-39482511ec94@arm.com> (raw)
In-Reply-To: <BN9PR11MB52760FCB2D35D08723BE3F2A8C609@BN9PR11MB5276.namprd11.prod.outlook.com>

On 2023-04-21 05:10, Tian, Kevin wrote:
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Friday, April 21, 2023 5:50 AM

[ 2-in-1 reply since I'm lazy... ]

>> On Thu, 20 Apr 2023 17:55:22 +0100
>> Robin Murphy <robin.murphy@arm.com> wrote:
>>
>>> On 20/04/2023 3:49 pm, Alex Williamson wrote:
>>>> On Thu, 20 Apr 2023 15:19:55 +0100
>>>> Robin Murphy <robin.murphy@arm.com> wrote:
>>>>
>>>>> On 2023-04-20 15:15, Alex Williamson wrote:
>>>>>> On Thu, 20 Apr 2023 06:52:01 +0000
>>>>>> "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>>>>>
>>>>>>> Hi, Alex,
>>>>>>>
>>>>>>> Happen to see that we may have inconsistent policy about RMRR
>> devices cross
>>>>>>> different vendors.
>>>>>>>
>>>>>>> Previously only Intel supports RMRR. Now both AMD/ARM have
>> similar thing,
>>>>>>> AMD IVMD and ARM RMR.
>>>>>>
>>>>>> Any similar requirement imposed by system firmware that the
>> operating
>>>>>> system must perpetually maintain a specific IOVA mapping for the
>> device
>>>>>> should impose similar restrictions as we've implemented for VT-d
>>>>>> RMMR[1].  Thanks,
>>>>>
>>>>> Hmm, does that mean that vfio_iommu_resv_exclude() going to the
>> trouble
>>>>> of punching out all the reserved region holes isn't really needed?
>>>>
>>>> While "Reserved Memory Region Reporting", might suggest that the
>> ranges
>>>> are simply excluded, RMRR actually require that specific mappings are
>>>> maintained for ongoing, side-band activity, which is not compatible
>>>> with the ideas that userspace owns the IOVA address space for the
>>>> device or separation of host vs userspace control of the device.  Such
>>>> mappings suggest things like system health monitoring where the
>>>> influence of a user-owned device can easily extend to a system-wide
>>>> scope if the user it able to manipulate the device to deny that
>>>> interaction or report bad data.
>>>>
>>>> If these ARM and AMD tables impose similar requirements, we should
>>>> really be restricting devices encumbered by such requirements from
>>>> userspace access as well.  Thanks,
>>>
>>> Indeed the primary use-case behind Arm's RMRs was certain devices like
>>> big complex RAID controllers which have already been started by UEFI
>>> firmware at boot and have live in-memory data which needs to be
>> preserved.
>>>
>>> However, my point was more that if it's a VFIO policy that any device
>>> with an IOMMU_RESV_DIRECT reservation is not suitable for userspace
>>> assignment, then vfio_iommu_type1_attach_group() already has
>> everything
>>> it would need to robustly enforce that policy itself. It seems silly to
>>> me for it to expect the IOMMU driver to fail the attach, then go ahead
>>> and dutifully punch out direct regions if it happened not to. A couple
>>> of obvious trivial tweaks and there could be no dependency on driver
>>> behaviour at all, other than correctly reporting resv_regions to begin with.
>>>
>>> If we think this policy deserves to go beyond VFIO and userspace, and
>>> it's reasonable that such devices should never be allowed to attach to
>>> any other kind of kernel-owned unmanaged domain either, then we can
>>> still trivially enforce that in core IOMMU code. I really see no need
>>> for it to be in drivers at all.
>>
>> It seems like a reasonable choice to me that any mixing of unmanaged
>> domains with IOMMU_RESV_DIRECT could be restricted globally.  Do we
>> even have infrastructure for a driver to honor the necessary mapping
>> requirements?

In principle a driver that knew what it was doing could call 
iommu_group_get_resv_regions() and map them into its own domain before 
attaching. Since nothing upstream *is* actually doing that, though, I'd 
have no real objection to restricting this at the API level as the 
safest approach, then coming back to consider exceptional cases if and 
when anybody does really have one.

>> It looks pretty easy to do as well, something like this (untested):
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 10db680acaed..521f9a731ce9 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2012,11 +2012,29 @@ static void
>> __iommu_group_set_core_domain(struct iommu_group *group)
>>   static int __iommu_attach_device(struct iommu_domain *domain,
>>   				 struct device *dev)
>>   {
>> -	int ret;
>> +	int ret = 0;
>>
>>   	if (unlikely(domain->ops->attach_dev == NULL))
>>   		return -ENODEV;
>>
>> +	if (domain->type == IOMMU_DOMAIN_UNMANAGED) {
>> +		struct iommu_resv_region *region;
>> +		LIST_HEAD(resv_regions);
>> +
>> +		iommu_get_resv_regions(dev, &resv_regions);
>> +		list_for_each_entry(region, &resv_regions, list) {
>> +			if (region->type == IOMMU_RESV_DIRECT) {
>> +				ret = -EPERM;
>> +				break;
>> +			}
>> +		}
>> +		iommu_put_resv_regions(dev, &resv_regions);
>> +		if (ret) {
>> +			dev_warn(dev, "Device may not be used with an
>> unmanaged IOMMU domain due to reserved direct mapping
>> requirement.\n");
>> +			return ret;
>> +		}
>> +	}
>> +
>>   	ret = domain->ops->attach_dev(domain, dev);
>>   	if (ret)
>>   		return ret;
>>
>> Restrictions in either type1 or iommufd would be pretty trivial as well,
>> but centralizing it in core IOMMU code would do a better job of covering
>> all use cases.

Indeed, I was thinking that either way, since we usually process the 
reserved regions at probe time anyway, it only needs a slight shuffle to 
cache this as a flag that both core code and/or users can easily refer 
to (so VFIO and IOMMUFD could still identify unsuitable devices long 
before having to go as far as the attach failing). Something like:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 265d34e9cbcb..53d3daef7a7b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -935,19 +935,20 @@ static int 
iommu_create_device_direct_mappings(struct iommu_group *group,
  {
  	struct iommu_domain *domain = group->default_domain;
  	struct iommu_resv_region *entry;
-	struct list_head mappings;
+	LIST_HEAD(mappings);
  	unsigned long pg_size;
  	int ret = 0;

+	iommu_get_resv_regions(dev, &mappings);
+	list_for_each_entry(entry, &mappings, list)
+		if (entry->type == IOMMU_RESV_DIRECT)
+			dev->iommu->has_resv_direct = true;
+
  	if (!domain || !iommu_is_dma_domain(domain))
-		return 0;
+		goto out;

  	BUG_ON(!domain->pgsize_bitmap);
-
  	pg_size = 1UL << __ffs(domain->pgsize_bitmap);
-	INIT_LIST_HEAD(&mappings);
-
-	iommu_get_resv_regions(dev, &mappings);

  	/* We need to consider overlapping regions for different devices */
  	list_for_each_entry(entry, &mappings, list) {

>>
>> This effectively makes the VT-d code further down the same path
>> redundant, so no new restrictions there.
>>
>> What sort of fall-out should we expect on ARM or AMD?  This was a pretty
>> painful restriction to add on Intel.  Thanks,

I can't speak for AMD, but I wouldn't imagine any noticeable fallout for 
Arm. RMRs are still relatively new, and the devices that I've heard 
about seem highly unlikely to ever want to be assigned to userspace 
directly (they may well provide virtual functions or sub-devices that 
absolutely *would* be used with VFIO, but the "host" device owning the 
RMRs would not be).

>>
> 
> What about device_rmrr_is_relaxable()? Leave it in specific driver or
> consolidate to be generic?

That's where the generic information already comes from, since it 
provides the region type in intel_iommu_get_resv_regions() - the 
existing resv_region abstraction *is* the consolidation. AFAICS it would 
be needlessly painful to start trying to directly process bits of DMAR 
or IRVS outside their respective drivers. It's only 
device_is_rmrr_locked() which would become redundant here.

> intel-iommu sets RELAXABLE for USB and GPU assuming their RMRR region
> is not used post boot.
> 
> Presumably same policy can be applied to AMD too.
> 
> ARM RMR reports an explicit flag (ACPI_IORT_RMR_REMAP_PERMITTED) to
> mark out whether a RMR region is relaxable. I'm not sure whether USB/GPU
> is already covered.

Arm systems have no legacy of needing to offer compatibility with a PC 
BIOS from the 1990s, so PS/2 or VGA emulation is a non-issue for us ;)

Thanks,
Robin.