Re: [RFC PATCH v2 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	joro@8bytes.org, jroedel@suse.de
Subject: Re: [RFC PATCH v2 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
Date: Thu, 18 Jan 2018 10:25:12 +0700	[thread overview]
Message-ID: <0cdcfade-481a-96de-cadb-291968392318@amd.com> (raw)
In-Reply-To: <20180108135329.3c1e2c88@t450s.home>

Hi Alex,

On 1/9/18 3:53 AM, Alex Williamson wrote:
> On Wed, 27 Dec 2017 04:20:34 -0500
> Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> wrote:
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index e30e29a..f000844 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>
>> ... >>
>> @@ -479,6 +486,40 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
>>   	return unlocked;
>>   }
>>   
>> +/*
>> + * Generally, VFIO needs to unpin remote pages after each IOTLB flush.
>> + * Therefore, when using IOTLB flush sync interface, VFIO need to keep track
>> + * of these regions (currently using a list).
>> + *
>> + * This value specifies maximum number of regions for each IOTLB flush sync.
>> + */
>> +#define VFIO_IOMMU_TLB_SYNC_MAX		512
> 
> Is this an arbitrary value or are there non-obvious considerations for
> this value should we want to further tune it in the future?

This is just an arbitrary value for now, which we could try further tuning.
On some dGPUs that I have been using, I have seen max of ~1500 regions within an unmap call.
In most case, I see less than 100 regions in an unmap call. The structure is currently 40 bytes.
So, I figured capping at 512 entry in the list is 20KB is reasonable. Let me know what you think.

>> ....
>>
>> @@ -887,8 +946,14 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
>>   			break;
>>   	}
>>   
>> -	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
>> -		iommu_unmap(domain->domain, iova, PAGE_SIZE);
>> +	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
>> +		unmapped = iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
>> +		if (WARN_ON(!unmapped))
>> +			break;
>> +		iommu_tlb_range_add(domain->domain, iova, unmapped);
>> +	}
>> +	if (unmapped)
>> +		iommu_tlb_sync(domain->domain);
> 
> Using unmapped here seems a little sketchy, for instance if we got back
> zero on the last call to iommu_unmap_fast() but had other ranges queued
> for flush.  Do we even need a WARN_ON and break here, are we just
> trying to skip adding a zero range?  The intent is that we either leave
> this function with everything mapped or nothing mapped, so perhaps we
> should warn and continue.  Assuming a spurious sync is ok, we could
> check (i < npage) for the sync condition, the only risk being we had no
> mappings at all and therefore no unmaps.
>
> TBH, I wonder if this function is even needed anymore or if the mapping
> problem in amd_iommu has since ben fixed.

Actually, I never hit this execution path in my test runs. I could just left this
unchanged and use the slow unmap path to simplify the logic. I'm not aware of
the history of why this logic is needed for AMD IOMMU. Is this a bug in the driver or
the hardware?

> Also, I'm not sure why you're gating adding fast flushing to amd_iommu
> on vfio making use of it.  These can be done independently.  Thanks,

Currently, the fast unmap interface is mainly called by VFIO. So, I thought I would
submit the patches together for review. If you would prefer, I can submit the IOMMU part
separately.

Thanks,
Suravee