From: Auger Eric <eric.auger@redhat.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Cc: "pmorel@linux.vnet.ibm.com" <pmorel@linux.vnet.ibm.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linuxarm <linuxarm@huawei.com>,
	John Garry <john.garry@huawei.com>,
	"xuwei (O)" <xuwei5@huawei.com>
Subject: Re: [RFC v2 1/5] vfio/type1: Introduce iova list and add iommu aperture validity check
Date: Tue, 23 Jan 2018 12:20:14 +0100
Message-ID: <5d63d94c-781d-6eb7-d464-4f18ab1d3cfe@redhat.com>
In-Reply-To: <5FC3163CFD30C246ABAA99954A238FA83863CBE7@FRAEML521-MBX.china.huawei.com>

Hi Shameer,

On 23/01/18 11:04, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@redhat.com]
>> Sent: Tuesday, January 23, 2018 8:25 AM
>> To: Alex Williamson <alex.williamson@redhat.com>; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: pmorel@linux.vnet.ibm.com; kvm@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Linuxarm <linuxarm@huawei.com>; John Garry
>> <john.garry@huawei.com>; xuwei (O) <xuwei5@huawei.com>
>> Subject: Re: [RFC v2 1/5] vfio/type1: Introduce iova list and add iommu
>> aperture validity check
>>
>> Hi Shameer,
>>
>> On 18/01/18 01:04, Alex Williamson wrote:
>>> On Fri, 12 Jan 2018 16:45:27 +0000
>>> Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
>>>
>>>> This introduces an iova list that is valid for dma mappings. Make
>>>> sure the new iommu aperture window is valid and doesn't conflict
>>>> with any existing dma mappings during attach. Also update the iova
>>>> list with new aperture window during attach/detach.
>>>>
>>>> Signed-off-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@huawei.com>
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 177
>> ++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 177 insertions(+)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>> b/drivers/vfio/vfio_iommu_type1.c
>>>> index e30e29a..11cbd49 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -60,6 +60,7 @@
>>>>
>>>>  struct vfio_iommu {
>>>>  	struct list_head	domain_list;
>>>> +	struct list_head	iova_list;
>>>>  	struct vfio_domain	*external_domain; /* domain for external user
>> */
>>>>  	struct mutex		lock;
>>>>  	struct rb_root		dma_list;
>>>> @@ -92,6 +93,12 @@ struct vfio_group {
>>>>  	struct list_head	next;
>>>>  };
>>>>
>>>> +struct vfio_iova {
>>>> +	struct list_head	list;
>>>> +	phys_addr_t		start;
>>>> +	phys_addr_t		end;
>>>> +};
>>>
>>> dma_list uses dma_addr_t for the iova.  IOVAs are naturally DMA
>>> addresses, why are we using phys_addr_t?
>>>
>>>> +
>>>>  /*
>>>>   * Guest RAM pinning working set or DMA target
>>>>   */
>>>> @@ -1192,6 +1199,123 @@ static bool vfio_iommu_has_sw_msi(struct
>> iommu_group *group, phys_addr_t *base)
>>>>  	return ret;
>>>>  }
>>>>
>>>> +static int vfio_insert_iova(phys_addr_t start, phys_addr_t end,
>>>> +				struct list_head *head)
>>>> +{
>>>> +	struct vfio_iova *region;
>>>> +
>>>> +	region = kmalloc(sizeof(*region), GFP_KERNEL);
>>>> +	if (!region)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	INIT_LIST_HEAD(&region->list);
>>>> +	region->start = start;
>>>> +	region->end = end;
>>>> +
>>>> +	list_add_tail(&region->list, head);
>>>> +	return 0;
>>>> +}
>>>
>>> As I'm reading through this series, I'm learning that there are a lot
>>> of assumptions and subtle details that should be documented.  For
>>> instance, the IOMMU API only provides a single geometry and we build
>>> upon that here as this patch creates a list, but there's only a single
>>> entry for now.  The following patches carve that single iova range into
>>> pieces and somewhat subtly use the list_head passed to keep the list
>>> sorted, allowing the first/last_entry tricks used throughout.  Subtle
>>> interfaces are prone to bugs.
>>>
>>>> +
>>>> +/*
>>>> + * Find whether a mem region overlaps with existing dma mappings
>>>> + */
>>>> +static bool vfio_find_dma_overlap(struct vfio_iommu *iommu,
>>>> +				  phys_addr_t start, phys_addr_t end)
>>>> +{
>>>> +	struct rb_node *n = rb_first(&iommu->dma_list);
>>>> +
>>>> +	for (; n; n = rb_next(n)) {
>>>> +		struct vfio_dma *dma;
>>>> +
>>>> +		dma = rb_entry(n, struct vfio_dma, node);
>>>> +
>>>> +		if (end < dma->iova)
>>>> +			break;
>>>> +		if (start >= dma->iova + dma->size)
>>>> +			continue;
>>>> +		return true;
>>>> +	}
>>>> +
>>>> +	return false;
>>>> +}
>>>
>>> Why do we need this in addition to the existing vfio_find_dma()?  Why
>>> doesn't this use the tree structure of the dma_list?
>>>
>>>> +
>>>> +/*
>>>> + * Check the new iommu aperture is a valid one
>>>> + */
>>>> +static int vfio_iommu_valid_aperture(struct vfio_iommu *iommu,
>>>> +				     phys_addr_t start,
>>>> +				     phys_addr_t end)
>>>> +{
>>>> +	struct vfio_iova *first, *last;
>>>> +	struct list_head *iova = &iommu->iova_list;
>>>> +
>>>> +	if (list_empty(iova))
>>>> +		return 0;
>>>> +
>>>> +	/* Check if new one is outside the current aperture */
>>>
>>> "Disjoint sets"
>>>
>>>> +	first = list_first_entry(iova, struct vfio_iova, list);
>>>> +	last = list_last_entry(iova, struct vfio_iova, list);
>>>> +	if ((start > last->end) || (end < first->start))
>>>> +		return -EINVAL;
>>>> +
>>>> +	/* Check for any existing dma mappings outside the new start */
>>>> +	if (start > first->start) {
>>>> +		if (vfio_find_dma_overlap(iommu, first->start, start - 1))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>> +	/* Check for any existing dma mappings outside the new end */
>>>> +	if (end < last->end) {
>>>> +		if (vfio_find_dma_overlap(iommu, end + 1, last->end))
>>>> +			return -EINVAL;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>
>>> I think this returns an int because you want to use it for the return
>>> value below, but it really seems like a bool question, ie. does this
>>> aperture conflict with existing mappings.  Additionally, the aperture
>>> is valid, it was provided to us by the IOMMU API, the question is
>>> whether it conflicts.  Please also name consistently to the other
>>> functions in this patch, vfio_iommu_aper_xxxx().
>>>
>>>> +
>>>> +/*
>>>> + * Adjust the iommu aperture window if new aperture is a valid one
>>>> + */
>>>> +static int vfio_iommu_iova_aper_adjust(struct vfio_iommu *iommu,
>>>> +				      phys_addr_t start,
>>>> +				      phys_addr_t end)
>>>
>>> Perhaps "resize", "prune", or "shrink" to make it more clear what is
>>> being adjusted?
>>>
>>>> +{
>>>> +	struct vfio_iova *node, *next;
>>>> +	struct list_head *iova = &iommu->iova_list;
>>>> +
>>>> +	if (list_empty(iova))
>>>> +		return vfio_insert_iova(start, end, iova);
>>>> +
>>>> +	/* Adjust iova list start */
>>>> +	list_for_each_entry_safe(node, next, iova, list) {
>>>> +		if (start < node->start)
>>>> +			break;
>>>> +		if ((start >= node->start) && (start <= node->end)) {
>>>
>>> start == node->end results in a zero sized node.  s/<=/</
>>>
>>>> +			node->start = start;
>>>> +			break;
>>>> +		}
>>>> +		/* Delete nodes before new start */
>>>> +		list_del(&node->list);
>>>> +		kfree(node);
>>>> +	}
>>>> +
>>>> +	/* Adjust iova list end */
>>>> +	list_for_each_entry_safe(node, next, iova, list) {
>>>> +		if (end > node->end)
>>>> +			continue;
>>>> +
>>>> +		if ((end >= node->start) && (end <= node->end)) {
>>>
>>> end == node->start results in a zero sized node.  s/>=/>/
>>>
>>>> +			node->end = end;
>>>> +			continue;
>>>> +		}
>>>> +		/* Delete nodes after new end */
>>>> +		list_del(&node->list);
>>>> +		kfree(node);
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>>>>  					 struct iommu_group *iommu_group)
>>>>  {
>>>> @@ -1202,6 +1326,7 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>>>  	int ret;
>>>>  	bool resv_msi, msi_remap;
>>>>  	phys_addr_t resv_msi_base;
>>>> +	struct iommu_domain_geometry geo;
>>>>
>>>>  	mutex_lock(&iommu->lock);
>>>>
>>>> @@ -1271,6 +1396,14 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>>>  	if (ret)
>>>>  		goto out_domain;
>>>>
>>>> +	/* Get aperture info */
>>>> +	iommu_domain_get_attr(domain->domain,
>> DOMAIN_ATTR_GEOMETRY, &geo);
>>>> +
>>>> +	ret = vfio_iommu_valid_aperture(iommu, geo.aperture_start,
>>>> +					geo.aperture_end);
>>>> +	if (ret)
>>>> +		goto out_detach;
>>>> +
>>>>  	resv_msi = vfio_iommu_has_sw_msi(iommu_group, &resv_msi_base);
>>>>
>>>>  	INIT_LIST_HEAD(&domain->group_list);
>>>> @@ -1327,6 +1460,11 @@ static int vfio_iommu_type1_attach_group(void
>> *iommu_data,
>>>>  			goto out_detach;
>>>>  	}
>>>>
>>>> +	ret = vfio_iommu_iova_aper_adjust(iommu, geo.aperture_start,
>>>> +					  geo.aperture_end);
>>>> +	if (ret)
>>>> +		goto out_detach;
>>>> +
>>>>  	list_add(&domain->next, &iommu->domain_list);
>>>>
>>>>  	mutex_unlock(&iommu->lock);
>>>> @@ -1392,6 +1530,35 @@ static void vfio_sanity_check_pfn_list(struct
>> vfio_iommu *iommu)
>>>>  	WARN_ON(iommu->notifier.head);
>>>>  }
>>>>
>>>> +/*
>>>> + * Called when a domain is removed in detach. It is possible that
>>>> + * the removed domain decided the iova aperture window. Modify the
>>>> + * iova aperture with the smallest window among existing domains.
>>>> + */
>>>> +static void vfio_iommu_iova_aper_refresh(struct vfio_iommu *iommu)
>>>> +{
>>>> +	struct vfio_domain *domain;
>>>> +	struct iommu_domain_geometry geo;
>>>> +	struct vfio_iova *node;
>>>> +	phys_addr_t start = 0;
>>>> +	phys_addr_t end = (phys_addr_t)~0;
>>>> +
>>>> +	list_for_each_entry(domain, &iommu->domain_list, next) {
>>>> +		iommu_domain_get_attr(domain->domain,
>> DOMAIN_ATTR_GEOMETRY,
>>>> +				      &geo);
>>>> +			if (geo.aperture_start > start)
>>>> +				start = geo.aperture_start;
>>>> +			if (geo.aperture_end < end)
>>>> +				end = geo.aperture_end;
>>>> +	}
>>>> +
>>>> +	/* modify iova aperture limits */
>>>> +	node = list_first_entry(&iommu->iova_list, struct vfio_iova, list);
>>>> +	node->start = start;
>>>> +	node = list_last_entry(&iommu->iova_list, struct vfio_iova, list);
>>>> +	node->end = end;
>>>
>>> We can do this because the new aperture is the same or bigger than the
>>> current aperture, never smaller.  That's not fully obvious and should
>>> be noted in the comment.  Perhaps this function should be "expand"
>>> rather than "refresh".
>> This one is not obvious to me either:
>> assuming you have 2 domains, with apertures 1) and 2) respectively,
>> resulting in aperture 3). Holes are created by resv regions, for
>> instance. If you remove domain 1, don't you get 4) instead of 2)?
>>
>> 1)   |------------|
>>  +
>> 2) |---|    |--|       |-----|
>> =
>> 3)   |-|    |--|
>>
>>
>> 4) |---|    |----------------|
> 
> That is true partially. But please remember that this patch is not aware of
> any reserved regions yet. That is introduced in patch #2. So patch #1 and #2
> together, the iova aperture might look like 4) after this function call and once
> vfio_iommu_iova_resv_refresh() in patch #2 is done, the aperture will be
> back to 2).
> 
> Hope I am clear. Please let me know.
Ah OK.
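To make the 3) -> 4) step concrete, here is a minimal user-space sketch,
not the patch code. The names iova_range, iova_clip and iova_expand are
made up for illustration, and a plain array stands in for the kernel
list. Bounds are inclusive, as in struct vfio_iova.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of one iova_list node (inclusive bounds). */
struct iova_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Shrink a sorted, non-overlapping range array to [start, end], dropping
 * ranges that fall entirely outside.  Returns the new number of ranges.
 */
static size_t iova_clip(struct iova_range *r, size_t n,
			uint64_t start, uint64_t end)
{
	size_t i, out = 0;

	assert(start <= end);
	for (i = 0; i < n; i++) {
		if (r[i].end < start || r[i].start > end)
			continue;	/* entirely outside: drop */
		r[out].start = r[i].start < start ? start : r[i].start;
		r[out].end = r[i].end > end ? end : r[i].end;
		out++;
	}
	return out;
}

/*
 * Model of the detach-time refresh: only the outer limits are reset,
 * so ranges dropped by an earlier clip are not restored.
 */
static void iova_expand(struct iova_range *r, size_t n,
			uint64_t start, uint64_t end)
{
	if (!n)
		return;
	r[0].start = start;
	r[n - 1].end = end;
}
```

With 2) modelled as {[0,4], [9,12], [20,26]} and domain 1's aperture as
[2,13], iova_clip() leaves {[2,4], [9,12]}, i.e. 3). Calling
iova_expand() with [0,26] afterwards gives {[0,4], [9,26]}, i.e. 4): the
second hole only comes back once the reserved regions are applied again.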
> 
> In any case, based on comments by Alex, I will be removing these aperture/reserve
> refresh functions and leave the iova list as it is when a group is detached.
Looking forward to reviewing the next version then.

Thanks

Eric
> 
> Thanks,
> Shameer
> 
>> Thanks
>>
>> Eric
>>>
>>>> +}
>>>> +
>>>>  static void vfio_iommu_type1_detach_group(void *iommu_data,
>>>>  					  struct iommu_group *iommu_group)
>>>>  {
>>>> @@ -1445,6 +1612,7 @@ static void vfio_iommu_type1_detach_group(void
>> *iommu_data,
>>>>  			iommu_domain_free(domain->domain);
>>>>  			list_del(&domain->next);
>>>>  			kfree(domain);
>>>> +			vfio_iommu_iova_aper_refresh(iommu);
>>>>  		}
>>>>  		break;
>>>>  	}
>>>> @@ -1475,6 +1643,7 @@ static void *vfio_iommu_type1_open(unsigned
>> long arg)
>>>>  	}
>>>>
>>>>  	INIT_LIST_HEAD(&iommu->domain_list);
>>>> +	INIT_LIST_HEAD(&iommu->iova_list);
>>>>  	iommu->dma_list = RB_ROOT;
>>>>  	mutex_init(&iommu->lock);
>>>>  	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
>>>> @@ -1502,6 +1671,7 @@ static void vfio_iommu_type1_release(void
>> *iommu_data)
>>>>  {
>>>>  	struct vfio_iommu *iommu = iommu_data;
>>>>  	struct vfio_domain *domain, *domain_tmp;
>>>> +	struct vfio_iova *iova, *iova_tmp;
>>>>
>>>>  	if (iommu->external_domain) {
>>>>  		vfio_release_domain(iommu->external_domain, true);
>>>> @@ -1517,6 +1687,13 @@ static void vfio_iommu_type1_release(void
>> *iommu_data)
>>>>  		list_del(&domain->next);
>>>>  		kfree(domain);
>>>>  	}
>>>> +
>>>> +	list_for_each_entry_safe(iova, iova_tmp,
>>>> +				 &iommu->iova_list, list) {
>>>> +		list_del(&iova->list);
>>>> +		kfree(iova);
>>>> +	}
>>>> +
>>>>  	kfree(iommu);
>>>>  }
>>>>
>>>
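
On Alex's question above about using the tree structure of dma_list: as
an illustration only, an ordered lookup for an overlapping mapping could
be sketched as below. This is a user-space model in which a sorted array
and a binary search stand in for the rb-tree descent. The names dma_map
and dma_find_overlap are hypothetical, ranges are half-open, and the
sketch assumes start + size does not wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_dma: covers [iova, iova + size). */
struct dma_map {
	uint64_t iova;
	uint64_t size;
};

/*
 * Return the index of any mapping overlapping [start, start + size) in an
 * array sorted by iova with no overlaps, or -1 if there is none.
 */
static long dma_find_overlap(const struct dma_map *m, size_t n,
			     uint64_t start, uint64_t size)
{
	size_t lo = 0, hi = n;

	assert(size > 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (start + size <= m[mid].iova)
			hi = mid;		/* query lies below: go left */
		else if (start >= m[mid].iova + m[mid].size)
			lo = mid + 1;		/* query lies above: go right */
		else
			return (long)mid;	/* ranges intersect */
	}
	return -1;
}
```

Each step discards half of the remaining mappings, so the check costs
O(log n) comparisons instead of walking every node from rb_first().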
