* [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
@ 2015-11-20 10:57 Robin Murphy
       [not found] ` <cf89e25139683d85862302e75f9923d101ba233e.1448016328.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2015-11-20 10:57 UTC (permalink / raw)
  To: joro-zLv9SwRftAIdnm+yROfE0A
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Whilst blindly assuming the worst case for segment boundaries and
aligning every segment individually is safe from the point of view
of respecting the device's parameters, it is also undeniably a waste
of IOVA space. Furthermore, the knock-on effects of more pages than
necessary being exposed to device access, additional overhead in page
table updates and TLB invalidations, etc., are even more undesirable.

Improve matters by taking the actual boundary mask into account to
actively detect the cases in which we really do need to adjust a
segment, and avoid wasting space in the remainder.

Tested-by: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---

Minor change: removed the now-redundant null check on prev, since we no
longer dereference it unconditionally and pad_len is guaranteed to be
zero the first time around.

 drivers/iommu/dma-iommu.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3a20db4..427fdc1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -441,6 +441,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	struct scatterlist *s, *prev = NULL;
 	dma_addr_t dma_addr;
 	size_t iova_len = 0;
+	unsigned long mask = dma_get_seg_boundary(dev);
 	int i;
 
 	/*
@@ -452,6 +453,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	for_each_sg(sg, s, nents, i) {
 		size_t s_offset = iova_offset(iovad, s->offset);
 		size_t s_length = s->length;
+		size_t pad_len = (mask - iova_len + 1) & mask;
 
 		sg_dma_address(s) = s->offset;
 		sg_dma_len(s) = s_length;
@@ -460,15 +462,13 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		s->length = s_length;
 
 		/*
-		 * The simple way to avoid the rare case of a segment
-		 * crossing the boundary mask is to pad the previous one
-		 * to end at a naturally-aligned IOVA for this one's size,
-		 * at the cost of potentially over-allocating a little.
+		 * With a single size-aligned IOVA allocation, no segment risks
+		 * crossing the boundary mask unless the total size exceeds
+		 * the mask itself. The simple way to maintain alignment when
+		 * that does happen is to pad the previous segment to end at the
+		 * next boundary, at the cost of over-allocating a little.
 		 */
-		if (prev) {
-			size_t pad_len = roundup_pow_of_two(s_length);
-
-			pad_len = (pad_len - iova_len) & (pad_len - 1);
+		if (pad_len && pad_len < s_length - 1) {
 			prev->length += pad_len;
 			iova_len += pad_len;
 		}
-- 
1.9.1
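
[Illustration, not part of the patch: the pad_len arithmetic from the hunk
above, exercised in isolation. The mask is assumed to be a power of two
minus one, as dma_get_seg_boundary() returns; the 0xffff and 0x1200 values
below are made-up example numbers.]

	#include <stdio.h>

	int main(void)
	{
		unsigned long mask = 0xffff;	/* example 64K segment boundary mask */
		size_t iova_len = 0x1200;	/* example: IOVA length accumulated so far */

		/* bytes remaining before the running total reaches the next boundary */
		size_t pad_len = (mask - iova_len + 1) & mask;

		/*
		 * Prints pad_len = 0xee00: padding the previous segment by this
		 * much would make the next segment start exactly on a boundary.
		 */
		printf("pad_len = 0x%zx\n", pad_len);
		return 0;
	}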


* Re: [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
       [not found] ` <cf89e25139683d85862302e75f9923d101ba233e.1448016328.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
@ 2015-11-26 15:37   ` Joerg Roedel
       [not found]     ` <20151126153728.GC17674-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  2015-12-30  1:42   ` Yong Wu
  2016-02-15 23:00   ` Yong Wu
  2 siblings, 1 reply; 7+ messages in thread
From: Joerg Roedel @ 2015-11-26 15:37 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Fri, Nov 20, 2015 at 10:57:40AM +0000, Robin Murphy wrote:
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 3a20db4..427fdc1 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -441,6 +441,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  	struct scatterlist *s, *prev = NULL;
>  	dma_addr_t dma_addr;
>  	size_t iova_len = 0;
> +	unsigned long mask = dma_get_seg_boundary(dev);
>  	int i;
>  
>  	/*
> @@ -452,6 +453,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  	for_each_sg(sg, s, nents, i) {
>  		size_t s_offset = iova_offset(iovad, s->offset);
>  		size_t s_length = s->length;
> +		size_t pad_len = (mask - iova_len + 1) & mask;
>  
>  		sg_dma_address(s) = s->offset;
>  		sg_dma_len(s) = s_length;
> @@ -460,15 +462,13 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  		s->length = s_length;
>  
>  		/*
> -		 * The simple way to avoid the rare case of a segment
> -		 * crossing the boundary mask is to pad the previous one
> -		 * to end at a naturally-aligned IOVA for this one's size,
> -		 * at the cost of potentially over-allocating a little.
> +		 * With a single size-aligned IOVA allocation, no segment risks
> +		 * crossing the boundary mask unless the total size exceeds
> +		 * the mask itself. The simple way to maintain alignment when
> +		 * that does happen is to pad the previous segment to end at the
> +		 * next boundary, at the cost of over-allocating a little.
>  		 */
> -		if (prev) {
> -			size_t pad_len = roundup_pow_of_two(s_length);
> -
> -			pad_len = (pad_len - iova_len) & (pad_len - 1);
> +		if (pad_len && pad_len < s_length - 1) {
>  			prev->length += pad_len;
>  			iova_len += pad_len;
>  		}

Hmm, this whole code looks overly complicated. Why don't you just add
the sizes of the individual sg-elements together and then do an
allocation aligned on the next power of two? This will not cross the
boundary mask as long as the size is smaller than the mask.
When the size is bigger than the mask you can either put a WARN_ON in
and return an error (to see if that really happens), or just do multiple
smaller allocations that fit into the boundary mask.
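
[A standalone sketch of the arithmetic behind this suggestion, with made-up
example numbers (a 0xffff boundary mask and three segment lengths); it is
not existing dma-iommu code:]

	#include <stdio.h>

	/* round up to the next power of two (enough for these small examples) */
	static unsigned long roundup_pow2(unsigned long x)
	{
		unsigned long p = 1;

		while (p < x)
			p <<= 1;
		return p;
	}

	int main(void)
	{
		unsigned long lens[] = { 0x200, 0x10000, 0x10000 }; /* example segments */
		unsigned long mask = 0xffff;                        /* example boundary */
		unsigned long total = 0;
		int i;

		for (i = 0; i < 3; i++)
			total += lens[i];

		/*
		 * A single allocation aligned to roundup_pow2(total) cannot cross
		 * a 2^n boundary while total <= mask + 1; beyond that, either warn
		 * and fail or split into multiple sub-mask allocations.
		 */
		printf("total=0x%lx aligned alloc=0x%lx -> %s\n", total,
		       roundup_pow2(total),
		       total <= mask + 1 ? "fits inside one boundary window"
					 : "exceeds the mask");
		return 0;
	}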



	Joerg


* Re: [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
       [not found]     ` <20151126153728.GC17674-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-11-26 17:36       ` Robin Murphy
       [not found]         ` <56574314.30205-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Murphy @ 2015-11-26 17:36 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 26/11/15 15:37, Joerg Roedel wrote:
> On Fri, Nov 20, 2015 at 10:57:40AM +0000, Robin Murphy wrote:
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 3a20db4..427fdc1 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -441,6 +441,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   	struct scatterlist *s, *prev = NULL;
>>   	dma_addr_t dma_addr;
>>   	size_t iova_len = 0;
>> +	unsigned long mask = dma_get_seg_boundary(dev);
>>   	int i;
>>
>>   	/*
>> @@ -452,6 +453,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   	for_each_sg(sg, s, nents, i) {
>>   		size_t s_offset = iova_offset(iovad, s->offset);
>>   		size_t s_length = s->length;
>> +		size_t pad_len = (mask - iova_len + 1) & mask;
>>
>>   		sg_dma_address(s) = s->offset;
>>   		sg_dma_len(s) = s_length;
>> @@ -460,15 +462,13 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>   		s->length = s_length;
>>
>>   		/*
>> -		 * The simple way to avoid the rare case of a segment
>> -		 * crossing the boundary mask is to pad the previous one
>> -		 * to end at a naturally-aligned IOVA for this one's size,
>> -		 * at the cost of potentially over-allocating a little.
>> +		 * With a single size-aligned IOVA allocation, no segment risks
>> +		 * crossing the boundary mask unless the total size exceeds
>> +		 * the mask itself. The simple way to maintain alignment when
>> +		 * that does happen is to pad the previous segment to end at the
>> +		 * next boundary, at the cost of over-allocating a little.
>>   		 */
>> -		if (prev) {
>> -			size_t pad_len = roundup_pow_of_two(s_length);
>> -
>> -			pad_len = (pad_len - iova_len) & (pad_len - 1);
>> +		if (pad_len && pad_len < s_length - 1) {
>>   			prev->length += pad_len;
>>   			iova_len += pad_len;
>>   		}
>
> Hmm, this whole code looks overly complicated. Why don't you just add
> the sizes of the individual sg-elements together and then do an
> allocation aligned on the next power of two? This will not cross the
> boundary mask as long as the size is smaller than the mask.

Other than all of 4 lines referencing pad_len, that's exactly what it 
does! In the very common case when the boundary mask is larger than the 
total length such that (pad_len < s_length - 1) is always false, that's 
still exactly what it does!

> When the size is bigger than the mask you can either put a WARN_ON in
> and return an error (to see if that really happens), or just do multiple
> smaller allocations that fit into the boundary mask.

That case is actually surprisingly common in at least one situation - a 
typical scatterlist coming in from the SCSI layer seems to have a max 
segment length of 65536, a boundary mask of 0xffff, and a short first 
segment followed by subsequent full segments (I guess they are the 
command buffer and data buffers respectively), meaning an alignment bump 
is needed fairly frequently there.
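
[Worked illustration with assumed numbers matching the shape described
above: a 0x200-byte first segment followed by 0x10000-byte segments,
against a 0xffff mask. After the first segment iova_len = 0x200, so
pad_len = (0xffff - 0x200 + 1) & 0xffff = 0xfe00; the following 0x10000
segment would straddle the 64K boundary, so the first segment is padded
by 0xfe00 and the data segment then starts exactly on the boundary. With
these assumed numbers the subsequent full segments land on boundaries
naturally, so only that one bump is needed per list.]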

I wanted to avoid multiple allocations for various reasons:
  - iommu_map_sg() only takes a single base IOVA, but it's nice if we 
can make use of it.
  - Some of the "complication" would just be shifted into the unmap 
path, and having to make multiple unmap requests to the IOMMU driver 
increases the number of potentially costly TLB syncs.
  - It minimises contention in the IOVA allocator, which is apparently a 
real-world problem ;)
  - 'Proper' segment concatenation is only a small step on top of the 
current code - probably only a couple of extra lines in finalise_sg() (I 
just haven't yet looked into resurrecting it from iommu-dma v5 as I'm 
trying to get other things done).
  - Unless we bring back the really dumb "blindly iommu_map() each 
segment individually" function from iommu-dma v1, we have to scan the 
list doing all this segment boundary calculation regardless. At that 
point it seems far cheaper and simpler to add one number to another 
number along the way than to break off into a whole 
allocate/map/see-if-it-worked cycle every time.
  - I like to minimise the number of people complaining at me that whole 
scatterlists don't get mapped into contiguous IOVA ranges.

Robin.


* Re: [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
       [not found]         ` <56574314.30205-5wv7dgnIgG8@public.gmane.org>
@ 2015-11-27 15:16           ` Joerg Roedel
       [not found]             ` <20151127151653.GK2064-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Joerg Roedel @ 2015-11-27 15:16 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Thu, Nov 26, 2015 at 05:36:20PM +0000, Robin Murphy wrote:
> On 26/11/15 15:37, Joerg Roedel wrote:
> >When the size is bigger than the mask you can either put a WARN_ON in
> >and return an error (to see if that really happens), or just do multiple
> >smaller allocations that fit into the boundary mask.
> 
> That case is actually surprisingly common in at least one situation
> - a typical scatterlist coming in from the SCSI layer seems to have
> a max segment length of 65536, a boundary mask of 0xffff, and a
> short first segment followed by subsequent full segments (I guess
> they are the command buffer and data buffers respectively), meaning
> an alignment bump is needed fairly frequently there.

The boundary_mask is a property of the underlying device; which one is
it with that boundary mask?

> I wanted to avoid multiple allocations for various reasons:
>  - iommu_map_sg() only takes a single base IOVA, but it's nice if we
> can make use of it.
>  - Some of the "complication" would just be shifted into the unmap
> path, and having to make multiple unmap requests to the IOMMU driver
> increases the number of potentially costly TLB syncs.

Yeah, it's a trade-off between wasting iova space and a faster
implementation.

>  - It minimises contention in the IOVA allocator, which is
> apparently a real-world problem ;)

That's why we are currently evaluating options to get rid of it completely :)

>  - I like to minimise the number of people complaining at me that
> whole scatterlists don't get mapped into contiguous IOVA ranges.

Making that assumption is a bug anyway; this is the simple answer you
can give to anyone complaining.


	Joerg


* Re: [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
       [not found]             ` <20151127151653.GK2064-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2015-11-27 19:49               ` Robin Murphy
  0 siblings, 0 replies; 7+ messages in thread
From: Robin Murphy @ 2015-11-27 19:49 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 27/11/15 15:16, Joerg Roedel wrote:
> On Thu, Nov 26, 2015 at 05:36:20PM +0000, Robin Murphy wrote:
>> On 26/11/15 15:37, Joerg Roedel wrote:
>>> When the size is bigger than the mask you can either put a WARN_ON in
>>> and return an error (to see if that really happens), or just do multiple
>>> smaller allocations that fit into the boundary mask.
>>
>> That case is actually surprisingly common in at least one situation
>> - a typical scatterlist coming in from the SCSI layer seems to have
>> a max segment length of 65536, a boundary mask of 0xffff, and a
>> short first segment followed by subsequent full segments (I guess
>> they are the command buffer and data buffers respectively), meaning
>> an alignment bump is needed fairly frequently there.
>
> The boundary_mask is a property of the underlying device; which one is
> it with that boundary mask?

That particular one is actually ATA_DMA_BOUNDARY, which is used by 
several drivers - sata_sil24 is the specific one I was testing with.

>> I wanted to avoid multiple allocations for various reasons:
>>   - iommu_map_sg() only takes a single base IOVA, but it's nice if we
>> can make use of it.
>>   - Some of the "complication" would just be shifted into the unmap
>> path, and having to make multiple unmap requests to the IOMMU driver
>> increases the number of potentially costly TLB syncs.
>
> Yeah, it's a trade-off between wasting iova space and a faster
> implementation.

Certainly my hunch is that in the great majority of use-cases 
(particularly on 64-bit systems) the IOVA space itself is going to be 
considerably less precious than the time spent managing it.

>>   - It minimises contention in the IOVA allocator, which is
>> apparently a real-world problem ;)
>
> That's why we are currently evaluating options to get rid of it completely :)

Heh, as long as the interface is sane, the implementation backing it can 
change as much as it likes. AFAICS it's still going to be generally the 
case that searching for one space once is more efficient than multiple 
searches for multiple spaces, though.

>>   - I like to minimise the number of people complaining at me that
>> whole scatterlists don't get mapped into contiguous IOVA ranges.
>
> Making that assumption is a bug anyway; this is the simple answer you
> can give to anyone complaining.

And indeed I have, repeatedly ;)

Anyway, this patch is merely making an objective improvement to the way 
the existing code does the thing it does: for comparison, 'dd bs=1M 
count=1K ...' reading from a USB mass storage device with the current 
code adds unnecessary padding to a segment in the order of a couple of 
hundred times; with this patch it never happens at all, which seems like 
precisely what you asked for.

Robin.


* Re: [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
       [not found] ` <cf89e25139683d85862302e75f9923d101ba233e.1448016328.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
  2015-11-26 15:37   ` Joerg Roedel
@ 2015-12-30  1:42   ` Yong Wu
  2016-02-15 23:00   ` Yong Wu
  2 siblings, 0 replies; 7+ messages in thread
From: Yong Wu @ 2015-12-30  1:42 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA


On Fri, 2015-11-20 at 10:57 +0000, Robin Murphy wrote:

> Whilst blindly assuming the worst case for segment boundaries and
> aligning every segment individually is safe from the point of view
> of respecting the device's parameters, it is also undeniably a waste
> of IOVA space. Furthermore, the knock-on effects of more pages than
> necessary being exposed to device access, additional overhead in page
> table updates and TLB invalidations, etc., are even more undesirable.
> 
> Improve matters by taking the actual boundary mask into account to
> actively detect the cases in which we really do need to adjust a
> segment, and avoid wasting space in the remainder.
> 
> Tested-by: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>


Hi Robin,
    What's the status of this patch? Could we know whether it will be
merged into v4.5?
 

> ---
> 
> Minor change: removed the now-redundant null check on prev, since we no
> longer dereference it unconditionally and pad_len is guaranteed to be
> zero the first time around.
> 
>  drivers/iommu/dma-iommu.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 

[...]



* Re: [PATCH v2] iommu/dma: Map scatterlists more parsimoniously
       [not found] ` <cf89e25139683d85862302e75f9923d101ba233e.1448016328.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
  2015-11-26 15:37   ` Joerg Roedel
  2015-12-30  1:42   ` Yong Wu
@ 2016-02-15 23:00   ` Yong Wu
  2 siblings, 0 replies; 7+ messages in thread
From: Yong Wu @ 2016-02-15 23:00 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Daniel Kurtz, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	srv_heupstream-NuS5LvNUpcJWk0Htik3J/w

On Fri, 2015-11-20 at 10:57 +0000, Robin Murphy wrote:
> Whilst blindly assuming the worst case for segment boundaries and
> aligning every segment individually is safe from the point of view
> of respecting the device's parameters, it is also undeniably a waste
> of IOVA space. Furthermore, the knock-on effects of more pages than
> necessary being exposed to device access, additional overhead in page
> table updates and TLB invalidations, etc., are even more undesirable.
> 
> Improve matters by taking the actual boundary mask into account to
> actively detect the cases in which we really do need to adjust a
> segment, and avoid wasting space in the remainder.
> 
> Tested-by: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---

Hi Robin, Joerg,

   Is there any plan for this patch? It seems to be lost.
Could we pick this up as a fix for v4.5? 
Thanks.

