All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-11 17:30 ` Douglas Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, Douglas Anderson, k.debski,
	laurent.pinchart+renesas, corbet, mike.looijmans, linux-kernel,
	linux-doc, will.deacon, jtp.park, penguin-kernel, kyungmin.park,
	carlo, akpm, linux-media, dan.j.williams, linux-arm-kernel

This series of patches will speed up memory allocation in dma-mapping
quite a bit.

The first patch ("ARM: dma-mapping: Optimize allocation") is hopefully
not terribly controversial: it merely doesn't try as hard to allocate
big chunks once it gets the first failure.  Since it's unlikely that
further big chunks will help (they're not likely to be virtually aligned
anyway), this should give a big speedup with no real regression to speak
of.  Yes, things could be made better, but this seems like a sane start.

The second patch ("common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES
attribute") models MADV_NOHUGEPAGE as I understand it.  Hopefully folks
are happy with following that lead.  It does nothing by itself.

The third patch ("ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to
optimize allocation") simply applies the 2nd patch.  Again it's pretty
simple.  ...and again it does nothing by itself.

Thue fourth patch ("[media] videobuf2-dc: Let drivers specify DMA
attrs") comes from the ChromeOS tree (authored by Tomasz Figa) and
allows the fifth patch.

The fifth patch ("[media] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES") uses the
new attribute.  For a second user, you can see the out of tree patch for
rk3288 at <https://chromium-review.googlesource.com/#/c/320498/>.

All testing was done on the chromeos kernel-3.8 and kernel-3.14.
Sanity (compile / boot) testing was done on a v4.4-rc6-based kernel on
rk3288, though the video codec isn't there.  I don't have graphics / MFC
working well on exynos, so the MFC change was only compile-tested
upstream.  Hopefully someone upstream whose setup for MFC can give a
Tested-by for these?

Also note that v2 of this series had an extra patch
<https://patchwork.kernel.org/patch/7888861/> that would attempt to sort
the allocation results to opportunistically get some extra alignment.  I
dropped that, but it could be re-introduced if there was interest.  I
found that it did give a little extra alignment sometimes, but maybe not
enough to justify the extra complexity.  It also was a bit half-baked
since it really should have tried harder to ensure alignment.

Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE
- s/ping ping/ping pong/
- Let drivers specify DMA attrs new for v5
- s5p-mfc patch new for v5

Changes in v4:
- Added Marek's ack
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE

Changes in v3:
- add DMA_ATTR_SEQUENTIAL attribute new for v3
- Use DMA_ATTR_SEQUENTIAL hint patch new for v3.

Changes in v2:
- No longer just 1 page at a time, but gives up higher order quickly.
- Only tries important higher order allocations that might help us.

Douglas Anderson (4):
  ARM: dma-mapping: Optimize allocation
  common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
  ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize
    alloc
  s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES

Tomasz Figa (1):
  videobuf2-dc: Let drivers specify DMA attrs

 Documentation/DMA-attributes.txt               | 23 ++++++++++++++++
 arch/arm/mm/dma-mapping.c                      | 38 ++++++++++++++++----------
 drivers/media/platform/s5p-mfc/s5p_mfc.c       | 13 +++++++--
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 33 ++++++++++++++--------
 include/linux/dma-attrs.h                      |  1 +
 include/media/videobuf2-dma-contig.h           | 11 +++++++-
 6 files changed, 91 insertions(+), 28 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-11 17:30 ` Douglas Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

This series of patches will speed up memory allocation in dma-mapping
quite a bit.

The first patch ("ARM: dma-mapping: Optimize allocation") is hopefully
not terribly controversial: it merely doesn't try as hard to allocate
big chunks once it gets the first failure.  Since it's unlikely that
further big chunks will help (they're not likely to be virtually aligned
anyway), this should give a big speedup with no real regression to speak
of.  Yes, things could be made better, but this seems like a sane start.

The second patch ("common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES
attribute") models MADV_NOHUGEPAGE as I understand it.  Hopefully folks
are happy with following that lead.  It does nothing by itself.

The third patch ("ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to
optimize allocation") simply applies the 2nd patch.  Again it's pretty
simple.  ...and again it does nothing by itself.

Thue fourth patch ("[media] videobuf2-dc: Let drivers specify DMA
attrs") comes from the ChromeOS tree (authored by Tomasz Figa) and
allows the fifth patch.

The fifth patch ("[media] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES") uses the
new attribute.  For a second user, you can see the out of tree patch for
rk3288 at <https://chromium-review.googlesource.com/#/c/320498/>.

All testing was done on the chromeos kernel-3.8 and kernel-3.14.
Sanity (compile / boot) testing was done on a v4.4-rc6-based kernel on
rk3288, though the video codec isn't there.  I don't have graphics / MFC
working well on exynos, so the MFC change was only compile-tested
upstream.  Hopefully someone upstream whose setup for MFC can give a
Tested-by for these?

Also note that v2 of this series had an extra patch
<https://patchwork.kernel.org/patch/7888861/> that would attempt to sort
the allocation results to opportunistically get some extra alignment.  I
dropped that, but it could be re-introduced if there was interest.  I
found that it did give a little extra alignment sometimes, but maybe not
enough to justify the extra complexity.  It also was a bit half-baked
since it really should have tried harder to ensure alignment.

Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE
- s/ping ping/ping pong/
- Let drivers specify DMA attrs new for v5
- s5p-mfc patch new for v5

Changes in v4:
- Added Marek's ack
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE

Changes in v3:
- add DMA_ATTR_SEQUENTIAL attribute new for v3
- Use DMA_ATTR_SEQUENTIAL hint patch new for v3.

Changes in v2:
- No longer just 1 page at a time, but gives up higher order quickly.
- Only tries important higher order allocations that might help us.

Douglas Anderson (4):
  ARM: dma-mapping: Optimize allocation
  common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
  ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize
    alloc
  s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES

Tomasz Figa (1):
  videobuf2-dc: Let drivers specify DMA attrs

 Documentation/DMA-attributes.txt               | 23 ++++++++++++++++
 arch/arm/mm/dma-mapping.c                      | 38 ++++++++++++++++----------
 drivers/media/platform/s5p-mfc/s5p_mfc.c       | 13 +++++++--
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 33 ++++++++++++++--------
 include/linux/dma-attrs.h                      |  1 +
 include/media/videobuf2-dma-contig.h           | 11 +++++++-
 6 files changed, 91 insertions(+), 28 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 1/5] ARM: dma-mapping: Optimize allocation
  2016-01-11 17:30 ` Douglas Anderson
@ 2016-01-11 17:30   ` Douglas Anderson
  -1 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, Douglas Anderson, will.deacon, akpm,
	dan.j.williams, carlo, laurent.pinchart+renesas, mike.looijmans,
	penguin-kernel, linux-arm-kernel, linux-kernel

The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers.  Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes.  The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.

The current function is very, very slow.

One problem is the way it keeps trying and trying to allocate big
chunks.  Imagine a very fragmented memory that has 4M free but no
contiguous pages at all.  Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
  - Try to allocate order 10 (no retry)
  - Try to allocate order 9 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- For page 2:
  - Try to allocate order 9 (no retry)
  - Try to allocate order 8 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- ...
- ...

Total number of calls to alloc() calls for this case is:
  sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
  => 9228

The above is obviously worse case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases.  I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).

A second problem is that allocating big chunks under memory pressure
when we don't need them is just not a great idea anyway unless we really
need them.  We can make due pretty well with smaller chunks so it's
probably wise to leave bigger chunks for other users once memory
pressure is on.

Let's adjust the allocation like this:

1. If a big chunk fails, stop trying to hard and bump down to lower
   order allocations.
2. Don't try useless orders.  The whole point of big chunks is to
   optimize the TLB and it can really only make use of 2M, 1M, 64K and
   4K sizes.

We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users.  A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks.  Presumably this would
be a sane strategy for some callers.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Changes in v6: None
Changes in v5: None
Changes in v4:
- Added Marek's ack

Changes in v3: None
Changes in v2:
- No longer just 1 page at a time, but gives up higher order quickly.
- Only tries important higher order allocations that might help us.

 arch/arm/mm/dma-mapping.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0eca3812527e..bc9cebfa0891 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1122,6 +1122,9 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
 	spin_unlock_irqrestore(&mapping->lock, flags);
 }
 
+/* We'll try 2M, 1M, 64K, and finally 4K; array must end with 0! */
+static const int iommu_order_array[] = { 9, 8, 4, 0 };
+
 static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 					  gfp_t gfp, struct dma_attrs *attrs)
 {
@@ -1129,6 +1132,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	int count = size >> PAGE_SHIFT;
 	int array_size = count * sizeof(struct page *);
 	int i = 0;
+	int order_idx = 0;
 
 	if (array_size <= PAGE_SIZE)
 		pages = kzalloc(array_size, GFP_KERNEL);
@@ -1162,22 +1166,24 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	while (count) {
 		int j, order;
 
-		for (order = __fls(count); order > 0; --order) {
-			/*
-			 * We do not want OOM killer to be invoked as long
-			 * as we can fall back to single pages, so we force
-			 * __GFP_NORETRY for orders higher than zero.
-			 */
-			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
-			if (pages[i])
-				break;
+		order = iommu_order_array[order_idx];
+
+		/* Drop down when we get small */
+		if (__fls(count) < order) {
+			order_idx++;
+			continue;
 		}
 
-		if (!pages[i]) {
-			/*
-			 * Fall back to single page allocation.
-			 * Might invoke OOM killer as last resort.
-			 */
+		if (order) {
+			/* See if it's easy to allocate a high-order chunk */
+			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
+
+			/* Go down a notch at first sign of pressure */
+			if (!pages[i]) {
+				order_idx++;
+				continue;
+			}
+		} else {
 			pages[i] = alloc_pages(gfp, 0);
 			if (!pages[i])
 				goto error;
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 1/5] ARM: dma-mapping: Optimize allocation
@ 2016-01-11 17:30   ` Douglas Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers.  Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes.  The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.

The current function is very, very slow.

One problem is the way it keeps trying and trying to allocate big
chunks.  Imagine a very fragmented memory that has 4M free but no
contiguous pages at all.  Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
  - Try to allocate order 10 (no retry)
  - Try to allocate order 9 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- For page 2:
  - Try to allocate order 9 (no retry)
  - Try to allocate order 8 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- ...
- ...

Total number of calls to alloc() calls for this case is:
  sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
  => 9228

The above is obviously worse case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases.  I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).

A second problem is that allocating big chunks under memory pressure
when we don't need them is just not a great idea anyway unless we really
need them.  We can make due pretty well with smaller chunks so it's
probably wise to leave bigger chunks for other users once memory
pressure is on.

Let's adjust the allocation like this:

1. If a big chunk fails, stop trying to hard and bump down to lower
   order allocations.
2. Don't try useless orders.  The whole point of big chunks is to
   optimize the TLB and it can really only make use of 2M, 1M, 64K and
   4K sizes.

We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users.  A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks.  Presumably this would
be a sane strategy for some callers.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Changes in v6: None
Changes in v5: None
Changes in v4:
- Added Marek's ack

Changes in v3: None
Changes in v2:
- No longer just 1 page at a time, but gives up higher order quickly.
- Only tries important higher order allocations that might help us.

 arch/arm/mm/dma-mapping.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0eca3812527e..bc9cebfa0891 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1122,6 +1122,9 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
 	spin_unlock_irqrestore(&mapping->lock, flags);
 }
 
+/* We'll try 2M, 1M, 64K, and finally 4K; array must end with 0! */
+static const int iommu_order_array[] = { 9, 8, 4, 0 };
+
 static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 					  gfp_t gfp, struct dma_attrs *attrs)
 {
@@ -1129,6 +1132,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	int count = size >> PAGE_SHIFT;
 	int array_size = count * sizeof(struct page *);
 	int i = 0;
+	int order_idx = 0;
 
 	if (array_size <= PAGE_SIZE)
 		pages = kzalloc(array_size, GFP_KERNEL);
@@ -1162,22 +1166,24 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	while (count) {
 		int j, order;
 
-		for (order = __fls(count); order > 0; --order) {
-			/*
-			 * We do not want OOM killer to be invoked as long
-			 * as we can fall back to single pages, so we force
-			 * __GFP_NORETRY for orders higher than zero.
-			 */
-			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
-			if (pages[i])
-				break;
+		order = iommu_order_array[order_idx];
+
+		/* Drop down when we get small */
+		if (__fls(count) < order) {
+			order_idx++;
+			continue;
 		}
 
-		if (!pages[i]) {
-			/*
-			 * Fall back to single page allocation.
-			 * Might invoke OOM killer as last resort.
-			 */
+		if (order) {
+			/* See if it's easy to allocate a high-order chunk */
+			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
+
+			/* Go down a notch@first sign of pressure */
+			if (!pages[i]) {
+				order_idx++;
+				continue;
+			}
+		} else {
 			pages[i] = alloc_pages(gfp, 0);
 			if (!pages[i])
 				goto error;
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 2/5] common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
  2016-01-11 17:30 ` Douglas Anderson
  (?)
  (?)
@ 2016-01-11 17:30 ` Douglas Anderson
  2016-01-13 12:36   ` Robin Murphy
  -1 siblings, 1 reply; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, Douglas Anderson, corbet, akpm,
	linux-doc, linux-kernel

This patch adds the DMA_ATTR_ALLOC_SINGLE_PAGES attribute to the
DMA-mapping subsystem.

This attribute can be used as a hint to the DMA-mapping subsystem that
it's likely not worth it to try to allocate large pages behind the
scenes.  Large pages are likely to make an IOMMU TLB work more
efficiently but may not be worth it.  See the Documentation contained in
this patch for more details about this attribute and when to use it.

Note that the name of the hint (DMA_ATTR_ALLOC_SINGLE_PAGES) is loosely
based on the name MADV_NOHUGEPAGE.  Just as there is MADV_NOHUGEPAGE
vs. MADV_HUGEPAGE we could also add an "opposite" attribute to
DMA_ATTR_ALLOC_SINGLE_PAGES.  Without having the "opposite" attribute
the lack of DMA_ATTR_ALLOC_SINGLE_PAGES means "use your best judgement
about whether to use small pages or large pages".

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE
- s/ping ping/ping pong/

Changes in v4:
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE
- added Marek's ack

Changes in v3:
- add DMA_ATTR_SEQUENTIAL attribute new for v3

Changes in v2: None

 Documentation/DMA-attributes.txt | 23 +++++++++++++++++++++++
 include/linux/dma-attrs.h        |  1 +
 2 files changed, 24 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 18dc52c4f2a0..69b7b65ab516 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -100,3 +100,26 @@ allocated by dma_alloc_attrs() function from individual pages if it can
 be mapped as contiguous chunk into device dma address space. By
 specifying this attribute the allocated buffer is forced to be contiguous
 also in physical memory.
+
+DMA_ATTR_ALLOC_SINGLE_PAGES
+------------------------
+
+This is a hint to the DMA-mapping subsystem that it's probably not worth
+the time to try to allocate memory to in a way that gives better TLB
+efficiency (AKA it's not worth trying to build the mapping out of larger
+pages).  You might want to specify this if:
+- You know that the accesses to this memory won't thrash the TLB.
+  You might know that the accesses are likely to be sequential or
+  that they aren't sequential but it's unlikely you'll ping-pong
+  between many addresses that are likely to be in different physical
+  pages.
+- You know that the penalty of TLB misses while accessing the
+  memory will be small enough to be inconsequential.  If you are
+  doing a heavy operation like decryption or decompression this
+  might be the case.
+- You know that the DMA mapping is fairly transitory.  If you expect
+  the mapping to have a short lifetime then it may be worth it to
+  optimize allocation (avoid coming up with large pages) instead of
+  getting the slight performance win of larger pages.
+Setting this hint doesn't guarantee that you won't get huge pages, but it
+means that we won't try quite as hard to get them.
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index 99c0be00b47c..5246239a4953 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -18,6 +18,7 @@ enum dma_attr {
 	DMA_ATTR_NO_KERNEL_MAPPING,
 	DMA_ATTR_SKIP_CPU_SYNC,
 	DMA_ATTR_FORCE_CONTIGUOUS,
+	DMA_ATTR_ALLOC_SINGLE_PAGES,
 	DMA_ATTR_MAX,
 };
 
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 3/5] ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
  2016-01-11 17:30 ` Douglas Anderson
@ 2016-01-11 17:30   ` Douglas Anderson
  -1 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, Douglas Anderson, will.deacon, akpm,
	dan.j.williams, carlo, laurent.pinchart+renesas, mike.looijmans,
	linux-arm-kernel, linux-kernel

If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it would
make TLB usage efficient.

As Marek Szyprowski indicated:
    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy and the TLB misses aren't a
huge deal.  Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if we're
doing it repeadly and memory is full.  In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory.  This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out.  In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another.  Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the big
chunks we're causing extra work for the rest of the system.  We also may
start making other memory allocations fail.  While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE

Changes in v4:
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE
- added Marek's ack

Changes in v3:
- Use DMA_ATTR_SEQUENTIAL hint patch new for v3.

Changes in v2: None

 arch/arm/mm/dma-mapping.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bc9cebfa0891..9f996a3d79f7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1158,6 +1158,10 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 		return pages;
 	}
 
+	/* Go straight to 4K chunks if caller says it's OK. */
+	if (dma_get_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, attrs))
+		order_idx = ARRAY_SIZE(iommu_order_array) - 1;
+
 	/*
 	 * IOMMU can map any pages, so himem can also be used here
 	 */
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 3/5] ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
@ 2016-01-11 17:30   ` Douglas Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it would
make TLB usage efficient.

As Marek Szyprowski indicated:
    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy and the TLB misses aren't a
huge deal.  Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if we're
doing it repeadly and memory is full.  In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory.  This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out.  In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another.  Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the big
chunks we're causing extra work for the rest of the system.  We also may
start making other memory allocations fail.  While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE

Changes in v4:
- renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE
- added Marek's ack

Changes in v3:
- Use DMA_ATTR_SEQUENTIAL hint patch new for v3.

Changes in v2: None

 arch/arm/mm/dma-mapping.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bc9cebfa0891..9f996a3d79f7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1158,6 +1158,10 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 		return pages;
 	}
 
+	/* Go straight to 4K chunks if caller says it's OK. */
+	if (dma_get_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, attrs))
+		order_idx = ARRAY_SIZE(iommu_order_array) - 1;
+
 	/*
 	 * IOMMU can map any pages, so himem can also be used here
 	 */
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-01-11 17:30 ` Douglas Anderson
                   ` (3 preceding siblings ...)
  (?)
@ 2016-01-11 17:30 ` Douglas Anderson
  2016-02-01 11:40   ` Mauro Carvalho Chehab
                     ` (2 more replies)
  -1 siblings, 3 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, Douglas Anderson, kyungmin.park,
	linux-media, linux-kernel

From: Tomasz Figa <tfiga@chromium.org>

DMA allocations might be subject to certain reqiurements specific to the
hardware using the buffers, such as availability of kernel mapping (for
contents fix-ups in the driver). The only entity that knows them is the
driver, so it must share this knowledge with vb2-dc.

This patch extends the alloc_ctx initialization interface to let the
driver specify DMA attrs, which are then stored inside the allocation
context and will be used for all allocations with that context.

As a side effect, all dma_*_coherent() calls are turned into
dma_*_attrs() calls, because the attributes need to be carried over
through all DMA operations.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Changes in v6: None
Changes in v5:
- Let drivers specify DMA attrs new for v5

Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/media/v4l2-core/videobuf2-dma-contig.c | 33 +++++++++++++++++---------
 include/media/videobuf2-dma-contig.h           | 11 ++++++++-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index c33127284cfe..5361197f3e57 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -23,13 +23,16 @@
 
 struct vb2_dc_conf {
 	struct device		*dev;
+	struct dma_attrs	attrs;
 };
 
 struct vb2_dc_buf {
 	struct device			*dev;
 	void				*vaddr;
 	unsigned long			size;
+	void				*cookie;
 	dma_addr_t			dma_addr;
+	struct dma_attrs		attrs;
 	enum dma_data_direction		dma_dir;
 	struct sg_table			*dma_sgt;
 	struct frame_vector		*vec;
@@ -131,7 +134,8 @@ static void vb2_dc_put(void *buf_priv)
 		sg_free_table(buf->sgt_base);
 		kfree(buf->sgt_base);
 	}
-	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
+	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
+			&buf->attrs);
 	put_device(buf->dev);
 	kfree(buf);
 }
@@ -147,14 +151,18 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
-						GFP_KERNEL | gfp_flags);
-	if (!buf->vaddr) {
+	buf->attrs = conf->attrs;
+	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
+					GFP_KERNEL | gfp_flags, &buf->attrs);
+	if (!buf->cookie) {
 		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
 		kfree(buf);
 		return ERR_PTR(-ENOMEM);
 	}
 
+	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, &buf->attrs))
+		buf->vaddr = buf->cookie;
+
 	/* Prevent the device from being released while the buffer is used */
 	buf->dev = get_device(dev);
 	buf->size = size;
@@ -185,8 +193,8 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
 	 */
 	vma->vm_pgoff = 0;
 
-	ret = dma_mmap_coherent(buf->dev, vma, buf->vaddr,
-		buf->dma_addr, buf->size);
+	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
+		buf->dma_addr, buf->size, &buf->attrs);
 
 	if (ret) {
 		pr_err("Remapping memory failed, error: %d\n", ret);
@@ -329,7 +337,7 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
 {
 	struct vb2_dc_buf *buf = dbuf->priv;
 
-	return buf->vaddr + pgnum * PAGE_SIZE;
+	return buf->vaddr ? buf->vaddr + pgnum * PAGE_SIZE : NULL;
 }
 
 static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
@@ -368,8 +376,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
 		return NULL;
 	}
 
-	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
-		buf->size);
+	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
+		buf->size, &buf->attrs);
 	if (ret < 0) {
 		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
 		kfree(sgt);
@@ -721,7 +729,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = {
 };
 EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);
 
-void *vb2_dma_contig_init_ctx(struct device *dev)
+void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
+				    struct dma_attrs *attrs)
 {
 	struct vb2_dc_conf *conf;
 
@@ -730,10 +739,12 @@ void *vb2_dma_contig_init_ctx(struct device *dev)
 		return ERR_PTR(-ENOMEM);
 
 	conf->dev = dev;
+	if (attrs)
+		conf->attrs = *attrs;
 
 	return conf;
 }
-EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);
+EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx_attrs);
 
 void vb2_dma_contig_cleanup_ctx(void *alloc_ctx)
 {
diff --git a/include/media/videobuf2-dma-contig.h b/include/media/videobuf2-dma-contig.h
index c33dfa69d7ab..2087c9a68be3 100644
--- a/include/media/videobuf2-dma-contig.h
+++ b/include/media/videobuf2-dma-contig.h
@@ -16,6 +16,8 @@
 #include <media/videobuf2-v4l2.h>
 #include <linux/dma-mapping.h>
 
+struct dma_attrs;
+
 static inline dma_addr_t
 vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
 {
@@ -24,7 +26,14 @@ vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
 	return *addr;
 }
 
-void *vb2_dma_contig_init_ctx(struct device *dev);
+void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
+				    struct dma_attrs *attrs);
+
+static inline void *vb2_dma_contig_init_ctx(struct device *dev)
+{
+	return vb2_dma_contig_init_ctx_attrs(dev, NULL);
+}
+
 void vb2_dma_contig_cleanup_ctx(void *alloc_ctx);
 
 extern const struct vb2_mem_ops vb2_dma_contig_memops;
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 5/5] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
  2016-01-11 17:30 ` Douglas Anderson
@ 2016-01-11 17:30   ` Douglas Anderson
  -1 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, Douglas Anderson, kyungmin.park,
	k.debski, jtp.park, linux-arm-kernel, linux-media, linux-kernel

We do video allocation all the time and we need it to be fast.  Plus TLB
efficiency isn't terribly important for video.

That means we want to set DMA_ATTR_ALLOC_SINGLE_PAGES.

See also the previous change ("ARM: dma-mapping: Use
DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize allocation").

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- s5p-mfc patch new for v5

Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/media/platform/s5p-mfc/s5p_mfc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index 927ab4928779..421d25a1aec1 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1095,6 +1095,7 @@ static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
 /* MFC probe function */
 static int s5p_mfc_probe(struct platform_device *pdev)
 {
+	DEFINE_DMA_ATTRS(attrs);
 	struct s5p_mfc_dev *dev;
 	struct video_device *vfd;
 	struct resource *res;
@@ -1164,12 +1165,20 @@ static int s5p_mfc_probe(struct platform_device *pdev)
 		}
 	}
 
-	dev->alloc_ctx[0] = vb2_dma_contig_init_ctx(dev->mem_dev_l);
+	/*
+	 * We'll do mostly sequential access, so sacrifice TLB efficiency for
+	 * faster allocation.
+	 */
+	dma_set_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, &attrs);
+
+	dev->alloc_ctx[0] = vb2_dma_contig_init_ctx_attrs(dev->mem_dev_l,
+							  &attrs);
 	if (IS_ERR(dev->alloc_ctx[0])) {
 		ret = PTR_ERR(dev->alloc_ctx[0]);
 		goto err_res;
 	}
-	dev->alloc_ctx[1] = vb2_dma_contig_init_ctx(dev->mem_dev_r);
+	dev->alloc_ctx[1] = vb2_dma_contig_init_ctx_attrs(dev->mem_dev_r,
+							  &attrs);
 	if (IS_ERR(dev->alloc_ctx[1])) {
 		ret = PTR_ERR(dev->alloc_ctx[1]);
 		goto err_mem_init_ctx_1;
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 5/5] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
@ 2016-01-11 17:30   ` Douglas Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Douglas Anderson @ 2016-01-11 17:30 UTC (permalink / raw)
  To: linux-arm-kernel

We do video allocation all the time and we need it to be fast.  Plus TLB
efficiency isn't terribly important for video.

That means we want to set DMA_ATTR_ALLOC_SINGLE_PAGES.

See also the previous change ("ARM: dma-mapping: Use
DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize allocation").

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
Changes in v6:
- renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES

Changes in v5:
- s5p-mfc patch new for v5

Changes in v4: None
Changes in v3: None
Changes in v2: None

 drivers/media/platform/s5p-mfc/s5p_mfc.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index 927ab4928779..421d25a1aec1 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1095,6 +1095,7 @@ static int s5p_mfc_alloc_memdevs(struct s5p_mfc_dev *dev)
 /* MFC probe function */
 static int s5p_mfc_probe(struct platform_device *pdev)
 {
+	DEFINE_DMA_ATTRS(attrs);
 	struct s5p_mfc_dev *dev;
 	struct video_device *vfd;
 	struct resource *res;
@@ -1164,12 +1165,20 @@ static int s5p_mfc_probe(struct platform_device *pdev)
 		}
 	}
 
-	dev->alloc_ctx[0] = vb2_dma_contig_init_ctx(dev->mem_dev_l);
+	/*
+	 * We'll do mostly sequential access, so sacrifice TLB efficiency for
+	 * faster allocation.
+	 */
+	dma_set_attr(DMA_ATTR_ALLOC_SINGLE_PAGES, &attrs);
+
+	dev->alloc_ctx[0] = vb2_dma_contig_init_ctx_attrs(dev->mem_dev_l,
+							  &attrs);
 	if (IS_ERR(dev->alloc_ctx[0])) {
 		ret = PTR_ERR(dev->alloc_ctx[0]);
 		goto err_res;
 	}
-	dev->alloc_ctx[1] = vb2_dma_contig_init_ctx(dev->mem_dev_r);
+	dev->alloc_ctx[1] = vb2_dma_contig_init_ctx_attrs(dev->mem_dev_r,
+							  &attrs);
 	if (IS_ERR(dev->alloc_ctx[1])) {
 		ret = PTR_ERR(dev->alloc_ctx[1]);
 		goto err_mem_init_ctx_1;
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 1/5] ARM: dma-mapping: Optimize allocation
  2016-01-11 17:30   ` Douglas Anderson
@ 2016-01-13 12:23     ` Robin Murphy
  -1 siblings, 0 replies; 36+ messages in thread
From: Robin Murphy @ 2016-01-13 12:23 UTC (permalink / raw)
  To: Douglas Anderson, linux, mchehab, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, will.deacon, akpm, dan.j.williams,
	carlo, laurent.pinchart+renesas, mike.looijmans, penguin-kernel,
	linux-arm-kernel, linux-kernel

On 11/01/16 17:30, Douglas Anderson wrote:
> The __iommu_alloc_buffer() is expected to be called to allocate pretty
> sizeable buffers.  Upon simple tests of video I saw it trying to
> allocate 4,194,304 bytes.  The function tries to allocate large chunks
> in order to optimize IOMMU TLB usage.
>
> The current function is very, very slow.
>
> One problem is the way it keeps trying and trying to allocate big
> chunks.  Imagine a very fragmented memory that has 4M free but no
> contiguous pages at all.  Further imagine allocating 4M (1024 pages).
> We'll do the following memory allocations:
> - For page 1:
>    - Try to allocate order 10 (no retry)
>    - Try to allocate order 9 (no retry)
>    - ...
>    - Try to allocate order 0 (with retry, but not needed)
> - For page 2:
>    - Try to allocate order 9 (no retry)
>    - Try to allocate order 8 (no retry)
>    - ...
>    - Try to allocate order 0 (with retry, but not needed)
> - ...
> - ...
>
> Total number of calls to alloc() calls for this case is:
>    sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
>    => 9228
>
> The above is obviously worse case, but given how slow alloc can be we
> really want to try to avoid even somewhat bad cases.  I timed the old
> code with a device under memory pressure and it wasn't hard to see it
> take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
> was done on kernel 3.14, so possibly mainline would behave
> differently).
>
> A second problem is that allocating big chunks under memory pressure
> when we don't need them is just not a great idea anyway unless we really
> need them.  We can make due pretty well with smaller chunks so it's
> probably wise to leave bigger chunks for other users once memory
> pressure is on.
>
> Let's adjust the allocation like this:
>
> 1. If a big chunk fails, stop trying to hard and bump down to lower
>     order allocations.
> 2. Don't try useless orders.  The whole point of big chunks is to
>     optimize the TLB and it can really only make use of 2M, 1M, 64K and
>     4K sizes.
>
> We'll still tend to eat up a bunch of big chunks, but that might be the
> right answer for some users.  A future patch could possibly add a new
> DMA_ATTR that would let the caller decide that TLB optimization isn't
> important and that we should use smaller chunks.  Presumably this would
> be a sane strategy for some callers.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Changes in v6: None

Oops, good thing the the reply I just sent to v5 applies equally well to 
v6 too!

Robin.

> Changes in v5: None
> Changes in v4:
> - Added Marek's ack
>
> Changes in v3: None
> Changes in v2:
> - No longer just 1 page at a time, but gives up higher order quickly.
> - Only tries important higher order allocations that might help us.
>
>   arch/arm/mm/dma-mapping.c | 34 ++++++++++++++++++++--------------
>   1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 0eca3812527e..bc9cebfa0891 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1122,6 +1122,9 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
>   	spin_unlock_irqrestore(&mapping->lock, flags);
>   }
>
> +/* We'll try 2M, 1M, 64K, and finally 4K; array must end with 0! */
> +static const int iommu_order_array[] = { 9, 8, 4, 0 };
> +
>   static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   					  gfp_t gfp, struct dma_attrs *attrs)
>   {
> @@ -1129,6 +1132,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   	int count = size >> PAGE_SHIFT;
>   	int array_size = count * sizeof(struct page *);
>   	int i = 0;
> +	int order_idx = 0;
>
>   	if (array_size <= PAGE_SIZE)
>   		pages = kzalloc(array_size, GFP_KERNEL);
> @@ -1162,22 +1166,24 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   	while (count) {
>   		int j, order;
>
> -		for (order = __fls(count); order > 0; --order) {
> -			/*
> -			 * We do not want OOM killer to be invoked as long
> -			 * as we can fall back to single pages, so we force
> -			 * __GFP_NORETRY for orders higher than zero.
> -			 */
> -			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
> -			if (pages[i])
> -				break;
> +		order = iommu_order_array[order_idx];
> +
> +		/* Drop down when we get small */
> +		if (__fls(count) < order) {
> +			order_idx++;
> +			continue;
>   		}
>
> -		if (!pages[i]) {
> -			/*
> -			 * Fall back to single page allocation.
> -			 * Might invoke OOM killer as last resort.
> -			 */
> +		if (order) {
> +			/* See if it's easy to allocate a high-order chunk */
> +			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
> +
> +			/* Go down a notch at first sign of pressure */
> +			if (!pages[i]) {
> +				order_idx++;
> +				continue;
> +			}
> +		} else {
>   			pages[i] = alloc_pages(gfp, 0);
>   			if (!pages[i])
>   				goto error;
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 1/5] ARM: dma-mapping: Optimize allocation
@ 2016-01-13 12:23     ` Robin Murphy
  0 siblings, 0 replies; 36+ messages in thread
From: Robin Murphy @ 2016-01-13 12:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 11/01/16 17:30, Douglas Anderson wrote:
> The __iommu_alloc_buffer() is expected to be called to allocate pretty
> sizeable buffers.  Upon simple tests of video I saw it trying to
> allocate 4,194,304 bytes.  The function tries to allocate large chunks
> in order to optimize IOMMU TLB usage.
>
> The current function is very, very slow.
>
> One problem is the way it keeps trying and trying to allocate big
> chunks.  Imagine a very fragmented memory that has 4M free but no
> contiguous pages at all.  Further imagine allocating 4M (1024 pages).
> We'll do the following memory allocations:
> - For page 1:
>    - Try to allocate order 10 (no retry)
>    - Try to allocate order 9 (no retry)
>    - ...
>    - Try to allocate order 0 (with retry, but not needed)
> - For page 2:
>    - Try to allocate order 9 (no retry)
>    - Try to allocate order 8 (no retry)
>    - ...
>    - Try to allocate order 0 (with retry, but not needed)
> - ...
> - ...
>
> Total number of calls to alloc() calls for this case is:
>    sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
>    => 9228
>
> The above is obviously worse case, but given how slow alloc can be we
> really want to try to avoid even somewhat bad cases.  I timed the old
> code with a device under memory pressure and it wasn't hard to see it
> take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
> was done on kernel 3.14, so possibly mainline would behave
> differently).
>
> A second problem is that allocating big chunks under memory pressure
> when we don't need them is just not a great idea anyway unless we really
> need them.  We can make due pretty well with smaller chunks so it's
> probably wise to leave bigger chunks for other users once memory
> pressure is on.
>
> Let's adjust the allocation like this:
>
> 1. If a big chunk fails, stop trying to hard and bump down to lower
>     order allocations.
> 2. Don't try useless orders.  The whole point of big chunks is to
>     optimize the TLB and it can really only make use of 2M, 1M, 64K and
>     4K sizes.
>
> We'll still tend to eat up a bunch of big chunks, but that might be the
> right answer for some users.  A future patch could possibly add a new
> DMA_ATTR that would let the caller decide that TLB optimization isn't
> important and that we should use smaller chunks.  Presumably this would
> be a sane strategy for some callers.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Changes in v6: None

Oops, good thing the the reply I just sent to v5 applies equally well to 
v6 too!

Robin.

> Changes in v5: None
> Changes in v4:
> - Added Marek's ack
>
> Changes in v3: None
> Changes in v2:
> - No longer just 1 page at a time, but gives up higher order quickly.
> - Only tries important higher order allocations that might help us.
>
>   arch/arm/mm/dma-mapping.c | 34 ++++++++++++++++++++--------------
>   1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 0eca3812527e..bc9cebfa0891 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1122,6 +1122,9 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
>   	spin_unlock_irqrestore(&mapping->lock, flags);
>   }
>
> +/* We'll try 2M, 1M, 64K, and finally 4K; array must end with 0! */
> +static const int iommu_order_array[] = { 9, 8, 4, 0 };
> +
>   static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   					  gfp_t gfp, struct dma_attrs *attrs)
>   {
> @@ -1129,6 +1132,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   	int count = size >> PAGE_SHIFT;
>   	int array_size = count * sizeof(struct page *);
>   	int i = 0;
> +	int order_idx = 0;
>
>   	if (array_size <= PAGE_SIZE)
>   		pages = kzalloc(array_size, GFP_KERNEL);
> @@ -1162,22 +1166,24 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>   	while (count) {
>   		int j, order;
>
> -		for (order = __fls(count); order > 0; --order) {
> -			/*
> -			 * We do not want OOM killer to be invoked as long
> -			 * as we can fall back to single pages, so we force
> -			 * __GFP_NORETRY for orders higher than zero.
> -			 */
> -			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
> -			if (pages[i])
> -				break;
> +		order = iommu_order_array[order_idx];
> +
> +		/* Drop down when we get small */
> +		if (__fls(count) < order) {
> +			order_idx++;
> +			continue;
>   		}
>
> -		if (!pages[i]) {
> -			/*
> -			 * Fall back to single page allocation.
> -			 * Might invoke OOM killer as last resort.
> -			 */
> +		if (order) {
> +			/* See if it's easy to allocate a high-order chunk */
> +			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
> +
> +			/* Go down a notch at first sign of pressure */
> +			if (!pages[i]) {
> +				order_idx++;
> +				continue;
> +			}
> +		} else {
>   			pages[i] = alloc_pages(gfp, 0);
>   			if (!pages[i])
>   				goto error;
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 2/5] common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
  2016-01-11 17:30 ` [PATCH v6 2/5] common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute Douglas Anderson
@ 2016-01-13 12:36   ` Robin Murphy
  0 siblings, 0 replies; 36+ messages in thread
From: Robin Murphy @ 2016-01-13 12:36 UTC (permalink / raw)
  To: Douglas Anderson, linux, mchehab, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, corbet, akpm, linux-doc, linux-kernel

On 11/01/16 17:30, Douglas Anderson wrote:
> This patch adds the DMA_ATTR_ALLOC_SINGLE_PAGES attribute to the
> DMA-mapping subsystem.
>
> This attribute can be used as a hint to the DMA-mapping subsystem that
> it's likely not worth it to try to allocate large pages behind the
> scenes.  Large pages are likely to make an IOMMU TLB work more
> efficiently but may not be worth it.  See the Documentation contained in
> this patch for more details about this attribute and when to use it.
>
> Note that the name of the hint (DMA_ATTR_ALLOC_SINGLE_PAGES) is loosely
> based on the name MADV_NOHUGEPAGE.  Just as there is MADV_NOHUGEPAGE
> vs. MADV_HUGEPAGE we could also add an "opposite" attribute to
> DMA_ATTR_ALLOC_SINGLE_PAGES.  Without having the "opposite" attribute
> the lack of DMA_ATTR_ALLOC_SINGLE_PAGES means "use your best judgement
> about whether to use small pages or large pages".

I did ponder the discussion on v4 over the weekend and arrived at 
DMA_ATTR_LOW_ORDER_ALLOC and a shorter, more generalised description, 
but ALLOC_SINGLE_PAGES seems just as good to me.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>

Thanks,
Robin.

> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Changes in v6:
> - renamed DMA_ATTR_NO_HUGE_PAGE to DMA_ATTR_ALLOC_SINGLE_PAGES
>
> Changes in v5:
> - renamed DMA_ATTR_NOHUGEPAGE to DMA_ATTR_NO_HUGE_PAGE
> - s/ping ping/ping pong/
>
> Changes in v4:
> - renamed DMA_ATTR_SEQUENTIAL to DMA_ATTR_NOHUGEPAGE
> - added Marek's ack
>
> Changes in v3:
> - add DMA_ATTR_SEQUENTIAL attribute new for v3
>
> Changes in v2: None
>
>   Documentation/DMA-attributes.txt | 23 +++++++++++++++++++++++
>   include/linux/dma-attrs.h        |  1 +
>   2 files changed, 24 insertions(+)
>
> diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
> index 18dc52c4f2a0..69b7b65ab516 100644
> --- a/Documentation/DMA-attributes.txt
> +++ b/Documentation/DMA-attributes.txt
> @@ -100,3 +100,26 @@ allocated by dma_alloc_attrs() function from individual pages if it can
>   be mapped as contiguous chunk into device dma address space. By
>   specifying this attribute the allocated buffer is forced to be contiguous
>   also in physical memory.
> +
> +DMA_ATTR_ALLOC_SINGLE_PAGES
> +------------------------
> +
> +This is a hint to the DMA-mapping subsystem that it's probably not worth
> +the time to try to allocate memory to in a way that gives better TLB
> +efficiency (AKA it's not worth trying to build the mapping out of larger
> +pages).  You might want to specify this if:
> +- You know that the accesses to this memory won't thrash the TLB.
> +  You might know that the accesses are likely to be sequential or
> +  that they aren't sequential but it's unlikely you'll ping-pong
> +  between many addresses that are likely to be in different physical
> +  pages.
> +- You know that the penalty of TLB misses while accessing the
> +  memory will be small enough to be inconsequential.  If you are
> +  doing a heavy operation like decryption or decompression this
> +  might be the case.
> +- You know that the DMA mapping is fairly transitory.  If you expect
> +  the mapping to have a short lifetime then it may be worth it to
> +  optimize allocation (avoid coming up with large pages) instead of
> +  getting the slight performance win of larger pages.
> +Setting this hint doesn't guarantee that you won't get huge pages, but it
> +means that we won't try quite as hard to get them.
> diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
> index 99c0be00b47c..5246239a4953 100644
> --- a/include/linux/dma-attrs.h
> +++ b/include/linux/dma-attrs.h
> @@ -18,6 +18,7 @@ enum dma_attr {
>   	DMA_ATTR_NO_KERNEL_MAPPING,
>   	DMA_ATTR_SKIP_CPU_SYNC,
>   	DMA_ATTR_FORCE_CONTIGUOUS,
> +	DMA_ATTR_ALLOC_SINGLE_PAGES,
>   	DMA_ATTR_MAX,
>   };
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
  2016-01-11 17:30 ` Douglas Anderson
  (?)
@ 2016-01-26 23:31   ` Javier Martinez Canillas
  -1 siblings, 0 replies; 36+ messages in thread
From: Javier Martinez Canillas @ 2016-01-26 23:31 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Russell King, Mauro Carvalho Chehab, robin.murphy, tfiga,
	Marek Szyprowski, Kamil Debski, laurent.pinchart+renesas, pawel,
	Jonathan Corbet, mike.looijmans, Will Deacon, penguin-kernel,
	Dmitry Torokhov, linux-doc, Linux Kernel, hch, jtp.park,
	Kyungmin Park, carlo, akpm, Dan Williams, linux-arm-kernel,
	Linux Media Mailing List

Hello Doug,

On Mon, Jan 11, 2016 at 2:30 PM, Douglas Anderson <dianders@chromium.org> wrote:

[snip]

>
> All testing was done on the chromeos kernel-3.8 and kernel-3.14.
> Sanity (compile / boot) testing was done on a v4.4-rc6-based kernel on
> rk3288, though the video codec isn't there.  I don't have graphics / MFC
> working well on exynos, so the MFC change was only compile-tested
> upstream.  Hopefully someone upstream whose setup for MFC can give a
> Tested-by for these?
>

I tested these patches on a Exynos5800 Peach Pi Chromebook. The
s5p-mfc driver probes correctly and the allocation succeeds.

I also tried to test actual video decoding using Gstreamer but ran
into issues (not related to this series) so testing that won't be
trivial for me.

It shouldn't block Doug's series though IMHO since he tested on his
platform and the patches speeds up allocation there, so is an
improvement.

Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>

Best regards,
Javier

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-26 23:31   ` Javier Martinez Canillas
  0 siblings, 0 replies; 36+ messages in thread
From: Javier Martinez Canillas @ 2016-01-26 23:31 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Russell King, Mauro Carvalho Chehab, robin.murphy, tfiga,
	Marek Szyprowski, Kamil Debski, laurent.pinchart+renesas, pawel,
	Jonathan Corbet, mike.looijmans, Will Deacon, penguin-kernel,
	Dmitry Torokhov, linux-doc, Linux Kernel, hch, jtp.park,
	Kyungmin Park, carlo, akpm, Dan Williams, linux-arm-kernel,
	Linux Media Mailing List

Hello Doug,

On Mon, Jan 11, 2016 at 2:30 PM, Douglas Anderson <dianders@chromium.org> wrote:

[snip]

>
> All testing was done on the chromeos kernel-3.8 and kernel-3.14.
> Sanity (compile / boot) testing was done on a v4.4-rc6-based kernel on
> rk3288, though the video codec isn't there.  I don't have graphics / MFC
> working well on exynos, so the MFC change was only compile-tested
> upstream.  Hopefully someone upstream whose setup for MFC can give a
> Tested-by for these?
>

I tested these patches on a Exynos5800 Peach Pi Chromebook. The
s5p-mfc driver probes correctly and the allocation succeeds.

I also tried to test actual video decoding using Gstreamer but ran
into issues (not related to this series) so testing that won't be
trivial for me.

It shouldn't block Doug's series though IMHO since he tested on his
platform and the patches speeds up allocation there, so is an
improvement.

Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>

Best regards,
Javier

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-26 23:31   ` Javier Martinez Canillas
  0 siblings, 0 replies; 36+ messages in thread
From: Javier Martinez Canillas @ 2016-01-26 23:31 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Doug,

On Mon, Jan 11, 2016 at 2:30 PM, Douglas Anderson <dianders@chromium.org> wrote:

[snip]

>
> All testing was done on the chromeos kernel-3.8 and kernel-3.14.
> Sanity (compile / boot) testing was done on a v4.4-rc6-based kernel on
> rk3288, though the video codec isn't there.  I don't have graphics / MFC
> working well on exynos, so the MFC change was only compile-tested
> upstream.  Hopefully someone upstream whose setup for MFC can give a
> Tested-by for these?
>

I tested these patches on a Exynos5800 Peach Pi Chromebook. The
s5p-mfc driver probes correctly and the allocation succeeds.

I also tried to test actual video decoding using Gstreamer but ran
into issues (not related to this series) so testing that won't be
trivial for me.

It shouldn't block Doug's series though IMHO since he tested on his
platform and the patches speeds up allocation there, so is an
improvement.

Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>

Best regards,
Javier

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
  2016-01-11 17:30 ` Douglas Anderson
@ 2016-01-26 23:59   ` Andrew Morton
  -1 siblings, 0 replies; 36+ messages in thread
From: Andrew Morton @ 2016-01-26 23:59 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: linux, mchehab, robin.murphy, tfiga, m.szyprowski, pawel,
	Dmitry Torokhov, hch, k.debski, laurent.pinchart+renesas, corbet,
	mike.looijmans, linux-kernel, linux-doc, will.deacon, jtp.park,
	penguin-kernel, kyungmin.park, carlo, linux-media,
	dan.j.williams, linux-arm-kernel

On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:

> This series of patches will speed up memory allocation in dma-mapping
> quite a bit.

This is pretty much all ARM and driver stuff so I think I'll duck it. 
But I can merge it if nobody else feels a need to.

I saw a few acked-by/tested-by/etc from the v5 posting which weren't
carried over into v6 (might have been a timing race), so please fix
that up if there's an opportunity.

Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
"DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
the docs.  Or perhaps have a shot at implementing it elsewhere.

Typo in 4/5 changelog: "reqiurements"

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-26 23:59   ` Andrew Morton
  0 siblings, 0 replies; 36+ messages in thread
From: Andrew Morton @ 2016-01-26 23:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:

> This series of patches will speed up memory allocation in dma-mapping
> quite a bit.

This is pretty much all ARM and driver stuff so I think I'll duck it. 
But I can merge it if nobody else feels a need to.

I saw a few acked-by/tested-by/etc from the v5 posting which weren't
carried over into v6 (might have been a timing race), so please fix
that up if there's an opportunity.

Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
"DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
the docs.  Or perhaps have a shot at implementing it elsewhere.

Typo in 4/5 changelog: "reqiurements"

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
  2016-01-26 23:59   ` Andrew Morton
  (?)
@ 2016-01-27  0:39     ` Doug Anderson
  -1 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-27  0:39 UTC (permalink / raw)
  To: Russell King, Robin Murphy, Andrew Morton, Olof Johansson, Arnd Bergmann
  Cc: Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, k.debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, linux-media, Dan Williams, linux-arm-kernel

Hi,

On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>
>> This series of patches will speed up memory allocation in dma-mapping
>> quite a bit.
>
> This is pretty much all ARM and driver stuff so I think I'll duck it.
> But I can merge it if nobody else feels a need to.

I was going to ask what the next steps were for this series.
Presumably I could post the patch to Russell's patch tracker if folks
want me to do that.  Alternatively it could go through the ARM-SOC
tree?


> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
> carried over into v6 (might have been a timing race), so please fix
> that up if there's an opportunity.

Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
even after v6 was posted.


> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
> the docs.  Or perhaps have a shot at implementing it elsewhere.

Warning sounds good.


> Typo in 4/5 changelog: "reqiurements"

Thanks for catching!


I'm happy to post up a v6 with these things fixed or I'm happy for
whoever is applying it to make these small fixes themselves.  Any
volunteers?  Olof, Arnd, or Russell: any of you want these patches?


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-27  0:39     ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-27  0:39 UTC (permalink / raw)
  To: Russell King, Robin Murphy, Andrew Morton, Olof Johansson, Arnd Bergmann
  Cc: Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, k.debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, linux-media, Dan Williams, linux-arm-kernel

Hi,

On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>
>> This series of patches will speed up memory allocation in dma-mapping
>> quite a bit.
>
> This is pretty much all ARM and driver stuff so I think I'll duck it.
> But I can merge it if nobody else feels a need to.

I was going to ask what the next steps were for this series.
Presumably I could post the patch to Russell's patch tracker if folks
want me to do that.  Alternatively it could go through the ARM-SOC
tree?


> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
> carried over into v6 (might have been a timing race), so please fix
> that up if there's an opportunity.

Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
even after v6 was posted.


> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
> the docs.  Or perhaps have a shot at implementing it elsewhere.

Warning sounds good.


> Typo in 4/5 changelog: "reqiurements"

Thanks for catching!


I'm happy to post up a v6 with these things fixed or I'm happy for
whoever is applying it to make these small fixes themselves.  Any
volunteers?  Olof, Arnd, or Russell: any of you want these patches?


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-27  0:39     ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-27  0:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>
>> This series of patches will speed up memory allocation in dma-mapping
>> quite a bit.
>
> This is pretty much all ARM and driver stuff so I think I'll duck it.
> But I can merge it if nobody else feels a need to.

I was going to ask what the next steps were for this series.
Presumably I could post the patch to Russell's patch tracker if folks
want me to do that.  Alternatively it could go through the ARM-SOC
tree?


> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
> carried over into v6 (might have been a timing race), so please fix
> that up if there's an opportunity.

Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
even after v6 was posted.


> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
> the docs.  Or perhaps have a shot at implementing it elsewhere.

Warning sounds good.


> Typo in 4/5 changelog: "reqiurements"

Thanks for catching!


I'm happy to post up a v6 with these things fixed or I'm happy for
whoever is applying it to make these small fixes themselves.  Any
volunteers?  Olof, Arnd, or Russell: any of you want these patches?


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
  2016-01-27  0:39     ` Doug Anderson
  (?)
@ 2016-01-29 21:52       ` Olof Johansson
  -1 siblings, 0 replies; 36+ messages in thread
From: Olof Johansson @ 2016-01-29 21:52 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Russell King, Robin Murphy, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kamil Debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, Linux Media Mailing List, Dan Williams,
	linux-arm-kernel

On Tue, Jan 26, 2016 at 4:39 PM, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>>
>>> This series of patches will speed up memory allocation in dma-mapping
>>> quite a bit.
>>
>> This is pretty much all ARM and driver stuff so I think I'll duck it.
>> But I can merge it if nobody else feels a need to.
>
> I was going to ask what the next steps were for this series.
> Presumably I could post the patch to Russell's patch tracker if folks
> want me to do that.  Alternatively it could go through the ARM-SOC
> tree?
>
>
>> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
>> carried over into v6 (might have been a timing race), so please fix
>> that up if there's an opportunity.
>
> Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
> even after v6 was posted.
>
>
>> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
>> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
>> the docs.  Or perhaps have a shot at implementing it elsewhere.
>
> Warning sounds good.
>
>
>> Typo in 4/5 changelog: "reqiurements"
>
> Thanks for catching!
>
>
> I'm happy to post up a v6 with these things fixed or I'm happy for
> whoever is applying it to make these small fixes themselves.  Any
> volunteers?  Olof, Arnd, or Russell: any of you want these patches?

I think it makes sense to send these through Russell's tracker for him
to merge, especially since I don't think there are any dependencies on
them for SoC-specific patches coming up.


-Olof

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-29 21:52       ` Olof Johansson
  0 siblings, 0 replies; 36+ messages in thread
From: Olof Johansson @ 2016-01-29 21:52 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Russell King, Robin Murphy, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kamil Debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, Linux Media Mailing List, Dan Williams,
	linux-arm-kernel

On Tue, Jan 26, 2016 at 4:39 PM, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>>
>>> This series of patches will speed up memory allocation in dma-mapping
>>> quite a bit.
>>
>> This is pretty much all ARM and driver stuff so I think I'll duck it.
>> But I can merge it if nobody else feels a need to.
>
> I was going to ask what the next steps were for this series.
> Presumably I could post the patch to Russell's patch tracker if folks
> want me to do that.  Alternatively it could go through the ARM-SOC
> tree?
>
>
>> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
>> carried over into v6 (might have been a timing race), so please fix
>> that up if there's an opportunity.
>
> Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
> even after v6 was posted.
>
>
>> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
>> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
>> the docs.  Or perhaps have a shot at implementing it elsewhere.
>
> Warning sounds good.
>
>
>> Typo in 4/5 changelog: "reqiurements"
>
> Thanks for catching!
>
>
> I'm happy to post up a v6 with these things fixed or I'm happy for
> whoever is applying it to make these small fixes themselves.  Any
> volunteers?  Olof, Arnd, or Russell: any of you want these patches?

I think it makes sense to send these through Russell's tracker for him
to merge, especially since I don't think there are any dependencies on
them for SoC-specific patches coming up.


-Olof

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-29 21:52       ` Olof Johansson
  0 siblings, 0 replies; 36+ messages in thread
From: Olof Johansson @ 2016-01-29 21:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 26, 2016 at 4:39 PM, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>>
>>> This series of patches will speed up memory allocation in dma-mapping
>>> quite a bit.
>>
>> This is pretty much all ARM and driver stuff so I think I'll duck it.
>> But I can merge it if nobody else feels a need to.
>
> I was going to ask what the next steps were for this series.
> Presumably I could post the patch to Russell's patch tracker if folks
> want me to do that.  Alternatively it could go through the ARM-SOC
> tree?
>
>
>> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
>> carried over into v6 (might have been a timing race), so please fix
>> that up if there's an opportunity.
>
> Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
> even after v6 was posted.
>
>
>> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
>> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
>> the docs.  Or perhaps have a shot at implementing it elsewhere.
>
> Warning sounds good.
>
>
>> Typo in 4/5 changelog: "reqiurements"
>
> Thanks for catching!
>
>
> I'm happy to post up a v6 with these things fixed or I'm happy for
> whoever is applying it to make these small fixes themselves.  Any
> volunteers?  Olof, Arnd, or Russell: any of you want these patches?

I think it makes sense to send these through Russell's tracker for him
to merge, especially since I don't think there are any dependencies on
them for SoC-specific patches coming up.


-Olof

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
  2016-01-29 21:52       ` Olof Johansson
  (?)
@ 2016-01-29 21:58         ` Doug Anderson
  -1 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-29 21:58 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Russell King, Robin Murphy, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kamil Debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, Linux Media Mailing List, Dan Williams,
	linux-arm-kernel

Hi,

On Fri, Jan 29, 2016 at 1:52 PM, Olof Johansson <olof@lixom.net> wrote:
> On Tue, Jan 26, 2016 at 4:39 PM, Doug Anderson <dianders@chromium.org> wrote:
>> Hi,
>>
>> On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>>> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>>>
>>>> This series of patches will speed up memory allocation in dma-mapping
>>>> quite a bit.
>>>
>>> This is pretty much all ARM and driver stuff so I think I'll duck it.
>>> But I can merge it if nobody else feels a need to.
>>
>> I was going to ask what the next steps were for this series.
>> Presumably I could post the patch to Russell's patch tracker if folks
>> want me to do that.  Alternatively it could go through the ARM-SOC
>> tree?
>>
>>
>>> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
>>> carried over into v6 (might have been a timing race), so please fix
>>> that up if there's an opportunity.
>>
>> Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
>> even after v6 was posted.
>>
>>
>>> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
>>> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
>>> the docs.  Or perhaps have a shot at implementing it elsewhere.
>>
>> Warning sounds good.
>>
>>
>>> Typo in 4/5 changelog: "reqiurements"
>>
>> Thanks for catching!
>>
>>
>> I'm happy to post up a v6 with these things fixed or I'm happy for
>> whoever is applying it to make these small fixes themselves.  Any
>> volunteers?  Olof, Arnd, or Russell: any of you want these patches?
>
> I think it makes sense to send these through Russell's tracker for him
> to merge, especially since I don't think there are any dependencies on
> them for SoC-specific patches coming up.

Sounds good.  I'll make the nitfixes and I'll post a v7 directly to
Russell's tracker.  I'll follow up here with a link to those patches.

-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-29 21:58         ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-29 21:58 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Russell King, Robin Murphy, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kamil Debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, Linux Media Mailing List, Dan Williams,
	linux-arm-kernel

Hi,

On Fri, Jan 29, 2016 at 1:52 PM, Olof Johansson <olof@lixom.net> wrote:
> On Tue, Jan 26, 2016 at 4:39 PM, Doug Anderson <dianders@chromium.org> wrote:
>> Hi,
>>
>> On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>>> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>>>
>>>> This series of patches will speed up memory allocation in dma-mapping
>>>> quite a bit.
>>>
>>> This is pretty much all ARM and driver stuff so I think I'll duck it.
>>> But I can merge it if nobody else feels a need to.
>>
>> I was going to ask what the next steps were for this series.
>> Presumably I could post the patch to Russell's patch tracker if folks
>> want me to do that.  Alternatively it could go through the ARM-SOC
>> tree?
>>
>>
>>> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
>>> carried over into v6 (might have been a timing race), so please fix
>>> that up if there's an opportunity.
>>
>> Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
>> even after v6 was posted.
>>
>>
>>> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
>>> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
>>> the docs.  Or perhaps have a shot at implementing it elsewhere.
>>
>> Warning sounds good.
>>
>>
>>> Typo in 4/5 changelog: "reqiurements"
>>
>> Thanks for catching!
>>
>>
>> I'm happy to post up a v6 with these things fixed or I'm happy for
>> whoever is applying it to make these small fixes themselves.  Any
>> volunteers?  Olof, Arnd, or Russell: any of you want these patches?
>
> I think it makes sense to send these through Russell's tracker for him
> to merge, especially since I don't think there are any dependencies on
> them for SoC-specific patches coming up.

Sounds good.  I'll make the nitfixes and I'll post a v7 directly to
Russell's tracker.  I'll follow up here with a link to those patches.

-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-29 21:58         ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-29 21:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Jan 29, 2016 at 1:52 PM, Olof Johansson <olof@lixom.net> wrote:
> On Tue, Jan 26, 2016 at 4:39 PM, Doug Anderson <dianders@chromium.org> wrote:
>> Hi,
>>
>> On Tue, Jan 26, 2016 at 3:59 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>>> On Mon, 11 Jan 2016 09:30:22 -0800 Douglas Anderson <dianders@chromium.org> wrote:
>>>
>>>> This series of patches will speed up memory allocation in dma-mapping
>>>> quite a bit.
>>>
>>> This is pretty much all ARM and driver stuff so I think I'll duck it.
>>> But I can merge it if nobody else feels a need to.
>>
>> I was going to ask what the next steps were for this series.
>> Presumably I could post the patch to Russell's patch tracker if folks
>> want me to do that.  Alternatively it could go through the ARM-SOC
>> tree?
>>
>>
>>> I saw a few acked-by/tested-by/etc from the v5 posting which weren't
>>> carried over into v6 (might have been a timing race), so please fix
>>> that up if there's an opportunity.
>>
>> Right.  Both Robin and Tomasz gave their Reviewed-by to Patch #1 in v5
>> even after v6 was posted.
>>
>>
>>> Regarding the new DMA_ATTR_ALLOC_SINGLE_PAGES hint: I suggest adding
>>> "DMA_ATTR_ALLOC_SINGLE_PAGES is presently implemented only on ARM" to
>>> the docs.  Or perhaps have a shot at implementing it elsewhere.
>>
>> Warning sounds good.
>>
>>
>>> Typo in 4/5 changelog: "reqiurements"
>>
>> Thanks for catching!
>>
>>
>> I'm happy to post up a v6 with these things fixed or I'm happy for
>> whoever is applying it to make these small fixes themselves.  Any
>> volunteers?  Olof, Arnd, or Russell: any of you want these patches?
>
> I think it makes sense to send these through Russell's tracker for him
> to merge, especially since I don't think there are any dependencies on
> them for SoC-specific patches coming up.

Sounds good.  I'll make the nitfixes and I'll post a v7 directly to
Russell's tracker.  I'll follow up here with a link to those patches.

-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
  2016-01-29 21:58         ` Doug Anderson
  (?)
@ 2016-01-29 22:14           ` Doug Anderson
  -1 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-29 22:14 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Russell King, Robin Murphy, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kamil Debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, Linux Media Mailing List, Dan Williams,
	linux-arm-kernel

Hi,

On Fri, Jan 29, 2016 at 1:58 PM, Doug Anderson <dianders@chromium.org> wrote:
>> I think it makes sense to send these through Russell's tracker for him
>> to merge, especially since I don't think there are any dependencies on
>> them for SoC-specific patches coming up.
>
> Sounds good.  I'll make the nitfixes and I'll post a v7 directly to
> Russell's tracker.  I'll follow up here with a link to those patches.

For your viewing pleasure:

8505/1 dma-mapping: Optimize allocation
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8505/1

8506/1 common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8506/1

8507/1 dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8507/1

8508/1 videobuf2-dc: Let drivers specify DMA attrs
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8508/1

8509/1 s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8509/1

---

Changes in v7 (AKA the above patches):
- Add Robin and Tomasz Reviewed-by.
- Add Javier Tested-by.
- Add note that this is only implemented on ARM (Andrew Morton).
- Typo in commit message "reqiurements" (Andrew Morton).


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-29 22:14           ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-29 22:14 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Russell King, Robin Murphy, Andrew Morton, Arnd Bergmann,
	Mauro Carvalho Chehab, Tomasz Figa, Marek Szyprowski,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kamil Debski,
	Laurent Pinchart, Jonathan Corbet, mike.looijmans, linux-kernel,
	linux-doc, Will Deacon, jtp.park, penguin-kernel, Kyungmin Park,
	Carlo Caione, Linux Media Mailing List, Dan Williams,
	linux-arm-kernel

Hi,

On Fri, Jan 29, 2016 at 1:58 PM, Doug Anderson <dianders@chromium.org> wrote:
>> I think it makes sense to send these through Russell's tracker for him
>> to merge, especially since I don't think there are any dependencies on
>> them for SoC-specific patches coming up.
>
> Sounds good.  I'll make the nitfixes and I'll post a v7 directly to
> Russell's tracker.  I'll follow up here with a link to those patches.

For your viewing pleasure:

8505/1 dma-mapping: Optimize allocation
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8505/1

8506/1 common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8506/1

8507/1 dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8507/1

8508/1 videobuf2-dc: Let drivers specify DMA attrs
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8508/1

8509/1 s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8509/1

---

Changes in v7 (AKA the above patches):
- Add Robin and Tomasz Reviewed-by.
- Add Javier Tested-by.
- Add note that this is only implemented on ARM (Andrew Morton).
- Typo in commit message "reqiurements" (Andrew Morton).


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation
@ 2016-01-29 22:14           ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-01-29 22:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Jan 29, 2016 at 1:58 PM, Doug Anderson <dianders@chromium.org> wrote:
>> I think it makes sense to send these through Russell's tracker for him
>> to merge, especially since I don't think there are any dependencies on
>> them for SoC-specific patches coming up.
>
> Sounds good.  I'll make the nitfixes and I'll post a v7 directly to
> Russell's tracker.  I'll follow up here with a link to those patches.

For your viewing pleasure:

8505/1 dma-mapping: Optimize allocation
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8505/1

8506/1 common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8506/1

8507/1 dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8507/1

8508/1 videobuf2-dc: Let drivers specify DMA attrs
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8508/1

8509/1 s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8509/1

---

Changes in v7 (AKA the above patches):
- Add Robin and Tomasz Reviewed-by.
- Add Javier Tested-by.
- Add note that this is only implemented on ARM (Andrew Morton).
- Typo in commit message "reqiurements" (Andrew Morton).


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-01-11 17:30 ` [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs Douglas Anderson
@ 2016-02-01 11:40   ` Mauro Carvalho Chehab
  2016-02-01 13:29   ` Marek Szyprowski
  2016-02-17 17:00   ` Hans Verkuil
  2 siblings, 0 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2016-02-01 11:40 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: linux, robin.murphy, tfiga, m.szyprowski, pawel, Dmitry Torokhov,
	hch, kyungmin.park, linux-media, linux-kernel

Em Mon, 11 Jan 2016 09:30:26 -0800
Douglas Anderson <dianders@chromium.org> escreveu:

> From: Tomasz Figa <tfiga@chromium.org>
> 
> DMA allocations might be subject to certain reqiurements specific to the
> hardware using the buffers, such as availability of kernel mapping (for
> contents fix-ups in the driver). The only entity that knows them is the
> driver, so it must share this knowledge with vb2-dc.
> 
> This patch extends the alloc_ctx initialization interface to let the
> driver specify DMA attrs, which are then stored inside the allocation
> context and will be used for all allocations with that context.
> 
> As a side effect, all dma_*_coherent() calls are turned into
> dma_*_attrs() calls, because the attributes need to be carried over
> through all DMA operations.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Didn't test it, but provided that this is tested, I'm ok that such
patch would be merged via the ARM tree.

So, after testing with upstream Kernel:
	Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

> ---
> Changes in v6: None
> Changes in v5:
> - Let drivers specify DMA attrs new for v5
> 
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None
> 
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 33 +++++++++++++++++---------
>  include/media/videobuf2-dma-contig.h           | 11 ++++++++-
>  2 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index c33127284cfe..5361197f3e57 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -23,13 +23,16 @@
>  
>  struct vb2_dc_conf {
>  	struct device		*dev;
> +	struct dma_attrs	attrs;
>  };
>  
>  struct vb2_dc_buf {
>  	struct device			*dev;
>  	void				*vaddr;
>  	unsigned long			size;
> +	void				*cookie;
>  	dma_addr_t			dma_addr;
> +	struct dma_attrs		attrs;
>  	enum dma_data_direction		dma_dir;
>  	struct sg_table			*dma_sgt;
>  	struct frame_vector		*vec;
> @@ -131,7 +134,8 @@ static void vb2_dc_put(void *buf_priv)
>  		sg_free_table(buf->sgt_base);
>  		kfree(buf->sgt_base);
>  	}
> -	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
> +	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> +			&buf->attrs);
>  	put_device(buf->dev);
>  	kfree(buf);
>  }
> @@ -147,14 +151,18 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size,
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
>  
> -	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
> -						GFP_KERNEL | gfp_flags);
> -	if (!buf->vaddr) {
> +	buf->attrs = conf->attrs;
> +	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> +					GFP_KERNEL | gfp_flags, &buf->attrs);
> +	if (!buf->cookie) {
>  		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
>  		kfree(buf);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> +	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, &buf->attrs))
> +		buf->vaddr = buf->cookie;
> +
>  	/* Prevent the device from being released while the buffer is used */
>  	buf->dev = get_device(dev);
>  	buf->size = size;
> @@ -185,8 +193,8 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
>  	 */
>  	vma->vm_pgoff = 0;
>  
> -	ret = dma_mmap_coherent(buf->dev, vma, buf->vaddr,
> -		buf->dma_addr, buf->size);
> +	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> +		buf->dma_addr, buf->size, &buf->attrs);
>  
>  	if (ret) {
>  		pr_err("Remapping memory failed, error: %d\n", ret);
> @@ -329,7 +337,7 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
>  {
>  	struct vb2_dc_buf *buf = dbuf->priv;
>  
> -	return buf->vaddr + pgnum * PAGE_SIZE;
> +	return buf->vaddr ? buf->vaddr + pgnum * PAGE_SIZE : NULL;
>  }
>  
>  static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
> @@ -368,8 +376,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
>  		return NULL;
>  	}
>  
> -	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
> -		buf->size);
> +	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
> +		buf->size, &buf->attrs);
>  	if (ret < 0) {
>  		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
>  		kfree(sgt);
> @@ -721,7 +729,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = {
>  };
>  EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);
>  
> -void *vb2_dma_contig_init_ctx(struct device *dev)
> +void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
> +				    struct dma_attrs *attrs)
>  {
>  	struct vb2_dc_conf *conf;
>  
> @@ -730,10 +739,12 @@ void *vb2_dma_contig_init_ctx(struct device *dev)
>  		return ERR_PTR(-ENOMEM);
>  
>  	conf->dev = dev;
> +	if (attrs)
> +		conf->attrs = *attrs;
>  
>  	return conf;
>  }
> -EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);
> +EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx_attrs);
>  
>  void vb2_dma_contig_cleanup_ctx(void *alloc_ctx)
>  {
> diff --git a/include/media/videobuf2-dma-contig.h b/include/media/videobuf2-dma-contig.h
> index c33dfa69d7ab..2087c9a68be3 100644
> --- a/include/media/videobuf2-dma-contig.h
> +++ b/include/media/videobuf2-dma-contig.h
> @@ -16,6 +16,8 @@
>  #include <media/videobuf2-v4l2.h>
>  #include <linux/dma-mapping.h>
>  
> +struct dma_attrs;
> +
>  static inline dma_addr_t
>  vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
>  {
> @@ -24,7 +26,14 @@ vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
>  	return *addr;
>  }
>  
> -void *vb2_dma_contig_init_ctx(struct device *dev);
> +void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
> +				    struct dma_attrs *attrs);
> +
> +static inline void *vb2_dma_contig_init_ctx(struct device *dev)
> +{
> +	return vb2_dma_contig_init_ctx_attrs(dev, NULL);
> +}
> +
>  void vb2_dma_contig_cleanup_ctx(void *alloc_ctx);
>  
>  extern const struct vb2_mem_ops vb2_dma_contig_memops;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-01-11 17:30 ` [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs Douglas Anderson
  2016-02-01 11:40   ` Mauro Carvalho Chehab
@ 2016-02-01 13:29   ` Marek Szyprowski
  2016-02-01 21:37     ` Doug Anderson
  2016-02-17 17:00   ` Hans Verkuil
  2 siblings, 1 reply; 36+ messages in thread
From: Marek Szyprowski @ 2016-02-01 13:29 UTC (permalink / raw)
  To: Douglas Anderson, linux, mchehab, robin.murphy, tfiga
  Cc: pawel, Dmitry Torokhov, hch, kyungmin.park, linux-media, linux-kernel

Hello,

On 2016-01-11 18:30, Douglas Anderson wrote:
> From: Tomasz Figa <tfiga@chromium.org>
>
> DMA allocations might be subject to certain reqiurements specific to the
> hardware using the buffers, such as availability of kernel mapping (for
> contents fix-ups in the driver). The only entity that knows them is the
> driver, so it must share this knowledge with vb2-dc.
>
> This patch extends the alloc_ctx initialization interface to let the
> driver specify DMA attrs, which are then stored inside the allocation
> context and will be used for all allocations with that context.
>
> As a side effect, all dma_*_coherent() calls are turned into
> dma_*_attrs() calls, because the attributes need to be carried over
> through all DMA operations.
>
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---
> Changes in v6: None
> Changes in v5:
> - Let drivers specify DMA attrs new for v5
>
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None
>
>   drivers/media/v4l2-core/videobuf2-dma-contig.c | 33 +++++++++++++++++---------
>   include/media/videobuf2-dma-contig.h           | 11 ++++++++-
>   2 files changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index c33127284cfe..5361197f3e57 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -23,13 +23,16 @@
>   
>   struct vb2_dc_conf {
>   	struct device		*dev;
> +	struct dma_attrs	attrs;
>   };
>   
>   struct vb2_dc_buf {
>   	struct device			*dev;
>   	void				*vaddr;
>   	unsigned long			size;
> +	void				*cookie;
>   	dma_addr_t			dma_addr;
> +	struct dma_attrs		attrs;
>   	enum dma_data_direction		dma_dir;
>   	struct sg_table			*dma_sgt;
>   	struct frame_vector		*vec;
> @@ -131,7 +134,8 @@ static void vb2_dc_put(void *buf_priv)
>   		sg_free_table(buf->sgt_base);
>   		kfree(buf->sgt_base);
>   	}
> -	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
> +	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> +			&buf->attrs);
>   	put_device(buf->dev);
>   	kfree(buf);
>   }
> @@ -147,14 +151,18 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size,
>   	if (!buf)
>   		return ERR_PTR(-ENOMEM);
>   
> -	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
> -						GFP_KERNEL | gfp_flags);
> -	if (!buf->vaddr) {
> +	buf->attrs = conf->attrs;
> +	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> +					GFP_KERNEL | gfp_flags, &buf->attrs);
> +	if (!buf->cookie) {
>   		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
>   		kfree(buf);
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> +	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, &buf->attrs))
> +		buf->vaddr = buf->cookie;
> +
>   	/* Prevent the device from being released while the buffer is used */
>   	buf->dev = get_device(dev);
>   	buf->size = size;
> @@ -185,8 +193,8 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
>   	 */
>   	vma->vm_pgoff = 0;
>   
> -	ret = dma_mmap_coherent(buf->dev, vma, buf->vaddr,
> -		buf->dma_addr, buf->size);
> +	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> +		buf->dma_addr, buf->size, &buf->attrs);
>   
>   	if (ret) {
>   		pr_err("Remapping memory failed, error: %d\n", ret);
> @@ -329,7 +337,7 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
>   {
>   	struct vb2_dc_buf *buf = dbuf->priv;
>   
> -	return buf->vaddr + pgnum * PAGE_SIZE;
> +	return buf->vaddr ? buf->vaddr + pgnum * PAGE_SIZE : NULL;
>   }
>   
>   static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
> @@ -368,8 +376,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
>   		return NULL;
>   	}
>   
> -	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
> -		buf->size);
> +	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
> +		buf->size, &buf->attrs);
>   	if (ret < 0) {
>   		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
>   		kfree(sgt);
> @@ -721,7 +729,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = {
>   };
>   EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);
>   
> -void *vb2_dma_contig_init_ctx(struct device *dev)
> +void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
> +				    struct dma_attrs *attrs)
>   {
>   	struct vb2_dc_conf *conf;
>   
> @@ -730,10 +739,12 @@ void *vb2_dma_contig_init_ctx(struct device *dev)
>   		return ERR_PTR(-ENOMEM);
>   
>   	conf->dev = dev;
> +	if (attrs)
> +		conf->attrs = *attrs;
>   
>   	return conf;
>   }
> -EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);
> +EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx_attrs);
>   
>   void vb2_dma_contig_cleanup_ctx(void *alloc_ctx)
>   {
> diff --git a/include/media/videobuf2-dma-contig.h b/include/media/videobuf2-dma-contig.h
> index c33dfa69d7ab..2087c9a68be3 100644
> --- a/include/media/videobuf2-dma-contig.h
> +++ b/include/media/videobuf2-dma-contig.h
> @@ -16,6 +16,8 @@
>   #include <media/videobuf2-v4l2.h>
>   #include <linux/dma-mapping.h>
>   
> +struct dma_attrs;
> +
>   static inline dma_addr_t
>   vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
>   {
> @@ -24,7 +26,14 @@ vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
>   	return *addr;
>   }
>   
> -void *vb2_dma_contig_init_ctx(struct device *dev);
> +void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
> +				    struct dma_attrs *attrs);
> +
> +static inline void *vb2_dma_contig_init_ctx(struct device *dev)
> +{
> +	return vb2_dma_contig_init_ctx_attrs(dev, NULL);
> +}
> +
>   void vb2_dma_contig_cleanup_ctx(void *alloc_ctx);
>   
>   extern const struct vb2_mem_ops vb2_dma_contig_memops;

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-02-01 13:29   ` Marek Szyprowski
@ 2016-02-01 21:37     ` Doug Anderson
  0 siblings, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-02-01 21:37 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Russell King, Mauro Carvalho Chehab, Robin Murphy, Tomasz Figa,
	Pawel Osciak, Dmitry Torokhov, Christoph Hellwig, Kyungmin Park,
	Linux Media Mailing List, linux-kernel

On Mon, Feb 1, 2016 at 5:29 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hello,
>
> On 2016-01-11 18:30, Douglas Anderson wrote:
>>
>> From: Tomasz Figa <tfiga@chromium.org>
>>
>> DMA allocations might be subject to certain reqiurements specific to the
>> hardware using the buffers, such as availability of kernel mapping (for
>> contents fix-ups in the driver). The only entity that knows them is the
>> driver, so it must share this knowledge with vb2-dc.
>>
>> This patch extends the alloc_ctx initialization interface to let the
>> driver specify DMA attrs, which are then stored inside the allocation
>> context and will be used for all allocations with that context.
>>
>> As a side effect, all dma_*_coherent() calls are turned into
>> dma_*_attrs() calls, because the attributes need to be carried over
>> through all DMA operations.
>>
>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>
>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>

Added both your ans Mauro's acks to the current patch in RMK's patch
tracker.  You can see the patch at
<http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8508/2>

Note that Javier has tested this series upstream on a Samsung
Chromebook and validated that the allocations are working as intended,
even if MFC is a bit tricky to get working properly upstream.


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-01-11 17:30 ` [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs Douglas Anderson
  2016-02-01 11:40   ` Mauro Carvalho Chehab
  2016-02-01 13:29   ` Marek Szyprowski
@ 2016-02-17 17:00   ` Hans Verkuil
  2016-02-17 17:26     ` Doug Anderson
  2016-02-18  5:58     ` Tomasz Figa
  2 siblings, 2 replies; 36+ messages in thread
From: Hans Verkuil @ 2016-02-17 17:00 UTC (permalink / raw)
  To: Douglas Anderson, linux, mchehab, robin.murphy, tfiga, m.szyprowski
  Cc: pawel, Dmitry Torokhov, hch, kyungmin.park, linux-media, linux-kernel

Hi Doug,

Is there any reason to think that different planes will need different
DMA attrs? I ask because this patch series of mine:

http://www.spinics.net/lists/linux-media/msg97522.html

does away with allocating allocation contexts (struct vb2_dc_conf).

For dma_attr this would mean that struct dma_attrs would probably be implemented
as a struct dma_attrs pointer in the vb2_queue struct once I rebase that patch series
on top of this patch. In other words, the same dma_attrs struct would be used for all
planes.

Second question: would specifying dma_attrs also make sense for videobuf2-dma-sg.c?

Regards,

	Hans

On 01/11/2016 06:30 PM, Douglas Anderson wrote:
> From: Tomasz Figa <tfiga@chromium.org>
> 
> DMA allocations might be subject to certain reqiurements specific to the
> hardware using the buffers, such as availability of kernel mapping (for
> contents fix-ups in the driver). The only entity that knows them is the
> driver, so it must share this knowledge with vb2-dc.
> 
> This patch extends the alloc_ctx initialization interface to let the
> driver specify DMA attrs, which are then stored inside the allocation
> context and will be used for all allocations with that context.
> 
> As a side effect, all dma_*_coherent() calls are turned into
> dma_*_attrs() calls, because the attributes need to be carried over
> through all DMA operations.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Changes in v6: None
> Changes in v5:
> - Let drivers specify DMA attrs new for v5
> 
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None
> 
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 33 +++++++++++++++++---------
>  include/media/videobuf2-dma-contig.h           | 11 ++++++++-
>  2 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index c33127284cfe..5361197f3e57 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -23,13 +23,16 @@
>  
>  struct vb2_dc_conf {
>  	struct device		*dev;
> +	struct dma_attrs	attrs;
>  };
>  
>  struct vb2_dc_buf {
>  	struct device			*dev;
>  	void				*vaddr;
>  	unsigned long			size;
> +	void				*cookie;
>  	dma_addr_t			dma_addr;
> +	struct dma_attrs		attrs;
>  	enum dma_data_direction		dma_dir;
>  	struct sg_table			*dma_sgt;
>  	struct frame_vector		*vec;
> @@ -131,7 +134,8 @@ static void vb2_dc_put(void *buf_priv)
>  		sg_free_table(buf->sgt_base);
>  		kfree(buf->sgt_base);
>  	}
> -	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
> +	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> +			&buf->attrs);
>  	put_device(buf->dev);
>  	kfree(buf);
>  }
> @@ -147,14 +151,18 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size,
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
>  
> -	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
> -						GFP_KERNEL | gfp_flags);
> -	if (!buf->vaddr) {
> +	buf->attrs = conf->attrs;
> +	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> +					GFP_KERNEL | gfp_flags, &buf->attrs);
> +	if (!buf->cookie) {
>  		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
>  		kfree(buf);
>  		return ERR_PTR(-ENOMEM);
>  	}
>  
> +	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, &buf->attrs))
> +		buf->vaddr = buf->cookie;
> +
>  	/* Prevent the device from being released while the buffer is used */
>  	buf->dev = get_device(dev);
>  	buf->size = size;
> @@ -185,8 +193,8 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
>  	 */
>  	vma->vm_pgoff = 0;
>  
> -	ret = dma_mmap_coherent(buf->dev, vma, buf->vaddr,
> -		buf->dma_addr, buf->size);
> +	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> +		buf->dma_addr, buf->size, &buf->attrs);
>  
>  	if (ret) {
>  		pr_err("Remapping memory failed, error: %d\n", ret);
> @@ -329,7 +337,7 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
>  {
>  	struct vb2_dc_buf *buf = dbuf->priv;
>  
> -	return buf->vaddr + pgnum * PAGE_SIZE;
> +	return buf->vaddr ? buf->vaddr + pgnum * PAGE_SIZE : NULL;
>  }
>  
>  static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
> @@ -368,8 +376,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
>  		return NULL;
>  	}
>  
> -	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
> -		buf->size);
> +	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
> +		buf->size, &buf->attrs);
>  	if (ret < 0) {
>  		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
>  		kfree(sgt);
> @@ -721,7 +729,8 @@ const struct vb2_mem_ops vb2_dma_contig_memops = {
>  };
>  EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);
>  
> -void *vb2_dma_contig_init_ctx(struct device *dev)
> +void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
> +				    struct dma_attrs *attrs)
>  {
>  	struct vb2_dc_conf *conf;
>  
> @@ -730,10 +739,12 @@ void *vb2_dma_contig_init_ctx(struct device *dev)
>  		return ERR_PTR(-ENOMEM);
>  
>  	conf->dev = dev;
> +	if (attrs)
> +		conf->attrs = *attrs;
>  
>  	return conf;
>  }
> -EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);
> +EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx_attrs);
>  
>  void vb2_dma_contig_cleanup_ctx(void *alloc_ctx)
>  {
> diff --git a/include/media/videobuf2-dma-contig.h b/include/media/videobuf2-dma-contig.h
> index c33dfa69d7ab..2087c9a68be3 100644
> --- a/include/media/videobuf2-dma-contig.h
> +++ b/include/media/videobuf2-dma-contig.h
> @@ -16,6 +16,8 @@
>  #include <media/videobuf2-v4l2.h>
>  #include <linux/dma-mapping.h>
>  
> +struct dma_attrs;
> +
>  static inline dma_addr_t
>  vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
>  {
> @@ -24,7 +26,14 @@ vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
>  	return *addr;
>  }
>  
> -void *vb2_dma_contig_init_ctx(struct device *dev);
> +void *vb2_dma_contig_init_ctx_attrs(struct device *dev,
> +				    struct dma_attrs *attrs);
> +
> +static inline void *vb2_dma_contig_init_ctx(struct device *dev)
> +{
> +	return vb2_dma_contig_init_ctx_attrs(dev, NULL);
> +}
> +
>  void vb2_dma_contig_cleanup_ctx(void *alloc_ctx);
>  
>  extern const struct vb2_mem_ops vb2_dma_contig_memops;
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-02-17 17:00   ` Hans Verkuil
@ 2016-02-17 17:26     ` Doug Anderson
  2016-02-18  5:58     ` Tomasz Figa
  1 sibling, 0 replies; 36+ messages in thread
From: Doug Anderson @ 2016-02-17 17:26 UTC (permalink / raw)
  To: Hans Verkuil, Tomasz Figa
  Cc: Russell King, Mauro Carvalho Chehab, Robin Murphy,
	Marek Szyprowski, Pawel Osciak, Dmitry Torokhov,
	Christoph Hellwig, Kyungmin Park, Linux Media Mailing List,
	linux-kernel

Hans,

On Wed, Feb 17, 2016 at 9:00 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> Hi Doug,
>
> Is there any reason to think that different planes will need different
> DMA attrs? I ask because this patch series of mine:
>
> http://www.spinics.net/lists/linux-media/msg97522.html
>
> does away with allocating allocation contexts (struct vb2_dc_conf).
>
> For dma_attr this would mean that struct dma_attrs would probably be implemented
> as a struct dma_attrs pointer in the vb2_queue struct once I rebase that patch series
> on top of this patch. In other words, the same dma_attrs struct would be used for all
> planes.
>
> Second question: would specifying dma_attrs also make sense for videobuf2-dma-sg.c?

Those are all probably better questions for someone like Tomasz, who
authored ${SUBJECT} patch.  I mostly ended up touching this codepath
by going down the rabbit hole chasing a bug.  In my particular case I
was seeing that video was eating up all the large chunks in the system
needlessly and starving other memory users.  Using DMA attrs was the
logical way to indicate that we didn't need large chunks, so I used
it.  Other than that I'm totally unfamiliar with the video subsystem.


-Doug

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs
  2016-02-17 17:00   ` Hans Verkuil
  2016-02-17 17:26     ` Doug Anderson
@ 2016-02-18  5:58     ` Tomasz Figa
  1 sibling, 0 replies; 36+ messages in thread
From: Tomasz Figa @ 2016-02-18  5:58 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Douglas Anderson, Russell King, Mauro Carvalho Chehab,
	Robin Murphy, Marek Szyprowski, Pawel Osciak, Dmitry Torokhov,
	Christoph Hellwig, Kyungmin Park, linux-media, linux-kernel

Hi Hans,

On Thu, Feb 18, 2016 at 2:00 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> Hi Doug,
>
> Is there any reason to think that different planes will need different
> DMA attrs? I ask because this patch series of mine:
>
> http://www.spinics.net/lists/linux-media/msg97522.html
>
> does away with allocating allocation contexts (struct vb2_dc_conf).
>
> For dma_attr this would mean that struct dma_attrs would probably be implemented
> as a struct dma_attrs pointer in the vb2_queue struct once I rebase that patch series
> on top of this patch. In other words, the same dma_attrs struct would be used for all
> planes.

I could think of some format consisting of video and metadata planes
(such as V4L2_PIX_FMT_S5C_UYVY_JPG) and some hypothetical hardware,
which generates the metadata in a way that requires patching in
.buf_finish(). In this case, we can allocate video plane without
kernel mapping, but for metadata plane kernel mapping is necessary to
do the patching.

However the above is only a hypothetical "what if" of mine, since
personally I haven't seen such case yet. Our real use case is
allocating raw video planes without kernel mapping, while keeping
kernel mapping available for encoded bitstream, which needs some extra
patching. The reason for disabling kernel mapping is that vmalloc
space can be easily exhausted when processing high resolution video
with long buffer queues (e.g. high resolution H264 decode/encode).

>
> Second question: would specifying dma_attrs also make sense for videobuf2-dma-sg.c?

For our particular use case, probably not, because I don't see kernel
mapping being implicitly created at videobuf2-dma-sg level for
allocated MMAP buffers. AFAICT only if vb2_plane_vaddr() or respective
DMA-BUF op is called then the mapping is created, which is unavoidable
because the caller apparently needs it for something.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2016-02-18  5:58 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-11 17:30 [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation Douglas Anderson
2016-01-11 17:30 ` Douglas Anderson
2016-01-11 17:30 ` [PATCH v6 1/5] ARM: dma-mapping: Optimize allocation Douglas Anderson
2016-01-11 17:30   ` Douglas Anderson
2016-01-13 12:23   ` Robin Murphy
2016-01-13 12:23     ` Robin Murphy
2016-01-11 17:30 ` [PATCH v6 2/5] common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute Douglas Anderson
2016-01-13 12:36   ` Robin Murphy
2016-01-11 17:30 ` [PATCH v6 3/5] ARM: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc Douglas Anderson
2016-01-11 17:30   ` Douglas Anderson
2016-01-11 17:30 ` [PATCH v6 4/5] videobuf2-dc: Let drivers specify DMA attrs Douglas Anderson
2016-02-01 11:40   ` Mauro Carvalho Chehab
2016-02-01 13:29   ` Marek Szyprowski
2016-02-01 21:37     ` Doug Anderson
2016-02-17 17:00   ` Hans Verkuil
2016-02-17 17:26     ` Doug Anderson
2016-02-18  5:58     ` Tomasz Figa
2016-01-11 17:30 ` [PATCH v6 5/5] s5p-mfc: Set DMA_ATTR_ALLOC_SINGLE_PAGES Douglas Anderson
2016-01-11 17:30   ` Douglas Anderson
2016-01-26 23:31 ` [PATCH v6 0/5] dma-mapping: Patches for speeding up allocation Javier Martinez Canillas
2016-01-26 23:31   ` Javier Martinez Canillas
2016-01-26 23:31   ` Javier Martinez Canillas
2016-01-26 23:59 ` Andrew Morton
2016-01-26 23:59   ` Andrew Morton
2016-01-27  0:39   ` Doug Anderson
2016-01-27  0:39     ` Doug Anderson
2016-01-27  0:39     ` Doug Anderson
2016-01-29 21:52     ` Olof Johansson
2016-01-29 21:52       ` Olof Johansson
2016-01-29 21:52       ` Olof Johansson
2016-01-29 21:58       ` Doug Anderson
2016-01-29 21:58         ` Doug Anderson
2016-01-29 21:58         ` Doug Anderson
2016-01-29 22:14         ` Doug Anderson
2016-01-29 22:14           ` Doug Anderson
2016-01-29 22:14           ` Doug Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.