* [PATCH 0/4] Userptr bo slab use optimization
@ 2017-07-27  9:05 Tvrtko Ursulin
  2017-07-27  9:05   ` Tvrtko Ursulin
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27  9:05 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Ben Widawsky

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Yet another attempt to get this series reviewed and merged...

I've heard Vulkan might be creating a lot of userptr objects, so it might be
interesting to check what benefit this brings to those use cases.

As an introduction, this allows i915 to create fewer sg table entries for the bo
backing store representation. As such it primarily saves kernel slab memory.

When we added this optimisation to normal i915 bos, the savings were, as far as
I remember, around 1-2MiB of slab after booting to a KDE desktop, and 2-4MiB on
the neverball (game) main screen (or maybe it was while playing).

So the thinking is, if Vulkan is indeed using a lot of userptr bos, it should
translate to similar savings there. It is not much, but the motto is that every
little helps.

At a low level the saving will be up to around 32 bytes for each 4KiB of a
userptr bo (1GiB of userptr bos = up to ~8MiB of slab saved), with the actual
number depending on the backing store fragmentation.
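
For a rough back-of-the-envelope illustration (assuming sizeof(struct
scatterlist) is about 32 bytes on a typical 64-bit build and the worst case of
one table entry per 4KiB page; the helper below is purely hypothetical):

	/* Worst-case sg slab cost which coalescing can win back. */
	static unsigned long worst_case_sg_slab(unsigned long userptr_bytes)
	{
		unsigned long entries = userptr_bytes / 4096; /* one entry per page */

		return entries * 32; /* 1GiB of userptr bos -> ~8MiB of slab */
	}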

Tvrtko Ursulin (4):
  lib/scatterlist: Fix offset type in sg_alloc_table_from_pages
  lib/scatterlist: Avoid potential scatterlist entry overflow
  lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  drm/i915: Use __sg_alloc_table_from_pages for userptr allocations

 drivers/gpu/drm/i915/i915_drv.h                | 15 +++++
 drivers/gpu/drm/i915/i915_gem.c                |  6 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c        | 79 +++++++++--------------
 drivers/media/v4l2-core/videobuf2-dma-contig.c |  4 +-
 drivers/rapidio/devices/rio_mport_cdev.c       |  4 +-
 include/linux/scatterlist.h                    | 17 +++--
 lib/scatterlist.c                              | 87 +++++++++++++++++++-------
 7 files changed, 126 insertions(+), 86 deletions(-)

-- 
2.9.4

* [PATCH 1/4] lib/scatterlist: Fix offset type in sg_alloc_table_from_pages
  2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
@ 2017-07-27  9:05   ` Tvrtko Ursulin
  2017-07-27  9:05   ` Tvrtko Ursulin
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27  9:05 UTC (permalink / raw)
  To: Intel-gfx
  Cc: Ben Widawsky, Jason Ekstrand, Tvrtko Ursulin, Masahiro Yamada,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Tomasz Stanislawski, Matt Porter, Alexandre Bounine, linux-media,
	linux-kernel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Scatterlist entries have an unsigned int for the offset, so
correct the sg_alloc_table_from_pages function accordingly.

Since these are offsets within a page, unsigned int is
wide enough.

Also convert callers which were using unsigned long locally,
using the lower_32_bits annotation to make it explicitly
clear what is happening.
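
For context, a minimal illustration of why unsigned int is wide enough (the
address below is only an example): offset_in_page() masks the address down to
its within-page bits, so the result is always smaller than PAGE_SIZE.

	unsigned long vaddr = 0xffff888012345678UL;	/* example 64-bit address */
	unsigned int offset = offset_in_page(vaddr);	/* 0x678, always < PAGE_SIZE */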

v2: Use offset_in_page. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Stanislawski <t.stanislaws@samsung.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: linux-media@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> (v1)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 4 ++--
 drivers/rapidio/devices/rio_mport_cdev.c       | 4 ++--
 include/linux/scatterlist.h                    | 2 +-
 lib/scatterlist.c                              | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 4f246d166111..2405077fdc71 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -479,7 +479,7 @@ static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
 {
 	struct vb2_dc_buf *buf;
 	struct frame_vector *vec;
-	unsigned long offset;
+	unsigned int offset;
 	int n_pages, i;
 	int ret = 0;
 	struct sg_table *sgt;
@@ -507,7 +507,7 @@ static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
 	buf->dev = dev;
 	buf->dma_dir = dma_dir;
 
-	offset = vaddr & ~PAGE_MASK;
+	offset = lower_32_bits(offset_in_page(vaddr));
 	vec = vb2_create_framevec(vaddr, size, dma_dir == DMA_FROM_DEVICE);
 	if (IS_ERR(vec)) {
 		ret = PTR_ERR(vec);
diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c
index 5beb0c361076..5c1b6388122a 100644
--- a/drivers/rapidio/devices/rio_mport_cdev.c
+++ b/drivers/rapidio/devices/rio_mport_cdev.c
@@ -876,10 +876,10 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode,
 	 * offset within the internal buffer specified by handle parameter.
 	 */
 	if (xfer->loc_addr) {
-		unsigned long offset;
+		unsigned int offset;
 		long pinned;
 
-		offset = (unsigned long)(uintptr_t)xfer->loc_addr & ~PAGE_MASK;
+		offset = lower_32_bits(offset_in_page(xfer->loc_addr));
 		nr_pages = PAGE_ALIGN(xfer->length + offset) >> PAGE_SHIFT;
 
 		page_list = kmalloc_array(nr_pages,
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4b3286ac60c8..205aefb4ed93 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -263,7 +263,7 @@ int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
 int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
 	struct page **pages, unsigned int n_pages,
-	unsigned long offset, unsigned long size,
+	unsigned int offset, unsigned long size,
 	gfp_t gfp_mask);
 
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index be7b4dd6b68d..dee0c5004e2f 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -391,7 +391,7 @@ EXPORT_SYMBOL(sg_alloc_table);
  */
 int sg_alloc_table_from_pages(struct sg_table *sgt,
 	struct page **pages, unsigned int n_pages,
-	unsigned long offset, unsigned long size,
+	unsigned int offset, unsigned long size,
 	gfp_t gfp_mask)
 {
 	unsigned int chunks;
-- 
2.9.4

* [PATCH 2/4] lib/scatterlist: Avoid potential scatterlist entry overflow
  2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
@ 2017-07-27  9:05   ` Tvrtko Ursulin
  2017-07-27  9:05   ` Tvrtko Ursulin
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27  9:05 UTC (permalink / raw)
  To: Intel-gfx
  Cc: Ben Widawsky, Jason Ekstrand, Tvrtko Ursulin, Masahiro Yamada,
	linux-kernel, Joonas Lahtinen, Andy Shevchenko

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Since the scatterlist length field is an unsigned int, make
sure that sg_alloc_table_from_pages does not overflow it while
coalescing pages into a single entry.
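
For illustration, assuming 4KiB pages, the page-aligned cap introduced below
works out to just under the full range of the length field:

	UINT_MAX & PAGE_MASK = 0xffffffff & ~0xfff = 0xfffff000 (4GiB - 4KiB)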

v2: Drop reference to future use. Use UINT_MAX.
v3: max_segment must be page aligned.
v4: Do not rely on compiler to optimise out the rounddown.
    (Joonas Lahtinen)
v5: Simplified loops and use post-increments rather than
    pre-increments. Use PAGE_MASK and fix comment typo.
    (Andy Shevchenko)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
---
 include/linux/scatterlist.h |  6 ++++++
 lib/scatterlist.c           | 31 ++++++++++++++++++++-----------
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 205aefb4ed93..6dd2ddbc6230 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -21,6 +21,12 @@ struct scatterlist {
 };
 
 /*
+ * Since the above length field is an unsigned int, below we define the maximum
+ * length in bytes that can be stored in one scatterlist entry.
+ */
+#define SCATTERLIST_MAX_SEGMENT (UINT_MAX & PAGE_MASK)
+
+/*
  * These macros should be used after a dma_map_sg call has been done
  * to get bus addresses of each of the SG entries and their lengths.
  * You should only work with the number of sg entries dma_map_sg
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index dee0c5004e2f..7b2e74da2c44 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -394,17 +394,22 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 	unsigned int offset, unsigned long size,
 	gfp_t gfp_mask)
 {
-	unsigned int chunks;
-	unsigned int i;
-	unsigned int cur_page;
+	const unsigned int max_segment = SCATTERLIST_MAX_SEGMENT;
+	unsigned int chunks, cur_page, seg_len, i;
 	int ret;
 	struct scatterlist *s;
 
 	/* compute number of contiguous chunks */
 	chunks = 1;
-	for (i = 1; i < n_pages; ++i)
-		if (page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1)
-			++chunks;
+	seg_len = 0;
+	for (i = 1; i < n_pages; i++) {
+		seg_len += PAGE_SIZE;
+		if (seg_len >= max_segment ||
+		    page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) {
+			chunks++;
+			seg_len = 0;
+		}
+	}
 
 	ret = sg_alloc_table(sgt, chunks, gfp_mask);
 	if (unlikely(ret))
@@ -413,17 +418,21 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 	/* merging chunks and putting them into the scatterlist */
 	cur_page = 0;
 	for_each_sg(sgt->sgl, s, sgt->orig_nents, i) {
-		unsigned long chunk_size;
-		unsigned int j;
+		unsigned int j, chunk_size;
 
 		/* look for the end of the current chunk */
-		for (j = cur_page + 1; j < n_pages; ++j)
-			if (page_to_pfn(pages[j]) !=
+		seg_len = 0;
+		for (j = cur_page + 1; j < n_pages; j++) {
+			seg_len += PAGE_SIZE;
+			if (seg_len >= max_segment ||
+			    page_to_pfn(pages[j]) !=
 			    page_to_pfn(pages[j - 1]) + 1)
 				break;
+		}
 
 		chunk_size = ((j - cur_page) << PAGE_SHIFT) - offset;
-		sg_set_page(s, pages[cur_page], min(size, chunk_size), offset);
+		sg_set_page(s, pages[cur_page],
+			    min_t(unsigned long, size, chunk_size), offset);
 		size -= chunk_size;
 		offset = 0;
 		cur_page = j;
-- 
2.9.4

* [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
@ 2017-07-27  9:05   ` Tvrtko Ursulin
  2017-07-27  9:05   ` Tvrtko Ursulin
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27  9:05 UTC (permalink / raw)
  To: Intel-gfx
  Cc: Ben Widawsky, Jason Ekstrand, Tvrtko Ursulin, Masahiro Yamada,
	linux-kernel, Chris Wilson, Joonas Lahtinen

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Drivers like i915 benefit from being able to control the maximum
size of the coalesced sg segment while building the scatter-
gather list.

Introduce and export the __sg_alloc_table_from_pages function
which gives them that control.
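
A minimal usage sketch (the pages array, its length and the 1MiB segment cap
are illustrative assumptions; max_segment must be non-zero and page aligned):

	struct sg_table st;
	int ret;

	ret = __sg_alloc_table_from_pages(&st, pages, n_pages, 0,
					  (unsigned long)n_pages << PAGE_SHIFT,
					  SZ_1M, GFP_KERNEL);
	if (ret)
		return ret;

	/* ... dma_map_sg() etc., then sg_free_table(&st) when done. */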

v2: Reorder parameters. (Chris Wilson)
v3: Fix incomplete reordering in v2.
v4: max_segment needs to be page aligned.
v5: Rebase.
v6: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: linux-kernel@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 include/linux/scatterlist.h | 11 +++++----
 lib/scatterlist.c           | 58 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6dd2ddbc6230..874b50c232de 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -267,10 +267,13 @@ void sg_free_table(struct sg_table *);
 int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
 		     struct scatterlist *, gfp_t, sg_alloc_fn *);
 int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
-int sg_alloc_table_from_pages(struct sg_table *sgt,
-	struct page **pages, unsigned int n_pages,
-	unsigned int offset, unsigned long size,
-	gfp_t gfp_mask);
+int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
+				unsigned int n_pages, unsigned int offset,
+				unsigned long size, unsigned int max_segment,
+				gfp_t gfp_mask);
+int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
+			      unsigned int n_pages, unsigned int offset,
+			      unsigned long size, gfp_t gfp_mask);
 
 size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
 		      size_t buflen, off_t skip, bool to_buffer);
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 7b2e74da2c44..1a5900f9a057 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -370,14 +370,15 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
 EXPORT_SYMBOL(sg_alloc_table);
 
 /**
- * sg_alloc_table_from_pages - Allocate and initialize an sg table from
- *			       an array of pages
- * @sgt:	The sg table header to use
- * @pages:	Pointer to an array of page pointers
- * @n_pages:	Number of pages in the pages array
- * @offset:     Offset from start of the first page to the start of a buffer
- * @size:       Number of valid bytes in the buffer (after offset)
- * @gfp_mask:	GFP allocation mask
+ * __sg_alloc_table_from_pages - Allocate and initialize an sg table from
+ *			         an array of pages
+ * @sgt:	 The sg table header to use
+ * @pages:	 Pointer to an array of page pointers
+ * @n_pages:	 Number of pages in the pages array
+ * @offset:      Offset from start of the first page to the start of a buffer
+ * @size:        Number of valid bytes in the buffer (after offset)
+ * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
+ * @gfp_mask:	 GFP allocation mask
  *
  *  Description:
  *    Allocate and initialize an sg table from a list of pages. Contiguous
@@ -389,16 +390,18 @@ EXPORT_SYMBOL(sg_alloc_table);
  * Returns:
  *   0 on success, negative error on failure
  */
-int sg_alloc_table_from_pages(struct sg_table *sgt,
-	struct page **pages, unsigned int n_pages,
-	unsigned int offset, unsigned long size,
-	gfp_t gfp_mask)
+int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
+				unsigned int n_pages, unsigned int offset,
+				unsigned long size, unsigned int max_segment,
+				gfp_t gfp_mask)
 {
-	const unsigned int max_segment = SCATTERLIST_MAX_SEGMENT;
 	unsigned int chunks, cur_page, seg_len, i;
 	int ret;
 	struct scatterlist *s;
 
+	if (WARN_ON(!max_segment || offset_in_page(max_segment)))
+		return -EINVAL;
+
 	/* compute number of contiguous chunks */
 	chunks = 1;
 	seg_len = 0;
@@ -440,6 +443,35 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 
 	return 0;
 }
+EXPORT_SYMBOL(__sg_alloc_table_from_pages);
+
+/**
+ * sg_alloc_table_from_pages - Allocate and initialize an sg table from
+ *			       an array of pages
+ * @sgt:	 The sg table header to use
+ * @pages:	 Pointer to an array of page pointers
+ * @n_pages:	 Number of pages in the pages array
+ * @offset:      Offset from start of the first page to the start of a buffer
+ * @size:        Number of valid bytes in the buffer (after offset)
+ * @gfp_mask:	 GFP allocation mask
+ *
+ *  Description:
+ *    Allocate and initialize an sg table from a list of pages. Contiguous
+ *    ranges of the pages are squashed into a single scatterlist node. A user
+ *    may provide an offset at a start and a size of valid data in a buffer
+ *    specified by the page array. The returned sg table is released by
+ *    sg_free_table.
+ *
+ * Returns:
+ *   0 on success, negative error on failure
+ */
+int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
+			      unsigned int n_pages, unsigned int offset,
+			      unsigned long size, gfp_t gfp_mask)
+{
+	return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size,
+					   SCATTERLIST_MAX_SEGMENT, gfp_mask);
+}
 EXPORT_SYMBOL(sg_alloc_table_from_pages);
 
 void __sg_page_iter_start(struct sg_page_iter *piter,
-- 
2.9.4

* [PATCH 4/4] drm/i915: Use __sg_alloc_table_from_pages for userptr allocations
  2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
@ 2017-07-27  9:05   ` Tvrtko Ursulin
  2017-07-27  9:05   ` Tvrtko Ursulin
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27  9:05 UTC (permalink / raw)
  To: Intel-gfx
  Cc: Ben Widawsky, Jason Ekstrand, Tvrtko Ursulin, Chris Wilson,
	linux-kernel, Joonas Lahtinen

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

With the addition of __sg_alloc_table_from_pages we can control
the maximum coalescing size and eliminate a separate path for
allocating the backing store here.

Similar to 871dfbd67d4e ("drm/i915: Allow compaction upto
SWIOTLB max segment size") this enables more compact sg lists to
be created and so has a beneficial effect on workloads with many
and/or large objects of this class.
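
As a rough sketch, the new i915_sg_segment_size() helper below is expected to
resolve to (exact values depend on the runtime swiotlb configuration):

	swiotlb_max_segment() == 0  ->  SCATTERLIST_MAX_SEGMENT (no swiotlb limit)
	swiotlb_max_segment() == 1  ->  PAGE_SIZE (swiotlb's "one page" special case)
	otherwise                   ->  rounddown(swiotlb_max_segment(), PAGE_SIZE)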

v2:
 * Rename helper to i915_sg_segment_size and fix swiotlb override.
 * Commit message update.

v3:
 * Actually include the swiotlb override fix.

v4:
 * Regroup parameters a bit. (Chris Wilson)

v5:
 * Rebase for swiotlb_max_segment.
 * Add DMA map failure handling as in abb0deacb5a6
   ("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").

v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen)

v7: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 15 +++++++
 drivers/gpu/drm/i915/i915_gem.c         |  6 +--
 drivers/gpu/drm/i915/i915_gem_userptr.c | 79 ++++++++++++---------------------
 3 files changed, 45 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2c7456f4ed38..6383940e8d55 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2749,6 +2749,21 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
 	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
 	     ((__iter) = __sgt_iter(__sg_next((__iter).sgp), false), 0))
 
+static inline unsigned int i915_sg_segment_size(void)
+{
+	unsigned int size = swiotlb_max_segment();
+
+	if (size == 0)
+		return SCATTERLIST_MAX_SEGMENT;
+
+	size = rounddown(size, PAGE_SIZE);
+	/* swiotlb_max_segment_size can return 1 byte when it means one page. */
+	if (size < PAGE_SIZE)
+		size = PAGE_SIZE;
+
+	return size;
+}
+
 static inline const struct intel_device_info *
 intel_info(const struct drm_i915_private *dev_priv)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6faabf34f142..a60885d6231b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2339,7 +2339,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	struct sgt_iter sgt_iter;
 	struct page *page;
 	unsigned long last_pfn = 0;	/* suppress gcc warning */
-	unsigned int max_segment;
+	unsigned int max_segment = i915_sg_segment_size();
 	gfp_t noreclaim;
 	int ret;
 
@@ -2350,10 +2350,6 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 	GEM_BUG_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);
 	GEM_BUG_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
 
-	max_segment = swiotlb_max_segment();
-	if (!max_segment)
-		max_segment = rounddown(UINT_MAX, PAGE_SIZE);
-
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
 	if (st == NULL)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index ccd09e8419f5..60c10d4118ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -399,64 +399,42 @@ struct get_pages_work {
 	struct task_struct *task;
 };
 
-#if IS_ENABLED(CONFIG_SWIOTLB)
-#define swiotlb_active() swiotlb_nr_tbl()
-#else
-#define swiotlb_active() 0
-#endif
-
-static int
-st_set_pages(struct sg_table **st, struct page **pvec, int num_pages)
-{
-	struct scatterlist *sg;
-	int ret, n;
-
-	*st = kmalloc(sizeof(**st), GFP_KERNEL);
-	if (*st == NULL)
-		return -ENOMEM;
-
-	if (swiotlb_active()) {
-		ret = sg_alloc_table(*st, num_pages, GFP_KERNEL);
-		if (ret)
-			goto err;
-
-		for_each_sg((*st)->sgl, sg, num_pages, n)
-			sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
-	} else {
-		ret = sg_alloc_table_from_pages(*st, pvec, num_pages,
-						0, num_pages << PAGE_SHIFT,
-						GFP_KERNEL);
-		if (ret)
-			goto err;
-	}
-
-	return 0;
-
-err:
-	kfree(*st);
-	*st = NULL;
-	return ret;
-}
-
 static struct sg_table *
-__i915_gem_userptr_set_pages(struct drm_i915_gem_object *obj,
-			     struct page **pvec, int num_pages)
+__i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
+			       struct page **pvec, int num_pages)
 {
-	struct sg_table *pages;
+	unsigned int max_segment = i915_sg_segment_size();
+	struct sg_table *st;
 	int ret;
 
-	ret = st_set_pages(&pages, pvec, num_pages);
-	if (ret)
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (!st)
+		return ERR_PTR(-ENOMEM);
+
+alloc_table:
+	ret = __sg_alloc_table_from_pages(st, pvec, num_pages,
+					  0, num_pages << PAGE_SHIFT,
+					  max_segment,
+					  GFP_KERNEL);
+	if (ret) {
+		kfree(st);
 		return ERR_PTR(ret);
+	}
 
-	ret = i915_gem_gtt_prepare_pages(obj, pages);
+	ret = i915_gem_gtt_prepare_pages(obj, st);
 	if (ret) {
-		sg_free_table(pages);
-		kfree(pages);
+		sg_free_table(st);
+
+		if (max_segment > PAGE_SIZE) {
+			max_segment = PAGE_SIZE;
+			goto alloc_table;
+		}
+
+		kfree(st);
 		return ERR_PTR(ret);
 	}
 
-	return pages;
+	return st;
 }
 
 static int
@@ -540,7 +518,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 		struct sg_table *pages = ERR_PTR(ret);
 
 		if (pinned == npages) {
-			pages = __i915_gem_userptr_set_pages(obj, pvec, npages);
+			pages = __i915_gem_userptr_alloc_pages(obj, pvec,
+							       npages);
 			if (!IS_ERR(pages)) {
 				__i915_gem_object_set_pages(obj, pages);
 				pinned = 0;
@@ -661,7 +640,7 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 		pages = __i915_gem_userptr_get_pages_schedule(obj);
 		active = pages == ERR_PTR(-EAGAIN);
 	} else {
-		pages = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
+		pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
 		active = !IS_ERR(pages);
 	}
 	if (active)
-- 
2.9.4

* Re: [PATCH 0/4] Userptr bo slab use optimization
  2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2017-07-27  9:05   ` Tvrtko Ursulin
@ 2017-07-27  9:25 ` Chris Wilson
  2017-07-27 10:46   ` Tvrtko Ursulin
  2017-08-01 18:16   ` Ben Widawsky
  2017-07-27  9:39 ` ✓ Fi.CI.BAT: success for " Patchwork
  5 siblings, 2 replies; 22+ messages in thread
From: Chris Wilson @ 2017-07-27  9:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx; +Cc: Ben Widawsky

Quoting Tvrtko Ursulin (2017-07-27 10:05:00)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Yet another attempt to get this series reviewed and merged...
> 
> I've heard Vulkan might be creating a lot of userptr objects so might be
> interesting to check what benefit it brings to those use cases.

Optimist :) My thinking is that this should only impact get_pages ->
vma_bind, which is supposed to be a rare operation, and if it does happen as
part of the steady state, then having too many sg in a chain is just one of
the myriad little paper cuts :)

> As an introduction, this allows i915 to create fewer sg table entries for the bo
> backing store representation. As such it primarily saves kernel slab memory.
> 
> When we added this optimisation to normal i915 bos, the savings were as far as
> I remember around 1-2MiB of slab after booting to KDE desktop, and 2-4Mib on
> neverball (game) main screen (or maybe it was while playing).

I think we also want to think about the aspect where we are creating
objects of multiple 1G huge pages, so we are going to run into the sg
limits very quickly.
-Chris

* ✓ Fi.CI.BAT: success for Userptr bo slab use optimization
  2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2017-07-27  9:25 ` [PATCH 0/4] Userptr bo slab use optimization Chris Wilson
@ 2017-07-27  9:39 ` Patchwork
  5 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-07-27  9:39 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Userptr bo slab use optimization
URL   : https://patchwork.freedesktop.org/series/27976/
State : success

== Summary ==

Series 27976v1 Userptr bo slab use optimization
https://patchwork.freedesktop.org/api/1.0/series/27976/revisions/1/mbox/

Test gem_ringfill:
        Subgroup basic-default:
                pass       -> SKIP       (fi-bsw-n3050) fdo#101915
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-legacy:
                fail       -> PASS       (fi-snb-2600) fdo#100215
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-pnv-d510) fdo#101597
        Subgroup suspend-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-byt-j1900) fdo#101705

fdo#101915 https://bugs.freedesktop.org/show_bug.cgi?id=101915
fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:280  pass:269  dwarn:0   dfail:0   fail:0   skip:11  time:446s
fi-blb-e6850     total:280  pass:225  dwarn:1   dfail:0   fail:0   skip:54  time:363s
fi-bsw-n3050     total:280  pass:243  dwarn:0   dfail:0   fail:0   skip:37  time:532s
fi-bxt-j4205     total:280  pass:261  dwarn:0   dfail:0   fail:0   skip:19  time:514s
fi-byt-j1900     total:280  pass:255  dwarn:1   dfail:0   fail:0   skip:24  time:492s
fi-byt-n2820     total:280  pass:251  dwarn:1   dfail:0   fail:0   skip:28  time:490s
fi-glk-2a        total:280  pass:261  dwarn:0   dfail:0   fail:0   skip:19  time:609s
fi-hsw-4770      total:280  pass:264  dwarn:0   dfail:0   fail:0   skip:16  time:435s
fi-hsw-4770r     total:280  pass:264  dwarn:0   dfail:0   fail:0   skip:16  time:422s
fi-ilk-650       total:280  pass:230  dwarn:0   dfail:0   fail:0   skip:50  time:409s
fi-ivb-3520m     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:500s
fi-ivb-3770      total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:468s
fi-kbl-7500u     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:470s
fi-kbl-7560u     total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:578s
fi-kbl-r         total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:590s
fi-pnv-d510      total:280  pass:224  dwarn:1   dfail:0   fail:0   skip:55  time:563s
fi-skl-6260u     total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:465s
fi-skl-6700hq    total:280  pass:263  dwarn:0   dfail:0   fail:0   skip:17  time:595s
fi-skl-6700k     total:280  pass:262  dwarn:0   dfail:0   fail:0   skip:18  time:470s
fi-skl-6770hq    total:280  pass:270  dwarn:0   dfail:0   fail:0   skip:10  time:485s
fi-skl-x1585l    total:280  pass:269  dwarn:0   dfail:0   fail:0   skip:11  time:475s
fi-snb-2520m     total:280  pass:252  dwarn:0   dfail:0   fail:0   skip:28  time:549s
fi-snb-2600      total:280  pass:251  dwarn:0   dfail:0   fail:0   skip:29  time:399s

94853c83714c98a0ade6379eecb4ad06cab04e22 drm-tip: 2017y-07m-27d-08h-06m-43s UTC integration manifest
bfb51a1c621f drm/i915: Use __sg_alloc_table_from_pages for userptr allocations
920029df94a0 lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
fda66408c735 lib/scatterlist: Avoid potential scatterlist entry overflow
8df0c934790a lib/scatterlist: Fix offset type in sg_alloc_table_from_pages

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5289/

* Re: [PATCH 0/4] Userptr bo slab use optimization
  2017-07-27  9:25 ` [PATCH 0/4] Userptr bo slab use optimization Chris Wilson
@ 2017-07-27 10:46   ` Tvrtko Ursulin
  2017-07-27 10:53     ` Chris Wilson
  2017-08-01 18:16   ` Ben Widawsky
  1 sibling, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27 10:46 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx; +Cc: Ben Widawsky


On 27/07/2017 10:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2017-07-27 10:05:00)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Yet another attempt to get this series reviewed and merged...
>>
>> I've heard Vulkan might be creating a lot of userptr objects so might be
>> interesting to check what benefit it brings to those use cases.
> 
> Optimist :) My thinking is that this should only impact get_pages ->
> vma_bind, which is supposed a rare operation, and if should happen as
> part of the steady state that we have too many sg in a chain is just one
> of the myriad little paper cuts :)

I did not try to sell any performance benefits. There might be some
micro (pico?) ones due to less walking and/or a smaller memory footprint. But
slab reduction is the main point. It's not a big one, but why not do it
if we can. It also makes the userptr consistent with our other bos
in this respect. And it is simpler code in i915_gem_userptr.c.

>> As an introduction, this allows i915 to create fewer sg table entries for the bo
>> backing store representation. As such it primarily saves kernel slab memory.
>>
>> When we added this optimisation to normal i915 bos, the savings were as far as
>> I remember around 1-2MiB of slab after booting to KDE desktop, and 2-4Mib on
>> neverball (game) main screen (or maybe it was while playing).
> 
> I think we also want to think about the aspect where we are creating
> objects of multiple 1G huge pages, so we are going to run into the sg
> limits very quickly.

You mean changing the core struct to allow larger chunks? Haven't the
core kernel people already rolled their eyes at our sg table misuse? :)

Regards,

Tvrtko

* Re: [PATCH 0/4] Userptr bo slab use optimization
  2017-07-27 10:46   ` Tvrtko Ursulin
@ 2017-07-27 10:53     ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-07-27 10:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, Intel-gfx; +Cc: Ben Widawsky

Quoting Tvrtko Ursulin (2017-07-27 11:46:03)
> 
> On 27/07/2017 10:25, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2017-07-27 10:05:00)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Yet another attempt to get this series reviewed and merged...
> >>
> >> I've heard Vulkan might be creating a lot of userptr objects so might be
> >> interesting to check what benefit it brings to those use cases.
> > 
> > Optimist :) My thinking is that this should only impact get_pages ->
> > vma_bind, which is supposed a rare operation, and if should happen as
> > part of the steady state that we have too many sg in a chain is just one
> > of the myriad little paper cuts :)
> 
> I did not try to sell any performance benefits. There might be some 
> micro (pico?) ones due less walking and/or smaller memory footprint. But 
> slab reduction is the main point. It's not a big one but why not do it 
> if we can. And it also makes the userptr consistent with out other bos 
> in this respect. And is simpler code in i915_gem_userptr.c.

It's definitely beneficial, no doubt. Just a new minor user isn't all
that exciting ;)

> >> As an introduction, this allows i915 to create fewer sg table entries for the bo
> >> backing store representation. As such it primarily saves kernel slab memory.
> >>
> >> When we added this optimisation to normal i915 bos, the savings were as far as
> >> I remember around 1-2MiB of slab after booting to KDE desktop, and 2-4Mib on
> >> neverball (game) main screen (or maybe it was while playing).
> > 
> > I think we also want to think about the aspect where we are creating
> > objects of multiple 1G huge pages, so we are going to run into the sg
> > limits very quickly.
> 
> You mean changing the core struct to allow larger chunks? Haven't the 
> core kernel people already rolled their eyes on our sg table misuse? :)

They fortunately haven't seen ours... But yes Linus did rant about doing
2+G of dma as being wrong... Too bad. It doesn't stop it from being the
interface into dma remapping etc. :|
-Chris

* Re: [Intel-gfx] [PATCH 2/4] lib/scatterlist: Avoid potential scatterlist entry overflow
  2017-07-27  9:05   ` Tvrtko Ursulin
  (?)
@ 2017-07-28 10:53   ` Chris Wilson
  -1 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-07-28 10:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx
  Cc: linux-kernel, Masahiro Yamada, Andy Shevchenko, Ben Widawsky

Quoting Tvrtko Ursulin (2017-07-27 10:05:02)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Since the scatterlist length field is an unsigned int, make
> sure that sg_alloc_table_from_pages does not overflow it while
> coallescing pages to a single entry.
> 
> v2: Drop reference to future use. Use UINT_MAX.
> v3: max_segment must be page aligned.
> v4: Do not rely on compiler to optimise out the rounddown.
>     (Joonas Lahtinen)
> v5: Simplified loops and use post-increments rather than
>     pre-increments. Use PAGE_MASK and fix comment typo.
>     (Andy Shevchenko)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
> ---
>  include/linux/scatterlist.h |  6 ++++++
>  lib/scatterlist.c           | 31 ++++++++++++++++++++-----------
>  2 files changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 205aefb4ed93..6dd2ddbc6230 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -21,6 +21,12 @@ struct scatterlist {
>  };
>  
>  /*
> + * Since the above length field is an unsigned int, below we define the maximum
> + * length in bytes that can be stored in one scatterlist entry.
> + */
> +#define SCATTERLIST_MAX_SEGMENT (UINT_MAX & PAGE_MASK)
> +
> +/*
>   * These macros should be used after a dma_map_sg call has been done
>   * to get bus addresses of each of the SG entries and their lengths.
>   * You should only work with the number of sg entries dma_map_sg
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index dee0c5004e2f..7b2e74da2c44 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -394,17 +394,22 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
>         unsigned int offset, unsigned long size,
>         gfp_t gfp_mask)
>  {
> -       unsigned int chunks;
> -       unsigned int i;
> -       unsigned int cur_page;
> +       const unsigned int max_segment = SCATTERLIST_MAX_SEGMENT;
> +       unsigned int chunks, cur_page, seg_len, i;
>         int ret;
>         struct scatterlist *s;
>  
>         /* compute number of contiguous chunks */
>         chunks = 1;
> -       for (i = 1; i < n_pages; ++i)
> -               if (page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1)
> -                       ++chunks;
> +       seg_len = 0;
> +       for (i = 1; i < n_pages; i++) {
> +               seg_len += PAGE_SIZE;
> +               if (seg_len >= max_segment ||
> +                   page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) {
> +                       chunks++;
> +                       seg_len = 0;
> +               }
> +       }

Ok. Took a moment to realise that it works correctly for a chunk ending
on the last page.
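
(To spell that out, here is a minimal userspace-only sketch of the same
counting logic; the pfn values are made up purely for illustration. The
final chunk never needs a trailing increment: it is either the initial
chunk counted by "chunks = 1", or it was already counted when its first
page broke contiguity.)

#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;
	const unsigned long max_segment = ~0u & ~(page_size - 1);
	const unsigned long pfns[] = { 10, 11, 20 }; /* chunks {10,11} and {20} */
	const unsigned int n_pages = 3;
	unsigned int chunks = 1, i;	/* the chunk containing pfns[0] */
	unsigned long seg_len = 0;

	for (i = 1; i < n_pages; i++) {
		seg_len += page_size;
		if (seg_len >= max_segment || pfns[i] != pfns[i - 1] + 1) {
			chunks++;	/* pfns[i] starts a new chunk */
			seg_len = 0;
		}
	}

	printf("chunks = %u\n", chunks);	/* prints 2 */
	return 0;
}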

>         ret = sg_alloc_table(sgt, chunks, gfp_mask);
>         if (unlikely(ret))
> @@ -413,17 +418,21 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
>         /* merging chunks and putting them into the scatterlist */
>         cur_page = 0;
>         for_each_sg(sgt->sgl, s, sgt->orig_nents, i) {
> -               unsigned long chunk_size;
> -               unsigned int j;
> +               unsigned int j, chunk_size;
>  
>                 /* look for the end of the current chunk */
> -               for (j = cur_page + 1; j < n_pages; ++j)
> -                       if (page_to_pfn(pages[j]) !=
> +               seg_len = 0;
> +               for (j = cur_page + 1; j < n_pages; j++) {
> +                       seg_len += PAGE_SIZE;
> +                       if (seg_len >= max_segment ||
> +                           page_to_pfn(pages[j]) !=
>                             page_to_pfn(pages[j - 1]) + 1)
>                                 break;
> +               }

Ok.

>  
>                 chunk_size = ((j - cur_page) << PAGE_SHIFT) - offset;
> -               sg_set_page(s, pages[cur_page], min(size, chunk_size), offset);
> +               sg_set_page(s, pages[cur_page],
> +                           min_t(unsigned long, size, chunk_size), offset);
>                 size -= chunk_size;
>                 offset = 0;
>                 cur_page = j;

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] drm/i915: Use __sg_alloc_table_from_pages for userptr allocations
  2017-07-27  9:05   ` Tvrtko Ursulin
  (?)
@ 2017-07-28 11:06   ` Chris Wilson
  -1 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-07-28 11:06 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx
  Cc: Ben Widawsky, Jason Ekstrand, Tvrtko Ursulin, linux-kernel,
	Joonas Lahtinen

Quoting Tvrtko Ursulin (2017-07-27 10:05:04)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> With the addition of __sg_alloc_table_from_pages we can control
> the maximum coallescing size and eliminate a separate path for
> allocating backing store here.
> 
> Similar to 871dfbd67d4e ("drm/i915: Allow compaction upto
> SWIOTLB max segment size") this enables more compact sg lists to
> be created and so has a beneficial effect on workloads with many
> and/or large objects of this class.
> 
> v2:
>  * Rename helper to i915_sg_segment_size and fix swiotlb override.
>  * Commit message update.
> 
> v3:
>  * Actually include the swiotlb override fix.
> 
> v4:
>  * Regroup parameters a bit. (Chris Wilson)
> 
> v5:
>  * Rebase for swiotlb_max_segment.
>  * Add DMA map failure handling as in abb0deacb5a6
>    ("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").
> 
> v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen)
> 
> v7: Rebase.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 15 +++++++
>  drivers/gpu/drm/i915/i915_gem.c         |  6 +--
>  drivers/gpu/drm/i915/i915_gem_userptr.c | 79 ++++++++++++---------------------
>  3 files changed, 45 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f4ed38..6383940e8d55 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2749,6 +2749,21 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
>              (((__iter).curr += PAGE_SIZE) < (__iter).max) ||           \
>              ((__iter) = __sgt_iter(__sg_next((__iter).sgp), false), 0))
>  
> +static inline unsigned int i915_sg_segment_size(void)
> +{
> +       unsigned int size = swiotlb_max_segment();
> +
> +       if (size == 0)
> +               return SCATTERLIST_MAX_SEGMENT;
> +
> +       size = rounddown(size, PAGE_SIZE);

Looks like swiotlb_max_segment() is always page aligned when not 1.
And it returns bytes, ok.

Given that you are using a power of two (PAGE_SIZE), you can use
round_down().
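
(For illustration, the helper from the hunk above with that one-line
tweak applied might look like the following. This is only a sketch of
the suggestion, not the committed version; round_down() is safe here
because PAGE_SIZE is a power of two.)

static inline unsigned int i915_sg_segment_size(void)
{
	unsigned int size = swiotlb_max_segment();

	if (size == 0)		/* swiotlb not in use */
		return SCATTERLIST_MAX_SEGMENT;

	size = round_down(size, PAGE_SIZE);

	/* swiotlb_max_segment() can return 1 byte when it means one page */
	if (size < PAGE_SIZE)
		size = PAGE_SIZE;

	return size;
}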

> +       /* swiotlb_max_segment_size can return 1 byte when it means one page. */
> +       if (size < PAGE_SIZE)
> +               size = PAGE_SIZE;
> +
> +       return size;
> +}
> +
>  static inline const struct intel_device_info *
>  intel_info(const struct drm_i915_private *dev_priv)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6faabf34f142..a60885d6231b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2339,7 +2339,7 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>         struct sgt_iter sgt_iter;
>         struct page *page;
>         unsigned long last_pfn = 0;     /* suppress gcc warning */
> -       unsigned int max_segment;
> +       unsigned int max_segment = i915_sg_segment_size();
>         gfp_t noreclaim;
>         int ret;
>  
> @@ -2350,10 +2350,6 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>         GEM_BUG_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);
>         GEM_BUG_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
>  
> -       max_segment = swiotlb_max_segment();
> -       if (!max_segment)
> -               max_segment = rounddown(UINT_MAX, PAGE_SIZE);

Conversion to i915_sg_segment_size(), ok.

> -
>         st = kmalloc(sizeof(*st), GFP_KERNEL);
>         if (st == NULL)
>                 return ERR_PTR(-ENOMEM);
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index ccd09e8419f5..60c10d4118ae 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -399,64 +399,42 @@ struct get_pages_work {
>         struct task_struct *task;
>  };
>  
> -#if IS_ENABLED(CONFIG_SWIOTLB)
> -#define swiotlb_active() swiotlb_nr_tbl()
> -#else
> -#define swiotlb_active() 0
> -#endif

Converted to i915_sg_segment_size(), nice.

> -static int
> -st_set_pages(struct sg_table **st, struct page **pvec, int num_pages)
> -{
> -       struct scatterlist *sg;
> -       int ret, n;
> -
> -       *st = kmalloc(sizeof(**st), GFP_KERNEL);
> -       if (*st == NULL)
> -               return -ENOMEM;
> -
> -       if (swiotlb_active()) {
> -               ret = sg_alloc_table(*st, num_pages, GFP_KERNEL);
> -               if (ret)
> -                       goto err;
> -
> -               for_each_sg((*st)->sgl, sg, num_pages, n)
> -                       sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
> -       } else {
> -               ret = sg_alloc_table_from_pages(*st, pvec, num_pages,
> -                                               0, num_pages << PAGE_SHIFT,
> -                                               GFP_KERNEL);
> -               if (ret)
> -                       goto err;
> -       }
> -
> -       return 0;
> -
> -err:
> -       kfree(*st);
> -       *st = NULL;
> -       return ret;
> -}
> -
>  static struct sg_table *
> -__i915_gem_userptr_set_pages(struct drm_i915_gem_object *obj,
> -                            struct page **pvec, int num_pages)
> +__i915_gem_userptr_alloc_pages(struct drm_i915_gem_object *obj,
> +                              struct page **pvec, int num_pages)
>  {
> -       struct sg_table *pages;
> +       unsigned int max_segment = i915_sg_segment_size();
> +       struct sg_table *st;
>         int ret;
>  
> -       ret = st_set_pages(&pages, pvec, num_pages);
> -       if (ret)
> +       st = kmalloc(sizeof(*st), GFP_KERNEL);
> +       if (!st)
> +               return ERR_PTR(-ENOMEM);
> +
> +alloc_table:
> +       ret = __sg_alloc_table_from_pages(st, pvec, num_pages,
> +                                         0, num_pages << PAGE_SHIFT,
> +                                         max_segment,
> +                                         GFP_KERNEL);
> +       if (ret) {
> +               kfree(st);
>                 return ERR_PTR(ret);
> +       }
>  
> -       ret = i915_gem_gtt_prepare_pages(obj, pages);
> +       ret = i915_gem_gtt_prepare_pages(obj, st);
>         if (ret) {
> -               sg_free_table(pages);
> -               kfree(pages);
> +               sg_free_table(st);
> +
> +               if (max_segment > PAGE_SIZE) {
> +                       max_segment = PAGE_SIZE;
> +                       goto alloc_table;
> +               }
> +
> +               kfree(st);
>                 return ERR_PTR(ret);
>         }

Much neater.

>  
> -       return pages;
> +       return st;
>  }
>  
>  static int
> @@ -540,7 +518,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
>                 struct sg_table *pages = ERR_PTR(ret);
>  
>                 if (pinned == npages) {
> -                       pages = __i915_gem_userptr_set_pages(obj, pvec, npages);
> +                       pages = __i915_gem_userptr_alloc_pages(obj, pvec,
> +                                                              npages);
>                         if (!IS_ERR(pages)) {
>                                 __i915_gem_object_set_pages(obj, pages);
>                                 pinned = 0;
> @@ -661,7 +640,7 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
>                 pages = __i915_gem_userptr_get_pages_schedule(obj);
>                 active = pages == ERR_PTR(-EAGAIN);
>         } else {
> -               pages = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
> +               pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages);
>                 active = !IS_ERR(pages);
>         }
>         if (active)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  2017-07-27  9:05   ` Tvrtko Ursulin
  (?)
@ 2017-07-28 11:07   ` Chris Wilson
  -1 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-07-28 11:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx
  Cc: Ben Widawsky, Jason Ekstrand, Tvrtko Ursulin, Masahiro Yamada,
	linux-kernel, Joonas Lahtinen

Quoting Tvrtko Ursulin (2017-07-27 10:05:03)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Drivers like i915 benefit from being able to control the maxium
> size of the sg coallesced segment while building the scatter-
> gather list.
> 
> Introduce and export the __sg_alloc_table_from_pages function
> which will allow it that control.
> 
> v2: Reorder parameters. (Chris Wilson)
> v3: Fix incomplete reordering in v2.
> v4: max_segment needs to be page aligned.
> v5: Rebase.
> v6: Rebase.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Sill happy and checked the external user,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] Userptr bo slab use optimization
  2017-07-27  9:25 ` [PATCH 0/4] Userptr bo slab use optimization Chris Wilson
  2017-07-27 10:46   ` Tvrtko Ursulin
@ 2017-08-01 18:16   ` Ben Widawsky
  1 sibling, 0 replies; 22+ messages in thread
From: Ben Widawsky @ 2017-08-01 18:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel-gfx

On 17-07-27 10:25:51, Chris Wilson wrote:
>Quoting Tvrtko Ursulin (2017-07-27 10:05:00)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Yet another attempt to get this series reviewed and merged...
>>
>> I've heard Vulkan might be creating a lot of userptr objects so might be
>> interesting to check what benefit it brings to those use cases.
>
>Optimist :) My thinking is that this should only impact get_pages ->
>vma_bind, which is supposed to be a rare operation, and if it does
>happen as part of the steady state that we have too many sg entries
>in a chain, that is just one of the myriad little paper cuts :)
>

Vulkan is critically dependent on userptr, but I don't believe we create many
userptr BOs, as the implementation and API reduce the number of BOs in general.

I don't see any reason not to do any of this though. Series is
Acked-by: Ben Widawsky <ben@bwidawsk.net>

>> As an introduction, this allows i915 to create fewer sg table entries for the bo
>> backing store representation. As such it primarily saves kernel slab memory.
>>
>> When we added this optimisation to normal i915 bos, the savings were as far as
>> I remember around 1-2MiB of slab after booting to KDE desktop, and 2-4Mib on
>> neverball (game) main screen (or maybe it was while playing).
>
>I think we also want to think about the aspect where we are creating
>objects of multiple 1G huge pages, so we are going to run into the sg
>limits very quickly.
>-Chris

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  2017-07-27  9:05   ` Tvrtko Ursulin
  (?)
  (?)
@ 2017-08-02 13:06   ` Tvrtko Ursulin
  2017-08-02 23:01     ` Andrew Morton
  -1 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-08-02 13:06 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx
  Cc: linux-kernel, Masahiro Yamada, Ben Widawsky, Andrew Morton,
	Vetter, Daniel


Hi Andrew,

We have a couple of small lib/scatterlist.c tidies here, plus exporting 
the new API which allows drivers to control the maximum coalesced entry 
as created by __sg_alloc_table_from_pages.
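
For context, a minimal sketch of how a driver would call the new export
(modelled on the userptr conversion in patch 4/4 of this series; the
helper name is made up and error handling is trimmed). max_segment must
be non-zero and page aligned, otherwise __sg_alloc_table_from_pages
warns and returns -EINVAL:

#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Illustrative only: cap each coalesced sg entry at max_segment bytes. */
static struct sg_table *example_build_sgt(struct page **pages,
					  unsigned int n_pages,
					  unsigned int max_segment)
{
	struct sg_table *st;
	int ret;

	st = kmalloc(sizeof(*st), GFP_KERNEL);
	if (!st)
		return ERR_PTR(-ENOMEM);

	ret = __sg_alloc_table_from_pages(st, pages, n_pages, 0,
					  (unsigned long)n_pages << PAGE_SHIFT,
					  max_segment, GFP_KERNEL);
	if (ret) {
		kfree(st);
		return ERR_PTR(ret);
	}

	return st;	/* released with sg_free_table() + kfree() */
}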

I am looking for an ack to merge these three patches via the drm-intel tree.

Regards,

Tvrtko

On 27/07/2017 10:05, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Drivers like i915 benefit from being able to control the maxium
> size of the sg coallesced segment while building the scatter-
> gather list.
> 
> Introduce and export the __sg_alloc_table_from_pages function
> which will allow it that control.
> 
> v2: Reorder parameters. (Chris Wilson)
> v3: Fix incomplete reordering in v2.
> v4: max_segment needs to be page aligned.
> v5: Rebase.
> v6: Rebase.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   include/linux/scatterlist.h | 11 +++++----
>   lib/scatterlist.c           | 58 +++++++++++++++++++++++++++++++++++----------
>   2 files changed, 52 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 6dd2ddbc6230..874b50c232de 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -267,10 +267,13 @@ void sg_free_table(struct sg_table *);
>   int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
>   		     struct scatterlist *, gfp_t, sg_alloc_fn *);
>   int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
> -int sg_alloc_table_from_pages(struct sg_table *sgt,
> -	struct page **pages, unsigned int n_pages,
> -	unsigned int offset, unsigned long size,
> -	gfp_t gfp_mask);
> +int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
> +				unsigned int n_pages, unsigned int offset,
> +				unsigned long size, unsigned int max_segment,
> +				gfp_t gfp_mask);
> +int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
> +			      unsigned int n_pages, unsigned int offset,
> +			      unsigned long size, gfp_t gfp_mask);
>   
>   size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
>   		      size_t buflen, off_t skip, bool to_buffer);
> diff --git a/lib/scatterlist.c b/lib/scatterlist.c
> index 7b2e74da2c44..1a5900f9a057 100644
> --- a/lib/scatterlist.c
> +++ b/lib/scatterlist.c
> @@ -370,14 +370,15 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
>   EXPORT_SYMBOL(sg_alloc_table);
>   
>   /**
> - * sg_alloc_table_from_pages - Allocate and initialize an sg table from
> - *			       an array of pages
> - * @sgt:	The sg table header to use
> - * @pages:	Pointer to an array of page pointers
> - * @n_pages:	Number of pages in the pages array
> - * @offset:     Offset from start of the first page to the start of a buffer
> - * @size:       Number of valid bytes in the buffer (after offset)
> - * @gfp_mask:	GFP allocation mask
> + * __sg_alloc_table_from_pages - Allocate and initialize an sg table from
> + *			         an array of pages
> + * @sgt:	 The sg table header to use
> + * @pages:	 Pointer to an array of page pointers
> + * @n_pages:	 Number of pages in the pages array
> + * @offset:      Offset from start of the first page to the start of a buffer
> + * @size:        Number of valid bytes in the buffer (after offset)
> + * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
> + * @gfp_mask:	 GFP allocation mask
>    *
>    *  Description:
>    *    Allocate and initialize an sg table from a list of pages. Contiguous
> @@ -389,16 +390,18 @@ EXPORT_SYMBOL(sg_alloc_table);
>    * Returns:
>    *   0 on success, negative error on failure
>    */
> -int sg_alloc_table_from_pages(struct sg_table *sgt,
> -	struct page **pages, unsigned int n_pages,
> -	unsigned int offset, unsigned long size,
> -	gfp_t gfp_mask)
> +int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
> +				unsigned int n_pages, unsigned int offset,
> +				unsigned long size, unsigned int max_segment,
> +				gfp_t gfp_mask)
>   {
> -	const unsigned int max_segment = SCATTERLIST_MAX_SEGMENT;
>   	unsigned int chunks, cur_page, seg_len, i;
>   	int ret;
>   	struct scatterlist *s;
>   
> +	if (WARN_ON(!max_segment || offset_in_page(max_segment)))
> +		return -EINVAL;
> +
>   	/* compute number of contiguous chunks */
>   	chunks = 1;
>   	seg_len = 0;
> @@ -440,6 +443,35 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
>   
>   	return 0;
>   }
> +EXPORT_SYMBOL(__sg_alloc_table_from_pages);
> +
> +/**
> + * sg_alloc_table_from_pages - Allocate and initialize an sg table from
> + *			       an array of pages
> + * @sgt:	 The sg table header to use
> + * @pages:	 Pointer to an array of page pointers
> + * @n_pages:	 Number of pages in the pages array
> + * @offset:      Offset from start of the first page to the start of a buffer
> + * @size:        Number of valid bytes in the buffer (after offset)
> + * @gfp_mask:	 GFP allocation mask
> + *
> + *  Description:
> + *    Allocate and initialize an sg table from a list of pages. Contiguous
> + *    ranges of the pages are squashed into a single scatterlist node. A user
> + *    may provide an offset at a start and a size of valid data in a buffer
> + *    specified by the page array. The returned sg table is released by
> + *    sg_free_table.
> + *
> + * Returns:
> + *   0 on success, negative error on failure
> + */
> +int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
> +			      unsigned int n_pages, unsigned int offset,
> +			      unsigned long size, gfp_t gfp_mask)
> +{
> +	return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size,
> +					   SCATTERLIST_MAX_SEGMENT, gfp_mask);
> +}
>   EXPORT_SYMBOL(sg_alloc_table_from_pages);
>   
>   void __sg_page_iter_start(struct sg_page_iter *piter,
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  2017-08-02 13:06   ` [Intel-gfx] " Tvrtko Ursulin
@ 2017-08-02 23:01     ` Andrew Morton
  2017-08-03  9:21         ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2017-08-02 23:01 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Tvrtko Ursulin, Intel-gfx, linux-kernel, Masahiro Yamada,
	Ben Widawsky, Vetter, Daniel

On Wed, 2 Aug 2017 14:06:39 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> 
> Hi Andrew,
> 
> We have a couple of small lib/scatterlist.c tidies here, plus exporting 
> the new API which allows drivers to control the maximum coalesced entry 
> as created by __sg_alloc_table_from_pages.
> 
> I am looking for an ack to merge these three patches via the drm-intel tree.
> 
> 
> ...
>
> On 27/07/2017 10:05, Tvrtko Ursulin wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Drivers like i915 benefit from being able to control the maxium
> > size of the sg coallesced segment while building the scatter-

"coalesced"

> > gather list.
> > 
> > Introduce and export the __sg_alloc_table_from_pages function
> > which will allow it that control.
> > 
>
> ...
>
> > --- a/lib/scatterlist.c
> > +++ b/lib/scatterlist.c
> > @@ -370,14 +370,15 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
> >   EXPORT_SYMBOL(sg_alloc_table);
> >   
> >   /**
> > - * sg_alloc_table_from_pages - Allocate and initialize an sg table from
> > - *			       an array of pages
> > - * @sgt:	The sg table header to use
> > - * @pages:	Pointer to an array of page pointers
> > - * @n_pages:	Number of pages in the pages array
> > - * @offset:     Offset from start of the first page to the start of a buffer
> > - * @size:       Number of valid bytes in the buffer (after offset)
> > - * @gfp_mask:	GFP allocation mask
> > + * __sg_alloc_table_from_pages - Allocate and initialize an sg table from
> > + *			         an array of pages
> > + * @sgt:	 The sg table header to use
> > + * @pages:	 Pointer to an array of page pointers
> > + * @n_pages:	 Number of pages in the pages array
> > + * @offset:      Offset from start of the first page to the start of a buffer
> > + * @size:        Number of valid bytes in the buffer (after offset)
> > + * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
> > + * @gfp_mask:	 GFP allocation mask
> >    *
> >    *  Description:
> >    *    Allocate and initialize an sg table from a list of pages. Contiguous

The Description doesn't describe how this differs from
sg_alloc_table_from_pages(), although it doesn't seem terribly
important.

> > +
> > +/**
> > + * sg_alloc_table_from_pages - Allocate and initialize an sg table from
> > + *			       an array of pages
> > + * @sgt:	 The sg table header to use
> > + * @pages:	 Pointer to an array of page pointers
> > + * @n_pages:	 Number of pages in the pages array
> > + * @offset:      Offset from start of the first page to the start of a buffer
> > + * @size:        Number of valid bytes in the buffer (after offset)
> > + * @gfp_mask:	 GFP allocation mask
> > + *
> > + *  Description:
> > + *    Allocate and initialize an sg table from a list of pages. Contiguous
> > + *    ranges of the pages are squashed into a single scatterlist node. A user
> > + *    may provide an offset at a start and a size of valid data in a buffer
> > + *    specified by the page array. The returned sg table is released by
> > + *    sg_free_table.
> > + *
> > + * Returns:
> > + *   0 on success, negative error on failure
> > + */
> > +int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
> > +			      unsigned int n_pages, unsigned int offset,
> > +			      unsigned long size, gfp_t gfp_mask)
> > +{
> > +	return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size,
> > +					   SCATTERLIST_MAX_SEGMENT, gfp_mask);
> > +}

Making this an inline would save a bunch of stack space in the callers?

One could just add the new arg then run around and update the 10ish
callers but I guess the new interface is OK.

Otherwise it's OK by me, please go ahead as proposed.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  2017-08-02 23:01     ` Andrew Morton
@ 2017-08-03  9:21         ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-08-03  9:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tvrtko Ursulin, Intel-gfx, linux-kernel, Masahiro Yamada,
	Ben Widawsky, Vetter, Daniel


On 03/08/2017 00:01, Andrew Morton wrote:
> On Wed, 2 Aug 2017 14:06:39 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>>
>> Hi Andrew,
>>
>> We have a couple of small lib/scatterlist.c tidies here, plus exporting
>> the new API which allows drivers to control the maximum coalesced entry
>> as created by __sg_alloc_table_from_pages.
>>
>> I am looking for an ack to merge these three patches via the drm-intel tree.
>>
>>
>> ...
>>
>> On 27/07/2017 10:05, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Drivers like i915 benefit from being able to control the maxium
>>> size of the sg coallesced segment while building the scatter-
> 
> "coalesced"

Oops, I've had this in other patches in the series. Fixed now.

>>> gather list.
>>>
>>> Introduce and export the __sg_alloc_table_from_pages function
>>> which will allow it that control.
>>>
>>
>> ...
>>
>>> --- a/lib/scatterlist.c
>>> +++ b/lib/scatterlist.c
>>> @@ -370,14 +370,15 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
>>>    EXPORT_SYMBOL(sg_alloc_table);
>>>    
>>>    /**
>>> - * sg_alloc_table_from_pages - Allocate and initialize an sg table from
>>> - *			       an array of pages
>>> - * @sgt:	The sg table header to use
>>> - * @pages:	Pointer to an array of page pointers
>>> - * @n_pages:	Number of pages in the pages array
>>> - * @offset:     Offset from start of the first page to the start of a buffer
>>> - * @size:       Number of valid bytes in the buffer (after offset)
>>> - * @gfp_mask:	GFP allocation mask
>>> + * __sg_alloc_table_from_pages - Allocate and initialize an sg table from
>>> + *			         an array of pages
>>> + * @sgt:	 The sg table header to use
>>> + * @pages:	 Pointer to an array of page pointers
>>> + * @n_pages:	 Number of pages in the pages array
>>> + * @offset:      Offset from start of the first page to the start of a buffer
>>> + * @size:        Number of valid bytes in the buffer (after offset)
>>> + * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
>>> + * @gfp_mask:	 GFP allocation mask
>>>     *
>>>     *  Description:
>>>     *    Allocate and initialize an sg table from a list of pages. Contiguous
> 
> The Description doesn't describe how this differs from
> sg_alloc_table_from_pages(), although it doesn't seem terribly
> important.

Well spotted, I've fixed this as well.

>>> +
>>> +/**
>>> + * sg_alloc_table_from_pages - Allocate and initialize an sg table from
>>> + *			       an array of pages
>>> + * @sgt:	 The sg table header to use
>>> + * @pages:	 Pointer to an array of page pointers
>>> + * @n_pages:	 Number of pages in the pages array
>>> + * @offset:      Offset from start of the first page to the start of a buffer
>>> + * @size:        Number of valid bytes in the buffer (after offset)
>>> + * @gfp_mask:	 GFP allocation mask
>>> + *
>>> + *  Description:
>>> + *    Allocate and initialize an sg table from a list of pages. Contiguous
>>> + *    ranges of the pages are squashed into a single scatterlist node. A user
>>> + *    may provide an offset at a start and a size of valid data in a buffer
>>> + *    specified by the page array. The returned sg table is released by
>>> + *    sg_free_table.
>>> + *
>>> + * Returns:
>>> + *   0 on success, negative error on failure
>>> + */
>>> +int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
>>> +			      unsigned int n_pages, unsigned int offset,
>>> +			      unsigned long size, gfp_t gfp_mask)
>>> +{
>>> +	return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size,
>>> +					   SCATTERLIST_MAX_SEGMENT, gfp_mask);
>>> +}
> 
> Making this an inline would save a bunch of stack space in the callers?
> 
> One could just add the new arg then run around and update the 10ish
> callers but I guess the new interface is OK.

On the first suggestion - there are other trivial wrappers in 
lib/scatterlist.c which could benefit from the same treatment (moving to 
inline), so I opted not to do this in this patch but will send a follow-up.
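
(A hypothetical sketch of what such a follow-up could look like -- not
the actual patch -- with the trivial wrapper moved into
<linux/scatterlist.h> as a static inline, avoiding the extra call frame
Andrew refers to:)

static inline int sg_alloc_table_from_pages(struct sg_table *sgt,
					    struct page **pages,
					    unsigned int n_pages,
					    unsigned int offset,
					    unsigned long size,
					    gfp_t gfp_mask)
{
	return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size,
					   SCATTERLIST_MAX_SEGMENT, gfp_mask);
}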

> Otherwise it's OK by me, please go ahead as proposed.

Thanks a lot!

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
@ 2017-08-03  9:21         ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-08-03  9:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Intel-gfx, linux-kernel, Masahiro Yamada, Ben Widawsky, Vetter, Daniel


On 03/08/2017 00:01, Andrew Morton wrote:
> On Wed, 2 Aug 2017 14:06:39 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>>
>> Hi Andrew,
>>
>> We have a couple of small lib/scatterlist.c tidies here, plus exporting
>> the new API which allows drivers to control the maximum coalesced entry
>> as created by __sg_alloc_table_from_pages.
>>
>> I am looking for an ack to merge these three patches via the drm-intel tree.
>>
>>
>> ...
>>
>> On 27/07/2017 10:05, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Drivers like i915 benefit from being able to control the maxium
>>> size of the sg coallesced segment while building the scatter-
> 
> "coalesced"

Oops, I've had this in other patches in the series. Fixed now.

>>> gather list.
>>>
>>> Introduce and export the __sg_alloc_table_from_pages function
>>> which will allow it that control.
>>>
>>
>> ...
>>
>>> --- a/lib/scatterlist.c
>>> +++ b/lib/scatterlist.c
>>> @@ -370,14 +370,15 @@ int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t gfp_mask)
>>>    EXPORT_SYMBOL(sg_alloc_table);
>>>    
>>>    /**
>>> - * sg_alloc_table_from_pages - Allocate and initialize an sg table from
>>> - *			       an array of pages
>>> - * @sgt:	The sg table header to use
>>> - * @pages:	Pointer to an array of page pointers
>>> - * @n_pages:	Number of pages in the pages array
>>> - * @offset:     Offset from start of the first page to the start of a buffer
>>> - * @size:       Number of valid bytes in the buffer (after offset)
>>> - * @gfp_mask:	GFP allocation mask
>>> + * __sg_alloc_table_from_pages - Allocate and initialize an sg table from
>>> + *			         an array of pages
>>> + * @sgt:	 The sg table header to use
>>> + * @pages:	 Pointer to an array of page pointers
>>> + * @n_pages:	 Number of pages in the pages array
>>> + * @offset:      Offset from start of the first page to the start of a buffer
>>> + * @size:        Number of valid bytes in the buffer (after offset)
>>> + * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
>>> + * @gfp_mask:	 GFP allocation mask
>>>     *
>>>     *  Description:
>>>     *    Allocate and initialize an sg table from a list of pages. Contiguous
> 
> The Description doesn't describe how this differs from
> sg_alloc_table_from_pages(), although it doesn't seem terribly
> important.

Well spotted, I've fixed this as well.

>>> +
>>> +/**
>>> + * sg_alloc_table_from_pages - Allocate and initialize an sg table from
>>> + *			       an array of pages
>>> + * @sgt:	 The sg table header to use
>>> + * @pages:	 Pointer to an array of page pointers
>>> + * @n_pages:	 Number of pages in the pages array
>>> + * @offset:      Offset from start of the first page to the start of a buffer
>>> + * @size:        Number of valid bytes in the buffer (after offset)
>>> + * @gfp_mask:	 GFP allocation mask
>>> + *
>>> + *  Description:
>>> + *    Allocate and initialize an sg table from a list of pages. Contiguous
>>> + *    ranges of the pages are squashed into a single scatterlist node. A user
>>> + *    may provide an offset at a start and a size of valid data in a buffer
>>> + *    specified by the page array. The returned sg table is released by
>>> + *    sg_free_table.
>>> + *
>>> + * Returns:
>>> + *   0 on success, negative error on failure
>>> + */
>>> +int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
>>> +			      unsigned int n_pages, unsigned int offset,
>>> +			      unsigned long size, gfp_t gfp_mask)
>>> +{
>>> +	return __sg_alloc_table_from_pages(sgt, pages, n_pages, offset, size,
>>> +					   SCATTERLIST_MAX_SEGMENT, gfp_mask);
>>> +}
> 
> Making this an inline would save a bunch of stack space in the callers?
> 
> One could just add the new arg then run around and update the 10ish
> callers but I guess the new interface is OK.

On the first suggestion - there are other trivial wrappers in 
lib/scatterlist.c which could benefit from the same treatment (moving to 
inline), so I opted not to do this in this patch but will send a follow-up.

> Otherwise it's OK by me, please go ahead as proposed.

Thanks a lot!

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages
  2016-11-11  8:50 ` [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages Tvrtko Ursulin
@ 2016-11-11 10:24   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2016-11-11 10:24 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx, Masahiro Yamada, linux-kernel

On Fri, Nov 11, 2016 at 08:50:19AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Drivers like i915 benefit from being able to control the maxium
> size of the sg coallesced segment while building the scatter-
> gather list.
> 
> Introduce and export the __sg_alloc_table_from_pages function
> which will allow it that control.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: linux-kernel@vger.kernel.org


> ---
>  include/linux/scatterlist.h | 11 +++++----
>  lib/scatterlist.c           | 55 ++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 49 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index c981bee1a3ae..29591dbb20fd 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -261,10 +261,13 @@ void sg_free_table(struct sg_table *);
>  int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
>  		     struct scatterlist *, gfp_t, sg_alloc_fn *);
>  int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
> -int sg_alloc_table_from_pages(struct sg_table *sgt,
> -	struct page **pages, unsigned int n_pages,
> -	unsigned int offset, unsigned long size,
> -	gfp_t gfp_mask);
> +int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
> +				unsigned int n_pages, unsigned int offset,
> +				unsigned long size, gfp_t gfp_mask,
> +				unsigned int max_segment);

Just the question of parameter order, I like gfp_t last :)
And I think offset / size / max_segment tie together.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
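
(For reference, the parameter ordering asked for in this 2016 review is
what the later revision earlier in this thread ends up exporting:)

int __sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
				unsigned int n_pages, unsigned int offset,
				unsigned long size, unsigned int max_segment,
				gfp_t gfp_mask);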

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-08-03  9:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27  9:05 [PATCH 0/4] Userptr bo slab use optimization Tvrtko Ursulin
2017-07-27  9:05 ` [PATCH 1/4] lib/scatterlist: Fix offset type in sg_alloc_table_from_pages Tvrtko Ursulin
2017-07-27  9:05   ` Tvrtko Ursulin
2017-07-27  9:05 ` [PATCH 2/4] lib/scatterlist: Avoid potential scatterlist entry overflow Tvrtko Ursulin
2017-07-27  9:05   ` Tvrtko Ursulin
2017-07-28 10:53   ` [Intel-gfx] " Chris Wilson
2017-07-27  9:05 ` [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages Tvrtko Ursulin
2017-07-27  9:05   ` Tvrtko Ursulin
2017-07-28 11:07   ` Chris Wilson
2017-08-02 13:06   ` [Intel-gfx] " Tvrtko Ursulin
2017-08-02 23:01     ` Andrew Morton
2017-08-03  9:21       ` Tvrtko Ursulin
2017-08-03  9:21         ` Tvrtko Ursulin
2017-07-27  9:05 ` [PATCH 4/4] drm/i915: Use __sg_alloc_table_from_pages for userptr allocations Tvrtko Ursulin
2017-07-27  9:05   ` Tvrtko Ursulin
2017-07-28 11:06   ` Chris Wilson
2017-07-27  9:25 ` [PATCH 0/4] Userptr bo slab use optimization Chris Wilson
2017-07-27 10:46   ` Tvrtko Ursulin
2017-07-27 10:53     ` Chris Wilson
2017-08-01 18:16   ` Ben Widawsky
2017-07-27  9:39 ` ✓ Fi.CI.BAT: success for " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2016-11-11  8:50 [PATCH 0/4] Compact userptr object backing store allocation Tvrtko Ursulin
2016-11-11  8:50 ` [PATCH 3/4] lib/scatterlist: Introduce and export __sg_alloc_table_from_pages Tvrtko Ursulin
2016-11-11 10:24   ` [Intel-gfx] " Chris Wilson
