* [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map()
@ 2016-05-16 15:19 Dave Gordon
  2016-05-16 15:19 ` [PATCH 2/3] drm/i915: optimise i915_gem_object_map() for small objects Dave Gordon
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Dave Gordon @ 2016-05-16 15:19 UTC (permalink / raw)
  To: intel-gfx

The recently-added i915_gem_object_pin_map() can be further optimised
for "small" objects. To facilitate this, and simplify the error paths
before adding the new code, this patch pulls out the "mapping" part of
the operation (involving local allocations which must be undone before
return) into its own subfunction.

The next patch will then insert the new optimisation into the middle of
the now-separated subfunction.

This reorganisation will probably not affect the generated code, as the
compiler will most likely inline it anyway, but it makes the logical
structure a bit clearer and easier to modify.
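
For context, a caller of this interface looks roughly like the following
(sketch only, not part of this patch; it assumes the usual pairing with
i915_gem_object_unpin_map()):

	void *vaddr;

	vaddr = i915_gem_object_pin_map(obj);
	if (IS_ERR(vaddr))
		return PTR_ERR(vaddr);

	/* ... read or write the object's contents through vaddr ... */

	i915_gem_object_unpin_map(obj);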

v2:
    Restructure loop-over-pages & error check (Chris Wilson)

v3:
    Add page count to debug messages (Chris Wilson)
    Convert WARN_ON() to GEM_BUG_ON()

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 59 ++++++++++++++++++++++++++---------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index aff386e..bbec429 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2398,6 +2398,43 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+/* The 'mapping' part of i915_gem_object_pin_map() below */
+static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
+{
+	unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
+	struct sg_table *sgt = obj->pages;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+	unsigned long i = 0;
+	void *addr = NULL;
+
+	/* A single page can always be kmapped */
+	if (n_pages == 1)
+		return kmap(sg_page(sgt->sgl));
+
+	pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
+	if (pages == NULL) {
+		DRM_DEBUG_DRIVER("Failed to get space for %lu pointers\n",
+				 n_pages);
+		return NULL;
+	}
+
+	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
+		pages[i++] = sg_page_iter_page(&sg_iter);
+
+	/* Check that we have the expected number of pages */
+	GEM_BUG_ON(i != n_pages);
+
+	addr = vmap(pages, n_pages, 0, PAGE_KERNEL);
+	if (addr == NULL)
+		DRM_DEBUG_DRIVER("Failed to vmap %lu pages\n", n_pages);
+
+	drm_free_large(pages);
+
+	return addr;
+}
+
+/* get, pin, and map the pages of the object into kernel space */
 void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj)
 {
 	int ret;
@@ -2411,27 +2448,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj)
 	i915_gem_object_pin_pages(obj);
 
 	if (obj->mapping == NULL) {
-		struct page **pages;
-
-		pages = NULL;
-		if (obj->base.size == PAGE_SIZE)
-			obj->mapping = kmap(sg_page(obj->pages->sgl));
-		else
-			pages = drm_malloc_gfp(obj->base.size >> PAGE_SHIFT,
-					       sizeof(*pages),
-					       GFP_TEMPORARY);
-		if (pages != NULL) {
-			struct sg_page_iter sg_iter;
-			int n;
-
-			n = 0;
-			for_each_sg_page(obj->pages->sgl, &sg_iter,
-					 obj->pages->nents, 0)
-				pages[n++] = sg_page_iter_page(&sg_iter);
-
-			obj->mapping = vmap(pages, n, 0, PAGE_KERNEL);
-			drm_free_large(pages);
-		}
+		obj->mapping = i915_gem_object_map(obj);
 		if (obj->mapping == NULL) {
 			i915_gem_object_unpin_pages(obj);
 			return ERR_PTR(-ENOMEM);
-- 
1.9.1

* [PATCH 2/3] drm/i915: optimise i915_gem_object_map() for small objects
  2016-05-16 15:19 [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map() Dave Gordon
@ 2016-05-16 15:19 ` Dave Gordon
  2016-05-16 15:19   ` Dave Gordon
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Dave Gordon @ 2016-05-16 15:19 UTC (permalink / raw)
  To: intel-gfx

We're using this function for ringbuffers and other "small" objects, so
it's worth avoiding an extra malloc()/free() cycle if the page array is
small enough to put on the stack. Here we've chosen an arbitrary cutoff
of 32 (4k) pages, which is big enough for a ringbuffer (4 pages) or a
context image (currently up to 22 pages).
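
The shape of the change is the familiar small-on-stack / large-on-heap
pattern, sketched here in isolation (illustrative only; the real hunks are
below):

	struct page *stack_pages[32];
	struct page **pages = stack_pages;

	if (n_pages > ARRAY_SIZE(stack_pages)) {
		/* Too big for the stack -- fall back to a heap allocation */
		pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
		if (pages == NULL)
			return NULL;
	}

	/* ... fill and use pages[0..n_pages-1] ... */

	if (pages != stack_pages)
		drm_free_large(pages);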

v5:
    change name of local array [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bbec429..38e4e1e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2404,7 +2404,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
 	unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
 	struct sg_table *sgt = obj->pages;
 	struct sg_page_iter sg_iter;
-	struct page **pages;
+	struct page *stack_pages[32];
+	struct page **pages = stack_pages;
 	unsigned long i = 0;
 	void *addr = NULL;
 
@@ -2412,11 +2413,14 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
 	if (n_pages == 1)
 		return kmap(sg_page(sgt->sgl));
 
-	pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
-	if (pages == NULL) {
-		DRM_DEBUG_DRIVER("Failed to get space for %lu pointers\n",
-				 n_pages);
-		return NULL;
+	if (n_pages > ARRAY_SIZE(stack_pages)) {
+		/* Too big for stack -- allocate temporary array instead */
+		pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
+		if (pages == NULL) {
+			DRM_DEBUG_DRIVER("Failed to get space for %lu pointers\n",
+					 n_pages);
+			return NULL;
+		}
 	}
 
 	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
@@ -2429,7 +2433,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
 	if (addr == NULL)
 		DRM_DEBUG_DRIVER("Failed to vmap %lu pages\n", n_pages);
 
-	drm_free_large(pages);
+	if (pages != stack_pages)
+		drm_free_large(pages);
 
 	return addr;
 }
-- 
1.9.1

* [PATCH 3/3] Introduce & use new lightweight SGL iterators
  2016-05-16 15:19 [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map() Dave Gordon
@ 2016-05-16 15:19   ` Dave Gordon
  2016-05-16 15:19   ` Dave Gordon
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Dave Gordon @ 2016-05-16 15:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Dave Gordon, Chris Wilson, linux-kernel

The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other cases where we need only deal with whole, aligned pages.

Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)

One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.

Various uses of for_each_sg_page() are then converted to the new macros.
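
For illustration, a typical conversion ends up looking roughly like this
(sketch only; use_page() and insert_pte() are hypothetical placeholders for
whatever the loop body does):

	struct sgt_iter iter;
	struct page *page;
	dma_addr_t dma;

	for_each_sgt_page(page, iter, obj->pages)
		use_page(page);		/* hypothetical */

	for_each_sgt_dma(dma, iter, obj->pages)
		insert_pte(dma);	/* hypothetical */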

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-kernel@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_drv.h         | 62 +++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_gem.c         | 20 ++++-----
 drivers/gpu/drm/i915/i915_gem_fence.c   | 14 +++---
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 76 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/i915_gem_userptr.c |  7 ++-
 5 files changed, 116 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 72f0b02..c0fc6aa 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2125,6 +2125,10 @@ struct drm_i915_gem_object_ops {
 #define INTEL_FRONTBUFFER_ALL_MASK(pipe) \
 	(0xff << (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)))
 
+void i915_gem_track_fb(struct drm_i915_gem_object *old,
+		       struct drm_i915_gem_object *new,
+		       unsigned frontbuffer_bits);
+
 struct drm_i915_gem_object {
 	struct drm_gem_object base;
 
@@ -2256,9 +2260,61 @@ struct drm_i915_gem_object {
 };
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
-void i915_gem_track_fb(struct drm_i915_gem_object *old,
-		       struct drm_i915_gem_object *new,
-		       unsigned frontbuffer_bits);
+/*
+ * New optimised SGL iterator for i915 GEM objects
+ */
+struct sgt_iter {
+	struct scatterlist *sgp;
+	union {
+		unsigned long pfn;
+		unsigned long dma;
+	} ix;
+	unsigned int curr;
+	unsigned int max;
+};
+
+/* Constructor for new iterator */
+static inline struct sgt_iter
+__sgt_iter(struct scatterlist *sgl, bool dma)
+{
+	struct sgt_iter s = { .sgp = sgl };
+
+	if (sgl) {
+		s.max = s.curr = sgl->offset;
+		s.max += sgl->length;
+		if (dma)
+			s.ix.dma = sg_dma_address(sgl);
+		else
+			s.ix.pfn = page_to_pfn(sg_page(sgl));
+	}
+
+	return s;
+}
+
+/**
+ * for_each_sgt_dma - iterate over the DMA addresses of the given sg_table
+ * @__dmap:	DMA address (output)
+ * @__iter:	'struct sgt_iter' (iterator state, internal)
+ * @__sgt:	sg_table to iterate over (input)
+ */
+#define for_each_sgt_dma(__dmap, __iter, __sgt)				\
+	for ((__iter) = __sgt_iter((__sgt)->sgl, true);			\
+	     ((__dmap) = (__iter).ix.dma + (__iter).curr);		\
+	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
+		((__iter) = __sgt_iter(sg_next((__iter).sgp), true), 0))
+
+/**
+ * for_each_sgt_page - iterate over the pages of the given sg_table
+ * @__pp:	page pointer (output)
+ * @__iter:	'struct sgt_iter' (iterator state, internal)
+ * @__sgt:	sg_table to iterate over (input)
+ */
+#define for_each_sgt_page(__pp, __iter, __sgt)				\
+	for ((__iter) = __sgt_iter((__sgt)->sgl, false);		\
+	     ((__pp) = (__iter).ix.pfn == 0 ? NULL :			\
+		pfn_to_page((__iter).ix.pfn + ((__iter).curr >> PAGE_SHIFT)));\
+	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
+		((__iter) = __sgt_iter(sg_next((__iter).sgp), false), 0))
 
 /**
  * Request queue structure.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 38e4e1e..0fe7e95 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2165,7 +2165,8 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
 static void
 i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
 {
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
+	struct page *page;
 	int ret;
 
 	BUG_ON(obj->madv == __I915_MADV_PURGED);
@@ -2187,9 +2188,7 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
 	if (obj->madv == I915_MADV_DONTNEED)
 		obj->dirty = 0;
 
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
-		struct page *page = sg_page_iter_page(&sg_iter);
-
+	for_each_sgt_page(page, sgt_iter, obj->pages) {
 		if (obj->dirty)
 			set_page_dirty(page);
 
@@ -2246,7 +2245,7 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
 	struct address_space *mapping;
 	struct sg_table *st;
 	struct scatterlist *sg;
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
 	struct page *page;
 	unsigned long last_pfn = 0;	/* suppress gcc warning */
 	int ret;
@@ -2343,8 +2342,8 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
 
 err_pages:
 	sg_mark_end(sg);
-	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
-		put_page(sg_page_iter_page(&sg_iter));
+	for_each_sgt_page(page, sgt_iter, st)
+		put_page(page);
 	sg_free_table(st);
 	kfree(st);
 
@@ -2403,7 +2402,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
 {
 	unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
 	struct sg_table *sgt = obj->pages;
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
+	struct page *page;
 	struct page *stack_pages[32];
 	struct page **pages = stack_pages;
 	unsigned long i = 0;
@@ -2423,8 +2423,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
 		}
 	}
 
-	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
-		pages[i++] = sg_page_iter_page(&sg_iter);
+	for_each_sgt_page(page, sgt_iter, sgt)
+		pages[i++] = page;
 
 	/* Check that we have the expected number of pages */
 	GEM_BUG_ON(i != n_pages);
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index a2b938e..2b6bdc2 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -745,15 +745,15 @@ void i915_gem_restore_fences(struct drm_device *dev)
 void
 i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj)
 {
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
+	struct page *page;
 	int i;
 
 	if (obj->bit_17 == NULL)
 		return;
 
 	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
-		struct page *page = sg_page_iter_page(&sg_iter);
+	for_each_sgt_page(page, sgt_iter, obj->pages) {
 		char new_bit_17 = page_to_phys(page) >> 17;
 		if ((new_bit_17 & 0x1) !=
 		    (test_bit(i, obj->bit_17) != 0)) {
@@ -775,7 +775,8 @@ void i915_gem_restore_fences(struct drm_device *dev)
 void
 i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj)
 {
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
+	struct page *page;
 	int page_count = obj->base.size >> PAGE_SHIFT;
 	int i;
 
@@ -790,8 +791,9 @@ void i915_gem_restore_fences(struct drm_device *dev)
 	}
 
 	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
-		if (page_to_phys(sg_page_iter_page(&sg_iter)) & (1 << 17))
+
+	for_each_sgt_page(page, sgt_iter, obj->pages) {
+		if (page_to_phys(page) & (1 << 17))
 			__set_bit(i, obj->bit_17);
 		else
 			__clear_bit(i, obj->bit_17);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7eab619..4668477 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1839,20 +1839,19 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 				      enum i915_cache_level cache_level, u32 flags)
 {
 	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
-	gen6_pte_t *pt_vaddr;
 	unsigned first_entry = start >> PAGE_SHIFT;
 	unsigned act_pt = first_entry / GEN6_PTES;
 	unsigned act_pte = first_entry % GEN6_PTES;
-	struct sg_page_iter sg_iter;
+	gen6_pte_t *pt_vaddr = NULL;
+	struct sgt_iter sgt_iter;
+	dma_addr_t addr;
 
-	pt_vaddr = NULL;
-	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
+	for_each_sgt_dma(addr, sgt_iter, pages) {
 		if (pt_vaddr == NULL)
 			pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
 
 		pt_vaddr[act_pte] =
-			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
-				       cache_level, true, flags);
+			vm->pte_encode(addr, cache_level, true, flags);
 
 		if (++act_pte == GEN6_PTES) {
 			kunmap_px(ppgtt, pt_vaddr);
@@ -1861,6 +1860,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 			act_pte = 0;
 		}
 	}
+
 	if (pt_vaddr)
 		kunmap_px(ppgtt, pt_vaddr);
 }
@@ -2362,22 +2362,20 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 {
 	struct drm_i915_private *dev_priv = to_i915(vm->dev);
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-	unsigned first_entry = start >> PAGE_SHIFT;
-	gen8_pte_t __iomem *gtt_entries =
-		(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
-	int i = 0;
-	struct sg_page_iter sg_iter;
-	dma_addr_t addr = 0; /* shut up gcc */
+	struct sgt_iter sgt_iter;
+	gen8_pte_t __iomem *gtt_entries;
+	gen8_pte_t gtt_entry;
+	dma_addr_t addr;
 	int rpm_atomic_seq;
+	int i = 0;
 
 	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
 
-	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
-		addr = sg_dma_address(sg_iter.sg) +
-			(sg_iter.sg_pgoffset << PAGE_SHIFT);
-		gen8_set_pte(&gtt_entries[i],
-			     gen8_pte_encode(addr, level, true));
-		i++;
+	gtt_entries = (gen8_pte_t __iomem *)ggtt->gsm + (start >> PAGE_SHIFT);
+
+	for_each_sgt_dma(addr, sgt_iter, st) {
+		gtt_entry = gen8_pte_encode(addr, level, true);
+		gen8_set_pte(&gtt_entries[i++], gtt_entry);
 	}
 
 	/*
@@ -2388,8 +2386,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 	 * hardware should work, we must keep this posting read for paranoia.
 	 */
 	if (i != 0)
-		WARN_ON(readq(&gtt_entries[i-1])
-			!= gen8_pte_encode(addr, level, true));
+		WARN_ON(readq(&gtt_entries[i-1]) != gtt_entry);
 
 	/* This next bit makes the above posting read even more important. We
 	 * want to flush the TLBs only after we're certain all the PTE updates
@@ -2440,20 +2437,20 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 {
 	struct drm_i915_private *dev_priv = to_i915(vm->dev);
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-	unsigned first_entry = start >> PAGE_SHIFT;
-	gen6_pte_t __iomem *gtt_entries =
-		(gen6_pte_t __iomem *)ggtt->gsm + first_entry;
-	int i = 0;
-	struct sg_page_iter sg_iter;
-	dma_addr_t addr = 0;
+	struct sgt_iter sgt_iter;
+	gen6_pte_t __iomem *gtt_entries;
+	gen6_pte_t gtt_entry;
+	dma_addr_t addr;
 	int rpm_atomic_seq;
+	int i = 0;
 
 	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
 
-	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
-		addr = sg_page_iter_dma_address(&sg_iter);
-		iowrite32(vm->pte_encode(addr, level, true, flags), &gtt_entries[i]);
-		i++;
+	gtt_entries = (gen6_pte_t __iomem *)ggtt->gsm + (start >> PAGE_SHIFT);
+
+	for_each_sgt_dma(addr, sgt_iter, st) {
+		gtt_entry = vm->pte_encode(addr, level, true, flags);
+		iowrite32(gtt_entry, &gtt_entries[i++]);
 	}
 
 	/* XXX: This serves as a posting read to make sure that the PTE has
@@ -2462,10 +2459,8 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
 	 * of NUMA access patterns. Therefore, even with the way we assume
 	 * hardware should work, we must keep this posting read for paranoia.
 	 */
-	if (i != 0) {
-		unsigned long gtt = readl(&gtt_entries[i-1]);
-		WARN_ON(gtt != vm->pte_encode(addr, level, true, flags));
-	}
+	if (i != 0)
+		WARN_ON(readl(&gtt_entries[i-1]) != gtt_entry);
 
 	/* This next bit makes the above posting read even more important. We
 	 * want to flush the TLBs only after we're certain all the PTE updates
@@ -3399,9 +3394,11 @@ struct i915_vma *
 intel_rotate_fb_obj_pages(struct intel_rotation_info *rot_info,
 			  struct drm_i915_gem_object *obj)
 {
+	const size_t n_pages = obj->base.size / PAGE_SIZE;
 	unsigned int size_pages = rot_info->plane[0].width * rot_info->plane[0].height;
 	unsigned int size_pages_uv;
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
+	dma_addr_t dma_addr;
 	unsigned long i;
 	dma_addr_t *page_addr_list;
 	struct sg_table *st;
@@ -3410,7 +3407,7 @@ struct i915_vma *
 	int ret = -ENOMEM;
 
 	/* Allocate a temporary list of source pages for random access. */
-	page_addr_list = drm_malloc_gfp(obj->base.size / PAGE_SIZE,
+	page_addr_list = drm_malloc_gfp(n_pages,
 					sizeof(dma_addr_t),
 					GFP_TEMPORARY);
 	if (!page_addr_list)
@@ -3433,11 +3430,10 @@ struct i915_vma *
 
 	/* Populate source page list from the object. */
 	i = 0;
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
-		page_addr_list[i] = sg_page_iter_dma_address(&sg_iter);
-		i++;
-	}
+	for_each_sgt_dma(dma_addr, sgt_iter, obj->pages)
+		page_addr_list[i++] = dma_addr;
 
+	GEM_BUG_ON(i != n_pages);
 	st->nents = 0;
 	sg = st->sgl;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 32d9726..d2c1595 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -706,7 +706,8 @@ struct get_pages_work {
 static void
 i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
 {
-	struct sg_page_iter sg_iter;
+	struct sgt_iter sgt_iter;
+	struct page *page;
 
 	BUG_ON(obj->userptr.work != NULL);
 	__i915_gem_userptr_set_active(obj, false);
@@ -716,9 +717,7 @@ struct get_pages_work {
 
 	i915_gem_gtt_finish_object(obj);
 
-	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
-		struct page *page = sg_page_iter_page(&sg_iter);
-
+	for_each_sgt_page(page, sgt_iter, obj->pages) {
 		if (obj->dirty)
 			set_page_dirty(page);
 
-- 
1.9.1

* ✗ Ro.CI.BAT: failure for series starting with [1/3] drm/i915: refactor i915_gem_object_pin_map()
  2016-05-16 15:19 [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map() Dave Gordon
  2016-05-16 15:19 ` [PATCH 2/3] drm/i915: optimise i915_gem_object_map() for small objects Dave Gordon
  2016-05-16 15:19   ` Dave Gordon
@ 2016-05-16 16:01 ` Patchwork
  2016-05-17  9:22 ` [PATCH 1/3] " Tvrtko Ursulin
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-05-16 16:01 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: refactor i915_gem_object_pin_map()
URL   : https://patchwork.freedesktop.org/series/7237/
State : failure

== Summary ==

Series 7237v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/7237/revisions/1/mbox

Test drv_hangman:
        Subgroup error-state-basic:
                fail       -> PASS       (ro-ilk1-i5-650)
Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> FAIL       (ro-hsw-i7-4770r)
                fail       -> PASS       (ro-hsw-i3-4010u)

ro-bdw-i5-5250u  total:219  pass:181  dwarn:0   dfail:0   fail:0   skip:38 
ro-bdw-i7-5557U  total:219  pass:206  dwarn:0   dfail:0   fail:0   skip:13 
ro-bdw-i7-5600u  total:219  pass:187  dwarn:0   dfail:0   fail:0   skip:32 
ro-bsw-n3050     total:219  pass:175  dwarn:0   dfail:0   fail:2   skip:42 
ro-byt-n2820     total:218  pass:174  dwarn:0   dfail:0   fail:3   skip:41 
ro-hsw-i3-4010u  total:218  pass:193  dwarn:0   dfail:0   fail:0   skip:25 
ro-hsw-i7-4770r  total:219  pass:193  dwarn:0   dfail:0   fail:1   skip:25 
ro-ilk-i7-620lm  total:219  pass:151  dwarn:0   dfail:0   fail:1   skip:67 
ro-ilk1-i5-650   total:214  pass:152  dwarn:0   dfail:0   fail:1   skip:61 
ro-ivb-i7-3770   total:219  pass:183  dwarn:0   dfail:0   fail:0   skip:36 
ro-ivb2-i7-3770  total:219  pass:187  dwarn:0   dfail:0   fail:0   skip:32 
ro-skl-i7-6700hq total:214  pass:189  dwarn:0   dfail:0   fail:0   skip:25 
ro-snb-i7-2620M  total:219  pass:177  dwarn:0   dfail:0   fail:1   skip:41 

Results at /archive/results/CI_IGT_test/RO_Patchwork_910/

d381724 drm-intel-nightly: 2016y-05m-16d-12h-14m-04s UTC integration manifest
304d9cf Introduce & use new lightweight SGL iterators
60b157a drm/i915: optimise i915_gem_object_map() for small objects
bd3b9cc drm/i915: refactor i915_gem_object_pin_map()

* Re: [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map()
  2016-05-16 15:19 [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map() Dave Gordon
                   ` (2 preceding siblings ...)
  2016-05-16 16:01 ` ✗ Ro.CI.BAT: failure for series starting with [1/3] drm/i915: refactor i915_gem_object_pin_map() Patchwork
@ 2016-05-17  9:22 ` Tvrtko Ursulin
  2016-05-17 12:59   ` Dave Gordon
  3 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-05-17  9:22 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx


On 16/05/16 16:19, Dave Gordon wrote:
> The recently-added i915_gem_object_pin_map() can be further optimised
> for "small" objects. To facilitate this, and simplify the error paths
> before adding the new code, this patch pulls out the "mapping" part of
> the operation (involving local allocations which must be undone before
> return) into its own subfunction.
>
> The next patch will then insert the new optimisation into the middle of
> the now-separated subfunction.
>
> This reorganisation will probably not affect the generated code, as the
> compiler will most likely inline it anyway, but it makes the logical
> structure a bit clearer and easier to modify.
>
> v2:
>      Restructure loop-over-pages & error check (Chris Wilson)
>
> v3:
>      Add page count to debug messages (Chris Wilson)
>      Convert WARN_ON() to GEM_BUG_ON()
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 59 ++++++++++++++++++++++++++---------------
>   1 file changed, 38 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index aff386e..bbec429 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2398,6 +2398,43 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   	return 0;
>   }
>
> +/* The 'mapping' part of i915_gem_object_pin_map() below */
> +static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
> +{
> +	unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
> +	struct sg_table *sgt = obj->pages;
> +	struct sg_page_iter sg_iter;
> +	struct page **pages;
> +	unsigned long i = 0;
> +	void *addr = NULL;

Looks like this does not need to be initialized?

> +
> +	/* A single page can always be kmapped */
> +	if (n_pages == 1)
> +		return kmap(sg_page(sgt->sgl));
> +
> +	pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
> +	if (pages == NULL) {
> +		DRM_DEBUG_DRIVER("Failed to get space for %lu pointers\n",
> +				 n_pages);

Not terribly important, but I think these are too low-level functions to
have debug logging. It will not add a lot of useful information (no call
stack etc.), and the callers are probably handling failures already, so
the error would propagate up to somewhere it is already reported.

> +		return NULL;
> +	}
> +
> +	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
> +		pages[i++] = sg_page_iter_page(&sg_iter);
> +
> +	/* Check that we have the expected number of pages */
> +	GEM_BUG_ON(i != n_pages);
> +
> +	addr = vmap(pages, n_pages, 0, PAGE_KERNEL);
> +	if (addr == NULL)
> +		DRM_DEBUG_DRIVER("Failed to vmap %lu pages\n", n_pages);

Same here. The only potential argument is that this would record the real
reason, which is otherwise lost in the NULL return code, but I am not sure
it is worth it since it is so unlikely to happen.

> +
> +	drm_free_large(pages);
> +
> +	return addr;
> +}
> +
> +/* get, pin, and map the pages of the object into kernel space */
>   void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj)
>   {
>   	int ret;
> @@ -2411,27 +2448,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj)
>   	i915_gem_object_pin_pages(obj);
>
>   	if (obj->mapping == NULL) {
> -		struct page **pages;
> -
> -		pages = NULL;
> -		if (obj->base.size == PAGE_SIZE)
> -			obj->mapping = kmap(sg_page(obj->pages->sgl));
> -		else
> -			pages = drm_malloc_gfp(obj->base.size >> PAGE_SHIFT,
> -					       sizeof(*pages),
> -					       GFP_TEMPORARY);
> -		if (pages != NULL) {
> -			struct sg_page_iter sg_iter;
> -			int n;
> -
> -			n = 0;
> -			for_each_sg_page(obj->pages->sgl, &sg_iter,
> -					 obj->pages->nents, 0)
> -				pages[n++] = sg_page_iter_page(&sg_iter);
> -
> -			obj->mapping = vmap(pages, n, 0, PAGE_KERNEL);
> -			drm_free_large(pages);
> -		}
> +		obj->mapping = i915_gem_object_map(obj);
>   		if (obj->mapping == NULL) {
>   			i915_gem_object_unpin_pages(obj);
>   			return ERR_PTR(-ENOMEM);
>

Otherwise looks fine to me.

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 3/3] Introduce & use new lightweight SGL iterators
  2016-05-16 15:19   ` Dave Gordon
@ 2016-05-17 10:34     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-05-17 10:34 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx; +Cc: linux-kernel


On 16/05/16 16:19, Dave Gordon wrote:
> The existing for_each_sg_page() iterator is somewhat heavyweight, and is
> limiting i915 driver performance in a few benchmarks. So here we
> introduce somewhat lighter weight iterators, primarily for use with GEM
> objects or other case where we need only deal with whole aligned pages.

Interesting idea, if for nothing else then for eliminating the dreaded
st->nents handling of for_each_sg_page(). :)

Which benchmarks does it improve, and by how much, do you know?

It generates more code for me but that seems to be by design, yes? 
Because what were previously calls to __sg_page_iter_next() are now
effectively inlined; instead there is only a call to sg_next(), which
__sg_page_iter_next() would call anyway.

> Unlike the old iterator, the new iterators use an internal state
> structure which is not intended to be accessed by the caller; instead
> each takes as a parameter an output variable which is set before each
> iteration. This makes them particularly simple to use :)
>
> One of the new iterators provides the caller with the DMA address of
> each page in turn; the other provides the 'struct page' pointer required
> by many memory management operations.
>
> Various uses of for_each_sg_page() are then converted to the new macros.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: linux-kernel@vger.kernel.org
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 62 +++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/i915_gem.c         | 20 ++++-----
>   drivers/gpu/drm/i915/i915_gem_fence.c   | 14 +++---
>   drivers/gpu/drm/i915/i915_gem_gtt.c     | 76 ++++++++++++++++-----------------
>   drivers/gpu/drm/i915/i915_gem_userptr.c |  7 ++-
>   5 files changed, 116 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 72f0b02..c0fc6aa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2125,6 +2125,10 @@ struct drm_i915_gem_object_ops {
>   #define INTEL_FRONTBUFFER_ALL_MASK(pipe) \
>   	(0xff << (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)))
>
> +void i915_gem_track_fb(struct drm_i915_gem_object *old,
> +		       struct drm_i915_gem_object *new,
> +		       unsigned frontbuffer_bits);
> +
>   struct drm_i915_gem_object {
>   	struct drm_gem_object base;
>
> @@ -2256,9 +2260,61 @@ struct drm_i915_gem_object {
>   };
>   #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
>
> -void i915_gem_track_fb(struct drm_i915_gem_object *old,
> -		       struct drm_i915_gem_object *new,
> -		       unsigned frontbuffer_bits);
> +/*
> + * New optimised SGL iterator for i915 GEM objects
> + */
> +struct sgt_iter {
> +	struct scatterlist *sgp;
> +	union {
> +		unsigned long pfn;
> +		unsigned long dma;

dma_addr_t

> +	} ix;
> +	unsigned int curr;
> +	unsigned int max;

I think since this counts in bytes it should match obj->base.size which 
is a size_t?

Also, maybe the iterator would be more efficient if it counted in pages? 
Hm, probably makes no difference, but maybe worth exploring. Was just 
thinking about avoiding the need to convert the byte position to a page
position in every iteration.

> +};
> +
> +/* Constructor for new iterator */
> +static inline struct sgt_iter
> +__sgt_iter(struct scatterlist *sgl, bool dma)
> +{
> +	struct sgt_iter s = { .sgp = sgl };
> +
> +	if (sgl) {
> +		s.max = s.curr = sgl->offset;
> +		s.max += sgl->length;
> +		if (dma)
> +			s.ix.dma = sg_dma_address(sgl);
> +		else
> +			s.ix.pfn = page_to_pfn(sg_page(sgl));

I suppose we can rely on the compiler to optimize one branch away rather 
than having to provide two iterators.

> +	}
> +
> +	return s;
> +}
> +
> +/**
> + * for_each_sgt_dma - iterate over the DMA addresses of the given sg_table
> + * @__dmap:	DMA address (output)
> + * @__iter:	'struct sgt_iter' (iterator state, internal)
> + * @__sgt:	sg_table to iterate over (input)
> + */
> +#define for_each_sgt_dma(__dmap, __iter, __sgt)				\
> +	for ((__iter) = __sgt_iter((__sgt)->sgl, true);			\
> +	     ((__dmap) = (__iter).ix.dma + (__iter).curr);		\
> +	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
> +		((__iter) = __sgt_iter(sg_next((__iter).sgp), true), 0))
> +
> +/**
> + * for_each_sgt_page - iterate over the pages of the given sg_table
> + * @__pp:	page pointer (output)
> + * @__iter:	'struct sgt_iter' (iterator state, internal)
> + * @__sgt:	sg_table to iterate over (input)
> + */
> +#define for_each_sgt_page(__pp, __iter, __sgt)				\
> +	for ((__iter) = __sgt_iter((__sgt)->sgl, false);		\
> +	     ((__pp) = (__iter).ix.pfn == 0 ? NULL :			\
> +		pfn_to_page((__iter).ix.pfn + ((__iter).curr >> PAGE_SHIFT)));\

I couldn't figure out what the pfn == 0 check is for. The other loop has
no such check, so one of them looks suspicious, or I am missing something.

Also, I think for this to be worth it you would have to handle the offset
as well. That way all usages of for_each_sg_page() could be converted;
without that we are left with a mix of new and old.

> +	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
> +		((__iter) = __sgt_iter(sg_next((__iter).sgp), false), 0))
>
>   /**
>    * Request queue structure.
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 38e4e1e..0fe7e95 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2165,7 +2165,8 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   static void
>   i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	int ret;
>
>   	BUG_ON(obj->madv == __I915_MADV_PURGED);
> @@ -2187,9 +2188,7 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   	if (obj->madv == I915_MADV_DONTNEED)
>   		obj->dirty = 0;
>
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		struct page *page = sg_page_iter_page(&sg_iter);
> -
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
>   		if (obj->dirty)
>   			set_page_dirty(page);
>
> @@ -2246,7 +2245,7 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   	struct address_space *mapping;
>   	struct sg_table *st;
>   	struct scatterlist *sg;
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
>   	struct page *page;
>   	unsigned long last_pfn = 0;	/* suppress gcc warning */
>   	int ret;
> @@ -2343,8 +2342,8 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>
>   err_pages:
>   	sg_mark_end(sg);
> -	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
> -		put_page(sg_page_iter_page(&sg_iter));
> +	for_each_sgt_page(page, sgt_iter, st)
> +		put_page(page);
>   	sg_free_table(st);
>   	kfree(st);
>
> @@ -2403,7 +2402,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
>   {
>   	unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
>   	struct sg_table *sgt = obj->pages;
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	struct page *stack_pages[32];
>   	struct page **pages = stack_pages;
>   	unsigned long i = 0;
> @@ -2423,8 +2423,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
>   		}
>   	}
>
> -	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
> -		pages[i++] = sg_page_iter_page(&sg_iter);
> +	for_each_sgt_page(page, sgt_iter, sgt)
> +		pages[i++] = page;
>
>   	/* Check that we have the expected number of pages */
>   	GEM_BUG_ON(i != n_pages);
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> index a2b938e..2b6bdc2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> @@ -745,15 +745,15 @@ void i915_gem_restore_fences(struct drm_device *dev)
>   void
>   i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	int i;
>
>   	if (obj->bit_17 == NULL)
>   		return;
>
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		struct page *page = sg_page_iter_page(&sg_iter);
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
>   		char new_bit_17 = page_to_phys(page) >> 17;
>   		if ((new_bit_17 & 0x1) !=
>   		    (test_bit(i, obj->bit_17) != 0)) {
> @@ -775,7 +775,8 @@ void i915_gem_restore_fences(struct drm_device *dev)
>   void
>   i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	int page_count = obj->base.size >> PAGE_SHIFT;
>   	int i;
>
> @@ -790,8 +791,9 @@ void i915_gem_restore_fences(struct drm_device *dev)
>   	}
>
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		if (page_to_phys(sg_page_iter_page(&sg_iter)) & (1 << 17))
> +
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
> +		if (page_to_phys(page) & (1 << 17))
>   			__set_bit(i, obj->bit_17);
>   		else
>   			__clear_bit(i, obj->bit_17);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7eab619..4668477 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1839,20 +1839,19 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   				      enum i915_cache_level cache_level, u32 flags)
>   {
>   	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
> -	gen6_pte_t *pt_vaddr;
>   	unsigned first_entry = start >> PAGE_SHIFT;
>   	unsigned act_pt = first_entry / GEN6_PTES;
>   	unsigned act_pte = first_entry % GEN6_PTES;
> -	struct sg_page_iter sg_iter;
> +	gen6_pte_t *pt_vaddr = NULL;
> +	struct sgt_iter sgt_iter;
> +	dma_addr_t addr;
>
> -	pt_vaddr = NULL;
> -	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> +	for_each_sgt_dma(addr, sgt_iter, pages) {
>   		if (pt_vaddr == NULL)
>   			pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
>
>   		pt_vaddr[act_pte] =
> -			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> -				       cache_level, true, flags);
> +			vm->pte_encode(addr, cache_level, true, flags);
>
>   		if (++act_pte == GEN6_PTES) {
>   			kunmap_px(ppgtt, pt_vaddr);
> @@ -1861,6 +1860,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   			act_pte = 0;
>   		}
>   	}
> +
>   	if (pt_vaddr)
>   		kunmap_px(ppgtt, pt_vaddr);
>   }
> @@ -2362,22 +2362,20 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   {
>   	struct drm_i915_private *dev_priv = to_i915(vm->dev);
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	gen8_pte_t __iomem *gtt_entries =
> -		(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
> -	int i = 0;
> -	struct sg_page_iter sg_iter;
> -	dma_addr_t addr = 0; /* shut up gcc */
> +	struct sgt_iter sgt_iter;
> +	gen8_pte_t __iomem *gtt_entries;
> +	gen8_pte_t gtt_entry;
> +	dma_addr_t addr;
>   	int rpm_atomic_seq;
> +	int i = 0;
>
>   	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
>
> -	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
> -		addr = sg_dma_address(sg_iter.sg) +
> -			(sg_iter.sg_pgoffset << PAGE_SHIFT);
> -		gen8_set_pte(&gtt_entries[i],
> -			     gen8_pte_encode(addr, level, true));
> -		i++;
> +	gtt_entries = (gen8_pte_t __iomem *)ggtt->gsm + (start >> PAGE_SHIFT);
> +
> +	for_each_sgt_dma(addr, sgt_iter, st) {
> +		gtt_entry = gen8_pte_encode(addr, level, true);
> +		gen8_set_pte(&gtt_entries[i++], gtt_entry);
>   	}
>
>   	/*
> @@ -2388,8 +2386,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   	 * hardware should work, we must keep this posting read for paranoia.
>   	 */
>   	if (i != 0)
> -		WARN_ON(readq(&gtt_entries[i-1])
> -			!= gen8_pte_encode(addr, level, true));
> +		WARN_ON(readq(&gtt_entries[i-1]) != gtt_entry);
>
>   	/* This next bit makes the above posting read even more important. We
>   	 * want to flush the TLBs only after we're certain all the PTE updates
> @@ -2440,20 +2437,20 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   {
>   	struct drm_i915_private *dev_priv = to_i915(vm->dev);
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	gen6_pte_t __iomem *gtt_entries =
> -		(gen6_pte_t __iomem *)ggtt->gsm + first_entry;
> -	int i = 0;
> -	struct sg_page_iter sg_iter;
> -	dma_addr_t addr = 0;
> +	struct sgt_iter sgt_iter;
> +	gen6_pte_t __iomem *gtt_entries;
> +	gen6_pte_t gtt_entry;
> +	dma_addr_t addr;
>   	int rpm_atomic_seq;
> +	int i = 0;
>
>   	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
>
> -	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
> -		addr = sg_page_iter_dma_address(&sg_iter);
> -		iowrite32(vm->pte_encode(addr, level, true, flags), &gtt_entries[i]);
> -		i++;
> +	gtt_entries = (gen6_pte_t __iomem *)ggtt->gsm + (start >> PAGE_SHIFT);
> +
> +	for_each_sgt_dma(addr, sgt_iter, st) {
> +		gtt_entry = vm->pte_encode(addr, level, true, flags);
> +		iowrite32(gtt_entry, &gtt_entries[i++]);
>   	}
>
>   	/* XXX: This serves as a posting read to make sure that the PTE has
> @@ -2462,10 +2459,8 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   	 * of NUMA access patterns. Therefore, even with the way we assume
>   	 * hardware should work, we must keep this posting read for paranoia.
>   	 */
> -	if (i != 0) {
> -		unsigned long gtt = readl(&gtt_entries[i-1]);
> -		WARN_ON(gtt != vm->pte_encode(addr, level, true, flags));
> -	}
> +	if (i != 0)
> +		WARN_ON(readl(&gtt_entries[i-1]) != gtt_entry);
>
>   	/* This next bit makes the above posting read even more important. We
>   	 * want to flush the TLBs only after we're certain all the PTE updates
> @@ -3399,9 +3394,11 @@ struct i915_vma *
>   intel_rotate_fb_obj_pages(struct intel_rotation_info *rot_info,
>   			  struct drm_i915_gem_object *obj)
>   {
> +	const size_t n_pages = obj->base.size / PAGE_SIZE;
>   	unsigned int size_pages = rot_info->plane[0].width * rot_info->plane[0].height;
>   	unsigned int size_pages_uv;
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	dma_addr_t dma_addr;
>   	unsigned long i;
>   	dma_addr_t *page_addr_list;
>   	struct sg_table *st;
> @@ -3410,7 +3407,7 @@ struct i915_vma *
>   	int ret = -ENOMEM;
>
>   	/* Allocate a temporary list of source pages for random access. */
> -	page_addr_list = drm_malloc_gfp(obj->base.size / PAGE_SIZE,
> +	page_addr_list = drm_malloc_gfp(n_pages,
>   					sizeof(dma_addr_t),
>   					GFP_TEMPORARY);
>   	if (!page_addr_list)
> @@ -3433,11 +3430,10 @@ struct i915_vma *
>
>   	/* Populate source page list from the object. */
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		page_addr_list[i] = sg_page_iter_dma_address(&sg_iter);
> -		i++;
> -	}
> +	for_each_sgt_dma(dma_addr, sgt_iter, obj->pages)
> +		page_addr_list[i++] = dma_addr;
>
> +	GEM_BUG_ON(i != n_pages);
>   	st->nents = 0;
>   	sg = st->sgl;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 32d9726..d2c1595 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -706,7 +706,8 @@ struct get_pages_work {
>   static void
>   i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>
>   	BUG_ON(obj->userptr.work != NULL);
>   	__i915_gem_userptr_set_active(obj, false);
> @@ -716,9 +717,7 @@ struct get_pages_work {
>
>   	i915_gem_gtt_finish_object(obj);
>
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		struct page *page = sg_page_iter_page(&sg_iter);
> -
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
>   		if (obj->dirty)
>   			set_page_dirty(page);
>
>

Couldn't spot any problems in this block.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Introduce & use new lightweight SGL iterators
@ 2016-05-17 10:34     ` Tvrtko Ursulin
  0 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-05-17 10:34 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx; +Cc: linux-kernel


On 16/05/16 16:19, Dave Gordon wrote:
> The existing for_each_sg_page() iterator is somewhat heavyweight, and is
> limiting i915 driver performance in a few benchmarks. So here we
> introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other cases where we need only deal with whole aligned pages.

Interesting idea, if for nothing else then for eliminating the dreaded 
st->nents of for_each_sg_page. :)

Do you know which benchmarks it improves, and by how much?

It generates more code for me, but that seems to be by design, yes? 
What were previously calls to __sg_page_iter_next() are now effectively 
inlined, and instead there is only a call to sg_next(), which 
__sg_page_iter_next() would have called anyway.
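
i.e. taking the i915_gem_object_map() hunk below as an example, the shape 
of the conversion is roughly:

	/* before: an out-of-line call to __sg_page_iter_next() per page */
	struct sg_page_iter sg_iter;

	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
		pages[i++] = sg_page_iter_page(&sg_iter);

	/* after: the per-page step is an add and a compare, and sg_next()
	 * is only called when a segment is exhausted
	 */
	struct sgt_iter sgt_iter;
	struct page *page;

	for_each_sgt_page(page, sgt_iter, sgt)
		pages[i++] = page;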

> Unlike the old iterator, the new iterators use an internal state
> structure which is not intended to be accessed by the caller; instead
> each takes as a parameter an output variable which is set before each
> iteration. This makes them particularly simple to use :)
>
> One of the new iterators provides the caller with the DMA address of
> each page in turn; the other provides the 'struct page' pointer required
> by many memory management operations.
>
> Various uses of for_each_sg_page() are then converted to the new macros.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: linux-kernel@vger.kernel.org
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 62 +++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/i915_gem.c         | 20 ++++-----
>   drivers/gpu/drm/i915/i915_gem_fence.c   | 14 +++---
>   drivers/gpu/drm/i915/i915_gem_gtt.c     | 76 ++++++++++++++++-----------------
>   drivers/gpu/drm/i915/i915_gem_userptr.c |  7 ++-
>   5 files changed, 116 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 72f0b02..c0fc6aa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2125,6 +2125,10 @@ struct drm_i915_gem_object_ops {
>   #define INTEL_FRONTBUFFER_ALL_MASK(pipe) \
>   	(0xff << (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)))
>
> +void i915_gem_track_fb(struct drm_i915_gem_object *old,
> +		       struct drm_i915_gem_object *new,
> +		       unsigned frontbuffer_bits);
> +
>   struct drm_i915_gem_object {
>   	struct drm_gem_object base;
>
> @@ -2256,9 +2260,61 @@ struct drm_i915_gem_object {
>   };
>   #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
>
> -void i915_gem_track_fb(struct drm_i915_gem_object *old,
> -		       struct drm_i915_gem_object *new,
> -		       unsigned frontbuffer_bits);
> +/*
> + * New optimised SGL iterator for i915 GEM objects
> + */
> +struct sgt_iter {
> +	struct scatterlist *sgp;
> +	union {
> +		unsigned long pfn;
> +		unsigned long dma;

dma_addr_t

> +	} ix;
> +	unsigned int curr;
> +	unsigned int max;

I think since this counts in bytes it should match obj->base.size which 
is a size_t?

Also, maybe the iterator would be more efficient if it counted in pages? 
Hm, probably makes no difference, but maybe worth exploring. Was just 
thinking about avoiding the need to convert byte position to page 
position in every iteration.

> +};
> +
> +/* Constructor for new iterator */
> +static inline struct sgt_iter
> +__sgt_iter(struct scatterlist *sgl, bool dma)
> +{
> +	struct sgt_iter s = { .sgp = sgl };
> +
> +	if (sgl) {
> +		s.max = s.curr = sgl->offset;
> +		s.max += sgl->length;
> +		if (dma)
> +			s.ix.dma = sg_dma_address(sgl);
> +		else
> +			s.ix.pfn = page_to_pfn(sg_page(sgl));

I suppose we can rely on the compiler to optimize one branch away rather 
than having to provide two iterators.

> +	}
> +
> +	return s;
> +}
> +
> +/**
> + * for_each_sgt_dma - iterate over the DMA addresses of the given sg_table
> + * @__dmap:	DMA address (output)
> + * @__iter:	'struct sgt_iter' (iterator state, internal)
> + * @__sgt:	sg_table to iterate over (input)
> + */
> +#define for_each_sgt_dma(__dmap, __iter, __sgt)				\
> +	for ((__iter) = __sgt_iter((__sgt)->sgl, true);			\
> +	     ((__dmap) = (__iter).ix.dma + (__iter).curr);		\
> +	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
> +		((__iter) = __sgt_iter(sg_next((__iter).sgp), true), 0))
> +
> +/**
> + * for_each_sgt_page - iterate over the pages of the given sg_table
> + * @__pp:	page pointer (output)
> + * @__iter:	'struct sgt_iter' (iterator state, internal)
> + * @__sgt:	sg_table to iterate over (input)
> + */
> +#define for_each_sgt_page(__pp, __iter, __sgt)				\
> +	for ((__iter) = __sgt_iter((__sgt)->sgl, false);		\
> +	     ((__pp) = (__iter).ix.pfn == 0 ? NULL :			\
> +		pfn_to_page((__iter).ix.pfn + ((__iter).curr >> PAGE_SHIFT)));\

I couldn't figure out what the pfn == 0 check is for. The other loop has 
no such check, so one of them looks suspicious, or I am missing something.

Also, I think for this to be worth it you would have to handle the offset 
as well. That way all usages of for_each_sg_page could be converted; 
without that it leaves a mix of new and old.

> +	     (((__iter).curr += PAGE_SIZE) < (__iter).max) ||		\
> +		((__iter) = __sgt_iter(sg_next((__iter).sgp), false), 0))
>
>   /**
>    * Request queue structure.
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 38e4e1e..0fe7e95 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2165,7 +2165,8 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   static void
>   i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	int ret;
>
>   	BUG_ON(obj->madv == __I915_MADV_PURGED);
> @@ -2187,9 +2188,7 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   	if (obj->madv == I915_MADV_DONTNEED)
>   		obj->dirty = 0;
>
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		struct page *page = sg_page_iter_page(&sg_iter);
> -
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
>   		if (obj->dirty)
>   			set_page_dirty(page);
>
> @@ -2246,7 +2245,7 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>   	struct address_space *mapping;
>   	struct sg_table *st;
>   	struct scatterlist *sg;
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
>   	struct page *page;
>   	unsigned long last_pfn = 0;	/* suppress gcc warning */
>   	int ret;
> @@ -2343,8 +2342,8 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>
>   err_pages:
>   	sg_mark_end(sg);
> -	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
> -		put_page(sg_page_iter_page(&sg_iter));
> +	for_each_sgt_page(page, sgt_iter, st)
> +		put_page(page);
>   	sg_free_table(st);
>   	kfree(st);
>
> @@ -2403,7 +2402,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
>   {
>   	unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
>   	struct sg_table *sgt = obj->pages;
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	struct page *stack_pages[32];
>   	struct page **pages = stack_pages;
>   	unsigned long i = 0;
> @@ -2423,8 +2423,8 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
>   		}
>   	}
>
> -	for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
> -		pages[i++] = sg_page_iter_page(&sg_iter);
> +	for_each_sgt_page(page, sgt_iter, sgt)
> +		pages[i++] = page;
>
>   	/* Check that we have the expected number of pages */
>   	GEM_BUG_ON(i != n_pages);
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> index a2b938e..2b6bdc2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> @@ -745,15 +745,15 @@ void i915_gem_restore_fences(struct drm_device *dev)
>   void
>   i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	int i;
>
>   	if (obj->bit_17 == NULL)
>   		return;
>
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		struct page *page = sg_page_iter_page(&sg_iter);
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
>   		char new_bit_17 = page_to_phys(page) >> 17;
>   		if ((new_bit_17 & 0x1) !=
>   		    (test_bit(i, obj->bit_17) != 0)) {
> @@ -775,7 +775,8 @@ void i915_gem_restore_fences(struct drm_device *dev)
>   void
>   i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>   	int page_count = obj->base.size >> PAGE_SHIFT;
>   	int i;
>
> @@ -790,8 +791,9 @@ void i915_gem_restore_fences(struct drm_device *dev)
>   	}
>
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		if (page_to_phys(sg_page_iter_page(&sg_iter)) & (1 << 17))
> +
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
> +		if (page_to_phys(page) & (1 << 17))
>   			__set_bit(i, obj->bit_17);
>   		else
>   			__clear_bit(i, obj->bit_17);
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7eab619..4668477 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1839,20 +1839,19 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   				      enum i915_cache_level cache_level, u32 flags)
>   {
>   	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
> -	gen6_pte_t *pt_vaddr;
>   	unsigned first_entry = start >> PAGE_SHIFT;
>   	unsigned act_pt = first_entry / GEN6_PTES;
>   	unsigned act_pte = first_entry % GEN6_PTES;
> -	struct sg_page_iter sg_iter;
> +	gen6_pte_t *pt_vaddr = NULL;
> +	struct sgt_iter sgt_iter;
> +	dma_addr_t addr;
>
> -	pt_vaddr = NULL;
> -	for_each_sg_page(pages->sgl, &sg_iter, pages->nents, 0) {
> +	for_each_sgt_dma(addr, sgt_iter, pages) {
>   		if (pt_vaddr == NULL)
>   			pt_vaddr = kmap_px(ppgtt->pd.page_table[act_pt]);
>
>   		pt_vaddr[act_pte] =
> -			vm->pte_encode(sg_page_iter_dma_address(&sg_iter),
> -				       cache_level, true, flags);
> +			vm->pte_encode(addr, cache_level, true, flags);
>
>   		if (++act_pte == GEN6_PTES) {
>   			kunmap_px(ppgtt, pt_vaddr);
> @@ -1861,6 +1860,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
>   			act_pte = 0;
>   		}
>   	}
> +
>   	if (pt_vaddr)
>   		kunmap_px(ppgtt, pt_vaddr);
>   }
> @@ -2362,22 +2362,20 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   {
>   	struct drm_i915_private *dev_priv = to_i915(vm->dev);
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	gen8_pte_t __iomem *gtt_entries =
> -		(gen8_pte_t __iomem *)ggtt->gsm + first_entry;
> -	int i = 0;
> -	struct sg_page_iter sg_iter;
> -	dma_addr_t addr = 0; /* shut up gcc */
> +	struct sgt_iter sgt_iter;
> +	gen8_pte_t __iomem *gtt_entries;
> +	gen8_pte_t gtt_entry;
> +	dma_addr_t addr;
>   	int rpm_atomic_seq;
> +	int i = 0;
>
>   	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
>
> -	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
> -		addr = sg_dma_address(sg_iter.sg) +
> -			(sg_iter.sg_pgoffset << PAGE_SHIFT);
> -		gen8_set_pte(&gtt_entries[i],
> -			     gen8_pte_encode(addr, level, true));
> -		i++;
> +	gtt_entries = (gen8_pte_t __iomem *)ggtt->gsm + (start >> PAGE_SHIFT);
> +
> +	for_each_sgt_dma(addr, sgt_iter, st) {
> +		gtt_entry = gen8_pte_encode(addr, level, true);
> +		gen8_set_pte(&gtt_entries[i++], gtt_entry);
>   	}
>
>   	/*
> @@ -2388,8 +2386,7 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
>   	 * hardware should work, we must keep this posting read for paranoia.
>   	 */
>   	if (i != 0)
> -		WARN_ON(readq(&gtt_entries[i-1])
> -			!= gen8_pte_encode(addr, level, true));
> +		WARN_ON(readq(&gtt_entries[i-1]) != gtt_entry);
>
>   	/* This next bit makes the above posting read even more important. We
>   	 * want to flush the TLBs only after we're certain all the PTE updates
> @@ -2440,20 +2437,20 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   {
>   	struct drm_i915_private *dev_priv = to_i915(vm->dev);
>   	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
> -	unsigned first_entry = start >> PAGE_SHIFT;
> -	gen6_pte_t __iomem *gtt_entries =
> -		(gen6_pte_t __iomem *)ggtt->gsm + first_entry;
> -	int i = 0;
> -	struct sg_page_iter sg_iter;
> -	dma_addr_t addr = 0;
> +	struct sgt_iter sgt_iter;
> +	gen6_pte_t __iomem *gtt_entries;
> +	gen6_pte_t gtt_entry;
> +	dma_addr_t addr;
>   	int rpm_atomic_seq;
> +	int i = 0;
>
>   	rpm_atomic_seq = assert_rpm_atomic_begin(dev_priv);
>
> -	for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) {
> -		addr = sg_page_iter_dma_address(&sg_iter);
> -		iowrite32(vm->pte_encode(addr, level, true, flags), &gtt_entries[i]);
> -		i++;
> +	gtt_entries = (gen6_pte_t __iomem *)ggtt->gsm + (start >> PAGE_SHIFT);
> +
> +	for_each_sgt_dma(addr, sgt_iter, st) {
> +		gtt_entry = vm->pte_encode(addr, level, true, flags);
> +		iowrite32(gtt_entry, &gtt_entries[i++]);
>   	}
>
>   	/* XXX: This serves as a posting read to make sure that the PTE has
> @@ -2462,10 +2459,8 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
>   	 * of NUMA access patterns. Therefore, even with the way we assume
>   	 * hardware should work, we must keep this posting read for paranoia.
>   	 */
> -	if (i != 0) {
> -		unsigned long gtt = readl(&gtt_entries[i-1]);
> -		WARN_ON(gtt != vm->pte_encode(addr, level, true, flags));
> -	}
> +	if (i != 0)
> +		WARN_ON(readl(&gtt_entries[i-1]) != gtt_entry);
>
>   	/* This next bit makes the above posting read even more important. We
>   	 * want to flush the TLBs only after we're certain all the PTE updates
> @@ -3399,9 +3394,11 @@ struct i915_vma *
>   intel_rotate_fb_obj_pages(struct intel_rotation_info *rot_info,
>   			  struct drm_i915_gem_object *obj)
>   {
> +	const size_t n_pages = obj->base.size / PAGE_SIZE;
>   	unsigned int size_pages = rot_info->plane[0].width * rot_info->plane[0].height;
>   	unsigned int size_pages_uv;
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	dma_addr_t dma_addr;
>   	unsigned long i;
>   	dma_addr_t *page_addr_list;
>   	struct sg_table *st;
> @@ -3410,7 +3407,7 @@ struct i915_vma *
>   	int ret = -ENOMEM;
>
>   	/* Allocate a temporary list of source pages for random access. */
> -	page_addr_list = drm_malloc_gfp(obj->base.size / PAGE_SIZE,
> +	page_addr_list = drm_malloc_gfp(n_pages,
>   					sizeof(dma_addr_t),
>   					GFP_TEMPORARY);
>   	if (!page_addr_list)
> @@ -3433,11 +3430,10 @@ struct i915_vma *
>
>   	/* Populate source page list from the object. */
>   	i = 0;
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		page_addr_list[i] = sg_page_iter_dma_address(&sg_iter);
> -		i++;
> -	}
> +	for_each_sgt_dma(dma_addr, sgt_iter, obj->pages)
> +		page_addr_list[i++] = dma_addr;
>
> +	GEM_BUG_ON(i != n_pages);
>   	st->nents = 0;
>   	sg = st->sgl;
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 32d9726..d2c1595 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -706,7 +706,8 @@ struct get_pages_work {
>   static void
>   i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
>   {
> -	struct sg_page_iter sg_iter;
> +	struct sgt_iter sgt_iter;
> +	struct page *page;
>
>   	BUG_ON(obj->userptr.work != NULL);
>   	__i915_gem_userptr_set_active(obj, false);
> @@ -716,9 +717,7 @@ struct get_pages_work {
>
>   	i915_gem_gtt_finish_object(obj);
>
> -	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
> -		struct page *page = sg_page_iter_page(&sg_iter);
> -
> +	for_each_sgt_page(page, sgt_iter, obj->pages) {
>   		if (obj->dirty)
>   			set_page_dirty(page);
>
>

Couldn't spot any problems in this block.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] Introduce & use new lightweight SGL iterators
  2016-05-17 10:34     ` Tvrtko Ursulin
  (?)
@ 2016-05-17 12:05     ` Dave Gordon
  2016-05-19 17:27       ` Chris Wilson
  -1 siblings, 1 reply; 12+ messages in thread
From: Dave Gordon @ 2016-05-17 12:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson; +Cc: linux-kernel

On 17/05/16 11:34, Tvrtko Ursulin wrote:
>
> On 16/05/16 16:19, Dave Gordon wrote:
>> The existing for_each_sg_page() iterator is somewhat heavyweight, and is
>> limiting i915 driver performance in a few benchmarks. So here we
>> introduce somewhat lighter weight iterators, primarily for use with GEM
>> objects or other cases where we need only deal with whole aligned pages.
>
> Interesting idea, if for nothing then for eliminating the dreaded
> st->nents of for_each_sg_page. :)
>
> Which benchmarks it improves and how much do you know?

I know nothing :)

But last time I posted some easy-to-use iterators, Chris Wilson said 
they didn't address his complaint, which was that the existing ones were 
too slow.

https://lkml.org/lkml/2016/5/9/325

> It generates more code for me but that seems to be by design, yes?
> Because what were previously calls to __sg_page_iter_next is now
> effectively inlined and instead there is only a call to sg_next, which
> __sg_page_iter_next would call anyway.

In the light of Chris' comments, I decided to try some versions that 
were mostly focussed on avoiding the excessive number of function calls 
in the old ones. If this still isn't fast enough, we could also try 
inlining sg_next().
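
For reference, sg_next() itself is tiny -- from memory it is roughly this 
(modulo the CONFIG_DEBUG_SG check; see lib/scatterlist.c for the real 
thing):

	struct scatterlist *sg_next(struct scatterlist *sg)
	{
		if (sg_is_last(sg))
			return NULL;

		sg++;
		if (unlikely(sg_is_chain(sg)))
			sg = sg_chain_ptr(sg);

		return sg;
	}

so inlining it would mostly just save the call/return per segment.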

>> Unlike the old iterator, the new iterators use an internal state
>> structure which is not intended to be accessed by the caller; instead
>> each takes as a parameter an output variable which is set before each
>> iteration. This makes them particularly simple to use :)
>>
>> One of the new iterators provides the caller with the DMA address of
>> each page in turn; the other provides the 'struct page' pointer required
>> by many memory management operations.
>>
>> Various uses of for_each_sg_page() are then converted to the new macros.
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 62
>> +++++++++++++++++++++++++--
>>   drivers/gpu/drm/i915/i915_gem.c         | 20 ++++-----
>>   drivers/gpu/drm/i915/i915_gem_fence.c   | 14 +++---
>>   drivers/gpu/drm/i915/i915_gem_gtt.c     | 76
>> ++++++++++++++++-----------------
>>   drivers/gpu/drm/i915/i915_gem_userptr.c |  7 ++-
>>   5 files changed, 116 insertions(+), 63 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index 72f0b02..c0fc6aa 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2125,6 +2125,10 @@ struct drm_i915_gem_object_ops {
>>   #define INTEL_FRONTBUFFER_ALL_MASK(pipe) \
>>       (0xff << (INTEL_FRONTBUFFER_BITS_PER_PIPE * (pipe)))
>>
>> +void i915_gem_track_fb(struct drm_i915_gem_object *old,
>> +               struct drm_i915_gem_object *new,
>> +               unsigned frontbuffer_bits);
>> +
>>   struct drm_i915_gem_object {
>>       struct drm_gem_object base;
>>
>> @@ -2256,9 +2260,61 @@ struct drm_i915_gem_object {
>>   };
>>   #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object,
>> base)
>>
>> -void i915_gem_track_fb(struct drm_i915_gem_object *old,
>> -               struct drm_i915_gem_object *new,
>> -               unsigned frontbuffer_bits);
>> +/*
>> + * New optimised SGL iterator for i915 GEM objects
>> + */
>> +struct sgt_iter {
>> +    struct scatterlist *sgp;
>> +    union {
>> +        unsigned long pfn;
>> +        unsigned long dma;
>
> dma_addr_t
>
>> +    } ix;
>> +    unsigned int curr;
>> +    unsigned int max;
>
> I think since this counts in bytes it should match obj->base.size which
> is a size_t?

Could be, although I think no GEM object can be greater than 4G.

> Also, maybe the iterator would be more efficient if it counted in pages?
> Hm, probably makes no difference, but maybe worth exploring. Was just
> thinking about avoiding the need to convert byte position to page
> position in every iteration.

I tried that too; you still end up with byte/page conversions, just in 
different places. Anyway, it's only a single shift instruction.
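
The conversion in question is the one already visible in the macros: the 
DMA flavour is a plain add and the page flavour does the one shift, i.e.

	addr = iter.ix.dma + iter.curr;
	page = pfn_to_page(iter.ix.pfn + (iter.curr >> PAGE_SHIFT));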

>> +};
>> +
>> +/* Constructor for new iterator */
>> +static inline struct sgt_iter
>> +__sgt_iter(struct scatterlist *sgl, bool dma)
>> +{
>> +    struct sgt_iter s = { .sgp = sgl };
>> +
>> +    if (sgl) {
>> +        s.max = s.curr = sgl->offset;
>> +        s.max += sgl->length;
>> +        if (dma)
>> +            s.ix.dma = sg_dma_address(sgl);
>> +        else
>> +            s.ix.pfn = page_to_pfn(sg_page(sgl));
>
> I suppose we can rely on the compiler to optimize one branch away rather
> than having to provide two iterators.

Yes; the bool is always a compile-time const so the compiler only 
generates the relevant side of the branch.
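
i.e. after inlining, the page flavour effectively reduces to something 
like this (made-up name, just to show what the compiler sees):

	static inline struct sgt_iter __sgt_iter_for_pages(struct scatterlist *sgl)
	{
		struct sgt_iter s = { .sgp = sgl };

		if (sgl) {
			s.max = s.curr = sgl->offset;
			s.max += sgl->length;
			s.ix.pfn = page_to_pfn(sg_page(sgl));
		}

		return s;
	}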

>> +    }
>> +
>> +    return s;
>> +}
>> +
>> +/**
>> + * for_each_sgt_dma - iterate over the DMA addresses of the given
>> sg_table
>> + * @__dmap:    DMA address (output)
>> + * @__iter:    'struct sgt_iter' (iterator state, internal)
>> + * @__sgt:    sg_table to iterate over (input)
>> + */
>> +#define for_each_sgt_dma(__dmap, __iter, __sgt)                \
>> +    for ((__iter) = __sgt_iter((__sgt)->sgl, true);            \
>> +         ((__dmap) = (__iter).ix.dma + (__iter).curr);        \
>> +         (((__iter).curr += PAGE_SIZE) < (__iter).max) ||        \
>> +        ((__iter) = __sgt_iter(sg_next((__iter).sgp), true), 0))
>> +
>> +/**
>> + * for_each_sgt_page - iterate over the pages of the given sg_table
>> + * @__pp:    page pointer (output)
>> + * @__iter:    'struct sgt_iter' (iterator state, internal)
>> + * @__sgt:    sg_table to iterate over (input)
>> + */
>> +#define for_each_sgt_page(__pp, __iter, __sgt)                \
>> +    for ((__iter) = __sgt_iter((__sgt)->sgl, false);        \
>> +         ((__pp) = (__iter).ix.pfn == 0 ? NULL :            \
>> +        pfn_to_page((__iter).ix.pfn + ((__iter).curr >> PAGE_SHIFT)));\
>
> I couldn't figure out what the pfn == 0 check is for. The other loop has
> no such check, so one of them looks suspicious, or I am missing something.

The other *does* have a similar check -- this is the loop-break 
condition :) But here we need to avoid passing the invalid pfn 0 to 
pfn_to_page(), whereas in the other one we're just doing an addition, so 
we can terminate on (ix.dma == curr == 0).
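
Spelled out, the DMA flavour is roughly equivalent to this (illustration 
only, not a proposed rewrite):

	struct sgt_iter iter;
	dma_addr_t addr;

	iter = __sgt_iter(sgt->sgl, true);
	while ((addr = iter.ix.dma + iter.curr)) {	/* 0 + 0 => end of list */
		/* ... loop body uses addr ... */

		if ((iter.curr += PAGE_SIZE) >= iter.max)
			iter = __sgt_iter(sg_next(iter.sgp), true);
	}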

> Also I think to be worth it you would have to handle the offset as well.
> That way all usages for for_each_sg_page could be converted and without
> it it leaves a mix of new and old.

Do you mean offset-within-page? That actually does work for DMA 
addresses (you get the same offset within each page), and it's ignored 
for page iteration.

Or do you mean starting-offset? That's really hard :( You can't find the 
n'th page without walking the entire list from the start. So I didn't 
write an iterator with a starting offset, and the few cases which need 
that can't easily be converted.
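
e.g. just locating the segment that holds page n would need something 
like this (hypothetical helper, assuming whole aligned pages):

	static struct scatterlist *
	sg_for_page_n(struct scatterlist *sgl, unsigned long n)
	{
		struct scatterlist *sg;

		for (sg = sgl; sg; sg = sg_next(sg)) {
			unsigned long count = sg->length >> PAGE_SHIFT;

			if (n < count)
				return sg;	/* page n is in this segment */
			n -= count;
		}

		return NULL;
	}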

(I do have a "convenience" iterator with an offset as previously posted, 
but it's not a fast inlined version).

I'll try a couple more tweaks to this ...

.Dave.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map()
  2016-05-17  9:22 ` [PATCH 1/3] " Tvrtko Ursulin
@ 2016-05-17 12:59   ` Dave Gordon
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Gordon @ 2016-05-17 12:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On 17/05/16 10:22, Tvrtko Ursulin wrote:
>
> On 16/05/16 16:19, Dave Gordon wrote:
>> The recently-added i915_gem_object_pin_map() can be further optimised
>> for "small" objects. To facilitate this, and simplify the error paths
>> before adding the new code, this patch pulls out the "mapping" part of
>> the operation (involving local allocations which must be undone before
>> return) into its own subfunction.
>>
>> The next patch will then insert the new optimisation into the middle of
>> the now-separated subfunction.
>>
>> This reorganisation will probably not affect the generated code, as the
>> compiler will most likely inline it anyway, but it makes the logical
>> structure a bit clearer and easier to modify.
>>
>> v2:
>>      Restructure loop-over-pages & error check (Chris Wilson)
>>
>> v3:
>>      Add page count to debug messages (Chris Wilson)
>>      Convert WARN_ON() to GEM_BUG_ON()
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c | 59
>> ++++++++++++++++++++++++++---------------
>>   1 file changed, 38 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index aff386e..bbec429 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2398,6 +2398,43 @@ static void
>> i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
>>       return 0;
>>   }
>>
>> +/* The 'mapping' part of i915_gem_object_pin_map() below */
>> +static void *i915_gem_object_map(const struct drm_i915_gem_object *obj)
>> +{
>> +    unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
>> +    struct sg_table *sgt = obj->pages;
>> +    struct sg_page_iter sg_iter;
>> +    struct page **pages;
>> +    unsigned long i = 0;
>> +    void *addr = NULL;
>
> Looks like this does not need to be initialized?

OK. BTW the compiler didn't actually generate any code for this :)

>> +
>> +    /* A single page can always be kmapped */
>> +    if (n_pages == 1)
>> +        return kmap(sg_page(sgt->sgl));
>> +
>> +    pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
>> +    if (pages == NULL) {
>> +        DRM_DEBUG_DRIVER("Failed to get space for %lu pointers\n",
>> +                 n_pages);
>
> Not terribly important, but I think this is too low-level a function to
> have debug logging. It will not add a lot of useful information, no call
> stack etc. And the callers are probably handling failures already, so the
> failure would propagate up to somewhere it is already reported.
>
>> +        return NULL;
>> +    }
>> +
>> +    for_each_sg_page(sgt->sgl, &sg_iter, sgt->nents, 0)
>> +        pages[i++] = sg_page_iter_page(&sg_iter);
>> +
>> +    /* Check that we have the expected number of pages */
>> +    GEM_BUG_ON(i != n_pages);
>> +
>> +    addr = vmap(pages, n_pages, 0, PAGE_KERNEL);
>> +    if (addr == NULL)
>> +        DRM_DEBUG_DRIVER("Failed to vmap %lu pages\n", n_pages);
>
> Same here. I mean, the only potential argument could be that this will
> tell the real reason which is otherwise lost in the NULL return code,
> but I am not sure it is worth it since it is so unlikely it would happen.

OK, debugging removed, new version posted.

.Dave.

>> +
>> +    drm_free_large(pages);
>> +
>> +    return addr;
>> +}
>> +
>> +/* get, pin, and map the pages of the object into kernel space */
>>   void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj)
>>   {
>>       int ret;
>> @@ -2411,27 +2448,7 @@ void *i915_gem_object_pin_map(struct
>> drm_i915_gem_object *obj)
>>       i915_gem_object_pin_pages(obj);
>>
>>       if (obj->mapping == NULL) {
>> -        struct page **pages;
>> -
>> -        pages = NULL;
>> -        if (obj->base.size == PAGE_SIZE)
>> -            obj->mapping = kmap(sg_page(obj->pages->sgl));
>> -        else
>> -            pages = drm_malloc_gfp(obj->base.size >> PAGE_SHIFT,
>> -                           sizeof(*pages),
>> -                           GFP_TEMPORARY);
>> -        if (pages != NULL) {
>> -            struct sg_page_iter sg_iter;
>> -            int n;
>> -
>> -            n = 0;
>> -            for_each_sg_page(obj->pages->sgl, &sg_iter,
>> -                     obj->pages->nents, 0)
>> -                pages[n++] = sg_page_iter_page(&sg_iter);
>> -
>> -            obj->mapping = vmap(pages, n, 0, PAGE_KERNEL);
>> -            drm_free_large(pages);
>> -        }
>> +        obj->mapping = i915_gem_object_map(obj);
>>           if (obj->mapping == NULL) {
>>               i915_gem_object_unpin_pages(obj);
>>               return ERR_PTR(-ENOMEM);
>>
>
> Otherwise looks fine to me.
>
> Regards,
> Tvrtko


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] Introduce & use new lightweight SGL iterators
  2016-05-17 12:05     ` [Intel-gfx] " Dave Gordon
@ 2016-05-19 17:27       ` Chris Wilson
  2016-05-20  0:13         ` Dave Gordon
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-05-19 17:27 UTC (permalink / raw)
  To: Dave Gordon; +Cc: Tvrtko Ursulin, linux-kernel

On Tue, May 17, 2016 at 01:05:48PM +0100, Dave Gordon wrote:
> On 17/05/16 11:34, Tvrtko Ursulin wrote:
> >
> >On 16/05/16 16:19, Dave Gordon wrote:
> >>The existing for_each_sg_page() iterator is somewhat heavyweight, and is
> >>limiting i915 driver performance in a few benchmarks. So here we
> >>introduce somewhat lighter weight iterators, primarily for use with GEM
> >>objects or other cases where we need only deal with whole aligned pages.
> >
> >Interesting idea, if for nothing then for eliminating the dreaded
> >st->nents of for_each_sg_page. :)
> >
> >Which benchmarks it improves and how much do you know?
> 
> I know nothing :)
> 
> But last time I posted some easy-to-use iterators, Chris Wilson said
> they didn't address his complaint, which was that the existing ones
> were too slow.

These aren't very good either... Compared to the sg iters I have:

         gem:exec:fault:1MiB:    -4.32%
  gem:exec:fault:1MiB:forked:    -5.66%
        gem:exec:fault:16MiB:   -13.33%
 gem:exec:fault:16MiB:forked:   -12.03%
       gem:exec:fault:256MiB:   -15.28%
gem:exec:fault:256MiB:forked:   -16.98%

(I was really hoping to be able to drop a patch!)

Patch used for reference:

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 03b7c2e..d7c1431 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3787,4 +3787,56 @@ int remap_io_mapping(struct vm_area_struct *vma,
 #define i915_gem_object_for_each_vma(vma, obj) \
        list_for_each_entry_check(vma, &(obj)->vma_list, obj_link, &(obj)->base.dev->struct_mutex)
 
+struct sgt_iter {
+       struct scatterlist *sgp;
+       union {
+               unsigned long pfn;
+               unsigned long dma;
+       } ix;
+       unsigned int curr;
+       unsigned int max;
+};
+
+static inline struct sgt_iter
+__sgt_iter(struct scatterlist *sgl, bool dma)
+{
+       struct sgt_iter s = { .sgp = sgl };
+
+       if (sgl) {
+               s.max = s.curr = sgl->offset;
+               s.max += sgl->length;
+               if (dma)
+                       s.ix.dma = sg_dma_address(sgl);
+               else
+                       s.ix.pfn = page_to_pfn(sg_page(sgl));
+       }
+
+       return s;
+}
+
+/**
+ * for_each_sgt_dma - iterate over the DMA addresses of the given sg_table
+ * @__dmap:    DMA address (output)
+ * @__iter:    'struct sgt_iter' (iterator state, internal)
+ * @__sgt:     sg_table to iterate over (input)
+ */
+#define for_each_sgt_dma(__dmap, __iter, __sgt)                                \
+       for ((__iter) = __sgt_iter((__sgt)->sgl, true);                 \
+            ((__dmap) = (__iter).ix.dma + (__iter).curr);              \
+            (((__iter).curr += PAGE_SIZE) < (__iter).max) ||           \
+            ((__iter) = __sgt_iter(sg_next((__iter).sgp), true), 0))
+
+/**
+ * for_each_sgt_page - iterate over the pages of the given sg_table
+ * @__pp:      page pointer (output)
+ * @__iter:    'struct sgt_iter' (iterator state, internal)
+ * @__sgt:     sg_table to iterate over (input)
+ */
+#define for_each_sgt_page(__pp, __iter, __sgt)                         \
+       for ((__iter) = __sgt_iter((__sgt)->sgl, false);                \
+            ((__pp) = (__iter).ix.pfn == 0 ? NULL :                    \
+             pfn_to_page((__iter).ix.pfn + ((__iter).curr >> PAGE_SHIFT)));\
+            (((__iter).curr += PAGE_SIZE) < (__iter).max) ||           \
+            ((__iter) = __sgt_iter(sg_next((__iter).sgp), false), 0))
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 603895a..3fcb540 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1571,18 +1571,19 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
        unsigned act_pt = first_entry / GEN6_PTES;
        unsigned act_pte = first_entry % GEN6_PTES - 1;
        u32 pte_encode = vm->pte_encode(0, cache_level, true, flags);
-       struct st_iter iter;
+       struct sgt_iter iter;
        gen6_pte_t *pt_vaddr;
+       dma_addr_t addr;
 
        pt_vaddr = kmap_px(ppgtt, &ppgtt->pd.page_table[act_pt]);
-       st_for_each_address(&iter, pages) {
+       for_each_sgt_dma(addr, iter, pages) {
                if (++act_pte == GEN6_PTES) {
                        kunmap_px(pt_vaddr);
                        pt_vaddr = kmap_px(ppgtt,
                                           &ppgtt->pd.page_table[++act_pt]);
                        act_pte = 0;
                }
-               pt_vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
+               pt_vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(addr);
        }
        kunmap_px(pt_vaddr);
 }

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Intel-gfx] [PATCH 3/3] Introduce & use new lightweight SGL iterators
  2016-05-19 17:27       ` Chris Wilson
@ 2016-05-20  0:13         ` Dave Gordon
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Gordon @ 2016-05-20  0:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Tvrtko Ursulin, linux-kernel

On 19/05/2016 18:27, Chris Wilson wrote:
> On Tue, May 17, 2016 at 01:05:48PM +0100, Dave Gordon wrote:
>> On 17/05/16 11:34, Tvrtko Ursulin wrote:
>>> On 16/05/16 16:19, Dave Gordon wrote:
>>>> The existing for_each_sg_page() iterator is somewhat heavyweight, and is
>>>> limiting i915 driver performance in a few benchmarks. So here we
>>>> introduce somewhat lighter weight iterators, primarily for use with GEM
>>>> objects or other cases where we need only deal with whole aligned pages.
>>> Interesting idea, if for nothing then for eliminating the dreaded
>>> st->nents of for_each_sg_page. :)
>>>
>>> Which benchmarks it improves and how much do you know?
>> I know nothing :)
>>
>> But last time I posted some easy-to-use iterators, Chris Wilson said
>> they didn't address his complaint, which was that the existing ones
>> were too slow.
> These aren't very good either... Compared to the sg iters I have:
>
>           gem:exec:fault:1MiB:    -4.32%
>    gem:exec:fault:1MiB:forked:    -5.66%
>          gem:exec:fault:16MiB:   -13.33%
>   gem:exec:fault:16MiB:forked:   -12.03%
>         gem:exec:fault:256MiB:   -15.28%
> gem:exec:fault:256MiB:forked:   -16.98%
>
> (I was really hoping to be able to drop a patch!)

I think you've inlined sg_next() as well? That was the next thing I was 
going to try with these ...
I'll post the version with inline sg_next, but I'm on holiday now so 
won't be around until the end of the month.

.Dave.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-05-20  0:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-16 15:19 [PATCH 1/3] drm/i915: refactor i915_gem_object_pin_map() Dave Gordon
2016-05-16 15:19 ` [PATCH 2/3] drm/i915: optimise i915_gem_object_map() for small objects Dave Gordon
2016-05-16 15:19 ` [PATCH 3/3] Introduce & use new lightweight SGL iterators Dave Gordon
2016-05-16 15:19   ` Dave Gordon
2016-05-17 10:34   ` [Intel-gfx] " Tvrtko Ursulin
2016-05-17 10:34     ` Tvrtko Ursulin
2016-05-17 12:05     ` [Intel-gfx] " Dave Gordon
2016-05-19 17:27       ` Chris Wilson
2016-05-20  0:13         ` Dave Gordon
2016-05-16 16:01 ` ✗ Ro.CI.BAT: failure for series starting with [1/3] drm/i915: refactor i915_gem_object_pin_map() Patchwork
2016-05-17  9:22 ` [PATCH 1/3] " Tvrtko Ursulin
2016-05-17 12:59   ` Dave Gordon
