[Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches
@ 2020-05-03 11:21 Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 02/14] drm/i915/gem: Implement legacy MI_STORE_DATA_IMM Chris Wilson
                   ` (15 more replies)
  0 siblings, 16 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Chris Wilson

From: Chris Wilson <ickle@x201s.alporthouse.com>

It is required that a chained batch be in the same address domain as its
parent, and also that must be specified in the command for earlier gen
as it is not inferred from the chaining until gen6.

Fixes: 964a9b0f611e ("drm/i915/gem: Use chained reloc batches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cce7df231cb9..ab0d4df13c0b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1004,14 +1004,14 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
 	GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE  / sizeof(u32));
 	cmd = cache->rq_cmd + cache->rq_size;
 	*cmd++ = MI_ARB_CHECK;
-	if (cache->gen >= 8) {
+	if (cache->gen >= 8)
 		*cmd++ = MI_BATCH_BUFFER_START_GEN8;
-		*cmd++ = lower_32_bits(batch->node.start);
-		*cmd++ = upper_32_bits(batch->node.start);
-	} else {
+	else if (cache->gen >= 6)
 		*cmd++ = MI_BATCH_BUFFER_START;
-		*cmd++ = lower_32_bits(batch->node.start);
-	}
+	else
+		*cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
+	*cmd++ = lower_32_bits(batch->node.start);
+	*cmd++ = upper_32_bits(batch->node.start);
 	i915_gem_object_flush_map(cache->rq_vma->obj);
 	i915_gem_object_unpin_map(cache->rq_vma->obj);
 	cache->rq_vma = NULL;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 02/14] drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:25   ` [Intel-gfx] [PATCH] " Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 03/14] drm/i915/gt: Small tidy of gen8+ breadcrumb emission Chris Wilson
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but
left them writing to a physical address. The notes suggest that the
primary reason would be so that the writes were cache coherent, as the
CPU cache uses physical tagging. As such we did not implement the
legacy variant of MI_STORE_DATA_IMM and so left all the relocations
synchronous -- but with a small function to convert from the vma address
into the physical address, we can implement asynchronous relocs on these
older arches, fixing up a few tests that require them.

In order to be able to test the legacy paths, refactor the gpu
relocations so that we can hook them up to a selftest.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 204 ++++++++++-------
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 206 ++++++++++++++++++
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 3 files changed, 337 insertions(+), 74 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index ab0d4df13c0b..44d7da0e200e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -955,7 +955,7 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
 	cache->rq = NULL;
-	cache->rq_size = 0;
+	cache->target = NULL;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -1325,7 +1325,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
-			err = PTR_ERR(rq);
+			err = PTR_ERR(ce);
 			goto err_unpin;
 		}
 
@@ -1376,6 +1376,11 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	return err;
 }
 
+static bool can_store_dword(const struct intel_engine_cs *engine)
+{
+	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
+}
+
 static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		      struct i915_vma *vma,
 		      unsigned int len)
@@ -1387,9 +1392,9 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 	if (unlikely(!cache->rq)) {
 		struct intel_engine_cs *engine = eb->engine;
 
-		if (!intel_engine_can_store_dword(engine)) {
+		if (!can_store_dword(engine)) {
 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
-			if (!engine || !intel_engine_can_store_dword(engine))
+			if (!engine)
 				return ERR_PTR(-ENODEV);
 		}
 
@@ -1435,91 +1440,138 @@ static inline bool use_reloc_gpu(struct i915_vma *vma)
 	return !dma_resv_test_signaled_rcu(vma->resv, true);
 }
 
-static u64
-relocate_entry(struct i915_vma *vma,
-	       const struct drm_i915_gem_relocation_entry *reloc,
-	       struct i915_execbuffer *eb,
-	       const struct i915_vma *target)
+static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
 {
-	u64 offset = reloc->offset;
-	u64 target_offset = relocation_target(reloc, target);
-	bool wide = eb->reloc_cache.use_64bit_reloc;
-	void *vaddr;
+	struct page *page;
+	unsigned long addr;
 
-	if (!eb->reloc_cache.vaddr && use_reloc_gpu(vma)) {
-		const unsigned int gen = eb->reloc_cache.gen;
-		unsigned int len;
-		u32 *batch;
-		u64 addr;
+	GEM_BUG_ON(vma->pages != vma->obj->mm.pages);
 
-		if (wide)
-			len = offset & 7 ? 8 : 5;
-		else if (gen >= 4)
-			len = 4;
-		else
-			len = 3;
+	page = i915_gem_object_get_page(vma->obj, offset >> PAGE_SHIFT);
+	addr = PFN_PHYS(page_to_pfn(page));
+	GEM_BUG_ON(overflows_type(addr, u32)); /* expected dma32 */
 
-		batch = reloc_gpu(eb, vma, len);
-		if (IS_ERR(batch))
-			goto repeat;
+	return addr + offset_in_page(offset);
+}
+
+static bool __reloc_entry_gpu(struct i915_vma *vma,
+			      struct i915_execbuffer *eb,
+			      u64 offset,
+			      u64 target_addr)
+{
+	const unsigned int gen = eb->reloc_cache.gen;
+	unsigned int len;
+	u32 *batch;
+	u64 addr;
+
+	if (gen >= 8)
+		len = offset & 7 ? 8 : 5;
+	else if (gen >= 4)
+		len = 4;
+	else
+		len = 3;
+
+	batch = reloc_gpu(eb, vma, len);
+	if (IS_ERR(batch))
+		return false;
+
+	addr = gen8_canonical_addr(vma->node.start + offset);
+	if (gen >= 8) {
+		if (offset & 7) {
+			*batch++ = MI_STORE_DWORD_IMM_GEN4;
+			*batch++ = lower_32_bits(addr);
+			*batch++ = upper_32_bits(addr);
+			*batch++ = lower_32_bits(target_addr);
+
+			addr = gen8_canonical_addr(addr + 4);
 
-		addr = gen8_canonical_addr(vma->node.start + offset);
-		if (wide) {
-			if (offset & 7) {
-				*batch++ = MI_STORE_DWORD_IMM_GEN4;
-				*batch++ = lower_32_bits(addr);
-				*batch++ = upper_32_bits(addr);
-				*batch++ = lower_32_bits(target_offset);
-
-				addr = gen8_canonical_addr(addr + 4);
-
-				*batch++ = MI_STORE_DWORD_IMM_GEN4;
-				*batch++ = lower_32_bits(addr);
-				*batch++ = upper_32_bits(addr);
-				*batch++ = upper_32_bits(target_offset);
-			} else {
-				*batch++ = (MI_STORE_DWORD_IMM_GEN4 | (1 << 21)) + 1;
-				*batch++ = lower_32_bits(addr);
-				*batch++ = upper_32_bits(addr);
-				*batch++ = lower_32_bits(target_offset);
-				*batch++ = upper_32_bits(target_offset);
-			}
-		} else if (gen >= 6) {
 			*batch++ = MI_STORE_DWORD_IMM_GEN4;
-			*batch++ = 0;
-			*batch++ = addr;
-			*batch++ = target_offset;
-		} else if (gen >= 4) {
-			*batch++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
-			*batch++ = 0;
-			*batch++ = addr;
-			*batch++ = target_offset;
+			*batch++ = lower_32_bits(addr);
+			*batch++ = upper_32_bits(addr);
+			*batch++ = upper_32_bits(target_addr);
 		} else {
-			*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
-			*batch++ = addr;
-			*batch++ = target_offset;
+			*batch++ = (MI_STORE_DWORD_IMM_GEN4 | (1 << 21)) + 1;
+			*batch++ = lower_32_bits(addr);
+			*batch++ = upper_32_bits(addr);
+			*batch++ = lower_32_bits(target_addr);
+			*batch++ = upper_32_bits(target_addr);
 		}
-
-		goto out;
+	} else if (gen >= 6) {
+		*batch++ = MI_STORE_DWORD_IMM_GEN4;
+		*batch++ = 0;
+		*batch++ = addr;
+		*batch++ = target_addr;
+	} else if (IS_I965G(eb->i915)) {
+		*batch++ = MI_STORE_DWORD_IMM_GEN4;
+		*batch++ = 0;
+		*batch++ = vma_phys_addr(vma, offset);
+		*batch++ = target_addr;
+	} else if (gen >= 4) {
+		*batch++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*batch++ = 0;
+		*batch++ = addr;
+		*batch++ = target_addr;
+	} else if (gen >= 3 &&
+		   !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
+		*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
+		*batch++ = addr;
+		*batch++ = target_addr;
+	} else {
+		*batch++ = MI_STORE_DWORD_IMM;
+		*batch++ = vma_phys_addr(vma, offset);
+		*batch++ = target_addr;
 	}
 
+	return true;
+}
+
+static bool reloc_entry_gpu(struct i915_vma *vma,
+			    struct i915_execbuffer *eb,
+			    u64 offset,
+			    u64 target_addr)
+{
+	if (eb->reloc_cache.vaddr)
+		return false;
+
+	if (!use_reloc_gpu(vma))
+		return false;
+
+	return __reloc_entry_gpu(vma, eb, offset, target_addr);
+}
+
+static u64
+relocate_entry(struct i915_vma *vma,
+	       const struct drm_i915_gem_relocation_entry *reloc,
+	       struct i915_execbuffer *eb,
+	       const struct i915_vma *target)
+{
+	u64 target_addr = relocation_target(reloc, target);
+	u64 offset = reloc->offset;
+
+	if (!reloc_entry_gpu(vma, eb, offset, target_addr)) {
+		bool wide = eb->reloc_cache.use_64bit_reloc;
+		void *vaddr;
+
 repeat:
-	vaddr = reloc_vaddr(vma->obj, &eb->reloc_cache, offset >> PAGE_SHIFT);
-	if (IS_ERR(vaddr))
-		return PTR_ERR(vaddr);
+		vaddr = reloc_vaddr(vma->obj,
+				    &eb->reloc_cache,
+				    offset >> PAGE_SHIFT);
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
 
-	clflush_write32(vaddr + offset_in_page(offset),
-			lower_32_bits(target_offset),
-			eb->reloc_cache.vaddr);
+		GEM_BUG_ON(!IS_ALIGNED(offset, sizeof(u32)));
+		clflush_write32(vaddr + offset_in_page(offset),
+				lower_32_bits(target_addr),
+				eb->reloc_cache.vaddr);
 
-	if (wide) {
-		offset += sizeof(u32);
-		target_offset >>= 32;
-		wide = false;
-		goto repeat;
+		if (wide) {
+			offset += sizeof(u32);
+			target_addr >>= 32;
+			wide = false;
+			goto repeat;
+		}
 	}
 
-out:
 	return target->node.start | UPDATE;
 }
 
@@ -3022,3 +3074,7 @@ end:;
 	kvfree(exec2_list);
 	return err;
 }
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_gem_execbuffer.c"
+#endif
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
new file mode 100644
index 000000000000..59555779dd91
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_selftest.h"
+
+#include "gt/intel_engine_pm.h"
+#include "selftests/igt_flush_test.h"
+
+static void hexdump(const void *buf, size_t len)
+{
+	const size_t rowsize = 8 * sizeof(u32);
+	const void *prev = NULL;
+	bool skip = false;
+	size_t pos;
+
+	for (pos = 0; pos < len; pos += rowsize) {
+		char line[128];
+
+		if (prev && !memcmp(prev, buf + pos, rowsize)) {
+			if (!skip) {
+				pr_info("*\n");
+				skip = true;
+			}
+			continue;
+		}
+
+		WARN_ON_ONCE(hex_dump_to_buffer(buf + pos, len - pos,
+						rowsize, sizeof(u32),
+						line, sizeof(line),
+						false) >= sizeof(line));
+		pr_info("[%04zx] %s\n", pos, line);
+
+		prev = buf + pos;
+		skip = false;
+	}
+}
+
+static u64 read_reloc(const u32 *map, int x, const u64 mask)
+{
+	u64 reloc;
+
+	memcpy(&reloc, &map[x], sizeof(reloc));
+	return reloc & mask;
+}
+
+static int __igt_gpu_reloc(struct i915_execbuffer *eb,
+			   struct drm_i915_gem_object *obj)
+{
+	enum {
+		X = 0,
+		Y = 3,
+		Z = 8
+	};
+	const u64 mask = GENMASK(eb->reloc_cache.gen >= 8 ? 63 : 31, 0);
+	const u32 *map = page_mask_bits(obj->mm.mapping);
+	struct i915_request *rq;
+	struct i915_vma *vma;
+	int inc;
+	int err;
+
+	vma = i915_vma_instance(obj, eb->context->vm, NULL);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH);
+	if (err)
+		return err;
+
+	/* 8-Byte aligned */
+	if (!__reloc_entry_gpu(vma, eb, X * sizeof(u32), X)) {
+		err = -EIO;
+		goto unpin_vma;
+	}
+
+	/* !8-Byte aligned */
+	if (!__reloc_entry_gpu(vma, eb, Y * sizeof(u32), Y)) {
+		err = -EIO;
+		goto unpin_vma;
+	}
+
+	/* Skip to the end of the cmd page */
+	inc = PAGE_SIZE / sizeof(u32) - RELOC_TAIL - 1;
+	inc -= eb->reloc_cache.rq_size;
+	memset(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
+	       0, inc * sizeof(u32));
+	eb->reloc_cache.rq_size += inc;
+
+	/* Force batch chaining */
+	if (!__reloc_entry_gpu(vma, eb, Z * sizeof(u32), Z)) {
+		err = -EIO;
+		goto unpin_vma;
+	}
+
+	GEM_BUG_ON(!eb->reloc_cache.rq);
+	rq = i915_request_get(eb->reloc_cache.rq);
+	err = reloc_gpu_flush(&eb->reloc_cache);
+	if (err)
+		goto put_rq;
+	GEM_BUG_ON(eb->reloc_cache.rq);
+
+	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
+	if (err) {
+		intel_gt_set_wedged(eb->engine->gt);
+		goto put_rq;
+	}
+
+	if (!i915_request_completed(rq)) {
+		pr_err("%s: did not wait for relocations!\n", eb->engine->name);
+		err = -EINVAL;
+		goto put_rq;
+	}
+
+	if (read_reloc(map, X, mask) != X) {
+		pr_err("%s[X]: map[%d] %llx != %x\n",
+		       eb->engine->name, X, read_reloc(map, X, mask), X);
+		err = -EINVAL;
+	}
+
+	if (read_reloc(map, Y, mask) != Y) {
+		pr_err("%s[Y]: map[%d] %llx != %x\n",
+		       eb->engine->name, Y, read_reloc(map, Y, mask), Y);
+		err = -EINVAL;
+	}
+
+	if (read_reloc(map, Z, mask) != Z) {
+		pr_err("%s[Z]: map[%d] %llx != %x\n",
+		       eb->engine->name, Z, read_reloc(map, Z, mask), Z);
+		err = -EINVAL;
+	}
+
+	if (err)
+		hexdump(map, 4096);
+
+put_rq:
+	i915_request_put(rq);
+unpin_vma:
+	i915_vma_unpin(vma);
+	return err;
+}
+
+static int igt_gpu_reloc(void *arg)
+{
+	struct i915_execbuffer eb;
+	struct drm_i915_gem_object *scratch;
+	int err = 0;
+	u32 *map;
+
+	eb.i915 = arg;
+
+	scratch = i915_gem_object_create_internal(eb.i915, 4096);
+	if (IS_ERR(scratch))
+		return PTR_ERR(scratch);
+
+	map = i915_gem_object_pin_map(scratch, I915_MAP_WC);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto err_scratch;
+	}
+
+	for_each_uabi_engine(eb.engine, eb.i915) {
+		reloc_cache_init(&eb.reloc_cache, eb.i915);
+		memset(map, POISON_INUSE, 4096);
+
+		intel_engine_pm_get(eb.engine);
+		eb.context = intel_context_create(eb.engine);
+		if (IS_ERR(eb.context)) {
+			err = PTR_ERR(eb.context);
+			goto err_pm;
+		}
+
+		err = intel_context_pin(eb.context);
+		if (err)
+			goto err_put;
+
+		err = __igt_gpu_reloc(&eb, scratch);
+
+		intel_context_unpin(eb.context);
+err_put:
+		intel_context_put(eb.context);
+err_pm:
+		intel_engine_pm_put(eb.engine);
+		if (err)
+			break;
+	}
+
+	if (igt_flush_test(eb.i915))
+		err = -EIO;
+
+err_scratch:
+	i915_gem_object_put(scratch);
+	return err;
+}
+
+int i915_gem_execbuffer_live_selftests(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(igt_gpu_reloc),
+	};
+
+	if (intel_gt_is_wedged(&i915->gt))
+		return 0;
+
+	return i915_live_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index 0a953bfc0585..5dd5d81646c4 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -37,6 +37,7 @@ selftest(gem, i915_gem_live_selftests)
 selftest(evict, i915_gem_evict_live_selftests)
 selftest(hugepages, i915_gem_huge_page_live_selftests)
 selftest(gem_contexts, i915_gem_context_live_selftests)
+selftest(gem_execbuf, i915_gem_execbuffer_live_selftests)
 selftest(blt, i915_gem_object_blt_live_selftests)
 selftest(client, i915_gem_client_blt_live_selftests)
 selftest(reset, intel_reset_live_selftests)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 03/14] drm/i915/gt: Small tidy of gen8+ breadcrumb emission
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 02/14] drm/i915/gem: Implement legacy MI_STORE_DATA_IMM Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21   ` [Intel-gfx] " Chris Wilson
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Use a local to shrink a line under 80 columns, and refactor the common
emit_xcs_breadcrumb() wrapper of ggtt-write.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 34 +++++++++++++----------------
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index d4ef344657b0..c00366387b54 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4641,8 +4641,7 @@ static u32 *emit_preempt_busywait(struct i915_request *request, u32 *cs)
 }
 
 static __always_inline u32*
-gen8_emit_fini_breadcrumb_footer(struct i915_request *request,
-				 u32 *cs)
+gen8_emit_fini_breadcrumb_tail(struct i915_request *request, u32 *cs)
 {
 	*cs++ = MI_USER_INTERRUPT;
 
@@ -4656,14 +4655,16 @@ gen8_emit_fini_breadcrumb_footer(struct i915_request *request,
 	return gen8_emit_wa_tail(request, cs);
 }
 
-static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
+static u32 *emit_xcs_breadcrumb(struct i915_request *request, u32 *cs)
 {
-	cs = gen8_emit_ggtt_write(cs,
-				  request->fence.seqno,
-				  i915_request_active_timeline(request)->hwsp_offset,
-				  0);
+	u32 addr = i915_request_active_timeline(request)->hwsp_offset;
 
-	return gen8_emit_fini_breadcrumb_footer(request, cs);
+	return gen8_emit_ggtt_write(cs, request->fence.seqno, addr, 0);
+}
+
+static u32 *gen8_emit_fini_breadcrumb(struct i915_request *rq, u32 *cs)
+{
+	return gen8_emit_fini_breadcrumb_tail(rq, emit_xcs_breadcrumb(rq, cs));
 }
 
 static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
@@ -4681,7 +4682,7 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				      PIPE_CONTROL_FLUSH_ENABLE |
 				      PIPE_CONTROL_CS_STALL);
 
-	return gen8_emit_fini_breadcrumb_footer(request, cs);
+	return gen8_emit_fini_breadcrumb_tail(request, cs);
 }
 
 static u32 *
@@ -4697,7 +4698,7 @@ gen11_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				      PIPE_CONTROL_DC_FLUSH_ENABLE |
 				      PIPE_CONTROL_FLUSH_ENABLE);
 
-	return gen8_emit_fini_breadcrumb_footer(request, cs);
+	return gen8_emit_fini_breadcrumb_tail(request, cs);
 }
 
 /*
@@ -4735,7 +4736,7 @@ static u32 *gen12_emit_preempt_busywait(struct i915_request *request, u32 *cs)
 }
 
 static __always_inline u32*
-gen12_emit_fini_breadcrumb_footer(struct i915_request *request, u32 *cs)
+gen12_emit_fini_breadcrumb_tail(struct i915_request *request, u32 *cs)
 {
 	*cs++ = MI_USER_INTERRUPT;
 
@@ -4749,14 +4750,9 @@ gen12_emit_fini_breadcrumb_footer(struct i915_request *request, u32 *cs)
 	return gen8_emit_wa_tail(request, cs);
 }
 
-static u32 *gen12_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
+static u32 *gen12_emit_fini_breadcrumb(struct i915_request *rq, u32 *cs)
 {
-	cs = gen8_emit_ggtt_write(cs,
-				  request->fence.seqno,
-				  i915_request_active_timeline(request)->hwsp_offset,
-				  0);
-
-	return gen12_emit_fini_breadcrumb_footer(request, cs);
+	return gen12_emit_fini_breadcrumb_tail(rq, emit_xcs_breadcrumb(rq, cs));
 }
 
 static u32 *
@@ -4775,7 +4771,7 @@ gen12_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				      PIPE_CONTROL_FLUSH_ENABLE |
 				      PIPE_CONTROL_HDC_PIPELINE_FLUSH);
 
-	return gen12_emit_fini_breadcrumb_footer(request, cs);
+	return gen12_emit_fini_breadcrumb_tail(request, cs);
 }
 
 static void execlists_park(struct intel_engine_cs *engine)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/14] drm/i915: Mark concurrent submissions with a weak-dependency
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
@ 2020-05-03 11:21   ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 03/14] drm/i915/gt: Small tidy of gen8+ breadcrumb emission Chris Wilson
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Tvrtko Ursulin, stable

We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as we the pair of requests are
allowed to run in parallel and not in strict succession. So for example
we do need to suspend one if the other hangs.

The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
---
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 9 +++++++++
 drivers/gpu/drm/i915/i915_request.c         | 8 ++++++--
 drivers/gpu/drm/i915/i915_scheduler.c       | 4 +++-
 drivers/gpu/drm/i915/i915_scheduler.h       | 3 ++-
 drivers/gpu/drm/i915/i915_scheduler_types.h | 1 +
 5 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index c00366387b54..508661cf61d9 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1883,6 +1883,9 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2729,6 +2732,9 @@ static void __execlists_hold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2853,6 +2859,9 @@ static void __execlists_unhold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Propagate any change in error status */
 			if (rq->fence.error)
 				i915_request_set_error_once(w, rq->fence.error);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 22635bbabf06..95edc5523a01 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1038,7 +1038,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		return 0;
 
 	if (to->engine->schedule) {
-		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
+		ret = i915_sched_node_add_dependency(&to->sched,
+						     &from->sched,
+						     0);
 		if (ret < 0)
 			return ret;
 	}
@@ -1200,7 +1202,9 @@ __i915_request_await_execution(struct i915_request *to,
 
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->schedule) {
-		err = i915_sched_node_add_dependency(&to->sched, &from->sched);
+		err = i915_sched_node_add_dependency(&to->sched,
+						     &from->sched,
+						     I915_DEPENDENCY_WEAK);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 37cfcf5b321b..5f4c1e49e974 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -462,7 +462,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 }
 
 int i915_sched_node_add_dependency(struct i915_sched_node *node,
-				   struct i915_sched_node *signal)
+				   struct i915_sched_node *signal,
+				   unsigned long flags)
 {
 	struct i915_dependency *dep;
 
@@ -473,6 +474,7 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 	local_bh_disable();
 
 	if (!__i915_sched_node_add_dependency(node, signal, dep,
+					      flags |
 					      I915_DEPENDENCY_EXTERNAL |
 					      I915_DEPENDENCY_ALLOC))
 		i915_dependency_free(dep);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index d1dc4efef77b..6f0bf00fc569 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -34,7 +34,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      unsigned long flags);
 
 int i915_sched_node_add_dependency(struct i915_sched_node *node,
-				   struct i915_sched_node *signal);
+				   struct i915_sched_node *signal,
+				   unsigned long flags);
 
 void i915_sched_node_fini(struct i915_sched_node *node);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index d18e70550054..7186875088a0 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -78,6 +78,7 @@ struct i915_dependency {
 	unsigned long flags;
 #define I915_DEPENDENCY_ALLOC		BIT(0)
 #define I915_DEPENDENCY_EXTERNAL	BIT(1)
+#define I915_DEPENDENCY_WEAK		BIT(2)
 };
 
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 04/14] drm/i915: Mark concurrent submissions with a weak-dependency
@ 2020-05-03 11:21   ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: stable, Chris Wilson

We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as we the pair of requests are
allowed to run in parallel and not in strict succession. So for example
we do need to suspend one if the other hangs.

The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.

Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
---
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 9 +++++++++
 drivers/gpu/drm/i915/i915_request.c         | 8 ++++++--
 drivers/gpu/drm/i915/i915_scheduler.c       | 4 +++-
 drivers/gpu/drm/i915/i915_scheduler.h       | 3 ++-
 drivers/gpu/drm/i915/i915_scheduler_types.h | 1 +
 5 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index c00366387b54..508661cf61d9 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1883,6 +1883,9 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2729,6 +2732,9 @@ static void __execlists_hold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2853,6 +2859,9 @@ static void __execlists_unhold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Propagate any change in error status */
 			if (rq->fence.error)
 				i915_request_set_error_once(w, rq->fence.error);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 22635bbabf06..95edc5523a01 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1038,7 +1038,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		return 0;
 
 	if (to->engine->schedule) {
-		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
+		ret = i915_sched_node_add_dependency(&to->sched,
+						     &from->sched,
+						     0);
 		if (ret < 0)
 			return ret;
 	}
@@ -1200,7 +1202,9 @@ __i915_request_await_execution(struct i915_request *to,
 
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->schedule) {
-		err = i915_sched_node_add_dependency(&to->sched, &from->sched);
+		err = i915_sched_node_add_dependency(&to->sched,
+						     &from->sched,
+						     I915_DEPENDENCY_WEAK);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 37cfcf5b321b..5f4c1e49e974 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -462,7 +462,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 }
 
 int i915_sched_node_add_dependency(struct i915_sched_node *node,
-				   struct i915_sched_node *signal)
+				   struct i915_sched_node *signal,
+				   unsigned long flags)
 {
 	struct i915_dependency *dep;
 
@@ -473,6 +474,7 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 	local_bh_disable();
 
 	if (!__i915_sched_node_add_dependency(node, signal, dep,
+					      flags |
 					      I915_DEPENDENCY_EXTERNAL |
 					      I915_DEPENDENCY_ALLOC))
 		i915_dependency_free(dep);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index d1dc4efef77b..6f0bf00fc569 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -34,7 +34,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      unsigned long flags);
 
 int i915_sched_node_add_dependency(struct i915_sched_node *node,
-				   struct i915_sched_node *signal);
+				   struct i915_sched_node *signal,
+				   unsigned long flags);
 
 void i915_sched_node_fini(struct i915_sched_node *node);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index d18e70550054..7186875088a0 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -78,6 +78,7 @@ struct i915_dependency {
 	unsigned long flags;
 #define I915_DEPENDENCY_ALLOC		BIT(0)
 #define I915_DEPENDENCY_EXTERNAL	BIT(1)
+#define I915_DEPENDENCY_WEAK		BIT(2)
 };
 
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 05/14] drm/i915/selftests: Repeat the rps clock frequency measurement
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (2 preceding siblings ...)
  2020-05-03 11:21   ` [Intel-gfx] " Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 06/14] drm/i915/gt: Stop holding onto the pinned_default_state Chris Wilson
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Repeat the measurement of the clock frequency a few times and use the
median to try and reduce the systematic measurement error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_rps.c | 54 +++++++++++++++++++-------
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index b89a7d7611f6..bfa1a15564f7 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -56,6 +56,18 @@ static int cmp_u64(const void *A, const void *B)
 		return 0;
 }
 
+static int cmp_u32(const void *A, const void *B)
+{
+	const u32 *a = A, *b = B;
+
+	if (a < b)
+		return -1;
+	else if (a > b)
+		return 1;
+	else
+		return 0;
+}
+
 static struct i915_vma *
 create_spin_counter(struct intel_engine_cs *engine,
 		    struct i915_address_space *vm,
@@ -236,8 +248,8 @@ int live_rps_clock_interval(void *arg)
 	for_each_engine(engine, gt, id) {
 		unsigned long saved_heartbeat;
 		struct i915_request *rq;
-		ktime_t dt;
 		u32 cycles;
+		u64 dt;
 
 		if (!intel_engine_can_store_dword(engine))
 			continue;
@@ -286,15 +298,29 @@ int live_rps_clock_interval(void *arg)
 				  engine->name);
 			err = -ENODEV;
 		} else {
-			preempt_disable();
-			dt = ktime_get();
-			cycles = -intel_uncore_read_fw(gt->uncore,
-						       GEN6_RP_CUR_UP_EI);
-			udelay(1000);
-			dt = ktime_sub(ktime_get(), dt);
-			cycles += intel_uncore_read_fw(gt->uncore,
-						       GEN6_RP_CUR_UP_EI);
-			preempt_enable();
+			ktime_t dt_[5];
+			u32 cycles_[5];
+			int i;
+
+			for (i = 0; i < 5; i++) {
+				preempt_disable();
+
+				dt_[i] = ktime_get();
+				cycles_[i] = -intel_uncore_read_fw(gt->uncore, GEN6_RP_CUR_UP_EI);
+
+				udelay(1000);
+
+				dt_[i] = ktime_sub(ktime_get(), dt_[i]);
+				cycles_[i] += intel_uncore_read_fw(gt->uncore, GEN6_RP_CUR_UP_EI);
+
+				preempt_enable();
+			}
+
+			/* Use the median of both cycle/dt; close enough */
+			sort(cycles_, 5, sizeof(*cycles_), cmp_u32, NULL);
+			cycles = (cycles_[1] + 2 * cycles_[2] + cycles_[3]) / 4;
+			sort(dt_, 5, sizeof(*dt_), cmp_u64, NULL);
+			dt = div_u64(dt_[1] + 2 * dt_[2] + dt_[3], 4);
 		}
 
 		intel_uncore_write_fw(gt->uncore, GEN6_RP_CONTROL, 0);
@@ -306,14 +332,14 @@ int live_rps_clock_interval(void *arg)
 		if (err == 0) {
 			u64 time = intel_gt_pm_interval_to_ns(gt, cycles);
 			u32 expected =
-				intel_gt_ns_to_pm_interval(gt, ktime_to_ns(dt));
+				intel_gt_ns_to_pm_interval(gt, dt);
 
 			pr_info("%s: rps counted %d C0 cycles [%lldns] in %lldns [%d cycles], using GT clock frequency of %uKHz\n",
-				engine->name, cycles, time, ktime_to_ns(dt), expected,
+				engine->name, cycles, time, dt, expected,
 				gt->clock_frequency / 1000);
 
-			if (10 * time < 8 * ktime_to_ns(dt) ||
-			    8 * time > 10 * ktime_to_ns(dt)) {
+			if (10 * time < 8 * dt ||
+			    8 * time > 10 * dt) {
 				pr_err("%s: rps clock time does not match walltime!\n",
 				       engine->name);
 				err = -EINVAL;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 06/14] drm/i915/gt: Stop holding onto the pinned_default_state
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (3 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 05/14] drm/i915/selftests: Repeat the rps clock frequency measurement Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 07/14] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

As we only restore the default context state upon banning a context, we
only need enough of the state to run the ring and nothing more. That is
we only need our bare protocontext.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c    | 14 +-----
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 -
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 14 ++----
 drivers/gpu/drm/i915/gt/selftest_context.c   | 11 ++--
 drivers/gpu/drm/i915/gt/selftest_lrc.c       | 53 +++++++++++++++-----
 5 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 811debefebc0..d0a1078ef632 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -21,18 +21,11 @@ static int __engine_unpark(struct intel_wakeref *wf)
 	struct intel_engine_cs *engine =
 		container_of(wf, typeof(*engine), wakeref);
 	struct intel_context *ce;
-	void *map;
 
 	ENGINE_TRACE(engine, "\n");
 
 	intel_gt_pm_get(engine->gt);
 
-	/* Pin the default state for fast resets from atomic context. */
-	map = NULL;
-	if (engine->default_state)
-		map = shmem_pin_map(engine->default_state);
-	engine->pinned_default_state = map;
-
 	/* Discard stale context state from across idling */
 	ce = engine->kernel_context;
 	if (ce) {
@@ -42,6 +35,7 @@ static int __engine_unpark(struct intel_wakeref *wf)
 		if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) && ce->state) {
 			struct drm_i915_gem_object *obj = ce->state->obj;
 			int type = i915_coherent_map_type(engine->i915);
+			void *map;
 
 			map = i915_gem_object_pin_map(obj, type);
 			if (!IS_ERR(map)) {
@@ -260,12 +254,6 @@ static int __engine_park(struct intel_wakeref *wf)
 	if (engine->park)
 		engine->park(engine);
 
-	if (engine->pinned_default_state) {
-		shmem_unpin_map(engine->default_state,
-				engine->pinned_default_state);
-		engine->pinned_default_state = NULL;
-	}
-
 	engine->execlists.no_priolist = false;
 
 	/* While gt calls i915_vma_parked(), we have to break the lock cycle */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 6c676774dcd9..c84525363bb7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -339,7 +339,6 @@ struct intel_engine_cs {
 	unsigned long wakeref_serial;
 	struct intel_wakeref wakeref;
 	struct file *default_state;
-	void *pinned_default_state;
 
 	struct {
 		struct intel_ring *ring;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 508661cf61d9..66dd5e09a123 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1271,14 +1271,11 @@ execlists_check_context(const struct intel_context *ce,
 static void restore_default_state(struct intel_context *ce,
 				  struct intel_engine_cs *engine)
 {
-	u32 *regs = ce->lrc_reg_state;
+	u32 *regs;
 
-	if (engine->pinned_default_state)
-		memcpy(regs, /* skip restoring the vanilla PPHWSP */
-		       engine->pinned_default_state + LRC_STATE_OFFSET,
-		       engine->context_size - PAGE_SIZE);
+	regs = memset(ce->lrc_reg_state, 0, engine->context_size - PAGE_SIZE);
+	execlists_init_reg_state(regs, ce, engine, ce->ring, true);
 
-	execlists_init_reg_state(regs, ce, engine, ce->ring, false);
 	ce->runtime.last = intel_context_get_runtime(ce);
 }
 
@@ -4175,8 +4172,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * image back to the expected values to skip over the guilty request.
 	 */
 	__i915_request_reset(rq, stalled);
-	if (!stalled)
-		goto out_replay;
 
 	/*
 	 * We want a simple context + ring to execute the breadcrumb update.
@@ -4186,9 +4181,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
-	GEM_BUG_ON(!intel_context_is_pinned(ce));
-	restore_default_state(ce, engine);
-
 out_replay:
 	ENGINE_TRACE(engine, "replay {head:%04x, tail:%04x}\n",
 		     head, ce->ring->tail);
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index b8ed3cbe1277..a56dff3b157a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -154,10 +154,7 @@ static int live_context_size(void *arg)
 	 */
 
 	for_each_engine(engine, gt, id) {
-		struct {
-			struct file *state;
-			void *pinned;
-		} saved;
+		struct file *saved;
 
 		if (!engine->context_size)
 			continue;
@@ -171,8 +168,7 @@ static int live_context_size(void *arg)
 		 * active state is sufficient, we are only checking that we
 		 * don't use more than we planned.
 		 */
-		saved.state = fetch_and_zero(&engine->default_state);
-		saved.pinned = fetch_and_zero(&engine->pinned_default_state);
+		saved = fetch_and_zero(&engine->default_state);
 
 		/* Overlaps with the execlists redzone */
 		engine->context_size += I915_GTT_PAGE_SIZE;
@@ -181,8 +177,7 @@ static int live_context_size(void *arg)
 
 		engine->context_size -= I915_GTT_PAGE_SIZE;
 
-		engine->pinned_default_state = saved.pinned;
-		engine->default_state = saved.state;
+		engine->default_state = saved;
 
 		intel_engine_pm_put(engine);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 7529df92f6a2..d946c0521dbc 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -5206,6 +5206,7 @@ store_context(struct intel_context *ce, struct i915_vma *scratch)
 {
 	struct i915_vma *batch;
 	u32 dw, x, *cs, *hw;
+	u32 *defaults;
 
 	batch = create_user_vma(ce->vm, SZ_64K);
 	if (IS_ERR(batch))
@@ -5217,9 +5218,16 @@ store_context(struct intel_context *ce, struct i915_vma *scratch)
 		return ERR_CAST(cs);
 	}
 
+	defaults = shmem_pin_map(ce->engine->default_state);
+	if (!defaults) {
+		i915_gem_object_unpin_map(batch->obj);
+		i915_vma_put(batch);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	x = 0;
 	dw = 0;
-	hw = ce->engine->pinned_default_state;
+	hw = defaults;
 	hw += LRC_STATE_OFFSET / sizeof(*hw);
 	do {
 		u32 len = hw[dw] & 0x7f;
@@ -5250,6 +5258,8 @@ store_context(struct intel_context *ce, struct i915_vma *scratch)
 
 	*cs++ = MI_BATCH_BUFFER_END;
 
+	shmem_unpin_map(ce->engine->default_state, defaults);
+
 	i915_gem_object_flush_map(batch->obj);
 	i915_gem_object_unpin_map(batch->obj);
 
@@ -5360,6 +5370,7 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
 {
 	struct i915_vma *batch;
 	u32 dw, *cs, *hw;
+	u32 *defaults;
 
 	batch = create_user_vma(ce->vm, SZ_64K);
 	if (IS_ERR(batch))
@@ -5371,8 +5382,15 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
 		return ERR_CAST(cs);
 	}
 
+	defaults = shmem_pin_map(ce->engine->default_state);
+	if (!defaults) {
+		i915_gem_object_unpin_map(batch->obj);
+		i915_vma_put(batch);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	dw = 0;
-	hw = ce->engine->pinned_default_state;
+	hw = defaults;
 	hw += LRC_STATE_OFFSET / sizeof(*hw);
 	do {
 		u32 len = hw[dw] & 0x7f;
@@ -5400,6 +5418,8 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison)
 
 	*cs++ = MI_BATCH_BUFFER_END;
 
+	shmem_unpin_map(ce->engine->default_state, defaults);
+
 	i915_gem_object_flush_map(batch->obj);
 	i915_gem_object_unpin_map(batch->obj);
 
@@ -5467,6 +5487,7 @@ static int compare_isolation(struct intel_engine_cs *engine,
 {
 	u32 x, dw, *hw, *lrc;
 	u32 *A[2], *B[2];
+	u32 *defaults;
 	int err = 0;
 
 	A[0] = i915_gem_object_pin_map(ref[0]->obj, I915_MAP_WC);
@@ -5499,9 +5520,15 @@ static int compare_isolation(struct intel_engine_cs *engine,
 	}
 	lrc += LRC_STATE_OFFSET / sizeof(*hw);
 
+	defaults = shmem_pin_map(ce->engine->default_state);
+	if (!defaults) {
+		err = -ENOMEM;
+		goto err_lrc;
+	}
+
 	x = 0;
 	dw = 0;
-	hw = engine->pinned_default_state;
+	hw = defaults;
 	hw += LRC_STATE_OFFSET / sizeof(*hw);
 	do {
 		u32 len = hw[dw] & 0x7f;
@@ -5541,6 +5568,8 @@ static int compare_isolation(struct intel_engine_cs *engine,
 	} while (dw < PAGE_SIZE / sizeof(u32) &&
 		 (hw[dw] & ~BIT(0)) != MI_BATCH_BUFFER_END);
 
+	shmem_unpin_map(ce->engine->default_state, defaults);
+err_lrc:
 	i915_gem_object_unpin_map(ce->state->obj);
 err_B1:
 	i915_gem_object_unpin_map(result[1]->obj);
@@ -5690,18 +5719,16 @@ static int live_lrc_isolation(void *arg)
 			continue;
 
 		intel_engine_pm_get(engine);
-		if (engine->pinned_default_state) {
-			for (i = 0; i < ARRAY_SIZE(poison); i++) {
-				int result;
+		for (i = 0; i < ARRAY_SIZE(poison); i++) {
+			int result;
 
-				result = __lrc_isolation(engine, poison[i]);
-				if (result && !err)
-					err = result;
+			result = __lrc_isolation(engine, poison[i]);
+			if (result && !err)
+				err = result;
 
-				result = __lrc_isolation(engine, ~poison[i]);
-				if (result && !err)
-					err = result;
-			}
+			result = __lrc_isolation(engine, ~poison[i]);
+			if (result && !err)
+				err = result;
 		}
 		intel_engine_pm_put(engine);
 		if (igt_flush_test(gt->i915)) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 07/14] dma-buf: Proxy fence, an unsignaled fence placeholder
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (4 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 06/14] drm/i915/gt: Stop holding onto the pinned_default_state Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 08/14] drm/syncobj: Allow use of dma-fence-proxy Chris Wilson
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Often we need to create a fence for a future event that has not yet been
associated with a fence. We can store a proxy fence, a placeholder, in
the timeline and replace it later when the real fence is known. Any
listeners that attach to the proxy fence will automatically be signaled
when the real fence completes, and any future listeners will instead be
attach directly to the real fence avoiding any indirection overhead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/dma-buf/Makefile             |  13 +-
 drivers/dma-buf/dma-fence-private.h  |  20 +
 drivers/dma-buf/dma-fence-proxy.c    | 255 ++++++++++++
 drivers/dma-buf/dma-fence.c          |   4 +-
 drivers/dma-buf/selftests.h          |   1 +
 drivers/dma-buf/st-dma-fence-proxy.c | 581 +++++++++++++++++++++++++++
 include/linux/dma-fence-proxy.h      |  34 ++
 7 files changed, 904 insertions(+), 4 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-private.h
 create mode 100644 drivers/dma-buf/dma-fence-proxy.c
 create mode 100644 drivers/dma-buf/st-dma-fence-proxy.c
 create mode 100644 include/linux/dma-fence-proxy.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 995e05f609ff..afaf6dadd9a3 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
-	 dma-resv.o seqno-fence.o
+obj-y := \
+	dma-buf.o \
+	dma-fence.o \
+	dma-fence-array.o \
+	dma-fence-chain.o \
+	dma-fence-proxy.o \
+	dma-resv.o \
+	seqno-fence.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= dma-heap.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= heaps/
 obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
@@ -10,6 +16,7 @@ obj-$(CONFIG_UDMABUF)		+= udmabuf.o
 dmabuf_selftests-y := \
 	selftest.o \
 	st-dma-fence.o \
-	st-dma-fence-chain.o
+	st-dma-fence-chain.o \
+	st-dma-fence-proxy.o
 
 obj-$(CONFIG_DMABUF_SELFTESTS)	+= dmabuf_selftests.o
diff --git a/drivers/dma-buf/dma-fence-private.h b/drivers/dma-buf/dma-fence-private.h
new file mode 100644
index 000000000000..6924d28af0fa
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-private.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Fence mechanism for dma-buf and to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Maarten Lankhorst <maarten.lankhorst@canonical.com>
+ */
+
+#ifndef DMA_FENCE_PRIVATE_H
+#define DMA_FENCE_PRIAVTE_H
+
+struct dma_fence;
+
+bool __dma_fence_enable_signaling(struct dma_fence *fence);
+
+#endif /* DMA_FENCE_PRIAVTE_H */
diff --git a/drivers/dma-buf/dma-fence-proxy.c b/drivers/dma-buf/dma-fence-proxy.c
new file mode 100644
index 000000000000..4cdb1c4763ab
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-proxy.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * dma-fence-proxy: placeholder unsignaled fence
+ *
+ * Copyright (C) 2017-2019 Intel Corporation
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/dma-fence-proxy.h>
+#include <linux/export.h>
+#include <linux/irq_work.h>
+#include <linux/slab.h>
+
+#include "dma-fence-private.h"
+
+struct dma_fence_proxy {
+	struct dma_fence base;
+
+	struct dma_fence *real;
+	struct dma_fence_cb cb;
+	struct irq_work work;
+
+	wait_queue_head_t wq;
+};
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define same_lockclass(A, B) (A)->dep_map.key == (B)->dep_map.key
+#else
+#define same_lockclass(A, B) 0
+#endif
+
+static const char *proxy_get_driver_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+
+	return real ? real->ops->get_driver_name(real) : "proxy";
+}
+
+static const char *proxy_get_timeline_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+
+	return real ? real->ops->get_timeline_name(real) : "unset";
+}
+
+static void proxy_irq_work(struct irq_work *work)
+{
+	struct dma_fence_proxy *p = container_of(work, typeof(*p), work);
+
+	dma_fence_signal(&p->base);
+	dma_fence_put(&p->base);
+}
+
+static void proxy_callback(struct dma_fence *real, struct dma_fence_cb *cb)
+{
+	struct dma_fence_proxy *p = container_of(cb, typeof(*p), cb);
+
+	if (real->error)
+		dma_fence_set_error(&p->base, real->error);
+
+	/* Lower the height of the proxy chain -> single stack frame */
+	irq_work_queue(&p->work);
+}
+
+static bool proxy_enable_signaling(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+	bool ret = true;
+
+	if (real) {
+		spin_lock_nested(real->lock,
+				 same_lockclass(&p->wq.lock, real->lock));
+		ret = __dma_fence_enable_signaling(real);
+		spin_unlock(real->lock);
+	}
+
+	return ret;
+}
+
+static void proxy_release(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	dma_fence_put(p->real);
+	dma_fence_free(&p->base);
+}
+
+const struct dma_fence_ops dma_fence_proxy_ops = {
+	.get_driver_name = proxy_get_driver_name,
+	.get_timeline_name = proxy_get_timeline_name,
+	.enable_signaling = proxy_enable_signaling,
+	.wait = dma_fence_default_wait,
+	.release = proxy_release,
+};
+EXPORT_SYMBOL_GPL(dma_fence_proxy_ops);
+
+/**
+ * dma_fence_create_proxy - Create an unset dma-fence
+ *
+ * dma_fence_create_proxy() creates a new dma_fence stub that is initially
+ * unsignaled and may later be replaced with a real fence. Any listeners
+ * to the proxy fence will be signaled when the target fence signals its
+ * completion.
+ */
+struct dma_fence *dma_fence_create_proxy(void)
+{
+	struct dma_fence_proxy *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return NULL;
+
+	init_waitqueue_head(&p->wq);
+	dma_fence_init(&p->base, &dma_fence_proxy_ops, &p->wq.lock,
+		       dma_fence_context_alloc(1), 0);
+	init_irq_work(&p->work, proxy_irq_work);
+
+	return &p->base;
+}
+EXPORT_SYMBOL(dma_fence_create_proxy);
+
+static void wrap_signal_locked(struct dma_fence *fence, struct dma_fence *real)
+{
+	if (real->error)
+		dma_fence_set_error(fence, real->error);
+	dma_fence_signal_locked(fence);
+}
+
+static void __wake_up_listeners(struct dma_fence_proxy *p)
+{
+	struct wait_queue_entry *wait, *next;
+
+	list_for_each_entry_safe(wait, next, &p->wq.head, entry) {
+		INIT_LIST_HEAD(&wait->entry);
+		wait->func(wait, TASK_NORMAL, 0, p->real);
+	}
+}
+
+static void proxy_assign(struct dma_fence *fence, struct dma_fence *real)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	unsigned long flags;
+
+	if (WARN_ON(fence == real))
+		return;
+
+	if (WARN_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
+		return;
+
+	if (WARN_ON(p->real))
+		return;
+
+	spin_lock_irqsave(&p->wq.lock, flags);
+
+	if (unlikely(!real)) {
+		dma_fence_signal_locked(&p->base);
+		goto unlock;
+	}
+
+	p->real = dma_fence_get(real);
+
+	spin_lock_nested(real->lock, same_lockclass(&p->wq.lock, real->lock));
+	if (dma_fence_is_signaled(real)) {
+		wrap_signal_locked(&p->base, real);
+	} else if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &p->base.flags) &&
+		   !__dma_fence_enable_signaling(real)) {
+		wrap_signal_locked(&p->base, real);
+	} else {
+		dma_fence_get(&p->base);
+		p->cb.func = proxy_callback;
+		list_add_tail(&p->cb.node, &real->cb_list);
+	}
+	spin_unlock(real->lock);
+
+unlock:
+	__wake_up_listeners(p);
+	spin_unlock_irqrestore(&p->wq.lock, flags);
+}
+
+/**
+ * dma_fence_replace_proxy - Replace the proxy fence with the real target
+ * @slot: pointer to location of fence to update
+ * @fence: the new fence to store in @slot
+ *
+ * Once the real dma_fence is known, we can replace the proxy fence holder
+ * with a pointer to the real dma fence. Future listeners will attach to
+ * the real fence, avoiding any indirection overhead. Previous listeners
+ * will remain attached to the proxy fence, and be signaled in turn when
+ * the target fence completes.
+ */
+struct dma_fence *
+dma_fence_replace_proxy(struct dma_fence __rcu **slot, struct dma_fence *fence)
+{
+	struct dma_fence *old;
+
+	if (fence)
+		dma_fence_get(fence);
+
+	old = rcu_replace_pointer(*slot, fence, true);
+	if (old && dma_fence_is_proxy(old))
+		proxy_assign(old, fence);
+
+	return old;
+}
+EXPORT_SYMBOL(dma_fence_replace_proxy);
+
+void dma_fence_add_proxy_listener(struct dma_fence *fence,
+				  struct wait_queue_entry *wait)
+{
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+		unsigned long flags;
+
+		spin_lock_irqsave(&p->wq.lock, flags);
+		if (!dma_fence_is_signaled(&p->base)) {
+			list_add_tail(&wait->entry, &p->wq.head);
+			wait = NULL;
+		}
+		fence = p->real;
+		spin_unlock_irqrestore(&p->wq.lock, flags);
+	}
+
+	if (wait) {
+		INIT_LIST_HEAD(&wait->entry);
+		wait->func(wait, TASK_NORMAL, 0, fence);
+	}
+}
+EXPORT_SYMBOL(dma_fence_add_proxy_listener);
+
+bool dma_fence_remove_proxy_listener(struct dma_fence *fence,
+				     struct wait_queue_entry *wait)
+{
+	bool ret = false;
+
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+		unsigned long flags;
+
+		spin_lock_irqsave(&p->wq.lock, flags);
+		if (!list_empty(&wait->entry)) {
+			list_del_init(&wait->entry);
+			ret = true;
+		}
+		spin_unlock_irqrestore(&p->wq.lock, flags);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(dma_fence_remove_proxy_listener);
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 052a41e2451c..fa7bedc6703d 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -19,6 +19,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/dma_fence.h>
 
+#include "dma-fence-private.h"
+
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
@@ -273,7 +275,7 @@ void dma_fence_free(struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_fence_free);
 
-static bool __dma_fence_enable_signaling(struct dma_fence *fence)
+bool __dma_fence_enable_signaling(struct dma_fence *fence)
 {
 	bool was_set;
 
diff --git a/drivers/dma-buf/selftests.h b/drivers/dma-buf/selftests.h
index 55918ef9adab..616eca70e2d8 100644
--- a/drivers/dma-buf/selftests.h
+++ b/drivers/dma-buf/selftests.h
@@ -12,3 +12,4 @@
 selftest(sanitycheck, __sanitycheck__) /* keep first (igt selfcheck) */
 selftest(dma_fence, dma_fence)
 selftest(dma_fence_chain, dma_fence_chain)
+selftest(dma_fence_proxy, dma_fence_proxy)
diff --git a/drivers/dma-buf/st-dma-fence-proxy.c b/drivers/dma-buf/st-dma-fence-proxy.c
new file mode 100644
index 000000000000..658f6b90abc4
--- /dev/null
+++ b/drivers/dma-buf/st-dma-fence-proxy.c
@@ -0,0 +1,581 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-fence-proxy.h>
+#include <linux/kernel.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "selftest.h"
+
+static struct kmem_cache *slab_fences;
+
+static struct mock_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+} *to_mock_fence(struct dma_fence *f) {
+	return container_of(f, struct mock_fence, base);
+}
+
+static const char *mock_name(struct dma_fence *f)
+{
+	return "mock";
+}
+
+static void mock_fence_release(struct dma_fence *f)
+{
+	kmem_cache_free(slab_fences, to_mock_fence(f));
+}
+
+static const struct dma_fence_ops mock_ops = {
+	.get_driver_name = mock_name,
+	.get_timeline_name = mock_name,
+	.release = mock_fence_release,
+};
+
+static struct dma_fence *mock_fence(void)
+{
+	struct mock_fence *f;
+
+	f = kmem_cache_alloc(slab_fences, GFP_KERNEL);
+	if (!f)
+		return NULL;
+
+	spin_lock_init(&f->lock);
+	dma_fence_init(&f->base, &mock_ops, &f->lock, 0, 0);
+
+	return &f->base;
+}
+
+static int sanitycheck(void *arg)
+{
+	struct dma_fence *f;
+
+	f = dma_fence_create_proxy();
+	if (!f)
+		return -ENOMEM;
+
+	dma_fence_signal(f);
+	dma_fence_put(f);
+
+	return 0;
+}
+
+struct fences {
+	struct dma_fence *real;
+	struct dma_fence *proxy;
+	struct dma_fence __rcu *slot;
+};
+
+static int create_fences(struct fences *f, bool attach)
+{
+	f->proxy = dma_fence_create_proxy();
+	if (!f->proxy)
+		return -ENOMEM;
+
+	RCU_INIT_POINTER(f->slot, f->proxy);
+
+	f->real = mock_fence();
+	if (!f->real) {
+		dma_fence_put(f->proxy);
+		return -ENOMEM;
+	}
+
+	if (attach)
+		dma_fence_replace_proxy(&f->slot, f->real);
+
+	return 0;
+}
+
+static void free_fences(struct fences *f)
+{
+	dma_fence_put(dma_fence_replace_proxy(&f->slot, NULL));
+	dma_fence_put(f->real);
+	dma_fence_put(f->proxy);
+}
+
+static int wrap_signaling(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence unexpectedly signaled on creation\n");
+		goto err_free;
+	}
+
+	if (dma_fence_signal(f.real)) {
+		pr_err("Fence reported being already signaled\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence not reporting signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_signaling_recurse(void *arg)
+{
+	struct fences f;
+	struct dma_fence *chain;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	chain = dma_fence_create_proxy();
+	if (!chain) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, chain);
+	dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real));
+	dma_fence_put(chain);
+
+	/* f.real <- chain <- f.proxy */
+
+	if (dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence unexpectedly signaled on creation\n");
+		goto err_free;
+	}
+
+	if (dma_fence_signal(f.real)) {
+		pr_err("Fence reported being already signaled\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence not reporting signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+struct simple_cb {
+	struct dma_fence_cb cb;
+	bool seen;
+};
+
+static void simple_callback(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+	smp_store_mb(container_of(cb, struct simple_cb, cb)->seen, true);
+}
+
+static int wrap_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_add_callback_recurse(void *arg)
+{
+	struct simple_cb cb = {};
+	struct dma_fence *chain;
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	chain = dma_fence_create_proxy();
+	if (!chain) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, chain);
+	dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real));
+	dma_fence_put(chain);
+
+	/* f.real <- chain <- f.proxy */
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_late_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	dma_fence_signal(f.real);
+
+	if (!dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Added callback, but fence was already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (cb.seen) {
+		pr_err("Callback called after failed attachment!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback_late(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_signal(f.real);
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback_early(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_rm_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_remove_callback(f.proxy, &cb.cb)) {
+		pr_err("Failed to remove callback!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (cb.seen) {
+		pr_err("Callback still signaled after removal!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_late_rm_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	if (dma_fence_remove_callback(f.proxy, &cb.cb)) {
+		pr_err("Callback removal succeed after being executed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_status(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_get_status(f.proxy)) {
+		pr_err("Fence unexpectedly has signaled status on creation\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!dma_fence_get_status(f.proxy)) {
+		pr_err("Fence not reporting signaled status\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_error(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	dma_fence_set_error(f.real, -EIO);
+
+	if (dma_fence_get_status(f.proxy)) {
+		pr_err("Fence unexpectedly has error status before signal\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (dma_fence_get_status(f.proxy) != -EIO) {
+		pr_err("Fence not reporting error status, got %d\n",
+		       dma_fence_get_status(f.proxy));
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_wait(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_wait_timeout(f.proxy, false, 0) != 0) {
+		pr_err("Wait reported complete before being signaled\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+
+	if (dma_fence_wait_timeout(f.proxy, false, 0) == 0) {
+		pr_err("Wait reported incomplete after being signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+struct wait_timer {
+	struct timer_list timer;
+	struct fences f;
+};
+
+static void wait_timer(struct timer_list *timer)
+{
+	struct wait_timer *wt = from_timer(wt, timer, timer);
+
+	dma_fence_signal(wt->f.real);
+}
+
+static int wrap_wait_timeout(void *arg)
+{
+	struct wait_timer wt;
+	int err = -EINVAL;
+
+	if (create_fences(&wt.f, true))
+		return -ENOMEM;
+
+	timer_setup_on_stack(&wt.timer, wait_timer, 0);
+
+	if (dma_fence_wait_timeout(wt.f.proxy, false, 1) != 0) {
+		pr_err("Wait reported complete before being signaled\n");
+		goto err_free;
+	}
+
+	mod_timer(&wt.timer, jiffies + 1);
+
+	if (dma_fence_wait_timeout(wt.f.proxy, false, 2) != 0) {
+		if (timer_pending(&wt.timer)) {
+			pr_notice("Timer did not fire within the jiffie!\n");
+			err = 0; /* not our fault! */
+		} else {
+			pr_err("Wait reported incomplete after timeout\n");
+		}
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	del_timer_sync(&wt.timer);
+	destroy_timer_on_stack(&wt.timer);
+	dma_fence_signal(wt.f.real);
+	free_fences(&wt.f);
+	return err;
+}
+
+int dma_fence_proxy(void)
+{
+	static const struct subtest tests[] = {
+		SUBTEST(sanitycheck),
+		SUBTEST(wrap_signaling),
+		SUBTEST(wrap_signaling_recurse),
+		SUBTEST(wrap_add_callback),
+		SUBTEST(wrap_add_callback_recurse),
+		SUBTEST(wrap_late_add_callback),
+		SUBTEST(wrap_early_add_callback),
+		SUBTEST(wrap_early_add_callback_late),
+		SUBTEST(wrap_early_add_callback_early),
+		SUBTEST(wrap_rm_callback),
+		SUBTEST(wrap_late_rm_callback),
+		SUBTEST(wrap_status),
+		SUBTEST(wrap_error),
+		SUBTEST(wrap_wait),
+		SUBTEST(wrap_wait_timeout),
+	};
+	int ret;
+
+	slab_fences = KMEM_CACHE(mock_fence,
+				 SLAB_TYPESAFE_BY_RCU |
+				 SLAB_HWCACHE_ALIGN);
+	if (!slab_fences)
+		return -ENOMEM;
+
+	ret = subtests(tests, NULL);
+
+	kmem_cache_destroy(slab_fences);
+
+	return ret;
+}
diff --git a/include/linux/dma-fence-proxy.h b/include/linux/dma-fence-proxy.h
new file mode 100644
index 000000000000..063cde6b42c4
--- /dev/null
+++ b/include/linux/dma-fence-proxy.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * dma-fence-proxy: allows waiting upon unset and future fences
+ *
+ * Copyright (C) 2017 Intel Corporation
+ */
+
+#ifndef __LINUX_DMA_FENCE_PROXY_H
+#define __LINUX_DMA_FENCE_PROXY_H
+
+#include <linux/kernel.h>
+#include <linux/dma-fence.h>
+
+struct wait_queue_entry;
+
+extern const struct dma_fence_ops dma_fence_proxy_ops;
+
+struct dma_fence *dma_fence_create_proxy(void);
+
+static inline bool dma_fence_is_proxy(struct dma_fence *fence)
+{
+	return fence->ops == &dma_fence_proxy_ops;
+}
+
+struct dma_fence *
+dma_fence_replace_proxy(struct dma_fence __rcu **slot,
+			struct dma_fence *fence);
+
+void dma_fence_add_proxy_listener(struct dma_fence *fence,
+				  struct wait_queue_entry *wait);
+bool dma_fence_remove_proxy_listener(struct dma_fence *fence,
+				     struct wait_queue_entry *wait);
+
+#endif /* __LINUX_DMA_FENCE_PROXY_H */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 08/14] drm/syncobj: Allow use of dma-fence-proxy
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (5 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 07/14] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 09/14] drm/i915/gem: Teach execbuf how to wait on future syncobj Chris Wilson
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Allow the callers to supply a dma-fence-proxy for asynchronous waiting on
future fences.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_syncobj.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 42d46414f767..e141db0e1eb6 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -184,6 +184,7 @@
  */
 
 #include <linux/anon_inodes.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/sched/signal.h>
@@ -324,14 +325,9 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 	struct dma_fence *old_fence;
 	struct syncobj_wait_entry *cur, *tmp;
 
-	if (fence)
-		dma_fence_get(fence);
-
 	spin_lock(&syncobj->lock);
 
-	old_fence = rcu_dereference_protected(syncobj->fence,
-					      lockdep_is_held(&syncobj->lock));
-	rcu_assign_pointer(syncobj->fence, fence);
+	old_fence = dma_fence_replace_proxy(&syncobj->fence, fence);
 
 	if (fence != old_fence) {
 		list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 09/14] drm/i915/gem: Teach execbuf how to wait on future syncobj
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (6 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 08/14] drm/syncobj: Allow use of dma-fence-proxy Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 10/14] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If a syncobj has not yet been assigned, treat it as a future fence and
install and wait upon a dma-fence-proxy. The proxy will be replace by
the real fence later, and that fence will be responsible for signaling
our waiter.

Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 21 +++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 44d7da0e200e..8b854f87a249 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/intel-iommu.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/dma-resv.h>
 #include <linux/sync_file.h>
 #include <linux/uaccess.h>
@@ -2524,8 +2525,24 @@ await_fence_array(struct i915_execbuffer *eb,
 			continue;
 
 		fence = drm_syncobj_fence_get(syncobj);
-		if (!fence)
-			return -EINVAL;
+		if (!fence) {
+			struct dma_fence *old;
+
+			fence = dma_fence_create_proxy();
+			if (!fence)
+				return -ENOMEM;
+
+			spin_lock(&syncobj->lock);
+			old = rcu_dereference_protected(syncobj->fence, true);
+			if (unlikely(old)) {
+				dma_fence_put(fence);
+				fence = dma_fence_get(old);
+			} else {
+				rcu_assign_pointer(syncobj->fence,
+						   dma_fence_get(fence));
+			}
+			spin_unlock(&syncobj->lock);
+		}
 
 		err = i915_request_await_dma_fence(eb->request, fence);
 		dma_fence_put(fence);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 10/14] drm/i915/gem: Allow combining submit-fences with syncobj
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (7 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 09/14] drm/i915/gem: Teach execbuf how to wait on future syncobj Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 11/14] drm/i915/gt: Declare when we enabled timeslicing Chris Wilson
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

We allow exported sync_file fences to be used as submit fences, but they
are not the only source of user fences. We also accept an array of
syncobj, and as with sync_file these are dma_fences underneath and so
feature the same set of controls. The submit-fence allows for a request
to be scheduled at the same time as the signaler, rather than as normal
after. Userspace can combine submit-fence with its own semaphores for
intra-batch scheduling.

Not exposing submit-fences to syncobj was at the time just a matter of
pragmatic expediency.

Fixes: a88b6e4cbafd ("drm/i915: Allow specification of parallel execbuf")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  14 ++-
 drivers/gpu/drm/i915/i915_request.c           | 110 ++++++++++++++++++
 include/uapi/drm/i915_drm.h                   |   7 +-
 3 files changed, 123 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 8b854f87a249..67ba33b3de60 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2432,7 +2432,7 @@ static void
 __free_fence_array(struct drm_syncobj **fences, unsigned int n)
 {
 	while (n--)
-		drm_syncobj_put(ptr_mask_bits(fences[n], 2));
+		drm_syncobj_put(ptr_mask_bits(fences[n], 3));
 	kvfree(fences);
 }
 
@@ -2489,7 +2489,7 @@ get_fence_array(struct drm_i915_gem_execbuffer2 *args,
 		BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
 			     ~__I915_EXEC_FENCE_UNKNOWN_FLAGS);
 
-		fences[n] = ptr_pack_bits(syncobj, fence.flags, 2);
+		fences[n] = ptr_pack_bits(syncobj, fence.flags, 3);
 	}
 
 	return fences;
@@ -2520,7 +2520,7 @@ await_fence_array(struct i915_execbuffer *eb,
 		struct dma_fence *fence;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n], &flags, 3);
 		if (!(flags & I915_EXEC_FENCE_WAIT))
 			continue;
 
@@ -2544,7 +2544,11 @@ await_fence_array(struct i915_execbuffer *eb,
 			spin_unlock(&syncobj->lock);
 		}
 
-		err = i915_request_await_dma_fence(eb->request, fence);
+		if (flags & I915_EXEC_FENCE_WAIT_SUBMIT)
+			err = i915_request_await_execution(eb->request, fence,
+							   eb->engine->bond_execute);
+		else
+			err = i915_request_await_dma_fence(eb->request, fence);
 		dma_fence_put(fence);
 		if (err < 0)
 			return err;
@@ -2565,7 +2569,7 @@ signal_fence_array(struct i915_execbuffer *eb,
 		struct drm_syncobj *syncobj;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n], &flags, 3);
 		if (!(flags & I915_EXEC_FENCE_SIGNAL))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 95edc5523a01..248efbc01224 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -23,6 +23,7 @@
  */
 
 #include <linux/dma-fence-array.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/irq_work.h>
 #include <linux/prefetch.h>
 #include <linux/sched.h>
@@ -1213,6 +1214,110 @@ __i915_request_await_execution(struct i915_request *to,
 					     &from->fence);
 }
 
+struct execution_proxy {
+	struct wait_queue_entry base;
+	void (*hook)(struct i915_request *rq,
+		     struct dma_fence *signal);
+	struct i915_request *request;
+	struct dma_fence *fence;
+	struct timer_list timer;
+	struct work_struct work;
+};
+
+static void execution_proxy_work(struct work_struct *work)
+{
+	struct execution_proxy *wait = container_of(work, typeof(*wait), work);
+	struct i915_request *rq = wait->request;
+
+	del_timer_sync(&wait->timer);
+
+	if (wait->fence) {
+		int err;
+
+		mutex_lock(&rq->context->timeline->mutex);
+
+		if (dma_fence_is_i915(wait->fence))
+			err = __i915_request_await_execution(rq,
+							     to_request(wait->fence),
+							     wait->hook);
+		else
+			err = i915_sw_fence_await_dma_fence(&rq->submit,
+							    wait->fence,
+							    I915_FENCE_TIMEOUT,
+							    GFP_KERNEL);
+		if (err < 0)
+			i915_request_set_error_once(rq, err);
+
+		mutex_unlock(&rq->context->timeline->mutex);
+	}
+
+	i915_sw_fence_complete(&rq->submit);
+
+	dma_fence_put(wait->fence);
+	kfree(wait);
+}
+
+static int
+execution_proxy_wake(struct wait_queue_entry *entry,
+		     unsigned int mode,
+		     int flags,
+		     void *fence)
+{
+	struct execution_proxy *wait = container_of(entry, typeof(*wait), base);
+
+	wait->fence = dma_fence_get(fence);
+	schedule_work(&wait->work);
+
+	return 0;
+}
+
+static void
+execution_proxy_timer(struct timer_list *t)
+{
+	struct execution_proxy *wait = container_of(t, typeof(*wait), timer);
+
+	if (dma_fence_remove_proxy_listener(wait->base.private, &wait->base)) {
+		struct i915_request *rq = wait->request;
+
+		pr_notice("Asynchronous wait on proxy fence for %s:%s:%llx timed out\n",
+			  rq->fence.ops->get_driver_name(&rq->fence),
+			  rq->fence.ops->get_timeline_name(&rq->fence),
+			  rq->fence.seqno);
+		i915_request_set_error_once(rq, -ETIMEDOUT);
+
+		schedule_work(&wait->work);
+	}
+}
+
+static int
+__i915_request_await_proxy_execution(struct i915_request *rq,
+				     struct dma_fence *fence,
+				     unsigned long timeout,
+				     void (*hook)(struct i915_request *rq,
+						  struct dma_fence *signal))
+{
+	struct execution_proxy *wait;
+
+	wait = kzalloc(sizeof(*wait), GFP_KERNEL);
+	if (!wait)
+		return -ENOMEM;
+
+	i915_sw_fence_await(&rq->submit);
+
+	wait->base.private = fence;
+	wait->base.func = execution_proxy_wake;
+	wait->request = rq;
+	wait->hook = hook;
+	INIT_WORK(&wait->work, execution_proxy_work);
+
+	timer_setup(&wait->timer, execution_proxy_timer, 0);
+	if (timeout)
+		mod_timer(&wait->timer, round_jiffies_up(jiffies + timeout));
+
+	dma_fence_add_proxy_listener(fence, &wait->base);
+	return 0;
+}
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1249,6 +1354,11 @@ i915_request_await_execution(struct i915_request *rq,
 			ret = __i915_request_await_execution(rq,
 							     to_request(fence),
 							     hook);
+		else if (dma_fence_is_proxy(fence))
+			ret = __i915_request_await_proxy_execution(rq,
+								   fence,
+								   I915_FENCE_TIMEOUT,
+								   hook);
 		else
 			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
 							    I915_FENCE_TIMEOUT,
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 14b67cd6b54b..704dd0e3bc1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1040,9 +1040,10 @@ struct drm_i915_gem_exec_fence {
 	 */
 	__u32 handle;
 
-#define I915_EXEC_FENCE_WAIT            (1<<0)
-#define I915_EXEC_FENCE_SIGNAL          (1<<1)
-#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
+#define I915_EXEC_FENCE_WAIT            (1u << 0)
+#define I915_EXEC_FENCE_SIGNAL          (1u << 1)
+#define I915_EXEC_FENCE_WAIT_SUBMIT     (1u << 2)
+#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_WAIT_SUBMIT << 1))
 	__u32 flags;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 11/14] drm/i915/gt: Declare when we enabled timeslicing
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (8 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 10/14] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 12/14] drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT Chris Wilson
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke, Chris Wilson

Let userspace know if they can trust timeslicing by including it as part
of the I915_PARAM_HAS_SCHEDULER::I915_SCHEDULER_CAP_TIMESLICING

v2: Only declare timeslicing if we can safely preempt userspace.

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 1 +
 include/uapi/drm/i915_drm.h                 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 848decee9066..8415511f1465 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -98,6 +98,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
 		MAP(HAS_PREEMPTION, PREEMPTION),
 		MAP(HAS_SEMAPHORES, SEMAPHORES),
 		MAP(SUPPORTS_STATS, ENGINE_BUSY_STATS),
+		MAP(HAS_TIMESLICES, TIMESLICING),
 #undef MAP
 	};
 	struct intel_engine_cs *engine;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 704dd0e3bc1d..1ee227b5131a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -523,6 +523,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
+#define   I915_SCHEDULER_CAP_TIMESLICING	(1ul << 5)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 12/14] drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (9 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 11/14] drm/i915/gt: Declare when we enabled timeslicing Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 13/14] drm/i915: Drop I915_RESET_TIMEOUT and friends Chris Wilson
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Expose the hardcoded timeout for unsignaled foreign fences as a Kconfig
option, primarily to allow brave systems to disable the timeout and
solely rely on correct signaling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile           | 12 ++++++++++++
 drivers/gpu/drm/i915/Makefile                  |  1 +
 drivers/gpu/drm/i915/display/intel_display.c   |  5 +++--
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c    |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_client_blt.c |  3 +--
 drivers/gpu/drm/i915/gem/i915_gem_fence.c      |  4 ++--
 drivers/gpu/drm/i915/i915_config.c             | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_drv.h                |  2 +-
 drivers/gpu/drm/i915/i915_request.c            |  8 ++++----
 9 files changed, 40 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_config.c

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 0bfd276c19fe..3925be65d314 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -1,3 +1,15 @@
+config DRM_I915_FENCE_TIMEOUT
+	int "Timeout for unsignaled foreign fences"
+	default 10000 # milliseconds
+	help
+	  When listening to a foreign fence, we install a supplementary timer
+	  to ensure that we are always signaled and our userspace is able to
+	  make forward progress. This value specifies the timeout used for an
+	  unsignaled foreign fence.
+
+	  May be 0 to disable the timeout, and rely on the foreign fence being
+	  eventually signaled.
+
 config DRM_I915_USERFAULT_AUTOSUSPEND
 	int "Runtime autosuspend delay for userspace GGTT mmaps (ms)"
 	default 250 # milliseconds
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 5359c736c789..b0da6ea6e3f1 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -35,6 +35,7 @@ subdir-ccflags-y += -I$(srctree)/$(src)
 
 # core driver code
 i915-y += i915_drv.o \
+	  i915_config.o \
 	  i915_irq.o \
 	  i915_getparam.o \
 	  i915_params.o \
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 2a17cf38d3dc..7b3578cfcacc 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -15815,7 +15815,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 	if (new_plane_state->uapi.fence) { /* explicit fencing */
 		ret = i915_sw_fence_await_dma_fence(&state->commit_ready,
 						    new_plane_state->uapi.fence,
-						    I915_FENCE_TIMEOUT,
+						    i915_fence_timeout(dev_priv),
 						    GFP_KERNEL);
 		if (ret < 0)
 			return ret;
@@ -15842,7 +15842,8 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 
 		ret = i915_sw_fence_await_reservation(&state->commit_ready,
 						      obj->base.resv, NULL,
-						      false, I915_FENCE_TIMEOUT,
+						      false,
+						      i915_fence_timeout(dev_priv),
 						      GFP_KERNEL);
 		if (ret < 0)
 			goto unpin_fb;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
index 34be4c0ee7c5..bc0223716906 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c
@@ -108,7 +108,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	if (clflush) {
 		i915_sw_fence_await_reservation(&clflush->base.chain,
 						obj->base.resv, NULL, true,
-						I915_FENCE_TIMEOUT,
+						i915_fence_timeout(to_i915(obj->base.dev)),
 						I915_FENCE_GFP);
 		dma_resv_add_excl_fence(obj->base.resv, &clflush->base.dma);
 		dma_fence_work_commit(&clflush->base);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
index 3a146aa2593b..d3a86a4d5c04 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c
@@ -288,8 +288,7 @@ int i915_gem_schedule_fill_pages_blt(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_lock(obj);
 	err = i915_sw_fence_await_reservation(&work->wait,
-					      obj->base.resv, NULL,
-					      true, I915_FENCE_TIMEOUT,
+					      obj->base.resv, NULL, true, 0,
 					      I915_FENCE_GFP);
 	if (err < 0) {
 		dma_fence_set_error(&work->dma, err);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_fence.c b/drivers/gpu/drm/i915/gem/i915_gem_fence.c
index 2f6100ec2608..8ab842c80f99 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_fence.c
@@ -72,8 +72,8 @@ i915_gem_object_lock_fence(struct drm_i915_gem_object *obj)
 		       0, 0);
 
 	if (i915_sw_fence_await_reservation(&stub->chain,
-					    obj->base.resv, NULL,
-					    true, I915_FENCE_TIMEOUT,
+					    obj->base.resv, NULL, true,
+					    i915_fence_timeout(to_i915(obj->base.dev)),
 					    I915_FENCE_GFP) < 0)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/i915_config.c b/drivers/gpu/drm/i915/i915_config.c
new file mode 100644
index 000000000000..c879d26369e1
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_config.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_drv.h"
+
+unsigned long i915_fence_timeout(const struct drm_i915_private *i915)
+{
+#if IS_ACTIVE(CONFIG_DRM_I915_FENCE_TIMEOUT)
+	return msecs_to_jiffies_timeout(CONFIG_DRM_I915_FENCE_TIMEOUT);
+#else
+	return 0;
+#endif
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6af69555733e..bc0e96c88821 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -614,8 +614,8 @@ struct i915_gem_mm {
 
 #define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
 
+unsigned long i915_fence_timeout(const struct drm_i915_private *i915);
 #define I915_RESET_TIMEOUT (10 * HZ) /* 10s */
-#define I915_FENCE_TIMEOUT (10 * HZ) /* 10s */
 
 #define I915_ENGINE_DEAD_TIMEOUT  (4 * HZ)  /* Seqno, head and subunits dead */
 #define I915_SEQNO_DEAD_TIMEOUT   (12 * HZ) /* Seqno dead with active head */
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 248efbc01224..ac78021ae4f0 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1114,7 +1114,7 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 			ret = i915_request_await_request(rq, to_request(fence));
 		else
 			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
-							    fence->context ? I915_FENCE_TIMEOUT : 0,
+							    fence->context ? i915_fence_timeout(rq->i915) : 0,
 							    I915_FENCE_GFP);
 		if (ret < 0)
 			return ret;
@@ -1243,7 +1243,7 @@ static void execution_proxy_work(struct work_struct *work)
 		else
 			err = i915_sw_fence_await_dma_fence(&rq->submit,
 							    wait->fence,
-							    I915_FENCE_TIMEOUT,
+							    i915_fence_timeout(rq->i915),
 							    GFP_KERNEL);
 		if (err < 0)
 			i915_request_set_error_once(rq, err);
@@ -1357,11 +1357,11 @@ i915_request_await_execution(struct i915_request *rq,
 		else if (dma_fence_is_proxy(fence))
 			ret = __i915_request_await_proxy_execution(rq,
 								   fence,
-								   I915_FENCE_TIMEOUT,
+								   i915_fence_timeout(rq->i915),
 								   hook);
 		else
 			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
-							    I915_FENCE_TIMEOUT,
+							    i915_fence_timeout(rq->i915),
 							    GFP_KERNEL);
 		if (ret < 0)
 			return ret;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 13/14] drm/i915: Drop I915_RESET_TIMEOUT and friends
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (10 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 12/14] drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 14/14] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT Chris Wilson
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

These were used to set various timeouts for the reset procedure
(deciding when the engine was dead, and even if the reset itself was not
making forward progress). No longer used.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bc0e96c88821..cdea08962ab5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -615,12 +615,6 @@ struct i915_gem_mm {
 #define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
 
 unsigned long i915_fence_timeout(const struct drm_i915_private *i915);
-#define I915_RESET_TIMEOUT (10 * HZ) /* 10s */
-
-#define I915_ENGINE_DEAD_TIMEOUT  (4 * HZ)  /* Seqno, head and subunits dead */
-#define I915_SEQNO_DEAD_TIMEOUT   (12 * HZ) /* Seqno dead with active head */
-
-#define I915_ENGINE_WEDGED_TIMEOUT  (60 * HZ)  /* Reset but no recovery? */
 
 /* Amount of SAGV/QGV points, BSpec precisely defines this */
 #define I915_NUM_QGV_POINTS 8
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 14/14] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (11 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 13/14] drm/i915: Drop I915_RESET_TIMEOUT and friends Chris Wilson
@ 2020-05-03 11:21 ` Chris Wilson
  2020-05-03 11:38 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2) Patchwork
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

This timeout is only used in one place, to provide a tiny bit of grace
for slow igt to cleanup after themselves. If we are a bit stricter and
opt to kill outstanding requsts rather than wait, we can speed up igt by
not waiting for 200ms after a hang.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 11 ++++++-----
 drivers/gpu/drm/i915/i915_drv.h     |  2 --
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8e98df6a3045..649acf1fc33d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1463,12 +1463,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
 {
 	int ret;
 
-	if (val & DROP_RESET_ACTIVE &&
-	    wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT))
-		intel_gt_set_wedged(gt);
+	if (val & (DROP_RETIRE | DROP_RESET_ACTIVE))
+		intel_gt_wait_for_idle(gt, 1);
 
-	if (val & DROP_RETIRE)
-		intel_gt_retire_requests(gt);
+	if (val & DROP_RESET_ACTIVE && intel_gt_pm_get_if_awake(gt)) {
+		intel_gt_set_wedged(gt);
+		intel_gt_pm_put(gt);
+	}
 
 	if (val & (DROP_IDLE | DROP_ACTIVE)) {
 		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cdea08962ab5..b12e9f39289e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -612,8 +612,6 @@ struct i915_gem_mm {
 	u32 shrink_count;
 };
 
-#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
-
 unsigned long i915_fence_timeout(const struct drm_i915_private *i915);
 
 /* Amount of SAGV/QGV points, BSpec precisely defines this */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 02/14] drm/i915/gem: Implement legacy MI_STORE_DATA_IMM Chris Wilson
@ 2020-05-03 11:25   ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-03 11:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but
left them writing to a physical address. The notes suggest that the
primary reason would be so that the writes were cache coherent, as the
CPU cache uses physical tagging. As such we did not implement the
legacy variant of MI_STORE_DATA_IMM and so left all the relocations
synchronous -- but with a small function to convert from the vma address
into the physical address, we can implement asynchronous relocs on these
older arches, fixing up a few tests that require them.

In order to be able to test the legacy paths, refactor the gpu
relocations so that we can hook them up to a selftest.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 204 ++++++++++-------
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 206 ++++++++++++++++++
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 3 files changed, 337 insertions(+), 74 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index ab0d4df13c0b..44d7da0e200e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -955,7 +955,7 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
 	cache->rq = NULL;
-	cache->rq_size = 0;
+	cache->target = NULL;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -1325,7 +1325,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
-			err = PTR_ERR(rq);
+			err = PTR_ERR(ce);
 			goto err_unpin;
 		}
 
@@ -1376,6 +1376,11 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	return err;
 }
 
+static bool can_store_dword(const struct intel_engine_cs *engine)
+{
+	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
+}
+
 static u32 *reloc_gpu(struct i915_execbuffer *eb,
 		      struct i915_vma *vma,
 		      unsigned int len)
@@ -1387,9 +1392,9 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 	if (unlikely(!cache->rq)) {
 		struct intel_engine_cs *engine = eb->engine;
 
-		if (!intel_engine_can_store_dword(engine)) {
+		if (!can_store_dword(engine)) {
 			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
-			if (!engine || !intel_engine_can_store_dword(engine))
+			if (!engine)
 				return ERR_PTR(-ENODEV);
 		}
 
@@ -1435,91 +1440,138 @@ static inline bool use_reloc_gpu(struct i915_vma *vma)
 	return !dma_resv_test_signaled_rcu(vma->resv, true);
 }
 
-static u64
-relocate_entry(struct i915_vma *vma,
-	       const struct drm_i915_gem_relocation_entry *reloc,
-	       struct i915_execbuffer *eb,
-	       const struct i915_vma *target)
+static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
 {
-	u64 offset = reloc->offset;
-	u64 target_offset = relocation_target(reloc, target);
-	bool wide = eb->reloc_cache.use_64bit_reloc;
-	void *vaddr;
+	struct page *page;
+	unsigned long addr;
 
-	if (!eb->reloc_cache.vaddr && use_reloc_gpu(vma)) {
-		const unsigned int gen = eb->reloc_cache.gen;
-		unsigned int len;
-		u32 *batch;
-		u64 addr;
+	GEM_BUG_ON(vma->pages != vma->obj->mm.pages);
 
-		if (wide)
-			len = offset & 7 ? 8 : 5;
-		else if (gen >= 4)
-			len = 4;
-		else
-			len = 3;
+	page = i915_gem_object_get_page(vma->obj, offset >> PAGE_SHIFT);
+	addr = PFN_PHYS(page_to_pfn(page));
+	GEM_BUG_ON(overflows_type(addr, u32)); /* expected dma32 */
 
-		batch = reloc_gpu(eb, vma, len);
-		if (IS_ERR(batch))
-			goto repeat;
+	return addr + offset_in_page(offset);
+}
+
+static bool __reloc_entry_gpu(struct i915_vma *vma,
+			      struct i915_execbuffer *eb,
+			      u64 offset,
+			      u64 target_addr)
+{
+	const unsigned int gen = eb->reloc_cache.gen;
+	unsigned int len;
+	u32 *batch;
+	u64 addr;
+
+	if (gen >= 8)
+		len = offset & 7 ? 8 : 5;
+	else if (gen >= 4)
+		len = 4;
+	else
+		len = 3;
+
+	batch = reloc_gpu(eb, vma, len);
+	if (IS_ERR(batch))
+		return false;
+
+	addr = gen8_canonical_addr(vma->node.start + offset);
+	if (gen >= 8) {
+		if (offset & 7) {
+			*batch++ = MI_STORE_DWORD_IMM_GEN4;
+			*batch++ = lower_32_bits(addr);
+			*batch++ = upper_32_bits(addr);
+			*batch++ = lower_32_bits(target_addr);
+
+			addr = gen8_canonical_addr(addr + 4);
 
-		addr = gen8_canonical_addr(vma->node.start + offset);
-		if (wide) {
-			if (offset & 7) {
-				*batch++ = MI_STORE_DWORD_IMM_GEN4;
-				*batch++ = lower_32_bits(addr);
-				*batch++ = upper_32_bits(addr);
-				*batch++ = lower_32_bits(target_offset);
-
-				addr = gen8_canonical_addr(addr + 4);
-
-				*batch++ = MI_STORE_DWORD_IMM_GEN4;
-				*batch++ = lower_32_bits(addr);
-				*batch++ = upper_32_bits(addr);
-				*batch++ = upper_32_bits(target_offset);
-			} else {
-				*batch++ = (MI_STORE_DWORD_IMM_GEN4 | (1 << 21)) + 1;
-				*batch++ = lower_32_bits(addr);
-				*batch++ = upper_32_bits(addr);
-				*batch++ = lower_32_bits(target_offset);
-				*batch++ = upper_32_bits(target_offset);
-			}
-		} else if (gen >= 6) {
 			*batch++ = MI_STORE_DWORD_IMM_GEN4;
-			*batch++ = 0;
-			*batch++ = addr;
-			*batch++ = target_offset;
-		} else if (gen >= 4) {
-			*batch++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
-			*batch++ = 0;
-			*batch++ = addr;
-			*batch++ = target_offset;
+			*batch++ = lower_32_bits(addr);
+			*batch++ = upper_32_bits(addr);
+			*batch++ = upper_32_bits(target_addr);
 		} else {
-			*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
-			*batch++ = addr;
-			*batch++ = target_offset;
+			*batch++ = (MI_STORE_DWORD_IMM_GEN4 | (1 << 21)) + 1;
+			*batch++ = lower_32_bits(addr);
+			*batch++ = upper_32_bits(addr);
+			*batch++ = lower_32_bits(target_addr);
+			*batch++ = upper_32_bits(target_addr);
 		}
-
-		goto out;
+	} else if (gen >= 6) {
+		*batch++ = MI_STORE_DWORD_IMM_GEN4;
+		*batch++ = 0;
+		*batch++ = addr;
+		*batch++ = target_addr;
+	} else if (IS_I965G(eb->i915)) {
+		*batch++ = MI_STORE_DWORD_IMM_GEN4;
+		*batch++ = 0;
+		*batch++ = vma_phys_addr(vma, offset);
+		*batch++ = target_addr;
+	} else if (gen >= 4) {
+		*batch++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*batch++ = 0;
+		*batch++ = addr;
+		*batch++ = target_addr;
+	} else if (gen >= 3 &&
+		   !(IS_I915G(eb->i915) || IS_I915GM(eb->i915))) {
+		*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
+		*batch++ = addr;
+		*batch++ = target_addr;
+	} else {
+		*batch++ = MI_STORE_DWORD_IMM;
+		*batch++ = vma_phys_addr(vma, offset);
+		*batch++ = target_addr;
 	}
 
+	return true;
+}
+
+static bool reloc_entry_gpu(struct i915_vma *vma,
+			    struct i915_execbuffer *eb,
+			    u64 offset,
+			    u64 target_addr)
+{
+	if (eb->reloc_cache.vaddr)
+		return false;
+
+	if (!use_reloc_gpu(vma))
+		return false;
+
+	return __reloc_entry_gpu(vma, eb, offset, target_addr);
+}
+
+static u64
+relocate_entry(struct i915_vma *vma,
+	       const struct drm_i915_gem_relocation_entry *reloc,
+	       struct i915_execbuffer *eb,
+	       const struct i915_vma *target)
+{
+	u64 target_addr = relocation_target(reloc, target);
+	u64 offset = reloc->offset;
+
+	if (!reloc_entry_gpu(vma, eb, offset, target_addr)) {
+		bool wide = eb->reloc_cache.use_64bit_reloc;
+		void *vaddr;
+
 repeat:
-	vaddr = reloc_vaddr(vma->obj, &eb->reloc_cache, offset >> PAGE_SHIFT);
-	if (IS_ERR(vaddr))
-		return PTR_ERR(vaddr);
+		vaddr = reloc_vaddr(vma->obj,
+				    &eb->reloc_cache,
+				    offset >> PAGE_SHIFT);
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
 
-	clflush_write32(vaddr + offset_in_page(offset),
-			lower_32_bits(target_offset),
-			eb->reloc_cache.vaddr);
+		GEM_BUG_ON(!IS_ALIGNED(offset, sizeof(u32)));
+		clflush_write32(vaddr + offset_in_page(offset),
+				lower_32_bits(target_addr),
+				eb->reloc_cache.vaddr);
 
-	if (wide) {
-		offset += sizeof(u32);
-		target_offset >>= 32;
-		wide = false;
-		goto repeat;
+		if (wide) {
+			offset += sizeof(u32);
+			target_addr >>= 32;
+			wide = false;
+			goto repeat;
+		}
 	}
 
-out:
 	return target->node.start | UPDATE;
 }
 
@@ -3022,3 +3074,7 @@ end:;
 	kvfree(exec2_list);
 	return err;
 }
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_gem_execbuffer.c"
+#endif
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
new file mode 100644
index 000000000000..985f9fbd0ba0
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_selftest.h"
+
+#include "gt/intel_engine_pm.h"
+#include "selftests/igt_flush_test.h"
+
+static void hexdump(const void *buf, size_t len)
+{
+	const size_t rowsize = 8 * sizeof(u32);
+	const void *prev = NULL;
+	bool skip = false;
+	size_t pos;
+
+	for (pos = 0; pos < len; pos += rowsize) {
+		char line[128];
+
+		if (prev && !memcmp(prev, buf + pos, rowsize)) {
+			if (!skip) {
+				pr_info("*\n");
+				skip = true;
+			}
+			continue;
+		}
+
+		WARN_ON_ONCE(hex_dump_to_buffer(buf + pos, len - pos,
+						rowsize, sizeof(u32),
+						line, sizeof(line),
+						false) >= sizeof(line));
+		pr_info("[%04zx] %s\n", pos, line);
+
+		prev = buf + pos;
+		skip = false;
+	}
+}
+
+static u64 read_reloc(const u32 *map, int x, const u64 mask)
+{
+	u64 reloc;
+
+	memcpy(&reloc, &map[x], sizeof(reloc));
+	return reloc & mask;
+}
+
+static int __igt_gpu_reloc(struct i915_execbuffer *eb,
+			   struct drm_i915_gem_object *obj)
+{
+	enum {
+		X = 0,
+		Y = 3,
+		Z = 8
+	};
+	const u64 mask = GENMASK_ULL(eb->reloc_cache.gen >= 8 ? 63 : 31, 0);
+	const u32 *map = page_mask_bits(obj->mm.mapping);
+	struct i915_request *rq;
+	struct i915_vma *vma;
+	int inc;
+	int err;
+
+	vma = i915_vma_instance(obj, eb->context->vm, NULL);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH);
+	if (err)
+		return err;
+
+	/* 8-Byte aligned */
+	if (!__reloc_entry_gpu(vma, eb, X * sizeof(u32), X)) {
+		err = -EIO;
+		goto unpin_vma;
+	}
+
+	/* !8-Byte aligned */
+	if (!__reloc_entry_gpu(vma, eb, Y * sizeof(u32), Y)) {
+		err = -EIO;
+		goto unpin_vma;
+	}
+
+	/* Skip to the end of the cmd page */
+	inc = PAGE_SIZE / sizeof(u32) - RELOC_TAIL - 1;
+	inc -= eb->reloc_cache.rq_size;
+	memset(eb->reloc_cache.rq_cmd + eb->reloc_cache.rq_size,
+	       0, inc * sizeof(u32));
+	eb->reloc_cache.rq_size += inc;
+
+	/* Force batch chaining */
+	if (!__reloc_entry_gpu(vma, eb, Z * sizeof(u32), Z)) {
+		err = -EIO;
+		goto unpin_vma;
+	}
+
+	GEM_BUG_ON(!eb->reloc_cache.rq);
+	rq = i915_request_get(eb->reloc_cache.rq);
+	err = reloc_gpu_flush(&eb->reloc_cache);
+	if (err)
+		goto put_rq;
+	GEM_BUG_ON(eb->reloc_cache.rq);
+
+	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
+	if (err) {
+		intel_gt_set_wedged(eb->engine->gt);
+		goto put_rq;
+	}
+
+	if (!i915_request_completed(rq)) {
+		pr_err("%s: did not wait for relocations!\n", eb->engine->name);
+		err = -EINVAL;
+		goto put_rq;
+	}
+
+	if (read_reloc(map, X, mask) != X) {
+		pr_err("%s[X]: map[%d] %llx != %x\n",
+		       eb->engine->name, X, read_reloc(map, X, mask), X);
+		err = -EINVAL;
+	}
+
+	if (read_reloc(map, Y, mask) != Y) {
+		pr_err("%s[Y]: map[%d] %llx != %x\n",
+		       eb->engine->name, Y, read_reloc(map, Y, mask), Y);
+		err = -EINVAL;
+	}
+
+	if (read_reloc(map, Z, mask) != Z) {
+		pr_err("%s[Z]: map[%d] %llx != %x\n",
+		       eb->engine->name, Z, read_reloc(map, Z, mask), Z);
+		err = -EINVAL;
+	}
+
+	if (err)
+		hexdump(map, 4096);
+
+put_rq:
+	i915_request_put(rq);
+unpin_vma:
+	i915_vma_unpin(vma);
+	return err;
+}
+
+static int igt_gpu_reloc(void *arg)
+{
+	struct i915_execbuffer eb;
+	struct drm_i915_gem_object *scratch;
+	int err = 0;
+	u32 *map;
+
+	eb.i915 = arg;
+
+	scratch = i915_gem_object_create_internal(eb.i915, 4096);
+	if (IS_ERR(scratch))
+		return PTR_ERR(scratch);
+
+	map = i915_gem_object_pin_map(scratch, I915_MAP_WC);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto err_scratch;
+	}
+
+	for_each_uabi_engine(eb.engine, eb.i915) {
+		reloc_cache_init(&eb.reloc_cache, eb.i915);
+		memset(map, POISON_INUSE, 4096);
+
+		intel_engine_pm_get(eb.engine);
+		eb.context = intel_context_create(eb.engine);
+		if (IS_ERR(eb.context)) {
+			err = PTR_ERR(eb.context);
+			goto err_pm;
+		}
+
+		err = intel_context_pin(eb.context);
+		if (err)
+			goto err_put;
+
+		err = __igt_gpu_reloc(&eb, scratch);
+
+		intel_context_unpin(eb.context);
+err_put:
+		intel_context_put(eb.context);
+err_pm:
+		intel_engine_pm_put(eb.engine);
+		if (err)
+			break;
+	}
+
+	if (igt_flush_test(eb.i915))
+		err = -EIO;
+
+err_scratch:
+	i915_gem_object_put(scratch);
+	return err;
+}
+
+int i915_gem_execbuffer_live_selftests(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(igt_gpu_reloc),
+	};
+
+	if (intel_gt_is_wedged(&i915->gt))
+		return 0;
+
+	return i915_live_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index 0a953bfc0585..5dd5d81646c4 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -37,6 +37,7 @@ selftest(gem, i915_gem_live_selftests)
 selftest(evict, i915_gem_evict_live_selftests)
 selftest(hugepages, i915_gem_huge_page_live_selftests)
 selftest(gem_contexts, i915_gem_context_live_selftests)
+selftest(gem_execbuf, i915_gem_execbuffer_live_selftests)
 selftest(blt, i915_gem_object_blt_live_selftests)
 selftest(client, i915_gem_client_blt_live_selftests)
 selftest(reset, intel_reset_live_selftests)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2)
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (12 preceding siblings ...)
  2020-05-03 11:21 ` [Intel-gfx] [PATCH 14/14] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT Chris Wilson
@ 2020-05-03 11:38 ` Patchwork
  2020-05-03 12:02 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-05-04  1:15 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  15 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-03 11:38 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2)
URL   : https://patchwork.freedesktop.org/series/76876/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
02c8f810d643 drm/i915/gem: Specify address type for chained reloc batches
-:37: WARNING:NO_AUTHOR_SIGN_OFF: Missing Signed-off-by: line by nominal patch author 'Chris Wilson <ickle@x201s.alporthouse.com>'

total: 0 errors, 1 warnings, 0 checks, 20 lines checked
8feebfd436da drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
-:285: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#285: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 466 lines checked
cc6b97ec7ae6 drm/i915/gt: Small tidy of gen8+ breadcrumb emission
c2e4e6b8593f drm/i915: Mark concurrent submissions with a weak-dependency
f4534f62acc6 drm/i915/selftests: Repeat the rps clock frequency measurement
-:68: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst
#68: FILE: drivers/gpu/drm/i915/gt/selftest_rps.c:311:
+				udelay(1000);

total: 0 errors, 0 warnings, 1 checks, 83 lines checked
c1b14a4db98f drm/i915/gt: Stop holding onto the pinned_default_state
9bc7d44c8f02 dma-buf: Proxy fence, an unsignaled fence placeholder
-:45: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#45: 
new file mode 100644

-:387: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#387: FILE: drivers/dma-buf/st-dma-fence-proxy.c:20:
+	spinlock_t lock;

-:547: WARNING:MEMORY_BARRIER: memory barrier without comment
#547: FILE: drivers/dma-buf/st-dma-fence-proxy.c:180:
+	smp_store_mb(container_of(cb, struct simple_cb, cb)->seen, true);

total: 0 errors, 2 warnings, 1 checks, 932 lines checked
9d9cd56fe002 drm/syncobj: Allow use of dma-fence-proxy
4cd92f858258 drm/i915/gem: Teach execbuf how to wait on future syncobj
bcb046982800 drm/i915/gem: Allow combining submit-fences with syncobj
b609443074e3 drm/i915/gt: Declare when we enabled timeslicing
695d8fbf606b drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT
-:111: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#111: 
new file mode 100644

-:154: WARNING:LONG_LINE: line over 100 characters
#154: FILE: drivers/gpu/drm/i915/i915_request.c:1117:
+							    fence->context ? i915_fence_timeout(rq->i915) : 0,

total: 0 errors, 2 warnings, 0 checks, 119 lines checked
d5ad7de1df33 drm/i915: Drop I915_RESET_TIMEOUT and friends
19be72a69bc5 drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2)
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (13 preceding siblings ...)
  2020-05-03 11:38 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2) Patchwork
@ 2020-05-03 12:02 ` Patchwork
  2020-05-04  1:15 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  15 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-03 12:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2)
URL   : https://patchwork.freedesktop.org/series/76876/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8414 -> Patchwork_17556
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/index.html

New tests
---------

  New tests have been introduced between CI_DRM_8414 and Patchwork_17556:

### New IGT tests (2) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 38 pass(s)
    - Exec time: [0.03, 0.09] s

  * igt@i915_selftest@live@gem_execbuf:
    - Statuses : 38 pass(s)
    - Exec time: [0.45, 2.74] s

  

Known issues
------------

  Here are the changes found in Patchwork_17556 that come from known issues:

### IGT changes ###

#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-x1275:       [FAIL][1] ([i915#62] / [i915#95]) -> [SKIP][2] ([fdo#109271])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (48 -> 41)
------------------------------

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8414 -> Patchwork_17556

  CI-20190529: 20190529
  CI_DRM_8414: a6f6a61a3cd126f52d1a80c463f4bb3d3dcc1813 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5626: f27fdfff026276ac75c69e487c929a843f66f6ca @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17556: 19be72a69bc5d8b0038d39120cd9bc1a78f97891 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

19be72a69bc5 drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
d5ad7de1df33 drm/i915: Drop I915_RESET_TIMEOUT and friends
695d8fbf606b drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT
b609443074e3 drm/i915/gt: Declare when we enabled timeslicing
bcb046982800 drm/i915/gem: Allow combining submit-fences with syncobj
4cd92f858258 drm/i915/gem: Teach execbuf how to wait on future syncobj
9d9cd56fe002 drm/syncobj: Allow use of dma-fence-proxy
9bc7d44c8f02 dma-buf: Proxy fence, an unsignaled fence placeholder
c1b14a4db98f drm/i915/gt: Stop holding onto the pinned_default_state
f4534f62acc6 drm/i915/selftests: Repeat the rps clock frequency measurement
c2e4e6b8593f drm/i915: Mark concurrent submissions with a weak-dependency
cc6b97ec7ae6 drm/i915/gt: Small tidy of gen8+ breadcrumb emission
8feebfd436da drm/i915/gem: Implement legacy MI_STORE_DATA_IMM
02c8f810d643 drm/i915/gem: Specify address type for chained reloc batches

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2)
  2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
                   ` (14 preceding siblings ...)
  2020-05-03 12:02 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-05-04  1:15 ` Patchwork
  15 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-04  1:15 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2)
URL   : https://patchwork.freedesktop.org/series/76876/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8414_full -> Patchwork_17556_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

New tests
---------

  New tests have been introduced between CI_DRM_8414_full and Patchwork_17556_full:

### New IGT tests (2) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 7 pass(s)
    - Exec time: [0.04, 0.11] s

  * igt@i915_selftest@live@gem_execbuf:
    - Statuses : 8 pass(s)
    - Exec time: [0.41, 2.50] s

  

Known issues
------------

  Here are the changes found in Patchwork_17556_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_params@invalid-bsd-ring:
    - shard-iclb:         [PASS][1] -> [SKIP][2] ([fdo#109276])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-iclb2/igt@gem_exec_params@invalid-bsd-ring.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-iclb6/igt@gem_exec_params@invalid-bsd-ring.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-skl:          [PASS][3] -> [DMESG-WARN][4] ([i915#716])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-skl5/igt@gen9_exec_parse@allowed-all.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-skl3/igt@gen9_exec_parse@allowed-all.html

  * igt@kms_cursor_crc@pipe-b-cursor-suspend:
    - shard-apl:          [PASS][5] -> [DMESG-WARN][6] ([i915#180]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-apl6/igt@kms_cursor_crc@pipe-b-cursor-suspend.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-apl1/igt@kms_cursor_crc@pipe-b-cursor-suspend.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-kbl:          [PASS][7] -> [DMESG-WARN][8] ([i915#180]) +5 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-kbl7/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-kbl3/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [PASS][9] -> [FAIL][10] ([i915#1188]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-skl8/igt@kms_hdr@bpc-switch-suspend.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-skl9/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min:
    - shard-skl:          [PASS][11] -> [FAIL][12] ([fdo#108145] / [i915#265])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-skl8/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-skl5/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min.html

  * igt@kms_plane_lowres@pipe-a-tiling-x:
    - shard-glk:          [PASS][13] -> [FAIL][14] ([i915#899])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-glk1/igt@kms_plane_lowres@pipe-a-tiling-x.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-glk4/igt@kms_plane_lowres@pipe-a-tiling-x.html

  * igt@kms_psr2_su@frontbuffer:
    - shard-iclb:         [PASS][15] -> [SKIP][16] ([fdo#109642] / [fdo#111068])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-iclb2/igt@kms_psr2_su@frontbuffer.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-iclb6/igt@kms_psr2_su@frontbuffer.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [PASS][17] -> [SKIP][18] ([fdo#109441]) +3 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-iclb6/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_setmode@basic:
    - shard-hsw:          [PASS][19] -> [FAIL][20] ([i915#31])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-hsw1/igt@kms_setmode@basic.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-hsw2/igt@kms_setmode@basic.html

  
#### Possible fixes ####

  * igt@gen9_exec_parse@allowed-all:
    - shard-apl:          [DMESG-WARN][21] ([i915#716]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-apl3/igt@gen9_exec_parse@allowed-all.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-apl6/igt@gen9_exec_parse@allowed-all.html

  * igt@kms_cursor_legacy@all-pipes-torture-bo:
    - shard-hsw:          [DMESG-WARN][23] ([i915#128]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-hsw4/igt@kms_cursor_legacy@all-pipes-torture-bo.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-hsw1/igt@kms_cursor_legacy@all-pipes-torture-bo.html

  * {igt@kms_flip@flip-vs-suspend-interruptible@a-dp1}:
    - shard-kbl:          [DMESG-WARN][25] ([i915#180]) -> [PASS][26] +9 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-kbl7/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [FAIL][27] ([fdo#108145] / [i915#265]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-skl8/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-skl5/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * {igt@perf@polling-parameterized}:
    - shard-hsw:          [FAIL][29] ([i915#1542]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-hsw6/igt@perf@polling-parameterized.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-hsw7/igt@perf@polling-parameterized.html

  
#### Warnings ####

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-kbl:          [FAIL][31] ([i915#93] / [i915#95]) -> [DMESG-FAIL][32] ([i915#180] / [i915#95])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-kbl4/igt@kms_fbcon_fbt@fbc-suspend.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-kbl1/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-basic:
    - shard-apl:          [FAIL][33] ([fdo#108145] / [i915#265]) -> [FAIL][34] ([fdo#108145] / [i915#265] / [i915#95])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8414/shard-apl1/igt@kms_plane_alpha_blend@pipe-a-alpha-basic.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/shard-apl3/igt@kms_plane_alpha_blend@pipe-a-alpha-basic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#128]: https://gitlab.freedesktop.org/drm/intel/issues/128
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#899]: https://gitlab.freedesktop.org/drm/intel/issues/899
  [i915#93]: https://gitlab.freedesktop.org/drm/intel/issues/93
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8414 -> Patchwork_17556

  CI-20190529: 20190529
  CI_DRM_8414: a6f6a61a3cd126f52d1a80c463f4bb3d3dcc1813 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5626: f27fdfff026276ac75c69e487c929a843f66f6ca @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17556: 19be72a69bc5d8b0038d39120cd9bc1a78f97891 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17556/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] [PATCH 10/14] drm/i915/gem: Allow combining submit-fences with syncobj
  2020-05-05 21:52 [PATCH 01/14] drm/i915: Mark concurrent submissions with a weak-dependency Chris Wilson
@ 2020-05-05 21:52 ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2020-05-05 21:52 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

We allow exported sync_file fences to be used as submit fences, but they
are not the only source of user fences. We also accept an array of
syncobj, and as with sync_file these are dma_fences underneath and so
feature the same set of controls. The submit-fence allows for a request
to be scheduled at the same time as the signaler, rather than as normal
after. Userspace can combine submit-fence with its own semaphores for
intra-batch scheduling.

Not exposing submit-fences to syncobj was at the time just a matter of
pragmatic expediency.

Fixes: a88b6e4cbafd ("drm/i915: Allow specification of parallel execbuf")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 14 +++++++----
 drivers/gpu/drm/i915/i915_request.c           | 24 +++++++++++++++++++
 include/uapi/drm/i915_drm.h                   |  7 +++---
 3 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 7abb96505a31..ec16ace50acf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2432,7 +2432,7 @@ static void
 __free_fence_array(struct drm_syncobj **fences, unsigned int n)
 {
 	while (n--)
-		drm_syncobj_put(ptr_mask_bits(fences[n], 2));
+		drm_syncobj_put(ptr_mask_bits(fences[n], 3));
 	kvfree(fences);
 }
 
@@ -2489,7 +2489,7 @@ get_fence_array(struct drm_i915_gem_execbuffer2 *args,
 		BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
 			     ~__I915_EXEC_FENCE_UNKNOWN_FLAGS);
 
-		fences[n] = ptr_pack_bits(syncobj, fence.flags, 2);
+		fences[n] = ptr_pack_bits(syncobj, fence.flags, 3);
 	}
 
 	return fences;
@@ -2520,7 +2520,7 @@ await_fence_array(struct i915_execbuffer *eb,
 		struct dma_fence *fence;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n], &flags, 3);
 		if (!(flags & I915_EXEC_FENCE_WAIT))
 			continue;
 
@@ -2544,7 +2544,11 @@ await_fence_array(struct i915_execbuffer *eb,
 			spin_unlock(&syncobj->lock);
 		}
 
-		err = i915_request_await_dma_fence(eb->request, fence);
+		if (flags & I915_EXEC_FENCE_WAIT_SUBMIT)
+			err = i915_request_await_execution(eb->request, fence,
+							   eb->engine->bond_execute);
+		else
+			err = i915_request_await_dma_fence(eb->request, fence);
 		dma_fence_put(fence);
 		if (err < 0)
 			return err;
@@ -2565,7 +2569,7 @@ signal_fence_array(struct i915_execbuffer *eb,
 		struct drm_syncobj *syncobj;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n], &flags, 3);
 		if (!(flags & I915_EXEC_FENCE_SIGNAL))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f506e3914dd8..b1cbc6d0babf 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1381,6 +1381,26 @@ __i915_request_await_execution(struct i915_request *to,
 					     &from->fence);
 }
 
+static int execution_proxy(struct await_proxy *ap)
+{
+	return i915_request_await_execution(ap->request, ap->fence, ap->data);
+}
+
+static int
+i915_request_await_proxy_execution(struct i915_request *rq,
+				   struct dma_fence *fence,
+				   void (*hook)(struct i915_request *rq,
+						struct dma_fence *signal))
+{
+	/*
+	 * We have to wait until the real request is known in order to
+	 * be able to hook into its execution, as opposed to waiting for
+	 * its completion.
+	 */
+	return __i915_request_await_proxy(rq, fence, I915_FENCE_TIMEOUT,
+					  execution_proxy, hook);
+}
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1420,6 +1440,10 @@ i915_request_await_execution(struct i915_request *rq,
 			ret = __i915_request_await_execution(rq,
 							     to_request(fence),
 							     hook);
+		else if (dma_fence_is_proxy(fence))
+			ret = i915_request_await_proxy_execution(rq,
+								 fence,
+								 hook);
 		else
 			ret = i915_request_await_external(rq, fence);
 		if (ret < 0)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 14b67cd6b54b..704dd0e3bc1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1040,9 +1040,10 @@ struct drm_i915_gem_exec_fence {
 	 */
 	__u32 handle;
 
-#define I915_EXEC_FENCE_WAIT            (1<<0)
-#define I915_EXEC_FENCE_SIGNAL          (1<<1)
-#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
+#define I915_EXEC_FENCE_WAIT            (1u << 0)
+#define I915_EXEC_FENCE_SIGNAL          (1u << 1)
+#define I915_EXEC_FENCE_WAIT_SUBMIT     (1u << 2)
+#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_WAIT_SUBMIT << 1))
 	__u32 flags;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-05-05 21:52 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-03 11:21 [Intel-gfx] [PATCH 01/14] drm/i915/gem: Specify address type for chained reloc batches Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 02/14] drm/i915/gem: Implement legacy MI_STORE_DATA_IMM Chris Wilson
2020-05-03 11:25   ` [Intel-gfx] [PATCH] " Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 03/14] drm/i915/gt: Small tidy of gen8+ breadcrumb emission Chris Wilson
2020-05-03 11:21 ` [PATCH 04/14] drm/i915: Mark concurrent submissions with a weak-dependency Chris Wilson
2020-05-03 11:21   ` [Intel-gfx] " Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 05/14] drm/i915/selftests: Repeat the rps clock frequency measurement Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 06/14] drm/i915/gt: Stop holding onto the pinned_default_state Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 07/14] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 08/14] drm/syncobj: Allow use of dma-fence-proxy Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 09/14] drm/i915/gem: Teach execbuf how to wait on future syncobj Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 10/14] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 11/14] drm/i915/gt: Declare when we enabled timeslicing Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 12/14] drm/i915: Replace the hardcoded I915_FENCE_TIMEOUT Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 13/14] drm/i915: Drop I915_RESET_TIMEOUT and friends Chris Wilson
2020-05-03 11:21 ` [Intel-gfx] [PATCH 14/14] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT Chris Wilson
2020-05-03 11:38 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/14] drm/i915/gem: Specify address type for chained reloc batches (rev2) Patchwork
2020-05-03 12:02 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-05-04  1:15 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2020-05-05 21:52 [PATCH 01/14] drm/i915: Mark concurrent submissions with a weak-dependency Chris Wilson
2020-05-05 21:52 ` [Intel-gfx] [PATCH 10/14] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.