* [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure
@ 2020-06-01  7:24 Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters Chris Wilson
                   ` (42 more replies)
  0 siblings, 43 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If we fail during engine setup, we may leave some engines not yet set up.
During the error cleanup, we have to be careful not to use the
uninitialised engines before discarding them.

[   16.136152] RIP: 0010:__flush_work+0x198/0x1b0
[   16.136168] Code: ff ff 8b 0b 48 8b 53 08 83 e1 08 48 0f ba 2b 03 80 c9 f0 e9 63 ff ff ff 0f 0b 48 83 c4 48 44 89 f0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 f6 e9 62 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f
[   16.136186] RSP: 0018:ffffc900003bb928 EFLAGS: 00010246
[   16.136201] RAX: 0000000000000000 RBX: ffff88844f392168 RCX: 0000000000000000
[   16.136216] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88844f392168
[   16.136231] RBP: ffff88844f392130 R08: 0000000000000000 R09: 0000000000000001
[   16.136246] R10: ffff888441e31e40 R11: ffff88845e329c70 R12: ffff88844f796988
[   16.136261] R13: ffff888441e4fb80 R14: 0000000000000001 R15: ffff88844f790000
[   16.136388] FS:  00007fecbd208880(0000) GS:ffff88845e380000(0000) knlGS:0000000000000000
[   16.136405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.136420] CR2: 00007ff3ce748f90 CR3: 0000000457a6a001 CR4: 00000000000606e0
[   16.136437] Call Trace:
[   16.136456]  ? try_to_del_timer_sync+0x3a/0x50
[   16.136529]  intel_wakeref_wait_for_idle+0x87/0xb0 [i915]
[   16.136606]  ? intel_engines_release+0x68/0xc0 [i915]
[   16.136680]  intel_engines_release+0x49/0xc0 [i915]
[   16.136757]  intel_gt_init+0x2f4/0x5e0 [i915]
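
In outline, the fix below defers the wakeref wait until we know the
engine completed setup (i.e. it has a release callback installed). A
simplified sketch of the resulting cleanup loop, with the surrounding
code elided:

	for_each_engine(engine, gt, id) {
		if (!engine->release)
			continue; /* setup never finished; nothing to quiesce */

		/* Only fully set-up engines have a live wakeref to drain */
		intel_wakeref_wait_for_idle(&engine->wakeref);
		GEM_BUG_ON(intel_engine_pm_is_awake(engine));

		engine->release(engine);
		engine->release = NULL;
	}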

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index da5b61085257..c8c14981eb5d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -414,12 +414,12 @@ void intel_engines_release(struct intel_gt *gt)
 
 	/* Decouple the backend; but keep the layout for late GPU resets */
 	for_each_engine(engine, gt, id) {
-		intel_wakeref_wait_for_idle(&engine->wakeref);
-		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
-
 		if (!engine->release)
 			continue;
 
+		intel_wakeref_wait_for_idle(&engine->wakeref);
+		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
+
 		engine->release(engine);
 		engine->release = NULL;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-02  9:04   ` Mika Kuoppala
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 03/36] drm/i915/gt: Move legacy context wa to intel_workarounds Chris Wilson
                   ` (41 subsequent siblings)
  42 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Pull the routines for writing CS packets out of intel_ring_submission
into their own files. These are low level operations for building CS
instructions, rather than the logic for filling the global ring buffer
with requests, and we will want to reuse them outside of this context.
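
As a rough illustration (not part of the patch), a submission backend
built on the split helpers only needs to include the new headers and
assign the per-gen emitters as engine vfuncs; the function name below is
hypothetical:

	#include "gen6_engine_cs.h"

	/* Hypothetical example: wire a gen6/7 xcs engine to the split emitters */
	static void example_setup_xcs(struct intel_engine_cs *engine)
	{
		engine->emit_flush = gen6_emit_flush_xcs;
		engine->emit_bb_start = gen6_emit_bb_start;
		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
	}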

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   2 +
 drivers/gpu/drm/i915/gt/gen2_engine_cs.c      | 340 +++++++
 drivers/gpu/drm/i915/gt/gen2_engine_cs.h      |  38 +
 drivers/gpu/drm/i915/gt/gen6_engine_cs.c      | 455 ++++++++++
 drivers/gpu/drm/i915/gt/gen6_engine_cs.h      |  39 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |   1 -
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 832 +-----------------
 7 files changed, 901 insertions(+), 806 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/gen2_engine_cs.c
 create mode 100644 drivers/gpu/drm/i915/gt/gen2_engine_cs.h
 create mode 100644 drivers/gpu/drm/i915/gt/gen6_engine_cs.c
 create mode 100644 drivers/gpu/drm/i915/gt/gen6_engine_cs.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index b0da6ea6e3f1..41a27fd5dbc7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -78,6 +78,8 @@ gt-y += \
 	gt/debugfs_engines.o \
 	gt/debugfs_gt.o \
 	gt/debugfs_gt_pm.o \
+	gt/gen2_engine_cs.o \
+	gt/gen6_engine_cs.o \
 	gt/gen6_ppgtt.o \
 	gt/gen7_renderclear.o \
 	gt/gen8_ppgtt.o \
diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
new file mode 100644
index 000000000000..8d2e85081247
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "gen2_engine_cs.h"
+#include "i915_drv.h"
+#include "intel_engine.h"
+#include "intel_gpu_commands.h"
+#include "intel_gt.h"
+#include "intel_gt_irq.h"
+#include "intel_ring.h"
+
+int gen2_emit_flush(struct i915_request *rq, u32 mode)
+{
+	unsigned int num_store_dw;
+	u32 cmd, *cs;
+
+	cmd = MI_FLUSH;
+	num_store_dw = 0;
+	if (mode & EMIT_INVALIDATE)
+		cmd |= MI_READ_FLUSH;
+	if (mode & EMIT_FLUSH)
+		num_store_dw = 4;
+
+	cs = intel_ring_begin(rq, 2 + 3 * num_store_dw);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = cmd;
+	while (num_store_dw--) {
+		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
+		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+						INTEL_GT_SCRATCH_FIELD_DEFAULT);
+		*cs++ = 0;
+	}
+	*cs++ = MI_FLUSH | MI_NO_WRITE_FLUSH;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int gen4_emit_flush_rcs(struct i915_request *rq, u32 mode)
+{
+	u32 cmd, *cs;
+	int i;
+
+	/*
+	 * read/write caches:
+	 *
+	 * I915_GEM_DOMAIN_RENDER is always invalidated, but is
+	 * only flushed if MI_NO_WRITE_FLUSH is unset.  On 965, it is
+	 * also flushed at 2d versus 3d pipeline switches.
+	 *
+	 * read-only caches:
+	 *
+	 * I915_GEM_DOMAIN_SAMPLER is flushed on pre-965 if
+	 * MI_READ_FLUSH is set, and is always flushed on 965.
+	 *
+	 * I915_GEM_DOMAIN_COMMAND may not exist?
+	 *
+	 * I915_GEM_DOMAIN_INSTRUCTION, which exists on 965, is
+	 * invalidated when MI_EXE_FLUSH is set.
+	 *
+	 * I915_GEM_DOMAIN_VERTEX, which exists on 965, is
+	 * invalidated with every MI_FLUSH.
+	 *
+	 * TLBs:
+	 *
+	 * On 965, TLBs associated with I915_GEM_DOMAIN_COMMAND
+	 * and I915_GEM_DOMAIN_CPU in are invalidated at PTE write and
+	 * I915_GEM_DOMAIN_RENDER and I915_GEM_DOMAIN_SAMPLER
+	 * are flushed at any MI_FLUSH.
+	 */
+
+	cmd = MI_FLUSH;
+	if (mode & EMIT_INVALIDATE) {
+		cmd |= MI_EXE_FLUSH;
+		if (IS_G4X(rq->i915) || IS_GEN(rq->i915, 5))
+			cmd |= MI_INVALIDATE_ISP;
+	}
+
+	i = 2;
+	if (mode & EMIT_INVALIDATE)
+		i += 20;
+
+	cs = intel_ring_begin(rq, i);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = cmd;
+
+	/*
+	 * A random delay to let the CS invalidate take effect? Without this
+	 * delay, the GPU relocation path fails as the CS does not see
+	 * the updated contents. Just as important, if we apply the flushes
+	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
+	 * write and before the invalidate on the next batch), the relocations
+	 * still fail. This implies that is a delay following invalidation
+	 * that is required to reset the caches as opposed to a delay to
+	 * ensure the memory is written.
+	 */
+	if (mode & EMIT_INVALIDATE) {
+		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
+			PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+
+		for (i = 0; i < 12; i++)
+			*cs++ = MI_FLUSH;
+
+		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
+		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
+			PIPE_CONTROL_GLOBAL_GTT;
+		*cs++ = 0;
+		*cs++ = 0;
+	}
+
+	*cs++ = cmd;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int gen4_emit_flush_vcs(struct i915_request *rq, u32 mode)
+{
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_FLUSH;
+	*cs++ = MI_NOOP;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs)
+{
+	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
+	*cs++ = MI_FLUSH;
+
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+
+	return cs;
+}
+
+#define GEN5_WA_STORES 8 /* must be at least 1! */
+u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
+{
+	int i;
+
+	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
+	*cs++ = MI_FLUSH;
+
+	BUILD_BUG_ON(GEN5_WA_STORES < 1);
+	for (i = 0; i < GEN5_WA_STORES; i++) {
+		*cs++ = MI_STORE_DWORD_INDEX;
+		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = rq->fence.seqno;
+	}
+
+	*cs++ = MI_USER_INTERRUPT;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+
+	return cs;
+}
+#undef GEN5_WA_STORES
+
+/* Just userspace ABI convention to limit the wa batch bo to a resonable size */
+#define I830_BATCH_LIMIT SZ_256K
+#define I830_TLB_ENTRIES (2)
+#define I830_WA_SIZE max(I830_TLB_ENTRIES * SZ_4K, I830_BATCH_LIMIT)
+int i830_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 len,
+		       unsigned int dispatch_flags)
+{
+	u32 *cs, cs_offset =
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_DEFAULT);
+
+	GEM_BUG_ON(rq->engine->gt->scratch->size < I830_WA_SIZE);
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/* Evict the invalid PTE TLBs */
+	*cs++ = COLOR_BLT_CMD | BLT_WRITE_RGBA;
+	*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096;
+	*cs++ = I830_TLB_ENTRIES << 16 | 4; /* load each page */
+	*cs++ = cs_offset;
+	*cs++ = 0xdeadbeef;
+	*cs++ = MI_NOOP;
+	intel_ring_advance(rq, cs);
+
+	if ((dispatch_flags & I915_DISPATCH_PINNED) == 0) {
+		if (len > I830_BATCH_LIMIT)
+			return -ENOSPC;
+
+		cs = intel_ring_begin(rq, 6 + 2);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		/*
+		 * Blit the batch (which has now all relocs applied) to the
+		 * stable batch scratch bo area (so that the CS never
+		 * stumbles over its tlb invalidation bug) ...
+		 */
+		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
+		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | 4096;
+		*cs++ = DIV_ROUND_UP(len, 4096) << 16 | 4096;
+		*cs++ = cs_offset;
+		*cs++ = 4096;
+		*cs++ = offset;
+
+		*cs++ = MI_FLUSH;
+		*cs++ = MI_NOOP;
+		intel_ring_advance(rq, cs);
+
+		/* ... and execute it. */
+		offset = cs_offset;
+	}
+
+	if (!(dispatch_flags & I915_DISPATCH_SECURE))
+		offset |= MI_BATCH_NON_SECURE;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
+	*cs++ = offset;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int gen3_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 len,
+		       unsigned int dispatch_flags)
+{
+	u32 *cs;
+
+	if (!(dispatch_flags & I915_DISPATCH_SECURE))
+		offset |= MI_BATCH_NON_SECURE;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
+	*cs++ = offset;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int gen4_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 length,
+		       unsigned int dispatch_flags)
+{
+	u32 security;
+	u32 *cs;
+
+	security = MI_BATCH_NON_SECURE_I965;
+	if (dispatch_flags & I915_DISPATCH_SECURE)
+		security = 0;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT | security;
+	*cs++ = offset;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+void gen2_irq_enable(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *i915 = engine->i915;
+
+	i915->irq_mask &= ~engine->irq_enable_mask;
+	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
+	ENGINE_POSTING_READ16(engine, RING_IMR);
+}
+
+void gen2_irq_disable(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *i915 = engine->i915;
+
+	i915->irq_mask |= engine->irq_enable_mask;
+	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
+}
+
+void gen3_irq_enable(struct intel_engine_cs *engine)
+{
+	engine->i915->irq_mask &= ~engine->irq_enable_mask;
+	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
+	intel_uncore_posting_read_fw(engine->uncore, GEN2_IMR);
+}
+
+void gen3_irq_disable(struct intel_engine_cs *engine)
+{
+	engine->i915->irq_mask |= engine->irq_enable_mask;
+	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
+}
+
+void gen5_irq_enable(struct intel_engine_cs *engine)
+{
+	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
+}
+
+void gen5_irq_disable(struct intel_engine_cs *engine)
+{
+	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
+}
diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.h b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
new file mode 100644
index 000000000000..a5cd64a65c9e
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __GEN2_ENGINE_CS_H__
+#define __GEN2_ENGINE_CS_H__
+
+#include <linux/types.h>
+
+struct i915_request;
+struct intel_engine_cs;
+
+int gen2_emit_flush(struct i915_request *rq, u32 mode);
+int gen4_emit_flush_rcs(struct i915_request *rq, u32 mode);
+int gen4_emit_flush_vcs(struct i915_request *rq, u32 mode);
+
+u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs);
+u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs);
+
+int i830_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 len,
+		       unsigned int dispatch_flags);
+int gen3_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 len,
+		       unsigned int dispatch_flags);
+int gen4_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 length,
+		       unsigned int dispatch_flags);
+
+void gen2_irq_enable(struct intel_engine_cs *engine);
+void gen2_irq_disable(struct intel_engine_cs *engine);
+void gen3_irq_enable(struct intel_engine_cs *engine);
+void gen3_irq_disable(struct intel_engine_cs *engine);
+void gen5_irq_enable(struct intel_engine_cs *engine);
+void gen5_irq_disable(struct intel_engine_cs *engine);
+
+#endif /* __GEN2_ENGINE_CS_H__ */
diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
new file mode 100644
index 000000000000..ce38d1bcaba3
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
@@ -0,0 +1,455 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "gen6_engine_cs.h"
+#include "intel_engine.h"
+#include "intel_gpu_commands.h"
+#include "intel_gt.h"
+#include "intel_gt_irq.h"
+#include "intel_gt_pm_irq.h"
+#include "intel_ring.h"
+
+#define HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH * sizeof(u32))
+
+/*
+ * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
+ * implementing two workarounds on gen6.  From section 1.4.7.1
+ * "PIPE_CONTROL" of the Sandy Bridge PRM volume 2 part 1:
+ *
+ * [DevSNB-C+{W/A}] Before any depth stall flush (including those
+ * produced by non-pipelined state commands), software needs to first
+ * send a PIPE_CONTROL with no bits set except Post-Sync Operation !=
+ * 0.
+ *
+ * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush Enable
+ * =1, a PIPE_CONTROL with any non-zero post-sync-op is required.
+ *
+ * And the workaround for these two requires this workaround first:
+ *
+ * [Dev-SNB{W/A}]: Pipe-control with CS-stall bit set must be sent
+ * BEFORE the pipe-control with a post-sync op and no write-cache
+ * flushes.
+ *
+ * And this last workaround is tricky because of the requirements on
+ * that bit.  From section 1.4.7.2.3 "Stall" of the Sandy Bridge PRM
+ * volume 2 part 1:
+ *
+ *     "1 of the following must also be set:
+ *      - Render Target Cache Flush Enable ([12] of DW1)
+ *      - Depth Cache Flush Enable ([0] of DW1)
+ *      - Stall at Pixel Scoreboard ([1] of DW1)
+ *      - Depth Stall ([13] of DW1)
+ *      - Post-Sync Operation ([13] of DW1)
+ *      - Notify Enable ([8] of DW1)"
+ *
+ * The cache flushes require the workaround flush that triggered this
+ * one, so we can't use it.  Depth stall would trigger the same.
+ * Post-sync nonzero is what triggered this second workaround, so we
+ * can't use that one either.  Notify enable is IRQs, which aren't
+ * really our business.  That leaves only stall at scoreboard.
+ */
+static int
+gen6_emit_post_sync_nonzero_flush(struct i915_request *rq)
+{
+	u32 scratch_addr =
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = GFX_OP_PIPE_CONTROL(5);
+	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
+	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = 0; /* low dword */
+	*cs++ = 0; /* high dword */
+	*cs++ = MI_NOOP;
+	intel_ring_advance(rq, cs);
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = GFX_OP_PIPE_CONTROL(5);
+	*cs++ = PIPE_CONTROL_QW_WRITE;
+	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = 0;
+	*cs++ = 0;
+	*cs++ = MI_NOOP;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int gen6_emit_flush_rcs(struct i915_request *rq, u32 mode)
+{
+	u32 scratch_addr =
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
+	u32 *cs, flags = 0;
+	int ret;
+
+	/* Force SNB workarounds for PIPE_CONTROL flushes */
+	ret = gen6_emit_post_sync_nonzero_flush(rq);
+	if (ret)
+		return ret;
+
+	/*
+	 * Just flush everything.  Experiments have shown that reducing the
+	 * number of bits based on the write domains has little performance
+	 * impact. And when rearranging requests, the order of flushes is
+	 * unknown.
+	 */
+	if (mode & EMIT_FLUSH) {
+		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+		/*
+		 * Ensure that any following seqno writes only happen
+		 * when the render cache is indeed flushed.
+		 */
+		flags |= PIPE_CONTROL_CS_STALL;
+	}
+	if (mode & EMIT_INVALIDATE) {
+		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+		/*
+		 * TLB invalidate requires a post-sync write.
+		 */
+		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
+	}
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = flags;
+	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = 0;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
+{
+	/* First we do the gen6_emit_post_sync_nonzero_flush w/a */
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
+	*cs++ = 0;
+	*cs++ = 0;
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_QW_WRITE;
+	*cs++ = intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_DEFAULT) |
+		PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = 0;
+
+	/* Finally we can flush and with it emit the breadcrumb */
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
+		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+		 PIPE_CONTROL_DC_FLUSH_ENABLE |
+		 PIPE_CONTROL_QW_WRITE |
+		 PIPE_CONTROL_CS_STALL);
+	*cs++ = i915_request_active_timeline(rq)->hwsp_offset |
+		PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+
+	return cs;
+}
+
+static int mi_flush_dw(struct i915_request *rq, u32 flags)
+{
+	u32 cmd, *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	cmd = MI_FLUSH_DW;
+
+	/*
+	 * We always require a command barrier so that subsequent
+	 * commands, such as breadcrumb interrupts, are strictly ordered
+	 * wrt the contents of the write cache being flushed to memory
+	 * (and thus being coherent from the CPU).
+	 */
+	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
+	/*
+	 * Bspec vol 1c.3 - blitter engine command streamer:
+	 * "If ENABLED, all TLBs will be invalidated once the flush
+	 * operation is complete. This bit is only valid when the
+	 * Post-Sync Operation field is a value of 1h or 3h."
+	 */
+	cmd |= flags;
+
+	*cs++ = cmd;
+	*cs++ = HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = 0;
+	*cs++ = MI_NOOP;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int gen6_flush_dw(struct i915_request *rq, u32 mode, u32 invflags)
+{
+	return mi_flush_dw(rq, mode & EMIT_INVALIDATE ? invflags : 0);
+}
+
+int gen6_emit_flush_xcs(struct i915_request *rq, u32 mode)
+{
+	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB);
+}
+
+int gen6_emit_flush_vcs(struct i915_request *rq, u32 mode)
+{
+	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB | MI_INVALIDATE_BSD);
+}
+
+int gen6_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 len,
+		       unsigned int dispatch_flags)
+{
+	u32 security;
+	u32 *cs;
+
+	security = MI_BATCH_NON_SECURE_I965;
+	if (dispatch_flags & I915_DISPATCH_SECURE)
+		security = 0;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	cs = __gen6_emit_bb_start(cs, offset, security);
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int
+hsw_emit_bb_start(struct i915_request *rq,
+		  u64 offset, u32 len,
+		  unsigned int dispatch_flags)
+{
+	u32 security;
+	u32 *cs;
+
+	security = MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW;
+	if (dispatch_flags & I915_DISPATCH_SECURE)
+		security = 0;
+
+	cs = intel_ring_begin(rq, 2);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	cs = __gen6_emit_bb_start(cs, offset, security);
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static int gen7_stall_cs(struct i915_request *rq)
+{
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
+	*cs++ = 0;
+	*cs++ = 0;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+int gen7_emit_flush_rcs(struct i915_request *rq, u32 mode)
+{
+	u32 scratch_addr =
+		intel_gt_scratch_offset(rq->engine->gt,
+					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
+	u32 *cs, flags = 0;
+
+	/*
+	 * Ensure that any following seqno writes only happen when the render
+	 * cache is indeed flushed.
+	 *
+	 * Workaround: 4th PIPE_CONTROL command (except the ones with only
+	 * read-cache invalidate bits set) must have the CS_STALL bit set. We
+	 * don't try to be clever and just set it unconditionally.
+	 */
+	flags |= PIPE_CONTROL_CS_STALL;
+
+	/*
+	 * CS_STALL suggests at least a post-sync write.
+	 */
+	flags |= PIPE_CONTROL_QW_WRITE;
+	flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+
+	/*
+	 * Just flush everything.  Experiments have shown that reducing the
+	 * number of bits based on the write domains has little performance
+	 * impact.
+	 */
+	if (mode & EMIT_FLUSH) {
+		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
+		flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
+		flags |= PIPE_CONTROL_FLUSH_ENABLE;
+	}
+	if (mode & EMIT_INVALIDATE) {
+		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
+		flags |= PIPE_CONTROL_MEDIA_STATE_CLEAR;
+
+		/*
+		 * Workaround: we must issue a pipe_control with CS-stall bit
+		 * set before a pipe_control command that has the state cache
+		 * invalidate bit set.
+		 */
+		gen7_stall_cs(rq);
+	}
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = flags;
+	*cs++ = scratch_addr;
+	*cs++ = 0;
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
+{
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
+		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+		 PIPE_CONTROL_DC_FLUSH_ENABLE |
+		 PIPE_CONTROL_FLUSH_ENABLE |
+		 PIPE_CONTROL_QW_WRITE |
+		 PIPE_CONTROL_GLOBAL_GTT_IVB |
+		 PIPE_CONTROL_CS_STALL);
+	*cs++ = i915_request_active_timeline(rq)->hwsp_offset;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+
+	return cs;
+}
+
+u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
+{
+	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_USER_INTERRUPT;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+
+	return cs;
+}
+
+#define GEN7_XCS_WA 32
+u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
+{
+	int i;
+
+	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
+	*cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
+		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->fence.seqno;
+
+	for (i = 0; i < GEN7_XCS_WA; i++) {
+		*cs++ = MI_STORE_DWORD_INDEX;
+		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = rq->fence.seqno;
+	}
+
+	*cs++ = MI_FLUSH_DW;
+	*cs++ = 0;
+	*cs++ = 0;
+
+	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
+
+	rq->tail = intel_ring_offset(rq, cs);
+	assert_ring_tail_valid(rq->ring, rq->tail);
+
+	return cs;
+}
+#undef GEN7_XCS_WA
+
+void gen6_irq_enable(struct intel_engine_cs *engine)
+{
+	ENGINE_WRITE(engine, RING_IMR,
+		     ~(engine->irq_enable_mask | engine->irq_keep_mask));
+
+	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
+	ENGINE_POSTING_READ(engine, RING_IMR);
+
+	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
+}
+
+void gen6_irq_disable(struct intel_engine_cs *engine)
+{
+	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
+	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
+}
+
+void hsw_irq_enable_vecs(struct intel_engine_cs *engine)
+{
+	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_enable_mask);
+
+	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
+	ENGINE_POSTING_READ(engine, RING_IMR);
+
+	gen6_gt_pm_unmask_irq(engine->gt, engine->irq_enable_mask);
+}
+
+void hsw_irq_disable_vecs(struct intel_engine_cs *engine)
+{
+	ENGINE_WRITE(engine, RING_IMR, ~0);
+	gen6_gt_pm_mask_irq(engine->gt, engine->irq_enable_mask);
+}
diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.h b/drivers/gpu/drm/i915/gt/gen6_engine_cs.h
new file mode 100644
index 000000000000..76c6bc9f3bde
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __GEN6_ENGINE_CS_H__
+#define __GEN6_ENGINE_CS_H__
+
+#include <linux/types.h>
+
+#include "intel_gpu_commands.h"
+
+struct i915_request;
+struct intel_engine_cs;
+
+int gen6_emit_flush_rcs(struct i915_request *rq, u32 mode);
+int gen6_emit_flush_vcs(struct i915_request *rq, u32 mode);
+int gen6_emit_flush_xcs(struct i915_request *rq, u32 mode);
+u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs);
+u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
+
+int gen7_emit_flush_rcs(struct i915_request *rq, u32 mode);
+u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs);
+u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
+
+int gen6_emit_bb_start(struct i915_request *rq,
+		       u64 offset, u32 len,
+		       unsigned int dispatch_flags);
+int hsw_emit_bb_start(struct i915_request *rq,
+		      u64 offset, u32 len,
+		      unsigned int dispatch_flags);
+
+void gen6_irq_enable(struct intel_engine_cs *engine);
+void gen6_irq_disable(struct intel_engine_cs *engine);
+
+void hsw_irq_enable_vecs(struct intel_engine_cs *engine);
+void hsw_irq_disable_vecs(struct intel_engine_cs *engine);
+
+#endif /* __GEN6_ENGINE_CS_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 9bf6d4989968..791897f8d847 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -187,7 +187,6 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SEQNO		0x40
 #define I915_GEM_HWS_SEQNO_ADDR		(I915_GEM_HWS_SEQNO * sizeof(u32))
 #define I915_GEM_HWS_SCRATCH		0x80
-#define I915_GEM_HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH * sizeof(u32))
 
 #define I915_HWS_CSB_BUF0_INDEX		0x10
 #define I915_HWS_CSB_WRITE_INDEX	0x1f
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index ca7286e58409..96881cd8b17b 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -27,21 +27,15 @@
  *
  */
 
-#include <linux/log2.h>
-
-#include "gem/i915_gem_context.h"
-
+#include "gen2_engine_cs.h"
+#include "gen6_engine_cs.h"
 #include "gen6_ppgtt.h"
 #include "gen7_renderclear.h"
 #include "i915_drv.h"
-#include "i915_trace.h"
 #include "intel_context.h"
 #include "intel_gt.h"
-#include "intel_gt_irq.h"
-#include "intel_gt_pm_irq.h"
 #include "intel_reset.h"
 #include "intel_ring.h"
-#include "intel_workarounds.h"
 #include "shmem_utils.h"
 
 /* Rough estimate of the typical request size, performing a flush,
@@ -49,436 +43,6 @@
  */
 #define LEGACY_REQUEST_SIZE 200
 
-static int
-gen2_render_ring_flush(struct i915_request *rq, u32 mode)
-{
-	unsigned int num_store_dw;
-	u32 cmd, *cs;
-
-	cmd = MI_FLUSH;
-	num_store_dw = 0;
-	if (mode & EMIT_INVALIDATE)
-		cmd |= MI_READ_FLUSH;
-	if (mode & EMIT_FLUSH)
-		num_store_dw = 4;
-
-	cs = intel_ring_begin(rq, 2 + 3 * num_store_dw);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = cmd;
-	while (num_store_dw--) {
-		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
-		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
-						INTEL_GT_SCRATCH_FIELD_DEFAULT);
-		*cs++ = 0;
-	}
-	*cs++ = MI_FLUSH | MI_NO_WRITE_FLUSH;
-
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int
-gen4_render_ring_flush(struct i915_request *rq, u32 mode)
-{
-	u32 cmd, *cs;
-	int i;
-
-	/*
-	 * read/write caches:
-	 *
-	 * I915_GEM_DOMAIN_RENDER is always invalidated, but is
-	 * only flushed if MI_NO_WRITE_FLUSH is unset.  On 965, it is
-	 * also flushed at 2d versus 3d pipeline switches.
-	 *
-	 * read-only caches:
-	 *
-	 * I915_GEM_DOMAIN_SAMPLER is flushed on pre-965 if
-	 * MI_READ_FLUSH is set, and is always flushed on 965.
-	 *
-	 * I915_GEM_DOMAIN_COMMAND may not exist?
-	 *
-	 * I915_GEM_DOMAIN_INSTRUCTION, which exists on 965, is
-	 * invalidated when MI_EXE_FLUSH is set.
-	 *
-	 * I915_GEM_DOMAIN_VERTEX, which exists on 965, is
-	 * invalidated with every MI_FLUSH.
-	 *
-	 * TLBs:
-	 *
-	 * On 965, TLBs associated with I915_GEM_DOMAIN_COMMAND
-	 * and I915_GEM_DOMAIN_CPU in are invalidated at PTE write and
-	 * I915_GEM_DOMAIN_RENDER and I915_GEM_DOMAIN_SAMPLER
-	 * are flushed at any MI_FLUSH.
-	 */
-
-	cmd = MI_FLUSH;
-	if (mode & EMIT_INVALIDATE) {
-		cmd |= MI_EXE_FLUSH;
-		if (IS_G4X(rq->i915) || IS_GEN(rq->i915, 5))
-			cmd |= MI_INVALIDATE_ISP;
-	}
-
-	i = 2;
-	if (mode & EMIT_INVALIDATE)
-		i += 20;
-
-	cs = intel_ring_begin(rq, i);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = cmd;
-
-	/*
-	 * A random delay to let the CS invalidate take effect? Without this
-	 * delay, the GPU relocation path fails as the CS does not see
-	 * the updated contents. Just as important, if we apply the flushes
-	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
-	 * write and before the invalidate on the next batch), the relocations
-	 * still fail. This implies that is a delay following invalidation
-	 * that is required to reset the caches as opposed to a delay to
-	 * ensure the memory is written.
-	 */
-	if (mode & EMIT_INVALIDATE) {
-		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
-		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
-						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
-			PIPE_CONTROL_GLOBAL_GTT;
-		*cs++ = 0;
-		*cs++ = 0;
-
-		for (i = 0; i < 12; i++)
-			*cs++ = MI_FLUSH;
-
-		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
-		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
-						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
-			PIPE_CONTROL_GLOBAL_GTT;
-		*cs++ = 0;
-		*cs++ = 0;
-	}
-
-	*cs++ = cmd;
-
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-/*
- * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
- * implementing two workarounds on gen6.  From section 1.4.7.1
- * "PIPE_CONTROL" of the Sandy Bridge PRM volume 2 part 1:
- *
- * [DevSNB-C+{W/A}] Before any depth stall flush (including those
- * produced by non-pipelined state commands), software needs to first
- * send a PIPE_CONTROL with no bits set except Post-Sync Operation !=
- * 0.
- *
- * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush Enable
- * =1, a PIPE_CONTROL with any non-zero post-sync-op is required.
- *
- * And the workaround for these two requires this workaround first:
- *
- * [Dev-SNB{W/A}]: Pipe-control with CS-stall bit set must be sent
- * BEFORE the pipe-control with a post-sync op and no write-cache
- * flushes.
- *
- * And this last workaround is tricky because of the requirements on
- * that bit.  From section 1.4.7.2.3 "Stall" of the Sandy Bridge PRM
- * volume 2 part 1:
- *
- *     "1 of the following must also be set:
- *      - Render Target Cache Flush Enable ([12] of DW1)
- *      - Depth Cache Flush Enable ([0] of DW1)
- *      - Stall at Pixel Scoreboard ([1] of DW1)
- *      - Depth Stall ([13] of DW1)
- *      - Post-Sync Operation ([13] of DW1)
- *      - Notify Enable ([8] of DW1)"
- *
- * The cache flushes require the workaround flush that triggered this
- * one, so we can't use it.  Depth stall would trigger the same.
- * Post-sync nonzero is what triggered this second workaround, so we
- * can't use that one either.  Notify enable is IRQs, which aren't
- * really our business.  That leaves only stall at scoreboard.
- */
-static int
-gen6_emit_post_sync_nonzero_flush(struct i915_request *rq)
-{
-	u32 scratch_addr =
-		intel_gt_scratch_offset(rq->engine->gt,
-					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 6);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = GFX_OP_PIPE_CONTROL(5);
-	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
-	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
-	*cs++ = 0; /* low dword */
-	*cs++ = 0; /* high dword */
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	cs = intel_ring_begin(rq, 6);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = GFX_OP_PIPE_CONTROL(5);
-	*cs++ = PIPE_CONTROL_QW_WRITE;
-	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
-	*cs++ = 0;
-	*cs++ = 0;
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int
-gen6_render_ring_flush(struct i915_request *rq, u32 mode)
-{
-	u32 scratch_addr =
-		intel_gt_scratch_offset(rq->engine->gt,
-					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
-	u32 *cs, flags = 0;
-	int ret;
-
-	/* Force SNB workarounds for PIPE_CONTROL flushes */
-	ret = gen6_emit_post_sync_nonzero_flush(rq);
-	if (ret)
-		return ret;
-
-	/* Just flush everything.  Experiments have shown that reducing the
-	 * number of bits based on the write domains has little performance
-	 * impact.
-	 */
-	if (mode & EMIT_FLUSH) {
-		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
-		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
-		/*
-		 * Ensure that any following seqno writes only happen
-		 * when the render cache is indeed flushed.
-		 */
-		flags |= PIPE_CONTROL_CS_STALL;
-	}
-	if (mode & EMIT_INVALIDATE) {
-		flags |= PIPE_CONTROL_TLB_INVALIDATE;
-		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
-		/*
-		 * TLB invalidate requires a post-sync write.
-		 */
-		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
-	}
-
-	cs = intel_ring_begin(rq, 4);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = flags;
-	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
-	*cs++ = 0;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
-{
-	/* First we do the gen6_emit_post_sync_nonzero_flush w/a */
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
-	*cs++ = 0;
-	*cs++ = 0;
-
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = PIPE_CONTROL_QW_WRITE;
-	*cs++ = intel_gt_scratch_offset(rq->engine->gt,
-					INTEL_GT_SCRATCH_FIELD_DEFAULT) |
-		PIPE_CONTROL_GLOBAL_GTT;
-	*cs++ = 0;
-
-	/* Finally we can flush and with it emit the breadcrumb */
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
-		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
-		 PIPE_CONTROL_DC_FLUSH_ENABLE |
-		 PIPE_CONTROL_QW_WRITE |
-		 PIPE_CONTROL_CS_STALL);
-	*cs++ = i915_request_active_timeline(rq)->hwsp_offset |
-		PIPE_CONTROL_GLOBAL_GTT;
-	*cs++ = rq->fence.seqno;
-
-	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
-
-	rq->tail = intel_ring_offset(rq, cs);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-
-	return cs;
-}
-
-static int
-gen7_render_ring_cs_stall_wa(struct i915_request *rq)
-{
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 4);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
-	*cs++ = 0;
-	*cs++ = 0;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int
-gen7_render_ring_flush(struct i915_request *rq, u32 mode)
-{
-	u32 scratch_addr =
-		intel_gt_scratch_offset(rq->engine->gt,
-					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
-	u32 *cs, flags = 0;
-
-	/*
-	 * Ensure that any following seqno writes only happen when the render
-	 * cache is indeed flushed.
-	 *
-	 * Workaround: 4th PIPE_CONTROL command (except the ones with only
-	 * read-cache invalidate bits set) must have the CS_STALL bit set. We
-	 * don't try to be clever and just set it unconditionally.
-	 */
-	flags |= PIPE_CONTROL_CS_STALL;
-
-	/*
-	 * CS_STALL suggests at least a post-sync write.
-	 */
-	flags |= PIPE_CONTROL_QW_WRITE;
-	flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
-
-	/* Just flush everything.  Experiments have shown that reducing the
-	 * number of bits based on the write domains has little performance
-	 * impact.
-	 */
-	if (mode & EMIT_FLUSH) {
-		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
-		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
-		flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
-		flags |= PIPE_CONTROL_FLUSH_ENABLE;
-	}
-	if (mode & EMIT_INVALIDATE) {
-		flags |= PIPE_CONTROL_TLB_INVALIDATE;
-		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
-		flags |= PIPE_CONTROL_MEDIA_STATE_CLEAR;
-
-		/* Workaround: we must issue a pipe_control with CS-stall bit
-		 * set before a pipe_control command that has the state cache
-		 * invalidate bit set. */
-		gen7_render_ring_cs_stall_wa(rq);
-	}
-
-	cs = intel_ring_begin(rq, 4);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = flags;
-	*cs++ = scratch_addr;
-	*cs++ = 0;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
-{
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
-		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
-		 PIPE_CONTROL_DC_FLUSH_ENABLE |
-		 PIPE_CONTROL_FLUSH_ENABLE |
-		 PIPE_CONTROL_QW_WRITE |
-		 PIPE_CONTROL_GLOBAL_GTT_IVB |
-		 PIPE_CONTROL_CS_STALL);
-	*cs++ = i915_request_active_timeline(rq)->hwsp_offset;
-	*cs++ = rq->fence.seqno;
-
-	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
-
-	rq->tail = intel_ring_offset(rq, cs);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-
-	return cs;
-}
-
-static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
-{
-	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
-	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
-
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
-	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = rq->fence.seqno;
-
-	*cs++ = MI_USER_INTERRUPT;
-
-	rq->tail = intel_ring_offset(rq, cs);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-
-	return cs;
-}
-
-#define GEN7_XCS_WA 32
-static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
-{
-	int i;
-
-	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
-	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
-
-	*cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
-		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
-	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = rq->fence.seqno;
-
-	for (i = 0; i < GEN7_XCS_WA; i++) {
-		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
-		*cs++ = rq->fence.seqno;
-	}
-
-	*cs++ = MI_FLUSH_DW;
-	*cs++ = 0;
-	*cs++ = 0;
-
-	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
-
-	rq->tail = intel_ring_offset(rq, cs);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-
-	return cs;
-}
-#undef GEN7_XCS_WA
-
 static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
 {
 	/*
@@ -918,255 +482,6 @@ static void i9xx_submit_request(struct i915_request *request)
 		     intel_ring_set_tail(request->ring, request->tail));
 }
 
-static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
-{
-	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
-	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
-
-	*cs++ = MI_FLUSH;
-
-	*cs++ = MI_STORE_DWORD_INDEX;
-	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
-	*cs++ = rq->fence.seqno;
-
-	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
-
-	rq->tail = intel_ring_offset(rq, cs);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-
-	return cs;
-}
-
-#define GEN5_WA_STORES 8 /* must be at least 1! */
-static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
-{
-	int i;
-
-	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
-	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
-
-	*cs++ = MI_FLUSH;
-
-	BUILD_BUG_ON(GEN5_WA_STORES < 1);
-	for (i = 0; i < GEN5_WA_STORES; i++) {
-		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
-		*cs++ = rq->fence.seqno;
-	}
-
-	*cs++ = MI_USER_INTERRUPT;
-
-	rq->tail = intel_ring_offset(rq, cs);
-	assert_ring_tail_valid(rq->ring, rq->tail);
-
-	return cs;
-}
-#undef GEN5_WA_STORES
-
-static void
-gen5_irq_enable(struct intel_engine_cs *engine)
-{
-	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
-}
-
-static void
-gen5_irq_disable(struct intel_engine_cs *engine)
-{
-	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
-}
-
-static void
-i9xx_irq_enable(struct intel_engine_cs *engine)
-{
-	engine->i915->irq_mask &= ~engine->irq_enable_mask;
-	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
-	intel_uncore_posting_read_fw(engine->uncore, GEN2_IMR);
-}
-
-static void
-i9xx_irq_disable(struct intel_engine_cs *engine)
-{
-	engine->i915->irq_mask |= engine->irq_enable_mask;
-	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
-}
-
-static void
-i8xx_irq_enable(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-
-	i915->irq_mask &= ~engine->irq_enable_mask;
-	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
-	ENGINE_POSTING_READ16(engine, RING_IMR);
-}
-
-static void
-i8xx_irq_disable(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-
-	i915->irq_mask |= engine->irq_enable_mask;
-	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
-}
-
-static int
-bsd_ring_flush(struct i915_request *rq, u32 mode)
-{
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 2);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_FLUSH;
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-	return 0;
-}
-
-static void
-gen6_irq_enable(struct intel_engine_cs *engine)
-{
-	ENGINE_WRITE(engine, RING_IMR,
-		     ~(engine->irq_enable_mask | engine->irq_keep_mask));
-
-	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
-	ENGINE_POSTING_READ(engine, RING_IMR);
-
-	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
-}
-
-static void
-gen6_irq_disable(struct intel_engine_cs *engine)
-{
-	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
-	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
-}
-
-static void
-hsw_vebox_irq_enable(struct intel_engine_cs *engine)
-{
-	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_enable_mask);
-
-	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
-	ENGINE_POSTING_READ(engine, RING_IMR);
-
-	gen6_gt_pm_unmask_irq(engine->gt, engine->irq_enable_mask);
-}
-
-static void
-hsw_vebox_irq_disable(struct intel_engine_cs *engine)
-{
-	ENGINE_WRITE(engine, RING_IMR, ~0);
-	gen6_gt_pm_mask_irq(engine->gt, engine->irq_enable_mask);
-}
-
-static int
-i965_emit_bb_start(struct i915_request *rq,
-		   u64 offset, u32 length,
-		   unsigned int dispatch_flags)
-{
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 2);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT | (dispatch_flags &
-		I915_DISPATCH_SECURE ? 0 : MI_BATCH_NON_SECURE_I965);
-	*cs++ = offset;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-/* Just userspace ABI convention to limit the wa batch bo to a resonable size */
-#define I830_BATCH_LIMIT SZ_256K
-#define I830_TLB_ENTRIES (2)
-#define I830_WA_SIZE max(I830_TLB_ENTRIES*4096, I830_BATCH_LIMIT)
-static int
-i830_emit_bb_start(struct i915_request *rq,
-		   u64 offset, u32 len,
-		   unsigned int dispatch_flags)
-{
-	u32 *cs, cs_offset =
-		intel_gt_scratch_offset(rq->engine->gt,
-					INTEL_GT_SCRATCH_FIELD_DEFAULT);
-
-	GEM_BUG_ON(rq->engine->gt->scratch->size < I830_WA_SIZE);
-
-	cs = intel_ring_begin(rq, 6);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	/* Evict the invalid PTE TLBs */
-	*cs++ = COLOR_BLT_CMD | BLT_WRITE_RGBA;
-	*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096;
-	*cs++ = I830_TLB_ENTRIES << 16 | 4; /* load each page */
-	*cs++ = cs_offset;
-	*cs++ = 0xdeadbeef;
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	if ((dispatch_flags & I915_DISPATCH_PINNED) == 0) {
-		if (len > I830_BATCH_LIMIT)
-			return -ENOSPC;
-
-		cs = intel_ring_begin(rq, 6 + 2);
-		if (IS_ERR(cs))
-			return PTR_ERR(cs);
-
-		/* Blit the batch (which has now all relocs applied) to the
-		 * stable batch scratch bo area (so that the CS never
-		 * stumbles over its tlb invalidation bug) ...
-		 */
-		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
-		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | 4096;
-		*cs++ = DIV_ROUND_UP(len, 4096) << 16 | 4096;
-		*cs++ = cs_offset;
-		*cs++ = 4096;
-		*cs++ = offset;
-
-		*cs++ = MI_FLUSH;
-		*cs++ = MI_NOOP;
-		intel_ring_advance(rq, cs);
-
-		/* ... and execute it. */
-		offset = cs_offset;
-	}
-
-	cs = intel_ring_begin(rq, 2);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
-	*cs++ = offset | (dispatch_flags & I915_DISPATCH_SECURE ? 0 :
-		MI_BATCH_NON_SECURE);
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int
-i915_emit_bb_start(struct i915_request *rq,
-		   u64 offset, u32 len,
-		   unsigned int dispatch_flags)
-{
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 2);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
-	*cs++ = offset | (dispatch_flags & I915_DISPATCH_SECURE ? 0 :
-		MI_BATCH_NON_SECURE);
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
 static void __ring_context_fini(struct intel_context *ce)
 {
 	i915_vma_put(ce->state);
@@ -1704,99 +1019,6 @@ static void gen6_bsd_submit_request(struct i915_request *request)
 	intel_uncore_forcewake_put(uncore, FORCEWAKE_ALL);
 }
 
-static int mi_flush_dw(struct i915_request *rq, u32 flags)
-{
-	u32 cmd, *cs;
-
-	cs = intel_ring_begin(rq, 4);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	cmd = MI_FLUSH_DW;
-
-	/*
-	 * We always require a command barrier so that subsequent
-	 * commands, such as breadcrumb interrupts, are strictly ordered
-	 * wrt the contents of the write cache being flushed to memory
-	 * (and thus being coherent from the CPU).
-	 */
-	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
-
-	/*
-	 * Bspec vol 1c.3 - blitter engine command streamer:
-	 * "If ENABLED, all TLBs will be invalidated once the flush
-	 * operation is complete. This bit is only valid when the
-	 * Post-Sync Operation field is a value of 1h or 3h."
-	 */
-	cmd |= flags;
-
-	*cs++ = cmd;
-	*cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = 0;
-	*cs++ = MI_NOOP;
-
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int gen6_flush_dw(struct i915_request *rq, u32 mode, u32 invflags)
-{
-	return mi_flush_dw(rq, mode & EMIT_INVALIDATE ? invflags : 0);
-}
-
-static int gen6_bsd_ring_flush(struct i915_request *rq, u32 mode)
-{
-	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB | MI_INVALIDATE_BSD);
-}
-
-static int
-hsw_emit_bb_start(struct i915_request *rq,
-		  u64 offset, u32 len,
-		  unsigned int dispatch_flags)
-{
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 2);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_BATCH_BUFFER_START | (dispatch_flags & I915_DISPATCH_SECURE ?
-		0 : MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW);
-	/* bit0-7 is the length on GEN6+ */
-	*cs++ = offset;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int
-gen6_emit_bb_start(struct i915_request *rq,
-		   u64 offset, u32 len,
-		   unsigned int dispatch_flags)
-{
-	u32 *cs;
-
-	cs = intel_ring_begin(rq, 2);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_BATCH_BUFFER_START | (dispatch_flags & I915_DISPATCH_SECURE ?
-		0 : MI_BATCH_NON_SECURE_I965);
-	/* bit0-7 is the length on GEN6+ */
-	*cs++ = offset;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-/* Blitter support (SandyBridge+) */
-
-static int gen6_ring_flush(struct i915_request *rq, u32 mode)
-{
-	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB);
-}
-
 static void i9xx_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = i9xx_submit_request;
@@ -1843,11 +1065,11 @@ static void setup_irq(struct intel_engine_cs *engine)
 		engine->irq_enable = gen5_irq_enable;
 		engine->irq_disable = gen5_irq_disable;
 	} else if (INTEL_GEN(i915) >= 3) {
-		engine->irq_enable = i9xx_irq_enable;
-		engine->irq_disable = i9xx_irq_disable;
+		engine->irq_enable = gen3_irq_enable;
+		engine->irq_disable = gen3_irq_disable;
 	} else {
-		engine->irq_enable = i8xx_irq_enable;
-		engine->irq_disable = i8xx_irq_disable;
+		engine->irq_enable = gen2_irq_enable;
+		engine->irq_disable = gen2_irq_disable;
 	}
 }
 
@@ -1874,7 +1096,7 @@ static void setup_common(struct intel_engine_cs *engine)
 	 * equivalent to our next initial bread so we can elide
 	 * engine->emit_init_breadcrumb().
 	 */
-	engine->emit_fini_breadcrumb = i9xx_emit_breadcrumb;
+	engine->emit_fini_breadcrumb = gen3_emit_breadcrumb;
 	if (IS_GEN(i915, 5))
 		engine->emit_fini_breadcrumb = gen5_emit_breadcrumb;
 
@@ -1883,11 +1105,11 @@ static void setup_common(struct intel_engine_cs *engine)
 	if (INTEL_GEN(i915) >= 6)
 		engine->emit_bb_start = gen6_emit_bb_start;
 	else if (INTEL_GEN(i915) >= 4)
-		engine->emit_bb_start = i965_emit_bb_start;
+		engine->emit_bb_start = gen4_emit_bb_start;
 	else if (IS_I830(i915) || IS_I845G(i915))
 		engine->emit_bb_start = i830_emit_bb_start;
 	else
-		engine->emit_bb_start = i915_emit_bb_start;
+		engine->emit_bb_start = gen3_emit_bb_start;
 }
 
 static void setup_rcs(struct intel_engine_cs *engine)
@@ -1900,18 +1122,18 @@ static void setup_rcs(struct intel_engine_cs *engine)
 	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 
 	if (INTEL_GEN(i915) >= 7) {
-		engine->emit_flush = gen7_render_ring_flush;
-		engine->emit_fini_breadcrumb = gen7_rcs_emit_breadcrumb;
+		engine->emit_flush = gen7_emit_flush_rcs;
+		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_rcs;
 	} else if (IS_GEN(i915, 6)) {
-		engine->emit_flush = gen6_render_ring_flush;
-		engine->emit_fini_breadcrumb = gen6_rcs_emit_breadcrumb;
+		engine->emit_flush = gen6_emit_flush_rcs;
+		engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_rcs;
 	} else if (IS_GEN(i915, 5)) {
-		engine->emit_flush = gen4_render_ring_flush;
+		engine->emit_flush = gen4_emit_flush_rcs;
 	} else {
 		if (INTEL_GEN(i915) < 4)
-			engine->emit_flush = gen2_render_ring_flush;
+			engine->emit_flush = gen2_emit_flush;
 		else
-			engine->emit_flush = gen4_render_ring_flush;
+			engine->emit_flush = gen4_emit_flush_rcs;
 		engine->irq_enable_mask = I915_USER_INTERRUPT;
 	}
 
@@ -1929,15 +1151,15 @@ static void setup_vcs(struct intel_engine_cs *engine)
 		/* gen6 bsd needs a special wa for tail updates */
 		if (IS_GEN(i915, 6))
 			engine->set_default_submission = gen6_bsd_set_default_submission;
-		engine->emit_flush = gen6_bsd_ring_flush;
+		engine->emit_flush = gen6_emit_flush_vcs;
 		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 
 		if (IS_GEN(i915, 6))
-			engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
+			engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs;
 		else
-			engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
+			engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
 	} else {
-		engine->emit_flush = bsd_ring_flush;
+		engine->emit_flush = gen4_emit_flush_vcs;
 		if (IS_GEN(i915, 5))
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 		else
@@ -1949,13 +1171,13 @@ static void setup_bcs(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
 
-	engine->emit_flush = gen6_ring_flush;
+	engine->emit_flush = gen6_emit_flush_xcs;
 	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 
 	if (IS_GEN(i915, 6))
-		engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs;
 	else
-		engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
 }
 
 static void setup_vecs(struct intel_engine_cs *engine)
@@ -1964,12 +1186,12 @@ static void setup_vecs(struct intel_engine_cs *engine)
 
 	GEM_BUG_ON(INTEL_GEN(i915) < 7);
 
-	engine->emit_flush = gen6_ring_flush;
+	engine->emit_flush = gen6_emit_flush_xcs;
 	engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
-	engine->irq_enable = hsw_vebox_irq_enable;
-	engine->irq_disable = hsw_vebox_irq_disable;
+	engine->irq_enable = hsw_irq_enable_vecs;
+	engine->irq_disable = hsw_irq_disable_vecs;
 
-	engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
+	engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
 }
 
 static int gen7_ctx_switch_bb_setup(struct intel_engine_cs * const engine,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 03/36] drm/i915/gt: Move legacy context wa to intel_workarounds
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-02  9:05   ` Mika Kuoppala
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler Chris Wilson
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Use the central mechanism for recording and verifying that we restore
the w/a for the older devices as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 28 -----------------
 drivers/gpu/drm/i915/gt/intel_workarounds.c   | 31 +++++++++++++++++++
 2 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 96881cd8b17b..d9c1701061b9 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -429,32 +429,6 @@ static void reset_finish(struct intel_engine_cs *engine)
 {
 }
 
-static int rcs_resume(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-	struct intel_uncore *uncore = engine->uncore;
-
-	/*
-	 * Disable CONSTANT_BUFFER before it is loaded from the context
-	 * image. For as it is loaded, it is executed and the stored
-	 * address may no longer be valid, leading to a GPU hang.
-	 *
-	 * This imposes the requirement that userspace reload their
-	 * CONSTANT_BUFFER on every batch, fortunately a requirement
-	 * they are already accustomed to from before contexts were
-	 * enabled.
-	 */
-	if (IS_GEN(i915, 4))
-		intel_uncore_write(uncore, ECOSKPD,
-			   _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE));
-
-	if (IS_GEN_RANGE(i915, 6, 7))
-		intel_uncore_write(uncore, INSTPM,
-				   _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
-
-	return xcs_resume(engine);
-}
-
 static void reset_cancel(struct intel_engine_cs *engine)
 {
 	struct i915_request *request;
@@ -1139,8 +1113,6 @@ static void setup_rcs(struct intel_engine_cs *engine)
 
 	if (IS_HASWELL(i915))
 		engine->emit_bb_start = hsw_emit_bb_start;
-
-	engine->resume = rcs_resume;
 }
 
 static void setup_vcs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index fa1e15657663..94d66a9d760d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -199,6 +199,18 @@ wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 #define WA_SET_FIELD_MASKED(addr, mask, value) \
 	wa_write_masked_or(wal, (addr), 0, _MASKED_FIELD((mask), (value)))
 
+static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
+				      struct i915_wa_list *wal)
+{
+	WA_SET_BIT_MASKED(INSTPM, INSTPM_FORCE_ORDERING);
+}
+
+static void gen7_ctx_workarounds_init(struct intel_engine_cs *engine,
+				      struct i915_wa_list *wal)
+{
+	WA_SET_BIT_MASKED(INSTPM, INSTPM_FORCE_ORDERING);
+}
+
 static void gen8_ctx_workarounds_init(struct intel_engine_cs *engine,
 				      struct i915_wa_list *wal)
 {
@@ -638,6 +650,10 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine,
 		chv_ctx_workarounds_init(engine, wal);
 	else if (IS_BROADWELL(i915))
 		bdw_ctx_workarounds_init(engine, wal);
+	else if (IS_GEN(i915, 7))
+		gen7_ctx_workarounds_init(engine, wal);
+	else if (IS_GEN(i915, 6))
+		gen6_ctx_workarounds_init(engine, wal);
 	else if (INTEL_GEN(i915) < 8)
 		return;
 	else
@@ -1583,6 +1599,21 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
 		       /* XXX bit doesn't stick on Broadwater */
 		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
+
+	if (IS_GEN(i915, 4))
+		/*
+		 * Disable CONSTANT_BUFFER before it is loaded from the context
+		 * image. For as it is loaded, it is executed and the stored
+		 * address may no longer be valid, leading to a GPU hang.
+		 *
+		 * This imposes the requirement that userspace reload their
+		 * CONSTANT_BUFFER on every batch, fortunately a requirement
+		 * they are already accustomed to from before contexts were
+		 * enabled.
+		 */
+		wa_add(wal, ECOSKPD,
+		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
+		       0 /* XXX bit doesn't stick on Broadwater */);
 }
 
 static void
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 03/36] drm/i915/gt: Move legacy context wa to intel_workarounds Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01 11:49   ` Mika Kuoppala
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 05/36] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq" Chris Wilson
                   ` (39 subsequent siblings)
  42 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Ever noticed that our interrupt handlers are where we spend most of our
time on a busy system? In part this is unavoidable, as each interrupt
requires us to poll and reset several registers, but we can try to do so
as efficiently as possible.

Function                                     old     new   delta
ilk_irq_handler                             2317    2156    -161
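
For context, the handler below switches from the uncore accessors
(I915_READ/I915_WRITE) to plain MMIO accesses against the pre-mapped
register block. A minimal sketch of that idea, assuming a valid
uncore.regs mapping (the actual raw_reg_read/raw_reg_write helpers in
i915 may differ in detail):

	/*
	 * Sketch only: a raw accessor is just readl()/writel() on the
	 * ioremapped registers, skipping the forcewake and debug
	 * bookkeeping that the full uncore path performs per access.
	 */
	static inline u32 sketch_raw_read(void __iomem *regs, u32 offset)
	{
		return readl(regs + offset);
	}

	static inline void sketch_raw_write(void __iomem *regs, u32 offset, u32 val)
	{
		writel(val, regs + offset);
	}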

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 59 ++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 63579ab71cf6..07c0c7ea3795 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2097,69 +2097,66 @@ static void ivb_display_irq_handler(struct drm_i915_private *dev_priv,
  */
 static irqreturn_t ilk_irq_handler(int irq, void *arg)
 {
-	struct drm_i915_private *dev_priv = arg;
+	struct drm_i915_private *i915 = arg;
+	void __iomem * const regs = i915->uncore.regs;
 	u32 de_iir, gt_iir, de_ier, sde_ier = 0;
-	irqreturn_t ret = IRQ_NONE;
 
-	if (!intel_irqs_enabled(dev_priv))
+	if (unlikely(!intel_irqs_enabled(i915)))
 		return IRQ_NONE;
 
 	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
-	disable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
+	disable_rpm_wakeref_asserts(&i915->runtime_pm);
 
 	/* disable master interrupt before clearing iir  */
-	de_ier = I915_READ(DEIER);
-	I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
+	de_ier = raw_reg_read(regs, DEIER);
+	raw_reg_write(regs, DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
 
 	/* Disable south interrupts. We'll only write to SDEIIR once, so further
 	 * interrupts will will be stored on its back queue, and then we'll be
 	 * able to process them after we restore SDEIER (as soon as we restore
 	 * it, we'll get an interrupt if SDEIIR still has something to process
 	 * due to its back queue). */
-	if (!HAS_PCH_NOP(dev_priv)) {
-		sde_ier = I915_READ(SDEIER);
-		I915_WRITE(SDEIER, 0);
+	if (!HAS_PCH_NOP(i915)) {
+		sde_ier = raw_reg_read(regs, SDEIER);
+		raw_reg_write(regs, SDEIER, 0);
 	}
 
 	/* Find, clear, then process each source of interrupt */
 
-	gt_iir = I915_READ(GTIIR);
+	gt_iir = raw_reg_read(regs, GTIIR);
 	if (gt_iir) {
-		I915_WRITE(GTIIR, gt_iir);
-		ret = IRQ_HANDLED;
-		if (INTEL_GEN(dev_priv) >= 6)
-			gen6_gt_irq_handler(&dev_priv->gt, gt_iir);
+		raw_reg_write(regs, GTIIR, gt_iir);
+		if (INTEL_GEN(i915) >= 6)
+			gen6_gt_irq_handler(&i915->gt, gt_iir);
 		else
-			gen5_gt_irq_handler(&dev_priv->gt, gt_iir);
+			gen5_gt_irq_handler(&i915->gt, gt_iir);
 	}
 
-	de_iir = I915_READ(DEIIR);
+	de_iir = raw_reg_read(regs, DEIIR);
 	if (de_iir) {
-		I915_WRITE(DEIIR, de_iir);
-		ret = IRQ_HANDLED;
-		if (INTEL_GEN(dev_priv) >= 7)
-			ivb_display_irq_handler(dev_priv, de_iir);
+		raw_reg_write(regs, DEIIR, de_iir);
+		if (INTEL_GEN(i915) >= 7)
+			ivb_display_irq_handler(i915, de_iir);
 		else
-			ilk_display_irq_handler(dev_priv, de_iir);
+			ilk_display_irq_handler(i915, de_iir);
 	}
 
-	if (INTEL_GEN(dev_priv) >= 6) {
-		u32 pm_iir = I915_READ(GEN6_PMIIR);
+	if (INTEL_GEN(i915) >= 6) {
+		u32 pm_iir = raw_reg_read(regs, GEN6_PMIIR);
 		if (pm_iir) {
-			I915_WRITE(GEN6_PMIIR, pm_iir);
-			ret = IRQ_HANDLED;
-			gen6_rps_irq_handler(&dev_priv->gt.rps, pm_iir);
+			raw_reg_write(regs, GEN6_PMIIR, pm_iir);
+			gen6_rps_irq_handler(&i915->gt.rps, pm_iir);
 		}
 	}
 
-	I915_WRITE(DEIER, de_ier);
-	if (!HAS_PCH_NOP(dev_priv))
-		I915_WRITE(SDEIER, sde_ier);
+	raw_reg_write(regs, DEIER, de_ier);
+	if (sde_ier)
+		raw_reg_write(regs, SDEIER, sde_ier);
 
 	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
-	enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
+	enable_rpm_wakeref_asserts(&i915->runtime_pm);
 
-	return ret;
+	return IRQ_HANDLED;
 }
 
 static void bxt_hpd_irq_handler(struct drm_i915_private *dev_priv,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 05/36] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (2 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 06/36] drm/i915/gt: Couple tasklet scheduling for all CS interrupts Chris Wilson
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

This was removed in commit 478ffad6d690 ("drm/i915: drop
engine_pin/unpin_breadcrumbs_irq") as the last user had been removed,
but now there is a promise of a new user in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 22 +++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine.h      |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index d907d538176e..03c14ab86d95 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -220,6 +220,28 @@ static void signal_irq_work(struct irq_work *work)
 	}
 }
 
+void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	spin_lock_irq(&b->irq_lock);
+	if (!b->irq_enabled++)
+		irq_enable(engine);
+	GEM_BUG_ON(!b->irq_enabled); /* no overflow! */
+	spin_unlock_irq(&b->irq_lock);
+}
+
+void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	spin_lock_irq(&b->irq_lock);
+	GEM_BUG_ON(!b->irq_enabled); /* no underflow! */
+	if (!--b->irq_enabled)
+		irq_disable(engine);
+	spin_unlock_irq(&b->irq_lock);
+}
+
 static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 791897f8d847..043462b6ce1f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -226,6 +226,9 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine);
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
+void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
+void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
+
 void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
 
 static inline void
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 06/36] drm/i915/gt: Couple tasklet scheduling for all CS interrupts
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (3 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 05/36] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq" Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 07/36] drm/i915/gt: Support creation of 'internal' rings Chris Wilson
                   ` (37 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If any engine asks for the tasklet to be kicked from the CS interrupt,
do so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_gt_irq.c | 17 ++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt_irq.h |  3 +++
 drivers/gpu/drm/i915/gt/intel_rps.c    |  2 +-
 drivers/gpu/drm/i915/i915_irq.c        |  8 ++++----
 4 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index 0cc7dd54f4f9..28edf314a319 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -60,6 +60,13 @@ cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 		tasklet_hi_schedule(&engine->execlists.tasklet);
 }
 
+void gen2_engine_cs_irq(struct intel_engine_cs *engine)
+{
+	intel_engine_signal_breadcrumbs(engine);
+	if (intel_engine_needs_breadcrumb_tasklet(engine))
+		tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
 static u32
 gen11_gt_engine_identity(struct intel_gt *gt,
 			 const unsigned int bank, const unsigned int bit)
@@ -273,9 +280,9 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
 void gen5_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
+		gen2_engine_cs_irq(gt->engine_class[RENDER_CLASS][0]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
+		gen2_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0]);
 }
 
 static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
@@ -299,11 +306,11 @@ static void gen7_parity_error_irq_handler(struct intel_gt *gt, u32 iir)
 void gen6_gt_irq_handler(struct intel_gt *gt, u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[RENDER_CLASS][0]);
+		gen2_engine_cs_irq(gt->engine_class[RENDER_CLASS][0]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[VIDEO_DECODE_CLASS][0]);
+		gen2_engine_cs_irq(gt->engine_class[VIDEO_DECODE_CLASS][0]);
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine_class[COPY_ENGINE_CLASS][0]);
+		gen2_engine_cs_irq(gt->engine_class[COPY_ENGINE_CLASS][0]);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.h b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
index 886c5cf408a2..6c69cd563fe1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.h
@@ -9,6 +9,7 @@
 
 #include <linux/types.h>
 
+struct intel_engine_cs;
 struct intel_gt;
 
 #define GEN8_GT_IRQS (GEN8_GT_RCS_IRQ | \
@@ -19,6 +20,8 @@ struct intel_gt;
 		      GEN8_GT_PM_IRQ | \
 		      GEN8_GT_GUC_IRQ)
 
+void gen2_engine_cs_irq(struct intel_engine_cs *engine);
+
 void gen11_gt_irq_reset(struct intel_gt *gt);
 void gen11_gt_irq_postinstall(struct intel_gt *gt);
 void gen11_gt_irq_handler(struct intel_gt *gt, const u32 master_ctl);
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 2f59fc6df3c2..2e4ddc9ca09d 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1741,7 +1741,7 @@ void gen6_rps_irq_handler(struct intel_rps *rps, u32 pm_iir)
 		return;
 
 	if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-		intel_engine_signal_breadcrumbs(gt->engine[VECS0]);
+		gen2_engine_cs_irq(gt->engine[VECS0]);
 
 	if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 		DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 07c0c7ea3795..096123777533 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3678,7 +3678,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		intel_uncore_write16(&dev_priv->uncore, GEN2_IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+			gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -3783,7 +3783,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		I915_WRITE(GEN2_IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+			gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -3925,10 +3925,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		I915_WRITE(GEN2_IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[RCS0]);
+			gen2_engine_cs_irq(dev_priv->gt.engine[RCS0]);
 
 		if (iir & I915_BSD_USER_INTERRUPT)
-			intel_engine_signal_breadcrumbs(dev_priv->gt.engine[VCS0]);
+			gen2_engine_cs_irq(dev_priv->gt.engine[VCS0]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 07/36] drm/i915/gt: Support creation of 'internal' rings
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (4 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 06/36] drm/i915/gt: Couple tasklet scheduling for all CS interrupts Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 08/36] drm/i915/gt: Use client timeline address for seqno writes Chris Wilson
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

To support legacy ring buffer scheduling, we want a virtual ringbuffer
for each client. These rings are purely for holding the requests as they
are being constructed on the CPU and never accessed by the GPU, so they
should not be bound into the GGTT, and we can use plain old WB mapped
pages.

As they are not bound, we need to nerf a few assumptions that a rq->ring
is in the GGTT.
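
A minimal usage sketch (names as introduced by this patch; the flag is
packed into the low bits of the size argument, so the size must remain a
power of two of at least a page):

	struct intel_ring *ring;

	/* A CPU-only, WB-mapped 16KiB ring; never bound into the GGTT. */
	ring = intel_engine_create_ring(engine, SZ_16K | INTEL_RING_CREATE_INTERNAL);
	if (IS_ERR(ring))
		return PTR_ERR(ring);

	/* With no GGTT binding, intel_ring_address() reports -1 for it. */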

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c  | 17 ++----
 drivers/gpu/drm/i915/gt/intel_ring.c       | 63 ++++++++++++++--------
 drivers/gpu/drm/i915/gt/intel_ring.h       | 12 ++++-
 drivers/gpu/drm/i915/gt/intel_ring_types.h |  2 +
 5 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index e4aece20bc80..fd71977c010a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -127,7 +127,7 @@ int __intel_context_do_pin(struct intel_context *ce)
 			goto err_active;
 
 		CE_TRACE(ce, "pin ring:{start:%08x, head:%04x, tail:%04x}\n",
-			 i915_ggtt_offset(ce->ring->vma),
+			 intel_ring_address(ce->ring),
 			 ce->ring->head, ce->ring->tail);
 
 		smp_mb__before_atomic(); /* flush pin before it is visible */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c8c14981eb5d..64e13c074708 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1261,7 +1261,7 @@ static int print_ring(char *buf, int sz, struct i915_request *rq)
 
 		len = scnprintf(buf, sz,
 				"ring:{start:%08x, hwsp:%08x, seqno:%08x, runtime:%llums}, ",
-				i915_ggtt_offset(rq->ring->vma),
+				intel_ring_address(rq->ring),
 				tl ? tl->hwsp_offset : 0,
 				hwsp_seqno(rq),
 				DIV_ROUND_CLOSEST_ULL(intel_context_get_total_runtime_ns(rq->context),
@@ -1542,7 +1542,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		print_request(m, rq, "\t\tactive ");
 
 		drm_printf(m, "\t\tring->start:  0x%08x\n",
-			   i915_ggtt_offset(rq->ring->vma));
+			   intel_ring_address(rq->ring));
 		drm_printf(m, "\t\tring->head:   0x%08x\n",
 			   rq->ring->head);
 		drm_printf(m, "\t\tring->tail:   0x%08x\n",
@@ -1621,13 +1621,6 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine)
 	return total;
 }
 
-static bool match_ring(struct i915_request *rq)
-{
-	u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-	return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
 struct i915_request *
 intel_engine_find_active_request(struct intel_engine_cs *engine)
 {
@@ -1667,11 +1660,7 @@ intel_engine_find_active_request(struct intel_engine_cs *engine)
 			continue;
 
 		if (!i915_request_started(request))
-			continue;
-
-		/* More than one preemptible request may match! */
-		if (!match_ring(request))
-			continue;
+			break;
 
 		active = request;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 8cda1b7e17ba..438637996ab5 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -24,33 +24,42 @@ unsigned int intel_ring_update_space(struct intel_ring *ring)
 int intel_ring_pin(struct intel_ring *ring)
 {
 	struct i915_vma *vma = ring->vma;
-	unsigned int flags;
 	void *addr;
 	int ret;
 
 	if (atomic_fetch_inc(&ring->pin_count))
 		return 0;
 
-	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
-	flags = PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
+	if (!(ring->flags & INTEL_RING_CREATE_INTERNAL)) {
+		unsigned int pin;
 
-	if (vma->obj->stolen)
-		flags |= PIN_MAPPABLE;
-	else
-		flags |= PIN_HIGH;
+		/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
+		pin = PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
 
-	ret = i915_ggtt_pin(vma, 0, flags);
-	if (unlikely(ret))
-		goto err_unpin;
+		if (vma->obj->stolen)
+			pin |= PIN_MAPPABLE;
+		else
+			pin |= PIN_HIGH;
 
-	if (i915_vma_is_map_and_fenceable(vma))
-		addr = (void __force *)i915_vma_pin_iomap(vma);
-	else
-		addr = i915_gem_object_pin_map(vma->obj,
-					       i915_coherent_map_type(vma->vm->i915));
-	if (IS_ERR(addr)) {
-		ret = PTR_ERR(addr);
-		goto err_ring;
+		ret = i915_ggtt_pin(vma, 0, pin);
+		if (unlikely(ret))
+			goto err_unpin;
+
+		if (i915_vma_is_map_and_fenceable(vma))
+			addr = (void __force *)i915_vma_pin_iomap(vma);
+		else
+			addr = i915_gem_object_pin_map(vma->obj,
+						       i915_coherent_map_type(vma->vm->i915));
+		if (IS_ERR(addr)) {
+			ret = PTR_ERR(addr);
+			goto err_ring;
+		}
+	} else {
+		addr = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
+		if (IS_ERR(addr)) {
+			ret = PTR_ERR(addr);
+			goto err_ring;
+		}
 	}
 
 	i915_vma_make_unshrinkable(vma);
@@ -91,10 +100,12 @@ void intel_ring_unpin(struct intel_ring *ring)
 		i915_gem_object_unpin_map(vma->obj);
 
 	i915_vma_make_purgeable(vma);
-	i915_vma_unpin(vma);
+	if (!(ring->flags & INTEL_RING_CREATE_INTERNAL))
+		i915_vma_unpin(vma);
 }
 
-static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
+static struct i915_vma *
+create_ring_vma(struct i915_ggtt *ggtt, int size, unsigned int flags)
 {
 	struct i915_address_space *vm = &ggtt->vm;
 	struct drm_i915_private *i915 = vm->i915;
@@ -102,7 +113,8 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	struct i915_vma *vma;
 
 	obj = ERR_PTR(-ENODEV);
-	if (i915_ggtt_has_aperture(ggtt))
+	if (!(flags & INTEL_RING_CREATE_INTERNAL) &&
+	    i915_ggtt_has_aperture(ggtt))
 		obj = i915_gem_object_create_stolen(i915, size);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_internal(i915, size);
@@ -128,12 +140,14 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 }
 
 struct intel_ring *
-intel_engine_create_ring(struct intel_engine_cs *engine, int size)
+intel_engine_create_ring(struct intel_engine_cs *engine, unsigned int size)
 {
 	struct drm_i915_private *i915 = engine->i915;
+	unsigned int flags = size & GENMASK(11, 0);
 	struct intel_ring *ring;
 	struct i915_vma *vma;
 
+	size ^= flags;
 	GEM_BUG_ON(!is_power_of_2(size));
 	GEM_BUG_ON(RING_CTL_SIZE(size) & ~RING_NR_PAGES);
 
@@ -142,8 +156,10 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&ring->ref);
+
 	ring->size = size;
 	ring->wrap = BITS_PER_TYPE(ring->size) - ilog2(size);
+	ring->flags = flags;
 
 	/*
 	 * Workaround an erratum on the i830 which causes a hang if
@@ -156,11 +172,12 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 
 	intel_ring_update_space(ring);
 
-	vma = create_ring_vma(engine->gt->ggtt, size);
+	vma = create_ring_vma(engine->gt->ggtt, size, flags);
 	if (IS_ERR(vma)) {
 		kfree(ring);
 		return ERR_CAST(vma);
 	}
+
 	ring->vma = vma;
 
 	return ring;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.h b/drivers/gpu/drm/i915/gt/intel_ring.h
index cc0ebca65167..d022fa209325 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.h
+++ b/drivers/gpu/drm/i915/gt/intel_ring.h
@@ -9,12 +9,14 @@
 
 #include "i915_gem.h" /* GEM_BUG_ON */
 #include "i915_request.h"
+#include "i915_vma.h"
 #include "intel_ring_types.h"
 
 struct intel_engine_cs;
 
 struct intel_ring *
-intel_engine_create_ring(struct intel_engine_cs *engine, int size);
+intel_engine_create_ring(struct intel_engine_cs *engine, unsigned int size);
+#define INTEL_RING_CREATE_INTERNAL BIT(0)
 
 u32 *intel_ring_begin(struct i915_request *rq, unsigned int num_dwords);
 int intel_ring_cacheline_align(struct i915_request *rq);
@@ -137,4 +139,12 @@ __intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
 	return (head - tail - CACHELINE_BYTES) & (size - 1);
 }
 
+static inline u32 intel_ring_address(const struct intel_ring *ring)
+{
+	if (ring->flags & INTEL_RING_CREATE_INTERNAL)
+		return -1;
+
+	return i915_ggtt_offset(ring->vma);
+}
+
 #endif /* INTEL_RING_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_types.h b/drivers/gpu/drm/i915/gt/intel_ring_types.h
index 1a189ea00fd8..d927deafcb33 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_ring_types.h
@@ -47,6 +47,8 @@ struct intel_ring {
 	u32 size;
 	u32 wrap;
 	u32 effective_size;
+
+	unsigned long flags;
 };
 
 #endif /* INTEL_RING_TYPES_H */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 08/36] drm/i915/gt: Use client timeline address for seqno writes
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (5 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 07/36] drm/i915/gt: Support creation of 'internal' rings Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 09/36] drm/i915: Support inter-engine semaphores on gen6/7 Chris Wilson
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If we allow for per-client timelines, even with legacy ring submission,
we open the door to a world full of possibilities [scheduling and
semaphores].

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/gen6_engine_cs.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
index ce38d1bcaba3..fa11174bb13b 100644
--- a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
@@ -373,11 +373,10 @@ u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
 
 u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
 {
-	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
-	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+	u32 addr = i915_request_active_timeline(rq)->hwsp_offset;
 
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
-	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
+	*cs++ = addr | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->fence.seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
@@ -391,19 +390,17 @@ u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
 #define GEN7_XCS_WA 32
 u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
 {
+	u32 addr = i915_request_active_timeline(rq)->hwsp_offset;
 	int i;
 
-	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
-	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
-
-	*cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
-		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
-	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
+	*cs++ = addr | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->fence.seqno;
 
 	for (i = 0; i < GEN7_XCS_WA; i++) {
-		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = 0;
+		*cs++ = addr;
 		*cs++ = rq->fence.seqno;
 	}
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 09/36] drm/i915: Support inter-engine semaphores on gen6/7
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (6 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 08/36] drm/i915/gt: Use client timeline address for seqno writes Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 10/36] drm/i915/gt: Infrastructure for ring scheduling Chris Wilson
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Should we gain per-client timelines, we can then utilise the separate
HWSP in order to use MI_SEMAPHORE_MBOX with the unique GGTT addresses
required for synchronising between clients across different engines.
Teach emit_semaphore_wait about MI_SEMAPHORE_MBOX for the older
generations.

Note that the engine must still indicate support for semaphore
synchronisation before the context is allowed to use semaphores.
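
For reference, a condensed sketch of the gen6/7 wait emitted below: the
MBOX comparison is a strict greater-than, so waiting until the signaller
has written seqno is expressed by comparing against seqno - 1 (the
intel_ring_begin/advance plumbing is omitted):

	/* Sketch only: poll the signaller's HWSP dword via the GGTT. */
	*cs++ = MI_SEMAPHORE_MBOX |
		MI_SEMAPHORE_GLOBAL_GTT |
		MI_SEMAPHORE_COMPARE;
	*cs++ = seqno - 1;	/* strict greater-than comparison */
	*cs++ = hwsp_offset;	/* GGTT address of the signaller's seqno */
	*cs++ = 0;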

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 31 +++++++++++++++++++----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c5d7220de529..9537e30f9439 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1016,7 +1016,7 @@ __emit_semaphore_wait(struct i915_request *to,
 	int len, err;
 	u32 *cs;
 
-	GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
+	GEM_BUG_ON(INTEL_GEN(to->i915) < 6);
 	GEM_BUG_ON(i915_request_has_initial_breadcrumb(to));
 
 	/* We need to pin the signaler's HWSP until we are finished reading. */
@@ -1040,17 +1040,26 @@ __emit_semaphore_wait(struct i915_request *to,
 	 * (post-wrap) values than they were expecting (and so wait
 	 * forever).
 	 */
-	*cs++ = (MI_SEMAPHORE_WAIT |
-		 MI_SEMAPHORE_GLOBAL_GTT |
-		 MI_SEMAPHORE_POLL |
-		 MI_SEMAPHORE_SAD_GTE_SDD) +
-		has_token;
-	*cs++ = seqno;
-	*cs++ = hwsp_offset;
-	*cs++ = 0;
-	if (has_token) {
+	if (INTEL_GEN(to->i915) >= 8) {
+		*cs++ = (MI_SEMAPHORE_WAIT |
+			 MI_SEMAPHORE_GLOBAL_GTT |
+			 MI_SEMAPHORE_POLL |
+			 MI_SEMAPHORE_SAD_GTE_SDD) +
+			has_token;
+		*cs++ = seqno;
+		*cs++ = hwsp_offset;
+		*cs++ = 0;
+		if (has_token) {
+			*cs++ = 0;
+			*cs++ = MI_NOOP;
+		}
+	} else {
+		*cs++ = (MI_SEMAPHORE_MBOX |
+			 MI_SEMAPHORE_GLOBAL_GTT |
+			 MI_SEMAPHORE_COMPARE);
+		*cs++ = seqno - 1; /* COMPARE is a strict greater-than */
+		*cs++ = hwsp_offset;
 		*cs++ = 0;
-		*cs++ = MI_NOOP;
 	}
 
 	intel_ring_advance(to, cs);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 10/36] drm/i915/gt: Infrastructure for ring scheduling
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (7 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 09/36] drm/i915: Support inter-engine semaphores on gen6/7 Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 11/36] drm/i915/gt: Enable busy-stats for ring-scheduler Chris Wilson
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Build a bare bones scheduler to sit on top of the global legacy ringbuffer
submission. This virtual execlists scheme should be applicable to all
older platforms.

A key problem we have with the legacy ring buffer submission is that it
only allows for FIFO queuing. All clients share the global request queue
and must contend for its lock when submitting. As any client may need to
wait for external events, all clients must then wait. However, if we
stage each client into their own virtual ringbuffer with their own
timelines, we can copy the client requests into the global ringbuffer
only when they are ready, reordering the submission around stalls.
Furthermore, the ability to reorder gives us rudimentary priority
sorting -- although without preemption support, once something is on the
GPU it stays on the GPU, and so it is still possible for a hog to delay
a high priority request (such as updating the display). However, it does
mean that by keeping the submission queue short, the high priority
request will be next. This design resembles the old guc submission
scheduler, which similarly reordered client requests onto a global
workqueue.
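
The core of the dequeue step is small; a condensed sketch of the
ready-request path implemented below (context switching, wrap handling
and error paths elided):

	/*
	 * Sketch only: a ready request's commands, built by the client in
	 * its own CPU-only ring, are copied into the single global ring,
	 * and the hardware tail is bumped once per dequeue pass.
	 */
	static void sketch_submit_ready(struct intel_engine_cs *engine,
					struct i915_request *rq)
	{
		struct intel_ring *global = engine->legacy.ring;

		__i915_request_submit(rq);

		/* Copy the client-built payload [head, tail) into the global ring. */
		ring_copy(global, rq->ring, rq->head, rq->tail);

		wmb(); /* flush writes before the CS starts reading */
		ENGINE_WRITE(engine, RING_TAIL, global->tail);
	}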

The implementation uses the MI_USER_INTERRUPT at the end of every
request to track completion, so is more interrupt happy than execlists
[which has an interrupt for each context event, albeit two]. Our
interrupts on these systems are relatively heavy, and in the past we have
been able to completely starve Sandybridge with the interrupt traffic.
Our interrupt handlers are now much better (in part by offloading the
work to bottom halves, leaving the interrupt itself only dealing with
acking the registers), but we can still see the impact of starvation in
the uneven submission latency on a saturated system.

Overall though, the short submission queues and extra interrupts do not
appear to affect throughput (+-10%; some tasks even improve thanks to the
reduced request overheads) and they improve latency. [Which is a massive
improvement since the introduction of Sandybridge!]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |   1 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   1 +
 .../gpu/drm/i915/gt/intel_ring_scheduler.c    | 760 ++++++++++++++++++
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  13 +-
 .../gpu/drm/i915/gt/intel_ring_submission.h   |  16 +
 6 files changed, 786 insertions(+), 6 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_ring_submission.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 41a27fd5dbc7..6d98a74da41e 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -109,6 +109,7 @@ gt-y += \
 	gt/intel_renderstate.o \
 	gt/intel_reset.o \
 	gt/intel_ring.o \
+	gt/intel_ring_scheduler.o \
 	gt/intel_ring_submission.o \
 	gt/intel_rps.o \
 	gt/intel_sseu.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 043462b6ce1f..08176117757e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -209,6 +209,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 int intel_engine_resume(struct intel_engine_cs *engine);
 
 int intel_ring_submission_setup(struct intel_engine_cs *engine);
+int intel_ring_scheduler_setup(struct intel_engine_cs *engine);
 
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 2b6cdf47d428..3782e27c2945 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -348,6 +348,7 @@ struct intel_engine_cs {
 	struct {
 		struct intel_ring *ring;
 		struct intel_timeline *timeline;
+		struct intel_context *context;
 	} legacy;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
new file mode 100644
index 000000000000..c8cd435d1c51
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -0,0 +1,760 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include <linux/log2.h>
+
+#include <drm/i915_drm.h>
+
+#include "i915_drv.h"
+#include "intel_context.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_requests.h"
+#include "intel_reset.h"
+#include "intel_ring.h"
+#include "intel_ring_submission.h"
+#include "shmem_utils.h"
+
+/*
+ * Rough estimate of the typical request size, performing a flush,
+ * set-context and then emitting the batch.
+ */
+#define LEGACY_REQUEST_SIZE 200
+
+static inline int rq_prio(const struct i915_request *rq)
+{
+	return rq->sched.attr.priority;
+}
+
+static inline struct i915_priolist *to_priolist(struct rb_node *rb)
+{
+	return rb_entry(rb, struct i915_priolist, node);
+}
+
+static inline int queue_prio(struct rb_node *rb)
+{
+	return rb ? to_priolist(rb)->priority : INT_MIN;
+}
+
+static inline bool reset_in_progress(const struct intel_engine_execlists *el)
+{
+	return unlikely(!__tasklet_is_enabled(&el->tasklet));
+}
+
+static void
+set_current_context(struct intel_context **ptr, struct intel_context *ce)
+{
+	if (ce)
+		intel_context_get(ce);
+
+	ce = xchg(ptr, ce);
+
+	if (ce)
+		intel_context_put(ce);
+}
+
+static struct i915_request *
+schedule_in(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	__intel_gt_pm_get(engine->gt);
+	return i915_request_get(rq);
+}
+
+static void
+schedule_out(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	if (list_is_last_rcu(&rq->link, &ce->timeline->requests))
+		intel_engine_add_retire(engine, ce->timeline);
+
+	i915_request_put(rq);
+	intel_gt_pm_put_async(engine->gt);
+}
+
+static void reset_prepare(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+	unsigned long flags;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	__tasklet_disable_sync_once(&el->tasklet);
+	GEM_BUG_ON(!reset_in_progress(el));
+
+	/* And flush any current direct submission. */
+	spin_lock_irqsave(&engine->active.lock, flags);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
+
+	intel_ring_submission_reset_prepare(engine);
+}
+
+static void reset_queue_priority(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+
+	el->queue_priority_hint = queue_prio(rb_first_cached(&el->queue));
+}
+
+static struct i915_request *
+__unwind_incomplete_requests(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq, *rn, *active = NULL;
+	struct list_head *uninitialized_var(pl);
+	int prio = I915_PRIORITY_INVALID;
+
+	lockdep_assert_held(&engine->active.lock);
+
+	list_for_each_entry_safe_reverse(rq, rn,
+					 &engine->active.requests,
+					 sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		__i915_request_unsubmit(rq);
+
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(engine, prio);
+		}
+		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+
+		list_move(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		active = rq;
+	}
+
+	reset_queue_priority(engine);
+
+	return active;
+}
+
+static inline void clear_ports(struct i915_request **ports, int count)
+{
+	memset_p((void **)ports, NULL, count);
+}
+
+static void cancel_port_requests(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+	struct i915_request * const *port;
+
+	clear_ports(el->pending, ARRAY_SIZE(el->pending));
+	for (port = xchg(&el->active, el->pending); *port; port++)
+		schedule_out(engine, *port);
+	clear_ports(el->inflight, ARRAY_SIZE(el->inflight));
+
+	smp_wmb(); /* complete the seqlock for execlists_active() */
+	WRITE_ONCE(el->active, el->inflight);
+}
+
+static void __ring_rewind(struct intel_engine_cs *engine, bool stalled)
+{
+	struct i915_request *rq;
+
+	rq = __unwind_incomplete_requests(engine);
+	if (rq && i915_request_started(rq))
+		__i915_request_reset(rq, stalled);
+
+	cancel_port_requests(engine);
+
+	/* Clear the global submission state, we will submit from scratch */
+	intel_ring_reset(engine->legacy.ring, 0);
+	set_current_context(&engine->legacy.context, NULL);
+}
+
+static void ring_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->active.lock, flags);
+	__ring_rewind(engine, stalled);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
+}
+
+static void nop_submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
+
+	/* The driver is wedged; don't process any more events. */
+	WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN);
+}
+
+static void ring_reset_cancel(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+	struct i915_request *rq, *rn;
+	unsigned long flags;
+	struct rb_node *rb;
+
+	spin_lock_irqsave(&engine->active.lock, flags);
+
+	__ring_rewind(engine, true);
+
+	/* Mark all submitted requests as skipped. */
+	list_for_each_entry(rq, &engine->active.requests, sched.link) {
+		i915_request_set_error_once(rq, -EIO);
+		i915_request_mark_complete(rq);
+	}
+
+	/* Flush the queued requests to the timeline list (for retiring). */
+	while ((rb = rb_first_cached(&el->queue))) {
+		struct i915_priolist *p = to_priolist(rb);
+		int i;
+
+		priolist_for_each_request_consume(rq, rn, p, i) {
+			i915_request_set_error_once(rq, -EIO);
+			i915_request_mark_complete(rq);
+			__i915_request_submit(rq);
+		}
+
+		rb_erase_cached(&p->node, &el->queue);
+		i915_priolist_free(p);
+	}
+
+	el->queue_priority_hint = INT_MIN;
+	el->queue = RB_ROOT_CACHED;
+
+	/* Remaining _unready_ requests will be nop'ed when submitted */
+
+	GEM_BUG_ON(__tasklet_is_enabled(&el->tasklet));
+	el->tasklet.func = nop_submission_tasklet;
+
+	spin_unlock_irqrestore(&engine->active.lock, flags);
+}
+
+static void reset_finish(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+
+	intel_ring_submission_reset_finish(engine);
+
+	if (__tasklet_enable(&el->tasklet))
+		tasklet_hi_schedule(&el->tasklet);
+}
+
+static u32 *ring_map(struct intel_ring *ring, u32 len)
+{
+	u32 *va;
+
+	if (unlikely(ring->tail + len > ring->effective_size)) {
+		memset(ring->vaddr + ring->tail, 0, ring->size - ring->tail);
+		ring->tail = 0;
+	}
+
+	va = ring->vaddr + ring->tail;
+	ring->tail = intel_ring_wrap(ring, ring->tail + len);
+
+	return va;
+}
+
+static inline u32 *ring_map_dw(struct intel_ring *ring, u32 len)
+{
+	return ring_map(ring, len * sizeof(u32));
+}
+
+static void ring_copy(struct intel_ring *dst,
+		      const struct intel_ring *src,
+		      u32 start, u32 end)
+{
+	unsigned int len;
+	void *out;
+
+	len = end - start;
+	if (end < start)
+		len += src->size;
+	out = ring_map(dst, len);
+
+	if (end < start) {
+		len = src->size - start;
+		memcpy(out, src->vaddr + start, len);
+		out += len;
+		start = 0;
+	}
+
+	memcpy(out, src->vaddr + start, end - start);
+}
+
+static void switch_context(struct intel_ring *ring, struct i915_request *rq)
+{
+}
+
+static struct i915_request *ring_submit(struct i915_request *rq)
+{
+	struct intel_ring *ring = rq->engine->legacy.ring;
+
+	__i915_request_submit(rq);
+
+	if (rq->engine->legacy.context != rq->context) {
+		switch_context(ring, rq);
+		set_current_context(&rq->engine->legacy.context, rq->context);
+	}
+
+	ring_copy(ring, rq->ring, rq->head, rq->tail);
+	return rq;
+}
+
+static struct i915_request **
+copy_active(struct i915_request **port, struct i915_request * const *active)
+{
+	while (*active)
+		*port++ = *active++;
+
+	return port;
+}
+
+static void __dequeue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+	struct i915_request ** const last_port = el->pending + el->port_mask;
+	struct i915_request **port, *last;
+	struct rb_node *rb;
+
+	lockdep_assert_held(&engine->active.lock);
+
+	port = copy_active(el->pending, el->active);
+	if (port > last_port)
+		return;
+
+	last = NULL;
+	while ((rb = rb_first_cached(&el->queue))) {
+		struct i915_priolist *p = to_priolist(rb);
+		struct i915_request *rq, *rn;
+		int i;
+
+		priolist_for_each_request_consume(rq, rn, p, i) {
+			GEM_BUG_ON(rq == last);
+			if (last && rq->context != last->context) {
+				if (port == last_port)
+					goto done;
+
+				*port++ = schedule_in(engine, last);
+			}
+
+			last = ring_submit(rq);
+		}
+
+		rb_erase_cached(&p->node, &el->queue);
+		i915_priolist_free(p);
+	}
+
+done:
+	el->queue_priority_hint = queue_prio(rb);
+	if (last) {
+		*port++ = schedule_in(engine, last);
+		*port++ = NULL;
+		WRITE_ONCE(el->active, el->pending);
+
+		wmb(); /* paranoid flush of WCB before RING_TAIL write */
+		ENGINE_WRITE(engine, RING_TAIL, engine->legacy.ring->tail);
+		memcpy(el->inflight, el->pending,
+		       (port - el->pending) * sizeof(*port));
+
+		WRITE_ONCE(el->active, el->inflight);
+		GEM_BUG_ON(!*el->active);
+	}
+}
+
+static void __submission_tasklet(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+	struct i915_request *rq;
+
+	while ((rq = *el->active)) {
+		if (!i915_request_completed(rq))
+			break;
+
+		schedule_out(engine, rq);
+		el->active++;
+	}
+
+	if (el->queue_priority_hint != INT_MIN)
+		__dequeue(engine);
+}
+
+static void submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->active.lock, flags);
+	__submission_tasklet(engine);
+	spin_unlock_irqrestore(&engine->active.lock, flags);
+}
+
+static void queue_request(struct intel_engine_cs *engine,
+			  struct i915_request *rq)
+{
+	GEM_BUG_ON(!list_empty(&rq->sched.link));
+	list_add_tail(&rq->sched.link,
+		      i915_sched_lookup_priolist(engine, rq_prio(rq)));
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+
+static void __submit_queue_imm(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const el = &engine->execlists;
+
+	if (reset_in_progress(el))
+		return; /* defer until we restart the engine following reset */
+
+	__submission_tasklet(engine);
+}
+
+static void submit_queue(struct intel_engine_cs *engine,
+			 const struct i915_request *rq)
+{
+	struct intel_engine_execlists *el = &engine->execlists;
+
+	if (rq_prio(rq) <= el->queue_priority_hint)
+		return;
+
+	el->queue_priority_hint = rq_prio(rq);
+	__submit_queue_imm(engine);
+}
+
+static void submit_request(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = rq->engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->active.lock, flags);
+
+	queue_request(engine, rq);
+
+	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+	GEM_BUG_ON(list_empty(&rq->sched.link));
+
+	submit_queue(engine, rq);
+
+	spin_unlock_irqrestore(&engine->active.lock, flags);
+}
+
+static void submission_park(struct intel_engine_cs *engine)
+{
+	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
+	intel_engine_unpin_breadcrumbs_irq(engine);
+	submission_tasklet((unsigned long)engine); /* drain the submit queue */
+}
+
+static void submission_unpark(struct intel_engine_cs *engine)
+{
+	intel_engine_pin_breadcrumbs_irq(engine);
+}
+
+static void ring_context_destroy(struct kref *ref)
+{
+	struct intel_context *ce = container_of(ref, typeof(*ce), ref);
+
+	GEM_BUG_ON(intel_context_is_pinned(ce));
+
+	if (ce->state)
+		i915_vma_put(ce->state);
+	if (test_bit(CONTEXT_ALLOC_BIT, &ce->flags))
+		intel_ring_put(ce->ring);
+
+	intel_context_fini(ce);
+	intel_context_free(ce);
+}
+
+static void ring_context_unpin(struct intel_context *ce)
+{
+}
+
+static int alloc_context_vma(struct intel_context *ce)
+
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err;
+
+	obj = i915_gem_object_create_shmem(engine->i915, engine->context_size);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	/*
+	 * Try to make the context utilize L3 as well as LLC.
+	 *
+	 * On VLV we don't have L3 controls in the PTEs so we
+	 * shouldn't touch the cache level, especially as that
+	 * would make the object snooped which might have a
+	 * negative performance impact.
+	 *
+	 * Snooping is required on non-llc platforms in execlist
+	 * mode, but since all GGTT accesses use PAT entry 0 we
+	 * get snooping anyway regardless of cache_level.
+	 *
+	 * This is only applicable for Ivy Bridge devices since
+	 * later platforms don't have L3 control bits in the PTE.
+	 */
+	if (IS_IVYBRIDGE(engine->i915))
+		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
+
+	if (engine->default_state) {
+		void *vaddr;
+
+		vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
+		if (IS_ERR(vaddr)) {
+			err = PTR_ERR(vaddr);
+			goto err_obj;
+		}
+
+		shmem_read(engine->default_state, 0,
+			   vaddr, engine->context_size);
+		__set_bit(CONTEXT_VALID_BIT, &ce->flags);
+
+		i915_gem_object_flush_map(obj);
+		i915_gem_object_unpin_map(obj);
+	}
+
+	vma = i915_vma_instance(obj, &engine->gt->ggtt->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_obj;
+	}
+
+	ce->state = vma;
+	return 0;
+
+err_obj:
+	i915_gem_object_put(obj);
+	return err;
+}
+
+static int alloc_timeline(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_timeline *tl;
+	struct i915_vma *hwsp;
+
+	/*
+	 * Use the static global HWSP for the kernel context, and
+	 * a dynamically allocated cacheline for everyone else.
+	 */
+	hwsp = NULL;
+	if (unlikely(intel_context_is_barrier(ce)))
+		hwsp = engine->status_page.vma;
+
+	tl = intel_timeline_create(engine->gt, hwsp);
+	if (IS_ERR(tl))
+		return PTR_ERR(tl);
+
+	ce->timeline = tl;
+	return 0;
+}
+
+static int ring_context_alloc(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_ring *ring;
+	int err;
+
+	GEM_BUG_ON(ce->state);
+	if (engine->context_size) {
+		err = alloc_context_vma(ce);
+		if (err)
+			return err;
+	}
+
+	if (!ce->timeline) {
+		err = alloc_timeline(ce);
+		if (err)
+			goto err_vma;
+	}
+
+	ring = intel_engine_create_ring(engine,
+					(unsigned long)ce->ring |
+					INTEL_RING_CREATE_INTERNAL);
+	if (IS_ERR(ring)) {
+		err = PTR_ERR(ring);
+		goto err_timeline;
+	}
+	ce->ring = ring;
+
+	return 0;
+
+err_timeline:
+	intel_timeline_put(ce->timeline);
+err_vma:
+	if (ce->state) {
+		i915_vma_put(ce->state);
+		ce->state = NULL;
+	}
+	return err;
+}
+
+static int ring_context_pin(struct intel_context *ce)
+{
+	return 0;
+}
+
+static void ring_context_reset(struct intel_context *ce)
+{
+	intel_ring_reset(ce->ring, 0);
+	clear_bit(CONTEXT_VALID_BIT, &ce->flags);
+}
+
+static const struct intel_context_ops ring_context_ops = {
+	.alloc = ring_context_alloc,
+
+	.pin = ring_context_pin,
+	.unpin = ring_context_unpin,
+
+	.enter = intel_context_enter_engine,
+	.exit = intel_context_exit_engine,
+
+	.reset = ring_context_reset,
+	.destroy = ring_context_destroy,
+};
+
+static int ring_request_alloc(struct i915_request *rq)
+{
+	int ret;
+
+	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
+
+	/*
+	 * Flush enough space to reduce the likelihood of waiting after
+	 * we start building the request - in which case we will just
+	 * have to repeat work.
+	 */
+	rq->reserved_space += LEGACY_REQUEST_SIZE;
+
+	/* Unconditionally invalidate GPU caches and TLBs. */
+	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
+	if (ret)
+		return ret;
+
+	rq->reserved_space -= LEGACY_REQUEST_SIZE;
+	return 0;
+}
+
+static void set_default_submission(struct intel_engine_cs *engine)
+{
+	engine->submit_request = submit_request;
+	engine->execlists.tasklet.func = submission_tasklet;
+}
+
+static void ring_release(struct intel_engine_cs *engine)
+{
+	intel_engine_cleanup_common(engine);
+
+	set_current_context(&engine->legacy.context, NULL);
+
+	intel_ring_unpin(engine->legacy.ring);
+	intel_ring_put(engine->legacy.ring);
+}
+
+static void setup_irq(struct intel_engine_cs *engine)
+{
+}
+
+static void setup_common(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *i915 = engine->i915;
+
+	/* gen8+ are only supported with execlists */
+	GEM_BUG_ON(INTEL_GEN(i915) >= 8);
+	GEM_BUG_ON(INTEL_GEN(i915) < 6);
+
+	setup_irq(engine);
+
+	engine->park = submission_park;
+	engine->unpark = submission_unpark;
+	engine->schedule = i915_schedule;
+
+	engine->resume = intel_ring_submission_resume;
+	engine->reset.prepare = reset_prepare;
+	engine->reset.rewind = ring_reset_rewind;
+	engine->reset.cancel = ring_reset_cancel;
+	engine->reset.finish = reset_finish;
+
+	engine->cops = &ring_context_ops;
+	engine->request_alloc = ring_request_alloc;
+
+	engine->set_default_submission = set_default_submission;
+}
+
+static void setup_rcs(struct intel_engine_cs *engine)
+{
+}
+
+static void setup_vcs(struct intel_engine_cs *engine)
+{
+}
+
+static void setup_bcs(struct intel_engine_cs *engine)
+{
+}
+
+static void setup_vecs(struct intel_engine_cs *engine)
+{
+	GEM_BUG_ON(!IS_HASWELL(engine->i915));
+}
+
+static unsigned int global_ring_size(void)
+{
+	/* Enough space to hold 2 clients and the context switch */
+	return roundup_pow_of_two(EXECLIST_MAX_PORTS * SZ_16K + SZ_4K);
+}
+
+int intel_ring_scheduler_setup(struct intel_engine_cs *engine)
+{
+	struct intel_ring *ring;
+	int err;
+
+	GEM_BUG_ON(HAS_EXECLISTS(engine->i915));
+
+	tasklet_init(&engine->execlists.tasklet,
+		     submission_tasklet, (unsigned long)engine);
+
+	setup_common(engine);
+
+	switch (engine->class) {
+	case RENDER_CLASS:
+		setup_rcs(engine);
+		break;
+	case VIDEO_DECODE_CLASS:
+		setup_vcs(engine);
+		break;
+	case COPY_ENGINE_CLASS:
+		setup_bcs(engine);
+		break;
+	case VIDEO_ENHANCEMENT_CLASS:
+		setup_vecs(engine);
+		break;
+	default:
+		MISSING_CASE(engine->class);
+		return -ENODEV;
+	}
+
+	ring = intel_engine_create_ring(engine, global_ring_size());
+	if (IS_ERR(ring)) {
+		err = PTR_ERR(ring);
+		goto err;
+	}
+
+	err = intel_ring_pin(ring);
+	if (err)
+		goto err_ring;
+
+	GEM_BUG_ON(engine->legacy.ring);
+	engine->legacy.ring = ring;
+
+	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
+
+	/* Finally, take ownership and responsibility for cleanup! */
+	engine->release = ring_release;
+	return 0;
+
+err_ring:
+	intel_ring_put(ring);
+err:
+	intel_engine_cleanup_common(engine);
+	return err;
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index d9c1701061b9..bb1ed29cf753 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -36,6 +36,7 @@
 #include "intel_gt.h"
 #include "intel_reset.h"
 #include "intel_ring.h"
+#include "intel_ring_submission.h"
 #include "shmem_utils.h"
 
 /* Rough estimate of the typical request size, performing a flush,
@@ -214,7 +215,7 @@ static void set_pp_dir(struct intel_engine_cs *engine)
 	}
 }
 
-static int xcs_resume(struct intel_engine_cs *engine)
+int intel_ring_submission_resume(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 	struct intel_ring *ring = engine->legacy.ring;
@@ -318,7 +319,7 @@ static int xcs_resume(struct intel_engine_cs *engine)
 	return ret;
 }
 
-static void reset_prepare(struct intel_engine_cs *engine)
+void intel_ring_submission_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_uncore *uncore = engine->uncore;
 	const u32 base = engine->mmio_base;
@@ -425,7 +426,7 @@ static void reset_rewind(struct intel_engine_cs *engine, bool stalled)
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-static void reset_finish(struct intel_engine_cs *engine)
+void intel_ring_submission_reset_finish(struct intel_engine_cs *engine)
 {
 }
 
@@ -1056,11 +1057,11 @@ static void setup_common(struct intel_engine_cs *engine)
 
 	setup_irq(engine);
 
-	engine->resume = xcs_resume;
-	engine->reset.prepare = reset_prepare;
+	engine->resume = intel_ring_submission_resume;
+	engine->reset.prepare = intel_ring_submission_reset_prepare;
 	engine->reset.rewind = reset_rewind;
 	engine->reset.cancel = reset_cancel;
-	engine->reset.finish = reset_finish;
+	engine->reset.finish = intel_ring_submission_reset_finish;
 
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.h b/drivers/gpu/drm/i915/gt/intel_ring_submission.h
new file mode 100644
index 000000000000..701eb033e055
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __INTEL_RING_SUBMISSION_H__
+#define __INTEL_RING_SUBMISSION_H__
+
+struct intel_engine_cs;
+
+void intel_ring_submission_reset_prepare(struct intel_engine_cs *engine);
+void intel_ring_submission_reset_finish(struct intel_engine_cs *engine);
+
+int intel_ring_submission_resume(struct intel_engine_cs *engine);
+
+#endif /* __INTEL_RING_SUBMISSION_H__ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 11/36] drm/i915/gt: Enable busy-stats for ring-scheduler
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (8 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 10/36] drm/i915/gt: Infrastructure for ring scheduling Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  8:03   ` [Intel-gfx] [PATCH] " Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 12/36] drm/i915/gt: Track if an engine requires forcewake w/a Chris Wilson
                   ` (32 subsequent siblings)
  42 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Couple up the context in/out accounting to record how long each engine
is busy handling requests. This is exposed to userspace for more accurate
measurements, and also enables our soft-rps timer.
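
The accounting itself is just a nested busy counter: the first context
in starts the clock, the last context out accumulates the elapsed time.
A minimal userspace sketch of that pattern (names invented, and without
the seqlock the driver wraps around the transitions so that
intel_engine_get_busy_time() can read start/total consistently):

#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Toy per-engine busy accounting; not driver code. */
struct busy_stats {
	atomic_int active;	/* contexts currently executing */
	long long start_ns;	/* timestamp of the 0 -> 1 transition */
	long long total_ns;	/* accumulated busy time */
};

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void context_in(struct busy_stats *st)
{
	if (atomic_fetch_add(&st->active, 1) == 0)
		st->start_ns = now_ns();	/* engine went busy */
}

static void context_out(struct busy_stats *st)
{
	if (atomic_fetch_sub(&st->active, 1) == 1)
		st->total_ns += now_ns() - st->start_ns; /* went idle */
}

int main(void)
{
	struct busy_stats st = { 0 };

	context_in(&st);	/* first context starts the busy period */
	context_in(&st);	/* overlapping context, no new start */
	context_out(&st);
	context_out(&st);	/* last context closes and accumulates */

	printf("busy for ~%lld ns\n", st.total_ns);
	return 0;
}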

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_stats.h  | 49 +++++++++++
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 34 +-------
 .../gpu/drm/i915/gt/intel_ring_scheduler.c    |  4 +
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  | 86 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  5 ++
 5 files changed, 145 insertions(+), 33 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_stats.h

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_stats.h b/drivers/gpu/drm/i915/gt/intel_engine_stats.h
new file mode 100644
index 000000000000..58491eae3482
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_engine_stats.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __INTEL_ENGINE_STATS_H__
+#define __INTEL_ENGINE_STATS_H__
+
+#include <linux/atomic.h>
+#include <linux/ktime.h>
+#include <linux/seqlock.h>
+
+#include "i915_gem.h" /* GEM_BUG_ON */
+#include "intel_engine.h"
+
+static inline void intel_engine_context_in(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (atomic_add_unless(&engine->stats.active, 1, 0))
+		return;
+
+	write_seqlock_irqsave(&engine->stats.lock, flags);
+	if (!atomic_add_unless(&engine->stats.active, 1, 0)) {
+		engine->stats.start = ktime_get();
+		atomic_inc(&engine->stats.active);
+	}
+	write_sequnlock_irqrestore(&engine->stats.lock, flags);
+}
+
+static inline void intel_engine_context_out(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(!atomic_read(&engine->stats.active));
+
+	if (atomic_add_unless(&engine->stats.active, -1, 1))
+		return;
+
+	write_seqlock_irqsave(&engine->stats.lock, flags);
+	if (atomic_dec_and_test(&engine->stats.active)) {
+		engine->stats.total =
+			ktime_add(engine->stats.total,
+				  ktime_sub(ktime_get(), engine->stats.start));
+	}
+	write_sequnlock_irqrestore(&engine->stats.lock, flags);
+}
+
+#endif /* __INTEL_ENGINE_STATS_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6fc0966b75ff..13ef4f58cb08 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -139,6 +139,7 @@
 #include "i915_vgpu.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
+#include "intel_engine_stats.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -1187,39 +1188,6 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
 				   status, rq);
 }
 
-static void intel_engine_context_in(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	if (atomic_add_unless(&engine->stats.active, 1, 0))
-		return;
-
-	write_seqlock_irqsave(&engine->stats.lock, flags);
-	if (!atomic_add_unless(&engine->stats.active, 1, 0)) {
-		engine->stats.start = ktime_get();
-		atomic_inc(&engine->stats.active);
-	}
-	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-}
-
-static void intel_engine_context_out(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	GEM_BUG_ON(!atomic_read(&engine->stats.active));
-
-	if (atomic_add_unless(&engine->stats.active, -1, 1))
-		return;
-
-	write_seqlock_irqsave(&engine->stats.lock, flags);
-	if (atomic_dec_and_test(&engine->stats.active)) {
-		engine->stats.total =
-			ktime_add(engine->stats.total,
-				  ktime_sub(ktime_get(), engine->stats.start));
-	}
-	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-}
-
 static void
 execlists_check_context(const struct intel_context *ce,
 			const struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
index c8cd435d1c51..aaff554865b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -9,6 +9,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_stats.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -59,6 +60,7 @@ static struct i915_request *
 schedule_in(struct intel_engine_cs *engine, struct i915_request *rq)
 {
 	__intel_gt_pm_get(engine->gt);
+	intel_engine_context_in(engine);
 	return i915_request_get(rq);
 }
 
@@ -71,6 +73,7 @@ schedule_out(struct intel_engine_cs *engine, struct i915_request *rq)
 		intel_engine_add_retire(engine, ce->timeline);
 
 	i915_request_put(rq);
+	intel_engine_context_out(engine);
 	intel_gt_pm_put_async(engine->gt);
 }
 
@@ -747,6 +750,7 @@ int intel_ring_scheduler_setup(struct intel_engine_cs *engine)
 	engine->legacy.ring = ring;
 
 	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
+	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 
 	/* Finally, take ownership and responsibility for cleanup! */
 	engine->release = ring_release;
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index cbf6b0735272..5e48c7571b4d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -7,6 +7,91 @@
 #include "i915_selftest.h"
 #include "selftest_engine.h"
 #include "selftests/igt_atomic.h"
+#include "selftests/igt_flush_test.h"
+#include "selftests/igt_spinner.h"
+
+static int live_engine_busy_stats(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct igt_spinner spin;
+	int err;
+
+	/*
+	 * Check that if an engine supports busy-stats, they tell the truth.
+	 */
+
+	if (igt_spinner_init(&spin, gt))
+		return -ENOMEM;
+
+	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
+	for_each_engine(engine, gt, id) {
+		struct i915_request *rq;
+		ktime_t dt, de;
+
+		if (!intel_engine_supports_stats(engine))
+			continue;
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		if (intel_gt_pm_wait_for_idle(gt)) {
+			err = -EBUSY;
+			break;
+		}
+
+		dt = ktime_get();
+		de = intel_engine_get_busy_time(engine);
+		usleep_range(100, 200);
+		dt = ktime_sub(ktime_get(), dt);
+		de = ktime_sub(intel_engine_get_busy_time(engine), de);
+		if (de > 10) {
+			pr_err("%s: reported %lldns [%d%%] busyness while sleeping [for %lldns]\n",
+			       engine->name,
+			       de, (int)div64_u64(100 * de, dt), dt);
+			err = -EINVAL;
+			break;
+		}
+
+		rq = igt_spinner_create_request(&spin,
+						engine->kernel_context,
+						MI_NOOP);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+		i915_request_add(rq);
+
+		if (!igt_wait_for_spinner(&spin, rq)) {
+			intel_gt_set_wedged(engine->gt);
+			err = -ETIME;
+			break;
+		}
+
+		dt = ktime_get();
+		de = intel_engine_get_busy_time(engine);
+		usleep_range(100, 200);
+		dt = ktime_sub(ktime_get(), dt);
+		de = ktime_sub(intel_engine_get_busy_time(engine), de);
+		if (100 * de < 95 * dt) {
+			pr_err("%s: reported only %lldns [%d%%] busyness while spinning [for %lldns]\n",
+			       engine->name,
+			       de, (int)div64_u64(100 * de, dt), dt);
+			err = -EINVAL;
+			break;
+		}
+
+		igt_spinner_end(&spin);
+		if (igt_flush_test(gt->i915)) {
+			err = -EIO;
+			break;
+		}
+	}
+
+	igt_spinner_fini(&spin);
+	return err;
+}
 
 static int live_engine_pm(void *arg)
 {
@@ -77,6 +162,7 @@ static int live_engine_pm(void *arg)
 int live_engine_pm_selftests(struct intel_gt *gt)
 {
 	static const struct i915_subtest tests[] = {
+		SUBTEST(live_engine_busy_stats),
 		SUBTEST(live_engine_pm),
 	};
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index 5049c3dd08a6..5e364fb31aea 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -1252,6 +1252,11 @@ int live_rps_dynamic(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
+	if (intel_rps_has_interrupts(rps))
+		pr_info("RPS has interrupt support\n");
+	if (intel_rps_uses_timer(rps))
+		pr_info("RPS has timer support\n");
+
 	for_each_engine(engine, gt, id) {
 		struct i915_request *rq;
 		struct {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 12/36] drm/i915/gt: Track if an engine requires forcewake w/a
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (9 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 11/36] drm/i915/gt: Enable busy-stats for ring-scheduler Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01 12:17   ` Mika Kuoppala
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 13/36] drm/i915: Relinquish forcewake immediately after manual grouping Chris Wilson
                   ` (31 subsequent siblings)
  42 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Sometimes, for a particular workaround, an engine might need to keep
forcewake active while it is busy submitting requests. Track such a
nuisance with engine->fw_domain.
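
The bookkeeping amounts to a request-in-flight counter gating a
forcewake reference. A userspace sketch for illustration (struct and
helper names invented; the driver calls intel_uncore_forcewake_get/put
on engine->fw_domain from schedule_in/out):

#include <assert.h>
#include <stdio.h>

/* Toy model of the fw_domain/fw_active bookkeeping; not driver code. */
struct engine_model {
	unsigned int fw_domain;	/* 0 if no workaround applies */
	unsigned int fw_active;	/* requests currently in flight */
};

static void forcewake_get(unsigned int domain)
{
	printf("forcewake_get(0x%x)\n", domain);
}

static void forcewake_put(unsigned int domain)
{
	printf("forcewake_put(0x%x)\n", domain);
}

static void schedule_in(struct engine_model *e)
{
	if (!e->fw_active++ && e->fw_domain)
		forcewake_get(e->fw_domain);	/* first request in */
}

static void schedule_out(struct engine_model *e)
{
	assert(e->fw_active);
	if (!--e->fw_active && e->fw_domain)
		forcewake_put(e->fw_domain);	/* last request out */
}

int main(void)
{
	struct engine_model vcs = { .fw_domain = 0x1 /* stand-in mask */ };

	schedule_in(&vcs);	/* acquires the domain */
	schedule_in(&vcs);	/* nested, no extra acquire */
	schedule_out(&vcs);
	schedule_out(&vcs);	/* releases the domain */
	return 0;
}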

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h   | 9 +++++++++
 drivers/gpu/drm/i915/gt/intel_ring_scheduler.c | 4 ++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 3782e27c2945..ccdd69923793 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -313,6 +313,15 @@ struct intel_engine_cs {
 	u32 context_size;
 	u32 mmio_base;
 
+	/*
+	 * Some w/a require forcewake to be held (which prevents RC6) while
+	 * a particular engine is active. If so, we set fw_domain to which
+	 * domains need to be held for the duration of request activity,
+	 * and 0 if none.
+	 */
+	unsigned int fw_domain;
+	unsigned int fw_active;
+
 	unsigned long context_tag;
 
 	struct rb_node uabi_node;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
index aaff554865b1..777cab6d9540 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -60,6 +60,8 @@ static struct i915_request *
 schedule_in(struct intel_engine_cs *engine, struct i915_request *rq)
 {
 	__intel_gt_pm_get(engine->gt);
+	if (!engine->fw_active++ && engine->fw_domain)
+		intel_uncore_forcewake_get(engine->uncore, engine->fw_domain);
 	intel_engine_context_in(engine);
 	return i915_request_get(rq);
 }
@@ -74,6 +76,8 @@ schedule_out(struct intel_engine_cs *engine, struct i915_request *rq)
 
 	i915_request_put(rq);
 	intel_engine_context_out(engine);
+	if (!--engine->fw_active && engine->fw_domain)
+		intel_uncore_forcewake_put(engine->uncore, engine->fw_domain);
 	intel_gt_pm_put_async(engine->gt);
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 13/36] drm/i915: Relinquish forcewake immediately after manual grouping
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (10 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 12/36] drm/i915/gt: Track if an engine requires forcewake w/a Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01 12:20   ` Mika Kuoppala
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 14/36] drm/i915/gt: Implement ring scheduler for gen6/7 Chris Wilson
                   ` (30 subsequent siblings)
  42 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Our forcewake utilisation is split into categories: automatic and
manual. Around bare register reads, we look up the right forcewake
domain and automatically acquire and release [upon a timer] the
forcewake domain. For other access, where we know we require the
forcewake across a group of register reads, we manually acquire the
forcewake domain and release it at the end. Again, this currently arms
the domain timer for a later release.

However, looking at some energy utilisation profiles, we have tried to
avoid using forcewake [and rely on the natural wake up to post register
updates] because even keeping the fw active for a brief period
contributes to a significant power draw [i.e. when the gpu is sleeping
with rc6 at high clocks]. But as it turns out, not posting the writes
immediately also has unintended consequences, such as not reducing the
clocks, and so not conserving power, while busy.

As a compromise, let us only arm the domain timer for automatic
forcewake usage around bare register access, but immediately release the
forcewake when manually acquired by intel_uncore_forcewake_get/_put.

The corollary to this is that we may instead have to take forcewake more
often, and so incur a latency penalty each time we do. For Sandybridge
this was significant, and even on the latest machines, taking forcewake
at interrupt frequency has a huge impact. [So we don't do that anymore!
Hopefully, this will spare us from still needing the mitigation of the
timer for steady state execution.]
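
To make the two policies concrete, a userspace sketch (all names
invented; in the driver the deferred path is fw_domain_arm_timer() and
the immediate path is uncore->funcs.force_wake_put()):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of automatic (timer-deferred) vs manual (immediate) release. */
struct fw_domain {
	unsigned int wake_count;
	bool timer_armed;
};

static void fw_get(struct fw_domain *d)
{
	if (!d->wake_count++)
		printf("hw: force wake asserted\n");
}

static void fw_put_deferred(struct fw_domain *d)	/* automatic access */
{
	d->timer_armed = true;				/* released later */
}

static void fw_put_immediate(struct fw_domain *d)	/* manual put */
{
	if (!--d->wake_count)
		printf("hw: force wake released, rc6 allowed\n");
}

static void fw_timer_expire(struct fw_domain *d)
{
	if (d->timer_armed && !--d->wake_count)
		printf("hw: force wake released by timer\n");
	d->timer_armed = false;
}

int main(void)
{
	struct fw_domain d = { 0 };

	/* Manual grouping: acquire, write a batch of registers, release. */
	fw_get(&d);
	fw_put_immediate(&d);	/* no timer, rc6 can be re-entered at once */

	/* Bare register access: release is deferred to the timer. */
	fw_get(&d);
	fw_put_deferred(&d);
	fw_timer_expire(&d);	/* only now does the domain go idle */
	return 0;
}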

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index a61cb8ca4d50..7d6b9ae7403c 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -709,7 +709,7 @@ static void __intel_uncore_forcewake_put(struct intel_uncore *uncore,
 			continue;
 		}
 
-		fw_domain_arm_timer(domain);
+		uncore->funcs.force_wake_put(uncore, domain->mask);
 	}
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 14/36] drm/i915/gt: Implement ring scheduler for gen6/7
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (11 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 13/36] drm/i915: Relinquish forcewake immediately after manual grouping Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 15/36] drm/i915/gt: Enable ring scheduling " Chris Wilson
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

A key problem with legacy ring buffer submission is that it is an
inherent FIFO queue across all clients; if one blocks, they all block. A
scheduler allows us to avoid that limitation, and ensures that all
clients can submit in parallel, removing the resource contention of the
global ringbuffer.

Having built the ring scheduler infrastructure on top of the global
ringbuffer submission, we now need to provide the HW knowledge required
to build command packets and implement context switching.
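
As a toy illustration of why the scheduler helps (plain C, nothing
driver specific; in the real backend only ready requests are queued and
they are picked in priority order, so the ready flag below merely
stands in for that): with a single global FIFO a stalled request at the
head blocks every client behind it, whereas a scheduler that picks the
next runnable request lets independent clients keep flowing.

#include <stdio.h>

struct request {
	const char *client;
	int ready;	/* e.g. its dependencies/fences have signalled */
};

static const struct request *pick_fifo(const struct request *q, int n)
{
	return n ? &q[0] : NULL;	/* head of line, ready or not */
}

static const struct request *pick_scheduler(const struct request *q, int n)
{
	for (int i = 0; i < n; i++)
		if (q[i].ready)
			return &q[i];	/* first runnable request wins */
	return NULL;
}

int main(void)
{
	struct request queue[] = {
		{ "client A", 0 },	/* blocked on an external fence */
		{ "client B", 1 },
		{ "client C", 1 },
	};
	const struct request *rq;

	rq = pick_fifo(queue, 3);
	printf("FIFO submits: %s (%s)\n", rq->client,
	       rq->ready ? "ready" : "stalled, everyone waits");

	rq = pick_scheduler(queue, 3);
	printf("Scheduler submits: %s\n", rq ? rq->client : "nothing");
	return 0;
}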

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gt/intel_ring_scheduler.c    | 417 +++++++++++++++++-
 1 file changed, 415 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
index 777cab6d9540..68efe63c0037 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -7,6 +7,10 @@
 
 #include <drm/i915_drm.h>
 
+#include "gen2_engine_cs.h"
+#include "gen6_engine_cs.h"
+#include "gen6_ppgtt.h"
+#include "gen7_renderclear.h"
 #include "i915_drv.h"
 #include "intel_context.h"
 #include "intel_engine_stats.h"
@@ -286,8 +290,261 @@ static void ring_copy(struct intel_ring *dst,
 	memcpy(out, src->vaddr + start, end - start);
 }
 
+static void mi_set_context(struct intel_ring *ring,
+			   struct intel_engine_cs *engine,
+			   struct intel_context *ce,
+			   u32 flags)
+{
+	struct drm_i915_private *i915 = engine->i915;
+	enum intel_engine_id id;
+	const int num_engines =
+		IS_HASWELL(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
+	int len;
+	u32 *cs;
+
+	len = 4;
+	if (IS_GEN(i915, 7))
+		len += 2 + (num_engines ? 4 * num_engines + 6 : 0);
+	else if (IS_GEN(i915, 5))
+		len += 2;
+
+	cs = ring_map_dw(ring, len);
+
+	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
+	if (IS_GEN(i915, 7)) {
+		*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+		if (num_engines) {
+			struct intel_engine_cs *signaller;
+
+			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
+			for_each_engine(signaller, engine->gt, id) {
+				if (signaller == engine)
+					continue;
+
+				*cs++ = i915_mmio_reg_offset(
+					   RING_PSMI_CTL(signaller->mmio_base));
+				*cs++ = _MASKED_BIT_ENABLE(
+						GEN6_PSMI_SLEEP_MSG_DISABLE);
+			}
+		}
+	} else if (IS_GEN(i915, 5)) {
+		/*
+		 * This w/a is only listed for pre-production ilk a/b steppings,
+		 * but is also mentioned for programming the powerctx. To be
+		 * safe, just apply the workaround; we do not use SyncFlush so
+		 * this should never take effect and so be a no-op!
+		 */
+		*cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN;
+	}
+
+	*cs++ = MI_NOOP;
+	*cs++ = MI_SET_CONTEXT;
+	*cs++ = i915_ggtt_offset(ce->state) | flags;
+	/*
+	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
+	 * WaMiSetContext_Hang:snb,ivb,vlv
+	 */
+	*cs++ = MI_NOOP;
+
+	if (IS_GEN(i915, 7)) {
+		if (num_engines) {
+			struct intel_engine_cs *signaller;
+			i915_reg_t last_reg = {}; /* keep gcc quiet */
+
+			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
+			for_each_engine(signaller, engine->gt, id) {
+				if (signaller == engine)
+					continue;
+
+				last_reg = RING_PSMI_CTL(signaller->mmio_base);
+				*cs++ = i915_mmio_reg_offset(last_reg);
+				*cs++ = _MASKED_BIT_DISABLE(
+						GEN6_PSMI_SLEEP_MSG_DISABLE);
+			}
+
+			/* Insert a delay before the next switch! */
+			*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+			*cs++ = i915_mmio_reg_offset(last_reg);
+			*cs++ = intel_gt_scratch_offset(engine->gt,
+							INTEL_GT_SCRATCH_FIELD_DEFAULT);
+			*cs++ = MI_NOOP;
+		}
+		*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	} else if (IS_GEN(i915, 5)) {
+		*cs++ = MI_SUSPEND_FLUSH;
+	}
+}
+
+static struct i915_address_space *vm_alias(struct i915_address_space *vm)
+{
+	if (i915_is_ggtt(vm))
+		vm = &i915_vm_to_ggtt(vm)->alias->vm;
+
+	return vm;
+}
+
+static void load_pd_dir(struct intel_ring *ring,
+			struct intel_engine_cs *engine,
+			const struct i915_ppgtt *ppgtt)
+{
+	u32 *cs = ring_map_dw(ring, 12);
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
+	*cs++ = PP_DIR_DCLV_2G;
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
+	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
+
+	/* Stall until the page table load is complete? */
+	*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
+	*cs++ = intel_gt_scratch_offset(engine->gt,
+					INTEL_GT_SCRATCH_FIELD_DEFAULT);
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_INSTPM(engine->mmio_base));
+	*cs++ = _MASKED_BIT_ENABLE(INSTPM_TLB_INVALIDATE);
+}
+
+static struct i915_address_space *current_vm(struct intel_engine_cs *engine)
+{
+	struct intel_context *old = engine->legacy.context;
+
+	return old ? vm_alias(old->vm) : NULL;
+}
+
+static void gen6_emit_invalidate_rcs(struct intel_ring *ring,
+				     struct intel_engine_cs *engine)
+{
+	u32 addr, flags;
+	u32 *cs;
+
+	addr = intel_gt_scratch_offset(engine->gt,
+				       INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
+
+	flags = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
+	flags |= PIPE_CONTROL_TLB_INVALIDATE;
+
+	if (INTEL_GEN(engine->i915) >= 7)
+		flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
+	else
+		addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+	cs = ring_map_dw(ring, 4);
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = flags;
+	*cs++ = addr;
+	*cs++ = 0;
+}
+
+static struct i915_address_space *
+clear_residuals(struct intel_ring *ring, struct intel_engine_cs *engine)
+{
+	struct intel_context *ce = engine->kernel_context;
+	struct i915_address_space *vm = vm_alias(engine->gt->vm);
+	u32 flags;
+
+	if (vm != current_vm(engine))
+		load_pd_dir(ring, engine, i915_vm_to_ppgtt(vm));
+
+	if (ce->state)
+		mi_set_context(ring, engine, ce,
+			       MI_MM_SPACE_GTT | MI_RESTORE_INHIBIT);
+
+	if (IS_HASWELL(engine->i915))
+		flags = MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW;
+	else
+		flags = MI_BATCH_NON_SECURE_I965;
+
+	__gen6_emit_bb_start(ring_map_dw(ring, 2),
+			     engine->wa_ctx.vma->node.start, flags);
+
+	return vm;
+}
+
+static void remap_l3_slice(struct intel_ring *ring,
+			   struct intel_engine_cs *engine,
+			   int slice)
+{
+	u32 *cs, *remap_info = engine->i915->l3_parity.remap_info[slice];
+	int i;
+
+	if (!remap_info)
+		return;
+
+	/*
+	 * Note: We do not worry about the concurrent register cacheline hang
+	 * here because no other code should access these registers other than
+	 * at initialization time.
+	 */
+	cs = ring_map_dw(ring, GEN7_L3LOG_SIZE / 4 * 2 + 2);
+	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4);
+	for (i = 0; i < GEN7_L3LOG_SIZE / 4; i++) {
+		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
+		*cs++ = remap_info[i];
+	}
+	*cs++ = MI_NOOP;
+}
+
+static void remap_l3(struct intel_ring *ring,
+		     struct intel_engine_cs *engine,
+		     struct intel_context *ce)
+{
+	struct i915_gem_context *ctx =
+		rcu_dereference_protected(ce->gem_context, true);
+	int bit, idx = -1;
+
+	if (!ctx || !ctx->remap_slice)
+		return;
+
+	do {
+		bit = ffs(ctx->remap_slice);
+		remap_l3_slice(ring, engine, idx += bit);
+	} while (ctx->remap_slice >>= bit);
+}
+
 static void switch_context(struct intel_ring *ring, struct i915_request *rq)
 {
+	struct intel_engine_cs *engine = rq->engine;
+	struct i915_address_space *cvm = current_vm(engine);
+	struct intel_context *ce = rq->context;
+	struct i915_address_space *vm;
+
+	if (engine->wa_ctx.vma && ce != engine->kernel_context) {
+		if (engine->wa_ctx.vma->private != ce) {
+			cvm = clear_residuals(ring, engine);
+			intel_context_put(engine->wa_ctx.vma->private);
+			engine->wa_ctx.vma->private = intel_context_get(ce);
+		}
+	}
+
+	vm = vm_alias(ce->vm);
+	if (vm != cvm)
+		load_pd_dir(ring, engine, i915_vm_to_ppgtt(vm));
+
+	if (ce->state) {
+		u32 flags;
+
+		GEM_BUG_ON(engine->id != RCS0);
+
+		/* For resource streamer on HSW+ and power context elsewhere */
+		BUILD_BUG_ON(HSW_MI_RS_SAVE_STATE_EN != MI_SAVE_EXT_STATE_EN);
+		BUILD_BUG_ON(HSW_MI_RS_RESTORE_STATE_EN != MI_RESTORE_EXT_STATE_EN);
+
+		flags = MI_SAVE_EXT_STATE_EN | MI_MM_SPACE_GTT;
+		if (test_bit(CONTEXT_VALID_BIT, &ce->flags)) {
+			gen6_emit_invalidate_rcs(ring, engine);
+			flags |= MI_RESTORE_EXT_STATE_EN;
+		} else {
+			flags |= MI_RESTORE_INHIBIT;
+		}
+
+		mi_set_context(ring, engine, ce, flags);
+	}
+
+	remap_l3(ring, engine, ce);
 }
 
 static struct i915_request *ring_submit(struct i915_request *rq)
@@ -453,6 +710,33 @@ static void submission_unpark(struct intel_engine_cs *engine)
 	intel_engine_pin_breadcrumbs_irq(engine);
 }
 
+static int gen6_emit_init_breadcrumb(struct i915_request *rq)
+{
+	struct intel_timeline *tl = i915_request_timeline(rq);
+	u32 *cs;
+
+	GEM_BUG_ON(i915_request_has_initial_breadcrumb(rq));
+	if (!tl->has_initial_breadcrumb)
+		return 0;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = 0;
+	*cs++ = tl->hwsp_offset;
+	*cs++ = rq->fence.seqno - 1;
+
+	intel_ring_advance(rq, cs);
+
+	/* Record the updated position of the request's payload */
+	rq->infix = intel_ring_offset(rq, cs);
+
+	__set_bit(I915_FENCE_FLAG_INITIAL_BREADCRUMB, &rq->fence.flags);
+	return 0;
+}
+
 static void ring_context_destroy(struct kref *ref)
 {
 	struct intel_context *ce = container_of(ref, typeof(*ce), ref);
@@ -468,8 +752,30 @@ static void ring_context_destroy(struct kref *ref)
 	intel_context_free(ce);
 }
 
+static int __context_pin_ppgtt(struct intel_context *ce)
+{
+	struct i915_address_space *vm;
+	int err = 0;
+
+	vm = vm_alias(ce->vm);
+	if (vm)
+		err = gen6_ppgtt_pin(i915_vm_to_ppgtt((vm)));
+
+	return err;
+}
+
+static void __context_unpin_ppgtt(struct intel_context *ce)
+{
+	struct i915_address_space *vm;
+
+	vm = vm_alias(ce->vm);
+	if (vm)
+		gen6_ppgtt_unpin(i915_vm_to_ppgtt(vm));
+}
+
 static void ring_context_unpin(struct intel_context *ce)
 {
+	__context_unpin_ppgtt(ce);
 }
 
 static int alloc_context_vma(struct intel_context *ce)
@@ -597,7 +903,7 @@ static int ring_context_alloc(struct intel_context *ce)
 
 static int ring_context_pin(struct intel_context *ce)
 {
-	return 0;
+	return __context_pin_ppgtt(ce);
 }
 
 static void ring_context_reset(struct intel_context *ce)
@@ -653,12 +959,19 @@ static void ring_release(struct intel_engine_cs *engine)
 
 	set_current_context(&engine->legacy.context, NULL);
 
+	if (engine->wa_ctx.vma) {
+		intel_context_put(engine->wa_ctx.vma->private);
+		i915_vma_unpin_and_release(&engine->wa_ctx.vma, 0);
+	}
+
 	intel_ring_unpin(engine->legacy.ring);
 	intel_ring_put(engine->legacy.ring);
 }
 
 static void setup_irq(struct intel_engine_cs *engine)
 {
+	engine->irq_enable = gen6_irq_enable;
+	engine->irq_disable = gen6_irq_disable;
 }
 
 static void setup_common(struct intel_engine_cs *engine)
@@ -667,7 +980,7 @@ static void setup_common(struct intel_engine_cs *engine)
 
 	/* gen8+ are only supported with execlists */
 	GEM_BUG_ON(INTEL_GEN(i915) >= 8);
-	GEM_BUG_ON(INTEL_GEN(i915) < 8);
+	GEM_BUG_ON(INTEL_GEN(i915) < 6);
 
 	setup_irq(engine);
 
@@ -684,24 +997,62 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 
+	engine->emit_init_breadcrumb = gen6_emit_init_breadcrumb;
+	if (INTEL_GEN(i915) >= 7)
+		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
+	else if (INTEL_GEN(i915) >= 6)
+		engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs;
+	else
+		engine->emit_fini_breadcrumb = gen3_emit_breadcrumb;
+
 	engine->set_default_submission = set_default_submission;
+
+	engine->emit_bb_start = gen6_emit_bb_start;
 }
 
 static void setup_rcs(struct intel_engine_cs *engine)
 {
+	struct drm_i915_private *i915 = engine->i915;
+
+	if (HAS_L3_DPF(i915))
+		engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
+
+	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
+
+	if (INTEL_GEN(i915) >= 7) {
+		engine->emit_flush = gen7_emit_flush_rcs;
+		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_rcs;
+		if (IS_HASWELL(i915))
+			engine->emit_bb_start = hsw_emit_bb_start;
+	} else {
+		engine->emit_flush = gen6_emit_flush_rcs;
+		engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_rcs;
+	}
 }
 
 static void setup_vcs(struct intel_engine_cs *engine)
 {
+	engine->emit_flush = gen6_emit_flush_vcs;
+	engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
+
+	if (IS_GEN(engine->i915, 6))
+		engine->fw_domain = FORCEWAKE_ALL;
 }
 
 static void setup_bcs(struct intel_engine_cs *engine)
 {
+	engine->emit_flush = gen6_emit_flush_xcs;
+	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 }
 
 static void setup_vecs(struct intel_engine_cs *engine)
 {
 	GEM_BUG_ON(!IS_HASWELL(engine->i915));
+
+	engine->emit_flush = gen6_emit_flush_xcs;
+	engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
+	engine->irq_enable = hsw_irq_enable_vecs;
+	engine->irq_disable = hsw_irq_disable_vecs;
 }
 
 static unsigned int global_ring_size(void)
@@ -710,6 +1061,58 @@ static unsigned int global_ring_size(void)
 	return roundup_pow_of_two(EXECLIST_MAX_PORTS * SZ_16K + SZ_4K);
 }
 
+static int gen7_ctx_switch_bb_init(struct intel_engine_cs *engine)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int size;
+	int err;
+
+	size = gen7_setup_clear_gpr_bb(engine, NULL /* probe size */);
+	if (size <= 0)
+		return size;
+
+	size = ALIGN(size, PAGE_SIZE);
+	obj = i915_gem_object_create_internal(engine->i915, size);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	vma = i915_vma_instance(obj, engine->gt->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_obj;
+	}
+
+	vma->private = intel_context_create(engine); /* dummy residuals */
+	if (IS_ERR(vma->private)) {
+		err = PTR_ERR(vma->private);
+		goto err_obj;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH);
+	if (err)
+		goto err_private;
+
+	err = i915_vma_sync(vma);
+	if (err)
+		goto err_unpin;
+
+	err = gen7_setup_clear_gpr_bb(engine, vma);
+	if (err)
+		goto err_unpin;
+
+	engine->wa_ctx.vma = vma;
+	return 0;
+
+err_unpin:
+	i915_vma_unpin(vma);
+err_private:
+	intel_context_put(vma->private);
+err_obj:
+	i915_gem_object_put(obj);
+	return err;
+}
+
 int intel_ring_scheduler_setup(struct intel_engine_cs *engine)
 {
 	struct intel_ring *ring;
@@ -753,13 +1156,23 @@ int intel_ring_scheduler_setup(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->legacy.ring);
 	engine->legacy.ring = ring;
 
+	if (IS_HASWELL(engine->i915) && engine->class == RENDER_CLASS) {
+		err = gen7_ctx_switch_bb_init(engine);
+		if (err)
+			goto err_ring_unpin;
+	}
+
 	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
+	if (INTEL_GEN(engine->i915) >= 6)
+		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 
 	/* Finally, take ownership and responsibility for cleanup! */
 	engine->release = ring_release;
 	return 0;
 
+err_ring_unpin:
+	intel_ring_unpin(ring);
 err_ring:
 	intel_ring_put(ring);
 err:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 15/36] drm/i915/gt: Enable ring scheduling for gen6/7
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (12 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 14/36] drm/i915/gt: Implement ring scheduler for gen6/7 Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 16/36] drm/i915/gem: Mark the buffer pool as active for the cmdparser Chris Wilson
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Switch over from FIFO global submission to the priority-sorted
topographical scheduler. At the cost of more busy work on the CPU to
keep the GPU supplied with the next packet of requests, this allows us
to reorder requests around submission stalls.

This also enables the timer-based RPS, with the exception of Valleyview,
whose PCU doesn't take kindly to our interference.
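
The backend selection at engine init then reads as below; a simplified
mirror of the intel_engines_init() change in this patch, with plain
ints standing in for the HAS_EXECLISTS()/INTEL_GEN() checks:

#include <stdio.h>

enum backend { EXECLISTS, RING_SCHEDULER, RING_SUBMISSION };

static enum backend pick_backend(int gen, int has_execlists)
{
	if (has_execlists)
		return EXECLISTS;		/* gen8+ */
	if (gen >= 6)
		return RING_SCHEDULER;		/* gen6/7, this series */
	return RING_SUBMISSION;			/* gen2-5, legacy FIFO */
}

int main(void)
{
	static const char * const names[] = {
		"execlists", "ring scheduler", "ring submission"
	};

	for (int gen = 2; gen <= 9; gen++)
		printf("gen%d -> %s\n", gen,
		       names[pick_backend(gen, gen >= 8)]);
	return 0;
}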

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c             | 2 ++
 drivers/gpu/drm/i915/gt/intel_rps.c                   | 6 ++----
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index b81978890641..bb57687aea99 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -94,7 +94,7 @@ static int live_nop_switch(void *arg)
 			rq = i915_request_get(this);
 			i915_request_add(this);
 		}
-		if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		if (i915_request_wait(rq, 0, HZ) < 0) {
 			pr_err("Failed to populated %d contexts\n", nctx);
 			intel_gt_set_wedged(&i915->gt);
 			i915_request_put(rq);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 64e13c074708..c29727b3ba4a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -791,6 +791,8 @@ int intel_engines_init(struct intel_gt *gt)
 
 	if (HAS_EXECLISTS(gt->i915))
 		setup = intel_execlists_submission_setup;
+	else if (INTEL_GEN(gt->i915) >= 6)
+		setup = intel_ring_scheduler_setup;
 	else
 		setup = intel_ring_submission_setup;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 2e4ddc9ca09d..22882c2953da 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1053,9 +1053,7 @@ static bool gen6_rps_enable(struct intel_rps *rps)
 	intel_uncore_write_fw(uncore, GEN6_RP_DOWN_TIMEOUT, 50000);
 	intel_uncore_write_fw(uncore, GEN6_RP_IDLE_HYSTERSIS, 10);
 
-	rps->pm_events = (GEN6_PM_RP_UP_THRESHOLD |
-			  GEN6_PM_RP_DOWN_THRESHOLD |
-			  GEN6_PM_RP_DOWN_TIMEOUT);
+	rps->pm_events = GEN6_PM_RP_UP_THRESHOLD | GEN6_PM_RP_DOWN_THRESHOLD;
 
 	return rps_reset(rps);
 }
@@ -1362,7 +1360,7 @@ void intel_rps_enable(struct intel_rps *rps)
 	GEM_BUG_ON(rps->efficient_freq < rps->min_freq);
 	GEM_BUG_ON(rps->efficient_freq > rps->max_freq);
 
-	if (has_busy_stats(rps))
+	if (has_busy_stats(rps) && !IS_VALLEYVIEW(i915))
 		intel_rps_set_timer(rps);
 	else if (INTEL_GEN(i915) >= 6)
 		intel_rps_set_interrupts(rps);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 16/36] drm/i915/gem: Mark the buffer pool as active for the cmdparser
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (13 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 15/36] drm/i915/gt: Enable ring scheduling " Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 17/36] drm/i915/gem: Async GPU relocations only Chris Wilson
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the execbuf is interrupted after building the cmdparser pipeline, and
before we commit to submitting the request to HW, we would attempt to
clean up the cmdparser early. While we held active references to the vma
being parsed and constructed, we did not hold an active reference for
the buffer pool itself. The result was that an interrupted execbuf could
still have run the cmdparser pipeline, but since the buffer pool was
idle, its target vma could have been recycled.

Note this problem only occurs if the cmdparser is running async due to
pipelined waits on busy fences, and the execbuf is interrupted.
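
A toy model of the fix (invented names, plain C): the async parse now
holds an active reference on the pool node until its fence signals, so
an interrupted execbuf can no longer hand the node back to the pool
while the worker may still write into it.

#include <assert.h>
#include <stdio.h>

struct pool_node {
	int active;	/* outstanding users; 0 means eligible for reuse */
};

static void node_mark_active(struct pool_node *n)
{
	n->active++;		/* taken before committing the worker */
}

static void node_retire(struct pool_node *n)
{
	assert(n->active > 0);
	n->active--;		/* the parse fence has signalled */
}

static int node_can_be_recycled(const struct pool_node *n)
{
	return n->active == 0;
}

int main(void)
{
	struct pool_node shadow = { 0 };

	node_mark_active(&shadow);
	printf("parse in flight, recyclable? %s\n",
	       node_can_be_recycled(&shadow) ? "yes (bug)" : "no (safe)");

	node_retire(&shadow);
	printf("parse retired, recyclable? %s\n",
	       node_can_be_recycled(&shadow) ? "yes" : "no");
	return 0;
}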

Fixes: 686c7c35abc2 ("drm/i915/gem: Asynchronous cmdparser")
Fixes: 16e87459673a ("drm/i915/gt: Move the batch buffer pool from the engine to the gt")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 56 ++++++++++++++++---
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 219a36995b96..0829ac8a55bf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1987,6 +1987,38 @@ static const struct dma_fence_work_ops eb_parse_ops = {
 	.release = __eb_parse_release,
 };
 
+static inline int
+__parser_mark_active(struct i915_vma *vma,
+		     struct intel_timeline *tl,
+		     struct dma_fence *fence)
+{
+	struct intel_gt_buffer_pool_node *node = vma->private;
+
+	return i915_active_ref(&node->active, tl, fence);
+}
+
+static int
+parser_mark_active(struct eb_parse_work *pw, struct intel_timeline *tl)
+{
+	int err;
+
+	mutex_lock(&tl->mutex);
+
+	err = __parser_mark_active(pw->shadow, tl, &pw->base.dma);
+	if (err)
+		goto unlock;
+
+	if (pw->trampoline) {
+		err = __parser_mark_active(pw->trampoline, tl, &pw->base.dma);
+		if (err)
+			goto unlock;
+	}
+
+unlock:
+	mutex_unlock(&tl->mutex);
+	return err;
+}
+
 static int eb_parse_pipeline(struct i915_execbuffer *eb,
 			     struct i915_vma *shadow,
 			     struct i915_vma *trampoline)
@@ -2021,20 +2053,25 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	pw->shadow = shadow;
 	pw->trampoline = trampoline;
 
+	/* Mark active refs for this worker, in case we get interrupted */
+	err = parser_mark_active(pw, eb->context->timeline);
+	if (err)
+		goto err_commit;
+
 	err = dma_resv_lock_interruptible(pw->batch->resv, NULL);
 	if (err)
-		goto err_trampoline;
+		goto err_commit;
 
 	err = dma_resv_reserve_shared(pw->batch->resv, 1);
 	if (err)
-		goto err_batch_unlock;
+		goto err_commit_unlock;
 
 	/* Wait for all writes (and relocs) into the batch to complete */
 	err = i915_sw_fence_await_reservation(&pw->base.chain,
 					      pw->batch->resv, NULL, false,
 					      0, I915_FENCE_GFP);
 	if (err < 0)
-		goto err_batch_unlock;
+		goto err_commit_unlock;
 
 	/* Keep the batch alive and unwritten as we parse */
 	dma_resv_add_shared_fence(pw->batch->resv, &pw->base.dma);
@@ -2049,11 +2086,13 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	dma_fence_work_commit_imm(&pw->base);
 	return 0;
 
-err_batch_unlock:
+err_commit_unlock:
 	dma_resv_unlock(pw->batch->resv);
-err_trampoline:
-	if (trampoline)
-		i915_active_release(&trampoline->active);
+err_commit:
+	i915_sw_fence_set_error_once(&pw->base.chain, err);
+	dma_fence_work_commit_imm(&pw->base);
+	return err;
+
 err_shadow:
 	i915_active_release(&shadow->active);
 err_batch:
@@ -2099,6 +2138,7 @@ static int eb_parse(struct i915_execbuffer *eb)
 		goto err;
 	}
 	i915_gem_object_set_readonly(shadow->obj);
+	shadow->private = pool;
 
 	trampoline = NULL;
 	if (CMDPARSER_USES_GGTT(eb->i915)) {
@@ -2112,6 +2152,7 @@ static int eb_parse(struct i915_execbuffer *eb)
 			shadow = trampoline;
 			goto err_shadow;
 		}
+		shadow->private = pool;
 
 		eb->batch_flags |= I915_DISPATCH_SECURE;
 	}
@@ -2128,7 +2169,6 @@ static int eb_parse(struct i915_execbuffer *eb)
 	eb->trampoline = trampoline;
 	eb->batch_start_offset = 0;
 
-	shadow->private = pool;
 	return 0;
 
 err_trampoline:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 17/36] drm/i915/gem: Async GPU relocations only
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (14 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 16/36] drm/i915/gem: Mark the buffer pool as active for the cmdparser Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 18/36] drm/i915: Add list_for_each_entry_safe_continue_reverse Chris Wilson
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Reduce the 3 relocation paths down to the single path that accommodates
all. The primary motivation for this is to guard the relocations with a
natural fence (derived from the i915_request used to write the
relocation from the GPU).

The tradeoff in using async gpu relocations is that it increases latency
over using direct CPU relocations, for the cases where the target is
idle and accessible by the CPU. The benefit is greatly reduced lock
contention and improved concurrency by pipelining.

Note that forcing the async gpu relocations does reveal a few issues
they have. Firstly, they are visible as writes to gem_busy, causing some
buffers to be marked as being written to by the GPU even though
userspace only reads. Secondly, in combination with the cmdparser, they
can cause priority inversions. This is the case where the work is put
into a common workqueue, losing our priority information and so being
executed in FIFO order from the worker, denying us the opportunity to
reorder the requests afterwards.
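
For background, a relocation is ultimately just patching a dword (or
qword) inside the batch with the target's final GPU address plus a
delta. A minimal userspace sketch of the CPU variant being removed here
(illustrative only, little-endian assumed; the retained path emits the
equivalent write from the GPU, fenced by the i915_request that carries
it):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct reloc_entry {
	uint64_t offset;	/* where in the batch to write */
	uint64_t delta;		/* constant added to the target address */
};

static void relocate_cpu(uint8_t *batch, const struct reloc_entry *r,
			 uint64_t target_addr, int use_64bit)
{
	uint64_t value = target_addr + r->delta;

	memcpy(batch + r->offset, &value, use_64bit ? 8 : 4);
}

int main(void)
{
	uint8_t batch[64] = { 0 };
	struct reloc_entry r = { .offset = 16, .delta = 0x100 };
	uint64_t patched;

	relocate_cpu(batch, &r, 0xffff8000, 1);

	memcpy(&patched, batch + r.offset, sizeof(patched));
	printf("batch[%u] now holds 0x%llx\n",
	       (unsigned int)r.offset, (unsigned long long)patched);
	return 0;
}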

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 295 ++----------------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  21 +-
 2 files changed, 27 insertions(+), 289 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0829ac8a55bf..540188454b6d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -45,13 +45,6 @@ struct eb_vma_array {
 	struct eb_vma vma[];
 };
 
-enum {
-	FORCE_CPU_RELOC = 1,
-	FORCE_GTT_RELOC,
-	FORCE_GPU_RELOC,
-#define DBG_FORCE_RELOC 0 /* choose one of the above! */
-};
-
 #define __EXEC_OBJECT_HAS_PIN		BIT(31)
 #define __EXEC_OBJECT_HAS_FENCE		BIT(30)
 #define __EXEC_OBJECT_NEEDS_MAP		BIT(29)
@@ -260,8 +253,6 @@ struct i915_execbuffer {
 	 */
 	struct reloc_cache {
 		struct drm_mm_node node; /** temporary GTT binding */
-		unsigned long vaddr; /** Current kmap address */
-		unsigned long page; /** Currently mapped page index */
 		unsigned int gen; /** Cached value of INTEL_GEN */
 		bool use_64bit_reloc : 1;
 		bool has_llc : 1;
@@ -605,23 +596,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 	}
 }
 
-static inline int use_cpu_reloc(const struct reloc_cache *cache,
-				const struct drm_i915_gem_object *obj)
-{
-	if (!i915_gem_object_has_struct_page(obj))
-		return false;
-
-	if (DBG_FORCE_RELOC == FORCE_CPU_RELOC)
-		return true;
-
-	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
-		return false;
-
-	return (cache->has_llc ||
-		obj->cache_dirty ||
-		obj->cache_level != I915_CACHE_NONE);
-}
-
 static int eb_reserve_vma(const struct i915_execbuffer *eb,
 			  struct eb_vma *ev,
 			  u64 pin_flags)
@@ -945,8 +919,6 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
 static void reloc_cache_init(struct reloc_cache *cache,
 			     struct drm_i915_private *i915)
 {
-	cache->page = -1;
-	cache->vaddr = 0;
 	/* Must be a variable in the struct to allow GCC to unroll. */
 	cache->gen = INTEL_GEN(i915);
 	cache->has_llc = HAS_LLC(i915);
@@ -1089,181 +1061,6 @@ static int reloc_gpu_flush(struct reloc_cache *cache)
 	return err;
 }
 
-static void reloc_cache_reset(struct reloc_cache *cache)
-{
-	void *vaddr;
-
-	if (!cache->vaddr)
-		return;
-
-	vaddr = unmask_page(cache->vaddr);
-	if (cache->vaddr & KMAP) {
-		if (cache->vaddr & CLFLUSH_AFTER)
-			mb();
-
-		kunmap_atomic(vaddr);
-		i915_gem_object_finish_access((struct drm_i915_gem_object *)cache->node.mm);
-	} else {
-		struct i915_ggtt *ggtt = cache_to_ggtt(cache);
-
-		intel_gt_flush_ggtt_writes(ggtt->vm.gt);
-		io_mapping_unmap_atomic((void __iomem *)vaddr);
-
-		if (drm_mm_node_allocated(&cache->node)) {
-			ggtt->vm.clear_range(&ggtt->vm,
-					     cache->node.start,
-					     cache->node.size);
-			mutex_lock(&ggtt->vm.mutex);
-			drm_mm_remove_node(&cache->node);
-			mutex_unlock(&ggtt->vm.mutex);
-		} else {
-			i915_vma_unpin((struct i915_vma *)cache->node.mm);
-		}
-	}
-
-	cache->vaddr = 0;
-	cache->page = -1;
-}
-
-static void *reloc_kmap(struct drm_i915_gem_object *obj,
-			struct reloc_cache *cache,
-			unsigned long page)
-{
-	void *vaddr;
-
-	if (cache->vaddr) {
-		kunmap_atomic(unmask_page(cache->vaddr));
-	} else {
-		unsigned int flushes;
-		int err;
-
-		err = i915_gem_object_prepare_write(obj, &flushes);
-		if (err)
-			return ERR_PTR(err);
-
-		BUILD_BUG_ON(KMAP & CLFLUSH_FLAGS);
-		BUILD_BUG_ON((KMAP | CLFLUSH_FLAGS) & PAGE_MASK);
-
-		cache->vaddr = flushes | KMAP;
-		cache->node.mm = (void *)obj;
-		if (flushes)
-			mb();
-	}
-
-	vaddr = kmap_atomic(i915_gem_object_get_dirty_page(obj, page));
-	cache->vaddr = unmask_flags(cache->vaddr) | (unsigned long)vaddr;
-	cache->page = page;
-
-	return vaddr;
-}
-
-static void *reloc_iomap(struct drm_i915_gem_object *obj,
-			 struct reloc_cache *cache,
-			 unsigned long page)
-{
-	struct i915_ggtt *ggtt = cache_to_ggtt(cache);
-	unsigned long offset;
-	void *vaddr;
-
-	if (cache->vaddr) {
-		intel_gt_flush_ggtt_writes(ggtt->vm.gt);
-		io_mapping_unmap_atomic((void __force __iomem *) unmask_page(cache->vaddr));
-	} else {
-		struct i915_vma *vma;
-		int err;
-
-		if (i915_gem_object_is_tiled(obj))
-			return ERR_PTR(-EINVAL);
-
-		if (use_cpu_reloc(cache, obj))
-			return NULL;
-
-		i915_gem_object_lock(obj);
-		err = i915_gem_object_set_to_gtt_domain(obj, true);
-		i915_gem_object_unlock(obj);
-		if (err)
-			return ERR_PTR(err);
-
-		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
-					       PIN_MAPPABLE |
-					       PIN_NONBLOCK /* NOWARN */ |
-					       PIN_NOEVICT);
-		if (IS_ERR(vma)) {
-			memset(&cache->node, 0, sizeof(cache->node));
-			mutex_lock(&ggtt->vm.mutex);
-			err = drm_mm_insert_node_in_range
-				(&ggtt->vm.mm, &cache->node,
-				 PAGE_SIZE, 0, I915_COLOR_UNEVICTABLE,
-				 0, ggtt->mappable_end,
-				 DRM_MM_INSERT_LOW);
-			mutex_unlock(&ggtt->vm.mutex);
-			if (err) /* no inactive aperture space, use cpu reloc */
-				return NULL;
-		} else {
-			cache->node.start = vma->node.start;
-			cache->node.mm = (void *)vma;
-		}
-	}
-
-	offset = cache->node.start;
-	if (drm_mm_node_allocated(&cache->node)) {
-		ggtt->vm.insert_page(&ggtt->vm,
-				     i915_gem_object_get_dma_address(obj, page),
-				     offset, I915_CACHE_NONE, 0);
-	} else {
-		offset += page << PAGE_SHIFT;
-	}
-
-	vaddr = (void __force *)io_mapping_map_atomic_wc(&ggtt->iomap,
-							 offset);
-	cache->page = page;
-	cache->vaddr = (unsigned long)vaddr;
-
-	return vaddr;
-}
-
-static void *reloc_vaddr(struct drm_i915_gem_object *obj,
-			 struct reloc_cache *cache,
-			 unsigned long page)
-{
-	void *vaddr;
-
-	if (cache->page == page) {
-		vaddr = unmask_page(cache->vaddr);
-	} else {
-		vaddr = NULL;
-		if ((cache->vaddr & KMAP) == 0)
-			vaddr = reloc_iomap(obj, cache, page);
-		if (!vaddr)
-			vaddr = reloc_kmap(obj, cache, page);
-	}
-
-	return vaddr;
-}
-
-static void clflush_write32(u32 *addr, u32 value, unsigned int flushes)
-{
-	if (unlikely(flushes & (CLFLUSH_BEFORE | CLFLUSH_AFTER))) {
-		if (flushes & CLFLUSH_BEFORE) {
-			clflushopt(addr);
-			mb();
-		}
-
-		*addr = value;
-
-		/*
-		 * Writes to the same cacheline are serialised by the CPU
-		 * (including clflush). On the write path, we only require
-		 * that it hits memory in an orderly fashion and place
-		 * mb barriers at the start and end of the relocation phase
-		 * to ensure ordering of clflush wrt to the system.
-		 */
-		if (flushes & CLFLUSH_AFTER)
-			clflushopt(addr);
-	} else
-		*addr = value;
-}
-
 static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
@@ -1429,17 +1226,6 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb,
 	return cmd;
 }
 
-static inline bool use_reloc_gpu(struct i915_vma *vma)
-{
-	if (DBG_FORCE_RELOC == FORCE_GPU_RELOC)
-		return true;
-
-	if (DBG_FORCE_RELOC)
-		return false;
-
-	return !dma_resv_test_signaled_rcu(vma->resv, true);
-}
-
 static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
 {
 	struct page *page;
@@ -1454,10 +1240,10 @@ static unsigned long vma_phys_addr(struct i915_vma *vma, u32 offset)
 	return addr + offset_in_page(offset);
 }
 
-static bool __reloc_entry_gpu(struct i915_execbuffer *eb,
-			      struct i915_vma *vma,
-			      u64 offset,
-			      u64 target_addr)
+static int __reloc_entry_gpu(struct i915_execbuffer *eb,
+			     struct i915_vma *vma,
+			     u64 offset,
+			     u64 target_addr)
 {
 	const unsigned int gen = eb->reloc_cache.gen;
 	unsigned int len;
@@ -1473,7 +1259,7 @@ static bool __reloc_entry_gpu(struct i915_execbuffer *eb,
 
 	batch = reloc_gpu(eb, vma, len);
 	if (IS_ERR(batch))
-		return false;
+		return PTR_ERR(batch);
 
 	addr = gen8_canonical_addr(vma->node.start + offset);
 	if (gen >= 8) {
@@ -1522,55 +1308,21 @@ static bool __reloc_entry_gpu(struct i915_execbuffer *eb,
 		*batch++ = target_addr;
 	}
 
-	return true;
-}
-
-static bool reloc_entry_gpu(struct i915_execbuffer *eb,
-			    struct i915_vma *vma,
-			    u64 offset,
-			    u64 target_addr)
-{
-	if (eb->reloc_cache.vaddr)
-		return false;
-
-	if (!use_reloc_gpu(vma))
-		return false;
-
-	return __reloc_entry_gpu(eb, vma, offset, target_addr);
+	return 0;
 }
 
 static u64
-relocate_entry(struct i915_vma *vma,
+relocate_entry(struct i915_execbuffer *eb,
+	       struct i915_vma *vma,
 	       const struct drm_i915_gem_relocation_entry *reloc,
-	       struct i915_execbuffer *eb,
 	       const struct i915_vma *target)
 {
 	u64 target_addr = relocation_target(reloc, target);
-	u64 offset = reloc->offset;
-
-	if (!reloc_entry_gpu(eb, vma, offset, target_addr)) {
-		bool wide = eb->reloc_cache.use_64bit_reloc;
-		void *vaddr;
-
-repeat:
-		vaddr = reloc_vaddr(vma->obj,
-				    &eb->reloc_cache,
-				    offset >> PAGE_SHIFT);
-		if (IS_ERR(vaddr))
-			return PTR_ERR(vaddr);
-
-		GEM_BUG_ON(!IS_ALIGNED(offset, sizeof(u32)));
-		clflush_write32(vaddr + offset_in_page(offset),
-				lower_32_bits(target_addr),
-				eb->reloc_cache.vaddr);
-
-		if (wide) {
-			offset += sizeof(u32);
-			target_addr >>= 32;
-			wide = false;
-			goto repeat;
-		}
-	}
+	int err;
+
+	err = __reloc_entry_gpu(eb, vma, reloc->offset, target_addr);
+	if (err)
+		return err;
 
 	return target->node.start | UPDATE;
 }
@@ -1635,8 +1387,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 	 * If the relocation already has the right value in it, no
 	 * more work needs to be done.
 	 */
-	if (!DBG_FORCE_RELOC &&
-	    gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset)
+	if (gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset)
 		return 0;
 
 	/* Check that the relocation address is valid... */
@@ -1668,7 +1419,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 	ev->flags &= ~EXEC_OBJECT_ASYNC;
 
 	/* and update the user's relocation entry */
-	return relocate_entry(ev->vma, reloc, eb, target->vma);
+	return relocate_entry(eb, ev->vma, reloc, target->vma);
 }
 
 static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
@@ -1706,10 +1457,8 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 		 * this is bad and so lockdep complains vehemently.
 		 */
 		copied = __copy_from_user(r, urelocs, count * sizeof(r[0]));
-		if (unlikely(copied)) {
-			remain = -EFAULT;
-			goto out;
-		}
+		if (unlikely(copied))
+			return -EFAULT;
 
 		remain -= count;
 		do {
@@ -1717,8 +1466,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 
 			if (likely(offset == 0)) {
 			} else if ((s64)offset < 0) {
-				remain = (int)offset;
-				goto out;
+				return (int)offset;
 			} else {
 				/*
 				 * Note that reporting an error now
@@ -1748,9 +1496,8 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 		} while (r++, --count);
 		urelocs += ARRAY_SIZE(stack);
 	} while (remain);
-out:
-	reloc_cache_reset(&eb->reloc_cache);
-	return remain;
+
+	return 0;
 }
 
 static int eb_relocate(struct i915_execbuffer *eb)
@@ -2658,7 +2405,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb.i915 = i915;
 	eb.file = file;
 	eb.args = args;
-	if (DBG_FORCE_RELOC || !(args->flags & I915_EXEC_NO_RELOC))
+	if (!(args->flags & I915_EXEC_NO_RELOC))
 		args->flags |= __EXEC_HAS_RELOC;
 
 	eb.exec = exec;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index a49016f8ee0d..57c14d3340cd 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -37,20 +37,14 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		return err;
 
 	/* 8-Byte aligned */
-	if (!__reloc_entry_gpu(eb, vma,
-			       offsets[0] * sizeof(u32),
-			       0)) {
-		err = -EIO;
+	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
+	if (err)
 		goto unpin_vma;
-	}
 
 	/* !8-Byte aligned */
-	if (!__reloc_entry_gpu(eb, vma,
-			       offsets[1] * sizeof(u32),
-			       1)) {
-		err = -EIO;
+	err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
+	if (err)
 		goto unpin_vma;
-	}
 
 	/* Skip to the end of the cmd page */
 	i = PAGE_SIZE / sizeof(u32) - RELOC_TAIL - 1;
@@ -60,12 +54,9 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	eb->reloc_cache.rq_size += i;
 
 	/* Force batch chaining */
-	if (!__reloc_entry_gpu(eb, vma,
-			       offsets[2] * sizeof(u32),
-			       2)) {
-		err = -EIO;
+	err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
+	if (err)
 		goto unpin_vma;
-	}
 
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 	rq = i915_request_get(eb->reloc_cache.rq);
-- 
2.20.1


* [Intel-gfx] [PATCH 18/36] drm/i915: Add list_for_each_entry_safe_continue_reverse
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (15 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 17/36] drm/i915/gem: Async GPU relocations only Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 19/36] drm/i915/gem: Separate reloc validation into an earlier step Chris Wilson
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

One more list iterator variant, for when we want to unwind from inside
one list iterator with the intention of restarting from the current
entry as the new head of the list.
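
As an illustration only (not part of this patch), the iterator can be
used to unwind work already applied when a forward walk fails part way
through. apply_item() and undo_item() are hypothetical helpers; only
the <linux/list.h> API and the new macro are assumed:

	struct item {
		struct list_head link;
	};

	static int apply_all(struct list_head *head)
	{
		struct item *pos, *n;
		int err = 0;

		list_for_each_entry(pos, head, link) {
			err = apply_item(pos);
			if (err)
				break;
		}
		if (!err)
			return 0;

		/*
		 * Walk backwards from the failing entry, undoing each
		 * element already applied and rotating it to the list
		 * tail so that the failing entry becomes the new head.
		 * The _safe variant lets us move 'pos' while iterating.
		 */
		list_for_each_entry_safe_continue_reverse(pos, n, head, link) {
			undo_item(pos);
			list_move_tail(&pos->link, head);
		}

		return err;
	}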

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_utils.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 03a73d2bd50d..6ebccdd12d4c 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head *list,
 	return READ_ONCE(list->next) == head;
 }
 
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))
+
 /*
  * Wait until the work is finally complete, even if it tries to postpone
  * by requeueing itself. Note, that if the worker never cancels itself,
-- 
2.20.1


* [Intel-gfx] [PATCH 19/36] drm/i915/gem: Separate reloc validation into an earlier step
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (16 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 18/36] drm/i915: Add list_for_each_entry_safe_continue_reverse Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 20/36] drm/i915/gem: Lift GPU relocation allocation Chris Wilson
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Over the next couple of patches, we will want to lock all the modified
vma for relocation processing under a single ww_mutex. We neither want
to include the vma that are skipped (as no modifications are required),
nor do we want them to be marked as written. So separate
out the reloc validation into an early step, which we can use both to
reject the execbuf before committing to making our changes, and to
filter out the unmodified vma.

This does introduce a second pass through the reloc[], but only if we
need to emit relocations.
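
Condensed from the diff below, the resulting eb_relocate() flow becomes
a validate-then-emit scheme (simplified sketch, locking and flush
omitted):

	/* Pass 1: validate and count, dropping vma with nothing to patch */
	list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
		long count = eb_reloc_vma_validate(eb, ev);

		if (count < 0)
			return count;	/* reject before making any changes */
		if (count == 0)
			list_del_init(&ev->reloc_link);
	}

	/* Pass 2: emit relocations only for the vma that remain */
	list_for_each_entry(ev, &eb->relocs, reloc_link) {
		err = eb_relocate_vma(eb, ev);
		if (err)
			break;
	}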

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 178 +++++++++++++-----
 1 file changed, 133 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 540188454b6d..bed7c7ea2723 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1331,6 +1331,117 @@ static u64
 eb_relocate_entry(struct i915_execbuffer *eb,
 		  struct eb_vma *ev,
 		  const struct drm_i915_gem_relocation_entry *reloc)
+{
+	struct eb_vma *target;
+
+	/* we already hold a reference to all valid objects */
+	target = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(!target))
+		return -ENOENT;
+
+	/*
+	 * If the relocation already has the right value in it, no
+	 * more work needs to be done.
+	 */
+	if (gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset)
+		return 0;
+
+	/*
+	 * If we write into the object, we need to force the synchronisation
+	 * barrier, either with an asynchronous clflush or if we executed the
+	 * patching using the GPU (though that should be serialised by the
+	 * timeline). To be completely sure, and since we are required to
+	 * do relocations we are already stalling, disable the user's opt
+	 * out of our synchronisation.
+	 */
+	ev->flags &= ~EXEC_OBJECT_ASYNC;
+
+	/* and update the user's relocation entry */
+	return relocate_entry(eb, ev->vma, reloc, target->vma);
+}
+
+static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
+{
+#define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
+	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
+	const struct drm_i915_gem_exec_object2 *entry = ev->exec;
+	struct drm_i915_gem_relocation_entry __user *urelocs =
+		u64_to_user_ptr(entry->relocs_ptr);
+	unsigned long remain = entry->relocation_count;
+
+	if (unlikely(remain > N_RELOC(ULONG_MAX)))
+		return -EINVAL;
+
+	/*
+	 * We must check that the entire relocation array is safe
+	 * to read. However, if the array is not writable the user loses
+	 * the updated relocation values.
+	 */
+	if (unlikely(!access_ok(urelocs, remain * sizeof(*urelocs))))
+		return -EFAULT;
+
+	do {
+		struct drm_i915_gem_relocation_entry *r = stack;
+		unsigned int count =
+			min_t(unsigned long, remain, ARRAY_SIZE(stack));
+		unsigned int copied;
+
+		/*
+		 * This is the fast path and we cannot handle a pagefault
+		 * whilst holding the struct mutex lest the user pass in the
+		 * relocations contained within a mmaped bo. For in such a case
+		 * we, the page fault handler would call i915_gem_fault() and
+		 * we would try to acquire the struct mutex again. Obviously
+		 * this is bad and so lockdep complains vehemently.
+		 */
+		copied = __copy_from_user(r, urelocs, count * sizeof(r[0]));
+		if (unlikely(copied))
+			return -EFAULT;
+
+		remain -= count;
+		do {
+			u64 offset = eb_relocate_entry(eb, ev, r);
+
+			if (likely(offset == 0)) {
+			} else if ((s64)offset < 0) {
+				return (int)offset;
+			} else {
+				/*
+				 * Note that reporting an error now
+				 * leaves everything in an inconsistent
+				 * state as we have *already* changed
+				 * the relocation value inside the
+				 * object. As we have not changed the
+				 * reloc.presumed_offset or will not
+				 * change the execobject.offset, on the
+				 * call we may not rewrite the value
+				 * inside the object, leaving it
+				 * dangling and causing a GPU hang. Unless
+				 * userspace dynamically rebuilds the
+				 * relocations on each execbuf rather than
+				 * presume a static tree.
+				 *
+				 * We did previously check if the relocations
+				 * were writable (access_ok), an error now
+				 * would be a strange race with mprotect,
+				 * having already demonstrated that we
+				 * can read from this userspace address.
+				 */
+				offset = gen8_canonical_addr(offset & ~UPDATE);
+				__put_user(offset,
+					   &urelocs[r - stack].presumed_offset);
+			}
+		} while (r++, --count);
+		urelocs += ARRAY_SIZE(stack);
+	} while (remain);
+
+	return 0;
+}
+
+static int
+eb_reloc_valid(struct i915_execbuffer *eb,
+	       struct eb_vma *ev,
+	       const struct drm_i915_gem_relocation_entry *reloc)
 {
 	struct drm_i915_private *i915 = eb->i915;
 	struct eb_vma *target;
@@ -1408,21 +1519,10 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 		return -EINVAL;
 	}
 
-	/*
-	 * If we write into the object, we need to force the synchronisation
-	 * barrier, either with an asynchronous clflush or if we executed the
-	 * patching using the GPU (though that should be serialised by the
-	 * timeline). To be completely sure, and since we are required to
-	 * do relocations we are already stalling, disable the user's opt
-	 * out of our synchronisation.
-	 */
-	ev->flags &= ~EXEC_OBJECT_ASYNC;
-
-	/* and update the user's relocation entry */
-	return relocate_entry(eb, ev->vma, reloc, target->vma);
+	return 1;
 }
 
-static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
+static long eb_reloc_vma_validate(struct i915_execbuffer *eb, struct eb_vma *ev)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
@@ -1430,6 +1530,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 	struct drm_i915_gem_relocation_entry __user *urelocs =
 		u64_to_user_ptr(entry->relocs_ptr);
 	unsigned long remain = entry->relocation_count;
+	long required = 0;
 
 	if (unlikely(remain > N_RELOC(ULONG_MAX)))
 		return -EINVAL;
@@ -1462,42 +1563,18 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 
 		remain -= count;
 		do {
-			u64 offset = eb_relocate_entry(eb, ev, r);
+			int ret;
 
-			if (likely(offset == 0)) {
-			} else if ((s64)offset < 0) {
-				return (int)offset;
-			} else {
-				/*
-				 * Note that reporting an error now
-				 * leaves everything in an inconsistent
-				 * state as we have *already* changed
-				 * the relocation value inside the
-				 * object. As we have not changed the
-				 * reloc.presumed_offset or will not
-				 * change the execobject.offset, on the
-				 * call we may not rewrite the value
-				 * inside the object, leaving it
-				 * dangling and causing a GPU hang. Unless
-				 * userspace dynamically rebuilds the
-				 * relocations on each execbuf rather than
-				 * presume a static tree.
-				 *
-				 * We did previously check if the relocations
-				 * were writable (access_ok), an error now
-				 * would be a strange race with mprotect,
-				 * having already demonstrated that we
-				 * can read from this userspace address.
-				 */
-				offset = gen8_canonical_addr(offset & ~UPDATE);
-				__put_user(offset,
-					   &urelocs[r - stack].presumed_offset);
-			}
+			ret = eb_reloc_valid(eb, ev, r);
+			if (ret < 0)
+				return ret;
+
+			required += ret;
 		} while (r++, --count);
 		urelocs += ARRAY_SIZE(stack);
 	} while (remain);
 
-	return 0;
+	return required;
 }
 
 static int eb_relocate(struct i915_execbuffer *eb)
@@ -1516,9 +1593,20 @@ static int eb_relocate(struct i915_execbuffer *eb)
 
 	/* The objects are in their final locations, apply the relocations. */
 	if (eb->args->flags & __EXEC_HAS_RELOC) {
-		struct eb_vma *ev;
+		struct eb_vma *ev, *en;
 		int flush;
 
+		list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
+			long count;
+
+			count = eb_reloc_vma_validate(eb, ev);
+			if (count < 0)
+				return count;
+
+			if (count == 0)
+				list_del_init(&ev->reloc_link);
+		}
+
 		list_for_each_entry(ev, &eb->relocs, reloc_link) {
 			err = eb_relocate_vma(eb, ev);
 			if (err)
-- 
2.20.1


* [Intel-gfx] [PATCH 20/36] drm/i915/gem: Lift GPU relocation allocation
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (17 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 19/36] drm/i915/gem: Separate reloc validation into an earlier step Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 21/36] drm/i915/gem: Build the reloc request first Chris Wilson
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since we have reduced the relocation paths to just use the async GPU,
we can lift the request allocation to the start of the relocations.
Knowing that we use one request for all relocations will simplify
tracking the relocation fence.
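
For reference, the per-execbuf flow after this patch is roughly the
following (condensed from reloc_gpu() in the diff below):

	err = reloc_gpu_alloc(eb);		/* one request up front */
	if (err)
		return err;

	list_for_each_entry(ev, &eb->relocs, reloc_link) {
		err = eb_relocate_vma(eb, ev);	/* emit into that request */
		if (err)
			goto out;
	}
out:
	flush = reloc_gpu_flush(&eb->reloc_cache);	/* submit the request */
	if (!err)
		err = flush;
	return err;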

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 98 ++++++++++---------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  5 +-
 2 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index bed7c7ea2723..9537fd87e3a4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -900,8 +900,6 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 
 static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	GEM_BUG_ON(eb->reloc_cache.rq);
-
 	if (eb->array)
 		eb_vma_array_put(eb->array);
 
@@ -926,7 +924,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
-	cache->rq = NULL;
 	cache->target = NULL;
 }
 
@@ -1026,13 +1023,9 @@ static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
 
 static int reloc_gpu_flush(struct reloc_cache *cache)
 {
-	struct i915_request *rq;
+	struct i915_request *rq = cache->rq;
 	int err;
 
-	rq = fetch_and_zero(&cache->rq);
-	if (!rq)
-		return 0;
-
 	if (cache->rq_vma) {
 		struct drm_i915_gem_object *obj = cache->rq_vma->obj;
 
@@ -1081,9 +1074,8 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 	return err;
 }
 
-static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-			     struct intel_engine_cs *engine,
-			     unsigned int len)
+static int
+__reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
 	struct intel_gt_buffer_pool_node *pool;
@@ -1173,33 +1165,14 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	return err;
 }
 
-static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
-{
-	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
-}
-
-static u32 *reloc_gpu(struct i915_execbuffer *eb,
-		      struct i915_vma *vma,
-		      unsigned int len)
+static u32 *reloc_batch_grow(struct i915_execbuffer *eb,
+			     struct i915_vma *vma,
+			     unsigned int len)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
 	u32 *cmd;
 	int err;
 
-	if (unlikely(!cache->rq)) {
-		struct intel_engine_cs *engine = eb->engine;
-
-		if (!reloc_can_use_engine(engine)) {
-			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
-			if (!engine)
-				return ERR_PTR(-ENODEV);
-		}
-
-		err = __reloc_gpu_alloc(eb, engine, len);
-		if (unlikely(err))
-			return ERR_PTR(err);
-	}
-
 	if (vma != cache->target) {
 		err = reloc_move_to_gpu(cache->rq, vma);
 		if (unlikely(err)) {
@@ -1257,7 +1230,7 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
 	else
 		len = 3;
 
-	batch = reloc_gpu(eb, vma, len);
+	batch = reloc_batch_grow(eb, vma, len);
 	if (IS_ERR(batch))
 		return PTR_ERR(batch);
 
@@ -1577,6 +1550,47 @@ static long eb_reloc_vma_validate(struct i915_execbuffer *eb, struct eb_vma *ev)
 	return required;
 }
 
+static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
+{
+	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
+}
+
+static int reloc_gpu_alloc(struct i915_execbuffer *eb)
+{
+	struct intel_engine_cs *engine = eb->engine;
+
+	if (!reloc_can_use_engine(engine)) {
+		engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+		if (!engine)
+			return -ENODEV;
+	}
+
+	return __reloc_gpu_alloc(eb, engine);
+}
+
+static int reloc_gpu(struct i915_execbuffer *eb)
+{
+	struct eb_vma *ev;
+	int flush, err;
+
+	err = reloc_gpu_alloc(eb);
+	if (err)
+		return err;
+	GEM_BUG_ON(!eb->reloc_cache.rq);
+
+	list_for_each_entry(ev, &eb->relocs, reloc_link) {
+		err = eb_relocate_vma(eb, ev);
+		if (err)
+			goto out;
+	}
+
+out:
+	flush = reloc_gpu_flush(&eb->reloc_cache);
+	if (!err)
+		err = flush;
+	return err;
+}
+
 static int eb_relocate(struct i915_execbuffer *eb)
 {
 	int err;
@@ -1594,7 +1608,6 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	/* The objects are in their final locations, apply the relocations. */
 	if (eb->args->flags & __EXEC_HAS_RELOC) {
 		struct eb_vma *ev, *en;
-		int flush;
 
 		list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
 			long count;
@@ -1607,18 +1620,14 @@ static int eb_relocate(struct i915_execbuffer *eb)
 				list_del_init(&ev->reloc_link);
 		}
 
-		list_for_each_entry(ev, &eb->relocs, reloc_link) {
-			err = eb_relocate_vma(eb, ev);
+		if (!list_empty(&eb->relocs)) {
+			err = reloc_gpu(eb);
 			if (err)
-				break;
+				return err;
 		}
-
-		flush = reloc_gpu_flush(&eb->reloc_cache);
-		if (!err)
-			err = flush;
 	}
 
-	return err;
+	return 0;
 }
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
@@ -2618,9 +2627,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		batch = vma;
 	}
 
-	/* All GPU relocation batches must be submitted prior to the user rq */
-	GEM_BUG_ON(eb.reloc_cache.rq);
-
 	/* Allocate a request for this batch buffer nice and early. */
 	eb.request = i915_request_create(eb.context);
 	if (IS_ERR(eb.request)) {
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 57c14d3340cd..50fe22d87ae1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -36,6 +36,10 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		return err;
 
+	err = reloc_gpu_alloc(eb);
+	if (err)
+		goto unpin_vma;
+
 	/* 8-Byte aligned */
 	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
 	if (err)
@@ -63,7 +67,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	err = reloc_gpu_flush(&eb->reloc_cache);
 	if (err)
 		goto put_rq;
-	GEM_BUG_ON(eb->reloc_cache.rq);
 
 	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
 	if (err) {
-- 
2.20.1


* [Intel-gfx] [PATCH 21/36] drm/i915/gem: Build the reloc request first
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (18 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 20/36] drm/i915/gem: Lift GPU relocation allocation Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 22/36] drm/i915/gem: Add all GPU reloc awaits/signals en masse Chris Wilson
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If we get interrupted in the middle of chaining up the relocation
entries, we will fail to submit the relocation batch. However, we will
report having already completed some of the relocations, and so the
reloc.presumed_offset will no longer match the batch contents, causing
confusion and invalid future batches. If we build the relocation request
packet first, we can always emit as far as we managed to get in the
relocation chain.
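
With this change, reloc_gpu() emits the request header before walking
the relocations, so a partially built chain still executes up to the
point of interruption (condensed from the diff below):

	err = reloc_gpu_emit(&eb->reloc_cache);	/* init breadcrumb + bb_start */
	if (err)
		goto out;

	list_for_each_entry(ev, &eb->relocs, reloc_link) {
		err = eb_relocate_vma(eb, ev);
		if (err)
			break;		/* keep whatever was emitted so far */
	}
out:
	reloc_gpu_flush(&eb->reloc_cache);	/* always submit the request */
	return err;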

Fixes: 0e97fbb08055 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 51 ++++++++++---------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  8 +--
 2 files changed, 31 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9537fd87e3a4..c48950d7f1c9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1021,11 +1021,27 @@ static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
 	return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
 }
 
-static int reloc_gpu_flush(struct reloc_cache *cache)
+static int reloc_gpu_emit(struct reloc_cache *cache)
 {
 	struct i915_request *rq = cache->rq;
 	int err;
 
+	err = 0;
+	if (rq->engine->emit_init_breadcrumb)
+		err = rq->engine->emit_init_breadcrumb(rq);
+	if (!err)
+		err = rq->engine->emit_bb_start(rq,
+						rq->batch->node.start,
+						PAGE_SIZE,
+						reloc_bb_flags(cache));
+
+	return err;
+}
+
+static void reloc_gpu_flush(struct reloc_cache *cache)
+{
+	struct i915_request *rq = cache->rq;
+
 	if (cache->rq_vma) {
 		struct drm_i915_gem_object *obj = cache->rq_vma->obj;
 
@@ -1037,21 +1053,8 @@ static int reloc_gpu_flush(struct reloc_cache *cache)
 		i915_gem_object_unpin_map(obj);
 	}
 
-	err = 0;
-	if (rq->engine->emit_init_breadcrumb)
-		err = rq->engine->emit_init_breadcrumb(rq);
-	if (!err)
-		err = rq->engine->emit_bb_start(rq,
-						rq->batch->node.start,
-						PAGE_SIZE,
-						reloc_bb_flags(cache));
-	if (err)
-		i915_request_set_error_once(rq, err);
-
 	intel_gt_chipset_flush(rq->engine->gt);
 	i915_request_add(rq);
-
-	return err;
 }
 
 static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
@@ -1139,7 +1142,7 @@ __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 		err = i915_vma_move_to_active(batch, rq, 0);
 	i915_vma_unlock(batch);
 	if (err)
-		goto skip_request;
+		goto err_request;
 
 	rq->batch = batch;
 	i915_vma_unpin(batch);
@@ -1152,8 +1155,6 @@ __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 	/* Return with batch mapping (cmd) still pinned */
 	goto out_pool;
 
-skip_request:
-	i915_request_set_error_once(rq, err);
 err_request:
 	i915_request_add(rq);
 err_unpin:
@@ -1186,10 +1187,8 @@ static u32 *reloc_batch_grow(struct i915_execbuffer *eb,
 	if (unlikely(cache->rq_size + len >
 		     PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
 		err = reloc_gpu_chain(cache);
-		if (unlikely(err)) {
-			i915_request_set_error_once(cache->rq, err);
+		if (unlikely(err))
 			return ERR_PTR(err);
-		}
 	}
 
 	GEM_BUG_ON(cache->rq_size + len >= PAGE_SIZE  / sizeof(u32));
@@ -1571,23 +1570,25 @@ static int reloc_gpu_alloc(struct i915_execbuffer *eb)
 static int reloc_gpu(struct i915_execbuffer *eb)
 {
 	struct eb_vma *ev;
-	int flush, err;
+	int err;
 
 	err = reloc_gpu_alloc(eb);
 	if (err)
 		return err;
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 
+	err = reloc_gpu_emit(&eb->reloc_cache);
+	if (err)
+		goto out;
+
 	list_for_each_entry(ev, &eb->relocs, reloc_link) {
 		err = eb_relocate_vma(eb, ev);
 		if (err)
-			goto out;
+			break;
 	}
 
 out:
-	flush = reloc_gpu_flush(&eb->reloc_cache);
-	if (!err)
-		err = flush;
+	reloc_gpu_flush(&eb->reloc_cache);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 50fe22d87ae1..faed6480a792 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -40,6 +40,10 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
+	err = reloc_gpu_emit(&eb->reloc_cache);
+	if (err)
+		goto unpin_vma;
+
 	/* 8-Byte aligned */
 	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
 	if (err)
@@ -64,9 +68,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 	rq = i915_request_get(eb->reloc_cache.rq);
-	err = reloc_gpu_flush(&eb->reloc_cache);
-	if (err)
-		goto put_rq;
+	reloc_gpu_flush(&eb->reloc_cache);
 
 	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
 	if (err) {
-- 
2.20.1


* [Intel-gfx] [PATCH 22/36] drm/i915/gem: Add all GPU reloc awaits/signals en masse
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (19 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 21/36] drm/i915/gem: Build the reloc request first Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 23/36] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Asynchronous waits and signaling form a traditional semaphore with all
the usual ordering problems of taking multiple locks. If we want to
add more than one wait on a shared resource by the GPU, we must ensure
that all the associated timelines are advanced atomically, ergo we must
lock all the timelines en masse.
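
The en masse locking in lock_relocs() below follows the standard
ww_mutex acquire/backoff shape. A generic sketch, using a hypothetical
object list and a hypothetical unwind_locks_before() helper (obj->resv
is a struct dma_resv):

	struct ww_acquire_ctx acquire;
	int err = 0;

	ww_acquire_init(&acquire, &reservation_ww_class);

	list_for_each_entry(obj, objects, link) {
		err = ww_mutex_lock_interruptible(&obj->resv->lock, &acquire);
		if (err == -EDEADLK) {
			/*
			 * Back off: drop every lock taken so far (the patch
			 * rotates those entries to the list tail so they are
			 * retried later), then sleep-wait on the contended
			 * lock before continuing.
			 */
			unwind_locks_before(obj);
			err = ww_mutex_lock_slow_interruptible(&obj->resv->lock,
							       &acquire);
		}
		if (err)
			break;
	}

	ww_acquire_done(&acquire);
	/* ... add the awaits/signals, then unlock each object and ... */
	ww_acquire_fini(&acquire);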

Testcase: igt/gem_exec_reloc/basic-concurrent16
Fixes: 0e97fbb08055 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1889
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 114 ++++++++++++------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  24 ++--
 2 files changed, 93 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c48950d7f1c9..37855ae8f8b3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -259,7 +259,6 @@ struct i915_execbuffer {
 		bool has_fence : 1;
 		bool needs_unfenced : 1;
 
-		struct i915_vma *target;
 		struct i915_request *rq;
 		struct i915_vma *rq_vma;
 		u32 *rq_cmd;
@@ -924,7 +923,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
-	cache->target = NULL;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -1057,26 +1055,6 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 	i915_request_add(rq);
 }
 
-static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
-{
-	struct drm_i915_gem_object *obj = vma->obj;
-	int err;
-
-	i915_vma_lock(vma);
-
-	if (obj->cache_dirty & ~obj->cache_coherent)
-		i915_gem_clflush_object(obj, 0);
-	obj->write_domain = 0;
-
-	err = i915_request_await_object(rq, vma->obj, true);
-	if (err == 0)
-		err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
-
-	i915_vma_unlock(vma);
-
-	return err;
-}
-
 static int
 __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 {
@@ -1166,24 +1144,12 @@ __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 	return err;
 }
 
-static u32 *reloc_batch_grow(struct i915_execbuffer *eb,
-			     struct i915_vma *vma,
-			     unsigned int len)
+static u32 *reloc_batch_grow(struct i915_execbuffer *eb, unsigned int len)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
 	u32 *cmd;
 	int err;
 
-	if (vma != cache->target) {
-		err = reloc_move_to_gpu(cache->rq, vma);
-		if (unlikely(err)) {
-			i915_request_set_error_once(cache->rq, err);
-			return ERR_PTR(err);
-		}
-
-		cache->target = vma;
-	}
-
 	if (unlikely(cache->rq_size + len >
 		     PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
 		err = reloc_gpu_chain(cache);
@@ -1229,7 +1195,7 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
 	else
 		len = 3;
 
-	batch = reloc_batch_grow(eb, vma, len);
+	batch = reloc_batch_grow(eb, len);
 	if (IS_ERR(batch))
 		return PTR_ERR(batch);
 
@@ -1549,6 +1515,78 @@ static long eb_reloc_vma_validate(struct i915_execbuffer *eb, struct eb_vma *ev)
 	return required;
 }
 
+static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev)
+{
+	struct i915_request *rq = cache->rq;
+	struct i915_vma *vma = ev->vma;
+	struct drm_i915_gem_object *obj = vma->obj;
+	int err;
+
+	if (obj->cache_dirty & ~obj->cache_coherent)
+		i915_gem_clflush_object(obj, 0);
+
+	obj->write_domain = I915_GEM_DOMAIN_RENDER;
+	obj->read_domains = I915_GEM_DOMAIN_RENDER;
+
+	err = i915_request_await_object(rq, obj, true);
+	if (err)
+		return err;
+
+	err = __i915_vma_move_to_active(vma, rq);
+	if (err)
+		return err;
+
+	dma_resv_add_excl_fence(vma->resv, &rq->fence);
+
+	return 0;
+}
+
+static int
+lock_relocs(struct i915_execbuffer *eb)
+{
+	struct ww_acquire_ctx acquire;
+	struct eb_vma *ev;
+	int err = 0;
+
+	ww_acquire_init(&acquire, &reservation_ww_class);
+
+	list_for_each_entry(ev, &eb->relocs, reloc_link) {
+		struct i915_vma *vma = ev->vma;
+
+		err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire);
+		if (err == -EDEADLK) {
+			struct eb_vma *unlock = ev, *en;
+
+			list_for_each_entry_safe_continue_reverse(unlock, en,
+								  &eb->relocs,
+								  reloc_link) {
+				ww_mutex_unlock(&unlock->vma->resv->lock);
+				list_move_tail(&unlock->reloc_link,
+					       &eb->relocs);
+			}
+
+			GEM_BUG_ON(!list_is_first(&ev->reloc_link,
+						  &eb->relocs));
+			err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
+							       &acquire);
+		}
+		if (err)
+			break;
+	}
+
+	ww_acquire_done(&acquire);
+
+	list_for_each_entry_continue_reverse(ev, &eb->relocs, reloc_link) {
+		if (err == 0)
+			err = reloc_move_to_gpu(&eb->reloc_cache, ev);
+		ww_mutex_unlock(&ev->vma->resv->lock);
+	}
+
+	ww_acquire_fini(&acquire);
+
+	return err;
+}
+
 static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
 {
 	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
@@ -1577,6 +1615,10 @@ static int reloc_gpu(struct i915_execbuffer *eb)
 		return err;
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 
+	err = lock_relocs(eb);
+	if (err)
+		goto out;
+
 	err = reloc_gpu_emit(&eb->reloc_cache);
 	if (err)
 		goto out;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index faed6480a792..4f10b51f9a7e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -24,15 +24,15 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
 	const u32 *map = page_mask_bits(obj->mm.mapping);
 	struct i915_request *rq;
-	struct i915_vma *vma;
+	struct eb_vma ev;
 	int err;
 	int i;
 
-	vma = i915_vma_instance(obj, eb->context->vm, NULL);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+	ev.vma = i915_vma_instance(obj, eb->context->vm, NULL);
+	if (IS_ERR(ev.vma))
+		return PTR_ERR(ev.vma);
 
-	err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH);
+	err = i915_vma_pin(ev.vma, 0, 0, PIN_USER | PIN_HIGH);
 	if (err)
 		return err;
 
@@ -40,17 +40,22 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
+	list_add(&ev.reloc_link, &eb->relocs);
+	err = lock_relocs(eb);
+	if (err)
+		goto unpin_vma;
+
 	err = reloc_gpu_emit(&eb->reloc_cache);
 	if (err)
 		goto unpin_vma;
 
 	/* 8-Byte aligned */
-	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
+	err = __reloc_entry_gpu(eb, ev.vma, offsets[0] * sizeof(u32), 0);
 	if (err)
 		goto unpin_vma;
 
 	/* !8-Byte aligned */
-	err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
+	err = __reloc_entry_gpu(eb, ev.vma, offsets[1] * sizeof(u32), 1);
 	if (err)
 		goto unpin_vma;
 
@@ -62,7 +67,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	eb->reloc_cache.rq_size += i;
 
 	/* Force batch chaining */
-	err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
+	err = __reloc_entry_gpu(eb, ev.vma, offsets[2] * sizeof(u32), 2);
 	if (err)
 		goto unpin_vma;
 
@@ -97,7 +102,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 put_rq:
 	i915_request_put(rq);
 unpin_vma:
-	i915_vma_unpin(vma);
+	i915_vma_unpin(ev.vma);
 	return err;
 }
 
@@ -121,6 +126,7 @@ static int igt_gpu_reloc(void *arg)
 	}
 
 	for_each_uabi_engine(eb.engine, eb.i915) {
+		INIT_LIST_HEAD(&eb.relocs);
 		reloc_cache_init(&eb.reloc_cache, eb.i915);
 		memset(map, POISON_INUSE, 4096);
 
-- 
2.20.1


* [Intel-gfx] [PATCH 23/36] dma-buf: Proxy fence, an unsignaled fence placeholder
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (20 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 22/36] drm/i915/gem: Add all GPU reloc awaits/signals en masse Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 24/36] drm/i915: Unpeel awaits on a proxy fence Chris Wilson
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Often we need to create a fence for a future event that has not yet been
associated with a fence. We can store a proxy fence, a placeholder, in
the timeline and replace it later when the real fence is known. Any
listeners that attach to the proxy fence will automatically be signaled
when the real fence completes, and any future listeners will instead be
attached directly to the real fence, avoiding any indirection overhead.
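
A sketch of the intended flow, mirroring the selftests below ('slot' is
a hypothetical per-object fence pointer owned by the caller):

	static struct dma_fence __rcu *slot;

	static int publish_placeholder(void)
	{
		struct dma_fence *proxy;

		proxy = dma_fence_create_proxy();  /* unsignaled placeholder */
		if (!proxy)
			return -ENOMEM;

		rcu_assign_pointer(slot, proxy);   /* slot owns the reference */
		return 0;
	}

	static void provide_real_fence(struct dma_fence *real)
	{
		struct dma_fence *old;

		/* Listeners already on the proxy are forwarded to 'real'. */
		old = dma_fence_replace_proxy(&slot, real);
		dma_fence_put(old);	/* the old reference is returned to us */
	}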

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/dma-buf/Makefile             |  13 +-
 drivers/dma-buf/dma-fence-private.h  |  20 +
 drivers/dma-buf/dma-fence-proxy.c    | 306 +++++++++++
 drivers/dma-buf/dma-fence.c          |   4 +-
 drivers/dma-buf/selftests.h          |   1 +
 drivers/dma-buf/st-dma-fence-proxy.c | 752 +++++++++++++++++++++++++++
 include/linux/dma-fence-proxy.h      |  38 ++
 7 files changed, 1130 insertions(+), 4 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-private.h
 create mode 100644 drivers/dma-buf/dma-fence-proxy.c
 create mode 100644 drivers/dma-buf/st-dma-fence-proxy.c
 create mode 100644 include/linux/dma-fence-proxy.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 995e05f609ff..afaf6dadd9a3 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
-	 dma-resv.o seqno-fence.o
+obj-y := \
+	dma-buf.o \
+	dma-fence.o \
+	dma-fence-array.o \
+	dma-fence-chain.o \
+	dma-fence-proxy.o \
+	dma-resv.o \
+	seqno-fence.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= dma-heap.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= heaps/
 obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
@@ -10,6 +16,7 @@ obj-$(CONFIG_UDMABUF)		+= udmabuf.o
 dmabuf_selftests-y := \
 	selftest.o \
 	st-dma-fence.o \
-	st-dma-fence-chain.o
+	st-dma-fence-chain.o \
+	st-dma-fence-proxy.o
 
 obj-$(CONFIG_DMABUF_SELFTESTS)	+= dmabuf_selftests.o
diff --git a/drivers/dma-buf/dma-fence-private.h b/drivers/dma-buf/dma-fence-private.h
new file mode 100644
index 000000000000..6924d28af0fa
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-private.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Fence mechanism for dma-buf and to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Maarten Lankhorst <maarten.lankhorst@canonical.com>
+ */
+
+#ifndef DMA_FENCE_PRIVATE_H
+#define DMA_FENCE_PRIVATE_H
+
+struct dma_fence;
+
+bool __dma_fence_enable_signaling(struct dma_fence *fence);
+
+#endif /* DMA_FENCE_PRIVATE_H */
diff --git a/drivers/dma-buf/dma-fence-proxy.c b/drivers/dma-buf/dma-fence-proxy.c
new file mode 100644
index 000000000000..42674e92b0f9
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-proxy.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * dma-fence-proxy: placeholder unsignaled fence
+ *
+ * Copyright (C) 2017-2019 Intel Corporation
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/dma-fence-proxy.h>
+#include <linux/export.h>
+#include <linux/irq_work.h>
+#include <linux/slab.h>
+
+#include "dma-fence-private.h"
+
+struct dma_fence_proxy {
+	struct dma_fence base;
+
+	struct dma_fence *real;
+	struct dma_fence_cb cb;
+	struct irq_work work;
+
+	wait_queue_head_t wq;
+};
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define same_lockclass(A, B) (A)->dep_map.key == (B)->dep_map.key
+#else
+#define same_lockclass(A, B) 0
+#endif
+
+static const char *proxy_get_driver_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+
+	return real ? real->ops->get_driver_name(real) : "proxy";
+}
+
+static const char *proxy_get_timeline_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+
+	return real ? real->ops->get_timeline_name(real) : "unset";
+}
+
+static void proxy_irq_work(struct irq_work *work)
+{
+	struct dma_fence_proxy *p = container_of(work, typeof(*p), work);
+
+	dma_fence_signal(&p->base);
+	dma_fence_put(&p->base);
+}
+
+static void proxy_callback(struct dma_fence *real, struct dma_fence_cb *cb)
+{
+	struct dma_fence_proxy *p = container_of(cb, typeof(*p), cb);
+
+	/* Signaled before enabling signalling callbacks? */
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &p->base.flags)) {
+		dma_fence_put(&p->base);
+		return;
+	}
+
+	if (real->error)
+		dma_fence_set_error(&p->base, real->error);
+
+	/* Lower the height of the proxy chain -> single stack frame */
+	irq_work_queue(&p->work);
+}
+
+static bool proxy_enable_signaling(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+	bool ret = true;
+
+	if (real) {
+		spin_lock_nested(real->lock,
+				 same_lockclass(&p->wq.lock, real->lock));
+		ret = __dma_fence_enable_signaling(real);
+		if (!ret && real->error)
+			dma_fence_set_error(&p->base, real->error);
+		spin_unlock(real->lock);
+	}
+
+	return ret;
+}
+
+static void proxy_release(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	dma_fence_put(p->real);
+	dma_fence_free(&p->base);
+}
+
+const struct dma_fence_ops dma_fence_proxy_ops = {
+	.get_driver_name = proxy_get_driver_name,
+	.get_timeline_name = proxy_get_timeline_name,
+	.enable_signaling = proxy_enable_signaling,
+	.wait = dma_fence_default_wait,
+	.release = proxy_release,
+};
+EXPORT_SYMBOL_GPL(dma_fence_proxy_ops);
+
+/**
+ * __dma_fence_create_proxy - Create an unset dma-fence
+ * @context: context number to use for proxy fence
+ * @seqno: sequence number to use for proxy fence
+ *
+ * __dma_fence_create_proxy() creates a new dma_fence stub that is initially
+ * unsignaled and may later be replaced with a real fence. Any listeners
+ * to the proxy fence will be signaled when the target fence signals its
+ * completion.
+ */
+struct dma_fence *__dma_fence_create_proxy(u64 context, u64 seqno)
+{
+	struct dma_fence_proxy *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return NULL;
+
+	init_waitqueue_head(&p->wq);
+	dma_fence_init(&p->base, &dma_fence_proxy_ops, &p->wq.lock,
+		       context, seqno);
+	init_irq_work(&p->work, proxy_irq_work);
+
+	return &p->base;
+}
+EXPORT_SYMBOL(__dma_fence_create_proxy);
+
+/**
+ * dma_fence_create_proxy - Create an unset dma-fence
+ *
+ * Wraps __dma_fence_create_proxy() to create a new proxy fence with the
+ * next available (unique) context id.
+ */
+struct dma_fence *dma_fence_create_proxy(void)
+{
+	return __dma_fence_create_proxy(dma_fence_context_alloc(1), 0);
+}
+EXPORT_SYMBOL(dma_fence_create_proxy);
+
+static void __wake_up_listeners(struct dma_fence_proxy *p)
+{
+	struct wait_queue_entry *wait, *next;
+
+	list_for_each_entry_safe(wait, next, &p->wq.head, entry) {
+		INIT_LIST_HEAD(&wait->entry);
+		wait->func(wait, TASK_NORMAL, 0, p->real);
+	}
+}
+
+static void set_proxy_callback(struct dma_fence *real, struct dma_fence_cb *cb)
+{
+	cb->func = proxy_callback;
+	list_add_tail(&cb->node, &real->cb_list);
+}
+
+static void proxy_assign(struct dma_fence *fence, struct dma_fence *real)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	unsigned long flags;
+
+	if (WARN_ON(fence == real))
+		return;
+
+	if (WARN_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
+		return;
+
+	if (WARN_ON(p->real))
+		return;
+
+	spin_lock_irqsave(&p->wq.lock, flags);
+
+	if (unlikely(!real)) {
+		dma_fence_signal_locked(&p->base);
+		goto unlock;
+	}
+
+	p->real = dma_fence_get(real);
+
+	dma_fence_get(&p->base);
+	spin_lock_nested(real->lock, same_lockclass(&p->wq.lock, real->lock));
+	if (dma_fence_is_signaled_locked(real))
+		proxy_callback(real, &p->cb);
+	else if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &p->base.flags) &&
+		 !__dma_fence_enable_signaling(real))
+		proxy_callback(real, &p->cb);
+	else
+		set_proxy_callback(real, &p->cb);
+	spin_unlock(real->lock);
+
+unlock:
+	__wake_up_listeners(p);
+	spin_unlock_irqrestore(&p->wq.lock, flags);
+}
+
+/**
+ * dma_fence_replace_proxy - Replace the proxy fence with the real target
+ * @slot: pointer to location of fence to update
+ * @fence: the new fence to store in @slot
+ *
+ * Once the real dma_fence is known, we can replace the proxy fence holder
+ * with a pointer to the real dma fence. Future listeners will attach to
+ * the real fence, avoiding any indirection overhead. Previous listeners
+ * will remain attached to the proxy fence, and be signaled in turn when
+ * the target fence completes.
+ */
+struct dma_fence *
+dma_fence_replace_proxy(struct dma_fence __rcu **slot, struct dma_fence *fence)
+{
+	struct dma_fence *old;
+
+	if (fence)
+		dma_fence_get(fence);
+
+	old = rcu_replace_pointer(*slot, fence, true);
+	if (old && dma_fence_is_proxy(old))
+		proxy_assign(old, fence);
+
+	return old;
+}
+EXPORT_SYMBOL(dma_fence_replace_proxy);
+
+/**
+ * dma_fence_proxy_set_real - Set the target of a proxy fence
+ * @fence: the proxy fence
+ * @real: the target fence.
+ *
+ */
+void dma_fence_proxy_set_real(struct dma_fence *fence, struct dma_fence *real)
+{
+	if (dma_fence_is_proxy(fence))
+		proxy_assign(fence, real);
+}
+EXPORT_SYMBOL(dma_fence_proxy_set_real);
+
+/**
+ * dma_fence_proxy_get_real - Query the target of a proxy fence
+ * @fence: the proxy fence
+ *
+ * Unpeel the proxy fence to see if it has been replaced with a real fence.
+ */
+struct dma_fence *dma_fence_proxy_get_real(struct dma_fence *fence)
+{
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+
+		if (p->real)
+			fence = p->real;
+	}
+
+	return fence;
+}
+EXPORT_SYMBOL(dma_fence_proxy_get_real);
+
+void dma_fence_add_proxy_listener(struct dma_fence *fence,
+				  struct wait_queue_entry *wait)
+{
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+		unsigned long flags;
+
+		spin_lock_irqsave(&p->wq.lock, flags);
+		if (!p->real) {
+			list_add_tail(&wait->entry, &p->wq.head);
+			wait = NULL;
+		}
+		fence = p->real;
+		spin_unlock_irqrestore(&p->wq.lock, flags);
+	}
+
+	if (wait) {
+		INIT_LIST_HEAD(&wait->entry);
+		wait->func(wait, TASK_NORMAL, 0, fence);
+	}
+}
+EXPORT_SYMBOL(dma_fence_add_proxy_listener);
+
+bool dma_fence_remove_proxy_listener(struct dma_fence *fence,
+				     struct wait_queue_entry *wait)
+{
+	bool ret = false;
+
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+		unsigned long flags;
+
+		spin_lock_irqsave(&p->wq.lock, flags);
+		if (!list_empty(&wait->entry)) {
+			list_del_init(&wait->entry);
+			ret = true;
+		}
+		spin_unlock_irqrestore(&p->wq.lock, flags);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(dma_fence_remove_proxy_listener);
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 656e9ac2d028..329bd033059f 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -19,6 +19,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/dma_fence.h>
 
+#include "dma-fence-private.h"
+
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
@@ -275,7 +277,7 @@ void dma_fence_free(struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_fence_free);
 
-static bool __dma_fence_enable_signaling(struct dma_fence *fence)
+bool __dma_fence_enable_signaling(struct dma_fence *fence)
 {
 	bool was_set;
 
diff --git a/drivers/dma-buf/selftests.h b/drivers/dma-buf/selftests.h
index 55918ef9adab..616eca70e2d8 100644
--- a/drivers/dma-buf/selftests.h
+++ b/drivers/dma-buf/selftests.h
@@ -12,3 +12,4 @@
 selftest(sanitycheck, __sanitycheck__) /* keep first (igt selfcheck) */
 selftest(dma_fence, dma_fence)
 selftest(dma_fence_chain, dma_fence_chain)
+selftest(dma_fence_proxy, dma_fence_proxy)
diff --git a/drivers/dma-buf/st-dma-fence-proxy.c b/drivers/dma-buf/st-dma-fence-proxy.c
new file mode 100644
index 000000000000..c3f210bc4e60
--- /dev/null
+++ b/drivers/dma-buf/st-dma-fence-proxy.c
@@ -0,0 +1,752 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-fence-proxy.h>
+#include <linux/kernel.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "selftest.h"
+
+static struct kmem_cache *slab_fences;
+
+static struct mock_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+} *to_mock_fence(struct dma_fence *f) {
+	return container_of(f, struct mock_fence, base);
+}
+
+static const char *mock_name(struct dma_fence *f)
+{
+	return "mock";
+}
+
+static void mock_fence_release(struct dma_fence *f)
+{
+	kmem_cache_free(slab_fences, to_mock_fence(f));
+}
+
+static const struct dma_fence_ops mock_ops = {
+	.get_driver_name = mock_name,
+	.get_timeline_name = mock_name,
+	.release = mock_fence_release,
+};
+
+static struct dma_fence *mock_fence(void)
+{
+	struct mock_fence *f;
+
+	f = kmem_cache_alloc(slab_fences, GFP_KERNEL);
+	if (!f)
+		return NULL;
+
+	spin_lock_init(&f->lock);
+	dma_fence_init(&f->base, &mock_ops, &f->lock, 0, 0);
+
+	return &f->base;
+}
+
+static int sanitycheck(void *arg)
+{
+	struct dma_fence *f;
+
+	f = dma_fence_create_proxy();
+	if (!f)
+		return -ENOMEM;
+
+	dma_fence_signal(f);
+	dma_fence_put(f);
+
+	return 0;
+}
+
+struct fences {
+	struct dma_fence *real;
+	struct dma_fence *proxy;
+	struct dma_fence __rcu *slot;
+};
+
+static int create_fences(struct fences *f, bool attach)
+{
+	f->proxy = dma_fence_create_proxy();
+	if (!f->proxy)
+		return -ENOMEM;
+
+	RCU_INIT_POINTER(f->slot, f->proxy);
+
+	f->real = mock_fence();
+	if (!f->real) {
+		dma_fence_put(f->proxy);
+		return -ENOMEM;
+	}
+
+	if (attach)
+		dma_fence_replace_proxy(&f->slot, f->real);
+
+	return 0;
+}
+
+static void free_fences(struct fences *f)
+{
+	dma_fence_put(dma_fence_replace_proxy(&f->slot, NULL));
+
+	dma_fence_signal(f->real);
+	dma_fence_put(f->real);
+
+	dma_fence_signal(f->proxy);
+	dma_fence_put(f->proxy);
+}
+
+static int wrap_target(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_proxy_get_real(f.proxy) != f.proxy) {
+		pr_err("Unwrapped proxy fenced reported a target fence!\n");
+		goto err_free;
+	}
+
+	dma_fence_proxy_set_real(f.proxy, f.real);
+	rcu_assign_pointer(f.slot, dma_fence_get(f.real)); /* free_fences() */
+
+	if (dma_fence_proxy_get_real(f.proxy) != f.real) {
+		pr_err("Wrapped proxy fenced did not report the target fence!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_proxy(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_proxy_get_real(f.proxy) != f.real) {
+		pr_err("Wrapped proxy fenced did not report the target fence!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_signaling(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence unexpectedly signaled on creation\n");
+		goto err_free;
+	}
+
+	if (dma_fence_signal(f.real)) {
+		pr_err("Fence reported being already signaled\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence not reporting signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_signaling_recurse(void *arg)
+{
+	struct fences f;
+	struct dma_fence *chain;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	chain = dma_fence_create_proxy();
+	if (!chain) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, chain);
+	dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real));
+	dma_fence_put(chain);
+
+	/* f.real <- chain <- f.proxy */
+
+	if (dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence unexpectedly signaled on creation\n");
+		goto err_free;
+	}
+
+	if (dma_fence_signal(f.real)) {
+		pr_err("Fence reported being already signaled\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence not reporting signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+struct simple_cb {
+	struct dma_fence_cb cb;
+	bool seen;
+};
+
+static void simple_callback(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+	/* Ensure the callback marker is visible, no excuses for missing it! */
+	smp_store_mb(container_of(cb, struct simple_cb, cb)->seen, true);
+}
+
+static int wrap_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_add_callback_recurse(void *arg)
+{
+	struct simple_cb cb = {};
+	struct dma_fence *chain;
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	chain = dma_fence_create_proxy();
+	if (!chain) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, chain);
+	dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real));
+	dma_fence_put(chain);
+
+	/* f.real <- chain <- f.proxy */
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_late_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	dma_fence_signal(f.real);
+
+	if (!dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Added callback, but fence was already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (cb.seen) {
+		pr_err("Callback called after failed attachment!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback_late(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_signal(f.real);
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback_early(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_rm_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_remove_callback(f.proxy, &cb.cb)) {
+		pr_err("Failed to remove callback!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (cb.seen) {
+		pr_err("Callback still signaled after removal!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_late_rm_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	if (dma_fence_remove_callback(f.proxy, &cb.cb)) {
+		pr_err("Callback removal succeed after being executed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_status(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_get_status(f.proxy)) {
+		pr_err("Fence unexpectedly has signaled status on creation\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!dma_fence_get_status(f.proxy)) {
+		pr_err("Fence not reporting signaled status\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_error(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	dma_fence_set_error(f.real, -EIO);
+
+	if (dma_fence_get_status(f.proxy)) {
+		pr_err("Fence unexpectedly has error status before signal\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (dma_fence_get_status(f.proxy) != -EIO) {
+		pr_err("Fence not reporting error status, got %d\n",
+		       dma_fence_get_status(f.proxy));
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_wait(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_wait_timeout(f.proxy, false, 0) != 0) {
+		pr_err("Wait reported complete before being signaled\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+
+	if (dma_fence_wait_timeout(f.proxy, false, 0) == 0) {
+		pr_err("Wait reported incomplete after being signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+struct wait_timer {
+	struct timer_list timer;
+	struct fences f;
+};
+
+static void wait_timer(struct timer_list *timer)
+{
+	struct wait_timer *wt = from_timer(wt, timer, timer);
+
+	dma_fence_signal(wt->f.real);
+}
+
+static int wrap_wait_timeout(void *arg)
+{
+	struct wait_timer wt;
+	int err = -EINVAL;
+
+	if (create_fences(&wt.f, true))
+		return -ENOMEM;
+
+	timer_setup_on_stack(&wt.timer, wait_timer, 0);
+
+	if (dma_fence_wait_timeout(wt.f.proxy, false, 1) != 0) {
+		pr_err("Wait reported complete before being signaled\n");
+		goto err_free;
+	}
+
+	mod_timer(&wt.timer, jiffies + 1);
+
+	if (dma_fence_wait_timeout(wt.f.proxy, false, 2) != 0) {
+		if (timer_pending(&wt.timer)) {
+			pr_notice("Timer did not fire within the jiffie!\n");
+			err = 0; /* not our fault! */
+		} else {
+			pr_err("Wait reported incomplete after timeout\n");
+		}
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	del_timer_sync(&wt.timer);
+	destroy_timer_on_stack(&wt.timer);
+	dma_fence_signal(wt.f.real);
+	free_fences(&wt.f);
+	return err;
+}
+
+struct proxy_wait {
+	struct wait_queue_entry base;
+	struct dma_fence *fence;
+	bool seen;
+};
+
+static int proxy_wait_cb(struct wait_queue_entry *entry,
+			 unsigned int mode, int flags, void *key)
+{
+	struct proxy_wait *p = container_of(entry, typeof(*p), base);
+
+	p->fence = key;
+	p->seen = true;
+
+	return 0;
+}
+
+static int wrap_listen_early(void *arg)
+{
+	struct proxy_wait wait = { .base.func = proxy_wait_cb };
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_add_proxy_listener(f.proxy, &wait.base);
+
+	if (!wait.seen) {
+		pr_err("Proxy listener was not called after replace!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	if (wait.fence != f.real) {
+		pr_err("Proxy listener was not passed the real fence!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_listen_late(void *arg)
+{
+	struct proxy_wait wait = { .base.func = proxy_wait_cb };
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_add_proxy_listener(f.proxy, &wait.base);
+	dma_fence_replace_proxy(&f.slot, f.real);
+
+	if (!wait.seen) {
+		pr_err("Proxy listener was not called on replace!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	if (wait.fence != f.real) {
+		pr_err("Proxy listener was not passed the real fence!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_listen_cancel(void *arg)
+{
+	struct proxy_wait wait = { .base.func = proxy_wait_cb };
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_add_proxy_listener(f.proxy, &wait.base);
+	if (!dma_fence_remove_proxy_listener(f.proxy, &wait.base)) {
+		pr_err("Cancelling listener, already detached?\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+	dma_fence_replace_proxy(&f.slot, f.real);
+
+	if (wait.seen) {
+		pr_err("Proxy listener was called after being removed!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	if (dma_fence_remove_proxy_listener(f.proxy, &wait.base)) {
+		pr_err("Double listener cancellation!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+int dma_fence_proxy(void)
+{
+	static const struct subtest tests[] = {
+		SUBTEST(sanitycheck),
+		SUBTEST(wrap_target),
+		SUBTEST(wrap_proxy),
+		SUBTEST(wrap_signaling),
+		SUBTEST(wrap_signaling_recurse),
+		SUBTEST(wrap_add_callback),
+		SUBTEST(wrap_add_callback_recurse),
+		SUBTEST(wrap_late_add_callback),
+		SUBTEST(wrap_early_add_callback),
+		SUBTEST(wrap_early_add_callback_late),
+		SUBTEST(wrap_early_add_callback_early),
+		SUBTEST(wrap_rm_callback),
+		SUBTEST(wrap_late_rm_callback),
+		SUBTEST(wrap_status),
+		SUBTEST(wrap_error),
+		SUBTEST(wrap_wait),
+		SUBTEST(wrap_wait_timeout),
+		SUBTEST(wrap_listen_early),
+		SUBTEST(wrap_listen_late),
+		SUBTEST(wrap_listen_cancel),
+	};
+	int ret;
+
+	slab_fences = KMEM_CACHE(mock_fence,
+				 SLAB_TYPESAFE_BY_RCU |
+				 SLAB_HWCACHE_ALIGN);
+	if (!slab_fences)
+		return -ENOMEM;
+
+	ret = subtests(tests, NULL);
+
+	kmem_cache_destroy(slab_fences);
+
+	return ret;
+}
diff --git a/include/linux/dma-fence-proxy.h b/include/linux/dma-fence-proxy.h
new file mode 100644
index 000000000000..6a986b5bb009
--- /dev/null
+++ b/include/linux/dma-fence-proxy.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * dma-fence-proxy: allows waiting upon unset and future fences
+ *
+ * Copyright (C) 2017 Intel Corporation
+ */
+
+#ifndef __LINUX_DMA_FENCE_PROXY_H
+#define __LINUX_DMA_FENCE_PROXY_H
+
+#include <linux/kernel.h>
+#include <linux/dma-fence.h>
+
+struct wait_queue_entry;
+
+extern const struct dma_fence_ops dma_fence_proxy_ops;
+
+struct dma_fence *__dma_fence_create_proxy(u64 context, u64 seqno);
+struct dma_fence *dma_fence_create_proxy(void);
+
+static inline bool dma_fence_is_proxy(struct dma_fence *fence)
+{
+	return fence->ops == &dma_fence_proxy_ops;
+}
+
+void dma_fence_proxy_set_real(struct dma_fence *fence, struct dma_fence *real);
+struct dma_fence *dma_fence_proxy_get_real(struct dma_fence *fence);
+
+struct dma_fence *
+dma_fence_replace_proxy(struct dma_fence __rcu **slot,
+			struct dma_fence *fence);
+
+void dma_fence_add_proxy_listener(struct dma_fence *fence,
+				  struct wait_queue_entry *wait);
+bool dma_fence_remove_proxy_listener(struct dma_fence *fence,
+				     struct wait_queue_entry *wait);
+
+#endif /* __LINUX_DMA_FENCE_PROXY_H */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 24/36] drm/i915: Unpeel awaits on a proxy fence
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (21 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 23/36] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 25/36] drm/i915/gem: Make relocations atomic within execbuf Chris Wilson
                   ` (19 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the real target for a proxy fence is known at the time we are
attaching our awaits, use the real target in preference to hooking up to
the proxy. If we use the real target instead, we can optimize the awaits,
e.g. if it is along the same engine, we can order the submission and avoid
the wait-for-completion.
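
As a rough sketch of the dispatch this adds (condensed from the hunk below;
await_one_fence() is only an illustrative name, error handling as in the
patch):

static int await_one_fence(struct i915_request *rq, struct dma_fence *fence)
{
	/* Unpeel: if the proxy already knows its real fence, use that instead */
	fence = dma_fence_proxy_get_real(fence);

	if (dma_fence_is_i915(fence))
		return i915_request_await_request(rq, to_request(fence));
	else if (dma_fence_is_proxy(fence))
		/* still unresolved: listen for the real fence to be set later */
		return i915_request_await_proxy(rq, fence);
	else
		return i915_request_await_external(rq, fence);
}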

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c   | 157 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |  41 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |   3 +
 3 files changed, 201 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9537e30f9439..02747c171c54 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -24,6 +24,7 @@
 
 #include <linux/dma-fence-array.h>
 #include <linux/dma-fence-chain.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/irq_work.h>
 #include <linux/prefetch.h>
 #include <linux/sched.h>
@@ -461,6 +462,7 @@ static bool fatal_error(int error)
 	case 0: /* not an error! */
 	case -EAGAIN: /* innocent victim of a GT reset (__i915_request_reset) */
 	case -ETIMEDOUT: /* waiting for Godot (timer_i915_sw_fence_wake) */
+	case -EDEADLK: /* cyclic fence lockup (await_proxy)  */
 		return false;
 	default:
 		return true;
@@ -1251,6 +1253,138 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
 	return err;
 }
 
+struct await_proxy {
+	struct wait_queue_entry base;
+	struct i915_request *request;
+	struct dma_fence *fence;
+	struct timer_list timer;
+	struct work_struct work;
+	int (*attach)(struct await_proxy *ap);
+	void *data;
+};
+
+static void await_proxy_work(struct work_struct *work)
+{
+	struct await_proxy *ap = container_of(work, typeof(*ap), work);
+	struct i915_request *rq = ap->request;
+
+	del_timer_sync(&ap->timer);
+
+	if (ap->fence) {
+		int err = 0;
+
+		/*
+		 * If the fence is external, we impose a 10s timeout.
+		 * However, if the fence is internal, we skip a timeout in
+		 * the belief that all fences are in-order (DAG, no cycles)
+		 * and we can enforce forward progress by resetting the GPU
+		 * if necessary. A future fence, provided by userspace, can
+		 * trivially generate a cycle in the dependency graph, and so
+		 * cause that entire cycle to become deadlocked, with no
+		 * forward progress being made and the driver kept
+		 * eternally awake.
+		 */
+		if (dma_fence_is_i915(ap->fence) &&
+		    !i915_sched_node_verify_dag(&rq->sched,
+						&to_request(ap->fence)->sched))
+			err = -EDEADLK;
+
+		if (!err) {
+			mutex_lock(&rq->context->timeline->mutex);
+			err = ap->attach(ap);
+			mutex_unlock(&rq->context->timeline->mutex);
+		}
+
+		/* Don't flag an error for co-dependent scheduling */
+		if (err == -EDEADLK) {
+			struct i915_sched_node *waiter =
+				&to_request(ap->fence)->sched;
+			struct i915_dependency *p;
+
+			list_for_each_entry_lockless(p,
+						     &rq->sched.waiters_list,
+						     wait_link) {
+				if (p->waiter == waiter &&
+				    p->flags & I915_DEPENDENCY_WEAK) {
+					err = 0;
+					break;
+				}
+			}
+		}
+
+		if (err < 0)
+			i915_sw_fence_set_error_once(&rq->submit, err);
+	}
+
+	i915_sw_fence_complete(&rq->submit);
+
+	dma_fence_put(ap->fence);
+	kfree(ap);
+}
+
+static int
+await_proxy_wake(struct wait_queue_entry *entry,
+		 unsigned int mode,
+		 int flags,
+		 void *fence)
+{
+	struct await_proxy *ap = container_of(entry, typeof(*ap), base);
+
+	ap->fence = dma_fence_get(fence);
+	schedule_work(&ap->work);
+
+	return 0;
+}
+
+static void
+await_proxy_timer(struct timer_list *t)
+{
+	struct await_proxy *ap = container_of(t, typeof(*ap), timer);
+
+	if (dma_fence_remove_proxy_listener(ap->base.private, &ap->base)) {
+		struct i915_request *rq = ap->request;
+
+		pr_notice("Asynchronous wait on unset proxy fence by %s:%s:%llx timed out\n",
+			  rq->fence.ops->get_driver_name(&rq->fence),
+			  rq->fence.ops->get_timeline_name(&rq->fence),
+			  rq->fence.seqno);
+		i915_sw_fence_set_error_once(&rq->submit, -ETIMEDOUT);
+
+		schedule_work(&ap->work);
+	}
+}
+
+static int
+__i915_request_await_proxy(struct i915_request *rq,
+			   struct dma_fence *fence,
+			   unsigned long timeout,
+			   int (*attach)(struct await_proxy *ap),
+			   void *data)
+{
+	struct await_proxy *ap;
+
+	ap = kzalloc(sizeof(*ap), I915_FENCE_GFP);
+	if (!ap)
+		return -ENOMEM;
+
+	i915_sw_fence_await(&rq->submit);
+	mark_external(rq);
+
+	ap->base.private = fence;
+	ap->base.func = await_proxy_wake;
+	ap->request = rq;
+	INIT_WORK(&ap->work, await_proxy_work);
+	ap->attach = attach;
+	ap->data = data;
+
+	timer_setup(&ap->timer, await_proxy_timer, 0);
+	if (timeout)
+		mod_timer(&ap->timer, round_jiffies_up(jiffies + timeout));
+
+	dma_fence_add_proxy_listener(fence, &ap->base);
+	return 0;
+}
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1349,6 +1483,24 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 	return 0;
 }
 
+static int await_proxy(struct await_proxy *ap)
+{
+	return i915_request_await_dma_fence(ap->request, ap->fence);
+}
+
+static int
+i915_request_await_proxy(struct i915_request *rq, struct dma_fence *fence)
+{
+	/*
+	 * Wait until we know the real fence so that can optimise the
+	 * inter-fence synchronisation.
+	 */
+	return __i915_request_await_proxy(rq, fence,
+					  i915_fence_context_timeout(rq->i915,
+								     fence->context),
+					  await_proxy, NULL);
+}
+
 int
 i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 {
@@ -1356,6 +1508,9 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	unsigned int nchild = 1;
 	int ret;
 
+	/* Unpeel the proxy fence if the real target is already known */
+	fence = dma_fence_proxy_get_real(fence);
+
 	/*
 	 * Note that if the fence-array was created in signal-on-any mode,
 	 * we should *not* decompose it into its individual fences. However,
@@ -1395,6 +1550,8 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 
 		if (dma_fence_is_i915(fence))
 			ret = i915_request_await_request(rq, to_request(fence));
+		else if (dma_fence_is_proxy(fence))
+			ret = i915_request_await_proxy(rq, fence);
 		else
 			ret = i915_request_await_external(rq, fence);
 		if (ret < 0)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index cbb880b10c65..250832768279 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -469,6 +469,47 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 	return 0;
 }
 
+bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
+				struct i915_sched_node *signaler)
+{
+	struct i915_dependency *dep, *p;
+	struct i915_dependency stack;
+	bool result = false;
+	LIST_HEAD(dfs);
+
+	if (list_empty(&waiter->waiters_list))
+		return true;
+
+	spin_lock_irq(&schedule_lock);
+
+	stack.signaler = signaler;
+	list_add(&stack.dfs_link, &dfs);
+
+	list_for_each_entry(dep, &dfs, dfs_link) {
+		struct i915_sched_node *node = dep->signaler;
+
+		if (node_signaled(node))
+			continue;
+
+		list_for_each_entry(p, &node->signalers_list, signal_link) {
+			if (p->signaler == waiter)
+				goto out;
+
+			if (list_empty(&p->dfs_link))
+				list_add_tail(&p->dfs_link, &dfs);
+		}
+	}
+
+	result = true;
+out:
+	list_for_each_entry_safe(dep, p, &dfs, dfs_link)
+		INIT_LIST_HEAD(&dep->dfs_link);
+
+	spin_unlock_irq(&schedule_lock);
+
+	return result;
+}
+
 void i915_sched_node_fini(struct i915_sched_node *node)
 {
 	struct i915_dependency *dep, *tmp;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6f0bf00fc569..13432add8929 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -28,6 +28,9 @@
 void i915_sched_node_init(struct i915_sched_node *node);
 void i915_sched_node_reinit(struct i915_sched_node *node);
 
+bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
+				struct i915_sched_node *signal);
+
 bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      struct i915_sched_node *signal,
 				      struct i915_dependency *dep,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 25/36] drm/i915/gem: Make relocations atomic within execbuf
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (22 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 24/36] drm/i915: Unpeel awaits on a proxy fence Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 26/36] drm/syncobj: Allow use of dma-fence-proxy Chris Wilson
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Although we may chide userspace for reusing the same batches
concurrently from multiple threads, at the same time we must be very
careful to only execute the batch and its relocations as supplied by the
user. If we are not careful, we may allow another thread to rewrite the
current batch with its own relocations. We must order the relocations
and their batch such that they are an atomic pair on the GPU, and that
the ioctl itself appears atomic to userspace. The order of execution may
be undetermined, but it will not be subverted.

We could do this by moving the relocations into the main request, if it
were not for the situation where we need a second engine to perform the
relocations for us. Instead, we use the dependency tracking to only
publish the write fence on the main request and not on the relocation
request, so that concurrent updates are queued after the batch has
consumed its relocations.
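
Schematically, the proxy fence ties the two requests together like so (a
condensed sketch of the flow below, helper names as in the patch, error
handling omitted):

	/* before emitting GPU relocations: placeholder for the final batch fence */
	eb->reloc_cache.fence = __dma_fence_create_proxy(0, 0);

	/* each relocated object gains the proxy as a shared fence, so any
	 * concurrent update is queued after the batch consuming these relocs */
	dma_resv_add_shared_fence(vma->resv, eb->reloc_cache.fence);

	/* at submit: order the batch after the relocation request, then
	 * resolve the proxy with the main request's fence */
	i915_request_await_dma_fence(eb->request, &eb->reloc_cache.rq->fence);
	dma_fence_proxy_set_real(eb->reloc_cache.fence, &eb->request->fence);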

Testcase: igt/gem_exec_reloc/basic-concurrent
Fixes: ef398881d27d ("drm/i915/gem: Limit struct_mutex to eb_reserve")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 92 ++++++++++++++-----
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 11 ++-
 2 files changed, 73 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 37855ae8f8b3..2844274c37bb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/intel-iommu.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/dma-resv.h>
 #include <linux/sync_file.h>
 #include <linux/uaccess.h>
@@ -259,6 +260,8 @@ struct i915_execbuffer {
 		bool has_fence : 1;
 		bool needs_unfenced : 1;
 
+		struct dma_fence *fence;
+
 		struct i915_request *rq;
 		struct i915_vma *rq_vma;
 		u32 *rq_cmd;
@@ -555,16 +558,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 	ev->exec = entry;
 	ev->flags = entry->flags;
 
-	if (eb->lut_size > 0) {
-		ev->handle = entry->handle;
-		hlist_add_head(&ev->node,
-			       &eb->buckets[hash_32(entry->handle,
-						    eb->lut_size)]);
-	}
-
-	if (entry->relocation_count)
-		list_add_tail(&ev->reloc_link, &eb->relocs);
-
 	/*
 	 * SNA is doing fancy tricks with compressing batch buffers, which leads
 	 * to negative relocation deltas. Usually that works out ok since the
@@ -581,9 +574,21 @@ eb_add_vma(struct i915_execbuffer *eb,
 		if (eb->reloc_cache.has_fence)
 			ev->flags |= EXEC_OBJECT_NEEDS_FENCE;
 
+		INIT_LIST_HEAD(&ev->reloc_link);
+
 		eb->batch = ev;
 	}
 
+	if (entry->relocation_count)
+		list_add_tail(&ev->reloc_link, &eb->relocs);
+
+	if (eb->lut_size > 0) {
+		ev->handle = entry->handle;
+		hlist_add_head(&ev->node,
+			       &eb->buckets[hash_32(entry->handle,
+						    eb->lut_size)]);
+	}
+
 	if (eb_pin_vma(eb, entry, ev)) {
 		if (entry->offset != vma->node.start) {
 			entry->offset = vma->node.start | UPDATE;
@@ -923,6 +928,7 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
+	cache->fence = NULL;
 }
 
 static inline void *unmask_page(unsigned long p)
@@ -1052,6 +1058,7 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 	}
 
 	intel_gt_chipset_flush(rq->engine->gt);
+	i915_request_get(rq);
 	i915_request_add(rq);
 }
 
@@ -1284,16 +1291,6 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 	if (gen8_canonical_addr(target->vma->node.start) == reloc->presumed_offset)
 		return 0;
 
-	/*
-	 * If we write into the object, we need to force the synchronisation
-	 * barrier, either with an asynchronous clflush or if we executed the
-	 * patching using the GPU (though that should be serialised by the
-	 * timeline). To be completely sure, and since we are required to
-	 * do relocations we are already stalling, disable the user's opt
-	 * out of our synchronisation.
-	 */
-	ev->flags &= ~EXEC_OBJECT_ASYNC;
-
 	/* and update the user's relocation entry */
 	return relocate_entry(eb, ev->vma, reloc, target->vma);
 }
@@ -1527,6 +1524,11 @@ static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev)
 
 	obj->write_domain = I915_GEM_DOMAIN_RENDER;
 	obj->read_domains = I915_GEM_DOMAIN_RENDER;
+	ev->flags |= EXEC_OBJECT_ASYNC;
+
+	err = dma_resv_reserve_shared(vma->resv, 1);
+	if (err)
+		return err;
 
 	err = i915_request_await_object(rq, obj, true);
 	if (err)
@@ -1537,6 +1539,7 @@ static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev)
 		return err;
 
 	dma_resv_add_excl_fence(vma->resv, &rq->fence);
+	dma_resv_add_shared_fence(vma->resv, cache->fence);
 
 	return 0;
 }
@@ -1605,14 +1608,28 @@ static int reloc_gpu_alloc(struct i915_execbuffer *eb)
 	return __reloc_gpu_alloc(eb, engine);
 }
 
+static void free_reloc_fence(struct i915_execbuffer *eb)
+{
+	struct dma_fence *f = fetch_and_zero(&eb->reloc_cache.fence);
+
+	dma_fence_signal(f);
+	dma_fence_put(f);
+}
+
 static int reloc_gpu(struct i915_execbuffer *eb)
 {
 	struct eb_vma *ev;
 	int err;
 
+	eb->reloc_cache.fence = __dma_fence_create_proxy(0, 0);
+	if (!eb->reloc_cache.fence)
+		return -ENOMEM;
+
 	err = reloc_gpu_alloc(eb);
-	if (err)
+	if (err) {
+		free_reloc_fence(eb);
 		return err;
+	}
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 
 	err = lock_relocs(eb);
@@ -1673,6 +1690,15 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	return 0;
 }
 
+static void eb_reloc_signal(struct i915_execbuffer *eb, struct i915_request *rq)
+{
+	dma_fence_proxy_set_real(eb->reloc_cache.fence, &rq->fence);
+	i915_request_put(eb->reloc_cache.rq);
+
+	dma_fence_put(eb->reloc_cache.fence);
+	eb->reloc_cache.fence = NULL;
+}
+
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
 	const unsigned int count = eb->buffer_count;
@@ -1953,10 +1979,15 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	if (err)
 		goto err_commit_unlock;
 
-	/* Wait for all writes (and relocs) into the batch to complete */
-	err = i915_sw_fence_await_reservation(&pw->base.chain,
-					      pw->batch->resv, NULL, false,
-					      0, I915_FENCE_GFP);
+	/* Wait for all writes (or relocs) into the batch to complete */
+	if (!eb->reloc_cache.fence || list_empty(&eb->batch->reloc_link))
+		err = i915_sw_fence_await_reservation(&pw->base.chain,
+						      pw->batch->resv, NULL,
+						      false, 0, I915_FENCE_GFP);
+	else
+		err = i915_sw_fence_await_dma_fence(&pw->base.chain,
+						    &eb->reloc_cache.rq->fence,
+						    0, I915_FENCE_GFP);
 	if (err < 0)
 		goto err_commit_unlock;
 
@@ -2084,6 +2115,15 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
 {
 	int err;
 
+	if (eb->reloc_cache.fence) {
+		err = i915_request_await_dma_fence(eb->request,
+						   &eb->reloc_cache.rq->fence);
+		if (err)
+			return err;
+
+		eb_reloc_signal(eb, eb->request);
+	}
+
 	err = eb_move_to_gpu(eb);
 	if (err)
 		return err;
@@ -2743,6 +2783,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (batch->private)
 		intel_gt_buffer_pool_put(batch->private);
 err_vma:
+	if (eb.reloc_cache.fence)
+		eb_reloc_signal(&eb, eb.reloc_cache.rq);
 	if (eb.trampoline)
 		i915_vma_unpin(eb.trampoline);
 	eb_unpin_engine(&eb);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 4f10b51f9a7e..62bba179b455 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -23,7 +23,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	const u64 mask =
 		GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
 	const u32 *map = page_mask_bits(obj->mm.mapping);
-	struct i915_request *rq;
 	struct eb_vma ev;
 	int err;
 	int i;
@@ -40,6 +39,9 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
+	/* Single stage pipeline in the selftest */
+	eb->reloc_cache.fence = &eb->reloc_cache.rq->fence;
+
 	list_add(&ev.reloc_link, &eb->relocs);
 	err = lock_relocs(eb);
 	if (err)
@@ -71,8 +73,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
-	GEM_BUG_ON(!eb->reloc_cache.rq);
-	rq = i915_request_get(eb->reloc_cache.rq);
 	reloc_gpu_flush(&eb->reloc_cache);
 
 	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
@@ -81,7 +81,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		goto put_rq;
 	}
 
-	if (!i915_request_completed(rq)) {
+	if (!i915_request_completed(eb->reloc_cache.rq)) {
 		pr_err("%s: did not wait for relocations!\n", eb->engine->name);
 		err = -EINVAL;
 		goto put_rq;
@@ -100,7 +100,8 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		igt_hexdump(map, 4096);
 
 put_rq:
-	i915_request_put(rq);
+	i915_request_put(eb->reloc_cache.rq);
+	eb->reloc_cache.rq = NULL;
 unpin_vma:
 	i915_vma_unpin(ev.vma);
 	return err;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 26/36] drm/syncobj: Allow use of dma-fence-proxy
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (23 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 25/36] drm/i915/gem: Make relocations atomic within execbuf Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 27/36] drm/i915/gem: Teach execbuf how to wait on future syncobj Chris Wilson
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Allow the callers to supply a dma-fence-proxy for asynchronous waiting on
future fences.
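
The key helper is dma_fence_replace_proxy(): it swaps the new fence into the
slot and, if the slot previously held a proxy, forwards that proxy's waiters
onto the new fence. A minimal sketch of the producer side (semantics assumed
from the dma-fence-proxy interface; the existing syncobj code drops the old
reference as before):

	/* install the real fence; resolves a proxy if one was parked in the slot */
	old = dma_fence_replace_proxy(&syncobj->fence, fence);

	/* callbacks attached to the old proxy now fire from the new fence */
	dma_fence_put(old);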

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/drm_syncobj.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 42d46414f767..e141db0e1eb6 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -184,6 +184,7 @@
  */
 
 #include <linux/anon_inodes.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/file.h>
 #include <linux/fs.h>
 #include <linux/sched/signal.h>
@@ -324,14 +325,9 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 	struct dma_fence *old_fence;
 	struct syncobj_wait_entry *cur, *tmp;
 
-	if (fence)
-		dma_fence_get(fence);
-
 	spin_lock(&syncobj->lock);
 
-	old_fence = rcu_dereference_protected(syncobj->fence,
-					      lockdep_is_held(&syncobj->lock));
-	rcu_assign_pointer(syncobj->fence, fence);
+	old_fence = dma_fence_replace_proxy(&syncobj->fence, fence);
 
 	if (fence != old_fence) {
 		list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 27/36] drm/i915/gem: Teach execbuf how to wait on future syncobj
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (24 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 26/36] drm/syncobj: Allow use of dma-fence-proxy Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 28/36] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If a syncobj has not yet been assigned a fence, treat it as a future fence
and install and wait upon a dma-fence-proxy. The proxy will be replaced by
the real fence later, and that fence will be responsible for signaling
our waiter.
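
A hypothetical userspace ordering that this enables (libdrm names, details
elided; purely illustrative):

	uint32_t handle;

	drmSyncobjCreate(fd, 0, &handle); /* empty syncobj, no fence attached yet */

	/* Batch A may wait on the syncobj before anyone has signaled it;
	 * the kernel parks a dma-fence-proxy in the empty slot */
	struct drm_i915_gem_exec_fence wait = {
		.handle = handle,
		.flags  = I915_EXEC_FENCE_WAIT,
	};

	/* Batch B, submitted later with I915_EXEC_FENCE_SIGNAL on the same
	 * handle, replaces the proxy with its real fence and releases batch A */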

Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 20 +++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 2844274c37bb..4fddbe34efa6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2490,8 +2490,24 @@ await_fence_array(struct i915_execbuffer *eb,
 			continue;
 
 		fence = drm_syncobj_fence_get(syncobj);
-		if (!fence)
-			return -EINVAL;
+		if (!fence) {
+			struct dma_fence *old;
+
+			fence = dma_fence_create_proxy();
+			if (!fence)
+				return -ENOMEM;
+
+			spin_lock(&syncobj->lock);
+			old = rcu_dereference_protected(syncobj->fence, true);
+			if (unlikely(old)) {
+				dma_fence_put(fence);
+				fence = dma_fence_get(old);
+			} else {
+				rcu_assign_pointer(syncobj->fence,
+						   dma_fence_get(fence));
+			}
+			spin_unlock(&syncobj->lock);
+		}
 
 		err = i915_request_await_dma_fence(eb->request, fence);
 		dma_fence_put(fence);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 28/36] drm/i915/gem: Allow combining submit-fences with syncobj
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (25 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 27/36] drm/i915/gem: Teach execbuf how to wait on future syncobj Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 29/36] drm/i915/gt: Declare when we enabled timeslicing Chris Wilson
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

We allow exported sync_file fences to be used as submit fences, but they
are not the only source of user fences. We also accept an array of
syncobj, and as with sync_file these are dma_fences underneath and so
feature the same set of controls. The submit-fence allows for a request
to be scheduled at the same time as the signaler, rather than after it
as is the norm. Userspace can combine submit-fences with its own
semaphores for intra-batch scheduling.

Not exposing submit-fences to syncobj was at the time just a matter of
pragmatic expediency.
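
In terms of the uAPI below, a submit-fence wait on a syncobj becomes (an
illustrative snippet; 'handle' is a previously created syncobj):

	struct drm_i915_gem_exec_fence fence = {
		.handle = handle,
		/* WAIT_SUBMIT modifies WAIT: schedule alongside the signaler
		 * rather than strictly after its completion */
		.flags  = I915_EXEC_FENCE_WAIT | I915_EXEC_FENCE_WAIT_SUBMIT,
	};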

Fixes: a88b6e4cbafd ("drm/i915: Allow specification of parallel execbuf")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 14 +++++++----
 drivers/gpu/drm/i915/i915_request.c           | 25 +++++++++++++++++++
 include/uapi/drm/i915_drm.h                   |  7 +++---
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 4fddbe34efa6..cb4872ccfe58 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2397,7 +2397,7 @@ static void
 __free_fence_array(struct drm_syncobj **fences, unsigned int n)
 {
 	while (n--)
-		drm_syncobj_put(ptr_mask_bits(fences[n], 2));
+		drm_syncobj_put(ptr_mask_bits(fences[n], 3));
 	kvfree(fences);
 }
 
@@ -2454,7 +2454,7 @@ get_fence_array(struct drm_i915_gem_execbuffer2 *args,
 		BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) &
 			     ~__I915_EXEC_FENCE_UNKNOWN_FLAGS);
 
-		fences[n] = ptr_pack_bits(syncobj, fence.flags, 2);
+		fences[n] = ptr_pack_bits(syncobj, fence.flags, 3);
 	}
 
 	return fences;
@@ -2485,7 +2485,7 @@ await_fence_array(struct i915_execbuffer *eb,
 		struct dma_fence *fence;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n], &flags, 3);
 		if (!(flags & I915_EXEC_FENCE_WAIT))
 			continue;
 
@@ -2509,7 +2509,11 @@ await_fence_array(struct i915_execbuffer *eb,
 			spin_unlock(&syncobj->lock);
 		}
 
-		err = i915_request_await_dma_fence(eb->request, fence);
+		if (flags & I915_EXEC_FENCE_WAIT_SUBMIT)
+			err = i915_request_await_execution(eb->request, fence,
+							   eb->engine->bond_execute);
+		else
+			err = i915_request_await_dma_fence(eb->request, fence);
 		dma_fence_put(fence);
 		if (err < 0)
 			return err;
@@ -2530,7 +2534,7 @@ signal_fence_array(struct i915_execbuffer *eb,
 		struct drm_syncobj *syncobj;
 		unsigned int flags;
 
-		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		syncobj = ptr_unpack_bits(fences[n], &flags, 3);
 		if (!(flags & I915_EXEC_FENCE_SIGNAL))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 02747c171c54..a570a1f43d70 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1385,6 +1385,27 @@ __i915_request_await_proxy(struct i915_request *rq,
 	return 0;
 }
 
+static int execution_proxy(struct await_proxy *ap)
+{
+	return i915_request_await_execution(ap->request, ap->fence, ap->data);
+}
+
+static int
+i915_request_await_proxy_execution(struct i915_request *rq,
+				   struct dma_fence *fence,
+				   void (*hook)(struct i915_request *rq,
+						struct dma_fence *signal))
+{
+	/*
+	 * We have to wait until the real request is known in order to
+	 * be able to hook into its execution, as opposed to waiting for
+	 * its completion.
+	 */
+	return __i915_request_await_proxy(rq, fence,
+					  i915_fence_timeout(rq->i915),
+					  execution_proxy, hook);
+}
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1424,6 +1445,10 @@ i915_request_await_execution(struct i915_request *rq,
 			ret = __i915_request_await_execution(rq,
 							     to_request(fence),
 							     hook);
+		else if (dma_fence_is_proxy(fence))
+			ret = i915_request_await_proxy_execution(rq,
+								 fence,
+								 hook);
 		else
 			ret = i915_request_await_external(rq, fence);
 		if (ret < 0)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 14b67cd6b54b..704dd0e3bc1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1040,9 +1040,10 @@ struct drm_i915_gem_exec_fence {
 	 */
 	__u32 handle;
 
-#define I915_EXEC_FENCE_WAIT            (1<<0)
-#define I915_EXEC_FENCE_SIGNAL          (1<<1)
-#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
+#define I915_EXEC_FENCE_WAIT            (1u << 0)
+#define I915_EXEC_FENCE_SIGNAL          (1u << 1)
+#define I915_EXEC_FENCE_WAIT_SUBMIT     (1u << 2)
+#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_WAIT_SUBMIT << 1))
 	__u32 flags;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 29/36] drm/i915/gt: Declare when we enabled timeslicing
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (26 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 28/36] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 30/36] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT Chris Wilson
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke, Chris Wilson

Let userspace know whether it can trust timeslicing by including it in the
I915_PARAM_HAS_SCHEDULER capability bits as I915_SCHEDULER_CAP_TIMESLICING.

v2: Only declare timeslicing if we can safely preempt userspace.
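
Userspace can then probe the capability in the usual way, e.g. (a sketch
using libdrm's drmIoctl; error handling omitted):

	int caps = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_SCHEDULER,
		.value = &caps,
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 &&
	    (caps & I915_SCHEDULER_CAP_TIMESLICING)) {
		/* the scheduler will timeslice between runnable contexts */
	}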

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802
Link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4854
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_user.c | 1 +
 include/uapi/drm/i915_drm.h                 | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 848decee9066..8415511f1465 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -98,6 +98,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
 		MAP(HAS_PREEMPTION, PREEMPTION),
 		MAP(HAS_SEMAPHORES, SEMAPHORES),
 		MAP(SUPPORTS_STATS, ENGINE_BUSY_STATS),
+		MAP(HAS_TIMESLICES, TIMESLICING),
 #undef MAP
 	};
 	struct intel_engine_cs *engine;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 704dd0e3bc1d..1ee227b5131a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -523,6 +523,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
+#define   I915_SCHEDULER_CAP_TIMESLICING	(1ul << 5)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 30/36] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (27 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 29/36] drm/i915/gt: Declare when we enabled timeslicing Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 31/36] drm/i915: Always defer fenced work to the worker Chris Wilson
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

This timeout is only used in one place, to provide a tiny bit of grace
for slow igt tests to clean up after themselves. If we are a bit stricter
and opt to kill outstanding requests rather than wait, we can speed up igt
by not waiting for 200ms after a hang.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 11 ++++++-----
 drivers/gpu/drm/i915/i915_drv.h     |  2 --
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index bca036ac6621..c0bd26ef4772 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1462,12 +1462,13 @@ gt_drop_caches(struct intel_gt *gt, u64 val)
 {
 	int ret;
 
-	if (val & DROP_RESET_ACTIVE &&
-	    wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT))
-		intel_gt_set_wedged(gt);
+	if (val & (DROP_RETIRE | DROP_RESET_ACTIVE))
+		intel_gt_wait_for_idle(gt, 1);
 
-	if (val & DROP_RETIRE)
-		intel_gt_retire_requests(gt);
+	if (val & DROP_RESET_ACTIVE && intel_gt_pm_get_if_awake(gt)) {
+		intel_gt_set_wedged(gt);
+		intel_gt_pm_put(gt);
+	}
 
 	if (val & (DROP_IDLE | DROP_ACTIVE)) {
 		ret = intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 98f2c448cd92..5140b90f7f7d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -616,8 +616,6 @@ struct i915_gem_mm {
 	u32 shrink_count;
 };
 
-#define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
-
 unsigned long i915_fence_context_timeout(const struct drm_i915_private *i915,
 					 u64 context);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 31/36] drm/i915: Always defer fenced work to the worker
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (28 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 30/36] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 32/36] drm/i915/gem: Assign context id for async work Chris Wilson
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Currently, if an error is raised we always call the cleanup locally
[and skip the main work callback]. However, some future users may need
to take a mutex to clean up, and so we cannot immediately execute the
cleanup as we may still be in interrupt context.

With the execute-immediate flag, for most cases this should result in
immediate cleanup of an error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_sw_fence_work.c | 25 +++++++++++------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index a3a81bb8f2c3..29f63ebc24e8 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -16,11 +16,14 @@ static void fence_complete(struct dma_fence_work *f)
 static void fence_work(struct work_struct *work)
 {
 	struct dma_fence_work *f = container_of(work, typeof(*f), work);
-	int err;
 
-	err = f->ops->work(f);
-	if (err)
-		dma_fence_set_error(&f->dma, err);
+	if (!f->dma.error) {
+		int err;
+
+		err = f->ops->work(f);
+		if (err)
+			dma_fence_set_error(&f->dma, err);
+	}
 
 	fence_complete(f);
 	dma_fence_put(&f->dma);
@@ -36,15 +39,11 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 		if (fence->error)
 			dma_fence_set_error(&f->dma, fence->error);
 
-		if (!f->dma.error) {
-			dma_fence_get(&f->dma);
-			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
-				fence_work(&f->work);
-			else
-				queue_work(system_unbound_wq, &f->work);
-		} else {
-			fence_complete(f);
-		}
+		dma_fence_get(&f->dma);
+		if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+			fence_work(&f->work);
+		else
+			queue_work(system_unbound_wq, &f->work);
 		break;
 
 	case FENCE_FREE:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 32/36] drm/i915/gem: Assign context id for async work
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (29 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 31/36] drm/i915: Always defer fenced work to the worker Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 33/36] drm/i915: Export a preallocate variant of i915_active_acquire() Chris Wilson
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Allocate a few dma-fence context ids that we can use to associate async work
[for the CPU] launched on behalf of this context. For extra fun, we allow
a configurable concurrency width.

A current example would be that we spawn an unbound worker for every
userptr get_pages. In the future, we wish to charge this work to the
context that initiated the async work and to impose concurrency limits
based on the context.
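
For illustration, with 8 online CPUs the context reserves 8 fence contexts
(async.width is stored as the mask 7), and i915_gem_context_async_id()
hands them out round-robin:

	/* cycles through ctx->async.context .. ctx->async.context + 7 */
	u64 id = ctx->async.context +
		 (atomic_fetch_inc(&ctx->async.cur) & ctx->async.width);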

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 4 ++++
 drivers/gpu/drm/i915/gem/i915_gem_context.h       | 6 ++++++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f5d59d18cd5b..4e8cebe2e66b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -715,6 +715,10 @@ __create_context(struct drm_i915_private *i915)
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	mutex_init(&ctx->mutex);
 
+	ctx->async.width = rounddown_pow_of_two(num_online_cpus());
+	ctx->async.context = dma_fence_context_alloc(ctx->async.width);
+	ctx->async.width--;
+
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 3702b2fb27ab..e104ff0ae740 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
 				       struct drm_file *file);
 
+static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx)
+{
+	return (ctx->async.context +
+		(atomic_fetch_inc(&ctx->async.cur) & ctx->async.width));
+}
+
 static inline struct i915_gem_context *
 i915_gem_context_get(struct i915_gem_context *ctx)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 28760bd03265..5f5cfa3a3e9b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -85,6 +85,12 @@ struct i915_gem_context {
 
 	struct intel_timeline *timeline;
 
+	struct {
+		u64 context;
+		atomic_t cur;
+		unsigned int width;
+	} async;
+
 	/**
 	 * @vm: unique address space (GTT)
 	 *
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 33/36] drm/i915: Export a preallocate variant of i915_active_acquire()
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (30 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 32/36] drm/i915/gem: Assign context id for async work Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 34/36] drm/i915/gem: Separate the ww_mutex walker into its own list Chris Wilson
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.
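
The intended calling pattern is roughly (a sketch only, not lifted from a
later patch; 'ref' is any i915_active and 'idx' a fence context id, handling
of the returned previous fence simplified):

	/* outside the critical section: may allocate, reserves a slot for idx */
	err = i915_active_acquire_for_context(ref, idx);
	if (err)
		return err;

	/* inside the critical section: no allocation, just fill the slot */
	prev = __i915_active_ref(ref, idx, fence);
	if (!IS_ERR_OR_NULL(prev))
		dma_fence_put(prev);

	i915_active_release(ref);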

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.c | 107 ++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_active.h |   5 ++
 2 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index d960d0be5bd2..71ad0d452680 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -217,11 +217,10 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb *cb)
 }
 
 static struct i915_active_fence *
-active_instance(struct i915_active *ref, struct intel_timeline *tl)
+active_instance(struct i915_active *ref, u64 idx)
 {
 	struct active_node *node, *prealloc;
 	struct rb_node **p, *parent;
-	u64 idx = tl->fence_context;
 
 	/*
 	 * We track the most recently used timeline to skip a rbtree search
@@ -367,7 +366,7 @@ int i915_active_ref(struct i915_active *ref,
 	if (err)
 		return err;
 
-	active = active_instance(ref, tl);
+	active = active_instance(ref, tl->fence_context);
 	if (!active) {
 		err = -ENOMEM;
 		goto out;
@@ -384,32 +383,104 @@ int i915_active_ref(struct i915_active *ref,
 		atomic_dec(&ref->count);
 	}
 	if (!__i915_active_fence_set(active, fence))
-		atomic_inc(&ref->count);
+		__i915_active_acquire(ref);
 
 out:
 	i915_active_release(ref);
 	return err;
 }
 
-struct dma_fence *
-i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+static struct dma_fence *
+__i915_active_set_fence(struct i915_active *ref,
+			struct i915_active_fence *active,
+			struct dma_fence *fence)
 {
 	struct dma_fence *prev;
 
 	/* We expect the caller to manage the exclusive timeline ordering */
 	GEM_BUG_ON(i915_active_is_idle(ref));
 
+	if (is_barrier(active)) { /* proto-node used by our idle barrier */
+		/*
+		 * This request is on the kernel_context timeline, and so
+		 * we can use it to substitute for the pending idle-barrer
+		 * we can use it to substitute for the pending idle-barrier
+		 */
+		__active_del_barrier(ref, node_from_active(active));
+		RCU_INIT_POINTER(active->fence, NULL);
+		atomic_dec(&ref->count);
+	}
+
 	rcu_read_lock();
-	prev = __i915_active_fence_set(&ref->excl, f);
+	prev = __i915_active_fence_set(active, fence);
 	if (prev)
 		prev = dma_fence_get_rcu(prev);
 	else
-		atomic_inc(&ref->count);
+		__i915_active_acquire(ref);
 	rcu_read_unlock();
 
 	return prev;
 }
 
+static struct i915_active_fence *
+__active_lookup(struct i915_active *ref, u64 idx)
+{
+	struct active_node *node;
+	struct rb_node *p;
+
+	/* Like active_instance() but with no malloc */
+
+	node = READ_ONCE(ref->cache);
+	if (node && node->timeline == idx)
+		return &node->base;
+
+	spin_lock_irq(&ref->tree_lock);
+	GEM_BUG_ON(i915_active_is_idle(ref));
+
+	p = ref->tree.rb_node;
+	while (p) {
+		node = rb_entry(p, struct active_node, node);
+		if (node->timeline == idx) {
+			ref->cache = node;
+			spin_unlock_irq(&ref->tree_lock);
+			return &node->base;
+		}
+
+		if (node->timeline < idx)
+			p = p->rb_right;
+		else
+			p = p->rb_left;
+	}
+
+	spin_unlock_irq(&ref->tree_lock);
+
+	return NULL;
+}
+
+struct dma_fence *
+__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
+{
+	struct dma_fence *prev = ERR_PTR(-ENOENT);
+	struct i915_active_fence *active;
+
+	if (!i915_active_acquire_if_busy(ref))
+		return ERR_PTR(-EINVAL);
+
+	active = __active_lookup(ref, idx);
+	if (active)
+		prev = __i915_active_set_fence(ref, active, fence);
+
+	i915_active_release(ref);
+	return prev;
+}
+
+struct dma_fence *
+i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+{
+	/* We expect the caller to manage the exclusive timeline ordering */
+	return __i915_active_set_fence(ref, &ref->excl, f);
+}
+
 bool i915_active_acquire_if_busy(struct i915_active *ref)
 {
 	debug_active_assert(ref);
@@ -443,6 +514,24 @@ int i915_active_acquire(struct i915_active *ref)
 	return err;
 }
 
+int i915_active_acquire_for_context(struct i915_active *ref, u64 idx)
+{
+	struct i915_active_fence *active;
+	int err;
+
+	err = i915_active_acquire(ref);
+	if (err)
+		return err;
+
+	active = active_instance(ref, idx);
+	if (!active) {
+		i915_active_release(ref);
+		return -ENOMEM;
+	}
+
+	return 0; /* return with active ref */
+}
+
 void i915_active_release(struct i915_active *ref)
 {
 	debug_active_assert(ref);
@@ -804,7 +893,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 			 */
 			RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN));
 			node->base.cb.node.prev = (void *)engine;
-			atomic_inc(&ref->count);
+			__i915_active_acquire(ref);
 		}
 		GEM_BUG_ON(rcu_access_pointer(node->base.fence) != ERR_PTR(-EAGAIN));
 
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index cf4058150966..042502abefe5 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -163,6 +163,9 @@ void __i915_active_init(struct i915_active *ref,
 	__i915_active_init(ref, active, retire, &__mkey, &__wkey);	\
 } while (0)
 
+struct dma_fence *
+__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
+
 int i915_active_ref(struct i915_active *ref,
 		    struct intel_timeline *tl,
 		    struct dma_fence *fence);
@@ -198,7 +201,9 @@ int i915_request_await_active(struct i915_request *rq,
 #define I915_ACTIVE_AWAIT_BARRIER BIT(2)
 
 int i915_active_acquire(struct i915_active *ref);
+int i915_active_acquire_for_context(struct i915_active *ref, u64 idx);
 bool i915_active_acquire_if_busy(struct i915_active *ref);
+
 void i915_active_release(struct i915_active *ref);
 
 static inline void __i915_active_acquire(struct i915_active *ref)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 34/36] drm/i915/gem: Separate the ww_mutex walker into its own list
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (31 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 33/36] drm/i915: Export a preallocate variant of i915_active_acquire() Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 35/36] drm/i915/gem: Asynchronous GTT unbinding Chris Wilson
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

In preparation for making eb_vma bigger and heavier to run in parallel,
we need to stop applying an in-place swap() to reorder around ww_mutex
deadlocks. Keep the array intact and reorder the locks using a dedicated
list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 52 ++++++++++++-------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index cb4872ccfe58..b400eed1f435 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -36,6 +36,7 @@ struct eb_vma {
 	struct drm_i915_gem_exec_object2 *exec;
 	struct list_head bind_link;
 	struct list_head reloc_link;
+	struct list_head lock_link;
 
 	struct hlist_node node;
 	u32 handle;
@@ -247,6 +248,8 @@ struct i915_execbuffer {
 	/** list of vma that have execobj.relocation_count */
 	struct list_head relocs;
 
+	struct list_head lock;
+
 	/**
 	 * Track the most recently used object for relocations, as we
 	 * frequently have to perform multiple relocations within the same
@@ -391,6 +394,10 @@ static int eb_create(struct i915_execbuffer *eb)
 		eb->lut_size = -eb->buffer_count;
 	}
 
+	INIT_LIST_HEAD(&eb->relocs);
+	INIT_LIST_HEAD(&eb->unbound);
+	INIT_LIST_HEAD(&eb->lock);
+
 	return 0;
 }
 
@@ -598,6 +605,8 @@ eb_add_vma(struct i915_execbuffer *eb,
 		eb_unreserve_vma(ev);
 		list_add_tail(&ev->bind_link, &eb->unbound);
 	}
+
+	list_add_tail(&ev->lock_link, &eb->lock);
 }
 
 static int eb_reserve_vma(const struct i915_execbuffer *eb,
@@ -857,9 +866,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 	unsigned int i;
 	int err = 0;
 
-	INIT_LIST_HEAD(&eb->relocs);
-	INIT_LIST_HEAD(&eb->unbound);
-
 	for (i = 0; i < eb->buffer_count; i++) {
 		struct i915_vma *vma;
 
@@ -1701,38 +1707,43 @@ static void eb_reloc_signal(struct i915_execbuffer *eb, struct i915_request *rq)
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
-	const unsigned int count = eb->buffer_count;
 	struct ww_acquire_ctx acquire;
-	unsigned int i;
+	struct eb_vma *ev;
 	int err = 0;
 
 	ww_acquire_init(&acquire, &reservation_ww_class);
 
-	for (i = 0; i < count; i++) {
-		struct eb_vma *ev = &eb->vma[i];
+	list_for_each_entry(ev, &eb->lock, lock_link) {
 		struct i915_vma *vma = ev->vma;
 
 		err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire);
 		if (err == -EDEADLK) {
-			GEM_BUG_ON(i == 0);
-			do {
-				int j = i - 1;
-
-				ww_mutex_unlock(&eb->vma[j].vma->resv->lock);
+			struct eb_vma *unlock = ev, *en;
 
-				swap(eb->vma[i],  eb->vma[j]);
-			} while (--i);
+			list_for_each_entry_safe_continue_reverse(unlock, en,
+								  &eb->lock,
+								  lock_link) {
+				ww_mutex_unlock(&unlock->vma->resv->lock);
+				list_move_tail(&unlock->lock_link, &eb->lock);
+			}
 
+			GEM_BUG_ON(!list_is_first(&ev->lock_link, &eb->lock));
 			err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
 							       &acquire);
 		}
-		if (err)
-			break;
+		if (err) {
+			list_for_each_entry_continue_reverse(ev,
+							     &eb->lock,
+							     lock_link)
+				ww_mutex_unlock(&ev->vma->resv->lock);
+
+			ww_acquire_fini(&acquire);
+			goto err_skip;
+		}
 	}
 	ww_acquire_done(&acquire);
 
-	while (i--) {
-		struct eb_vma *ev = &eb->vma[i];
+	list_for_each_entry(ev, &eb->lock, lock_link) {
 		struct i915_vma *vma = ev->vma;
 		unsigned int flags = ev->flags;
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -2079,9 +2090,10 @@ static int eb_parse(struct i915_execbuffer *eb)
 	if (err)
 		goto err_trampoline;
 
-	eb->vma[eb->buffer_count].vma = i915_vma_get(shadow);
-	eb->vma[eb->buffer_count].flags = __EXEC_OBJECT_HAS_PIN;
 	eb->batch = &eb->vma[eb->buffer_count++];
+	eb->batch->vma = i915_vma_get(shadow);
+	eb->batch->flags = __EXEC_OBJECT_HAS_PIN;
+	list_add_tail(&eb->batch->lock_link, &eb->lock);
 	eb->vma[eb->buffer_count].vma = NULL;
 
 	eb->trampoline = trampoline;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 35/36] drm/i915/gem: Asynchronous GTT unbinding
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (32 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 34/36] drm/i915/gem: Separate the ww_mutex walker into its own list Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 36/36] drm/i915/gem: Bind the fence async for execbuf Chris Wilson
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld, Chris Wilson

It is reasonably common for userspace (even modern drivers like iris) to
reuse an active address for a new buffer. This would cause the
application to stall under its mutex (originally struct_mutex) until the
old batches were idle and it could synchronously remove the stale PTE.
However, we can queue up a job that waits for the old nodes to become
idle and, upon those signals, removes the old nodes, replacing them with
the new ones for the batch. This is still CPU driven, but in
theory we can do the GTT patching from the GPU. The job itself has a
completion signal that allows the execbuf to wait upon the rebinding,
and also lets other observers coordinate with the common VM activity.

Letting userspace queue up more work lets it do more without blocking
other clients. In turn, we take care not to let it queue up too much
concurrent work, creating a small number of queues for each context to
limit the number of concurrent tasks.

The implementation relies on only scheduling one unbind operation per
vma as we use the unbound vma->node location to track the stale PTE.
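
Condensed, the execbuf side of the flow added below looks roughly like
this (abridged from eb_reserve()/eb_vm_work() in this patch, with the
pinning, retry and error paths elided):

  work = eb_vm_work(eb);          /* dma_fence_work + slot in vm->active */

  mutex_lock(&vm->mutex);
  err = eb_vm_throttle(work);     /* queue behind this context's prior binds */
  list_for_each_entry(ev, &work->unbound, bind_link)
          err = eb_reserve_vma(work, ev); /* claim drm_mm space, mark stale
                                           * nodes for eviction */
  mutex_unlock(&vm->mutex);

  /*
   * The worker runs once every awaited fence (the old users of the
   * evicted ranges) has signaled: it clears the stale PTEs, binds the
   * new nodes and signals work->base.dma, on which the execbuf and any
   * other observer of vm->active can wait.
   */
  dma_fence_work_commit_imm(&work->base);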

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1402
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 810 ++++++++++++++++--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c          |   1 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c          |   3 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c           |   4 +
 drivers/gpu/drm/i915/gt/intel_gtt.h           |   2 +
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   3 +-
 drivers/gpu/drm/i915/i915_gem.c               |   7 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   5 +
 drivers/gpu/drm/i915/i915_vma.c               |  71 +-
 drivers/gpu/drm/i915/i915_vma.h               |   4 +
 10 files changed, 813 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b400eed1f435..49bfae968215 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -19,6 +19,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_buffer_pool.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_ring.h"
 
 #include "i915_drv.h"
@@ -32,6 +33,9 @@ struct eb_vma {
 	struct i915_vma *vma;
 	unsigned int flags;
 
+	struct drm_mm_node hole;
+	unsigned int bind_flags;
+
 	/** This vma's place in the execbuf reservation list */
 	struct drm_i915_gem_exec_object2 *exec;
 	struct list_head bind_link;
@@ -49,9 +53,10 @@ struct eb_vma_array {
 
 #define __EXEC_OBJECT_HAS_PIN		BIT(31)
 #define __EXEC_OBJECT_HAS_FENCE		BIT(30)
-#define __EXEC_OBJECT_NEEDS_MAP		BIT(29)
-#define __EXEC_OBJECT_NEEDS_BIAS	BIT(28)
-#define __EXEC_OBJECT_INTERNAL_FLAGS	(~0u << 28) /* all of the above */
+#define __EXEC_OBJECT_HAS_PAGES		BIT(29)
+#define __EXEC_OBJECT_NEEDS_MAP		BIT(28)
+#define __EXEC_OBJECT_NEEDS_BIAS	BIT(27)
+#define __EXEC_OBJECT_INTERNAL_FLAGS	(~0u << 27) /* all of the above */
 
 #define __EXEC_HAS_RELOC	BIT(31)
 #define __EXEC_INTERNAL_FLAGS	(~0u << 31)
@@ -65,11 +70,12 @@ struct eb_vma_array {
 	 I915_EXEC_RESOURCE_STREAMER)
 
 /* Catch emission of unexpected errors for CI! */
+#define __EINVAL__ 22
 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 #undef EINVAL
 #define EINVAL ({ \
 	DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \
-	22; \
+	__EINVAL__; \
 })
 #endif
 
@@ -309,6 +315,12 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count)
 	return arr;
 }
 
+static struct eb_vma_array *eb_vma_array_get(struct eb_vma_array *arr)
+{
+	kref_get(&arr->kref);
+	return arr;
+}
+
 static inline void eb_unreserve_vma(struct eb_vma *ev)
 {
 	struct i915_vma *vma = ev->vma;
@@ -319,8 +331,12 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
 	if (ev->flags & __EXEC_OBJECT_HAS_PIN)
 		__i915_vma_unpin(vma);
 
+	if (ev->flags & __EXEC_OBJECT_HAS_PAGES)
+		i915_vma_put_pages(vma);
+
 	ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
-		       __EXEC_OBJECT_HAS_FENCE);
+		       __EXEC_OBJECT_HAS_FENCE |
+		       __EXEC_OBJECT_HAS_PAGES);
 }
 
 static void eb_vma_array_destroy(struct kref *kref)
@@ -406,7 +422,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry,
 		 const struct i915_vma *vma,
 		 unsigned int flags)
 {
-	if (vma->node.size < entry->pad_to_size)
+	if (vma->node.size < max(vma->size, entry->pad_to_size))
 		return true;
 
 	if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment))
@@ -479,12 +495,18 @@ eb_pin_vma(struct i915_execbuffer *eb,
 		if (entry->flags & EXEC_OBJECT_PINNED)
 			return false;
 
+		/* Concurrent async binds in progress, get in the queue */
+		if (!i915_active_is_idle(&vma->vm->active))
+			return false;
+
 		/* Failing that pick any _free_ space if suitable */
 		if (unlikely(i915_vma_pin(vma,
 					  entry->pad_to_size,
 					  entry->alignment,
 					  eb_pin_flags(entry, ev->flags) |
-					  PIN_USER | PIN_NOEVICT)))
+					  PIN_USER |
+					  PIN_NOEVICT |
+					  PIN_NOSEARCH)))
 			return false;
 	}
 
@@ -609,85 +631,711 @@ eb_add_vma(struct i915_execbuffer *eb,
 	list_add_tail(&ev->lock_link, &eb->lock);
 }
 
-static int eb_reserve_vma(const struct i915_execbuffer *eb,
-			  struct eb_vma *ev,
-			  u64 pin_flags)
+struct eb_vm_work {
+	struct dma_fence_work base;
+	struct list_head unbound;
+	struct eb_vma_array *array;
+	struct i915_address_space *vm;
+	struct list_head evict_list;
+	u64 *p_flags;
+	u64 active;
+};
+
+static inline u64 node_end(const struct drm_mm_node *node)
+{
+	return node->start + node->size;
+}
+
+static int set_bind_fence(struct i915_vma *vma, struct eb_vm_work *work)
+{
+	struct dma_fence *prev;
+	int err = 0;
+
+	lockdep_assert_held(&vma->vm->mutex);
+	prev = i915_active_set_exclusive(&vma->active, &work->base.dma);
+	if (unlikely(prev)) {
+		err = i915_sw_fence_await_dma_fence(&work->base.chain, prev, 0,
+						    GFP_NOWAIT | __GFP_NOWARN);
+		dma_fence_put(prev);
+	}
+
+	return err < 0 ? err : 0;
+}
+
+static int await_evict(struct eb_vm_work *work, struct i915_vma *vma)
+{
+	int err;
+
+	if (rcu_access_pointer(vma->active.excl.fence) == &work->base.dma)
+		return 0;
+
+	/* Wait for all other previous activity */
+	err = i915_sw_fence_await_active(&work->base.chain,
+					 &vma->active,
+					 I915_ACTIVE_AWAIT_ACTIVE);
+	/* Then insert along the exclusive vm->mutex timeline */
+	if (err == 0)
+		err = set_bind_fence(vma, work);
+
+	return err;
+}
+
+static int
+evict_for_node(struct eb_vm_work *work,
+	       struct eb_vma *const target,
+	       unsigned int flags)
+{
+	struct i915_address_space *vm = target->vma->vm;
+	const unsigned long color = target->vma->node.color;
+	const u64 start = target->vma->node.start;
+	const u64 end = start + target->vma->node.size;
+	u64 hole_start = start, hole_end = end;
+	struct i915_vma *vma, *next;
+	struct drm_mm_node *node;
+	LIST_HEAD(evict_list);
+	LIST_HEAD(steal_list);
+	int err = 0;
+
+	lockdep_assert_held(&vm->mutex);
+	GEM_BUG_ON(drm_mm_node_allocated(&target->vma->node));
+	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
+
+	if (i915_vm_has_cache_coloring(vm)) {
+		/* Expand search to cover neighbouring guard pages (or lack!) */
+		if (hole_start)
+			hole_start -= I915_GTT_PAGE_SIZE;
+
+		/* Always look at the page afterwards to avoid the end-of-GTT */
+		hole_end += I915_GTT_PAGE_SIZE;
+	}
+	GEM_BUG_ON(hole_start >= hole_end);
+
+	drm_mm_for_each_node_in_range(node, &vm->mm, hole_start, hole_end) {
+		GEM_BUG_ON(node == &target->vma->node);
+
+		/* If we find any non-objects (!vma), we cannot evict them */
+		if (node->color == I915_COLOR_UNEVICTABLE) {
+			err = -ENOSPC;
+			goto err;
+		}
+
+		/*
+		 * If we are using coloring to insert guard pages between
+		 * different cache domains within the address space, we have
+		 * to check whether the objects on either side of our range
+		 * abutt and conflict. If they are in conflict, then we evict
+		 * those as well to make room for our guard pages.
+		 */
+		if (i915_vm_has_cache_coloring(vm)) {
+			if (node_end(node) == start && node->color == color)
+				continue;
+
+			if (node->start == end && node->color == color)
+				continue;
+		}
+
+		GEM_BUG_ON(!drm_mm_node_allocated(node));
+		vma = container_of(node, typeof(*vma), node);
+
+		if (i915_vma_is_pinned(vma)) {
+			err = -ENOSPC;
+			goto err;
+		}
+
+		/* If this VMA is already being freed, or idle, steal it! */
+		if (!i915_active_acquire_if_busy(&vma->active)) {
+			list_move(&vma->vm_link, &steal_list);
+			continue;
+		}
+
+		if (flags & PIN_NONBLOCK)
+			err = -EAGAIN;
+		else
+			err = await_evict(work, vma);
+		i915_active_release(&vma->active);
+		if (err)
+			goto err;
+
+		GEM_BUG_ON(!i915_vma_is_active(vma));
+		list_move(&vma->vm_link, &evict_list);
+	}
+
+	list_for_each_entry_safe(vma, next, &steal_list, vm_link) {
+		atomic_and(~I915_VMA_BIND_MASK, &vma->flags);
+		__i915_vma_evict(vma);
+		drm_mm_remove_node(&vma->node);
+		/* No ref held; vma may now be concurrently freed */
+	}
+
+	/* No overlapping nodes to evict, claim the slot for ourselves! */
+	if (list_empty(&evict_list))
+		return drm_mm_reserve_node(&vm->mm, &target->vma->node);
+
+	/*
+	 * Mark this range as reserved.
+	 *
+	 * We have not yet removed the PTEs for the old evicted nodes, so
+	 * must prevent this range from being reused for anything else. The
+	 * PTE will be cleared when the range is idle (during the rebind
+	 * phase in the worker).
+	 */
+	target->hole.color = I915_COLOR_UNEVICTABLE;
+	target->hole.start = start;
+	target->hole.size = end;
+
+	list_for_each_entry(vma, &evict_list, vm_link) {
+		target->hole.start =
+			min(target->hole.start, vma->node.start);
+		target->hole.size =
+			max(target->hole.size, node_end(&vma->node));
+
+		GEM_BUG_ON(vma->node.mm != &vm->mm);
+		drm_mm_remove_node(&vma->node);
+		atomic_and(~I915_VMA_BIND_MASK, &vma->flags);
+		GEM_BUG_ON(i915_vma_is_pinned(vma));
+	}
+	list_splice(&evict_list, &work->evict_list);
+
+	target->hole.size -= target->hole.start;
+
+	return drm_mm_reserve_node(&vm->mm, &target->hole);
+
+err:
+	list_splice(&evict_list, &vm->bound_list);
+	list_splice(&steal_list, &vm->bound_list);
+	return err;
+}
+
+static int
+evict_in_range(struct eb_vm_work *work,
+	       struct eb_vma * const target,
+	       u64 start, u64 end, u64 align)
+{
+	struct i915_address_space *vm = target->vma->vm;
+	struct i915_vma *active = NULL;
+	struct i915_vma *vma, *next;
+	struct drm_mm_scan scan;
+	LIST_HEAD(evict_list);
+	bool found = false;
+
+	lockdep_assert_held(&vm->mutex);
+
+	drm_mm_scan_init_with_range(&scan, &vm->mm,
+				    target->vma->node.size,
+				    align,
+				    target->vma->node.color,
+				    start, end,
+				    DRM_MM_INSERT_BEST);
+
+	list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
+		if (i915_vma_is_pinned(vma))
+			continue;
+
+		if (vma == active)
+			active = ERR_PTR(-EAGAIN);
+
+		/* Prefer to reuse idle nodes; push all active vma to the end */
+		if (active != ERR_PTR(-EAGAIN) && i915_vma_is_active(vma)) {
+			if (!active)
+				active = vma;
+
+			list_move_tail(&vma->vm_link, &vm->bound_list);
+			continue;
+		}
+
+		list_move(&vma->vm_link, &evict_list);
+		if (drm_mm_scan_add_block(&scan, &vma->node)) {
+			target->vma->node.start =
+				round_up(scan.hit_start, align);
+			found = true;
+			break;
+		}
+	}
+
+	list_for_each_entry(vma, &evict_list, vm_link)
+		drm_mm_scan_remove_block(&scan, &vma->node);
+	list_splice(&evict_list, &vm->bound_list);
+	if (!found)
+		return -ENOSPC;
+
+	return evict_for_node(work, target, 0);
+}
+
+static u64 random_offset(u64 start, u64 end, u64 len, u64 align)
+{
+	u64 range, addr;
+
+	GEM_BUG_ON(range_overflows(start, len, end));
+	GEM_BUG_ON(round_up(start, align) > round_down(end - len, align));
+
+	range = round_down(end - len, align) - round_up(start, align);
+	if (range) {
+		if (sizeof(unsigned long) == sizeof(u64)) {
+			addr = get_random_long();
+		} else {
+			addr = get_random_int();
+			if (range > U32_MAX) {
+				addr <<= 32;
+				addr |= get_random_int();
+			}
+		}
+		div64_u64_rem(addr, range, &addr);
+		start += addr;
+	}
+
+	return round_up(start, align);
+}
+
+static u64 align0(u64 align)
+{
+	return align <= I915_GTT_MIN_ALIGNMENT ? 0 : align;
+}
+
+static struct drm_mm_node *__best_hole(struct drm_mm *mm, u64 size)
+{
+	struct rb_node *rb = mm->holes_size.rb_root.rb_node;
+	struct drm_mm_node *best = NULL;
+
+	do {
+		struct drm_mm_node *node =
+			rb_entry(rb, struct drm_mm_node, rb_hole_size);
+
+		if (size <= node->hole_size) {
+			best = node;
+			rb = rb->rb_right;
+		} else {
+			rb = rb->rb_left;
+		}
+	} while (rb);
+
+	return best;
+}
+
+static int best_hole(struct drm_mm *mm, struct drm_mm_node *node,
+		     u64 start, u64 end, u64 align)
+{
+	struct drm_mm_node *hole;
+	u64 size = node->size;
+
+	do {
+		hole = __best_hole(mm, size);
+		if (!hole)
+			return -ENOSPC;
+
+		node->start = round_up(max(start, drm_mm_hole_node_start(hole)),
+				       align);
+		if (min(drm_mm_hole_node_end(hole), end) >=
+		    node->start + node->size)
+			return drm_mm_reserve_node(mm, node);
+
+		/*
+		 * Too expensive to search for every single hole every time,
+		 * so just look for the next bigger hole, introducing enough
+		 * space for alignments. Finding the smallest hole with ideal
+		 * alignment scales very poorly, so we choose to waste space
+		 * if an alignment is forced. On the other hand, simply
+		 * randomly selecting an offset in 48b space will cause us
+		 * to use the majority of that space and exhaust all memory
+		 * in storing the page directories. Compromise is required.
+		 */
+		size = hole->hole_size + align;
+	} while (1);
+}
+
+static int eb_reserve_vma(struct eb_vm_work *work, struct eb_vma *ev)
 {
 	struct drm_i915_gem_exec_object2 *entry = ev->exec;
+	const unsigned int exec_flags = ev->flags;
 	struct i915_vma *vma = ev->vma;
+	struct i915_address_space *vm = vma->vm;
+	u64 start = 0, end = vm->total;
+	u64 align = entry->alignment ?: I915_GTT_MIN_ALIGNMENT;
+	unsigned int bind_flags;
 	int err;
 
-	if (drm_mm_node_allocated(&vma->node) &&
-	    eb_vma_misplaced(entry, vma, ev->flags)) {
-		err = i915_vma_unbind(vma);
-		if (err)
-			return err;
+	lockdep_assert_held(&vm->mutex);
+
+	bind_flags = PIN_USER;
+	if (exec_flags & EXEC_OBJECT_NEEDS_GTT)
+		bind_flags |= PIN_GLOBAL;
+
+	if (drm_mm_node_allocated(&vma->node))
+		goto pin;
+
+	GEM_BUG_ON(i915_vma_is_pinned(vma));
+	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_BIND_MASK));
+	GEM_BUG_ON(i915_active_fence_isset(&vma->active.excl));
+	GEM_BUG_ON(!vma->size);
+
+	/* Reuse old address (if it doesn't conflict with new requirements) */
+	if (eb_vma_misplaced(entry, vma, exec_flags)) {
+		vma->node.start = entry->offset & PIN_OFFSET_MASK;
+		vma->node.size = max(entry->pad_to_size, vma->size);
+		vma->node.color = 0;
+		if (i915_vm_has_cache_coloring(vm))
+			vma->node.color = vma->obj->cache_level;
 	}
 
-	err = i915_vma_pin(vma,
-			   entry->pad_to_size, entry->alignment,
-			   eb_pin_flags(entry, ev->flags) | pin_flags);
-	if (err)
-		return err;
+	/*
+	 * Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset,
+	 * limit address to the first 4GBs for unflagged objects.
+	 */
+	if (!(exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS))
+		end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE);
 
-	if (entry->offset != vma->node.start) {
-		entry->offset = vma->node.start | UPDATE;
-		eb->args->flags |= __EXEC_HAS_RELOC;
+	align = max(align, vma->display_alignment);
+	if (exec_flags & __EXEC_OBJECT_NEEDS_MAP) {
+		vma->node.size = max_t(u64, vma->node.size, vma->fence_size);
+		end = min_t(u64, end, i915_vm_to_ggtt(vm)->mappable_end);
+		align = max_t(u64, align, vma->fence_alignment);
 	}
 
-	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		err = i915_vma_pin_fence(vma);
-		if (unlikely(err)) {
-			i915_vma_unpin(vma);
+	if (exec_flags & __EXEC_OBJECT_NEEDS_BIAS)
+		start = BATCH_OFFSET_BIAS;
+
+	GEM_BUG_ON(!vma->node.size);
+	if (vma->node.size > end - start)
+		return -E2BIG;
+
+	/* Try the user's preferred location first (mandatory if soft-pinned) */
+	err = -__EINVAL__;
+	if (vma->node.start >= start &&
+	    IS_ALIGNED(vma->node.start, align) &&
+	    !range_overflows(vma->node.start, vma->node.size, end)) {
+		unsigned int pin_flags;
+
+		if (drm_mm_reserve_node(&vm->mm, &vma->node) == 0)
+			goto pin;
+
+		pin_flags = 0;
+		if (!(exec_flags & EXEC_OBJECT_PINNED))
+			pin_flags = PIN_NONBLOCK;
+
+		err = evict_for_node(work, ev, pin_flags);
+		if (err == 0)
+			goto pin;
+	}
+	if (exec_flags & EXEC_OBJECT_PINNED)
+		return err;
+
+	/* Try the first available free space */
+	if (!best_hole(&vm->mm, &vma->node, start, end, align))
+		goto pin;
+
+	/* Pick a random slot and see if it's available [O(N) worst case] */
+	vma->node.start = random_offset(start, end, vma->node.size, align);
+	if (evict_for_node(work, ev, 0) == 0)
+		goto pin;
+
+	/* Otherwise search all free space [degrades to O(N^2)] */
+	if (drm_mm_insert_node_in_range(&vm->mm, &vma->node,
+					vma->node.size,
+					align0(align),
+					vma->node.color,
+					start, end,
+					DRM_MM_INSERT_BEST) == 0)
+		goto pin;
+
+	/* Pretty busy! Loop over "LRU" and evict oldest in our search range */
+	err = evict_in_range(work, ev, start, end, align);
+	if (unlikely(err))
+		return err;
+
+pin:
+	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+		err = __i915_vma_pin_fence(vma); /* XXX no waiting */
+		if (unlikely(err))
 			return err;
-		}
 
 		if (vma->fence)
 			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
 	}
 
+	bind_flags &= ~atomic_read(&vma->flags);
+	if (bind_flags) {
+		err = set_bind_fence(vma, work);
+		if (unlikely(err))
+			return err;
+
+		atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count);
+		atomic_or(bind_flags, &vma->flags);
+
+		if (i915_vma_is_ggtt(vma))
+			__i915_vma_set_map_and_fenceable(vma);
+
+		GEM_BUG_ON(!i915_vma_is_active(vma));
+		list_move_tail(&vma->vm_link, &vm->bound_list);
+		ev->bind_flags = bind_flags;
+	}
+	__i915_vma_pin(vma); /* and release */
+
+	GEM_BUG_ON(!bind_flags && !drm_mm_node_allocated(&vma->node));
+	GEM_BUG_ON(!(drm_mm_node_allocated(&vma->node) ^
+		     drm_mm_node_allocated(&ev->hole)));
+
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start | UPDATE;
+		*work->p_flags |= __EXEC_HAS_RELOC;
+	}
+
 	ev->flags |= __EXEC_OBJECT_HAS_PIN;
 	GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags));
 
 	return 0;
 }
 
-static int eb_reserve(struct i915_execbuffer *eb)
+static int __eb_bind_vma(struct eb_vm_work *work, int err)
 {
-	const unsigned int count = eb->buffer_count;
-	unsigned int pin_flags = PIN_USER | PIN_NONBLOCK;
-	struct list_head last;
+	struct i915_address_space *vm = work->vm;
 	struct eb_vma *ev;
-	unsigned int i, pass;
-	int err = 0;
+
+	GEM_BUG_ON(!intel_gt_pm_is_awake(vm->gt));
 
 	/*
-	 * Attempt to pin all of the buffers into the GTT.
-	 * This is done in 3 phases:
-	 *
-	 * 1a. Unbind all objects that do not match the GTT constraints for
-	 *     the execbuffer (fenceable, mappable, alignment etc).
-	 * 1b. Increment pin count for already bound objects.
-	 * 2.  Bind new objects.
-	 * 3.  Decrement pin count.
-	 *
-	 * This avoid unnecessary unbinding of later objects in order to make
-	 * room for the earlier objects *unless* we need to defragment.
+	 * We have to wait until the stale nodes are completely idle before
+	 * we can remove their PTE and unbind their pages. Hence, after
+	 * claiming their slot in the drm_mm, we defer their removal to
+	 * after the fences are signaled.
 	 */
+	if (!list_empty(&work->evict_list)) {
+		struct i915_vma *vma, *vn;
+
+		mutex_lock(&vm->mutex);
+		list_for_each_entry_safe(vma, vn, &work->evict_list, vm_link) {
+			GEM_BUG_ON(vma->vm != vm);
+			__i915_vma_evict(vma);
+			GEM_BUG_ON(!i915_vma_is_active(vma));
+		}
+		mutex_unlock(&vm->mutex);
+	}
+
+	/*
+	 * Now we know the nodes we require in drm_mm are idle, we can
+	 * replace the PTE in those ranges with our own.
+	 */
+	list_for_each_entry(ev, &work->unbound, bind_link) {
+		struct i915_vma *vma = ev->vma;
+
+		if (!ev->bind_flags)
+			goto put;
+
+		GEM_BUG_ON(vma->vm != vm);
+		GEM_BUG_ON(!i915_vma_is_active(vma));
+
+		if (err == 0)
+			err = vma->ops->bind_vma(vma,
+						 vma->obj->cache_level,
+						 ev->bind_flags |
+						 I915_VMA_ALLOC);
+		if (err)
+			atomic_and(~ev->bind_flags, &vma->flags);
+
+		if (drm_mm_node_allocated(&ev->hole)) {
+			mutex_lock(&vm->mutex);
+			GEM_BUG_ON(ev->hole.mm != &vm->mm);
+			GEM_BUG_ON(ev->hole.color != I915_COLOR_UNEVICTABLE);
+			GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+			drm_mm_remove_node(&ev->hole);
+			if (!err) {
+				drm_mm_reserve_node(&vm->mm, &vma->node);
+				GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
+			} else {
+				list_del_init(&vma->vm_link);
+			}
+			mutex_unlock(&vm->mutex);
+		}
+		ev->bind_flags = 0;
+
+put:
+		GEM_BUG_ON(drm_mm_node_allocated(&ev->hole));
+	}
+	INIT_LIST_HEAD(&work->unbound);
+
+	return err;
+}
+
+static int eb_bind_vma(struct dma_fence_work *base)
+{
+	struct eb_vm_work *work = container_of(base, typeof(*work), base);
+
+	return __eb_bind_vma(work, 0);
+}
+
+static void eb_vma_work_release(struct dma_fence_work *base)
+{
+	struct eb_vm_work *work = container_of(base, typeof(*work), base);
+
+	if (work->active) {
+		if (!list_empty(&work->unbound)) {
+			GEM_BUG_ON(!work->base.dma.error);
+			__eb_bind_vma(work, work->base.dma.error);
+		}
+		i915_active_release(&work->vm->active);
+	}
+
+	eb_vma_array_put(work->array);
+}
+
+static const struct dma_fence_work_ops eb_bind_ops = {
+	.name = "eb_bind",
+	.work = eb_bind_vma,
+	.release = eb_vma_work_release,
+};
+
+static struct eb_vm_work *eb_vm_work(struct i915_execbuffer *eb)
+{
+	struct eb_vm_work *work;
+
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return NULL;
+
+	dma_fence_work_init(&work->base, &eb_bind_ops);
+	list_replace_init(&eb->unbound, &work->unbound);
+	work->array = eb_vma_array_get(eb->array);
+	work->p_flags = &eb->args->flags;
+	work->vm = eb->context->vm;
+
+	/* Preallocate our slot in vm->active, outside of vm->mutex */
+	work->active = i915_gem_context_async_id(eb->gem_context);
+	if (i915_active_acquire_for_context(&work->vm->active, work->active)) {
+		work->active = 0;
+		work->base.dma.error = -ENOMEM;
+		dma_fence_work_commit(&work->base);
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&work->evict_list);
+
+	GEM_BUG_ON(list_empty(&work->unbound));
+	GEM_BUG_ON(!list_empty(&eb->unbound));
+
+	return work;
+}
+
+static int eb_vm_throttle(struct eb_vm_work *work)
+{
+	struct dma_fence *p;
+	int err;
+
+	/* Keep async work queued per context */
+	p = __i915_active_ref(&work->vm->active, work->active, &work->base.dma);
+	if (IS_ERR_OR_NULL(p))
+		return PTR_ERR_OR_ZERO(p);
+
+	err = i915_sw_fence_await_dma_fence(&work->base.chain, p, 0,
+					    GFP_NOWAIT | __GFP_NOWARN);
+	dma_fence_put(p);
+
+	return err < 0 ? err : 0;
+}
+
+static int eb_prepare_vma(struct eb_vma *ev)
+{
+	struct i915_vma *vma = ev->vma;
+	int err;
+
+	ev->hole.flags = 0;
+	ev->bind_flags = 0;
 
-	if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex))
-		return -EINTR;
+	if (!(ev->flags &  __EXEC_OBJECT_HAS_PAGES)) {
+		err = i915_vma_get_pages(vma);
+		if (err)
+			return err;
+
+		ev->flags |=  __EXEC_OBJECT_HAS_PAGES;
+	}
+
+	return 0;
+}
+
+static int eb_reserve(struct i915_execbuffer *eb)
+{
+	const unsigned int count = eb->buffer_count;
+	struct i915_address_space *vm = eb->context->vm;
+	struct list_head last;
+	unsigned int i, pass;
+	struct eb_vma *ev;
+	int err = 0;
 
 	pass = 0;
 	do {
+		struct eb_vm_work *work;
+
 		list_for_each_entry(ev, &eb->unbound, bind_link) {
-			err = eb_reserve_vma(eb, ev, pin_flags);
+			err = eb_prepare_vma(ev);
+			switch (err) {
+			case 0:
+				break;
+			case -EAGAIN:
+				goto retry;
+			default:
+				return err;
+			}
+		}
+
+		work = eb_vm_work(eb);
+		if (!work)
+			return -ENOMEM;
+
+		/* No allocations allowed beyond this point */
+		if (mutex_lock_interruptible(&vm->mutex)) {
+			work->base.dma.error = -EINTR;
+			dma_fence_work_commit(&work->base);
+			return -EINTR;
+		}
+
+		err = eb_vm_throttle(work);
+		if (err) {
+			mutex_unlock(&vm->mutex);
+			work->base.dma.error = err;
+			dma_fence_work_commit(&work->base);
+			return err;
+		}
+
+		list_for_each_entry(ev, &work->unbound, bind_link) {
+			struct i915_vma *vma = ev->vma;
+
+			/*
+			 * Check if this node is being evicted or must be.
+			 *
+			 * As we use the single node inside the vma to track
+			 * both the eviction and where to insert the new node,
+			 * we cannot handle migrating the vma inside the worker.
+			 */
+			if (drm_mm_node_allocated(&vma->node)) {
+				if (eb_vma_misplaced(ev->exec, vma, ev->flags)) {
+					err = -ENOSPC;
+					break;
+				}
+			} else {
+				if (i915_vma_is_active(vma)) {
+					err = -ENOSPC;
+					break;
+				}
+			}
+
+			err = i915_active_acquire(&vma->active);
+			if (!err) {
+				err = eb_reserve_vma(work, ev);
+				i915_active_release(&vma->active);
+			}
 			if (err)
 				break;
 		}
-		if (!(err == -ENOSPC || err == -EAGAIN))
-			break;
 
+		mutex_unlock(&vm->mutex);
+
+		dma_fence_get(&work->base.dma);
+		dma_fence_work_commit_imm(&work->base);
+		if (err == -ENOSPC && dma_fence_wait(&work->base.dma, true))
+			err = -EINTR;
+		dma_fence_put(&work->base.dma);
+		if (err != -ENOSPC)
+			return err;
+
+retry:
 		/* Resort *all* the objects into priority order */
 		INIT_LIST_HEAD(&eb->unbound);
 		INIT_LIST_HEAD(&last);
@@ -716,37 +1364,52 @@ static int eb_reserve(struct i915_execbuffer *eb)
 		}
 		list_splice_tail(&last, &eb->unbound);
 
+		if (signal_pending(current))
+			return -EINTR;
+
 		if (err == -EAGAIN) {
-			mutex_unlock(&eb->i915->drm.struct_mutex);
 			flush_workqueue(eb->i915->mm.userptr_wq);
-			mutex_lock(&eb->i915->drm.struct_mutex);
 			continue;
 		}
 
-		switch (pass++) {
-		case 0:
-			break;
+		/* Now safe to wait with no reservations held */
+		list_for_each_entry(ev, &eb->unbound, bind_link) {
+			struct i915_vma *vma = ev->vma;
 
-		case 1:
-			/* Too fragmented, unbind everything and retry */
-			mutex_lock(&eb->context->vm->mutex);
-			err = i915_gem_evict_vm(eb->context->vm);
-			mutex_unlock(&eb->context->vm->mutex);
+			GEM_BUG_ON(ev->flags & __EXEC_OBJECT_HAS_PIN);
+
+			if (drm_mm_node_allocated(&vma->node) &&
+			    eb_vma_misplaced(ev->exec, vma, ev->flags)) {
+				err = i915_vma_unbind(vma);
+				if (err)
+					return err;
+			}
+
+			/* Wait for previous to avoid reusing vma->node */
+			err = i915_vma_wait_for_unbind(vma);
 			if (err)
-				goto unlock;
-			break;
+				return err;
+		}
 
+		switch (pass++) {
 		default:
-			err = -ENOSPC;
-			goto unlock;
-		}
+			return -ENOSPC;
 
-		pin_flags = PIN_USER;
-	} while (1);
+		case 2:
+			if (intel_gt_wait_for_idle(vm->gt,
+						   MAX_SCHEDULE_TIMEOUT))
+				return -EINTR;
 
-unlock:
-	mutex_unlock(&eb->i915->drm.struct_mutex);
-	return err;
+			fallthrough;
+		case 1:
+			if (i915_active_wait(&vm->active))
+				return -EINTR;
+
+			fallthrough;
+		case 0:
+			break;
+		}
+	} while (1);
 }
 
 static unsigned int eb_batch_index(const struct i915_execbuffer *eb)
@@ -1393,6 +2056,8 @@ eb_reloc_valid(struct i915_execbuffer *eb,
 	if (unlikely(!target))
 		return -ENOENT;
 
+	GEM_BUG_ON(!i915_vma_is_pinned(target->vma));
+
 	/* Validate that the target is in a valid r/w GPU domain */
 	if (unlikely(reloc->write_domain & (reloc->write_domain - 1))) {
 		drm_dbg(&i915->drm, "reloc with multiple write domains: "
@@ -1787,7 +2452,6 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 			err = i915_vma_move_to_active(vma, eb->request, flags);
 
 		i915_vma_unlock(vma);
-		eb_unreserve_vma(ev);
 	}
 	ww_acquire_fini(&acquire);
 
@@ -2719,7 +3383,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure
 	 * batch" bit. Hence we need to pin secure batches into the global gtt.
 	 * hsw should have this fixed, but bdw mucks it up again. */
-	batch = eb.batch->vma;
+	batch = i915_vma_get(eb.batch->vma);
 	if (eb.batch_flags & I915_DISPATCH_SECURE) {
 		struct i915_vma *vma;
 
@@ -2739,6 +3403,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_parse;
 		}
 
+		GEM_BUG_ON(vma->obj != batch->obj);
 		batch = vma;
 	}
 
@@ -2814,6 +3479,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_parse:
 	if (batch->private)
 		intel_gt_buffer_pool_put(batch->private);
+	i915_vma_put(batch);
 err_vma:
 	if (eb.reloc_cache.fence)
 		eb_reloc_signal(&eb, eb.reloc_cache.rq);
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index f4fec7eb4064..2c5ac598ade2 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -370,6 +370,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size)
 	atomic_set(&vma->flags, I915_VMA_GGTT);
 	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
 
+	INIT_LIST_HEAD(&vma->vm_link);
 	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->closed_link);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 323c328d444a..eaacf369d304 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -582,7 +582,8 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 	if (flags & I915_VMA_LOCAL_BIND) {
 		struct i915_ppgtt *alias = i915_vm_to_ggtt(vma->vm)->alias;
 
-		if (flags & I915_VMA_ALLOC) {
+		if (flags & I915_VMA_ALLOC &&
+		    !test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) {
 			ret = alias->vm.allocate_va_range(&alias->vm,
 							  vma->node.start,
 							  vma->size);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 2a72cce63fd9..82d4f943c346 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -194,6 +194,8 @@ void __i915_vm_close(struct i915_address_space *vm)
 
 void i915_address_space_fini(struct i915_address_space *vm)
 {
+	i915_active_fini(&vm->active);
+
 	spin_lock(&vm->free_pages.lock);
 	if (pagevec_count(&vm->free_pages.pvec))
 		vm_free_pages_release(vm, true);
@@ -246,6 +248,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
 	drm_mm_init(&vm->mm, 0, vm->total);
 	vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
 
+	i915_active_init(&vm->active, NULL, NULL);
+
 	stash_init(&vm->free_pages);
 
 	INIT_LIST_HEAD(&vm->bound_list);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index d93ebdf3fa0e..773fc76dfa1b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -263,6 +263,8 @@ struct i915_address_space {
 	 */
 	struct list_head bound_list;
 
+	struct i915_active active;
+
 	struct pagestash free_pages;
 
 	/* Global GTT */
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index f86f7e68ce5e..ecdd58f4b993 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -162,7 +162,8 @@ static int ppgtt_bind_vma(struct i915_vma *vma,
 	u32 pte_flags;
 	int err;
 
-	if (flags & I915_VMA_ALLOC) {
+	if (flags & I915_VMA_ALLOC &&
+	    !test_bit(I915_VMA_ALLOC_BIT, __i915_vma_flags(vma))) {
 		err = vma->vm->allocate_va_range(vma->vm,
 						 vma->node.start, vma->size);
 		if (err)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0cbcb9f54e7d..6effa85532c6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -984,6 +984,9 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		return vma;
 
 	if (i915_vma_misplaced(vma, size, alignment, flags)) {
+		if (flags & PIN_NOEVICT)
+			return ERR_PTR(-ENOSPC);
+
 		if (flags & PIN_NONBLOCK) {
 			if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
 				return ERR_PTR(-ENOSPC);
@@ -998,6 +1001,10 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			return ERR_PTR(ret);
 	}
 
+	if (flags & PIN_NONBLOCK &&
+	    i915_active_fence_isset(&vma->active.excl))
+		return ERR_PTR(-EAGAIN);
+
 	ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
 	if (ret)
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cb43381b0d37..7e1225874b03 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -219,6 +219,8 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 		mode = DRM_MM_INSERT_HIGHEST;
 	if (flags & PIN_MAPPABLE)
 		mode = DRM_MM_INSERT_LOW;
+	if (flags & PIN_NOSEARCH)
+		mode |= DRM_MM_INSERT_ONCE;
 
 	/* We only allocate in PAGE_SIZE/GTT_PAGE_SIZE (4096) chunks,
 	 * so we know that we always have a minimum alignment of 4096.
@@ -236,6 +238,9 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 	if (err != -ENOSPC)
 		return err;
 
+	if (flags & PIN_NOSEARCH)
+		return -ENOSPC;
+
 	if (mode & DRM_MM_INSERT_ONCE) {
 		err = drm_mm_insert_node_in_range(&vm->mm, node,
 						  size, alignment, color,
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 9b30ddc49e4b..c071669e352a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -132,6 +132,7 @@ vma_create(struct drm_i915_gem_object *obj,
 		fs_reclaim_release(GFP_KERNEL);
 	}
 
+	INIT_LIST_HEAD(&vma->vm_link);
 	INIT_LIST_HEAD(&vma->closed_link);
 
 	if (view && view->type != I915_GGTT_VIEW_NORMAL) {
@@ -342,25 +343,37 @@ struct i915_vma_work *i915_vma_work(void)
 	return vw;
 }
 
-int i915_vma_wait_for_bind(struct i915_vma *vma)
+static int
+__i915_vma_wait_excl(struct i915_vma *vma, bool bound, unsigned int flags)
 {
+	struct dma_fence *fence;
 	int err = 0;
 
-	if (rcu_access_pointer(vma->active.excl.fence)) {
-		struct dma_fence *fence;
+	fence = i915_active_fence_get(&vma->active.excl);
+	if (!fence)
+		return 0;
 
-		rcu_read_lock();
-		fence = dma_fence_get_rcu_safe(&vma->active.excl.fence);
-		rcu_read_unlock();
-		if (fence) {
-			err = dma_fence_wait(fence, MAX_SCHEDULE_TIMEOUT);
-			dma_fence_put(fence);
-		}
+	if (drm_mm_node_allocated(&vma->node) == bound) {
+		if (flags & PIN_NOEVICT)
+			err = -EBUSY;
+		else
+			err = dma_fence_wait(fence, true);
 	}
 
+	dma_fence_put(fence);
 	return err;
 }
 
+int i915_vma_wait_for_bind(struct i915_vma *vma)
+{
+	return __i915_vma_wait_excl(vma, true, 0);
+}
+
+int i915_vma_wait_for_unbind(struct i915_vma *vma)
+{
+	return __i915_vma_wait_excl(vma, false, 0);
+}
+
 /**
  * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space.
  * @vma: VMA to map
@@ -628,8 +641,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	u64 start, end;
 	int ret;
 
-	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND));
+	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_BIND_MASK));
 	GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
+	GEM_BUG_ON(i915_active_fence_isset(&vma->active.excl));
 
 	size = max(size, vma->size);
 	alignment = max(alignment, vma->display_alignment);
@@ -725,7 +739,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, color));
 
-	list_add_tail(&vma->vm_link, &vma->vm->bound_list);
+	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 
 	return 0;
 }
@@ -733,15 +747,12 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 static void
 i915_vma_detach(struct i915_vma *vma)
 {
-	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
-	GEM_BUG_ON(i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND));
-
 	/*
 	 * And finally now the object is completely decoupled from this
 	 * vma, we can drop its hold on the backing storage and allow
 	 * it to be reaped by the shrinker.
 	 */
-	list_del(&vma->vm_link);
+	list_del_init(&vma->vm_link);
 }
 
 static bool try_qad_pin(struct i915_vma *vma, unsigned int flags)
@@ -787,7 +798,7 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned int flags)
 	return pinned;
 }
 
-static int vma_get_pages(struct i915_vma *vma)
+int i915_vma_get_pages(struct i915_vma *vma)
 {
 	int err = 0;
 
@@ -834,7 +845,7 @@ static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 	mutex_unlock(&vma->pages_mutex);
 }
 
-static void vma_put_pages(struct i915_vma *vma)
+void i915_vma_put_pages(struct i915_vma *vma)
 {
 	if (atomic_add_unless(&vma->pages_count, -1, 1))
 		return;
@@ -851,9 +862,13 @@ static void vma_unbind_pages(struct i915_vma *vma)
 	/* The upper portion of pages_count is the number of bindings */
 	count = atomic_read(&vma->pages_count);
 	count >>= I915_VMA_PAGES_BIAS;
-	GEM_BUG_ON(!count);
+	if (count)
+		__vma_put_pages(vma, count | count << I915_VMA_PAGES_BIAS);
+}
 
-	__vma_put_pages(vma, count | count << I915_VMA_PAGES_BIAS);
+static int __wait_for_unbind(struct i915_vma *vma, unsigned int flags)
+{
+	return __i915_vma_wait_excl(vma, false, flags);
 }
 
 int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
@@ -872,10 +887,14 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	if (try_qad_pin(vma, flags & I915_VMA_BIND_MASK))
 		return 0;
 
-	err = vma_get_pages(vma);
+	err = i915_vma_get_pages(vma);
 	if (err)
 		return err;
 
+	err = __wait_for_unbind(vma, flags);
+	if (err)
+		goto err_pages;
+
 	if (flags & vma->vm->bind_async_flags) {
 		work = i915_vma_work();
 		if (!work) {
@@ -937,6 +956,10 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		goto err_unlock;
 
 	if (!(bound & I915_VMA_BIND_MASK)) {
+		err = __wait_for_unbind(vma, flags);
+		if (err)
+			goto err_active;
+
 		err = i915_vma_insert(vma, size, alignment, flags);
 		if (err)
 			goto err_active;
@@ -956,6 +979,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(bound + I915_VMA_PAGES_ACTIVE < bound);
 	atomic_add(I915_VMA_PAGES_ACTIVE, &vma->pages_count);
 	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	GEM_BUG_ON(!i915_vma_is_active(vma));
 
 	__i915_vma_pin(vma);
 	GEM_BUG_ON(!i915_vma_is_pinned(vma));
@@ -977,7 +1001,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 err_pages:
-	vma_put_pages(vma);
+	i915_vma_put_pages(vma);
 	return err;
 }
 
@@ -1081,6 +1105,7 @@ void i915_vma_release(struct kref *ref)
 		GEM_BUG_ON(drm_mm_node_allocated(&vma->node));
 	}
 	GEM_BUG_ON(i915_vma_is_active(vma));
+	GEM_BUG_ON(!list_empty(&vma->vm_link));
 
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -1139,7 +1164,7 @@ static void __i915_vma_iounmap(struct i915_vma *vma)
 {
 	GEM_BUG_ON(i915_vma_is_pinned(vma));
 
-	if (vma->iomap == NULL)
+	if (!vma->iomap)
 		return;
 
 	io_mapping_unmap(vma->iomap);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index d0d01f909548..478e8679f331 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -240,6 +240,9 @@ int __must_check
 i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags);
 int i915_ggtt_pin(struct i915_vma *vma, u32 align, unsigned int flags);
 
+int i915_vma_get_pages(struct i915_vma *vma);
+void i915_vma_put_pages(struct i915_vma *vma);
+
 static inline int i915_vma_pin_count(const struct i915_vma *vma)
 {
 	return atomic_read(&vma->flags) & I915_VMA_PIN_MASK;
@@ -377,6 +380,7 @@ void i915_vma_make_shrinkable(struct i915_vma *vma);
 void i915_vma_make_purgeable(struct i915_vma *vma);
 
 int i915_vma_wait_for_bind(struct i915_vma *vma);
+int i915_vma_wait_for_unbind(struct i915_vma *vma);
 
 static inline int i915_vma_sync(struct i915_vma *vma)
 {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH 36/36] drm/i915/gem: Bind the fence async for execbuf
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (33 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 35/36] drm/i915/gem: Asynchronous GTT unbinding Chris Wilson
@ 2020-06-01  7:24 ` Chris Wilson
  2020-06-01  7:34 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure Patchwork
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  7:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

It is illegal to wait on another vma while holding the vm->mutex, as
that easily leads to ABBA deadlocks (we wait on a second vma that waits
on us to release the vm->mutex). So while the vm->mutex exists, move the
waiting outside of the lock into the async binding pipeline.
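
For illustration, the ABBA scenario being avoided looks like this
(a hypothetical interleaving, not an observed trace):

  execbuf A (holds the lock)           execbuf B (wants the lock)
  --------------------------           --------------------------
  mutex_lock(&vm->mutex);
  dma_fence_wait(B's bind fence);      mutex_lock(&vm->mutex);  /* blocks */
    ... never signals, as the work       ... so the work that would signal
    binding B also needs vm->mutex       B's fence never gets to run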

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  41 ++++--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c  | 137 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h  |   5 +
 3 files changed, 166 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 49bfae968215..aa8d86d4466d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -511,13 +511,23 @@ eb_pin_vma(struct i915_execbuffer *eb,
 	}
 
 	if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		if (unlikely(i915_vma_pin_fence(vma))) {
-			i915_vma_unpin(vma);
-			return false;
-		}
+		struct i915_fence_reg *reg = vma->fence;
 
-		if (vma->fence)
+		/* Avoid waiting to change the fence; defer to async worker */
+		if (reg) {
+			if (READ_ONCE(reg->dirty)) {
+				__i915_vma_unpin(vma);
+				return false;
+			}
+
+			atomic_inc(&reg->pin_count);
 			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+		} else {
+			if (i915_gem_object_is_tiled(vma->obj)) {
+				__i915_vma_unpin(vma);
+				return false;
+			}
+		}
 	}
 
 	ev->flags |= __EXEC_OBJECT_HAS_PIN;
@@ -1043,15 +1053,6 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_vma *ev)
 		return err;
 
 pin:
-	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
-		err = __i915_vma_pin_fence(vma); /* XXX no waiting */
-		if (unlikely(err))
-			return err;
-
-		if (vma->fence)
-			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
-	}
-
 	bind_flags &= ~atomic_read(&vma->flags);
 	if (bind_flags) {
 		err = set_bind_fence(vma, work);
@@ -1082,6 +1083,15 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_vma *ev)
 	ev->flags |= __EXEC_OBJECT_HAS_PIN;
 	GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags));
 
+	if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) {
+		err = __i915_vma_pin_fence_async(vma, &work->base);
+		if (unlikely(err))
+			return err;
+
+		if (vma->fence)
+			ev->flags |= __EXEC_OBJECT_HAS_FENCE;
+	}
+
 	return 0;
 }
 
@@ -1117,6 +1127,9 @@ static int __eb_bind_vma(struct eb_vm_work *work, int err)
 	list_for_each_entry(ev, &work->unbound, bind_link) {
 		struct i915_vma *vma = ev->vma;
 
+		if (ev->flags & __EXEC_OBJECT_HAS_FENCE)
+			__i915_vma_apply_fence_async(vma);
+
 		if (!ev->bind_flags)
 			goto put;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 7fb36b12fe7a..734b6aa61809 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -21,10 +21,13 @@
  * IN THE SOFTWARE.
  */
 
+#include "i915_active.h"
 #include "i915_drv.h"
 #include "i915_scatterlist.h"
+#include "i915_sw_fence_work.h"
 #include "i915_pvinfo.h"
 #include "i915_vgpu.h"
+#include "i915_vma.h"
 
 /**
  * DOC: fence register handling
@@ -340,19 +343,37 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
 	return ERR_PTR(-EDEADLK);
 }
 
+static int fence_wait_bind(struct i915_fence_reg *reg)
+{
+	struct dma_fence *fence;
+	int err = 0;
+
+	fence = i915_active_fence_get(&reg->active.excl);
+	if (fence) {
+		err = dma_fence_wait(fence, true);
+		dma_fence_put(fence);
+	}
+
+	return err;
+}
+
 int __i915_vma_pin_fence(struct i915_vma *vma)
 {
 	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
-	struct i915_fence_reg *fence;
+	struct i915_fence_reg *fence = vma->fence;
 	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
 	int err;
 
 	lockdep_assert_held(&vma->vm->mutex);
 
 	/* Just update our place in the LRU if our fence is getting reused. */
-	if (vma->fence) {
-		fence = vma->fence;
+	if (fence) {
 		GEM_BUG_ON(fence->vma != vma);
+
+		err = fence_wait_bind(fence);
+		if (err)
+			return err;
+
 		atomic_inc(&fence->pin_count);
 		if (!fence->dirty) {
 			list_move_tail(&fence->link, &ggtt->fence_list);
@@ -384,6 +405,116 @@ int __i915_vma_pin_fence(struct i915_vma *vma)
 	return err;
 }
 
+static int set_bind_fence(struct i915_fence_reg *fence,
+			  struct dma_fence_work *work)
+{
+	struct dma_fence *prev;
+	int err;
+
+	if (rcu_access_pointer(fence->active.excl.fence) == &work->dma)
+		return 0;
+
+	err = i915_sw_fence_await_active(&work->chain,
+					 &fence->active,
+					 I915_ACTIVE_AWAIT_ACTIVE);
+	if (err)
+		return err;
+
+	if (i915_active_acquire(&fence->active))
+		return -ENOENT;
+
+	prev = i915_active_set_exclusive(&fence->active, &work->dma);
+	if (unlikely(prev)) {
+		err = i915_sw_fence_await_dma_fence(&work->chain, prev, 0,
+						    GFP_NOWAIT | __GFP_NOWARN);
+		dma_fence_put(prev);
+	}
+
+	i915_active_release(&fence->active);
+	return err < 0 ? err : 0;
+}
+
+int __i915_vma_pin_fence_async(struct i915_vma *vma,
+			       struct dma_fence_work *work)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL;
+	struct i915_fence_reg *fence = vma->fence;
+	int err;
+
+	lockdep_assert_held(&vma->vm->mutex);
+
+	/* Just update our place in the LRU if our fence is getting reused. */
+	if (fence) {
+		GEM_BUG_ON(fence->vma != vma);
+		GEM_BUG_ON(!i915_vma_is_map_and_fenceable(vma));
+	} else if (set) {
+		if (!i915_vma_is_map_and_fenceable(vma))
+			return -EINVAL;
+
+		fence = fence_find(ggtt);
+		if (IS_ERR(fence))
+			return -ENOSPC;
+
+		GEM_BUG_ON(atomic_read(&fence->pin_count));
+		fence->dirty = true;
+	} else {
+		return 0;
+	}
+
+	atomic_inc(&fence->pin_count);
+	list_move_tail(&fence->link, &ggtt->fence_list);
+	if (!fence->dirty)
+		return 0;
+
+	if (INTEL_GEN(fence_to_i915(fence)) < 4 &&
+	    rcu_access_pointer(vma->active.excl.fence) != &work->dma) {
+		/* implicit 'unfenced' GPU blits */
+		err = i915_sw_fence_await_active(&work->chain,
+						 &vma->active,
+						 I915_ACTIVE_AWAIT_ACTIVE);
+		if (err)
+			goto err_unpin;
+	}
+
+	err = set_bind_fence(fence, work);
+	if (err)
+		goto err_unpin;
+
+	if (set) {
+		fence->start = vma->node.start;
+		fence->size  = vma->fence_size;
+		fence->stride = i915_gem_object_get_stride(vma->obj);
+		fence->tiling = i915_gem_object_get_tiling(vma->obj);
+
+		vma->fence = fence;
+	} else {
+		fence->tiling = 0;
+		vma->fence = NULL;
+	}
+
+	set = xchg(&fence->vma, set);
+	if (set && set != vma) {
+		GEM_BUG_ON(set->fence != fence);
+		WRITE_ONCE(set->fence, NULL);
+		i915_vma_revoke_mmap(set);
+	}
+
+	return 0;
+
+err_unpin:
+	atomic_dec(&fence->pin_count);
+	return err;
+}
+
+void __i915_vma_apply_fence_async(struct i915_vma *vma)
+{
+	struct i915_fence_reg *fence = vma->fence;
+
+	if (fence->dirty)
+		fence_write(fence);
+}
+
 /**
  * i915_vma_pin_fence - set up fencing for a vma
  * @vma: vma to map through a fence reg
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
index 9eef679e1311..d306ac14d47e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h
@@ -30,6 +30,7 @@
 
 #include "i915_active.h"
 
+struct dma_fence_work;
 struct drm_i915_gem_object;
 struct i915_ggtt;
 struct i915_vma;
@@ -70,6 +71,10 @@ void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj,
 void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj,
 					 struct sg_table *pages);
 
+int __i915_vma_pin_fence_async(struct i915_vma *vma,
+			       struct dma_fence_work *work);
+void __i915_vma_apply_fence_async(struct i915_vma *vma);
+
 void intel_ggtt_init_fences(struct i915_ggtt *ggtt);
 void intel_ggtt_fini_fences(struct i915_ggtt *ggtt);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (34 preceding siblings ...)
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 36/36] drm/i915/gem: Bind the fence async for execbuf Chris Wilson
@ 2020-06-01  7:34 ` Patchwork
  2020-06-01  7:36 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2020-06-01  7:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure
URL   : https://patchwork.freedesktop.org/series/77857/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
96d738818236 drm/i915: Handle very early engine initialisation failure
284e5e82f9e8 drm/i915/gt: Split low level gen2-7 CS emitters
-:9: WARNING:TYPO_SPELLING: 'wnat' may be misspelled - perhaps 'want'?
#9: 
with requests, and we will wnat to reuse them outside of this context.

-:27: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#27: 
new file mode 100644

-:179: WARNING:LONG_LINE: line over 100 characters
#179: FILE: drivers/gpu/drm/i915/gt/gen2_engine_cs.c:148:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:202: WARNING:LONG_LINE: line over 100 characters
#202: FILE: drivers/gpu/drm/i915/gt/gen2_engine_cs.c:171:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:220: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#220: FILE: drivers/gpu/drm/i915/gt/gen2_engine_cs.c:189:
+}
+#undef GEN5_WA_STORES

-:798: WARNING:LONG_LINE: line over 100 characters
#798: FILE: drivers/gpu/drm/i915/gt/gen6_engine_cs.c:377:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:818: WARNING:LONG_LINE: line over 100 characters
#818: FILE: drivers/gpu/drm/i915/gt/gen6_engine_cs.c:397:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:843: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#843: FILE: drivers/gpu/drm/i915/gt/gen6_engine_cs.c:422:
+}
+#undef GEN7_XCS_WA

total: 0 errors, 6 warnings, 2 checks, 1812 lines checked
8f9e2ddc2328 drm/i915/gt: Move legacy context wa to intel_workarounds
5b9d253c3510 drm/i915: Trim the ironlake+ irq handler
cf0807656a76 Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"
ce9a7aebf8d2 drm/i915/gt: Couple tasklet scheduling for all CS interrupts
eacfbfb97087 drm/i915/gt: Support creation of 'internal' rings
58305c0b593a drm/i915/gt: Use client timeline address for seqno writes
703dea6710a7 drm/i915: Support inter-engine semaphores on gen6/7
15787ec4da9d drm/i915/gt: Infrastructure for ring scheduling
-:79: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#79: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 842 lines checked
0760ab97f627 drm/i915/gt: Enable busy-stats for ring-scheduler
-:13: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#13: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 232 lines checked
118524c684c6 drm/i915/gt: Track if an engine requires forcewake w/a
99a81a3f55b0 drm/i915: Relinquish forcewake immediately after manual grouping
8c829f7c5554 drm/i915/gt: Implement ring scheduler for gen6/7
-:68: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#68: FILE: drivers/gpu/drm/i915/gt/intel_ring_scheduler.c:324:
+				*cs++ = i915_mmio_reg_offset(

-:70: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#70: FILE: drivers/gpu/drm/i915/gt/intel_ring_scheduler.c:326:
+				*cs++ = _MASKED_BIT_ENABLE(

-:105: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#105: FILE: drivers/gpu/drm/i915/gt/intel_ring_scheduler.c:361:
+				*cs++ = _MASKED_BIT_DISABLE(

total: 0 errors, 0 warnings, 3 checks, 512 lines checked
3148cd01132a drm/i915/gt: Enable ring scheduling for gen6/7
143d1995d466 drm/i915/gem: Mark the buffer pool as active for the cmdparser
033b56b47071 drm/i915/gem: Async GPU relocations only
9b9d4aa5ab50 drm/i915: Add list_for_each_entry_safe_continue_reverse
-:20: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects?
#20: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:20: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects?
#20: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:20: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#20: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

total: 0 errors, 0 warnings, 3 checks, 12 lines checked
b2abfeed9589 drm/i915/gem: Separate reloc validation into an earlier step
-:101: WARNING:UNNECESSARY_ELSE: else is not generally useful after a break or return
#101: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1408:
+				return (int)offset;
+			} else {

total: 0 errors, 1 warnings, 0 checks, 217 lines checked
c26758f43cfe drm/i915/gem: Lift GPU relocation allocation
6f01210067aa drm/i915/gem: Build the reloc request first
710af09a4672 drm/i915/gem: Add all GPU reloc awaits/signals en masse
6b9cc101590e dma-buf: Proxy fence, an unsignaled fence placeholder
-:45: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#45: 
new file mode 100644

-:438: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#438: FILE: drivers/dma-buf/st-dma-fence-proxy.c:20:
+	spinlock_t lock;

total: 0 errors, 1 warnings, 1 checks, 1158 lines checked
06d2116fd5f4 drm/i915: Unpeel awaits on a proxy fence
f3cd5ea0ff2c drm/i915/gem: Make relocations atomic within execbuf
dda5123316b5 drm/syncobj: Allow use of dma-fence-proxy
d03f86e3977c drm/i915/gem: Teach execbuf how to wait on future syncobj
9b560a7e6079 drm/i915/gem: Allow combining submit-fences with syncobj
9675274fc27c drm/i915/gt: Declare when we enabled timeslicing
7cf3aad63437 drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
308d42d80cb3 drm/i915: Always defer fenced work to the worker
b253079dbe95 drm/i915/gem: Assign context id for async work
68f85c40ff48 drm/i915: Export a preallocate variant of i915_active_acquire()
d90d92d26c5c drm/i915/gem: Separate the ww_mutex walker into its own list
9c2cdb286a07 drm/i915/gem: Asynchronous GTT unbinding
ac90fd42eb1d drm/i915/gem: Bind the fence async for execbuf

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (35 preceding siblings ...)
  2020-06-01  7:34 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure Patchwork
@ 2020-06-01  7:36 ` Patchwork
  2020-06-01  7:56 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2020-06-01  7:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure
URL   : https://patchwork.freedesktop.org/series/77857/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
+drivers/gpu/drm/i915/intel_wakeref.c:137:19: warning: context imbalance in 'wakeref_auto_timeout' - unexpected unlock
+drivers/gpu/drm/i915/selftests/i915_syncmap.c:80:54: warning: dubious: x | !y

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [01/36] drm/i915: Handle very early engine initialisation failure
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (36 preceding siblings ...)
  2020-06-01  7:36 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-06-01  7:56 ` Patchwork
  2020-06-01  8:30 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2) Patchwork
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2020-06-01  7:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure
URL   : https://patchwork.freedesktop.org/series/77857/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8560 -> Patchwork_17826
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17826 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17826, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17826/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17826:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gt_engines:
    - fi-cml-s:           [PASS][1] -> [FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/fi-cml-s/igt@i915_selftest@live@gt_engines.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17826/fi-cml-s/igt@i915_selftest@live@gt_engines.html

  
New tests
---------

  New tests have been introduced between CI_DRM_8560 and Patchwork_17826:

### New IGT tests (1) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 42 pass(s)
    - Exec time: [0.02, 0.11] s

  



Participating hosts (50 -> 44)
------------------------------

  Additional (1): fi-ehl-1 
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-kbl-7560u fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8560 -> Patchwork_17826

  CI-20190529: 20190529
  CI_DRM_8560: 02fe287fdb4a3d6bceb1bb61b3c8538b4b941b3c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5687: 668a5be752186b6e08f361bac34da37309d08393 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17826: ac90fd42eb1dc9c16ba67d9a551bb2b3c238cd42 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ac90fd42eb1d drm/i915/gem: Bind the fence async for execbuf
9c2cdb286a07 drm/i915/gem: Asynchronous GTT unbinding
d90d92d26c5c drm/i915/gem: Separate the ww_mutex walker into its own list
68f85c40ff48 drm/i915: Export a preallocate variant of i915_active_acquire()
b253079dbe95 drm/i915/gem: Assign context id for async work
308d42d80cb3 drm/i915: Always defer fenced work to the worker
7cf3aad63437 drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
9675274fc27c drm/i915/gt: Declare when we enabled timeslicing
9b560a7e6079 drm/i915/gem: Allow combining submit-fences with syncobj
d03f86e3977c drm/i915/gem: Teach execbuf how to wait on future syncobj
dda5123316b5 drm/syncobj: Allow use of dma-fence-proxy
f3cd5ea0ff2c drm/i915/gem: Make relocations atomic within execbuf
06d2116fd5f4 drm/i915: Unpeel awaits on a proxy fence
6b9cc101590e dma-buf: Proxy fence, an unsignaled fence placeholder
710af09a4672 drm/i915/gem: Add all GPU reloc awaits/signals en masse
6f01210067aa drm/i915/gem: Build the reloc request first
c26758f43cfe drm/i915/gem: Lift GPU relocation allocation
b2abfeed9589 drm/i915/gem: Separate reloc validation into an earlier step
9b9d4aa5ab50 drm/i915: Add list_for_each_entry_safe_continue_reverse
033b56b47071 drm/i915/gem: Async GPU relocations only
143d1995d466 drm/i915/gem: Mark the buffer pool as active for the cmdparser
3148cd01132a drm/i915/gt: Enable ring scheduling for gen6/7
8c829f7c5554 drm/i915/gt: Implement ring scheduler for gen6/7
99a81a3f55b0 drm/i915: Relinquish forcewake immediately after manual grouping
118524c684c6 drm/i915/gt: Track if an engine requires forcewake w/a
0760ab97f627 drm/i915/gt: Enable busy-stats for ring-scheduler
15787ec4da9d drm/i915/gt: Infrastructure for ring scheduling
703dea6710a7 drm/i915: Support inter-engine semaphores on gen6/7
58305c0b593a drm/i915/gt: Use client timeline address for seqno writes
eacfbfb97087 drm/i915/gt: Support creation of 'internal' rings
ce9a7aebf8d2 drm/i915/gt: Couple tasklet scheduling for all CS interrupts
cf0807656a76 Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"
5b9d253c3510 drm/i915: Trim the ironlake+ irq handler
8f9e2ddc2328 drm/i915/gt: Move legacy context wa to intel_workarounds
284e5e82f9e8 drm/i915/gt: Split low level gen2-7 CS emitters
96d738818236 drm/i915: Handle very early engine initialisation failure

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17826/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/gt: Enable busy-stats for ring-scheduler
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 11/36] drm/i915/gt: Enable busy-stats for ring-scheduler Chris Wilson
@ 2020-06-01  8:03   ` Chris Wilson
  0 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01  8:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Couple up the context in/out accounting to record how long each engine
is busy handling requests. This is exposed to userspace for more accurate
measurements, and also enables our soft-rps timer.

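As a rough illustration only (not part of this patch), a consumer of these
statistics could sample a consistent busy time via the seqlock read side,
using the engine->stats fields handled by the helpers moved into
intel_engine_stats.h below; the function name here is hypothetical:

static ktime_t sample_busy_time(struct intel_engine_cs *engine)
{
	unsigned int seq;
	ktime_t total;

	do {
		seq = read_seqbegin(&engine->stats.lock);

		/* Time accumulated over completed busy periods... */
		total = engine->stats.total;

		/* ...plus the busy period still running, if any. */
		if (atomic_read(&engine->stats.active))
			total = ktime_add(total,
					  ktime_sub(ktime_get(),
						    engine->stats.start));
	} while (read_seqretry(&engine->stats.lock, seq));

	return total;
}
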
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_stats.h  | 49 ++++++++++
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 34 +------
 .../gpu/drm/i915/gt/intel_ring_scheduler.c    |  4 +
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c  | 90 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/selftest_rps.c        |  5 ++
 5 files changed, 149 insertions(+), 33 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_stats.h

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_stats.h b/drivers/gpu/drm/i915/gt/intel_engine_stats.h
new file mode 100644
index 000000000000..58491eae3482
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_engine_stats.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __INTEL_ENGINE_STATS_H__
+#define __INTEL_ENGINE_STATS_H__
+
+#include <linux/atomic.h>
+#include <linux/ktime.h>
+#include <linux/seqlock.h>
+
+#include "i915_gem.h" /* GEM_BUG_ON */
+#include "intel_engine.h"
+
+static inline void intel_engine_context_in(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	if (atomic_add_unless(&engine->stats.active, 1, 0))
+		return;
+
+	write_seqlock_irqsave(&engine->stats.lock, flags);
+	if (!atomic_add_unless(&engine->stats.active, 1, 0)) {
+		engine->stats.start = ktime_get();
+		atomic_inc(&engine->stats.active);
+	}
+	write_sequnlock_irqrestore(&engine->stats.lock, flags);
+}
+
+static inline void intel_engine_context_out(struct intel_engine_cs *engine)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(!atomic_read(&engine->stats.active));
+
+	if (atomic_add_unless(&engine->stats.active, -1, 1))
+		return;
+
+	write_seqlock_irqsave(&engine->stats.lock, flags);
+	if (atomic_dec_and_test(&engine->stats.active)) {
+		engine->stats.total =
+			ktime_add(engine->stats.total,
+				  ktime_sub(ktime_get(), engine->stats.start));
+	}
+	write_sequnlock_irqrestore(&engine->stats.lock, flags);
+}
+
+#endif /* __INTEL_ENGINE_STATS_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6fc0966b75ff..13ef4f58cb08 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -139,6 +139,7 @@
 #include "i915_vgpu.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
+#include "intel_engine_stats.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -1187,39 +1188,6 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status)
 				   status, rq);
 }
 
-static void intel_engine_context_in(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	if (atomic_add_unless(&engine->stats.active, 1, 0))
-		return;
-
-	write_seqlock_irqsave(&engine->stats.lock, flags);
-	if (!atomic_add_unless(&engine->stats.active, 1, 0)) {
-		engine->stats.start = ktime_get();
-		atomic_inc(&engine->stats.active);
-	}
-	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-}
-
-static void intel_engine_context_out(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	GEM_BUG_ON(!atomic_read(&engine->stats.active));
-
-	if (atomic_add_unless(&engine->stats.active, -1, 1))
-		return;
-
-	write_seqlock_irqsave(&engine->stats.lock, flags);
-	if (atomic_dec_and_test(&engine->stats.active)) {
-		engine->stats.total =
-			ktime_add(engine->stats.total,
-				  ktime_sub(ktime_get(), engine->stats.start));
-	}
-	write_sequnlock_irqrestore(&engine->stats.lock, flags);
-}
-
 static void
 execlists_check_context(const struct intel_context *ce,
 			const struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
index c8cd435d1c51..aaff554865b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
@@ -9,6 +9,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_stats.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -59,6 +60,7 @@ static struct i915_request *
 schedule_in(struct intel_engine_cs *engine, struct i915_request *rq)
 {
 	__intel_gt_pm_get(engine->gt);
+	intel_engine_context_in(engine);
 	return i915_request_get(rq);
 }
 
@@ -71,6 +73,7 @@ schedule_out(struct intel_engine_cs *engine, struct i915_request *rq)
 		intel_engine_add_retire(engine, ce->timeline);
 
 	i915_request_put(rq);
+	intel_engine_context_out(engine);
 	intel_gt_pm_put_async(engine->gt);
 }
 
@@ -747,6 +750,7 @@ int intel_ring_scheduler_setup(struct intel_engine_cs *engine)
 	engine->legacy.ring = ring;
 
 	engine->flags |= I915_ENGINE_NEEDS_BREADCRUMB_TASKLET;
+	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 
 	/* Finally, take ownership and responsibility for cleanup! */
 	engine->release = ring_release;
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index cbf6b0735272..64cfa2fdc9bf 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -7,6 +7,95 @@
 #include "i915_selftest.h"
 #include "selftest_engine.h"
 #include "selftests/igt_atomic.h"
+#include "selftests/igt_flush_test.h"
+#include "selftests/igt_spinner.h"
+
+static int live_engine_busy_stats(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct igt_spinner spin;
+	int err;
+
+	/*
+	 * Check that if an engine supports busy-stats, they tell the truth.
+	 */
+
+	if (igt_spinner_init(&spin, gt))
+		return -ENOMEM;
+
+	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
+	for_each_engine(engine, gt, id) {
+		struct i915_request *rq;
+		ktime_t dt, de;
+
+		if (!intel_engine_supports_stats(engine))
+			continue;
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		if (intel_gt_pm_wait_for_idle(gt)) {
+			err = -EBUSY;
+			break;
+		}
+
+		preempt_disable();
+		dt = ktime_get();
+		de = intel_engine_get_busy_time(engine);
+		udelay(100);
+		dt = ktime_sub(ktime_get(), dt);
+		de = ktime_sub(intel_engine_get_busy_time(engine), de);
+		preempt_enable();
+		if (de > 10) {
+			pr_err("%s: reported %lldns [%d%%] busyness while sleeping [for %lldns]\n",
+			       engine->name,
+			       de, (int)div64_u64(100 * de, dt), dt);
+			err = -EINVAL;
+			break;
+		}
+
+		rq = igt_spinner_create_request(&spin,
+						engine->kernel_context,
+						MI_NOOP);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+		i915_request_add(rq);
+
+		if (!igt_wait_for_spinner(&spin, rq)) {
+			intel_gt_set_wedged(engine->gt);
+			err = -ETIME;
+			break;
+		}
+
+		preempt_disable();
+		dt = ktime_get();
+		de = intel_engine_get_busy_time(engine);
+		udelay(100);
+		dt = ktime_sub(ktime_get(), dt);
+		de = ktime_sub(intel_engine_get_busy_time(engine), de);
+		preempt_enable();
+		if (100 * de < 95 * dt) {
+			pr_err("%s: reported only %lldns [%d%%] busyness while spinning [for %lldns]\n",
+			       engine->name,
+			       de, (int)div64_u64(100 * de, dt), dt);
+			err = -EINVAL;
+			break;
+		}
+
+		igt_spinner_end(&spin);
+		if (igt_flush_test(gt->i915)) {
+			err = -EIO;
+			break;
+		}
+	}
+
+	igt_spinner_fini(&spin);
+	return err;
+}
 
 static int live_engine_pm(void *arg)
 {
@@ -77,6 +166,7 @@ static int live_engine_pm(void *arg)
 int live_engine_pm_selftests(struct intel_gt *gt)
 {
 	static const struct i915_subtest tests[] = {
+		SUBTEST(live_engine_busy_stats),
 		SUBTEST(live_engine_pm),
 	};
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
index 5049c3dd08a6..5e364fb31aea 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -1252,6 +1252,11 @@ int live_rps_dynamic(void *arg)
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
+	if (intel_rps_has_interrupts(rps))
+		pr_info("RPS has interrupt support\n");
+	if (intel_rps_uses_timer(rps))
+		pr_info("RPS has timer support\n");
+
 	for_each_engine(engine, gt, id) {
 		struct i915_request *rq;
 		struct {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (37 preceding siblings ...)
  2020-06-01  7:56 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
@ 2020-06-01  8:30 ` Patchwork
  2020-06-01  8:31 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2020-06-01  8:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
URL   : https://patchwork.freedesktop.org/series/77857/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
b0896ff73ea4 drm/i915: Handle very early engine initialisation failure
28f59054aa9e drm/i915/gt: Split low level gen2-7 CS emitters
-:9: WARNING:TYPO_SPELLING: 'wnat' may be misspelled - perhaps 'want'?
#9: 
with requests, and we will wnat to reuse them outside of this context.

-:27: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#27: 
new file mode 100644

-:179: WARNING:LONG_LINE: line over 100 characters
#179: FILE: drivers/gpu/drm/i915/gt/gen2_engine_cs.c:148:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:202: WARNING:LONG_LINE: line over 100 characters
#202: FILE: drivers/gpu/drm/i915/gt/gen2_engine_cs.c:171:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:220: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#220: FILE: drivers/gpu/drm/i915/gt/gen2_engine_cs.c:189:
+}
+#undef GEN5_WA_STORES

-:798: WARNING:LONG_LINE: line over 100 characters
#798: FILE: drivers/gpu/drm/i915/gt/gen6_engine_cs.c:377:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:818: WARNING:LONG_LINE: line over 100 characters
#818: FILE: drivers/gpu/drm/i915/gt/gen6_engine_cs.c:397:
+	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);

-:843: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#843: FILE: drivers/gpu/drm/i915/gt/gen6_engine_cs.c:422:
+}
+#undef GEN7_XCS_WA

total: 0 errors, 6 warnings, 2 checks, 1812 lines checked
84fb5312e69a drm/i915/gt: Move legacy context wa to intel_workarounds
51c5e9106b00 drm/i915: Trim the ironlake+ irq handler
db08f3a83b7d Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"
ee444a1c9757 drm/i915/gt: Couple tasklet scheduling for all CS interrupts
a186650cc2c0 drm/i915/gt: Support creation of 'internal' rings
ad2ec515e737 drm/i915/gt: Use client timeline address for seqno writes
75a19e638ad6 drm/i915: Support inter-engine semaphores on gen6/7
9c8572c7a931 drm/i915/gt: Infrastructure for ring scheduling
-:79: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#79: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 842 lines checked
5354a72cf88c drm/i915/gt: Enable busy-stats for ring-scheduler
-:13: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#13: 
new file mode 100644

-:200: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst
#200: FILE: drivers/gpu/drm/i915/gt/selftest_engine_pm.c:47:
+		udelay(100);

-:230: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.rst
#230: FILE: drivers/gpu/drm/i915/gt/selftest_engine_pm.c:77:
+		udelay(100);

total: 0 errors, 1 warnings, 2 checks, 236 lines checked
ad2a6b63bf64 drm/i915/gt: Track if an engine requires forcewake w/a
e43d3f056a59 drm/i915: Relinquish forcewake immediately after manual grouping
e4172f511dc0 drm/i915/gt: Implement ring scheduler for gen6/7
-:68: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#68: FILE: drivers/gpu/drm/i915/gt/intel_ring_scheduler.c:324:
+				*cs++ = i915_mmio_reg_offset(

-:70: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#70: FILE: drivers/gpu/drm/i915/gt/intel_ring_scheduler.c:326:
+				*cs++ = _MASKED_BIT_ENABLE(

-:105: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#105: FILE: drivers/gpu/drm/i915/gt/intel_ring_scheduler.c:361:
+				*cs++ = _MASKED_BIT_DISABLE(

total: 0 errors, 0 warnings, 3 checks, 512 lines checked
8f4228e0458d drm/i915/gt: Enable ring scheduling for gen6/7
4223614bdbed drm/i915/gem: Mark the buffer pool as active for the cmdparser
043a8c092764 drm/i915/gem: Async GPU relocations only
d6bb9bf39a67 drm/i915: Add list_for_each_entry_safe_continue_reverse
-:20: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects?
#20: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:20: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects?
#20: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:20: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#20: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

total: 0 errors, 0 warnings, 3 checks, 12 lines checked
bf626d095a0a drm/i915/gem: Separate reloc validation into an earlier step
-:101: WARNING:UNNECESSARY_ELSE: else is not generally useful after a break or return
#101: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1408:
+				return (int)offset;
+			} else {

total: 0 errors, 1 warnings, 0 checks, 217 lines checked
a5a8005dc0d5 drm/i915/gem: Lift GPU relocation allocation
cf3caf6ea48e drm/i915/gem: Build the reloc request first
b5ed91ed9028 drm/i915/gem: Add all GPU reloc awaits/signals en masse
3d075ea5bad9 dma-buf: Proxy fence, an unsignaled fence placeholder
-:45: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#45: 
new file mode 100644

-:438: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#438: FILE: drivers/dma-buf/st-dma-fence-proxy.c:20:
+	spinlock_t lock;

total: 0 errors, 1 warnings, 1 checks, 1158 lines checked
ce15d9f7f287 drm/i915: Unpeel awaits on a proxy fence
7f48f9f9cabb drm/i915/gem: Make relocations atomic within execbuf
59c00fdc9150 drm/syncobj: Allow use of dma-fence-proxy
9f06244959ca drm/i915/gem: Teach execbuf how to wait on future syncobj
fe5ce9a4c96f drm/i915/gem: Allow combining submit-fences with syncobj
0f995d2bfbe2 drm/i915/gt: Declare when we enabled timeslicing
796d8967ce3e drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
30672d041d30 drm/i915: Always defer fenced work to the worker
f86a694d1a36 drm/i915/gem: Assign context id for async work
e8d99d37c64d drm/i915: Export a preallocate variant of i915_active_acquire()
8ed22b7514d8 drm/i915/gem: Separate the ww_mutex walker into its own list
55b772ca8526 drm/i915/gem: Asynchronous GTT unbinding
d2ba95a40d8a drm/i915/gem: Bind the fence async for execbuf

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (38 preceding siblings ...)
  2020-06-01  8:30 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2) Patchwork
@ 2020-06-01  8:31 ` Patchwork
  2020-06-01  8:51 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2020-06-01  8:31 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
URL   : https://patchwork.freedesktop.org/series/77857/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
+drivers/gpu/drm/i915/intel_wakeref.c:137:19: warning: context imbalance in 'wakeref_auto_timeout' - unexpected unlock
+drivers/gpu/drm/i915/selftests/i915_syncmap.c:80:54: warning: dubious: x | !y

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (39 preceding siblings ...)
  2020-06-01  8:31 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-06-01  8:51 ` Patchwork
  2020-06-01 11:00 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
  2020-06-01 11:31 ` [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Mika Kuoppala
  42 siblings, 0 replies; 53+ messages in thread
From: Patchwork @ 2020-06-01  8:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
URL   : https://patchwork.freedesktop.org/series/77857/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8560 -> Patchwork_17828
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/index.html

New tests
---------

  New tests have been introduced between CI_DRM_8560 and Patchwork_17828:

### New IGT tests (1) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 42 pass(s)
    - Exec time: [0.03, 0.10] s

  


Changes
-------

  No changes found


Participating hosts (50 -> 44)
------------------------------

  Additional (1): fi-ehl-1 
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-kbl-7560u fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8560 -> Patchwork_17828

  CI-20190529: 20190529
  CI_DRM_8560: 02fe287fdb4a3d6bceb1bb61b3c8538b4b941b3c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5687: 668a5be752186b6e08f361bac34da37309d08393 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17828: d2ba95a40d8a1c2731ac575e5183770cbb118343 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

d2ba95a40d8a drm/i915/gem: Bind the fence async for execbuf
55b772ca8526 drm/i915/gem: Asynchronous GTT unbinding
8ed22b7514d8 drm/i915/gem: Separate the ww_mutex walker into its own list
e8d99d37c64d drm/i915: Export a preallocate variant of i915_active_acquire()
f86a694d1a36 drm/i915/gem: Assign context id for async work
30672d041d30 drm/i915: Always defer fenced work to the worker
796d8967ce3e drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT
0f995d2bfbe2 drm/i915/gt: Declare when we enabled timeslicing
fe5ce9a4c96f drm/i915/gem: Allow combining submit-fences with syncobj
9f06244959ca drm/i915/gem: Teach execbuf how to wait on future syncobj
59c00fdc9150 drm/syncobj: Allow use of dma-fence-proxy
7f48f9f9cabb drm/i915/gem: Make relocations atomic within execbuf
ce15d9f7f287 drm/i915: Unpeel awaits on a proxy fence
3d075ea5bad9 dma-buf: Proxy fence, an unsignaled fence placeholder
b5ed91ed9028 drm/i915/gem: Add all GPU reloc awaits/signals en masse
cf3caf6ea48e drm/i915/gem: Build the reloc request first
a5a8005dc0d5 drm/i915/gem: Lift GPU relocation allocation
bf626d095a0a drm/i915/gem: Separate reloc validation into an earlier step
d6bb9bf39a67 drm/i915: Add list_for_each_entry_safe_continue_reverse
043a8c092764 drm/i915/gem: Async GPU relocations only
4223614bdbed drm/i915/gem: Mark the buffer pool as active for the cmdparser
8f4228e0458d drm/i915/gt: Enable ring scheduling for gen6/7
e4172f511dc0 drm/i915/gt: Implement ring scheduler for gen6/7
e43d3f056a59 drm/i915: Relinquish forcewake immediately after manual grouping
ad2a6b63bf64 drm/i915/gt: Track if an engine requires forcewake w/a
5354a72cf88c drm/i915/gt: Enable busy-stats for ring-scheduler
9c8572c7a931 drm/i915/gt: Infrastructure for ring scheduling
75a19e638ad6 drm/i915: Support inter-engine semaphores on gen6/7
ad2ec515e737 drm/i915/gt: Use client timeline address for seqno writes
a186650cc2c0 drm/i915/gt: Support creation of 'internal' rings
ee444a1c9757 drm/i915/gt: Couple tasklet scheduling for all CS interrupts
db08f3a83b7d Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq"
51c5e9106b00 drm/i915: Trim the ironlake+ irq handler
84fb5312e69a drm/i915/gt: Move legacy context wa to intel_workarounds
28f59054aa9e drm/i915/gt: Split low level gen2-7 CS emitters
b0896ff73ea4 drm/i915: Handle very early engine initialisation failure

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (40 preceding siblings ...)
  2020-06-01  8:51 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-06-01 11:00 ` Patchwork
  2020-06-01 11:39   ` Chris Wilson
  2020-06-01 11:31 ` [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Mika Kuoppala
  42 siblings, 1 reply; 53+ messages in thread
From: Patchwork @ 2020-06-01 11:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
URL   : https://patchwork.freedesktop.org/series/77857/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8560_full -> Patchwork_17828_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17828_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17828_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17828_full:

### IGT changes ###

#### Possible regressions ####

  * igt@runner@aborted:
    - shard-hsw:          NOTRUN -> [FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw6/igt@runner@aborted.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_fence@parallel@vecs0}:
    - shard-hsw:          [PASS][2] -> [FAIL][3] +3 similar issues
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw1/igt@gem_exec_fence@parallel@vecs0.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw2/igt@gem_exec_fence@parallel@vecs0.html

  * {igt@gem_exec_fence@syncobj-invalid-wait}:
    - shard-snb:          [PASS][4] -> [FAIL][5] +1 similar issue
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-snb5/igt@gem_exec_fence@syncobj-invalid-wait.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-snb5/igt@gem_exec_fence@syncobj-invalid-wait.html
    - shard-tglb:         [PASS][6] -> [FAIL][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-tglb6/igt@gem_exec_fence@syncobj-invalid-wait.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-tglb8/igt@gem_exec_fence@syncobj-invalid-wait.html
    - shard-skl:          [PASS][8] -> [FAIL][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl8/igt@gem_exec_fence@syncobj-invalid-wait.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl8/igt@gem_exec_fence@syncobj-invalid-wait.html
    - shard-glk:          [PASS][10] -> [FAIL][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-glk6/igt@gem_exec_fence@syncobj-invalid-wait.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-glk1/igt@gem_exec_fence@syncobj-invalid-wait.html
    - shard-apl:          [PASS][12] -> [FAIL][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-apl1/igt@gem_exec_fence@syncobj-invalid-wait.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-apl1/igt@gem_exec_fence@syncobj-invalid-wait.html
    - shard-kbl:          [PASS][14] -> [FAIL][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-kbl3/igt@gem_exec_fence@syncobj-invalid-wait.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-kbl4/igt@gem_exec_fence@syncobj-invalid-wait.html
    - shard-iclb:         [PASS][16] -> [FAIL][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-iclb2/igt@gem_exec_fence@syncobj-invalid-wait.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-iclb2/igt@gem_exec_fence@syncobj-invalid-wait.html

  
New tests
---------

  New tests have been introduced between CI_DRM_8560_full and Patchwork_17828_full:

### New IGT tests (1) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 8 pass(s)
    - Exec time: [0.04, 0.10] s

  

Known issues
------------

  Here are the changes found in Patchwork_17828_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_param@set-priority-not-supported:
    - shard-snb:          [PASS][18] -> [SKIP][19] ([fdo#109271])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-snb5/igt@gem_ctx_param@set-priority-not-supported.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-snb1/igt@gem_ctx_param@set-priority-not-supported.html
    - shard-hsw:          [PASS][20] -> [SKIP][21] ([fdo#109271])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw1/igt@gem_ctx_param@set-priority-not-supported.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw2/igt@gem_ctx_param@set-priority-not-supported.html

  * igt@i915_pm_rps@waitboost:
    - shard-hsw:          [PASS][22] -> [FAIL][23] ([i915#39]) +1 similar issue
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw1/igt@i915_pm_rps@waitboost.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw2/igt@i915_pm_rps@waitboost.html

  * igt@kms_cursor_crc@pipe-b-cursor-suspend:
    - shard-skl:          [PASS][24] -> [INCOMPLETE][25] ([i915#300])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl6/igt@kms_cursor_crc@pipe-b-cursor-suspend.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl1/igt@kms_cursor_crc@pipe-b-cursor-suspend.html

  * igt@kms_cursor_legacy@2x-cursor-vs-flip-atomic:
    - shard-glk:          [PASS][26] -> [INCOMPLETE][27] ([i915#58] / [k.org#198133])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-glk2/igt@kms_cursor_legacy@2x-cursor-vs-flip-atomic.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-glk9/igt@kms_cursor_legacy@2x-cursor-vs-flip-atomic.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-kbl:          [PASS][28] -> [DMESG-WARN][29] ([i915#180]) +2 similar issues
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-kbl2/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-kbl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][30] -> [FAIL][31] ([fdo#108145] / [i915#265]) +1 similar issue
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl1/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl9/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_psr@psr2_primary_mmap_gtt:
    - shard-iclb:         [PASS][32] -> [SKIP][33] ([fdo#109441]) +1 similar issue
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-iclb2/igt@kms_psr@psr2_primary_mmap_gtt.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-iclb4/igt@kms_psr@psr2_primary_mmap_gtt.html

  
#### Possible fixes ####

  * {igt@gem_exec_reloc@basic-concurrent0}:
    - shard-tglb:         [FAIL][34] ([i915#1930]) -> [PASS][35] +1 similar issue
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-tglb1/igt@gem_exec_reloc@basic-concurrent0.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-tglb7/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-hsw:          [FAIL][36] ([i915#1930]) -> [PASS][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw2/igt@gem_exec_reloc@basic-concurrent0.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw1/igt@gem_exec_reloc@basic-concurrent0.html

  * {igt@gem_exec_reloc@basic-concurrent16}:
    - shard-snb:          [FAIL][38] ([i915#1930]) -> [PASS][39] +1 similar issue
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-snb4/igt@gem_exec_reloc@basic-concurrent16.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-snb1/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-iclb:         [FAIL][40] ([i915#1930]) -> [PASS][41] +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-iclb3/igt@gem_exec_reloc@basic-concurrent16.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-iclb8/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-skl:          [FAIL][42] ([i915#1930]) -> [PASS][43] +1 similar issue
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl9/igt@gem_exec_reloc@basic-concurrent16.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl4/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-kbl:          [FAIL][44] ([i915#1930]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-kbl4/igt@gem_exec_reloc@basic-concurrent16.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-kbl6/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-apl:          [FAIL][46] ([i915#1930]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-apl8/igt@gem_exec_reloc@basic-concurrent16.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-apl2/igt@gem_exec_reloc@basic-concurrent16.html

  * igt@gem_exec_whisper@basic-contexts-priority-all:
    - shard-hsw:          [SKIP][48] ([fdo#109271]) -> [PASS][49] +34 similar issues
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw2/igt@gem_exec_whisper@basic-contexts-priority-all.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw5/igt@gem_exec_whisper@basic-contexts-priority-all.html

  * igt@kms_cursor_crc@pipe-a-cursor-64x64-sliding:
    - shard-apl:          [FAIL][50] ([i915#54]) -> [PASS][51]
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-apl8/igt@kms_cursor_crc@pipe-a-cursor-64x64-sliding.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-apl2/igt@kms_cursor_crc@pipe-a-cursor-64x64-sliding.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][52] ([i915#180]) -> [PASS][53] +2 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-kbl4/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-kbl6/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-atomic:
    - shard-glk:          [FAIL][54] ([i915#72]) -> [PASS][55]
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-glk9/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-atomic.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-glk8/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-atomic.html

  * {igt@kms_flip@2x-flip-vs-absolute-wf_vblank@ab-hdmi-a1-hdmi-a2}:
    - shard-glk:          [FAIL][56] ([i915#1928]) -> [PASS][57]
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-glk1/igt@kms_flip@2x-flip-vs-absolute-wf_vblank@ab-hdmi-a1-hdmi-a2.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-glk4/igt@kms_flip@2x-flip-vs-absolute-wf_vblank@ab-hdmi-a1-hdmi-a2.html

  * {igt@kms_flip@flip-vs-suspend-interruptible@c-dp1}:
    - shard-apl:          [DMESG-WARN][58] ([i915#180]) -> [PASS][59] +7 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-apl1/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-apl6/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * {igt@kms_flip@plain-flip-ts-check-interruptible@b-edp1}:
    - shard-skl:          [FAIL][60] ([i915#1928]) -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl9/igt@kms_flip@plain-flip-ts-check-interruptible@b-edp1.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl4/igt@kms_flip@plain-flip-ts-check-interruptible@b-edp1.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [FAIL][62] ([i915#1188]) -> [PASS][63] +1 similar issue
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl4/igt@kms_hdr@bpc-switch-dpms.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl3/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-c:
    - shard-skl:          [FAIL][64] ([i915#53]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl9/igt@kms_pipe_crc_basic@read-crc-pipe-c.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl4/igt@kms_pipe_crc_basic@read-crc-pipe-c.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [FAIL][66] ([fdo#108145] / [i915#265]) -> [PASS][67] +2 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl9/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl4/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr2_su@frontbuffer:
    - shard-iclb:         [SKIP][68] ([fdo#109642] / [fdo#111068]) -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-iclb7/igt@kms_psr2_su@frontbuffer.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-iclb2/igt@kms_psr2_su@frontbuffer.html

  * igt@kms_psr@psr2_cursor_plane_move:
    - shard-iclb:         [SKIP][70] ([fdo#109441]) -> [PASS][71] +1 similar issue
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-iclb6/igt@kms_psr@psr2_cursor_plane_move.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-iclb2/igt@kms_psr@psr2_cursor_plane_move.html

  * igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend:
    - shard-kbl:          [INCOMPLETE][72] ([i915#155]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-kbl3/igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-kbl1/igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend.html

  * {igt@perf@polling-parameterized}:
    - shard-iclb:         [FAIL][74] ([i915#1542]) -> [PASS][75]
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-iclb6/igt@perf@polling-parameterized.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-iclb8/igt@perf@polling-parameterized.html

  * {igt@perf_pmu@busy-accuracy-2@bcs0}:
    - shard-snb:          [SKIP][76] ([fdo#109271]) -> [PASS][77] +33 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-snb5/igt@perf_pmu@busy-accuracy-2@bcs0.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-snb6/igt@perf_pmu@busy-accuracy-2@bcs0.html

  
#### Warnings ####

  * igt@kms_content_protection@lic:
    - shard-apl:          [FAIL][78] ([fdo#110321]) -> [TIMEOUT][79] ([i915#1319] / [i915#1635])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-apl2/igt@kms_content_protection@lic.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-apl8/igt@kms_content_protection@lic.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [FAIL][80] ([i915#1188]) -> [INCOMPLETE][81] ([i915#198])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-skl1/igt@kms_hdr@bpc-switch-suspend.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-skl9/igt@kms_hdr@bpc-switch-suspend.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110321]: https://bugs.freedesktop.org/show_bug.cgi?id=110321
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#146]: https://gitlab.freedesktop.org/drm/intel/issues/146
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#155]: https://gitlab.freedesktop.org/drm/intel/issues/155
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1927]: https://gitlab.freedesktop.org/drm/intel/issues/1927
  [i915#1928]: https://gitlab.freedesktop.org/drm/intel/issues/1928
  [i915#1930]: https://gitlab.freedesktop.org/drm/intel/issues/1930
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#300]: https://gitlab.freedesktop.org/drm/intel/issues/300
  [i915#39]: https://gitlab.freedesktop.org/drm/intel/issues/39
  [i915#53]: https://gitlab.freedesktop.org/drm/intel/issues/53
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#58]: https://gitlab.freedesktop.org/drm/intel/issues/58
  [i915#61]: https://gitlab.freedesktop.org/drm/intel/issues/61
  [i915#72]: https://gitlab.freedesktop.org/drm/intel/issues/72
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8560 -> Patchwork_17828

  CI-20190529: 20190529
  CI_DRM_8560: 02fe287fdb4a3d6bceb1bb61b3c8538b4b941b3c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5687: 668a5be752186b6e08f361bac34da37309d08393 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17828: d2ba95a40d8a1c2731ac575e5183770cbb118343 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure
  2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
                   ` (41 preceding siblings ...)
  2020-06-01 11:00 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
@ 2020-06-01 11:31 ` Mika Kuoppala
  42 siblings, 0 replies; 53+ messages in thread
From: Mika Kuoppala @ 2020-06-01 11:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we fail during engine setup, we may leave some engines not yet setup.
> During the error cleanup, we have to be careful not to try and use the
> uninitialise engines before discarding them.
>
> [   16.136152] RIP: 0010:__flush_work+0x198/0x1b0
> [   16.136168] Code: ff ff 8b 0b 48 8b 53 08 83 e1 08 48 0f ba 2b 03 80 c9 f0 e9 63 ff ff ff 0f 0b 48 83 c4 48 44 89 f0 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 45 31 f6 e9 62 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f
> [   16.136186] RSP: 0018:ffffc900003bb928 EFLAGS: 00010246
> [   16.136201] RAX: 0000000000000000 RBX: ffff88844f392168 RCX: 0000000000000000
> [   16.136216] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88844f392168
> [   16.136231] RBP: ffff88844f392130 R08: 0000000000000000 R09: 0000000000000001
> [   16.136246] R10: ffff888441e31e40 R11: ffff88845e329c70 R12: ffff88844f796988
> [   16.136261] R13: ffff888441e4fb80 R14: 0000000000000001 R15: ffff88844f790000
> [   16.136388] FS:  00007fecbd208880(0000) GS:ffff88845e380000(0000) knlGS:0000000000000000
> [   16.136405] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   16.136420] CR2: 00007ff3ce748f90 CR3: 0000000457a6a001 CR4: 00000000000606e0
> [   16.136437] Call Trace:
> [   16.136456]  ? try_to_del_timer_sync+0x3a/0x50
> [   16.136529]  intel_wakeref_wait_for_idle+0x87/0xb0 [i915]
> [   16.136606]  ? intel_engines_release+0x68/0xc0 [i915]
> [   16.136680]  intel_engines_release+0x49/0xc0 [i915]
> [   16.136757]  intel_gt_init+0x2f4/0x5e0 [i915]
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index da5b61085257..c8c14981eb5d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -414,12 +414,12 @@ void intel_engines_release(struct intel_gt *gt)
>  
>  	/* Decouple the backend; but keep the layout for late GPU resets */
>  	for_each_engine(engine, gt, id) {
> -		intel_wakeref_wait_for_idle(&engine->wakeref);
> -		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
> -
>  		if (!engine->release)
>  			continue;
>  
> +		intel_wakeref_wait_for_idle(&engine->wakeref);
> +		GEM_BUG_ON(intel_engine_pm_is_awake(engine));
> +
>  		engine->release(engine);
>  		engine->release = NULL;
>  
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
  2020-06-01 11:00 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
@ 2020-06-01 11:39   ` Chris Wilson
  0 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01 11:39 UTC (permalink / raw)
  To: Patchwork, intel-gfx; +Cc: intel-gfx

Quoting Patchwork (2020-06-01 12:00:40)
> == Series Details ==
> 
> Series: series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2)
> URL   : https://patchwork.freedesktop.org/series/77857/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_8560_full -> Patchwork_17828_full
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_17828_full absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_17828_full, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_17828_full:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt@runner@aborted:
>     - shard-hsw:          NOTRUN -> [FAIL][1]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw6/igt@runner@aborted.html
> 
>   
> #### Suppressed ####
> 
>   The following results come from untrusted machines, tests, or statuses.
>   They do not affect the overall result.
> 
>   * {igt@gem_exec_fence@parallel@vecs0}:
>     - shard-hsw:          [PASS][2] -> [FAIL][3] +3 similar issues
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8560/shard-hsw1/igt@gem_exec_fence@parallel@vecs0.html
>    [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17828/shard-hsw2/igt@gem_exec_fence@parallel@vecs0.html

Sigh. They dropped the memory compare from MI_SEMAPHORE_MBOX in Haswell.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler Chris Wilson
@ 2020-06-01 11:49   ` Mika Kuoppala
  2020-06-01 12:00     ` Chris Wilson
  0 siblings, 1 reply; 53+ messages in thread
From: Mika Kuoppala @ 2020-06-01 11:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Ever noticed that our interrupt handlers are where we spend most of our
> time on a busy system? In part this is unavoidable, as each interrupt
> requires us to poll and reset several registers, but we can try to do so
> as efficiently as possible.
>
> Function                                     old     new   delta
> ilk_irq_handler                             2317    2156    -161
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 59 ++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 63579ab71cf6..07c0c7ea3795 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2097,69 +2097,66 @@ static void ivb_display_irq_handler(struct drm_i915_private *dev_priv,
>   */
>  static irqreturn_t ilk_irq_handler(int irq, void *arg)
>  {
> -	struct drm_i915_private *dev_priv = arg;
> +	struct drm_i915_private *i915 = arg;
> +	void __iomem * const regs = i915->uncore.regs;
>  	u32 de_iir, gt_iir, de_ier, sde_ier = 0;
> -	irqreturn_t ret = IRQ_NONE;
>  
> -	if (!intel_irqs_enabled(dev_priv))
> +	if (!unlikely(intel_irqs_enabled(i915)))

Put ! inside the unlikely for readability please.
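
That is, something along the lines of:

	if (unlikely(!intel_irqs_enabled(i915)))
		return IRQ_NONE;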

>  		return IRQ_NONE;
>  
>  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
> -	disable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
> +	disable_rpm_wakeref_asserts(&i915->runtime_pm);
>  
>  	/* disable master interrupt before clearing iir  */
> -	de_ier = I915_READ(DEIER);
> -	I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
> +	de_ier = raw_reg_read(regs, DEIER);
> +	raw_reg_write(regs, DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
>  
>  	/* Disable south interrupts. We'll only write to SDEIIR once, so further
>  	 * interrupts will will be stored on its back queue, and then we'll be
>  	 * able to process them after we restore SDEIER (as soon as we restore
>  	 * it, we'll get an interrupt if SDEIIR still has something to process
>  	 * due to its back queue). */
> -	if (!HAS_PCH_NOP(dev_priv)) {
> -		sde_ier = I915_READ(SDEIER);
> -		I915_WRITE(SDEIER, 0);
> +	if (!HAS_PCH_NOP(i915)) {
> +		sde_ier = raw_reg_read(regs, SDEIER);
> +		raw_reg_write(regs, SDEIER, 0);
>  	}
>  
>  	/* Find, clear, then process each source of interrupt */
>  
> -	gt_iir = I915_READ(GTIIR);
> +	gt_iir = raw_reg_read(regs, GTIIR);
>  	if (gt_iir) {
> -		I915_WRITE(GTIIR, gt_iir);
> -		ret = IRQ_HANDLED;
> -		if (INTEL_GEN(dev_priv) >= 6)
> -			gen6_gt_irq_handler(&dev_priv->gt, gt_iir);
> +		raw_reg_write(regs, GTIIR, gt_iir);
> +		if (INTEL_GEN(i915) >= 6)
> +			gen6_gt_irq_handler(&i915->gt, gt_iir);
>  		else
> -			gen5_gt_irq_handler(&dev_priv->gt, gt_iir);
> +			gen5_gt_irq_handler(&i915->gt, gt_iir);
>  	}
>  
> -	de_iir = I915_READ(DEIIR);
> +	de_iir = raw_reg_read(regs, DEIIR);
>  	if (de_iir) {
> -		I915_WRITE(DEIIR, de_iir);
> -		ret = IRQ_HANDLED;
> -		if (INTEL_GEN(dev_priv) >= 7)
> -			ivb_display_irq_handler(dev_priv, de_iir);
> +		raw_reg_write(regs, DEIIR, de_iir);
> +		if (INTEL_GEN(i915) >= 7)
> +			ivb_display_irq_handler(i915, de_iir);
>  		else
> -			ilk_display_irq_handler(dev_priv, de_iir);
> +			ilk_display_irq_handler(i915, de_iir);
>  	}
>  
> -	if (INTEL_GEN(dev_priv) >= 6) {
> -		u32 pm_iir = I915_READ(GEN6_PMIIR);
> +	if (INTEL_GEN(i915) >= 6) {
> +		u32 pm_iir = raw_reg_read(regs, GEN6_PMIIR);
>  		if (pm_iir) {
> -			I915_WRITE(GEN6_PMIIR, pm_iir);
> -			ret = IRQ_HANDLED;
> -			gen6_rps_irq_handler(&dev_priv->gt.rps, pm_iir);
> +			raw_reg_write(regs, GEN6_PMIIR, pm_iir);
> +			gen6_rps_irq_handler(&i915->gt.rps, pm_iir);
>  		}
>  	}
>  
> -	I915_WRITE(DEIER, de_ier);
> -	if (!HAS_PCH_NOP(dev_priv))
> -		I915_WRITE(SDEIER, sde_ier);
> +	raw_reg_write(regs, DEIER, de_ier);
> +	if (sde_ier)
> +		raw_reg_write(regs, SDEIER, sde_ier);
>  
>  	/* IRQs are synced during runtime_suspend, we don't require a wakeref */
> -	enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
> +	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>  
> -	return ret;
> +	return IRQ_HANDLED;

The change here is that if we catch an intr without any work, we now
return handled instead of none.

And as we have not actually done anything to prevent the next
one from kicking off, this is potentially dangerous.

But we do explicitly clear the enables; is that enough?

-Mika

>  }
>  
>  static void bxt_hpd_irq_handler(struct drm_i915_private *dev_priv,
> -- 
> 2.20.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler
  2020-06-01 11:49   ` Mika Kuoppala
@ 2020-06-01 12:00     ` Chris Wilson
  2020-06-01 12:08       ` Chris Wilson
  0 siblings, 1 reply; 53+ messages in thread
From: Chris Wilson @ 2020-06-01 12:00 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2020-06-01 12:49:49)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Ever noticed that our interrupt handlers are where we spend most of our
> > time on a busy system? In part this is unavoidable, as each interrupt
> > requires us to poll and reset several registers, but we can try to do so
> > as efficiently as possible.
> >
> > Function                                     old     new   delta
> > ilk_irq_handler                             2317    2156    -161
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_irq.c | 59 ++++++++++++++++-----------------
> >  1 file changed, 28 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 63579ab71cf6..07c0c7ea3795 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -2097,69 +2097,66 @@ static void ivb_display_irq_handler(struct drm_i915_private *dev_priv,
> >   */
> >  static irqreturn_t ilk_irq_handler(int irq, void *arg)
> >  {
> > -     struct drm_i915_private *dev_priv = arg;
> > +     struct drm_i915_private *i915 = arg;
> > +     void __iomem * const regs = i915->uncore.regs;
> >       u32 de_iir, gt_iir, de_ier, sde_ier = 0;
> > -     irqreturn_t ret = IRQ_NONE;
> >  
> > -     if (!intel_irqs_enabled(dev_priv))
> > +     if (!unlikely(intel_irqs_enabled(i915)))
> 
> Put ! inside the unlikely for readability please.
> 
> >               return IRQ_NONE;
> >  
> >       /* IRQs are synced during runtime_suspend, we don't require a wakeref */
> > -     disable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
> > +     disable_rpm_wakeref_asserts(&i915->runtime_pm);
> >  
> >       /* disable master interrupt before clearing iir  */
> > -     de_ier = I915_READ(DEIER);
> > -     I915_WRITE(DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
> > +     de_ier = raw_reg_read(regs, DEIER);
> > +     raw_reg_write(regs, DEIER, de_ier & ~DE_MASTER_IRQ_CONTROL);
> >  
> >       /* Disable south interrupts. We'll only write to SDEIIR once, so further
> >        * interrupts will will be stored on its back queue, and then we'll be
> >        * able to process them after we restore SDEIER (as soon as we restore
> >        * it, we'll get an interrupt if SDEIIR still has something to process
> >        * due to its back queue). */
> > -     if (!HAS_PCH_NOP(dev_priv)) {
> > -             sde_ier = I915_READ(SDEIER);
> > -             I915_WRITE(SDEIER, 0);
> > +     if (!HAS_PCH_NOP(i915)) {
> > +             sde_ier = raw_reg_read(regs, SDEIER);
> > +             raw_reg_write(regs, SDEIER, 0);
> >       }
> >  
> >       /* Find, clear, then process each source of interrupt */
> >  
> > -     gt_iir = I915_READ(GTIIR);
> > +     gt_iir = raw_reg_read(regs, GTIIR);
> >       if (gt_iir) {
> > -             I915_WRITE(GTIIR, gt_iir);
> > -             ret = IRQ_HANDLED;
> > -             if (INTEL_GEN(dev_priv) >= 6)
> > -                     gen6_gt_irq_handler(&dev_priv->gt, gt_iir);
> > +             raw_reg_write(regs, GTIIR, gt_iir);
> > +             if (INTEL_GEN(i915) >= 6)
> > +                     gen6_gt_irq_handler(&i915->gt, gt_iir);
> >               else
> > -                     gen5_gt_irq_handler(&dev_priv->gt, gt_iir);
> > +                     gen5_gt_irq_handler(&i915->gt, gt_iir);
> >       }
> >  
> > -     de_iir = I915_READ(DEIIR);
> > +     de_iir = raw_reg_read(regs, DEIIR);
> >       if (de_iir) {
> > -             I915_WRITE(DEIIR, de_iir);
> > -             ret = IRQ_HANDLED;
> > -             if (INTEL_GEN(dev_priv) >= 7)
> > -                     ivb_display_irq_handler(dev_priv, de_iir);
> > +             raw_reg_write(regs, DEIIR, de_iir);
> > +             if (INTEL_GEN(i915) >= 7)
> > +                     ivb_display_irq_handler(i915, de_iir);
> >               else
> > -                     ilk_display_irq_handler(dev_priv, de_iir);
> > +                     ilk_display_irq_handler(i915, de_iir);
> >       }
> >  
> > -     if (INTEL_GEN(dev_priv) >= 6) {
> > -             u32 pm_iir = I915_READ(GEN6_PMIIR);
> > +     if (INTEL_GEN(i915) >= 6) {
> > +             u32 pm_iir = raw_reg_read(regs, GEN6_PMIIR);
> >               if (pm_iir) {
> > -                     I915_WRITE(GEN6_PMIIR, pm_iir);
> > -                     ret = IRQ_HANDLED;
> > -                     gen6_rps_irq_handler(&dev_priv->gt.rps, pm_iir);
> > +                     raw_reg_write(regs, GEN6_PMIIR, pm_iir);
> > +                     gen6_rps_irq_handler(&i915->gt.rps, pm_iir);
> >               }
> >       }
> >  
> > -     I915_WRITE(DEIER, de_ier);
> > -     if (!HAS_PCH_NOP(dev_priv))
> > -             I915_WRITE(SDEIER, sde_ier);
> > +     raw_reg_write(regs, DEIER, de_ier);
> > +     if (sde_ier)
> > +             raw_reg_write(regs, SDEIER, sde_ier);
> >  
> >       /* IRQs are synced during runtime_suspend, we don't require a wakeref */
> > -     enable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
> > +     enable_rpm_wakeref_asserts(&i915->runtime_pm);
> >  
> > -     return ret;
> > +     return IRQ_HANDLED;
> 
> The change here is that if we catch an intr without any work, we now
> return handled instead of none.
> 
> And as we have not actually done anything to prevent the next
> one from kicking off, this is potentially dangerous.
> 
> But we do explicitly clear the enables; is that enough?

It's MSI-X; getting here means there was an interrupt. Let's check how
much it adds to track IRQ_HANDLED.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler
  2020-06-01 12:00     ` Chris Wilson
@ 2020-06-01 12:08       ` Chris Wilson
  0 siblings, 0 replies; 53+ messages in thread
From: Chris Wilson @ 2020-06-01 12:08 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2020-06-01 13:00:43)
> Quoting Mika Kuoppala (2020-06-01 12:49:49)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> > > Ever noticed that our interrupt handlers are where we spend most of our
> > > time on a busy system? In part this is unavoidable, as each interrupt
> > > requires us to poll and reset several registers, but we can try to do so
> > > as efficiently as possible.
> > >
> > > Function                                     old     new   delta
> > > ilk_irq_handler                             2317    2156    -161

With irqreturn_t ret,

ilk_irq_handler.cold                          63      72      +9
ilk_irq_handler                             2221    2083    -138

I love how it can change so much simply by recompiling the baseline.
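
For reference, the variant measured above is roughly just the new
raw_reg_* version with the old accumulator reinstated, e.g. (sketch
only, not the posted patch):

	irqreturn_t ret = IRQ_NONE;

	gt_iir = raw_reg_read(regs, GTIIR);
	if (gt_iir) {
		raw_reg_write(regs, GTIIR, gt_iir);
		ret = IRQ_HANDLED;	/* only claim the irq if we found work */
		...
	}

	/* similarly for DEIIR and GEN6_PMIIR */

	return ret;
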
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 12/36] drm/i915/gt: Track if an engine requires forcewake w/a
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 12/36] drm/i915/gt: Track if an engine requires forcewake w/a Chris Wilson
@ 2020-06-01 12:17   ` Mika Kuoppala
  0 siblings, 0 replies; 53+ messages in thread
From: Mika Kuoppala @ 2020-06-01 12:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Sometimes an engine might need to keep forcewake active while it is busy
> submitting requests, as required by a particular workaround. Track such a
> nuisance with engine->fw_domain.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_engine_types.h   | 9 +++++++++
>  drivers/gpu/drm/i915/gt/intel_ring_scheduler.c | 4 ++++
>  2 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 3782e27c2945..ccdd69923793 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -313,6 +313,15 @@ struct intel_engine_cs {
>  	u32 context_size;
>  	u32 mmio_base;
>  
> +	/*
> +	 * Some w/a require forcewake to be held (which prevents RC6) while
> +	 * a particular engine is active. If so, we set fw_domain to which
> +	 * domains need to be held for the duration of request activity,
> +	 * and 0 if none.
> +	 */
> +	unsigned int fw_domain;
> +	unsigned int fw_active;
> +
>  	unsigned long context_tag;
>  
>  	struct rb_node uabi_node;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
> index aaff554865b1..777cab6d9540 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_scheduler.c
> @@ -60,6 +60,8 @@ static struct i915_request *
>  schedule_in(struct intel_engine_cs *engine, struct i915_request *rq)
>  {
>  	__intel_gt_pm_get(engine->gt);
> +	if (!engine->fw_active++ && engine->fw_domain)
> +		intel_uncore_forcewake_get(engine->uncore, engine->fw_domain);
>  	intel_engine_context_in(engine);
>  	return i915_request_get(rq);
>  }
> @@ -74,6 +76,8 @@ schedule_out(struct intel_engine_cs *engine, struct i915_request *rq)
>  
>  	i915_request_put(rq);
>  	intel_engine_context_out(engine);
> +	if (!--engine->fw_active && engine->fw_domain)
> +		intel_uncore_forcewake_put(engine->uncore, engine->fw_domain);
>  	intel_gt_pm_put_async(engine->gt);
>  }
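
The consumer side is not in this patch; presumably a w/a would simply
pick its domain during engine setup, something like (hypothetical
illustration only, FORCEWAKE_RENDER chosen arbitrarily):

	/* hypothetical w/a: keep render forcewake (and so block rc6) while busy */
	engine->fw_domain = FORCEWAKE_RENDER;
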
>  
> -- 
> 2.20.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 13/36] drm/i915: Relinquish forcewake immediately after manual grouping
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 13/36] drm/i915: Relinquish forcewake immediately after manual grouping Chris Wilson
@ 2020-06-01 12:20   ` Mika Kuoppala
  0 siblings, 0 replies; 53+ messages in thread
From: Mika Kuoppala @ 2020-06-01 12:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Our forcewake utilisation is split into categories: automatic and
> manual. Around bare register reads, we look up the right forcewake
> domain and automatically acquire and release [upon a timer] the
> forcewake domain. For other access, where we know we require the
> forcewake across a group of register reads, we manually acquire the
> forcewake domain and release it at the end. Again, this currently arms
> the domain timer for a later release.
>
> However, looking at some energy utilisation profiles, we have tried to
> avoid using forcewake [and rely on the natural wake up to post register
> updates] because even keeping the fw active for a brief period
> contributes a significant power draw [i.e. when the gpu is sleeping
> with rc6 at high clocks]. But as it turns out, not posting the writes
> immediately also has unintended consequences, such as not reducing the
> clocks, and so not conserving power, while busy.
>
> As a compromise, let us only arm the domain timer for automatic
> forcewake usage around bare register access, but immediately release the
> forcewake when manually acquired by intel_uncore_forcewake_get/_put.
>
> The corollary to this is that we may instead have to take forcewake more
> often, and so incur a latency penalty in doing so. For Sandybridge this
> was significant, and even on the latest machines, taking forcewake at
> interrupt frequency has a huge impact. [So we don't do that anymore!
> Hopefully, this will spare us from still needing the mitigation of the
> timer for steady state execution.]
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

I am not a fan of having an explicit put rely on a timer,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
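
For context, the "manual grouping" case being changed is the explicit
get/put pattern around a batch of register accesses, roughly as below
(REG_A/REG_B are placeholders, not real registers):

	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);

	/* a group of accesses under a single explicit forcewake reference */
	val = intel_uncore_read_fw(uncore, REG_A);
	intel_uncore_write_fw(uncore, REG_B, val);

	/* after this patch the put releases forcewake immediately,
	 * rather than arming the domain timer */
	intel_uncore_forcewake_put(uncore, FORCEWAKE_ALL);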

> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index a61cb8ca4d50..7d6b9ae7403c 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -709,7 +709,7 @@ static void __intel_uncore_forcewake_put(struct intel_uncore *uncore,
>  			continue;
>  		}
>  
> -		fw_domain_arm_timer(domain);
> +		uncore->funcs.force_wake_put(uncore, domain->mask);
>  	}
>  }
>  
> -- 
> 2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters Chris Wilson
@ 2020-06-02  9:04   ` Mika Kuoppala
  0 siblings, 0 replies; 53+ messages in thread
From: Mika Kuoppala @ 2020-06-02  9:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Pull the routines for writing CS packets out of intel_ring_submission
> into their own files. These are low level operations for building CS
> instructions, rather than the logic for filling the global ring buffer
> with requests, and we will wnat to reuse them outside of this context.

*want.

Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
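
All of these emitters follow the same small pattern of reserving ring
space, writing raw command dwords and then committing them, roughly
(taken from the gen3 bb_start emitter below):

	cs = intel_ring_begin(rq, 2);		/* reserve 2 dwords in the ring */
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;	/* raw CS commands */
	*cs++ = offset;

	intel_ring_advance(rq, cs);		/* commit the emitted dwords */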

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/Makefile                 |   2 +
>  drivers/gpu/drm/i915/gt/gen2_engine_cs.c      | 340 +++++++
>  drivers/gpu/drm/i915/gt/gen2_engine_cs.h      |  38 +
>  drivers/gpu/drm/i915/gt/gen6_engine_cs.c      | 455 ++++++++++
>  drivers/gpu/drm/i915/gt/gen6_engine_cs.h      |  39 +
>  drivers/gpu/drm/i915/gt/intel_engine.h        |   1 -
>  .../gpu/drm/i915/gt/intel_ring_submission.c   | 832 +-----------------
>  7 files changed, 901 insertions(+), 806 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gt/gen2_engine_cs.c
>  create mode 100644 drivers/gpu/drm/i915/gt/gen2_engine_cs.h
>  create mode 100644 drivers/gpu/drm/i915/gt/gen6_engine_cs.c
>  create mode 100644 drivers/gpu/drm/i915/gt/gen6_engine_cs.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index b0da6ea6e3f1..41a27fd5dbc7 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -78,6 +78,8 @@ gt-y += \
>  	gt/debugfs_engines.o \
>  	gt/debugfs_gt.o \
>  	gt/debugfs_gt_pm.o \
> +	gt/gen2_engine_cs.o \
> +	gt/gen6_engine_cs.o \
>  	gt/gen6_ppgtt.o \
>  	gt/gen7_renderclear.o \
>  	gt/gen8_ppgtt.o \
> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.c b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
> new file mode 100644
> index 000000000000..8d2e85081247
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.c
> @@ -0,0 +1,340 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2020 Intel Corporation
> + */
> +
> +#include "gen2_engine_cs.h"
> +#include "i915_drv.h"
> +#include "intel_engine.h"
> +#include "intel_gpu_commands.h"
> +#include "intel_gt.h"
> +#include "intel_gt_irq.h"
> +#include "intel_ring.h"
> +
> +int gen2_emit_flush(struct i915_request *rq, u32 mode)
> +{
> +	unsigned int num_store_dw;
> +	u32 cmd, *cs;
> +
> +	cmd = MI_FLUSH;
> +	num_store_dw = 0;
> +	if (mode & EMIT_INVALIDATE)
> +		cmd |= MI_READ_FLUSH;
> +	if (mode & EMIT_FLUSH)
> +		num_store_dw = 4;
> +
> +	cs = intel_ring_begin(rq, 2 + 3 * num_store_dw);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = cmd;
> +	while (num_store_dw--) {
> +		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
> +		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> +						INTEL_GT_SCRATCH_FIELD_DEFAULT);
> +		*cs++ = 0;
> +	}
> +	*cs++ = MI_FLUSH | MI_NO_WRITE_FLUSH;
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int gen4_emit_flush_rcs(struct i915_request *rq, u32 mode)
> +{
> +	u32 cmd, *cs;
> +	int i;
> +
> +	/*
> +	 * read/write caches:
> +	 *
> +	 * I915_GEM_DOMAIN_RENDER is always invalidated, but is
> +	 * only flushed if MI_NO_WRITE_FLUSH is unset.  On 965, it is
> +	 * also flushed at 2d versus 3d pipeline switches.
> +	 *
> +	 * read-only caches:
> +	 *
> +	 * I915_GEM_DOMAIN_SAMPLER is flushed on pre-965 if
> +	 * MI_READ_FLUSH is set, and is always flushed on 965.
> +	 *
> +	 * I915_GEM_DOMAIN_COMMAND may not exist?
> +	 *
> +	 * I915_GEM_DOMAIN_INSTRUCTION, which exists on 965, is
> +	 * invalidated when MI_EXE_FLUSH is set.
> +	 *
> +	 * I915_GEM_DOMAIN_VERTEX, which exists on 965, is
> +	 * invalidated with every MI_FLUSH.
> +	 *
> +	 * TLBs:
> +	 *
> +	 * On 965, TLBs associated with I915_GEM_DOMAIN_COMMAND
> +	 * and I915_GEM_DOMAIN_CPU in are invalidated at PTE write and
> +	 * I915_GEM_DOMAIN_RENDER and I915_GEM_DOMAIN_SAMPLER
> +	 * are flushed at any MI_FLUSH.
> +	 */
> +
> +	cmd = MI_FLUSH;
> +	if (mode & EMIT_INVALIDATE) {
> +		cmd |= MI_EXE_FLUSH;
> +		if (IS_G4X(rq->i915) || IS_GEN(rq->i915, 5))
> +			cmd |= MI_INVALIDATE_ISP;
> +	}
> +
> +	i = 2;
> +	if (mode & EMIT_INVALIDATE)
> +		i += 20;
> +
> +	cs = intel_ring_begin(rq, i);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = cmd;
> +
> +	/*
> +	 * A random delay to let the CS invalidate take effect? Without this
> +	 * delay, the GPU relocation path fails as the CS does not see
> +	 * the updated contents. Just as important, if we apply the flushes
> +	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> +	 * write and before the invalidate on the next batch), the relocations
> +	 * still fail. This implies that is a delay following invalidation
> +	 * that is required to reset the caches as opposed to a delay to
> +	 * ensure the memory is written.
> +	 */
> +	if (mode & EMIT_INVALIDATE) {
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> +						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +
> +		for (i = 0; i < 12; i++)
> +			*cs++ = MI_FLUSH;
> +
> +		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> +		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> +						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
> +			PIPE_CONTROL_GLOBAL_GTT;
> +		*cs++ = 0;
> +		*cs++ = 0;
> +	}
> +
> +	*cs++ = cmd;
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int gen4_emit_flush_vcs(struct i915_request *rq, u32 mode)
> +{
> +	u32 *cs;
> +
> +	cs = intel_ring_begin(rq, 2);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = MI_FLUSH;
> +	*cs++ = MI_NOOP;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> +{
> +	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> +	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +
> +	*cs++ = MI_FLUSH;
> +
> +	*cs++ = MI_STORE_DWORD_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +
> +	return cs;
> +}
> +
> +#define GEN5_WA_STORES 8 /* must be at least 1! */
> +u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> +{
> +	int i;
> +
> +	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> +	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +
> +	*cs++ = MI_FLUSH;
> +
> +	BUILD_BUG_ON(GEN5_WA_STORES < 1);
> +	for (i = 0; i < GEN5_WA_STORES; i++) {
> +		*cs++ = MI_STORE_DWORD_INDEX;
> +		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +		*cs++ = rq->fence.seqno;
> +	}
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +
> +	return cs;
> +}
> +#undef GEN5_WA_STORES
> +
> +/* Just userspace ABI convention to limit the wa batch bo to a resonable size */
> +#define I830_BATCH_LIMIT SZ_256K
> +#define I830_TLB_ENTRIES (2)
> +#define I830_WA_SIZE max(I830_TLB_ENTRIES * SZ_4K, I830_BATCH_LIMIT)
> +int i830_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 len,
> +		       unsigned int dispatch_flags)
> +{
> +	u32 *cs, cs_offset =
> +		intel_gt_scratch_offset(rq->engine->gt,
> +					INTEL_GT_SCRATCH_FIELD_DEFAULT);
> +
> +	GEM_BUG_ON(rq->engine->gt->scratch->size < I830_WA_SIZE);
> +
> +	cs = intel_ring_begin(rq, 6);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	/* Evict the invalid PTE TLBs */
> +	*cs++ = COLOR_BLT_CMD | BLT_WRITE_RGBA;
> +	*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096;
> +	*cs++ = I830_TLB_ENTRIES << 16 | 4; /* load each page */
> +	*cs++ = cs_offset;
> +	*cs++ = 0xdeadbeef;
> +	*cs++ = MI_NOOP;
> +	intel_ring_advance(rq, cs);
> +
> +	if ((dispatch_flags & I915_DISPATCH_PINNED) == 0) {
> +		if (len > I830_BATCH_LIMIT)
> +			return -ENOSPC;
> +
> +		cs = intel_ring_begin(rq, 6 + 2);
> +		if (IS_ERR(cs))
> +			return PTR_ERR(cs);
> +
> +		/*
> +		 * Blit the batch (which has now all relocs applied) to the
> +		 * stable batch scratch bo area (so that the CS never
> +		 * stumbles over its tlb invalidation bug) ...
> +		 */
> +		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
> +		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | 4096;
> +		*cs++ = DIV_ROUND_UP(len, 4096) << 16 | 4096;
> +		*cs++ = cs_offset;
> +		*cs++ = 4096;
> +		*cs++ = offset;
> +
> +		*cs++ = MI_FLUSH;
> +		*cs++ = MI_NOOP;
> +		intel_ring_advance(rq, cs);
> +
> +		/* ... and execute it. */
> +		offset = cs_offset;
> +	}
> +
> +	if (!(dispatch_flags & I915_DISPATCH_SECURE))
> +		offset |= MI_BATCH_NON_SECURE;
> +
> +	cs = intel_ring_begin(rq, 2);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
> +	*cs++ = offset;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int gen3_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 len,
> +		       unsigned int dispatch_flags)
> +{
> +	u32 *cs;
> +
> +	if (!(dispatch_flags & I915_DISPATCH_SECURE))
> +		offset |= MI_BATCH_NON_SECURE;
> +
> +	cs = intel_ring_begin(rq, 2);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
> +	*cs++ = offset;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int gen4_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 length,
> +		       unsigned int dispatch_flags)
> +{
> +	u32 security;
> +	u32 *cs;
> +
> +	security = MI_BATCH_NON_SECURE_I965;
> +	if (dispatch_flags & I915_DISPATCH_SECURE)
> +		security = 0;
> +
> +	cs = intel_ring_begin(rq, 2);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT | security;
> +	*cs++ = offset;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +void gen2_irq_enable(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *i915 = engine->i915;
> +
> +	i915->irq_mask &= ~engine->irq_enable_mask;
> +	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
> +	ENGINE_POSTING_READ16(engine, RING_IMR);
> +}
> +
> +void gen2_irq_disable(struct intel_engine_cs *engine)
> +{
> +	struct drm_i915_private *i915 = engine->i915;
> +
> +	i915->irq_mask |= engine->irq_enable_mask;
> +	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
> +}
> +
> +void gen3_irq_enable(struct intel_engine_cs *engine)
> +{
> +	engine->i915->irq_mask &= ~engine->irq_enable_mask;
> +	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
> +	intel_uncore_posting_read_fw(engine->uncore, GEN2_IMR);
> +}
> +
> +void gen3_irq_disable(struct intel_engine_cs *engine)
> +{
> +	engine->i915->irq_mask |= engine->irq_enable_mask;
> +	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
> +}
> +
> +void gen5_irq_enable(struct intel_engine_cs *engine)
> +{
> +	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
> +}
> +
> +void gen5_irq_disable(struct intel_engine_cs *engine)
> +{
> +	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
> +}
> diff --git a/drivers/gpu/drm/i915/gt/gen2_engine_cs.h b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
> new file mode 100644
> index 000000000000..a5cd64a65c9e
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/gen2_engine_cs.h
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2020 Intel Corporation
> + */
> +
> +#ifndef __GEN2_ENGINE_CS_H__
> +#define __GEN2_ENGINE_CS_H__
> +
> +#include <linux/types.h>
> +
> +struct i915_request;
> +struct intel_engine_cs;
> +
> +int gen2_emit_flush(struct i915_request *rq, u32 mode);
> +int gen4_emit_flush_rcs(struct i915_request *rq, u32 mode);
> +int gen4_emit_flush_vcs(struct i915_request *rq, u32 mode);
> +
> +u32 *gen3_emit_breadcrumb(struct i915_request *rq, u32 *cs);
> +u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs);
> +
> +int i830_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 len,
> +		       unsigned int dispatch_flags);
> +int gen3_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 len,
> +		       unsigned int dispatch_flags);
> +int gen4_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 length,
> +		       unsigned int dispatch_flags);
> +
> +void gen2_irq_enable(struct intel_engine_cs *engine);
> +void gen2_irq_disable(struct intel_engine_cs *engine);
> +void gen3_irq_enable(struct intel_engine_cs *engine);
> +void gen3_irq_disable(struct intel_engine_cs *engine);
> +void gen5_irq_enable(struct intel_engine_cs *engine);
> +void gen5_irq_disable(struct intel_engine_cs *engine);
> +
> +#endif /* __GEN2_ENGINE_CS_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.c b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
> new file mode 100644
> index 000000000000..ce38d1bcaba3
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.c
> @@ -0,0 +1,455 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2020 Intel Corporation
> + */
> +
> +#include "gen6_engine_cs.h"
> +#include "intel_engine.h"
> +#include "intel_gpu_commands.h"
> +#include "intel_gt.h"
> +#include "intel_gt_irq.h"
> +#include "intel_gt_pm_irq.h"
> +#include "intel_ring.h"
> +
> +#define HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH * sizeof(u32))
> +
> +/*
> + * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> + * implementing two workarounds on gen6.  From section 1.4.7.1
> + * "PIPE_CONTROL" of the Sandy Bridge PRM volume 2 part 1:
> + *
> + * [DevSNB-C+{W/A}] Before any depth stall flush (including those
> + * produced by non-pipelined state commands), software needs to first
> + * send a PIPE_CONTROL with no bits set except Post-Sync Operation !=
> + * 0.
> + *
> + * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush Enable
> + * =1, a PIPE_CONTROL with any non-zero post-sync-op is required.
> + *
> + * And the workaround for these two requires this workaround first:
> + *
> + * [Dev-SNB{W/A}]: Pipe-control with CS-stall bit set must be sent
> + * BEFORE the pipe-control with a post-sync op and no write-cache
> + * flushes.
> + *
> + * And this last workaround is tricky because of the requirements on
> + * that bit.  From section 1.4.7.2.3 "Stall" of the Sandy Bridge PRM
> + * volume 2 part 1:
> + *
> + *     "1 of the following must also be set:
> + *      - Render Target Cache Flush Enable ([12] of DW1)
> + *      - Depth Cache Flush Enable ([0] of DW1)
> + *      - Stall at Pixel Scoreboard ([1] of DW1)
> + *      - Depth Stall ([13] of DW1)
> + *      - Post-Sync Operation ([13] of DW1)
> + *      - Notify Enable ([8] of DW1)"
> + *
> + * The cache flushes require the workaround flush that triggered this
> + * one, so we can't use it.  Depth stall would trigger the same.
> + * Post-sync nonzero is what triggered this second workaround, so we
> + * can't use that one either.  Notify enable is IRQs, which aren't
> + * really our business.  That leaves only stall at scoreboard.
> + */
> +static int
> +gen6_emit_post_sync_nonzero_flush(struct i915_request *rq)
> +{
> +	u32 scratch_addr =
> +		intel_gt_scratch_offset(rq->engine->gt,
> +					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
> +	u32 *cs;
> +
> +	cs = intel_ring_begin(rq, 6);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(5);
> +	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> +	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = 0; /* low dword */
> +	*cs++ = 0; /* high dword */
> +	*cs++ = MI_NOOP;
> +	intel_ring_advance(rq, cs);
> +
> +	cs = intel_ring_begin(rq, 6);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(5);
> +	*cs++ = PIPE_CONTROL_QW_WRITE;
> +	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = 0;
> +	*cs++ = 0;
> +	*cs++ = MI_NOOP;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int gen6_emit_flush_rcs(struct i915_request *rq, u32 mode)
> +{
> +	u32 scratch_addr =
> +		intel_gt_scratch_offset(rq->engine->gt,
> +					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
> +	u32 *cs, flags = 0;
> +	int ret;
> +
> +	/* Force SNB workarounds for PIPE_CONTROL flushes */
> +	ret = gen6_emit_post_sync_nonzero_flush(rq);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Just flush everything.  Experiments have shown that reducing the
> +	 * number of bits based on the write domains has little performance
> +	 * impact. And when rearranging requests, the order of flushes is
> +	 * unknown.
> +	 */
> +	if (mode & EMIT_FLUSH) {
> +		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> +		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +		/*
> +		 * Ensure that any following seqno writes only happen
> +		 * when the render cache is indeed flushed.
> +		 */
> +		flags |= PIPE_CONTROL_CS_STALL;
> +	}
> +	if (mode & EMIT_INVALIDATE) {
> +		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> +		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> +		/*
> +		 * TLB invalidate requires a post-sync write.
> +		 */
> +		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> +	}
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = flags;
> +	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = 0;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
> +{
> +	/* First we do the gen6_emit_post_sync_nonzero_flush w/a */
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> +	*cs++ = 0;
> +	*cs++ = 0;
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = PIPE_CONTROL_QW_WRITE;
> +	*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> +					INTEL_GT_SCRATCH_FIELD_DEFAULT) |
> +		PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = 0;
> +
> +	/* Finally we can flush and with it emit the breadcrumb */
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> +		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> +		 PIPE_CONTROL_DC_FLUSH_ENABLE |
> +		 PIPE_CONTROL_QW_WRITE |
> +		 PIPE_CONTROL_CS_STALL);
> +	*cs++ = i915_request_active_timeline(rq)->hwsp_offset |
> +		PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +
> +	return cs;
> +}
> +
> +static int mi_flush_dw(struct i915_request *rq, u32 flags)
> +{
> +	u32 cmd, *cs;
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	cmd = MI_FLUSH_DW;
> +
> +	/*
> +	 * We always require a command barrier so that subsequent
> +	 * commands, such as breadcrumb interrupts, are strictly ordered
> +	 * wrt the contents of the write cache being flushed to memory
> +	 * (and thus being coherent from the CPU).
> +	 */
> +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
> +	/*
> +	 * Bspec vol 1c.3 - blitter engine command streamer:
> +	 * "If ENABLED, all TLBs will be invalidated once the flush
> +	 * operation is complete. This bit is only valid when the
> +	 * Post-Sync Operation field is a value of 1h or 3h."
> +	 */
> +	cmd |= flags;
> +
> +	*cs++ = cmd;
> +	*cs++ = HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = 0;
> +	*cs++ = MI_NOOP;
> +
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +static int gen6_flush_dw(struct i915_request *rq, u32 mode, u32 invflags)
> +{
> +	return mi_flush_dw(rq, mode & EMIT_INVALIDATE ? invflags : 0);
> +}
> +
> +int gen6_emit_flush_xcs(struct i915_request *rq, u32 mode)
> +{
> +	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB);
> +}
> +
> +int gen6_emit_flush_vcs(struct i915_request *rq, u32 mode)
> +{
> +	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB | MI_INVALIDATE_BSD);
> +}
> +
> +int gen6_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 len,
> +		       unsigned int dispatch_flags)
> +{
> +	u32 security;
> +	u32 *cs;
> +
> +	security = MI_BATCH_NON_SECURE_I965;
> +	if (dispatch_flags & I915_DISPATCH_SECURE)
> +		security = 0;
> +
> +	cs = intel_ring_begin(rq, 2);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	cs = __gen6_emit_bb_start(cs, offset, security);
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int
> +hsw_emit_bb_start(struct i915_request *rq,
> +		  u64 offset, u32 len,
> +		  unsigned int dispatch_flags)
> +{
> +	u32 security;
> +	u32 *cs;
> +
> +	security = MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW;
> +	if (dispatch_flags & I915_DISPATCH_SECURE)
> +		security = 0;
> +
> +	cs = intel_ring_begin(rq, 2);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	cs = __gen6_emit_bb_start(cs, offset, security);
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +static int gen7_stall_cs(struct i915_request *rq)
> +{
> +	u32 *cs;
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> +	*cs++ = 0;
> +	*cs++ = 0;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +int gen7_emit_flush_rcs(struct i915_request *rq, u32 mode)
> +{
> +	u32 scratch_addr =
> +		intel_gt_scratch_offset(rq->engine->gt,
> +					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
> +	u32 *cs, flags = 0;
> +
> +	/*
> +	 * Ensure that any following seqno writes only happen when the render
> +	 * cache is indeed flushed.
> +	 *
> +	 * Workaround: 4th PIPE_CONTROL command (except the ones with only
> +	 * read-cache invalidate bits set) must have the CS_STALL bit set. We
> +	 * don't try to be clever and just set it unconditionally.
> +	 */
> +	flags |= PIPE_CONTROL_CS_STALL;
> +
> +	/*
> +	 * CS_STALL suggests at least a post-sync write.
> +	 */
> +	flags |= PIPE_CONTROL_QW_WRITE;
> +	flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> +
> +	/*
> +	 * Just flush everything.  Experiments have shown that reducing the
> +	 * number of bits based on the write domains has little performance
> +	 * impact.
> +	 */
> +	if (mode & EMIT_FLUSH) {
> +		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> +		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +		flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
> +		flags |= PIPE_CONTROL_FLUSH_ENABLE;
> +	}
> +	if (mode & EMIT_INVALIDATE) {
> +		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> +		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> +		flags |= PIPE_CONTROL_MEDIA_STATE_CLEAR;
> +
> +		/*
> +		 * Workaround: we must issue a pipe_control with CS-stall bit
> +		 * set before a pipe_control command that has the state cache
> +		 * invalidate bit set.
> +		 */
> +		gen7_stall_cs(rq);
> +	}
> +
> +	cs = intel_ring_begin(rq, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = flags;
> +	*cs++ = scratch_addr;
> +	*cs++ = 0;
> +	intel_ring_advance(rq, cs);
> +
> +	return 0;
> +}
> +
> +u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
> +{
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> +		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> +		 PIPE_CONTROL_DC_FLUSH_ENABLE |
> +		 PIPE_CONTROL_FLUSH_ENABLE |
> +		 PIPE_CONTROL_QW_WRITE |
> +		 PIPE_CONTROL_GLOBAL_GTT_IVB |
> +		 PIPE_CONTROL_CS_STALL);
> +	*cs++ = i915_request_active_timeline(rq)->hwsp_offset;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +
> +	return cs;
> +}
> +
> +u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
> +{
> +	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> +	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +
> +	return cs;
> +}
> +
> +#define GEN7_XCS_WA 32
> +u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
> +{
> +	int i;
> +
> +	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> +	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> +
> +	*cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
> +		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = rq->fence.seqno;
> +
> +	for (i = 0; i < GEN7_XCS_WA; i++) {
> +		*cs++ = MI_STORE_DWORD_INDEX;
> +		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +		*cs++ = rq->fence.seqno;
> +	}
> +
> +	*cs++ = MI_FLUSH_DW;
> +	*cs++ = 0;
> +	*cs++ = 0;
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +
> +	return cs;
> +}
> +#undef GEN7_XCS_WA
> +
> +void gen6_irq_enable(struct intel_engine_cs *engine)
> +{
> +	ENGINE_WRITE(engine, RING_IMR,
> +		     ~(engine->irq_enable_mask | engine->irq_keep_mask));
> +
> +	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
> +	ENGINE_POSTING_READ(engine, RING_IMR);
> +
> +	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
> +}
> +
> +void gen6_irq_disable(struct intel_engine_cs *engine)
> +{
> +	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
> +	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
> +}
> +
> +void hsw_irq_enable_vecs(struct intel_engine_cs *engine)
> +{
> +	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_enable_mask);
> +
> +	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
> +	ENGINE_POSTING_READ(engine, RING_IMR);
> +
> +	gen6_gt_pm_unmask_irq(engine->gt, engine->irq_enable_mask);
> +}
> +
> +void hsw_irq_disable_vecs(struct intel_engine_cs *engine)
> +{
> +	ENGINE_WRITE(engine, RING_IMR, ~0);
> +	gen6_gt_pm_mask_irq(engine->gt, engine->irq_enable_mask);
> +}
> diff --git a/drivers/gpu/drm/i915/gt/gen6_engine_cs.h b/drivers/gpu/drm/i915/gt/gen6_engine_cs.h
> new file mode 100644
> index 000000000000..76c6bc9f3bde
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/gen6_engine_cs.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2020 Intel Corporation
> + */
> +
> +#ifndef __GEN6_ENGINE_CS_H__
> +#define __GEN6_ENGINE_CS_H__
> +
> +#include <linux/types.h>
> +
> +#include "intel_gpu_commands.h"
> +
> +struct i915_request;
> +struct intel_engine_cs;
> +
> +int gen6_emit_flush_rcs(struct i915_request *rq, u32 mode);
> +int gen6_emit_flush_vcs(struct i915_request *rq, u32 mode);
> +int gen6_emit_flush_xcs(struct i915_request *rq, u32 mode);
> +u32 *gen6_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs);
> +u32 *gen6_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
> +
> +int gen7_emit_flush_rcs(struct i915_request *rq, u32 mode);
> +u32 *gen7_emit_breadcrumb_rcs(struct i915_request *rq, u32 *cs);
> +u32 *gen7_emit_breadcrumb_xcs(struct i915_request *rq, u32 *cs);
> +
> +int gen6_emit_bb_start(struct i915_request *rq,
> +		       u64 offset, u32 len,
> +		       unsigned int dispatch_flags);
> +int hsw_emit_bb_start(struct i915_request *rq,
> +		      u64 offset, u32 len,
> +		      unsigned int dispatch_flags);
> +
> +void gen6_irq_enable(struct intel_engine_cs *engine);
> +void gen6_irq_disable(struct intel_engine_cs *engine);
> +
> +void hsw_irq_enable_vecs(struct intel_engine_cs *engine);
> +void hsw_irq_disable_vecs(struct intel_engine_cs *engine);
> +
> +#endif /* __GEN6_ENGINE_CS_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 9bf6d4989968..791897f8d847 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -187,7 +187,6 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>  #define I915_GEM_HWS_SEQNO		0x40
>  #define I915_GEM_HWS_SEQNO_ADDR		(I915_GEM_HWS_SEQNO * sizeof(u32))
>  #define I915_GEM_HWS_SCRATCH		0x80
> -#define I915_GEM_HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH * sizeof(u32))
>  
>  #define I915_HWS_CSB_BUF0_INDEX		0x10
>  #define I915_HWS_CSB_WRITE_INDEX	0x1f
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index ca7286e58409..96881cd8b17b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -27,21 +27,15 @@
>   *
>   */
>  
> -#include <linux/log2.h>
> -
> -#include "gem/i915_gem_context.h"
> -
> +#include "gen2_engine_cs.h"
> +#include "gen6_engine_cs.h"
>  #include "gen6_ppgtt.h"
>  #include "gen7_renderclear.h"
>  #include "i915_drv.h"
> -#include "i915_trace.h"
>  #include "intel_context.h"
>  #include "intel_gt.h"
> -#include "intel_gt_irq.h"
> -#include "intel_gt_pm_irq.h"
>  #include "intel_reset.h"
>  #include "intel_ring.h"
> -#include "intel_workarounds.h"
>  #include "shmem_utils.h"
>  
>  /* Rough estimate of the typical request size, performing a flush,
> @@ -49,436 +43,6 @@
>   */
>  #define LEGACY_REQUEST_SIZE 200
>  
> -static int
> -gen2_render_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	unsigned int num_store_dw;
> -	u32 cmd, *cs;
> -
> -	cmd = MI_FLUSH;
> -	num_store_dw = 0;
> -	if (mode & EMIT_INVALIDATE)
> -		cmd |= MI_READ_FLUSH;
> -	if (mode & EMIT_FLUSH)
> -		num_store_dw = 4;
> -
> -	cs = intel_ring_begin(rq, 2 + 3 * num_store_dw);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = cmd;
> -	while (num_store_dw--) {
> -		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
> -		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> -						INTEL_GT_SCRATCH_FIELD_DEFAULT);
> -		*cs++ = 0;
> -	}
> -	*cs++ = MI_FLUSH | MI_NO_WRITE_FLUSH;
> -
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static int
> -gen4_render_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	u32 cmd, *cs;
> -	int i;
> -
> -	/*
> -	 * read/write caches:
> -	 *
> -	 * I915_GEM_DOMAIN_RENDER is always invalidated, but is
> -	 * only flushed if MI_NO_WRITE_FLUSH is unset.  On 965, it is
> -	 * also flushed at 2d versus 3d pipeline switches.
> -	 *
> -	 * read-only caches:
> -	 *
> -	 * I915_GEM_DOMAIN_SAMPLER is flushed on pre-965 if
> -	 * MI_READ_FLUSH is set, and is always flushed on 965.
> -	 *
> -	 * I915_GEM_DOMAIN_COMMAND may not exist?
> -	 *
> -	 * I915_GEM_DOMAIN_INSTRUCTION, which exists on 965, is
> -	 * invalidated when MI_EXE_FLUSH is set.
> -	 *
> -	 * I915_GEM_DOMAIN_VERTEX, which exists on 965, is
> -	 * invalidated with every MI_FLUSH.
> -	 *
> -	 * TLBs:
> -	 *
> -	 * On 965, TLBs associated with I915_GEM_DOMAIN_COMMAND
> -	 * and I915_GEM_DOMAIN_CPU in are invalidated at PTE write and
> -	 * I915_GEM_DOMAIN_RENDER and I915_GEM_DOMAIN_SAMPLER
> -	 * are flushed at any MI_FLUSH.
> -	 */
> -
> -	cmd = MI_FLUSH;
> -	if (mode & EMIT_INVALIDATE) {
> -		cmd |= MI_EXE_FLUSH;
> -		if (IS_G4X(rq->i915) || IS_GEN(rq->i915, 5))
> -			cmd |= MI_INVALIDATE_ISP;
> -	}
> -
> -	i = 2;
> -	if (mode & EMIT_INVALIDATE)
> -		i += 20;
> -
> -	cs = intel_ring_begin(rq, i);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = cmd;
> -
> -	/*
> -	 * A random delay to let the CS invalidate take effect? Without this
> -	 * delay, the GPU relocation path fails as the CS does not see
> -	 * the updated contents. Just as important, if we apply the flushes
> -	 * to the EMIT_FLUSH branch (i.e. immediately after the relocation
> -	 * write and before the invalidate on the next batch), the relocations
> -	 * still fail. This implies that is a delay following invalidation
> -	 * that is required to reset the caches as opposed to a delay to
> -	 * ensure the memory is written.
> -	 */
> -	if (mode & EMIT_INVALIDATE) {
> -		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> -		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> -						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
> -			PIPE_CONTROL_GLOBAL_GTT;
> -		*cs++ = 0;
> -		*cs++ = 0;
> -
> -		for (i = 0; i < 12; i++)
> -			*cs++ = MI_FLUSH;
> -
> -		*cs++ = GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE;
> -		*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> -						INTEL_GT_SCRATCH_FIELD_DEFAULT) |
> -			PIPE_CONTROL_GLOBAL_GTT;
> -		*cs++ = 0;
> -		*cs++ = 0;
> -	}
> -
> -	*cs++ = cmd;
> -
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -/*
> - * Emits a PIPE_CONTROL with a non-zero post-sync operation, for
> - * implementing two workarounds on gen6.  From section 1.4.7.1
> - * "PIPE_CONTROL" of the Sandy Bridge PRM volume 2 part 1:
> - *
> - * [DevSNB-C+{W/A}] Before any depth stall flush (including those
> - * produced by non-pipelined state commands), software needs to first
> - * send a PIPE_CONTROL with no bits set except Post-Sync Operation !=
> - * 0.
> - *
> - * [Dev-SNB{W/A}]: Before a PIPE_CONTROL with Write Cache Flush Enable
> - * =1, a PIPE_CONTROL with any non-zero post-sync-op is required.
> - *
> - * And the workaround for these two requires this workaround first:
> - *
> - * [Dev-SNB{W/A}]: Pipe-control with CS-stall bit set must be sent
> - * BEFORE the pipe-control with a post-sync op and no write-cache
> - * flushes.
> - *
> - * And this last workaround is tricky because of the requirements on
> - * that bit.  From section 1.4.7.2.3 "Stall" of the Sandy Bridge PRM
> - * volume 2 part 1:
> - *
> - *     "1 of the following must also be set:
> - *      - Render Target Cache Flush Enable ([12] of DW1)
> - *      - Depth Cache Flush Enable ([0] of DW1)
> - *      - Stall at Pixel Scoreboard ([1] of DW1)
> - *      - Depth Stall ([13] of DW1)
> - *      - Post-Sync Operation ([13] of DW1)
> - *      - Notify Enable ([8] of DW1)"
> - *
> - * The cache flushes require the workaround flush that triggered this
> - * one, so we can't use it.  Depth stall would trigger the same.
> - * Post-sync nonzero is what triggered this second workaround, so we
> - * can't use that one either.  Notify enable is IRQs, which aren't
> - * really our business.  That leaves only stall at scoreboard.
> - */
> -static int
> -gen6_emit_post_sync_nonzero_flush(struct i915_request *rq)
> -{
> -	u32 scratch_addr =
> -		intel_gt_scratch_offset(rq->engine->gt,
> -					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 6);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = GFX_OP_PIPE_CONTROL(5);
> -	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> -	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
> -	*cs++ = 0; /* low dword */
> -	*cs++ = 0; /* high dword */
> -	*cs++ = MI_NOOP;
> -	intel_ring_advance(rq, cs);
> -
> -	cs = intel_ring_begin(rq, 6);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = GFX_OP_PIPE_CONTROL(5);
> -	*cs++ = PIPE_CONTROL_QW_WRITE;
> -	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
> -	*cs++ = 0;
> -	*cs++ = 0;
> -	*cs++ = MI_NOOP;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static int
> -gen6_render_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	u32 scratch_addr =
> -		intel_gt_scratch_offset(rq->engine->gt,
> -					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
> -	u32 *cs, flags = 0;
> -	int ret;
> -
> -	/* Force SNB workarounds for PIPE_CONTROL flushes */
> -	ret = gen6_emit_post_sync_nonzero_flush(rq);
> -	if (ret)
> -		return ret;
> -
> -	/* Just flush everything.  Experiments have shown that reducing the
> -	 * number of bits based on the write domains has little performance
> -	 * impact.
> -	 */
> -	if (mode & EMIT_FLUSH) {
> -		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> -		/*
> -		 * Ensure that any following seqno writes only happen
> -		 * when the render cache is indeed flushed.
> -		 */
> -		flags |= PIPE_CONTROL_CS_STALL;
> -	}
> -	if (mode & EMIT_INVALIDATE) {
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> -		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> -		/*
> -		 * TLB invalidate requires a post-sync write.
> -		 */
> -		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> -	}
> -
> -	cs = intel_ring_begin(rq, 4);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = flags;
> -	*cs++ = scratch_addr | PIPE_CONTROL_GLOBAL_GTT;
> -	*cs++ = 0;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> -{
> -	/* First we do the gen6_emit_post_sync_nonzero_flush w/a */
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> -	*cs++ = 0;
> -	*cs++ = 0;
> -
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = PIPE_CONTROL_QW_WRITE;
> -	*cs++ = intel_gt_scratch_offset(rq->engine->gt,
> -					INTEL_GT_SCRATCH_FIELD_DEFAULT) |
> -		PIPE_CONTROL_GLOBAL_GTT;
> -	*cs++ = 0;
> -
> -	/* Finally we can flush and with it emit the breadcrumb */
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> -		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> -		 PIPE_CONTROL_DC_FLUSH_ENABLE |
> -		 PIPE_CONTROL_QW_WRITE |
> -		 PIPE_CONTROL_CS_STALL);
> -	*cs++ = i915_request_active_timeline(rq)->hwsp_offset |
> -		PIPE_CONTROL_GLOBAL_GTT;
> -	*cs++ = rq->fence.seqno;
> -
> -	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> -
> -	rq->tail = intel_ring_offset(rq, cs);
> -	assert_ring_tail_valid(rq->ring, rq->tail);
> -
> -	return cs;
> -}
> -
> -static int
> -gen7_render_ring_cs_stall_wa(struct i915_request *rq)
> -{
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 4);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD;
> -	*cs++ = 0;
> -	*cs++ = 0;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static int
> -gen7_render_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	u32 scratch_addr =
> -		intel_gt_scratch_offset(rq->engine->gt,
> -					INTEL_GT_SCRATCH_FIELD_RENDER_FLUSH);
> -	u32 *cs, flags = 0;
> -
> -	/*
> -	 * Ensure that any following seqno writes only happen when the render
> -	 * cache is indeed flushed.
> -	 *
> -	 * Workaround: 4th PIPE_CONTROL command (except the ones with only
> -	 * read-cache invalidate bits set) must have the CS_STALL bit set. We
> -	 * don't try to be clever and just set it unconditionally.
> -	 */
> -	flags |= PIPE_CONTROL_CS_STALL;
> -
> -	/*
> -	 * CS_STALL suggests at least a post-sync write.
> -	 */
> -	flags |= PIPE_CONTROL_QW_WRITE;
> -	flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> -
> -	/* Just flush everything.  Experiments have shown that reducing the
> -	 * number of bits based on the write domains has little performance
> -	 * impact.
> -	 */
> -	if (mode & EMIT_FLUSH) {
> -		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> -		flags |= PIPE_CONTROL_DC_FLUSH_ENABLE;
> -		flags |= PIPE_CONTROL_FLUSH_ENABLE;
> -	}
> -	if (mode & EMIT_INVALIDATE) {
> -		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> -		flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_VF_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_CONST_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_STATE_CACHE_INVALIDATE;
> -		flags |= PIPE_CONTROL_MEDIA_STATE_CLEAR;
> -
> -		/* Workaround: we must issue a pipe_control with CS-stall bit
> -		 * set before a pipe_control command that has the state cache
> -		 * invalidate bit set. */
> -		gen7_render_ring_cs_stall_wa(rq);
> -	}
> -
> -	cs = intel_ring_begin(rq, 4);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = flags;
> -	*cs++ = scratch_addr;
> -	*cs++ = 0;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> -{
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = (PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> -		 PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> -		 PIPE_CONTROL_DC_FLUSH_ENABLE |
> -		 PIPE_CONTROL_FLUSH_ENABLE |
> -		 PIPE_CONTROL_QW_WRITE |
> -		 PIPE_CONTROL_GLOBAL_GTT_IVB |
> -		 PIPE_CONTROL_CS_STALL);
> -	*cs++ = i915_request_active_timeline(rq)->hwsp_offset;
> -	*cs++ = rq->fence.seqno;
> -
> -	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> -
> -	rq->tail = intel_ring_offset(rq, cs);
> -	assert_ring_tail_valid(rq->ring, rq->tail);
> -
> -	return cs;
> -}
> -
> -static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> -{
> -	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> -
> -	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> -	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> -	*cs++ = rq->fence.seqno;
> -
> -	*cs++ = MI_USER_INTERRUPT;
> -
> -	rq->tail = intel_ring_offset(rq, cs);
> -	assert_ring_tail_valid(rq->ring, rq->tail);
> -
> -	return cs;
> -}
> -
> -#define GEN7_XCS_WA 32
> -static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> -{
> -	int i;
> -
> -	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> -
> -	*cs++ = MI_FLUSH_DW | MI_INVALIDATE_TLB |
> -		MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> -	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> -	*cs++ = rq->fence.seqno;
> -
> -	for (i = 0; i < GEN7_XCS_WA; i++) {
> -		*cs++ = MI_STORE_DWORD_INDEX;
> -		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> -		*cs++ = rq->fence.seqno;
> -	}
> -
> -	*cs++ = MI_FLUSH_DW;
> -	*cs++ = 0;
> -	*cs++ = 0;
> -
> -	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> -
> -	rq->tail = intel_ring_offset(rq, cs);
> -	assert_ring_tail_valid(rq->ring, rq->tail);
> -
> -	return cs;
> -}
> -#undef GEN7_XCS_WA
> -
>  static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
>  {
>  	/*
> @@ -918,255 +482,6 @@ static void i9xx_submit_request(struct i915_request *request)
>  		     intel_ring_set_tail(request->ring, request->tail));
>  }
>  
> -static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> -{
> -	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> -
> -	*cs++ = MI_FLUSH;
> -
> -	*cs++ = MI_STORE_DWORD_INDEX;
> -	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> -	*cs++ = rq->fence.seqno;
> -
> -	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> -
> -	rq->tail = intel_ring_offset(rq, cs);
> -	assert_ring_tail_valid(rq->ring, rq->tail);
> -
> -	return cs;
> -}
> -
> -#define GEN5_WA_STORES 8 /* must be at least 1! */
> -static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> -{
> -	int i;
> -
> -	GEM_BUG_ON(i915_request_active_timeline(rq)->hwsp_ggtt != rq->engine->status_page.vma);
> -	GEM_BUG_ON(offset_in_page(i915_request_active_timeline(rq)->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
> -
> -	*cs++ = MI_FLUSH;
> -
> -	BUILD_BUG_ON(GEN5_WA_STORES < 1);
> -	for (i = 0; i < GEN5_WA_STORES; i++) {
> -		*cs++ = MI_STORE_DWORD_INDEX;
> -		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> -		*cs++ = rq->fence.seqno;
> -	}
> -
> -	*cs++ = MI_USER_INTERRUPT;
> -
> -	rq->tail = intel_ring_offset(rq, cs);
> -	assert_ring_tail_valid(rq->ring, rq->tail);
> -
> -	return cs;
> -}
> -#undef GEN5_WA_STORES
> -
> -static void
> -gen5_irq_enable(struct intel_engine_cs *engine)
> -{
> -	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
> -}
> -
> -static void
> -gen5_irq_disable(struct intel_engine_cs *engine)
> -{
> -	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
> -}
> -
> -static void
> -i9xx_irq_enable(struct intel_engine_cs *engine)
> -{
> -	engine->i915->irq_mask &= ~engine->irq_enable_mask;
> -	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
> -	intel_uncore_posting_read_fw(engine->uncore, GEN2_IMR);
> -}
> -
> -static void
> -i9xx_irq_disable(struct intel_engine_cs *engine)
> -{
> -	engine->i915->irq_mask |= engine->irq_enable_mask;
> -	intel_uncore_write(engine->uncore, GEN2_IMR, engine->i915->irq_mask);
> -}
> -
> -static void
> -i8xx_irq_enable(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *i915 = engine->i915;
> -
> -	i915->irq_mask &= ~engine->irq_enable_mask;
> -	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
> -	ENGINE_POSTING_READ16(engine, RING_IMR);
> -}
> -
> -static void
> -i8xx_irq_disable(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *i915 = engine->i915;
> -
> -	i915->irq_mask |= engine->irq_enable_mask;
> -	intel_uncore_write16(&i915->uncore, GEN2_IMR, i915->irq_mask);
> -}
> -
> -static int
> -bsd_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 2);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_FLUSH;
> -	*cs++ = MI_NOOP;
> -	intel_ring_advance(rq, cs);
> -	return 0;
> -}
> -
> -static void
> -gen6_irq_enable(struct intel_engine_cs *engine)
> -{
> -	ENGINE_WRITE(engine, RING_IMR,
> -		     ~(engine->irq_enable_mask | engine->irq_keep_mask));
> -
> -	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
> -	ENGINE_POSTING_READ(engine, RING_IMR);
> -
> -	gen5_gt_enable_irq(engine->gt, engine->irq_enable_mask);
> -}
> -
> -static void
> -gen6_irq_disable(struct intel_engine_cs *engine)
> -{
> -	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
> -	gen5_gt_disable_irq(engine->gt, engine->irq_enable_mask);
> -}
> -
> -static void
> -hsw_vebox_irq_enable(struct intel_engine_cs *engine)
> -{
> -	ENGINE_WRITE(engine, RING_IMR, ~engine->irq_enable_mask);
> -
> -	/* Flush/delay to ensure the RING_IMR is active before the GT IMR */
> -	ENGINE_POSTING_READ(engine, RING_IMR);
> -
> -	gen6_gt_pm_unmask_irq(engine->gt, engine->irq_enable_mask);
> -}
> -
> -static void
> -hsw_vebox_irq_disable(struct intel_engine_cs *engine)
> -{
> -	ENGINE_WRITE(engine, RING_IMR, ~0);
> -	gen6_gt_pm_mask_irq(engine->gt, engine->irq_enable_mask);
> -}
> -
> -static int
> -i965_emit_bb_start(struct i915_request *rq,
> -		   u64 offset, u32 length,
> -		   unsigned int dispatch_flags)
> -{
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 2);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT | (dispatch_flags &
> -		I915_DISPATCH_SECURE ? 0 : MI_BATCH_NON_SECURE_I965);
> -	*cs++ = offset;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -/* Just userspace ABI convention to limit the wa batch bo to a resonable size */
> -#define I830_BATCH_LIMIT SZ_256K
> -#define I830_TLB_ENTRIES (2)
> -#define I830_WA_SIZE max(I830_TLB_ENTRIES*4096, I830_BATCH_LIMIT)
> -static int
> -i830_emit_bb_start(struct i915_request *rq,
> -		   u64 offset, u32 len,
> -		   unsigned int dispatch_flags)
> -{
> -	u32 *cs, cs_offset =
> -		intel_gt_scratch_offset(rq->engine->gt,
> -					INTEL_GT_SCRATCH_FIELD_DEFAULT);
> -
> -	GEM_BUG_ON(rq->engine->gt->scratch->size < I830_WA_SIZE);
> -
> -	cs = intel_ring_begin(rq, 6);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	/* Evict the invalid PTE TLBs */
> -	*cs++ = COLOR_BLT_CMD | BLT_WRITE_RGBA;
> -	*cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | 4096;
> -	*cs++ = I830_TLB_ENTRIES << 16 | 4; /* load each page */
> -	*cs++ = cs_offset;
> -	*cs++ = 0xdeadbeef;
> -	*cs++ = MI_NOOP;
> -	intel_ring_advance(rq, cs);
> -
> -	if ((dispatch_flags & I915_DISPATCH_PINNED) == 0) {
> -		if (len > I830_BATCH_LIMIT)
> -			return -ENOSPC;
> -
> -		cs = intel_ring_begin(rq, 6 + 2);
> -		if (IS_ERR(cs))
> -			return PTR_ERR(cs);
> -
> -		/* Blit the batch (which has now all relocs applied) to the
> -		 * stable batch scratch bo area (so that the CS never
> -		 * stumbles over its tlb invalidation bug) ...
> -		 */
> -		*cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2);
> -		*cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | 4096;
> -		*cs++ = DIV_ROUND_UP(len, 4096) << 16 | 4096;
> -		*cs++ = cs_offset;
> -		*cs++ = 4096;
> -		*cs++ = offset;
> -
> -		*cs++ = MI_FLUSH;
> -		*cs++ = MI_NOOP;
> -		intel_ring_advance(rq, cs);
> -
> -		/* ... and execute it. */
> -		offset = cs_offset;
> -	}
> -
> -	cs = intel_ring_begin(rq, 2);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
> -	*cs++ = offset | (dispatch_flags & I915_DISPATCH_SECURE ? 0 :
> -		MI_BATCH_NON_SECURE);
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static int
> -i915_emit_bb_start(struct i915_request *rq,
> -		   u64 offset, u32 len,
> -		   unsigned int dispatch_flags)
> -{
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 2);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
> -	*cs++ = offset | (dispatch_flags & I915_DISPATCH_SECURE ? 0 :
> -		MI_BATCH_NON_SECURE);
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
>  static void __ring_context_fini(struct intel_context *ce)
>  {
>  	i915_vma_put(ce->state);
> @@ -1704,99 +1019,6 @@ static void gen6_bsd_submit_request(struct i915_request *request)
>  	intel_uncore_forcewake_put(uncore, FORCEWAKE_ALL);
>  }
>  
> -static int mi_flush_dw(struct i915_request *rq, u32 flags)
> -{
> -	u32 cmd, *cs;
> -
> -	cs = intel_ring_begin(rq, 4);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	cmd = MI_FLUSH_DW;
> -
> -	/*
> -	 * We always require a command barrier so that subsequent
> -	 * commands, such as breadcrumb interrupts, are strictly ordered
> -	 * wrt the contents of the write cache being flushed to memory
> -	 * (and thus being coherent from the CPU).
> -	 */
> -	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> -
> -	/*
> -	 * Bspec vol 1c.3 - blitter engine command streamer:
> -	 * "If ENABLED, all TLBs will be invalidated once the flush
> -	 * operation is complete. This bit is only valid when the
> -	 * Post-Sync Operation field is a value of 1h or 3h."
> -	 */
> -	cmd |= flags;
> -
> -	*cs++ = cmd;
> -	*cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
> -	*cs++ = 0;
> -	*cs++ = MI_NOOP;
> -
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static int gen6_flush_dw(struct i915_request *rq, u32 mode, u32 invflags)
> -{
> -	return mi_flush_dw(rq, mode & EMIT_INVALIDATE ? invflags : 0);
> -}
> -
> -static int gen6_bsd_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB | MI_INVALIDATE_BSD);
> -}
> -
> -static int
> -hsw_emit_bb_start(struct i915_request *rq,
> -		  u64 offset, u32 len,
> -		  unsigned int dispatch_flags)
> -{
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 2);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_BATCH_BUFFER_START | (dispatch_flags & I915_DISPATCH_SECURE ?
> -		0 : MI_BATCH_PPGTT_HSW | MI_BATCH_NON_SECURE_HSW);
> -	/* bit0-7 is the length on GEN6+ */
> -	*cs++ = offset;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -static int
> -gen6_emit_bb_start(struct i915_request *rq,
> -		   u64 offset, u32 len,
> -		   unsigned int dispatch_flags)
> -{
> -	u32 *cs;
> -
> -	cs = intel_ring_begin(rq, 2);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_BATCH_BUFFER_START | (dispatch_flags & I915_DISPATCH_SECURE ?
> -		0 : MI_BATCH_NON_SECURE_I965);
> -	/* bit0-7 is the length on GEN6+ */
> -	*cs++ = offset;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
> -/* Blitter support (SandyBridge+) */
> -
> -static int gen6_ring_flush(struct i915_request *rq, u32 mode)
> -{
> -	return gen6_flush_dw(rq, mode, MI_INVALIDATE_TLB);
> -}
> -
>  static void i9xx_set_default_submission(struct intel_engine_cs *engine)
>  {
>  	engine->submit_request = i9xx_submit_request;
> @@ -1843,11 +1065,11 @@ static void setup_irq(struct intel_engine_cs *engine)
>  		engine->irq_enable = gen5_irq_enable;
>  		engine->irq_disable = gen5_irq_disable;
>  	} else if (INTEL_GEN(i915) >= 3) {
> -		engine->irq_enable = i9xx_irq_enable;
> -		engine->irq_disable = i9xx_irq_disable;
> +		engine->irq_enable = gen3_irq_enable;
> +		engine->irq_disable = gen3_irq_disable;
>  	} else {
> -		engine->irq_enable = i8xx_irq_enable;
> -		engine->irq_disable = i8xx_irq_disable;
> +		engine->irq_enable = gen2_irq_enable;
> +		engine->irq_disable = gen2_irq_disable;
>  	}
>  }
>  
> @@ -1874,7 +1096,7 @@ static void setup_common(struct intel_engine_cs *engine)
>  	 * equivalent to our next initial bread so we can elide
>  	 * engine->emit_init_breadcrumb().
>  	 */
> -	engine->emit_fini_breadcrumb = i9xx_emit_breadcrumb;
> +	engine->emit_fini_breadcrumb = gen3_emit_breadcrumb;
>  	if (IS_GEN(i915, 5))
>  		engine->emit_fini_breadcrumb = gen5_emit_breadcrumb;
>  
> @@ -1883,11 +1105,11 @@ static void setup_common(struct intel_engine_cs *engine)
>  	if (INTEL_GEN(i915) >= 6)
>  		engine->emit_bb_start = gen6_emit_bb_start;
>  	else if (INTEL_GEN(i915) >= 4)
> -		engine->emit_bb_start = i965_emit_bb_start;
> +		engine->emit_bb_start = gen4_emit_bb_start;
>  	else if (IS_I830(i915) || IS_I845G(i915))
>  		engine->emit_bb_start = i830_emit_bb_start;
>  	else
> -		engine->emit_bb_start = i915_emit_bb_start;
> +		engine->emit_bb_start = gen3_emit_bb_start;
>  }
>  
>  static void setup_rcs(struct intel_engine_cs *engine)
> @@ -1900,18 +1122,18 @@ static void setup_rcs(struct intel_engine_cs *engine)
>  	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>  
>  	if (INTEL_GEN(i915) >= 7) {
> -		engine->emit_flush = gen7_render_ring_flush;
> -		engine->emit_fini_breadcrumb = gen7_rcs_emit_breadcrumb;
> +		engine->emit_flush = gen7_emit_flush_rcs;
> +		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_rcs;
>  	} else if (IS_GEN(i915, 6)) {
> -		engine->emit_flush = gen6_render_ring_flush;
> -		engine->emit_fini_breadcrumb = gen6_rcs_emit_breadcrumb;
> +		engine->emit_flush = gen6_emit_flush_rcs;
> +		engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_rcs;
>  	} else if (IS_GEN(i915, 5)) {
> -		engine->emit_flush = gen4_render_ring_flush;
> +		engine->emit_flush = gen4_emit_flush_rcs;
>  	} else {
>  		if (INTEL_GEN(i915) < 4)
> -			engine->emit_flush = gen2_render_ring_flush;
> +			engine->emit_flush = gen2_emit_flush;
>  		else
> -			engine->emit_flush = gen4_render_ring_flush;
> +			engine->emit_flush = gen4_emit_flush_rcs;
>  		engine->irq_enable_mask = I915_USER_INTERRUPT;
>  	}
>  
> @@ -1929,15 +1151,15 @@ static void setup_vcs(struct intel_engine_cs *engine)
>  		/* gen6 bsd needs a special wa for tail updates */
>  		if (IS_GEN(i915, 6))
>  			engine->set_default_submission = gen6_bsd_set_default_submission;
> -		engine->emit_flush = gen6_bsd_ring_flush;
> +		engine->emit_flush = gen6_emit_flush_vcs;
>  		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
>  
>  		if (IS_GEN(i915, 6))
> -			engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
> +			engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs;
>  		else
> -			engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
> +			engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
>  	} else {
> -		engine->emit_flush = bsd_ring_flush;
> +		engine->emit_flush = gen4_emit_flush_vcs;
>  		if (IS_GEN(i915, 5))
>  			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
>  		else
> @@ -1949,13 +1171,13 @@ static void setup_bcs(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_private *i915 = engine->i915;
>  
> -	engine->emit_flush = gen6_ring_flush;
> +	engine->emit_flush = gen6_emit_flush_xcs;
>  	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
>  
>  	if (IS_GEN(i915, 6))
> -		engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb = gen6_emit_breadcrumb_xcs;
>  	else
> -		engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
>  }
>  
>  static void setup_vecs(struct intel_engine_cs *engine)
> @@ -1964,12 +1186,12 @@ static void setup_vecs(struct intel_engine_cs *engine)
>  
>  	GEM_BUG_ON(INTEL_GEN(i915) < 7);
>  
> -	engine->emit_flush = gen6_ring_flush;
> +	engine->emit_flush = gen6_emit_flush_xcs;
>  	engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
> -	engine->irq_enable = hsw_vebox_irq_enable;
> -	engine->irq_disable = hsw_vebox_irq_disable;
> +	engine->irq_enable = hsw_irq_enable_vecs;
> +	engine->irq_disable = hsw_irq_disable_vecs;
>  
> -	engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
> +	engine->emit_fini_breadcrumb = gen7_emit_breadcrumb_xcs;
>  }
>  
>  static int gen7_ctx_switch_bb_setup(struct intel_engine_cs * const engine,
> -- 
> 2.20.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread
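
All of the CS emitters split out above follow the same three-step pattern: reserve a fixed number of dwords on the ring, write the commands through the returned cursor, then advance the ring tail. The fragment below is a minimal, self-contained sketch of that pattern in plain C. struct ring, ring_begin() and ring_advance() are simplified stand-ins for the kernel's intel_ring and intel_ring_begin()/intel_ring_advance(); the wrap-around and wait-for-space handling of the real helpers is deliberately omitted.

#include <stddef.h>
#include <stdint.h>

struct ring {
	uint32_t *vaddr;	/* ring backing store */
	size_t size;		/* size in dwords */
	size_t tail;		/* next free dword index */
};

/*
 * Reserve @ndwords and return a cursor into the ring, or NULL if there
 * is no space left.  (The real intel_ring_begin() also wraps the ring
 * and waits for space to be retired.)
 */
static uint32_t *ring_begin(struct ring *ring, size_t ndwords)
{
	if (ring->tail + ndwords > ring->size)
		return NULL;

	return ring->vaddr + ring->tail;
}

static void ring_advance(struct ring *ring, const uint32_t *cs)
{
	ring->tail = (size_t)(cs - ring->vaddr);
}

/*
 * Emit a hypothetical two-dword command with the same begin/write/advance
 * shape used by the gen2-7 emitters quoted in the patch.
 */
static int emit_example(struct ring *ring, uint32_t cmd, uint32_t arg)
{
	uint32_t *cs;

	cs = ring_begin(ring, 2);
	if (!cs)
		return -1;

	*cs++ = cmd;
	*cs++ = arg;
	ring_advance(ring, cs);

	return 0;
}

int main(void)
{
	uint32_t store[64] = { 0 };
	struct ring ring = { .vaddr = store, .size = 64, .tail = 0 };

	/* 0x02000000 is a made-up opcode, used only to exercise the sketch. */
	return emit_example(&ring, 0x02000000, 0);
}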

* Re: [Intel-gfx] [PATCH 03/36] drm/i915/gt: Move legacy context wa to intel_workarounds
  2020-06-01  7:24 ` [Intel-gfx] [PATCH 03/36] drm/i915/gt: Move legacy context wa to intel_workarounds Chris Wilson
@ 2020-06-02  9:05   ` Mika Kuoppala
  0 siblings, 0 replies; 53+ messages in thread
From: Mika Kuoppala @ 2020-06-02  9:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Use the central mechanism for recording and verifying that we restore
> the w/a for the older devices as well.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  .../gpu/drm/i915/gt/intel_ring_submission.c   | 28 -----------------
>  drivers/gpu/drm/i915/gt/intel_workarounds.c   | 31 +++++++++++++++++++
>  2 files changed, 31 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 96881cd8b17b..d9c1701061b9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -429,32 +429,6 @@ static void reset_finish(struct intel_engine_cs *engine)
>  {
>  }
>  
> -static int rcs_resume(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *i915 = engine->i915;
> -	struct intel_uncore *uncore = engine->uncore;
> -
> -	/*
> -	 * Disable CONSTANT_BUFFER before it is loaded from the context
> -	 * image. For as it is loaded, it is executed and the stored
> -	 * address may no longer be valid, leading to a GPU hang.
> -	 *
> -	 * This imposes the requirement that userspace reload their
> -	 * CONSTANT_BUFFER on every batch, fortunately a requirement
> -	 * they are already accustomed to from before contexts were
> -	 * enabled.
> -	 */
> -	if (IS_GEN(i915, 4))
> -		intel_uncore_write(uncore, ECOSKPD,
> -			   _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE));
> -
> -	if (IS_GEN_RANGE(i915, 6, 7))
> -		intel_uncore_write(uncore, INSTPM,
> -				   _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
> -
> -	return xcs_resume(engine);
> -}
> -
>  static void reset_cancel(struct intel_engine_cs *engine)
>  {
>  	struct i915_request *request;
> @@ -1139,8 +1113,6 @@ static void setup_rcs(struct intel_engine_cs *engine)
>  
>  	if (IS_HASWELL(i915))
>  		engine->emit_bb_start = hsw_emit_bb_start;
> -
> -	engine->resume = rcs_resume;
>  }
>  
>  static void setup_vcs(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index fa1e15657663..94d66a9d760d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -199,6 +199,18 @@ wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
>  #define WA_SET_FIELD_MASKED(addr, mask, value) \
>  	wa_write_masked_or(wal, (addr), 0, _MASKED_FIELD((mask), (value)))
>  
> +static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
> +				      struct i915_wa_list *wal)
> +{
> +	WA_SET_BIT_MASKED(INSTPM, INSTPM_FORCE_ORDERING);
> +}
> +
> +static void gen7_ctx_workarounds_init(struct intel_engine_cs *engine,
> +				      struct i915_wa_list *wal)
> +{
> +	WA_SET_BIT_MASKED(INSTPM, INSTPM_FORCE_ORDERING);
> +}
> +
>  static void gen8_ctx_workarounds_init(struct intel_engine_cs *engine,
>  				      struct i915_wa_list *wal)
>  {
> @@ -638,6 +650,10 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine,
>  		chv_ctx_workarounds_init(engine, wal);
>  	else if (IS_BROADWELL(i915))
>  		bdw_ctx_workarounds_init(engine, wal);
> +	else if (IS_GEN(i915, 7))
> +		gen7_ctx_workarounds_init(engine, wal);
> +	else if (IS_GEN(i915, 6))
> +		gen6_ctx_workarounds_init(engine, wal);
>  	else if (INTEL_GEN(i915) < 8)
>  		return;
>  	else
> @@ -1583,6 +1599,21 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
>  		       /* XXX bit doesn't stick on Broadwater */
>  		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
> +
> +	if (IS_GEN(i915, 4))
> +		/*
> +		 * Disable CONSTANT_BUFFER before it is loaded from the context
> +		 * image. For as it is loaded, it is executed and the stored
> +		 * address may no longer be valid, leading to a GPU hang.
> +		 *
> +		 * This imposes the requirement that userspace reload their
> +		 * CONSTANT_BUFFER on every batch, fortunately a requirement
> +		 * they are already accustomed to from before contexts were
> +		 * enabled.
> +		 */
> +		wa_add(wal, ECOSKPD,
> +		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
> +		       0 /* XXX bit doesn't stick on Broadwater */);
>  }
>  
>  static void
> -- 
> 2.20.1
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 53+ messages in thread
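
The two context workarounds added in this patch are written through so-called masked registers (INSTPM, ECOSKPD), where the upper 16 bits of the 32-bit value act as a per-bit write enable for the lower 16 bits; that is what _MASKED_BIT_ENABLE() encodes. The fragment below is a standalone C sketch of that encoding only; the bit position used is a hypothetical placeholder, not the real INSTPM_FORCE_ORDERING or ECO_CONSTANT_BUFFER_SR_DISABLE layout.

#include <stdint.h>
#include <stdio.h>

/*
 * Compose a masked-register write: the high half selects which of the
 * low 16 bits the write may change, so a single write can flip one bit
 * without a read-modify-write.  Mirrors the idea behind the kernel's
 * _MASKED_BIT_ENABLE()/_MASKED_BIT_DISABLE(), not the kernel code itself.
 */
static uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

static uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

int main(void)
{
	uint32_t force_ordering = 1u << 7;	/* hypothetical bit position */

	printf("enable  -> 0x%08x\n", masked_bit_enable(force_ordering));
	printf("disable -> 0x%08x\n", masked_bit_disable(force_ordering));
	return 0;
}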

end of thread

Thread overview: 53+ messages
2020-06-01  7:24 [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 02/36] drm/i915/gt: Split low level gen2-7 CS emitters Chris Wilson
2020-06-02  9:04   ` Mika Kuoppala
2020-06-01  7:24 ` [Intel-gfx] [PATCH 03/36] drm/i915/gt: Move legacy context wa to intel_workarounds Chris Wilson
2020-06-02  9:05   ` Mika Kuoppala
2020-06-01  7:24 ` [Intel-gfx] [PATCH 04/36] drm/i915: Trim the ironlake+ irq handler Chris Wilson
2020-06-01 11:49   ` Mika Kuoppala
2020-06-01 12:00     ` Chris Wilson
2020-06-01 12:08       ` Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 05/36] Restore "drm/i915: drop engine_pin/unpin_breadcrumbs_irq" Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 06/36] drm/i915/gt: Couple tasklet scheduling for all CS interrupts Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 07/36] drm/i915/gt: Support creation of 'internal' rings Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 08/36] drm/i915/gt: Use client timeline address for seqno writes Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 09/36] drm/i915: Support inter-engine semaphores on gen6/7 Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 10/36] drm/i915/gt: Infrastructure for ring scheduling Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 11/36] drm/i915/gt: Enable busy-stats for ring-scheduler Chris Wilson
2020-06-01  8:03   ` [Intel-gfx] [PATCH] " Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 12/36] drm/i915/gt: Track if an engine requires forcewake w/a Chris Wilson
2020-06-01 12:17   ` Mika Kuoppala
2020-06-01  7:24 ` [Intel-gfx] [PATCH 13/36] drm/i915: Relinquish forcewake immediately after manual grouping Chris Wilson
2020-06-01 12:20   ` Mika Kuoppala
2020-06-01  7:24 ` [Intel-gfx] [PATCH 14/36] drm/i915/gt: Implement ring scheduler for gen6/7 Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 15/36] drm/i915/gt: Enable ring scheduling " Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 16/36] drm/i915/gem: Mark the buffer pool as active for the cmdparser Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 17/36] drm/i915/gem: Async GPU relocations only Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 18/36] drm/i915: Add list_for_each_entry_safe_continue_reverse Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 19/36] drm/i915/gem: Separate reloc validation into an earlier step Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 20/36] drm/i915/gem: Lift GPU relocation allocation Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 21/36] drm/i915/gem: Build the reloc request first Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 22/36] drm/i915/gem: Add all GPU reloc awaits/signals en masse Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 23/36] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 24/36] drm/i915: Unpeel awaits on a proxy fence Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 25/36] drm/i915/gem: Make relocations atomic within execbuf Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 26/36] drm/syncobj: Allow use of dma-fence-proxy Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 27/36] drm/i915/gem: Teach execbuf how to wait on future syncobj Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 28/36] drm/i915/gem: Allow combining submit-fences with syncobj Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 29/36] drm/i915/gt: Declare when we enabled timeslicing Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 30/36] drm/i915: Drop I915_IDLE_ENGINES_TIMEOUT Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 31/36] drm/i915: Always defer fenced work to the worker Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 32/36] drm/i915/gem: Assign context id for async work Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 33/36] drm/i915: Export a preallocate variant of i915_active_acquire() Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 34/36] drm/i915/gem: Separate the ww_mutex walker into its own list Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 35/36] drm/i915/gem: Asynchronous GTT unbinding Chris Wilson
2020-06-01  7:24 ` [Intel-gfx] [PATCH 36/36] drm/i915/gem: Bind the fence async for execbuf Chris Wilson
2020-06-01  7:34 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure Patchwork
2020-06-01  7:36 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-06-01  7:56 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2020-06-01  8:30 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/36] drm/i915: Handle very early engine initialisation failure (rev2) Patchwork
2020-06-01  8:31 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-06-01  8:51 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-06-01 11:00 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2020-06-01 11:39   ` Chris Wilson
2020-06-01 11:31 ` [Intel-gfx] [PATCH 01/36] drm/i915: Handle very early engine initialisation failure Mika Kuoppala
