* [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image
@ 2017-09-13  8:56 Chris Wilson
  2017-09-13  8:56 ` [PATCH 2/6] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Chris Wilson @ 2017-09-13  8:56 UTC (permalink / raw)
  To: intel-gfx; +Cc: intel-gvt-dev

From: Michel Thierry <michel.thierry@intel.com>

Not only does the context image consist of two parts (the PPHWSP and
the logical context state), but we also allocate a header at the start
for sharing data with the GuC. Thus every lrc looks like this:

  | [guc]          | [hwsp] [logical state] |
  |<- our header ->|<- context image      ->|

So far, we have oversimplified whenever we use each of these parts of
the context, just because the GuC header happens to be in page 0 and
the (PP)HWSP is in page 1. But this has led to using the same define
for more than one meaning (as a page index in the lrc and as a count of
one page).

This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updates the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.

v2: Use PPHWSP index for hws again.
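
For illustration only (a sketch, not part of the patch; assumes 4KiB
pages and the helpers already used in the hunks below), the new defines
resolve to the following offsets within the context VMA:

  u32 base = i915_ggtt_offset(ce->state);
  u32 guc  = base + LRC_GUCSHR_PN * PAGE_SIZE;     /* page 0: GuC shared data */
  u32 lrca = base + LRC_HEADER_PAGES * PAGE_SIZE;  /* page 1: PPHWSP, start of the HW context image */
  u32 regs = base + LRC_STATE_PN * PAGE_SIZE;      /* page 2: logical register state */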

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gvt/scheduler.c       |  4 ++--
 drivers/gpu/drm/i915/i915_guc_submission.c |  4 ++--
 drivers/gpu/drm/i915/intel_lrc.c           |  9 ++++++---
 drivers/gpu/drm/i915/intel_lrc.h           | 25 ++++++++++++++++++++++---
 4 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 6fb9b589276d..d5892d24f0b6 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -87,7 +87,7 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 			return -EINVAL;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, LRC_PPHWSP_PN + i);
+		page = i915_gem_object_get_page(ctx_obj, LRC_HEADER_PAGES + i);
 		dst = kmap(page);
 		intel_gvt_hypervisor_read_gpa(vgpu, context_gpa, dst,
 				GTT_PAGE_SIZE);
@@ -454,7 +454,7 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 			return;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, LRC_PPHWSP_PN + i);
+		page = i915_gem_object_get_page(ctx_obj, LRC_HEADER_PAGES + i);
 		src = kmap(page);
 		intel_gvt_hypervisor_write_gpa(vgpu, context_gpa, src,
 				GTT_PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 48a1e9349a2c..b7ca13860677 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1310,7 +1310,7 @@ int intel_guc_suspend(struct drm_i915_private *dev_priv)
 	/* any value greater than GUC_POWER_D0 */
 	data[1] = GUC_POWER_D1;
 	/* first page is shared data with GuC */
-	data[2] = guc_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = guc_ggtt_offset(ctx->engine[RCS].state) + LRC_GUCSHR_PN * PAGE_SIZE;
 
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
@@ -1336,7 +1336,7 @@ int intel_guc_resume(struct drm_i915_private *dev_priv)
 	data[0] = INTEL_GUC_ACTION_EXIT_S_STATE;
 	data[1] = GUC_POWER_D0;
 	/* first page is shared data with GuC */
-	data[2] = guc_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = guc_ggtt_offset(ctx->engine[RCS].state) + LRC_GUCSHR_PN * PAGE_SIZE;
 
 	return intel_guc_send(guc, data, ARRAY_SIZE(data));
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d89e1b8e1cc5..0c19aa7016bc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -279,7 +279,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (1<<GEN8_CTX_ID_WIDTH));
 
 	desc = ctx->desc_template;				/* bits  0-11 */
-	desc |= i915_ggtt_offset(ce->state) + LRC_PPHWSP_PN * PAGE_SIZE;
+	desc |= i915_ggtt_offset(ce->state) + LRC_HEADER_PAGES * PAGE_SIZE;
 								/* bits 12-31 */
 	desc |= (u64)ctx->hw_id << GEN8_CTX_ID_SHIFT;		/* bits 32-52 */
 
@@ -2054,8 +2054,11 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 
 	context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
 
-	/* One extra page as the sharing data between driver and GuC */
-	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
+	/*
+	 * Before the actual start of the context image, we insert a few pages
+	 * for our own use and for sharing with the GuC.
+	 */
+	context_size += LRC_HEADER_PAGES * PAGE_SIZE;
 
 	ctx_obj = i915_gem_object_create(ctx->i915, context_size);
 	if (IS_ERR(ctx_obj)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 57ef5833c427..2c35131c3c0e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -69,10 +69,29 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine);
 
 /* Logical Ring Contexts */
 
-/* One extra page is added before LRC for GuC as shared data */
+/*
+ * We allocate a header at the start of the context image for our own
+ * use, therefore the actual location of the logical state is offset
+ * from the start of the VMA. The layout is
+ *
+ * | [guc]          | [hwsp] [logical state] |
+ * |<- our header ->|<- context image      ->|
+ *
+ */
+/* The first page is used for sharing data with the GuC */
 #define LRC_GUCSHR_PN	(0)
-#define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + 1)
-#define LRC_STATE_PN	(LRC_PPHWSP_PN + 1)
+#define LRC_GUCSHR_SZ	(1)
+/* At the start of the context image is its per-process HWS page */
+#define LRC_PPHWSP_PN	(LRC_GUCSHR_PN + LRC_GUCSHR_SZ)
+#define LRC_PPHWSP_SZ	(1)
+/* Finally we have the logical state for the context */
+#define LRC_STATE_PN	(LRC_PPHWSP_PN + LRC_PPHWSP_SZ)
+
+/*
+ * Currently we include the PPHWSP in __intel_engine_context_size() so
+ * the size of the header is synonymous with the start of the PPHWSP.
+ */
+#define LRC_HEADER_PAGES LRC_PPHWSP_PN
 
 struct drm_i915_private;
 struct i915_gem_context;
-- 
2.14.1


* [PATCH 2/6] drm/i915/guc: Don't make assumptions while getting the lrca offset
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
@ 2017-09-13  8:56 ` Chris Wilson
  2017-09-13  8:56 ` [PATCH 3/6] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-09-13  8:56 UTC (permalink / raw)
  To: intel-gfx

From: Michel Thierry <michel.thierry@intel.com>

Using the HWSP ggtt_offset to get the lrca offset is only correct if the
HWSP happens to be before it (when we reuse the PPHWSP of the kernel
context as the engine HWSP). Instead of making this assumption, get the
lrca offset from the kernel_context engine state.

While looking at this part of the GuC interaction, it was also noticed
that the firmware expects the size of the engine context only (the
context minus the execlist part, i.e. excluding the first 80 dwords),
so pass the right size.

v2: Use the new macros to prevent abusive overuse of the old ones (Chris).
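
As a worked illustration (a sketch of the hunk below, assuming 4KiB
pages), the offset and size handed to the GuC ADS become:

  const u32 skipped_offset = LRC_HEADER_PAGES * PAGE_SIZE;                /* 4096 */
  const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + 80 * sizeof(u32);  /* 4096 + 320 */

  /* golden context: point at the PPHWSP, not at our GuC shared page */
  blob->ads.golden_context_lrca =
          guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + skipped_offset;
  /* report only the portion the GuC will actually read */
  blob->ads.eng_state_size[engine->guc_id] = engine->context_size - skipped_size;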

Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index b7ca13860677..8b96935cf96a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1018,6 +1018,12 @@ static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
+/*
+ * The first 80 dwords of the register state context, containing the
+ * execlists and ppgtt registers.
+ */
+#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+
 static int guc_ads_create(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -1032,6 +1038,8 @@ static int guc_ads_create(struct intel_guc *guc)
 	} __packed *blob;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
+	const u32 skipped_offset = LRC_HEADER_PAGES * PAGE_SIZE;
+	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
 	u32 base;
 
 	GEM_BUG_ON(guc->ads_vma);
@@ -1062,13 +1070,20 @@ static int guc_ads_create(struct intel_guc *guc)
 	 * engines after a reset. Here we use the Render ring default
 	 * context, which must already exist and be pinned in the GGTT,
 	 * so its address won't change after we've told the GuC where
-	 * to find it.
+	 * to find it. Note that we have to skip our header (1 page),
+	 * because our GuC shared data is there.
 	 */
 	blob->ads.golden_context_lrca =
-		dev_priv->engine[RCS]->status_page.ggtt_offset;
+		guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + skipped_offset;
 
+	/*
+	 * The GuC expects us to exclude the portion of the context image that
+	 * it skips from the size it is to read. It starts reading from after
+	 * the execlist context (so skipping the first page [PPHWSP] and 80
+	 * dwords). Weird guc is weird.
+	 */
 	for_each_engine(engine, dev_priv, id)
-		blob->ads.eng_state_size[engine->guc_id] = engine->context_size;
+		blob->ads.eng_state_size[engine->guc_id] = engine->context_size - skipped_size;
 
 	base = guc_ggtt_offset(vma);
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
-- 
2.14.1


* [PATCH 3/6] drm/i915/lrc: allocate separate page for HWSP
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
  2017-09-13  8:56 ` [PATCH 2/6] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
@ 2017-09-13  8:56 ` Chris Wilson
  2017-09-13  8:56 ` [PATCH 4/6] drm/i915: Allow HW status page to be bound high Chris Wilson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-09-13  8:56 UTC (permalink / raw)
  To: intel-gfx

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't currently utilize this
arena, but in the following patches we will take advantage of the
cached register state for handling the execlists context-status
interrupt.

To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP, as is done on
pre-execlists platforms.

v2: Add a use-case for the register arena of the HWSP.
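
As a hypothetical sketch of why that arena matters (the 0x10 index here
is only an example slot, borrowed from the CSB-mirror patches later in
this series):

  /*
   * The status page is addressed in dwords; indices below 0x30 form the
   * register arena that would be clobbered if the kernel context's
   * PPHWSP doubled as the global HWSP.
   */
  u32 mirrored = intel_read_status_page(engine, 0x10);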

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 126 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  34 +--------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 125 +------------------------------
 3 files changed, 127 insertions(+), 158 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 3ae89a9d6241..8a5535ad6552 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -442,6 +442,114 @@ static void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
 	i915_vma_unpin_and_release(&engine->scratch);
 }
 
+static void cleanup_phys_status_page(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	if (!dev_priv->status_page_dmah)
+		return;
+
+	drm_pci_free(&dev_priv->drm, dev_priv->status_page_dmah);
+	engine->status_page.page_addr = NULL;
+}
+
+static void cleanup_status_page(struct intel_engine_cs *engine)
+{
+	struct i915_vma *vma;
+	struct drm_i915_gem_object *obj;
+
+	vma = fetch_and_zero(&engine->status_page.vma);
+	if (!vma)
+		return;
+
+	obj = vma->obj;
+
+	i915_vma_unpin(vma);
+	i915_vma_close(vma);
+
+	i915_gem_object_unpin_map(obj);
+	__i915_gem_object_release_unless_active(obj);
+}
+
+static int init_status_page(struct intel_engine_cs *engine)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	unsigned int flags;
+	void *vaddr;
+	int ret;
+
+	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate status page\n");
+		return PTR_ERR(obj);
+	}
+
+	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	if (ret)
+		goto err;
+
+	vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err;
+	}
+
+	flags = PIN_GLOBAL;
+	if (!HAS_LLC(engine->i915))
+		/* On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+		 * actually map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	ret = i915_vma_pin(vma, 0, 4096, flags);
+	if (ret)
+		goto err;
+
+	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		ret = PTR_ERR(vaddr);
+		goto err_unpin;
+	}
+
+	engine->status_page.vma = vma;
+	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
+	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
+
+	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
+			 engine->name, i915_ggtt_offset(vma));
+	return 0;
+
+err_unpin:
+	i915_vma_unpin(vma);
+err:
+	i915_gem_object_put(obj);
+	return ret;
+}
+
+static int init_phys_status_page(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	GEM_BUG_ON(engine->id != RCS);
+
+	dev_priv->status_page_dmah =
+		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
+	if (!dev_priv->status_page_dmah)
+		return -ENOMEM;
+
+	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
+	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
+
+	return 0;
+}
+
 /**
  * intel_engines_init_common - initialize cengine state which might require hw access
  * @engine: Engine to initialize.
@@ -477,10 +585,21 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 
 	ret = i915_gem_render_state_init(engine);
 	if (ret)
-		goto err_unpin;
+		goto err_breadcrumbs;
+
+	if (HWS_NEEDS_PHYSICAL(engine->i915))
+		ret = init_phys_status_page(engine);
+	else
+		ret = init_status_page(engine);
+	if (ret)
+		goto err_rs_fini;
 
 	return 0;
 
+err_rs_fini:
+	i915_gem_render_state_fini(engine);
+err_breadcrumbs:
+	intel_engine_fini_breadcrumbs(engine);
 err_unpin:
 	engine->context_unpin(engine, engine->i915->kernel_context);
 	return ret;
@@ -497,6 +616,11 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
 	intel_engine_cleanup_scratch(engine);
 
+	if (HWS_NEEDS_PHYSICAL(engine->i915))
+		cleanup_phys_status_page(engine);
+	else
+		cleanup_status_page(engine);
+
 	i915_gem_render_state_fini(engine);
 	intel_engine_fini_breadcrumbs(engine);
 	intel_engine_cleanup_cmd_parser(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0c19aa7016bc..aa5534213190 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1680,11 +1680,6 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 	if (engine->cleanup)
 		engine->cleanup(engine);
 
-	if (engine->status_page.vma) {
-		i915_gem_object_unpin_map(engine->status_page.vma->obj);
-		engine->status_page.vma = NULL;
-	}
-
 	intel_engine_cleanup_common(engine);
 
 	lrc_destroy_wa_ctx(engine);
@@ -1731,24 +1726,6 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
 }
 
-static int
-lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
-{
-	const int hws_offset = LRC_PPHWSP_PN * PAGE_SIZE;
-	void *hws;
-
-	/* The HWSP is part of the default context object in LRC mode. */
-	hws = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
-	if (IS_ERR(hws))
-		return PTR_ERR(hws);
-
-	engine->status_page.page_addr = hws + hws_offset;
-	engine->status_page.ggtt_offset = i915_ggtt_offset(vma) + hws_offset;
-	engine->status_page.vma = vma;
-
-	return 0;
-}
-
 static void
 logical_ring_setup(struct intel_engine_cs *engine)
 {
@@ -1781,23 +1758,14 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	logical_ring_default_irqs(engine);
 }
 
-static int
-logical_ring_init(struct intel_engine_cs *engine)
+static int logical_ring_init(struct intel_engine_cs *engine)
 {
-	struct i915_gem_context *dctx = engine->i915->kernel_context;
 	int ret;
 
 	ret = intel_engine_init_common(engine);
 	if (ret)
 		goto error;
 
-	/* And setup the hardware status page. */
-	ret = lrc_setup_hws(engine, dctx->engine[engine->id].state);
-	if (ret) {
-		DRM_ERROR("Failed to set up hws %s: %d\n", engine->name, ret);
-		goto error;
-	}
-
 	return 0;
 
 error:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 268342433a8e..8af8871a8594 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1175,113 +1175,7 @@ i915_emit_bb_start(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static void cleanup_phys_status_page(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-
-	if (!dev_priv->status_page_dmah)
-		return;
-
-	drm_pci_free(&dev_priv->drm, dev_priv->status_page_dmah);
-	engine->status_page.page_addr = NULL;
-}
-
-static void cleanup_status_page(struct intel_engine_cs *engine)
-{
-	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
-
-	vma = fetch_and_zero(&engine->status_page.vma);
-	if (!vma)
-		return;
-
-	obj = vma->obj;
-
-	i915_vma_unpin(vma);
-	i915_vma_close(vma);
-
-	i915_gem_object_unpin_map(obj);
-	__i915_gem_object_release_unless_active(obj);
-}
 
-static int init_status_page(struct intel_engine_cs *engine)
-{
-	struct drm_i915_gem_object *obj;
-	struct i915_vma *vma;
-	unsigned int flags;
-	void *vaddr;
-	int ret;
-
-	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
-	if (IS_ERR(obj)) {
-		DRM_ERROR("Failed to allocate status page\n");
-		return PTR_ERR(obj);
-	}
-
-	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-	if (ret)
-		goto err;
-
-	vma = i915_vma_instance(obj, &engine->i915->ggtt.base, NULL);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto err;
-	}
-
-	flags = PIN_GLOBAL;
-	if (!HAS_LLC(engine->i915))
-		/* On g33, we cannot place HWS above 256MiB, so
-		 * restrict its pinning to the low mappable arena.
-		 * Though this restriction is not documented for
-		 * gen4, gen5, or byt, they also behave similarly
-		 * and hang if the HWS is placed at the top of the
-		 * GTT. To generalise, it appears that all !llc
-		 * platforms have issues with us placing the HWS
-		 * above the mappable region (even though we never
-		 * actualy map it).
-		 */
-		flags |= PIN_MAPPABLE;
-	ret = i915_vma_pin(vma, 0, 4096, flags);
-	if (ret)
-		goto err;
-
-	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
-	if (IS_ERR(vaddr)) {
-		ret = PTR_ERR(vaddr);
-		goto err_unpin;
-	}
-
-	engine->status_page.vma = vma;
-	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
-	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
-
-	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
-			 engine->name, i915_ggtt_offset(vma));
-	return 0;
-
-err_unpin:
-	i915_vma_unpin(vma);
-err:
-	i915_gem_object_put(obj);
-	return ret;
-}
-
-static int init_phys_status_page(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-
-	GEM_BUG_ON(engine->id != RCS);
-
-	dev_priv->status_page_dmah =
-		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
-	if (!dev_priv->status_page_dmah)
-		return -ENOMEM;
-
-	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
-	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
-
-	return 0;
-}
 
 int intel_ring_pin(struct intel_ring *ring,
 		   struct drm_i915_private *i915,
@@ -1568,17 +1462,10 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	if (err)
 		goto err;
 
-	if (HWS_NEEDS_PHYSICAL(engine->i915))
-		err = init_phys_status_page(engine);
-	else
-		err = init_status_page(engine);
-	if (err)
-		goto err;
-
 	ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
 	if (IS_ERR(ring)) {
 		err = PTR_ERR(ring);
-		goto err_hws;
+		goto err;
 	}
 
 	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
@@ -1593,11 +1480,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 
 err_ring:
 	intel_ring_free(ring);
-err_hws:
-	if (HWS_NEEDS_PHYSICAL(engine->i915))
-		cleanup_phys_status_page(engine);
-	else
-		cleanup_status_page(engine);
 err:
 	intel_engine_cleanup_common(engine);
 	return err;
@@ -1616,11 +1498,6 @@ void intel_engine_cleanup(struct intel_engine_cs *engine)
 	if (engine->cleanup)
 		engine->cleanup(engine);
 
-	if (HWS_NEEDS_PHYSICAL(dev_priv))
-		cleanup_phys_status_page(engine);
-	else
-		cleanup_status_page(engine);
-
 	intel_engine_cleanup_common(engine);
 
 	dev_priv->engine[engine->id] = NULL;
-- 
2.14.1


* [PATCH 4/6] drm/i915: Allow HW status page to be bound high
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
  2017-09-13  8:56 ` [PATCH 2/6] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
  2017-09-13  8:56 ` [PATCH 3/6] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
@ 2017-09-13  8:56 ` Chris Wilson
  2017-09-13  8:56 ` [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-09-13  8:56 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

At the time of commit 1f767e02d69f ("drm/i915: HWS must be in the
mappable region for g33"), drm_mm insertion would often default to
placing a new object high in the zone forcing us to specify that certain
HWSP must be bound within the low mappable region. Since then, drm_mm
has gained more finesse over its placement and exposes that to the
caller, commit 4e64e5539d15 ("drm: Improve drm_mm search (and fix
topdown allocation) with rbtrees"). As such where possible we want the
HWSP to be outside of the mappable aperture and so need to specify that
can be pinned high.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 8a5535ad6552..32d35f32d289 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -508,6 +508,8 @@ static int init_status_page(struct intel_engine_cs *engine)
 		 * actually map it).
 		 */
 		flags |= PIN_MAPPABLE;
+	else
+		flags |= PIN_HIGH;
 	ret = i915_vma_pin(vma, 0, 4096, flags);
 	if (ret)
 		goto err;
-- 
2.14.1


* [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (2 preceding siblings ...)
  2017-09-13  8:56 ` [PATCH 4/6] drm/i915: Allow HW status page to be bound high Chris Wilson
@ 2017-09-13  8:56 ` Chris Wilson
  2017-09-13 13:35   ` [PATCH v7] " Chris Wilson
  2017-09-13 13:57   ` [PATCH 5/6] " Mika Kuoppala
  2017-09-13  8:56 ` [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 16+ messages in thread
From: Chris Wilson @ 2017-09-13  8:56 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio reads is not enough to actually reduce any latency, but
it does give a small reduction in overall cpu usage.

Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
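
Condensed sketch of the mirror as consumed after this patch (a
paraphrase of the hunks below, not additional code): the CSB entries
are (status, context_id) dword pairs starting at I915_HWS_CSB_BUF0_INDEX
in the status page.

  const u32 *buf = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
  u32 status     = buf[2 * head];      /* GEN8_CTX_STATUS_* flags */
  u32 context_id = buf[2 * head + 1];  /* should match port->context_id */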

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_lrc.c        | 34 ++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6338018f655d..7062cde94a49 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			   upper_32_bits(addr), lower_32_bits(addr));
 
 		if (i915.enable_execlists) {
+			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 			u32 ptr, read, write;
 			unsigned int idx;
 
@@ -3337,10 +3338,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 				write += GEN8_CSB_ENTRIES;
 			while (read < write) {
 				idx = ++read % GEN8_CSB_ENTRIES;
-				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
+				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
 					   idx,
 					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
-					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
+					   hws[idx * 2],
+					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
+					   hws[idx * 2 + 1]);
 			}
 
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index aa5534213190..8e4b21a18554 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
 	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
 		u32 __iomem *csb_mmio =
 			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
-		u32 __iomem *buf =
-			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
+		/* The HWSP contains a (cacheable) mirror of the CSB */
+		const u32 *buf =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
+		/* However GVT emulation depends upon intercepting CSB mmio */
+		if (unlikely(engine->csb_use_mmio)) {
+			buf = (u32 * __force)
+				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+		}
+
 		/* The write will be ordered by the uncached read (itself
 		 * a memory barrier), so we do not need another in the form
 		 * of a locked instruction. The race between the interrupt
@@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
 			 * status notifier.
 			 */
 
-			status = readl(buf + 2 * head);
+			status = buf[2 * head];
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
 			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
-					 port->context_id);
+			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
 			rq = port_unpack(port, &count);
 			GEM_BUG_ON(count == 0);
@@ -1726,6 +1732,22 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
 }
 
+static bool irq_handler_force_mmio(struct drm_i915_private *i915)
+{
+	/* GVT emulation depends upon intercepting CSB mmio */
+	if (intel_vgpu_active(i915))
+		return false;
+
+	/*
+	 * IOMMU adds unpredictable latency causing the CSB write to only
+	 * be visible after the interrupt (missed breadcrumb syndrome).
+	 */
+	if (intel_vtd_active())
+		return false;
+
+	return true;
+}
+
 static void
 logical_ring_setup(struct intel_engine_cs *engine)
 {
@@ -1737,6 +1759,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
 
+	engine->csb_use_mmio = irq_handler_force_mmio(dev_priv);
+
 	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
 						    RING_ELSP(engine),
 						    FW_REG_WRITE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 79c0021f3700..5c055b62966d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,6 +391,7 @@ struct intel_engine_cs {
 	struct rb_root execlist_queue;
 	struct rb_node *execlist_first;
 	unsigned int fw_domains;
+	bool csb_use_mmio;
 
 	/* Contexts are pinned whilst they are active on the GPU. The last
 	 * context executed remains active whilst the GPU is idle - the
@@ -496,6 +497,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_INDEX	0x40
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+#define I915_HWS_CSB_BUF0_INDEX		0x10
+
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
 int intel_ring_pin(struct intel_ring *ring,
-- 
2.14.1


* [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (3 preceding siblings ...)
  2017-09-13  8:56 ` [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
@ 2017-09-13  8:56 ` Chris Wilson
  2017-09-13 14:12   ` Mika Kuoppala
  2017-09-13  9:51 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image Patchwork
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2017-09-13  8:56 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!

v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
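
Condensed sketch of the new bookkeeping (paraphrasing the hunks below;
csb_ptr_mmio stands in for the RING_CONTEXT_STATUS_PTR mmio address):

  if (unlikely(engine->csb_head == -1)) {  /* first irq after a reset */
          u32 ptr = readl(csb_ptr_mmio);   /* one uncached read of the hw pointers */
          head = GEN8_CSB_READ_PTR(ptr);
          tail = GEN8_CSB_WRITE_PTR(ptr);
          engine->csb_head = head;
  } else {                                 /* steady state: stay in the HWSP */
          head = engine->csb_head;
          tail = READ_ONCE(buf[intel_hws_csb_write_index(dev_priv) -
                               I915_HWS_CSB_BUF0_INDEX]);
  }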

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  6 ++++--
 drivers/gpu/drm/i915/i915_drv.h         |  8 ++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 27 ++++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7062cde94a49..12381045ed6a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3326,8 +3326,10 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
 			read = GEN8_CSB_READ_PTR(ptr);
 			write = GEN8_CSB_WRITE_PTR(ptr);
-			seq_printf(m, "\tExeclist CSB read %d, write %d, interrupt posted? %s\n",
-				   read, write,
+			seq_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], interrupt posted? %s\n",
+				   read, engine->csb_head,
+				   write,
+				   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
 				   yesno(test_bit(ENGINE_IRQ_EXECLIST,
 						  &engine->irq_posted)));
 			if (read >= GEN8_CSB_ENTRIES)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1cc31a5b049f..78195b7c64db 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -4399,4 +4399,12 @@ int remap_io_mapping(struct vm_area_struct *vma,
 		     unsigned long addr, unsigned long pfn, unsigned long size,
 		     struct io_mapping *iomap);
 
+static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
+{
+	if (INTEL_GEN(i915) >= 10)
+		return CNL_HWS_CSB_WRITE_INDEX;
+	else
+		return I915_HWS_CSB_WRITE_INDEX;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8e4b21a18554..55d7eee21226 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -545,8 +545,6 @@ static void intel_lrc_irq_handler(unsigned long data)
 	 * new request (outside of the context-switch interrupt).
 	 */
 	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
-		u32 __iomem *csb_mmio =
-			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
 		/* The HWSP contains a (cacheable) mirror of the CSB */
 		const u32 *buf =
 			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
@@ -556,6 +554,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 		if (unlikely(engine->csb_use_mmio)) {
 			buf = (u32 * __force)
 				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+			engine->csb_head = -1; /* force mmio read of CSB ptrs */
 		}
 
 		/* The write will be ordered by the uncached read (itself
@@ -569,9 +568,19 @@ static void intel_lrc_irq_handler(unsigned long data)
 		 * is set and we do a new loop.
 		 */
 		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		head = readl(csb_mmio);
-		tail = GEN8_CSB_WRITE_PTR(head);
-		head = GEN8_CSB_READ_PTR(head);
+		if (unlikely(engine->csb_head == -1)) { /* following a reset */
+			head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+			tail = GEN8_CSB_WRITE_PTR(head);
+			head = GEN8_CSB_READ_PTR(head);
+			engine->csb_head = head;
+		} else {
+			const int write_idx =
+				intel_hws_csb_write_index(dev_priv) -
+				I915_HWS_CSB_BUF0_INDEX;
+
+			head = engine->csb_head;
+			tail = READ_ONCE(buf[write_idx]);
+		}
 		while (head != tail) {
 			struct drm_i915_gem_request *rq;
 			unsigned int status;
@@ -625,8 +634,11 @@ static void intel_lrc_irq_handler(unsigned long data)
 				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
 		}
 
-		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-		       csb_mmio);
+		if (head != engine->csb_head) {
+			engine->csb_head = head;
+			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+			       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+		}
 	}
 
 	if (execlists_elsp_ready(engine))
@@ -1275,6 +1287,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 	I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]),
 		   GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift);
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+	engine->csb_head = -1;
 
 	/* After a GPU reset, we may have requests to replay */
 	submit = false;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5c055b62966d..4d63a2c0b2e1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -392,6 +392,7 @@ struct intel_engine_cs {
 	struct rb_node *execlist_first;
 	unsigned int fw_domains;
 	bool csb_use_mmio;
+	unsigned int csb_head;
 
 	/* Contexts are pinned whilst they are active on the GPU. The last
 	 * context executed remains active whilst the GPU is idle - the
@@ -498,6 +499,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
 #define I915_HWS_CSB_BUF0_INDEX		0x10
+#define I915_HWS_CSB_WRITE_INDEX	0x1f
+#define CNL_HWS_CSB_WRITE_INDEX		0x2f
 
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
-- 
2.14.1


* ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (4 preceding siblings ...)
  2017-09-13  8:56 ` [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
@ 2017-09-13  9:51 ` Patchwork
  2017-09-13 13:37 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2017-09-13  9:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915/lrc: Clarify the format of the context image
URL   : https://patchwork.freedesktop.org/series/30269/
State : success

== Summary ==

Series 30269v1 series starting with [1/6] drm/i915/lrc: Clarify the format of the context image
https://patchwork.freedesktop.org/api/1.0/series/30269/revisions/1/mbox/

Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                fail       -> PASS       (fi-snb-2600) fdo#100215 +1
Test pm_rpm:
        Subgroup basic-rte:
                dmesg-warn -> PASS       (fi-cfl-s) fdo#102294

fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#102294 https://bugs.freedesktop.org/show_bug.cgi?id=102294

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:444s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:451s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:377s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:538s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:271s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:511s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:502s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:500s
fi-cfl-s         total:289  pass:223  dwarn:34  dfail:0   fail:0   skip:32  time:555s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:451s
fi-glk-2a        total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:600s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:427s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:409s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:437s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:486s
fi-ivb-3770      total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:464s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:492s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:578s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:585s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:549s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:463s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:515s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:502s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:458s
fi-skl-x1585l    total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:471s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:576s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:1   skip:39  time:427s

3a1a0bacb6e4e1dceb233c927d859e33b62dabaa drm-tip: 2017y-09m-13d-08h-04m-49s UTC integration manifest
d30c7ae9d7d1 drm/i915/execlists: Read the context-status HEAD from the HWSP
b3077eb1a08f drm/i915/execlists: Read the context-status buffer from the HWSP
956bfedc5df9 drm/i915: Allow HW status page to be bound high
271063c346a2 drm/i915/lrc: allocate separate page for HWSP
d2cc2e561722 drm/i915/guc: Don't make assumptions while getting the lrca offset
b616c24a034b drm/i915/lrc: Clarify the format of the context image

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5676/

* [PATCH v7] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-09-13  8:56 ` [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
@ 2017-09-13 13:35   ` Chris Wilson
  2017-09-13 14:02     ` Mika Kuoppala
  2017-09-13 13:57   ` [PATCH 5/6] " Mika Kuoppala
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2017-09-13 13:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio reads is not enough to actually reduce any latency, but
it does give a small reduction in overall cpu usage.

Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_lrc.c        | 34 ++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 3 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6338018f655d..7062cde94a49 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			   upper_32_bits(addr), lower_32_bits(addr));
 
 		if (i915.enable_execlists) {
+			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 			u32 ptr, read, write;
 			unsigned int idx;
 
@@ -3337,10 +3338,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 				write += GEN8_CSB_ENTRIES;
 			while (read < write) {
 				idx = ++read % GEN8_CSB_ENTRIES;
-				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
+				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
 					   idx,
 					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
-					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
+					   hws[idx * 2],
+					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
+					   hws[idx * 2 + 1]);
 			}
 
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8886e3b60e82..531a7cf174d7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -541,10 +541,17 @@ static void intel_lrc_irq_handler(unsigned long data)
 	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
 		u32 __iomem *csb_mmio =
 			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
-		u32 __iomem *buf =
-			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
+		/* The HWSP contains a (cacheable) mirror of the CSB */
+		const u32 *buf =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
+		/* However GVT emulation depends upon intercepting CSB mmio */
+		if (unlikely(engine->csb_use_mmio)) {
+			buf = (u32 * __force)
+				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+		}
+
 		/* The write will be ordered by the uncached read (itself
 		 * a memory barrier), so we do not need another in the form
 		 * of a locked instruction. The race between the interrupt
@@ -584,13 +591,12 @@ static void intel_lrc_irq_handler(unsigned long data)
 			 * status notifier.
 			 */
 
-			status = readl(buf + 2 * head);
+			status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
 			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
-					 port->context_id);
+			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
 			rq = port_unpack(port, &count);
 			GEM_BUG_ON(count == 0);
@@ -1720,6 +1726,22 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
 }
 
+static bool irq_handler_force_mmio(struct drm_i915_private *i915)
+{
+	/* GVT emulation depends upon intercepting CSB mmio */
+	if (intel_vgpu_active(i915))
+		return false;
+
+	/*
+	 * IOMMU adds unpredictable latency causing the CSB write to only
+	 * be visible after the interrupt (missed breadcrumb syndrome).
+	 */
+	if (intel_vtd_active())
+		return false;
+
+	return true;
+}
+
 static void
 logical_ring_setup(struct intel_engine_cs *engine)
 {
@@ -1731,6 +1753,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
 
+	engine->csb_use_mmio = irq_handler_force_mmio(dev_priv);
+
 	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
 						    RING_ELSP(engine),
 						    FW_REG_WRITE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 79c0021f3700..5c055b62966d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,6 +391,7 @@ struct intel_engine_cs {
 	struct rb_root execlist_queue;
 	struct rb_node *execlist_first;
 	unsigned int fw_domains;
+	bool csb_use_mmio;
 
 	/* Contexts are pinned whilst they are active on the GPU. The last
 	 * context executed remains active whilst the GPU is idle - the
@@ -496,6 +497,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_INDEX	0x40
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+#define I915_HWS_CSB_BUF0_INDEX		0x10
+
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
 int intel_ring_pin(struct intel_ring *ring,
-- 
2.14.1


* ✓ Fi.CI.IGT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (5 preceding siblings ...)
  2017-09-13  9:51 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image Patchwork
@ 2017-09-13 13:37 ` Patchwork
  2017-09-13 14:04 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2) Patchwork
  2017-09-13 23:11 ` ✓ Fi.CI.IGT: " Patchwork
  8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2017-09-13 13:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915/lrc: Clarify the format of the context image
URL   : https://patchwork.freedesktop.org/series/30269/
State : success

== Summary ==

Test kms_flip:
        Subgroup wf_vblank-vs-modeset:
                dmesg-warn -> PASS       (shard-hsw) fdo#102706
Test kms_plane:
        Subgroup plane-panning-bottom-right-suspend-pipe-C-planes:
                skip       -> PASS       (shard-hsw)
Test prime_self_import:
        Subgroup reimport-vs-gem_close-race:
                fail       -> PASS       (shard-hsw) fdo#102696
Test perf:
        Subgroup blocking:
                pass       -> FAIL       (shard-hsw) fdo#102252

fdo#102706 https://bugs.freedesktop.org/show_bug.cgi?id=102706
fdo#102696 https://bugs.freedesktop.org/show_bug.cgi?id=102696
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252

shard-hsw        total:2419 pass:1322 dwarn:2   dfail:0   fail:14  skip:1081 time:9712s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5676/shards.html

* Re: [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-09-13  8:56 ` [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
  2017-09-13 13:35   ` [PATCH v7] " Chris Wilson
@ 2017-09-13 13:57   ` Mika Kuoppala
  1 sibling, 0 replies; 16+ messages in thread
From: Mika Kuoppala @ 2017-09-13 13:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The engine provides a mirror of the CSB in the HWSP. If we use the
> cacheable reads from the HWSP, we can shave off a few mmio reads per
> context-switch interrupt (which are quite frequent!). Just removing a
> couple of mmio is not enough to actually reduce any latency, but a small
> reduction in overall cpu usage.
>
> Much appreciation for Ben dropping the bombshell that the CSB was in the
> HWSP and for Michel in digging out the details.
>
> v2: Don't be lazy, add the defines for the indices.
> v3: Include the HWSP in debugfs/i915_engine_info
> v4: Check for GVT-g, it currently depends on intercepting CSB mmio
> v5: Fixup GVT-g mmio path
> v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
> memory latency. (Mika)
>
> Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Acked-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 34 ++++++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>  3 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 6338018f655d..7062cde94a49 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			   upper_32_bits(addr), lower_32_bits(addr));
>  
>  		if (i915.enable_execlists) {
> +			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>  			u32 ptr, read, write;
>  			unsigned int idx;
>  
> @@ -3337,10 +3338,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  				write += GEN8_CSB_ENTRIES;
>  			while (read < write) {
>  				idx = ++read % GEN8_CSB_ENTRIES;
> -				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
> +				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
>  					   idx,
>  					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> -					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
> +					   hws[idx * 2],
> +					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
> +					   hws[idx * 2 + 1]);
>  			}
>  
>  			rcu_read_lock();
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index aa5534213190..8e4b21a18554 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
>  	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
>  		u32 __iomem *csb_mmio =
>  			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
> -		u32 __iomem *buf =
> -			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
> +		/* The HWSP contains a (cacheable) mirror of the CSB */
> +		const u32 *buf =
> +			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>  		unsigned int head, tail;
>  
> +		/* However GVT emulation depends upon intercepting CSB mmio */
> +		if (unlikely(engine->csb_use_mmio)) {
> +			buf = (u32 * __force)
> +				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +		}
> +
>  		/* The write will be ordered by the uncached read (itself
>  		 * a memory barrier), so we do not need another in the form
>  		 * of a locked instruction. The race between the interrupt
> @@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
>  			 * status notifier.
>  			 */
>  
> -			status = readl(buf + 2 * head);
> +			status = buf[2 * head];
>  			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
>  				continue;
>  
>  			/* Check the context/desc id for this event matches */
> -			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> -					 port->context_id);
> +			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
>  
>  			rq = port_unpack(port, &count);
>  			GEM_BUG_ON(count == 0);
> @@ -1726,6 +1732,22 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
>  	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
>  }
>  
> +static bool irq_handler_force_mmio(struct drm_i915_private *i915)
> +{
> +	/* GVT emulation depends upon intercepting CSB mmio */
> +	if (intel_vgpu_active(i915))
> +		return false;
> +
> +	/*
> +	 * IOMMU adds unpredictable latency causing the CSB write to only
> +	 * be visible after the interrupt (missed breadcrumb syndrome).
> +	 */
> +	if (intel_vtd_active())
> +		return false;

I don't know if it is worthwhile to check if this restriction could be
lifted on some gens. Perhaps for future work.

But with this in place now,

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> +
> +	return true;
> +}
> +
>  static void
>  logical_ring_setup(struct intel_engine_cs *engine)
>  {
> @@ -1737,6 +1759,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
>  	/* Intentionally left blank. */
>  	engine->buffer = NULL;
>  
> +	engine->csb_use_mmio = irq_handler_force_mmio(dev_priv);
> +
>  	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
>  						    RING_ELSP(engine),
>  						    FW_REG_WRITE);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 79c0021f3700..5c055b62966d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -391,6 +391,7 @@ struct intel_engine_cs {
>  	struct rb_root execlist_queue;
>  	struct rb_node *execlist_first;
>  	unsigned int fw_domains;
> +	bool csb_use_mmio;
>  
>  	/* Contexts are pinned whilst they are active on the GPU. The last
>  	 * context executed remains active whilst the GPU is idle - the
> @@ -496,6 +497,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>  #define I915_GEM_HWS_SCRATCH_INDEX	0x40
>  #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>  
> +#define I915_HWS_CSB_BUF0_INDEX		0x10
> +
>  struct intel_ring *
>  intel_engine_create_ring(struct intel_engine_cs *engine, int size);
>  int intel_ring_pin(struct intel_ring *ring,
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 16+ messages in thread
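
As an aside on the indexing used above: the HWSP mirror starts at dword
I915_HWS_CSB_BUF0_INDEX and holds GEN8_CSB_ENTRIES events of two dwords each
(status, then context ID), which is exactly what the debugfs hunk prints as
hws[idx * 2] and hws[idx * 2 + 1]. A minimal sketch of that layout follows;
the struct and helper name are made up for illustration and are not part of
the patch:

struct csb_event {
	u32 status;	/* GEN8_CTX_STATUS_* bits */
	u32 context_id;	/* compared against port->context_id */
};

/* Hypothetical helper, for illustration only. */
static struct csb_event csb_read_hwsp(const struct intel_engine_cs *engine,
				      unsigned int idx) /* 0 .. GEN8_CSB_ENTRIES - 1 */
{
	/* The CSB mirror lives in the engine's hardware status page. */
	const u32 *buf =
		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
	struct csb_event ev = {
		.status     = buf[2 * idx],
		.context_id = buf[2 * idx + 1],
	};

	return ev;
}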

* Re: [PATCH v7] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-09-13 13:35   ` [PATCH v7] " Chris Wilson
@ 2017-09-13 14:02     ` Mika Kuoppala
  0 siblings, 0 replies; 16+ messages in thread
From: Mika Kuoppala @ 2017-09-13 14:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The engine provides a mirror of the CSB in the HWSP. If we use the
> cacheable reads from the HWSP, we can shave off a few mmio reads per
> context-switch interrupt (and those are quite frequent!). Just removing a
> couple of mmio reads is not enough to actually reduce any latency, but it
> does give a small reduction in overall CPU usage.
>
> Much appreciation for Ben dropping the bombshell that the CSB was in the
> HWSP and to Michel for digging out the details.
>
> v2: Don't be lazy, add the defines for the indices.
> v3: Include the HWSP in debugfs/i915_engine_info
> v4: Check for GVT-g, it currently depends on intercepting CSB mmio
> v5: Fixup GVT-g mmio path
> v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
> memory latency. (Mika)
> v7: Also mark up the CSB read with READ_ONCE() as it may still be an mmio
> read and we want to stop the compiler from issuing a later (v.slow) reload.
>
> Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Acked-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
>  drivers/gpu/drm/i915/intel_lrc.c        | 34 ++++++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>  3 files changed, 37 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 6338018f655d..7062cde94a49 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3315,6 +3315,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			   upper_32_bits(addr), lower_32_bits(addr));
>  
>  		if (i915.enable_execlists) {
> +			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>  			u32 ptr, read, write;
>  			unsigned int idx;
>  
> @@ -3337,10 +3338,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  				write += GEN8_CSB_ENTRIES;
>  			while (read < write) {
>  				idx = ++read % GEN8_CSB_ENTRIES;
> -				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
> +				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
>  					   idx,
>  					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
> -					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
> +					   hws[idx * 2],
> +					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
> +					   hws[idx * 2 + 1]);
>  			}
>  
>  			rcu_read_lock();
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8886e3b60e82..531a7cf174d7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -541,10 +541,17 @@ static void intel_lrc_irq_handler(unsigned long data)
>  	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
>  		u32 __iomem *csb_mmio =
>  			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
> -		u32 __iomem *buf =
> -			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
> +		/* The HWSP contains a (cacheable) mirror of the CSB */
> +		const u32 *buf =
> +			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
>  		unsigned int head, tail;
>  
> +		/* However GVT emulation depends upon intercepting CSB mmio */
> +		if (unlikely(engine->csb_use_mmio)) {
> +			buf = (u32 * __force)
> +				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +		}
> +
>  		/* The write will be ordered by the uncached read (itself
>  		 * a memory barrier), so we do not need another in the form
>  		 * of a locked instruction. The race between the interrupt
> @@ -584,13 +591,12 @@ static void intel_lrc_irq_handler(unsigned long data)
>  			 * status notifier.
>  			 */
>  
> -			status = readl(buf + 2 * head);
> +			status = READ_ONCE(buf[2 * head]); /* maybe mmio! */

Even better

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

>  			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
>  				continue;
>  
>  			/* Check the context/desc id for this event matches */
> -			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
> -					 port->context_id);
> +			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
>  
>  			rq = port_unpack(port, &count);
>  			GEM_BUG_ON(count == 0);
> @@ -1720,6 +1726,22 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
>  	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
>  }
>  
> +static bool irq_handler_force_mmio(struct drm_i915_private *i915)
> +{
> +	/* GVT emulation depends upon intercepting CSB mmio */
> +	if (intel_vgpu_active(i915))
> +		return false;
> +
> +	/*
> +	 * IOMMU adds unpredictable latency causing the CSB write to only
> +	 * be visible after the interrupt (missed breadcrumb syndrome).
> +	 */
> +	if (intel_vtd_active())
> +		return false;
> +
> +	return true;
> +}
> +
>  static void
>  logical_ring_setup(struct intel_engine_cs *engine)
>  {
> @@ -1731,6 +1753,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
>  	/* Intentionally left blank. */
>  	engine->buffer = NULL;
>  
> +	engine->csb_use_mmio = irq_handler_force_mmio(dev_priv);
> +
>  	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
>  						    RING_ELSP(engine),
>  						    FW_REG_WRITE);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 79c0021f3700..5c055b62966d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -391,6 +391,7 @@ struct intel_engine_cs {
>  	struct rb_root execlist_queue;
>  	struct rb_node *execlist_first;
>  	unsigned int fw_domains;
> +	bool csb_use_mmio;
>  
>  	/* Contexts are pinned whilst they are active on the GPU. The last
>  	 * context executed remains active whilst the GPU is idle - the
> @@ -496,6 +497,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>  #define I915_GEM_HWS_SCRATCH_INDEX	0x40
>  #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>  
> +#define I915_HWS_CSB_BUF0_INDEX		0x10
> +
>  struct intel_ring *
>  intel_engine_create_ring(struct intel_engine_cs *engine, int size);
>  int intel_ring_pin(struct intel_ring *ring,
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 16+ messages in thread
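
To spell out the v7 point: buf may still alias uncached mmio on the GVT-g
path, and a plain load of a non-volatile object may legally be re-issued by
the compiler if it wants the value again later. A before/after sketch,
condensed from the hunk above rather than new code (buf, head and status are
the variables from that hunk):

	/* Before: a plain load; the compiler may reload buf[2 * head]
	 * further down, turning one (possibly mmio, therefore very slow)
	 * read into several.
	 */
	status = buf[2 * head];

	/* v7: force exactly one load and keep the result in a register. */
	status = READ_ONCE(buf[2 * head]);	/* maybe mmio! */
	if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
		continue;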

* ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2)
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (6 preceding siblings ...)
  2017-09-13 13:37 ` ✓ Fi.CI.IGT: " Patchwork
@ 2017-09-13 14:04 ` Patchwork
  2017-09-13 23:11 ` ✓ Fi.CI.IGT: " Patchwork
  8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2017-09-13 14:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2)
URL   : https://patchwork.freedesktop.org/series/30269/
State : success

== Summary ==

Series 30269v2 series starting with [1/6] drm/i915/lrc: Clarify the format of the context image
https://patchwork.freedesktop.org/api/1.0/series/30269/revisions/2/mbox/

Test kms_cursor_legacy:
        Subgroup basic-flip-before-cursor-atomic:
                incomplete -> PASS       (fi-bxt-j4205) fdo#102705
Test pm_rpm:
        Subgroup basic-rte:
                pass       -> DMESG-WARN (fi-cfl-s) fdo#102294

fdo#102705 https://bugs.freedesktop.org/show_bug.cgi?id=102705
fdo#102294 https://bugs.freedesktop.org/show_bug.cgi?id=102294

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:450s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:456s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:381s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:539s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:271s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:507s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:510s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:503s
fi-cfl-s         total:289  pass:222  dwarn:35  dfail:0   fail:0   skip:32  time:560s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:456s
fi-glk-2a        total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:603s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:430s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:422s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:436s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:488s
fi-ivb-3770      total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:466s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:491s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:576s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:589s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:551s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:459s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:528s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:507s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:465s
fi-skl-x1585l    total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:486s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:581s
fi-snb-2600      total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:430s

76f9b11f445f4381eff873a62138ed0b00d08e80 drm-tip: 2017y-09m-13d-12h-28m-54s UTC integration manifest
57e7e6bca98d drm/i915/execlists: Read the context-status HEAD from the HWSP
f2b7f58de86a drm/i915/execlists: Read the context-status buffer from the HWSP
34c818260013 drm/i915: Allow HW status page to be bound high
e51591f3ca79 drm/i915/lrc: allocate separate page for HWSP
4722fc8fd973 drm/i915/guc: Don't make assumptions while getting the lrca offset
ff6e92d7101b drm/i915/lrc: Clarify the format of the context image

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5684/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-09-13  8:56 ` [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
@ 2017-09-13 14:12   ` Mika Kuoppala
  2017-09-13 16:51     ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Mika Kuoppala @ 2017-09-13 14:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The engine also provides a mirror of the CSB write pointer in the HWSP,
> but not of our read pointer. To take advantage of this we need to
> remember where we read up to on the last interrupt and continue off from
> there. This poses a problem following a reset, as we don't know where
> the hw will start writing from, and due to the use of power contexts we
> cannot perform that query during the reset itself. So we continue the
> current modus operandi of delaying the first read of the context-status
> read/write pointers until after the first interrupt. With this we should
> now have eliminated all uncached mmio reads in handling the
> context-status interrupt, though we still have the uncached mmio writes
> for submitting new work, and many uncached mmio reads in the global
> interrupt handler itself. Still a step in the right direction towards
> reducing our resubmit latency, although it appears lost in the noise!
>
> v2: Cannonlake moved the CSB write index
> v3: Include the sw/hwsp state in debugfs/i915_engine_info
> v4: Also revert to using CSB mmio for GVT-g
> v5: Prevent the compiler reloading tail (Mika)
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Acked-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |  6 ++++--
>  drivers/gpu/drm/i915/i915_drv.h         |  8 ++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 27 ++++++++++++++++++++-------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
>  4 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 7062cde94a49..12381045ed6a 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3326,8 +3326,10 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  			ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
>  			read = GEN8_CSB_READ_PTR(ptr);
>  			write = GEN8_CSB_WRITE_PTR(ptr);
> -			seq_printf(m, "\tExeclist CSB read %d, write %d, interrupt posted? %s\n",
> -				   read, write,
> +			seq_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], interrupt posted? %s\n",
> +				   read, engine->csb_head,
> +				   write,
> +				   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
>  				   yesno(test_bit(ENGINE_IRQ_EXECLIST,
>  						  &engine->irq_posted)));
>  			if (read >= GEN8_CSB_ENTRIES)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 1cc31a5b049f..78195b7c64db 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -4399,4 +4399,12 @@ int remap_io_mapping(struct vm_area_struct *vma,
>  		     unsigned long addr, unsigned long pfn, unsigned long size,
>  		     struct io_mapping *iomap);
>  
> +static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
> +{
> +	if (INTEL_GEN(i915) >= 10)
> +		return CNL_HWS_CSB_WRITE_INDEX;
> +	else
> +		return I915_HWS_CSB_WRITE_INDEX;
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8e4b21a18554..55d7eee21226 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -545,8 +545,6 @@ static void intel_lrc_irq_handler(unsigned long data)
>  	 * new request (outside of the context-switch interrupt).
>  	 */
>  	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
> -		u32 __iomem *csb_mmio =
> -			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
>  		/* The HWSP contains a (cacheable) mirror of the CSB */
>  		const u32 *buf =
>  			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> @@ -556,6 +554,7 @@ static void intel_lrc_irq_handler(unsigned long data)
>  		if (unlikely(engine->csb_use_mmio)) {
>  			buf = (u32 * __force)
>  				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
> +			engine->csb_head = -1; /* force mmio read of CSB ptrs */
>  		}
>  
>  		/* The write will be ordered by the uncached read (itself
> @@ -569,9 +568,19 @@ static void intel_lrc_irq_handler(unsigned long data)
>  		 * is set and we do a new loop.
>  		 */
>  		__clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> -		head = readl(csb_mmio);
> -		tail = GEN8_CSB_WRITE_PTR(head);
> -		head = GEN8_CSB_READ_PTR(head);
> +		if (unlikely(engine->csb_head == -1)) { /* following a reset */

I was going to suggest using the same csb_use_mmio flag for this,
but that would not gain much when looking at the read ptr write
further down.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


> +			head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +			tail = GEN8_CSB_WRITE_PTR(head);
> +			head = GEN8_CSB_READ_PTR(head);
> +			engine->csb_head = head;
> +		} else {
> +			const int write_idx =
> +				intel_hws_csb_write_index(dev_priv) -
> +				I915_HWS_CSB_BUF0_INDEX;
> +
> +			head = engine->csb_head;
> +			tail = READ_ONCE(buf[write_idx]);
> +		}
>  		while (head != tail) {
>  			struct drm_i915_gem_request *rq;
>  			unsigned int status;
> @@ -625,8 +634,11 @@ static void intel_lrc_irq_handler(unsigned long data)
>  				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>  		}
>  
> -		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> -		       csb_mmio);
> +		if (head != engine->csb_head) {
> +			engine->csb_head = head;
> +			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
> +			       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
> +		}
>  	}
>  
>  	if (execlists_elsp_ready(engine))
> @@ -1275,6 +1287,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  	I915_WRITE(GEN8_GT_IIR(gtiir[engine->id]),
>  		   GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift);
>  	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> +	engine->csb_head = -1;
>  
>  	/* After a GPU reset, we may have requests to replay */
>  	submit = false;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 5c055b62966d..4d63a2c0b2e1 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -392,6 +392,7 @@ struct intel_engine_cs {
>  	struct rb_node *execlist_first;
>  	unsigned int fw_domains;
>  	bool csb_use_mmio;
> +	unsigned int csb_head;
>  
>  	/* Contexts are pinned whilst they are active on the GPU. The last
>  	 * context executed remains active whilst the GPU is idle - the
> @@ -498,6 +499,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>  #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
>  
>  #define I915_HWS_CSB_BUF0_INDEX		0x10
> +#define I915_HWS_CSB_WRITE_INDEX	0x1f
> +#define CNL_HWS_CSB_WRITE_INDEX		0x2f
>  
>  struct intel_ring *
>  intel_engine_create_ring(struct intel_engine_cs *engine, int size);
> -- 
> 2.14.1

^ permalink raw reply	[flat|nested] 16+ messages in thread
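
Condensing the hunks above into one place, the resulting flow looks roughly
like this (a sketch assembled from the quoted diff, not a literal copy of the
patch):

	if (unlikely(engine->csb_head == -1)) {
		/* After a reset (or on the GVT-g mmio path, which sets
		 * csb_head to -1) the cached read pointer is unknown, so
		 * take both pointers from the register.
		 */
		head = readl(dev_priv->regs +
			     i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
		tail = GEN8_CSB_WRITE_PTR(head);
		head = GEN8_CSB_READ_PTR(head);
		engine->csb_head = head;
	} else {
		/* Otherwise the write pointer comes from the HWSP mirror
		 * and the read pointer is the one cached on the last
		 * interrupt.
		 */
		head = engine->csb_head;
		tail = READ_ONCE(buf[intel_hws_csb_write_index(dev_priv) -
				     I915_HWS_CSB_BUF0_INDEX]);
	}

	/* ... process CSB entries from head to tail ... */

	if (head != engine->csb_head) {
		engine->csb_head = head;
		/* Only touch the register when the read pointer moved. */
		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
		       dev_priv->regs +
		       i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
	}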

* Re: [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD from the HWSP
  2017-09-13 14:12   ` Mika Kuoppala
@ 2017-09-13 16:51     ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-09-13 16:51 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2017-09-13 15:12:42)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > The engine also provides a mirror of the CSB write pointer in the HWSP,
> > but not of our read pointer. To take advantage of this we need to
> > remember where we read up to on the last interrupt and continue off from
> > there. This poses a problem following a reset, as we don't know where
> > the hw will start writing from, and due to the use of power contexts we
> > cannot perform that query during the reset itself. So we continue the
> > current modus operandi of delaying the first read of the context-status
> > read/write pointers until after the first interrupt. With this we should
> > now have eliminated all uncached mmio reads in handling the
> > context-status interrupt, though we still have the uncached mmio writes
> > for submitting new work, and many uncached mmio reads in the global
> > interrupt handler itself. Still a step in the right direction towards
> > reducing our resubmit latency, although it appears lost in the noise!
> >
> > v2: Cannonlake moved the CSB write index
> > v3: Include the sw/hwsp state in debugfs/i915_engine_info
> > v4: Also revert to using CSB mmio for GVT-g
> > v5: Prevent the compiler reloading tail (Mika)
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Cc: Zhi Wang <zhi.a.wang@intel.com>
> > Acked-by: Michel Thierry <michel.thierry@intel.com>

> > @@ -569,9 +568,19 @@ static void intel_lrc_irq_handler(unsigned long data)
> >                * is set and we do a new loop.
> >                */
> >               __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
> > -             head = readl(csb_mmio);
> > -             tail = GEN8_CSB_WRITE_PTR(head);
> > -             head = GEN8_CSB_READ_PTR(head);
> > +             if (unlikely(engine->csb_head == -1)) { /* following a reset */
> 
> Was going to suggest using the same csb_use_mmio flag for this
> but that would not gain much when looking at the read ptr write
> further down.

Right, it has to cover the post-reset fixup of state as well.
 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

(Minor fixup for logical flip of csb_use_mmio).

Pushed this after a few months of waiting in the wings. Thanks all,
-Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread
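
For anyone puzzling over the "logical flip" Chris mentions: as posted,
irq_handler_force_mmio() returns false in precisely the cases where the CSB
must be read via mmio, so engine->csb_use_mmio ends up inverted. A guess at
the shape of the fixup applied on push; the function name and exact form here
are assumptions, not the pushed commit:

/* Assumed shape of the fixup; the pushed commit may differ. */
static bool csb_force_mmio(struct drm_i915_private *i915)
{
	/* GVT emulation depends upon intercepting CSB mmio */
	if (intel_vgpu_active(i915))
		return true;	/* assumed fix: posted version returned false */

	/*
	 * IOMMU adds unpredictable latency causing the CSB write to only
	 * be visible after the interrupt (missed breadcrumb syndrome).
	 */
	if (intel_vtd_active())
		return true;	/* assumed fix: posted version returned false */

	return false;
}

	/* in logical_ring_setup() */
	engine->csb_use_mmio = csb_force_mmio(dev_priv);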

* ✓ Fi.CI.IGT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2)
  2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
                   ` (7 preceding siblings ...)
  2017-09-13 14:04 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2) Patchwork
@ 2017-09-13 23:11 ` Patchwork
  8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2017-09-13 23:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2)
URL   : https://patchwork.freedesktop.org/series/30269/
State : success

== Summary ==

Test kms_setmode:
        Subgroup basic:
                pass       -> FAIL       (shard-hsw) fdo#99912

fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912

shard-hsw        total:2313 pass:1244 dwarn:0   dfail:0   fail:14  skip:1055 time:9460s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5684/shards.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP
  2017-08-22 17:24 HWSP execlists for kbl-shards Chris Wilson
@ 2017-08-22 17:24 ` Chris Wilson
  0 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2017-08-22 17:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (and those are quite frequent!). Just removing a
couple of mmio reads is not enough to actually reduce any latency, but it
does give a small reduction in overall CPU usage.

Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and to Michel for digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path

Suggested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 +++++--
 drivers/gpu/drm/i915/intel_lrc.c        | 19 ++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 +++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 48572b157222..92aa7465e950 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3312,6 +3312,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 			   upper_32_bits(addr), lower_32_bits(addr));
 
 		if (i915.enable_execlists) {
+			const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 			u32 ptr, read, write;
 			unsigned int idx;
 
@@ -3334,10 +3335,12 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 				write += GEN8_CSB_ENTRIES;
 			while (read < write) {
 				idx = ++read % GEN8_CSB_ENTRIES;
-				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x, context: %d\n",
+				seq_printf(m, "\tExeclist CSB[%d]: 0x%08x [0x%08x in hwsp], context: %d [%d in hwsp]\n",
 					   idx,
 					   I915_READ(RING_CONTEXT_STATUS_BUF_LO(engine, idx)),
-					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)));
+					   hws[idx * 2],
+					   I915_READ(RING_CONTEXT_STATUS_BUF_HI(engine, idx)),
+					   hws[idx * 2 + 1]);
 			}
 
 			rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index aa5534213190..e889b57ec676 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -547,10 +547,17 @@ static void intel_lrc_irq_handler(unsigned long data)
 	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
 		u32 __iomem *csb_mmio =
 			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
-		u32 __iomem *buf =
-			dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
+		/* The HWSP contains a (cacheable) mirror of the CSB */
+		const u32 *buf =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
+		/* However GVT emulation depends upon intercepting CSB mmio */
+		if (unlikely(engine->csb_use_mmio)) {
+			buf = (u32 * __force)
+				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+		}
+
 		/* The write will be ordered by the uncached read (itself
 		 * a memory barrier), so we do not need another in the form
 		 * of a locked instruction. The race between the interrupt
@@ -590,13 +597,12 @@ static void intel_lrc_irq_handler(unsigned long data)
 			 * status notifier.
 			 */
 
-			status = readl(buf + 2 * head);
+			status = buf[2 * head];
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
 			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(readl(buf + 2 * head + 1) !=
-					 port->context_id);
+			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
 			rq = port_unpack(port, &count);
 			GEM_BUG_ON(count == 0);
@@ -1737,6 +1743,9 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
 
+	/* GVT emulation depends upon intercepting CSB mmio */
+	engine->csb_use_mmio = intel_vgpu_active(dev_priv);
+
 	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
 						    RING_ELSP(engine),
 						    FW_REG_WRITE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 02d8974bf9ab..67b91f0b29bb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -391,6 +391,7 @@ struct intel_engine_cs {
 	struct rb_root execlist_queue;
 	struct rb_node *execlist_first;
 	unsigned int fw_domains;
+	bool csb_use_mmio;
 
 	/* Contexts are pinned whilst they are active on the GPU. The last
 	 * context executed remains active whilst the GPU is idle - the
@@ -496,6 +497,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_SCRATCH_INDEX	0x40
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
+#define I915_HWS_CSB_BUF0_INDEX		0x10
+
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size);
 int intel_ring_pin(struct intel_ring *ring,
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-09-13 23:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-13  8:56 [PATCH 1/6] drm/i915/lrc: Clarify the format of the context image Chris Wilson
2017-09-13  8:56 ` [PATCH 2/6] drm/i915/guc: Don't make assumptions while getting the lrca offset Chris Wilson
2017-09-13  8:56 ` [PATCH 3/6] drm/i915/lrc: allocate separate page for HWSP Chris Wilson
2017-09-13  8:56 ` [PATCH 4/6] drm/i915: Allow HW status page to be bound high Chris Wilson
2017-09-13  8:56 ` [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
2017-09-13 13:35   ` [PATCH v7] " Chris Wilson
2017-09-13 14:02     ` Mika Kuoppala
2017-09-13 13:57   ` [PATCH 5/6] " Mika Kuoppala
2017-09-13  8:56 ` [PATCH 6/6] drm/i915/execlists: Read the context-status HEAD " Chris Wilson
2017-09-13 14:12   ` Mika Kuoppala
2017-09-13 16:51     ` Chris Wilson
2017-09-13  9:51 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image Patchwork
2017-09-13 13:37 ` ✓ Fi.CI.IGT: " Patchwork
2017-09-13 14:04 ` ✓ Fi.CI.BAT: success for series starting with [1/6] drm/i915/lrc: Clarify the format of the context image (rev2) Patchwork
2017-09-13 23:11 ` ✓ Fi.CI.IGT: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2017-08-22 17:24 HWSP execlists for kbl-shards Chris Wilson
2017-08-22 17:24 ` [PATCH 5/6] drm/i915/execlists: Read the context-status buffer from the HWSP Chris Wilson
