* RFC avoiding ksoftirqd for first submission
@ 2018-05-25  9:31 Chris Wilson
  2018-05-25  9:31 ` [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier Chris Wilson
                   ` (21 more replies)
  0 siblings, 22 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

The series is still in flux, as it depends on first getting the bug
fixes at the head of the series resolved. But I wanted to present this
before the weekend so you all have some reading material!
-Chris



* [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25 12:56   ` Mika Kuoppala
  2018-05-25  9:31 ` [PATCH 02/18] drm/i915: Switch to kernel context before idling at runtime Chris Wilson
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

In order to prepare the GPU for sleeping, we may want to submit commands
to it. This is a complicated process that may even require some swapping
in from shmemfs, if the GPU was in the wrong state. As such, we need to
do this preparation step synchronously, before the rest of the system has
started to turn off (e.g. swapin fails if SCSI is already suspended).
Fortunately, we are provided with such a hook: pm_ops.prepare().
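
As a reminder of where the hook sits: the PM core runs every device's
.prepare() callback before it starts invoking any ->suspend() callback,
so work done there can still rely on the rest of the system being fully
operational. A minimal sketch of a driver wiring it up (the demo_*
names are illustrative, not from this patch):

  /* Illustrative sketch: .prepare() runs before any device in the
   * system is suspended, making it a safe point for work such as
   * swapping buffers back in from shmemfs.
   */
  static int demo_prepare(struct device *dev)
  {
          /* flush or submit work that needs other devices awake */
          return 0;
  }

  static const struct dev_pm_ops demo_pm_ops = {
          .prepare = demo_prepare,
          /* .suspend, .suspend_late, ... all run afterwards */
  };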

v2: Compile cleanup
v3: Fewer asserts, fewer problems?

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106640
Testcase: igt/drv_suspend after igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c | 41 +++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9c449b8d8eab..9d6ac7f44812 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1553,12 +1553,24 @@ static bool suspend_to_idle(struct drm_i915_private *dev_priv)
 	return false;
 }
 
+static int i915_drm_prepare(struct drm_device *dev)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	int err;
+
+	err = i915_gem_suspend(i915);
+	if (err)
+		dev_err(&i915->drm.pdev->dev,
+			"GEM idle failed, suspend/resume might fail\n");
+
+	return err;
+}
+
 static int i915_drm_suspend(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct pci_dev *pdev = dev_priv->drm.pdev;
 	pci_power_t opregion_target_state;
-	int error;
 
 	/* ignore lid events during suspend */
 	mutex_lock(&dev_priv->modeset_restore_lock);
@@ -1575,13 +1587,6 @@ static int i915_drm_suspend(struct drm_device *dev)
 
 	pci_save_state(pdev);
 
-	error = i915_gem_suspend(dev_priv);
-	if (error) {
-		dev_err(&pdev->dev,
-			"GEM idle failed, resume might fail\n");
-		goto out;
-	}
-
 	intel_display_suspend(dev);
 
 	intel_dp_mst_suspend(dev);
@@ -1609,10 +1614,9 @@ static int i915_drm_suspend(struct drm_device *dev)
 
 	intel_csr_ucode_suspend(dev_priv);
 
-out:
 	enable_rpm_wakeref_asserts(dev_priv);
 
-	return error;
+	return 0;
 }
 
 static int i915_drm_suspend_late(struct drm_device *dev, bool hibernation)
@@ -2081,6 +2085,22 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 	return ret;
 }
 
+static int i915_pm_prepare(struct device *kdev)
+{
+	struct pci_dev *pdev = to_pci_dev(kdev);
+	struct drm_device *dev = pci_get_drvdata(pdev);
+
+	if (!dev) {
+		dev_err(kdev, "DRM not initialized, aborting suspend.\n");
+		return -ENODEV;
+	}
+
+	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
+		return 0;
+
+	return i915_drm_prepare(dev);
+}
+
 static int i915_pm_suspend(struct device *kdev)
 {
 	struct pci_dev *pdev = to_pci_dev(kdev);
@@ -2731,6 +2751,7 @@ const struct dev_pm_ops i915_pm_ops = {
 	 * S0ix (via system suspend) and S3 event handlers [PMSG_SUSPEND,
 	 * PMSG_RESUME]
 	 */
+	.prepare = i915_pm_prepare,
 	.suspend = i915_pm_suspend,
 	.suspend_late = i915_pm_suspend_late,
 	.resume_early = i915_pm_resume_early,
-- 
2.17.0


* [PATCH 02/18] drm/i915: Switch to kernel context before idling at runtime
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
  2018-05-25  9:31 ` [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25 13:44   ` Mika Kuoppala
  2018-05-25  9:31 ` [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context Chris Wilson
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

We can reduce our exposure to random neutrinos by resting on the kernel
context, having flushed out the user contexts to system memory and
beyond. The corollary is that we then require two passes through the
idle handler to go to sleep, which on a truly idle system involves an
extra pass through the slow and irregular retire work handler.
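
In code terms, the debugfs hunk below captures the double pass;
condensed, the idea is simply to keep draining until the GT actually
parks:

  /* from the i915_drop_caches_set() change below: the first drain
   * may only emit the switch to the kernel context, so repeat until
   * gt.awake clears
   */
  do {
          drain_delayed_work(&dev_priv->gt.idle_work);
  } while (READ_ONCE(dev_priv->gt.awake));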

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  7 +++++--
 drivers/gpu/drm/i915/i915_gem.c     | 29 ++++++++++++++++++++++++-----
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a8e7761cdc7d..0fb86a8ef192 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4226,8 +4226,11 @@ i915_drop_caches_set(void *data, u64 val)
 		i915_gem_shrink_all(dev_priv);
 	fs_reclaim_release(GFP_KERNEL);
 
-	if (val & DROP_IDLE)
-		drain_delayed_work(&dev_priv->gt.idle_work);
+	if (val & DROP_IDLE) {
+		do {
+			drain_delayed_work(&dev_priv->gt.idle_work);
+		} while (READ_ONCE(dev_priv->gt.awake));
+	}
 
 	if (val & DROP_FREED)
 		i915_gem_drain_freed_objects(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05f44ca35a06..c93f5dcb1d82 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -139,6 +139,8 @@ int i915_mutex_lock_interruptible(struct drm_device *dev)
 
 static u32 __i915_gem_park(struct drm_i915_private *i915)
 {
+	GEM_TRACE("\n");
+
 	lockdep_assert_held(&i915->drm.struct_mutex);
 	GEM_BUG_ON(i915->gt.active_requests);
 	GEM_BUG_ON(!list_empty(&i915->gt.active_rings));
@@ -193,6 +195,8 @@ void i915_gem_park(struct drm_i915_private *i915)
 
 void i915_gem_unpark(struct drm_i915_private *i915)
 {
+	GEM_TRACE("\n");
+
 	lockdep_assert_held(&i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915->gt.active_requests);
 
@@ -3504,6 +3508,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
+	/*
+	 * Flush out the last user context, leaving only the pinned
+	 * kernel context resident. When we are idling on the kernel_context,
+	 * no more new requests (with a context switch) are emitted and we
+	 * can finally rest. A consequence is that the idle work handler is
+	 * always called at least twice before idling (and if the system is
+	 * idle that implies a round trip through the retire worker).
+	 */
+	mutex_lock(&dev_priv->drm.struct_mutex);
+	i915_gem_switch_to_kernel_context(dev_priv);
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+
+	GEM_TRACE("active_requests=%d (after switch-to-kernel-context)\n",
+		  READ_ONCE(dev_priv->gt.active_requests));
+
 	/*
 	 * Wait for last execlists context complete, but bail out in case a
 	 * new request is submitted. As we don't trust the hardware, we
@@ -4914,11 +4933,9 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 
 void i915_gem_sanitize(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error)) {
-		mutex_lock(&i915->drm.struct_mutex);
+	mutex_lock(&i915->drm.struct_mutex);
+	if (i915_terminally_wedged(&i915->gpu_error))
 		i915_gem_unset_wedged(i915);
-		mutex_unlock(&i915->drm.struct_mutex);
-	}
 
 	/*
 	 * If we inherit context state from the BIOS or earlier occupants
@@ -4930,6 +4947,9 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 */
 	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
 		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
+
+	i915_gem_contexts_lost(i915);
+	mutex_unlock(&i915->drm.struct_mutex);
 }
 
 int i915_gem_suspend(struct drm_i915_private *dev_priv)
@@ -4965,7 +4985,6 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
 
 		assert_kernel_context_is_current(dev_priv);
 	}
-	i915_gem_contexts_lost(dev_priv);
 	mutex_unlock(&dev->struct_mutex);
 
 	intel_uc_suspend(dev_priv);
-- 
2.17.0


* [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
  2018-05-25  9:31 ` [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier Chris Wilson
  2018-05-25  9:31 ` [PATCH 02/18] drm/i915: Switch to kernel context before idling at runtime Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25 12:22   ` Mika Kuoppala
  2018-05-25  9:31 ` [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends Chris Wilson
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

During suspend we want to flush out all active contexts and their
rendering. To do so we queue a request from the kernel's context; once
we know that request is done, we know the GPU is completely idle. To
speed up that switch, bump the GPU clocks.

Switching to the kernel context prior to idling is also used to enforce
a barrier before changing OA properties, and when evicting active
rendering from the global GTT; all of these are cases where we do want
to race-to-idle.
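
The boost is opt-in per wait; after v3 only the suspend path requests
it, so ordinary waits keep their normal clock behaviour. From the
suspend hunk below:

  ret = i915_gem_wait_for_idle(dev_priv,
                               I915_WAIT_INTERRUPTIBLE |
                               I915_WAIT_LOCKED |
                               I915_WAIT_FOR_IDLE_BOOST);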

v2: Limit the boosting to only the switch before suspend.
v3: Limit it to the wait-for-idle on suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> #v1
Tested-by: David Weinehall <david.weinehall@linux.intel.com> #v1
---
 drivers/gpu/drm/i915/i915_gem.c     | 27 +++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h |  1 +
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c93f5dcb1d82..7b5544efa0ba 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3704,7 +3704,29 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 
 static int wait_for_timeline(struct i915_timeline *tl, unsigned int flags)
 {
-	return i915_gem_active_wait(&tl->last_request, flags);
+	struct i915_request *rq;
+	long ret;
+
+	rq = i915_gem_active_get_unlocked(&tl->last_request);
+	if (!rq)
+		return 0;
+
+	/*
+	 * "Race-to-idle".
+	 *
+	 * Switching to the kernel context is often used as a synchronous
+	 * step prior to idling, e.g. in suspend for flushing all
+	 * current operations to memory before sleeping. These we
+	 * want to complete as quickly as possible to avoid prolonged
+	 * stalls, so allow the gpu to boost to maximum clocks.
+	 */
+	if (flags & I915_WAIT_FOR_IDLE_BOOST)
+		gen6_rps_boost(rq, NULL);
+
+	ret = i915_request_wait(rq, flags, MAX_SCHEDULE_TIMEOUT);
+	i915_request_put(rq);
+
+	return ret < 0 ? ret : 0;
 }
 
 static int wait_for_engines(struct drm_i915_private *i915)
@@ -4979,7 +5001,8 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
 
 		ret = i915_gem_wait_for_idle(dev_priv,
 					     I915_WAIT_INTERRUPTIBLE |
-					     I915_WAIT_LOCKED);
+					     I915_WAIT_LOCKED |
+					     I915_WAIT_FOR_IDLE_BOOST);
 		if (ret && ret != -EIO)
 			goto err_unlock;
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 1bbbb7a9fa03..491ff81d0fea 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -267,6 +267,7 @@ long i915_request_wait(struct i915_request *rq,
 #define I915_WAIT_INTERRUPTIBLE	BIT(0)
 #define I915_WAIT_LOCKED	BIT(1) /* struct_mutex held, handle GPU reset */
 #define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
+#define I915_WAIT_FOR_IDLE_BOOST BIT(3)
 
 static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine);
 
-- 
2.17.0


* [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (2 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25 13:13   ` Mika Kuoppala
  2018-05-25  9:31 ` [PATCH 05/18] drm/i915: Only sanitize GEM from late suspend Chris Wilson
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

As we reset the GPU on suspend/resume, we also need to reset the
engine state tracking, so call into the engine backends. This is
especially important so that we can also sanitize the state tracking
across resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7b5544efa0ba..5a7e0b388ad0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 
 void i915_gem_sanitize(struct drm_i915_private *i915)
 {
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	GEM_TRACE("\n");
+
 	mutex_lock(&i915->drm.struct_mutex);
+
+	intel_runtime_pm_get(i915);
+	intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
+
+	/*
+	 * As we have just resumed the machine and woken the device up from
+	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
+	 * back to defaults, recovering from whatever wedged state we left it
+	 * in and so worth trying to use the device once more.
+	 */
 	if (i915_terminally_wedged(&i915->gpu_error))
 		i915_gem_unset_wedged(i915);
 
@@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
 		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
 
+	/* Reset the submission backend after resume as well as the GPU reset */
+	for_each_engine(engine, i915, id) {
+		if (engine->reset.reset)
+			engine->reset.reset(engine, NULL);
+	}
+
+	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
+	intel_runtime_pm_put(i915);
+
 	i915_gem_contexts_lost(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 }
-- 
2.17.0


* [PATCH 05/18] drm/i915: Only sanitize GEM from late suspend
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (3 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25  9:31 ` [PATCH 06/18] drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset Chris Wilson
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

During testing we encounter a conflict between SUSPEND_TEST_DEVICES and
disabling reset (gem_eio/suspend). This results in the device continuing
on without being reset, but since it has gone through HW sanitization to
account for the suspend/resume cycle, we have to assume the device has
been reset to its defaults. A simple way around this is to skip the
sanitize phase for SUSPEND_TEST_DEVICES by moving it to suspend-late.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c |  6 +++++-
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_gem.c | 20 ++++++++++++--------
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9d6ac7f44812..be8137b90a51 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -636,6 +636,8 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
 
 static void i915_gem_fini(struct drm_i915_private *dev_priv)
 {
+	i915_gem_suspend_late(dev_priv);
+
 	/* Flush any outstanding unpin_work. */
 	i915_gem_drain_workqueue(dev_priv);
 
@@ -1605,7 +1607,6 @@ static int i915_drm_suspend(struct drm_device *dev)
 	opregion_target_state = suspend_to_idle(dev_priv) ? PCI_D1 : PCI_D3cold;
 	intel_opregion_notify_adapter(dev_priv, opregion_target_state);
 
-	intel_uncore_suspend(dev_priv);
 	intel_opregion_unregister(dev_priv);
 
 	intel_fbdev_set_suspend(dev, FBINFO_STATE_SUSPENDED, true);
@@ -1627,7 +1628,10 @@ static int i915_drm_suspend_late(struct drm_device *dev, bool hibernation)
 
 	disable_rpm_wakeref_asserts(dev_priv);
 
+	i915_gem_suspend_late(dev_priv);
+
 	intel_display_set_init_power(dev_priv, false);
+	intel_uncore_suspend(dev_priv);
 
 	/*
 	 * In case of firmware assisted context save/restore don't manually
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 487922f88b76..4257ffc1bae1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3169,6 +3169,7 @@ void i915_gem_cleanup_engines(struct drm_i915_private *dev_priv);
 int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
 			   unsigned int flags);
 int __must_check i915_gem_suspend(struct drm_i915_private *dev_priv);
+void i915_gem_suspend_late(struct drm_i915_private *dev_priv);
 void i915_gem_resume(struct drm_i915_private *dev_priv);
 int i915_gem_fault(struct vm_fault *vmf);
 int i915_gem_object_wait(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a7e0b388ad0..b0980b3186e7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5051,6 +5051,17 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
 	if (WARN_ON(!intel_engines_are_idle(dev_priv)))
 		i915_gem_set_wedged(dev_priv); /* no hope, discard everything */
 
+	intel_runtime_pm_put(dev_priv);
+	return 0;
+
+err_unlock:
+	mutex_unlock(&dev->struct_mutex);
+	intel_runtime_pm_put(dev_priv);
+	return ret;
+}
+
+void i915_gem_suspend_late(struct drm_i915_private *dev_priv)
+{
 	/*
 	 * Neither the BIOS, ourselves or any other kernel
 	 * expects the system to be in execlists mode on startup,
@@ -5070,16 +5081,9 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
 	 * machines is a good idea, we don't - just in case it leaves the
 	 * machine in an unusable condition.
 	 */
+
 	intel_uc_sanitize(dev_priv);
 	i915_gem_sanitize(dev_priv);
-
-	intel_runtime_pm_put(dev_priv);
-	return 0;
-
-err_unlock:
-	mutex_unlock(&dev->struct_mutex);
-	intel_runtime_pm_put(dev_priv);
-	return ret;
 }
 
 void i915_gem_resume(struct drm_i915_private *i915)
-- 
2.17.0


* [PATCH 06/18] drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (4 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 05/18] drm/i915: Only sanitize GEM from late suspend Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25  9:31 ` [PATCH 07/18] drm/i915: Be irqsafe inside reset Chris Wilson
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

Inside the live_hangcheck (reset) selftests, we occasionally see
failures like

<7>[  239.094840] i915_gem_set_wedged rcs0
<7>[  239.094843] i915_gem_set_wedged 	current seqno 19a98, last 19a9a, hangcheck 0 [5158 ms]
<7>[  239.094846] i915_gem_set_wedged 	Reset count: 6239 (global 1)
<7>[  239.094848] i915_gem_set_wedged 	Requests:
<7>[  239.095052] i915_gem_set_wedged 		first  19a99 [e8c:5f] prio=1024 @ 5159ms: (null)
<7>[  239.095056] i915_gem_set_wedged 		last   19a9a [e81:1a] prio=139 @ 5159ms: igt/rcs0[5977]/1
<7>[  239.095059] i915_gem_set_wedged 		active 19a99 [e8c:5f] prio=1024 @ 5159ms: (null)
<7>[  239.095062] i915_gem_set_wedged 		[head 0220, postfix 0280, tail 02a8, batch 0xffffffff_ffffffff]
<7>[  239.100050] i915_gem_set_wedged 		ring->start:  0x00283000
<7>[  239.100053] i915_gem_set_wedged 		ring->head:   0x000001f8
<7>[  239.100055] i915_gem_set_wedged 		ring->tail:   0x000002a8
<7>[  239.100057] i915_gem_set_wedged 		ring->emit:   0x000002a8
<7>[  239.100059] i915_gem_set_wedged 		ring->space:  0x00000f10
<7>[  239.100085] i915_gem_set_wedged 	RING_START: 0x00283000
<7>[  239.100088] i915_gem_set_wedged 	RING_HEAD:  0x00000260
<7>[  239.100091] i915_gem_set_wedged 	RING_TAIL:  0x000002a8
<7>[  239.100094] i915_gem_set_wedged 	RING_CTL:   0x00000001
<7>[  239.100097] i915_gem_set_wedged 	RING_MODE:  0x00000300 [idle]
<7>[  239.100100] i915_gem_set_wedged 	RING_IMR: fffffefe
<7>[  239.100104] i915_gem_set_wedged 	ACTHD:  0x00000000_0000609c
<7>[  239.100108] i915_gem_set_wedged 	BBADDR: 0x00000000_0000609d
<7>[  239.100111] i915_gem_set_wedged 	DMA_FADDR: 0x00000000_00283260
<7>[  239.100114] i915_gem_set_wedged 	IPEIR: 0x00000000
<7>[  239.100117] i915_gem_set_wedged 	IPEHR: 0x02800000
<7>[  239.100120] i915_gem_set_wedged 	Execlist status: 0x00044052 00000002
<7>[  239.100124] i915_gem_set_wedged 	Execlist CSB read 5 [5 cached], write 5 [5 from hws], interrupt posted? no, tasklet queued? no (enabled)
<7>[  239.100128] i915_gem_set_wedged 		ELSP[0] count=1, ring->start=00283000, rq: 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
<7>[  239.100132] i915_gem_set_wedged 		ELSP[1] count=1, ring->start=00257000, rq: 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
<7>[  239.100135] i915_gem_set_wedged 		HW active? 0x5
<7>[  239.100250] i915_gem_set_wedged 		E 19a99 [e8c:5f] prio=1024 @ 5164ms: (null)
<7>[  239.100338] i915_gem_set_wedged 		E 19a9a [e81:1a] prio=139 @ 5164ms: igt/rcs0[5977]/1
<7>[  239.100340] i915_gem_set_wedged 		Queue priority: 139
<7>[  239.100343] i915_gem_set_wedged 		Q 0 [e98:19] prio=132 @ 5164ms: igt/rcs0[5977]/8
<7>[  239.100346] i915_gem_set_wedged 		Q 0 [e84:19] prio=121 @ 5165ms: igt/rcs0[5977]/2
<7>[  239.100349] i915_gem_set_wedged 		Q 0 [e87:19] prio=82 @ 5165ms: igt/rcs0[5977]/3
<7>[  239.100352] i915_gem_set_wedged 		Q 0 [e84:1a] prio=44 @ 5164ms: igt/rcs0[5977]/2
<7>[  239.100356] i915_gem_set_wedged 		Q 0 [e8b:19] prio=20 @ 5165ms: igt/rcs0[5977]/4
<7>[  239.100362] i915_gem_set_wedged 	drv_selftest [5894] waiting for 19a99

where the GPU saw an arbitration point and idled; AND HAS NOT BEEN
RESET! RING_MODE indicates that it is idle and has the STOP_RING bit
set, so try clearing it.
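
For reference, MI_MODE is a masked register: the upper 16 bits of each
write select which of the lower 16 bits take effect, so individual bits
can be set or cleared without a read-modify-write. A simplified sketch
of the i915 helpers the fix below relies on:

  #define _MASKED_BIT_ENABLE(a)  (((a) << 16) | (a)) /* set bit(s) */
  #define _MASKED_BIT_DISABLE(a) ((a) << 16)         /* clear bit(s) */

  /* e.g. clearing STOP_RING without disturbing other MI_MODE bits */
  I915_WRITE(RING_MI_MODE(engine->mmio_base),
             _MASKED_BIT_DISABLE(STOP_RING));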

v2: Only clear the bit on restarting the ring, as we want to be sure the
STOP_RING bit is kept if reset fails on wedging.
v3: Spot when the ring state doesn't make sense when re-initialising the
engine and dump it to the logs so that we don't have to wait for an
error later and try to guess what happened earlier.
v4: Prepare to print all the unexpected state, not just the first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 857ab04452f0..b3a5cfb7689f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1781,6 +1781,9 @@ static void enable_execlists(struct intel_engine_cs *engine)
 		I915_WRITE(RING_MODE_GEN7(engine),
 			   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
 
+	I915_WRITE(RING_MI_MODE(engine->mmio_base),
+		   _MASKED_BIT_DISABLE(STOP_RING));
+
 	I915_WRITE(RING_HWS_PGA(engine->mmio_base),
 		   engine->status_page.ggtt_offset);
 	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
@@ -1789,6 +1792,19 @@ static void enable_execlists(struct intel_engine_cs *engine)
 	engine->execlists.csb_head = -1;
 }
 
+static bool unexpected_starting_state(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	bool unexpected = false;
+
+	if (I915_READ(RING_MI_MODE(engine->mmio_base)) & STOP_RING) {
+		DRM_DEBUG_DRIVER("STOP_RING still set in RING_MI_MODE\n");
+		unexpected = true;
+	}
+
+	return unexpected;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -1801,6 +1817,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 	intel_engine_reset_breadcrumbs(engine);
 	intel_engine_init_hangcheck(engine);
 
+	if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) {
+		struct drm_printer p = drm_debug_printer(__func__);
+
+		intel_engine_dump(engine, &p, NULL);
+	}
+
 	enable_execlists(engine);
 
 	/* After a GPU reset, we may have requests to replay */
-- 
2.17.0


* [PATCH 07/18] drm/i915: Be irqsafe inside reset
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (5 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 06/18] drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25  9:31 ` [PATCH 08/18] drm/i915/execlists: Wait for ELSP submission on restart Chris Wilson
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

As we want to be able to call i915_reset_engine and co. from a softirq
or timer context, we need to be irqsafe at all times. So we have to
forgo the simple spin_lock_irq for the full spin_lock_irqsave.
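
The distinction matters because spin_unlock_irq() unconditionally
re-enables interrupts, which would be wrong if the caller had already
disabled them (for instance while holding another irq-safe lock in a
softirq or timer path). The irqsave variant restores the previous state
instead:

  unsigned long flags;

  spin_lock_irqsave(&lock, flags);      /* records current irq state */
  /* ... critical section, safe to enter from any context ... */
  spin_unlock_irqrestore(&lock, flags); /* restores, never enables */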

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b0980b3186e7..c26f4da77e93 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3141,15 +3141,17 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
 		 */
 		request = i915_gem_find_active_request(engine);
 		if (request) {
+			unsigned long flags;
+
 			i915_gem_context_mark_innocent(request->gem_context);
 			dma_fence_set_error(&request->fence, -EAGAIN);
 
 			/* Rewind the engine to replay the incomplete rq */
-			spin_lock_irq(&engine->timeline.lock);
+			spin_lock_irqsave(&engine->timeline.lock, flags);
 			request = list_prev_entry(request, link);
 			if (&request->link == &engine->timeline.requests)
 				request = NULL;
-			spin_unlock_irq(&engine->timeline.lock);
+			spin_unlock_irqrestore(&engine->timeline.lock, flags);
 		}
 	}
 
-- 
2.17.0


* [PATCH 08/18] drm/i915/execlists: Wait for ELSP submission on restart
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (6 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 07/18] drm/i915: Be irqsafe inside reset Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25 12:37   ` Mika Kuoppala
  2018-05-25  9:31 ` [PATCH 09/18] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization Chris Wilson
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

After a reset, we will ensure that there is at least one request
submitted to HW to ensure that a context is loaded for powersaving.
Let's wait for this submission via the tasklet to complete before we
drop our forcewake, ensuring the system is ready for rc6 before we let
it possibly sleep.
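
The new helper pairs with the existing __tasklet_disable_sync_once():
on the final enable it uses tasklet_kill() to flush any run that was
queued while the tasklet was disabled. A sketch of the intended
pairing in the reset path (condensed from the code touched below):

  __tasklet_disable_sync_once(&engine->execlists.tasklet);
  /* ... reset the engine; the tasklet cannot run concurrently ... */
  __tasklet_enable_sync_once(&engine->execlists.tasklet);
  /* any submission queued during the reset has now been flushed,
   * while we still hold forcewake, so rc6 entry is safe */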

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.h  |  6 ++++++
 drivers/gpu/drm/i915/intel_lrc.c | 15 ++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 62ee4e385365..261da577829a 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -82,4 +82,10 @@ static inline void __tasklet_disable_sync_once(struct tasklet_struct *t)
 		tasklet_unlock_wait(t);
 }
 
+static inline void __tasklet_enable_sync_once(struct tasklet_struct *t)
+{
+	if (atomic_dec_return(&t->count) == 0)
+		tasklet_kill(t);
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b3a5cfb7689f..f2fb48b1a7b7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1877,6 +1877,8 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 
 	GEM_TRACE("%s\n", engine->name);
 
+	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
+
 	/*
 	 * Prevent request submission to the hardware until we have
 	 * completed the reset in i915_gem_reset_finish(). If a request
@@ -2007,7 +2009,18 @@ static void execlists_reset(struct intel_engine_cs *engine,
 
 static void execlists_reset_finish(struct intel_engine_cs *engine)
 {
-	tasklet_enable(&engine->execlists.tasklet);
+	/*
+	 * Flush the tasklet while we still have the forcewake to be sure
+	 * that it is not allowed to sleep before we restart and reload a
+	 * context.
+	 *
+	 * As before (with execlists_reset_prepare) we rely on the caller
+	 * serialising multiple attempts to reset so that we know that we
+	 * are the only one manipulating tasklet state.
+	 */
+	__tasklet_enable_sync_once(&engine->execlists.tasklet);
+
+	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
 
 	GEM_TRACE("%s\n", engine->name);
 }
-- 
2.17.0


* [PATCH 09/18] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (7 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 08/18] drm/i915/execlists: Wait for ELSP submission on restart Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25  9:31 ` [PATCH 10/18] drm/i915/execlists: Pull submit after dequeue under timeline lock Chris Wilson
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

We can avoid the mmio read of the CSB pointers after reset based on the
knowledge that the HW always starts writing at entry 0 in the CSB
buffer. We need to reset our CSB head tracking after GPU reset (and on
sanitization after resume) so that we are expecting to read from entry
0.
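
The chosen reset value works because the consume loop pre-increments
before reading each event; seeding csb_head with the last slot makes
the very first read land on entry 0, matching where the HW begins after
a reset:

  /* consume-loop shape in process_csb() */
  if (++head == GEN8_CSB_ENTRIES) /* head was ENTRIES - 1, ... */
          head = 0;               /* ... so the first event read is 0 */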

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f2fb48b1a7b7..ea85e2a50b83 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -965,22 +965,19 @@ static void process_csb(struct intel_engine_cs *engine)
 			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int head, tail;
 
-		if (unlikely(execlists->csb_use_mmio)) {
-			buf = (u32 * __force)
-				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
-			execlists->csb_head = -1; /* force mmio read of CSB */
-		}
-
 		/* Clear before reading to catch new interrupts */
 		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 		smp_mb__after_atomic();
 
-		if (unlikely(execlists->csb_head == -1)) { /* after a reset */
+		if (unlikely(execlists->csb_use_mmio)) {
 			if (!fw) {
 				intel_uncore_forcewake_get(i915, execlists->fw_domains);
 				fw = true;
 			}
 
+			buf = (u32 * __force)
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+
 			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 			tail = GEN8_CSB_WRITE_PTR(head);
 			head = GEN8_CSB_READ_PTR(head);
@@ -1787,9 +1784,6 @@ static void enable_execlists(struct intel_engine_cs *engine)
 	I915_WRITE(RING_HWS_PGA(engine->mmio_base),
 		   engine->status_page.ggtt_offset);
 	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
-
-	/* Following the reset, we need to reload the CSB read/write pointers */
-	engine->execlists.csb_head = -1;
 }
 
 static bool unexpected_starting_state(struct intel_engine_cs *engine)
@@ -1963,6 +1957,9 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	__unwind_incomplete_requests(engine);
 	spin_unlock(&engine->timeline.lock);
 
+	/* Following the reset, we need to reload the CSB read/write pointers */
+	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+
 	local_irq_restore(flags);
 
 	/*
@@ -2464,6 +2461,8 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 			upper_32_bits(ce->lrc_desc);
 	}
 
+	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+
 	return 0;
 
 error:
-- 
2.17.0


* [PATCH 10/18] drm/i915/execlists: Pull submit after dequeue under timeline lock
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (8 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 09/18] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25  9:31 ` [PATCH 11/18] drm/i915/execlists: Pull CSB reset under the timeline.lock Chris Wilson
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will begin processing the CSB from inside the
interrupt handler. This means that updating execlists->port[] will
no longer be serialised by the tasklet but by engine->timeline.lock
instead. Pull dequeue and submit under the same lock for protection.
(An alternative, future plan is to keep the in/out arrays separate for
concurrent processing and reduced lock coverage.)
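
The resulting shape, condensed from the diff below: dequeue and the
ELSP write now share a single critical section, so port[] cannot change
between choosing requests and submitting them.

  spin_lock_irqsave(&engine->timeline.lock, flags);
  __execlists_dequeue(engine); /* fills port[] and writes ELSP */
  spin_unlock_irqrestore(&engine->timeline.lock, flags);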

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ea85e2a50b83..703135637798 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -562,7 +562,7 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
-static bool __execlists_dequeue(struct intel_engine_cs *engine)
+static void __execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -617,11 +617,11 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 * the HW to indicate that it has had a chance to respond.
 		 */
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
-			return false;
+			return;
 
 		if (need_preempt(engine, last, execlists->queue_priority)) {
 			inject_preempt_context(engine);
-			return false;
+			return;
 		}
 
 		/*
@@ -646,7 +646,7 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		 * priorities of the ports haven't been switch.
 		 */
 		if (port_count(&port[1]))
-			return false;
+			return;
 
 		/*
 		 * WaIdleLiteRestore:bdw,skl
@@ -746,8 +746,10 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 		port != execlists->port ? rq_prio(last) : INT_MIN;
 
 	execlists->first = rb;
-	if (submit)
+	if (submit) {
 		port_assign(port, last);
+		execlists_submit_ports(engine);
+	}
 
 	/* We must always keep the beast fed if we have work piled up */
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
@@ -756,24 +758,19 @@ static bool __execlists_dequeue(struct intel_engine_cs *engine)
 	if (last)
 		execlists_user_begin(execlists, execlists->port);
 
-	return submit;
+	/* If the engine is now idle, so should be the flag; and vice versa. */
+	GEM_BUG_ON(execlists_is_active(&engine->execlists,
+				       EXECLISTS_ACTIVE_USER) ==
+		   !port_isset(engine->execlists.port));
 }
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	unsigned long flags;
-	bool submit;
 
 	spin_lock_irqsave(&engine->timeline.lock, flags);
-	submit = __execlists_dequeue(engine);
+	__execlists_dequeue(engine);
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
-
-	if (submit)
-		execlists_submit_ports(engine);
-
-	GEM_BUG_ON(port_isset(execlists->port) &&
-		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
 }
 
 void
@@ -1156,11 +1153,6 @@ static void execlists_submission_tasklet(unsigned long data)
 
 	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
 		execlists_dequeue(engine);
-
-	/* If the engine is now idle, so should be the flag; and vice versa. */
-	GEM_BUG_ON(execlists_is_active(&engine->execlists,
-				       EXECLISTS_ACTIVE_USER) ==
-		   !port_isset(engine->execlists.port));
 }
 
 static void queue_request(struct intel_engine_cs *engine,
-- 
2.17.0


* [PATCH 11/18] drm/i915/execlists: Pull CSB reset under the timeline.lock
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (9 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 10/18] drm/i915/execlists: Pull submit after dequeue under timeline lock Chris Wilson
@ 2018-05-25  9:31 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 12/18] drm/i915/execlists: Process one CSB interrupt at a time Chris Wilson
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:31 UTC (permalink / raw)
  To: intel-gfx

In the following patch, we will process the CSB interrupt under the
timeline.lock and not under the tasklet lock. This also means that we
will need to protect access to common variables such as
execlists->csb_head with the timeline.lock during reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 703135637798..e22b04bb58fe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1929,8 +1929,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 		  engine->name, request ? request->global_seqno : 0,
 		  intel_engine_get_seqno(engine));
 
-	/* See execlists_cancel_requests() for the irq/spinlock split. */
-	local_irq_save(flags);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	/*
 	 * Catch up with any missed context-switch interrupts.
@@ -1945,14 +1944,12 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	reset_irq(engine);
 
 	/* Push back any incomplete requests for replay after the reset. */
-	spin_lock(&engine->timeline.lock);
 	__unwind_incomplete_requests(engine);
-	spin_unlock(&engine->timeline.lock);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
 
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	/*
 	 * If the request was innocent, we leave the request in the ELSP
-- 
2.17.0


* [PATCH 12/18] drm/i915/execlists: Process one CSB interrupt at a time
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (10 preceding siblings ...)
  2018-05-25  9:31 ` [PATCH 11/18] drm/i915/execlists: Pull CSB reset under the timeline.lock Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 13/18] drm/i915/execlists: Unify CSB access pointers Chris Wilson
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will process the CSB events directly from the CS
interrupt handler, being called for each interrupt. Hence, we will no
longer need to loop until the has-interrupt bit is clear, and in the
meantime can remove that small optimisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 274 +++++++++++++++----------------
 1 file changed, 135 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e22b04bb58fe..7c2d9310fb36 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -954,166 +954,162 @@ static void process_csb(struct intel_engine_cs *engine)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
 	struct drm_i915_private *i915 = engine->i915;
+
+	/* The HWSP contains a (cacheable) mirror of the CSB */
+	const u32 *buf =
+		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+	unsigned int head, tail;
 	bool fw = false;
 
-	do {
-		/* The HWSP contains a (cacheable) mirror of the CSB */
-		const u32 *buf =
-			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-		unsigned int head, tail;
+	/* Clear before reading to catch new interrupts */
+	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+	smp_mb__after_atomic();
 
-		/* Clear before reading to catch new interrupts */
-		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		smp_mb__after_atomic();
+	if (unlikely(execlists->csb_use_mmio)) {
+		intel_uncore_forcewake_get(i915, execlists->fw_domains);
+		fw = true;
 
-		if (unlikely(execlists->csb_use_mmio)) {
-			if (!fw) {
-				intel_uncore_forcewake_get(i915, execlists->fw_domains);
-				fw = true;
-			}
+		buf = (u32 * __force)
+			(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
 
-			buf = (u32 * __force)
-				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+		head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+		tail = GEN8_CSB_WRITE_PTR(head);
+		head = GEN8_CSB_READ_PTR(head);
+		execlists->csb_head = head;
+	} else {
+		const int write_idx =
+			intel_hws_csb_write_index(i915) -
+			I915_HWS_CSB_BUF0_INDEX;
 
-			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-			tail = GEN8_CSB_WRITE_PTR(head);
-			head = GEN8_CSB_READ_PTR(head);
-			execlists->csb_head = head;
-		} else {
-			const int write_idx =
-				intel_hws_csb_write_index(i915) -
-				I915_HWS_CSB_BUF0_INDEX;
+		head = execlists->csb_head;
+		tail = READ_ONCE(buf[write_idx]);
+		rmb(); /* Hopefully paired with a wmb() in HW */
+	}
+	GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
+		  engine->name,
+		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
+		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
 
-			head = execlists->csb_head;
-			tail = READ_ONCE(buf[write_idx]);
-			rmb(); /* Hopefully paired with a wmb() in HW */
-		}
-		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
-			  engine->name,
-			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
-			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
+	while (head != tail) {
+		struct i915_request *rq;
+		unsigned int status;
+		unsigned int count;
 
-		while (head != tail) {
-			struct i915_request *rq;
-			unsigned int status;
-			unsigned int count;
+		if (++head == GEN8_CSB_ENTRIES)
+			head = 0;
 
-			if (++head == GEN8_CSB_ENTRIES)
-				head = 0;
+		/*
+		 * We are flying near dragons again.
+		 *
+		 * We hold a reference to the request in execlist_port[]
+		 * but no more than that. We are operating in softirq
+		 * context and so cannot hold any mutex or sleep. That
+		 * prevents us stopping the requests we are processing
+		 * in port[] from being retired simultaneously (the
+		 * breadcrumb will be complete before we see the
+		 * context-switch). As we only hold the reference to the
+		 * request, any pointer chasing underneath the request
+		 * is subject to a potential use-after-free. Thus we
+		 * store all of the bookkeeping within port[] as
+		 * required, and avoid using unguarded pointers beneath
+		 * request itself. The same applies to the atomic
+		 * status notifier.
+		 */
 
-			/*
-			 * We are flying near dragons again.
-			 *
-			 * We hold a reference to the request in execlist_port[]
-			 * but no more than that. We are operating in softirq
-			 * context and so cannot hold any mutex or sleep. That
-			 * prevents us stopping the requests we are processing
-			 * in port[] from being retired simultaneously (the
-			 * breadcrumb will be complete before we see the
-			 * context-switch). As we only hold the reference to the
-			 * request, any pointer chasing underneath the request
-			 * is subject to a potential use-after-free. Thus we
-			 * store all of the bookkeeping within port[] as
-			 * required, and avoid using unguarded pointers beneath
-			 * request itself. The same applies to the atomic
-			 * status notifier.
-			 */
+		status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
+		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
+			  engine->name, head,
+			  status, buf[2*head + 1],
+			  execlists->active);
+
+		if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
+			      GEN8_CTX_STATUS_PREEMPTED))
+			execlists_set_active(execlists,
+					     EXECLISTS_ACTIVE_HWACK);
+		if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
+			execlists_clear_active(execlists,
+					       EXECLISTS_ACTIVE_HWACK);
+
+		if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
+			continue;
 
-			status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
-			GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
-				  engine->name, head,
-				  status, buf[2*head + 1],
-				  execlists->active);
-
-			if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
-				      GEN8_CTX_STATUS_PREEMPTED))
-				execlists_set_active(execlists,
-						     EXECLISTS_ACTIVE_HWACK);
-			if (status & GEN8_CTX_STATUS_ACTIVE_IDLE)
-				execlists_clear_active(execlists,
-						       EXECLISTS_ACTIVE_HWACK);
-
-			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
-				continue;
+		/* We should never get a COMPLETED | IDLE_ACTIVE! */
+		GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
 
-			/* We should never get a COMPLETED | IDLE_ACTIVE! */
-			GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE);
+		if (status & GEN8_CTX_STATUS_COMPLETE &&
+		    buf[2*head + 1] == execlists->preempt_complete_status) {
+			GEM_TRACE("%s preempt-idle\n", engine->name);
+			complete_preempt_context(execlists);
+			continue;
+		}
 
-			if (status & GEN8_CTX_STATUS_COMPLETE &&
-			    buf[2*head + 1] == execlists->preempt_complete_status) {
-				GEM_TRACE("%s preempt-idle\n", engine->name);
-				complete_preempt_context(execlists);
-				continue;
-			}
+		if (status & GEN8_CTX_STATUS_PREEMPTED &&
+		    execlists_is_active(execlists,
+					EXECLISTS_ACTIVE_PREEMPT))
+			continue;
 
-			if (status & GEN8_CTX_STATUS_PREEMPTED &&
-			    execlists_is_active(execlists,
-						EXECLISTS_ACTIVE_PREEMPT))
-				continue;
+		GEM_BUG_ON(!execlists_is_active(execlists,
+						EXECLISTS_ACTIVE_USER));
 
-			GEM_BUG_ON(!execlists_is_active(execlists,
-							EXECLISTS_ACTIVE_USER));
+		rq = port_unpack(port, &count);
+		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%d) (current %d), prio=%d\n",
+			  engine->name,
+			  port->context_id, count,
+			  rq ? rq->global_seqno : 0,
+			  rq ? rq->fence.context : 0,
+			  rq ? rq->fence.seqno : 0,
+			  intel_engine_get_seqno(engine),
+			  rq ? rq_prio(rq) : 0);
+
+		/* Check the context/desc id for this event matches */
+		GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
+
+		GEM_BUG_ON(count == 0);
+		if (--count == 0) {
+			/*
+			 * On the final event corresponding to the
+			 * submission of this context, we expect either
+			 * an element-switch event or a completion
+			 * event (and on completion, the active-idle
+			 * marker). No more preemptions, lite-restore
+			 * or otherwise.
+			 */
+			GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
+			GEM_BUG_ON(port_isset(&port[1]) &&
+				   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
+			GEM_BUG_ON(!port_isset(&port[1]) &&
+				   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
 
-			rq = port_unpack(port, &count);
-			GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%d) (current %d), prio=%d\n",
-				  engine->name,
-				  port->context_id, count,
-				  rq ? rq->global_seqno : 0,
-				  rq ? rq->fence.context : 0,
-				  rq ? rq->fence.seqno : 0,
-				  intel_engine_get_seqno(engine),
-				  rq ? rq_prio(rq) : 0);
+			/*
+			 * We rely on the hardware being strongly
+			 * ordered, that the breadcrumb write is
+			 * coherent (visible from the CPU) before the
+			 * user interrupt and CSB is processed.
+			 */
+			GEM_BUG_ON(!i915_request_completed(rq));
 
-			/* Check the context/desc id for this event matches */
-			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
+			execlists_context_schedule_out(rq,
+						       INTEL_CONTEXT_SCHEDULE_OUT);
+			i915_request_put(rq);
 
-			GEM_BUG_ON(count == 0);
-			if (--count == 0) {
-				/*
-				 * On the final event corresponding to the
-				 * submission of this context, we expect either
-				 * an element-switch event or a completion
-				 * event (and on completion, the active-idle
-				 * marker). No more preemptions, lite-restore
-				 * or otherwise.
-				 */
-				GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
-				GEM_BUG_ON(port_isset(&port[1]) &&
-					   !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH));
-				GEM_BUG_ON(!port_isset(&port[1]) &&
-					   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
+			GEM_TRACE("%s completed ctx=%d\n",
+				  engine->name, port->context_id);
 
-				/*
-				 * We rely on the hardware being strongly
-				 * ordered, that the breadcrumb write is
-				 * coherent (visible from the CPU) before the
-				 * user interrupt and CSB is processed.
-				 */
-				GEM_BUG_ON(!i915_request_completed(rq));
-
-				execlists_context_schedule_out(rq,
-							       INTEL_CONTEXT_SCHEDULE_OUT);
-				i915_request_put(rq);
-
-				GEM_TRACE("%s completed ctx=%d\n",
-					  engine->name, port->context_id);
-
-				port = execlists_port_complete(execlists, port);
-				if (port_isset(port))
-					execlists_user_begin(execlists, port);
-				else
-					execlists_user_end(execlists);
-			} else {
-				port_set(port, port_pack(rq, count));
-			}
+			port = execlists_port_complete(execlists, port);
+			if (port_isset(port))
+				execlists_user_begin(execlists, port);
+			else
+				execlists_user_end(execlists);
+		} else {
+			port_set(port, port_pack(rq, count));
 		}
+	}
 
-		if (head != execlists->csb_head) {
-			execlists->csb_head = head;
-			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-			       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-		}
-	} while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+	if (head != execlists->csb_head) {
+		execlists->csb_head = head;
+		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+		       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+	}
 
 	if (unlikely(fw))
 		intel_uncore_forcewake_put(i915, execlists->fw_domains);
-- 
2.17.0


* [PATCH 13/18] drm/i915/execlists: Unify CSB access pointers
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (11 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 12/18] drm/i915/execlists: Process one CSB interrupt at a time Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 14/18] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Chris Wilson
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

Following the removal of the last workarounds, the only CSB mmio access
is for the old vGPU interface. The mmio registers presented by vGPU do
not require forcewake and can be treated as ordinary volatile memory,
i.e. they behave just like the HWSP access, only at a different
location. We can reduce the CSB access inside the irq handler to a set
of read/write/buffer pointers and treat the various paths identically,
without worrying about forcewake.
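
Condensed, the reader no longer cares which backing store is in use;
both cases funnel through the same pointers, set up once at init
(field names as introduced by this patch):

  /* execlists->csb_status: base of the GEN8_CSB_ENTRIES event buffer
   * execlists->csb_write:  where the HW publishes its write pointer
   * execlists->csb_head:   our read pointer
   */
  head = execlists->csb_head;
  tail = READ_ONCE(*execlists->csb_write);
  while (head != tail) {
          /* consume buf[2 * head + 0], buf[2 * head + 1] */
  }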

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  |  12 ---
 drivers/gpu/drm/i915/intel_lrc.c        | 116 ++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  23 +++--
 3 files changed, 65 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 13448ea76f57..bc0193199a03 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -25,7 +25,6 @@
 #include <drm/drm_print.h>
 
 #include "i915_drv.h"
-#include "i915_vgpu.h"
 #include "intel_ringbuffer.h"
 #include "intel_lrc.h"
 
@@ -456,21 +455,10 @@ static void intel_engine_init_batch_pool(struct intel_engine_cs *engine)
 	i915_gem_batch_pool_init(&engine->batch_pool, engine);
 }
 
-static bool csb_force_mmio(struct drm_i915_private *i915)
-{
-	/* Older GVT emulation depends upon intercepting CSB mmio */
-	if (intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915))
-		return true;
-
-	return false;
-}
-
 static void intel_engine_init_execlist(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 
-	execlists->csb_use_mmio = csb_force_mmio(engine->i915);
-
 	execlists->port_mask = 1;
 	BUILD_BUG_ON_NOT_POWER_OF_2(execlists_num_ports(execlists));
 	GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7c2d9310fb36..71693008b5d0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -137,6 +137,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_gem_render_state.h"
+#include "i915_vgpu.h"
 #include "intel_lrc_reg.h"
 #include "intel_mocs.h"
 #include "intel_workarounds.h"
@@ -953,44 +954,23 @@ static void process_csb(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
-	struct drm_i915_private *i915 = engine->i915;
-
-	/* The HWSP contains a (cacheable) mirror of the CSB */
-	const u32 *buf =
-		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-	unsigned int head, tail;
-	bool fw = false;
+	const u32 * const buf = execlists->csb_status;
+	u8 head, tail;
 
 	/* Clear before reading to catch new interrupts */
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 	smp_mb__after_atomic();
 
-	if (unlikely(execlists->csb_use_mmio)) {
-		intel_uncore_forcewake_get(i915, execlists->fw_domains);
-		fw = true;
-
-		buf = (u32 * __force)
-			(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
-
-		head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-		tail = GEN8_CSB_WRITE_PTR(head);
-		head = GEN8_CSB_READ_PTR(head);
-		execlists->csb_head = head;
-	} else {
-		const int write_idx =
-			intel_hws_csb_write_index(i915) -
-			I915_HWS_CSB_BUF0_INDEX;
+	/* Note that csb_write, csb_status may be either in HWSP or mmio */
+	head = execlists->csb_head;
+	tail = READ_ONCE(*execlists->csb_write);
+	GEM_TRACE("%s cs-irq head=%d, tail=%d\n", engine->name, head, tail);
+	if (unlikely(head == tail))
+		return;
 
-		head = execlists->csb_head;
-		tail = READ_ONCE(buf[write_idx]);
-		rmb(); /* Hopefully paired with a wmb() in HW */
-	}
-	GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
-		  engine->name,
-		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
-		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
+	rmb(); /* Hopefully paired with a wmb() in HW */
 
-	while (head != tail) {
+	do {
 		struct i915_request *rq;
 		unsigned int status;
 		unsigned int count;
@@ -1016,12 +996,12 @@ static void process_csb(struct intel_engine_cs *engine)
 		 * status notifier.
 		 */
 
-		status = READ_ONCE(buf[2 * head]); /* maybe mmio! */
 		GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n",
 			  engine->name, head,
-			  status, buf[2*head + 1],
+			  buf[2 * head + 0], buf[2 * head + 1],
 			  execlists->active);
 
+		status = buf[2 * head];
 		if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE |
 			      GEN8_CTX_STATUS_PREEMPTED))
 			execlists_set_active(execlists,
@@ -1103,16 +1083,11 @@ static void process_csb(struct intel_engine_cs *engine)
 		} else {
 			port_set(port, port_pack(rq, count));
 		}
-	}
-
-	if (head != execlists->csb_head) {
-		execlists->csb_head = head;
-		writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-		       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
-	}
+	} while (head != tail);
 
-	if (unlikely(fw))
-		intel_uncore_forcewake_put(i915, execlists->fw_domains);
+	writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
+	       execlists->csb_read);
+	execlists->csb_head = head;
 }
 
 /*
@@ -2390,28 +2365,11 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 static void
 logical_ring_setup(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-	enum forcewake_domains fw_domains;
-
 	intel_engine_setup_common(engine);
 
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
 
-	fw_domains = intel_uncore_forcewake_for_reg(dev_priv,
-						    RING_ELSP(engine),
-						    FW_REG_WRITE);
-
-	fw_domains |= intel_uncore_forcewake_for_reg(dev_priv,
-						     RING_CONTEXT_STATUS_PTR(engine),
-						     FW_REG_READ | FW_REG_WRITE);
-
-	fw_domains |= intel_uncore_forcewake_for_reg(dev_priv,
-						     RING_CONTEXT_STATUS_BUF_BASE(engine),
-						     FW_REG_READ);
-
-	engine->execlists.fw_domains = fw_domains;
-
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
 
@@ -2419,34 +2377,56 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	logical_ring_default_irqs(engine);
 }
 
+static bool csb_force_mmio(struct drm_i915_private *i915)
+{
+	/* Older GVT emulation depends upon intercepting CSB mmio */
+	return intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915);
+}
+
 static int logical_ring_init(struct intel_engine_cs *engine)
 {
+	struct drm_i915_private *i915 = engine->i915;
+	struct intel_engine_execlists * const execlists = &engine->execlists;
 	int ret;
 
 	ret = intel_engine_init_common(engine);
 	if (ret)
 		goto error;
 
-	if (HAS_LOGICAL_RING_ELSQ(engine->i915)) {
-		engine->execlists.submit_reg = engine->i915->regs +
+	if (HAS_LOGICAL_RING_ELSQ(i915)) {
+		execlists->submit_reg = i915->regs +
 			i915_mmio_reg_offset(RING_EXECLIST_SQ_CONTENTS(engine));
-		engine->execlists.ctrl_reg = engine->i915->regs +
+		execlists->ctrl_reg = i915->regs +
 			i915_mmio_reg_offset(RING_EXECLIST_CONTROL(engine));
 	} else {
-		engine->execlists.submit_reg = engine->i915->regs +
+		execlists->submit_reg = i915->regs +
 			i915_mmio_reg_offset(RING_ELSP(engine));
 	}
 
-	engine->execlists.preempt_complete_status = ~0u;
-	if (engine->i915->preempt_context) {
+	execlists->preempt_complete_status = ~0u;
+	if (i915->preempt_context) {
 		struct intel_context *ce =
-			to_intel_context(engine->i915->preempt_context, engine);
+			to_intel_context(i915->preempt_context, engine);
 
-		engine->execlists.preempt_complete_status =
+		execlists->preempt_complete_status =
 			upper_32_bits(ce->lrc_desc);
 	}
 
-	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
+	execlists->csb_read =
+		i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
+	if (csb_force_mmio(i915)) {
+		execlists->csb_status =
+			i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0));
+
+		execlists->csb_write = execlists->csb_read;
+	} else {
+		execlists->csb_status =
+			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+
+		execlists->csb_write =
+			&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
+	}
 
 	return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..b92e76c23843 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -299,19 +299,30 @@ struct intel_engine_execlists {
 	struct rb_node *first;
 
 	/**
-	 * @fw_domains: forcewake domains for irq tasklet
+	 * @csb_head: context status buffer head
 	 */
-	unsigned int fw_domains;
+	unsigned int csb_head;
 
 	/**
-	 * @csb_head: context status buffer head
+	 * @csb_read: control register for Context Switch buffer
+	 *
+	 * Note this register is always in mmio.
 	 */
-	unsigned int csb_head;
+	u32 *csb_read;
 
 	/**
-	 * @csb_use_mmio: access csb through mmio, instead of hwsp
+	 * @csb_write: control register for Context Switch buffer
+	 *
+	 * Note this register may be either mmio or HWSP shadow.
+	 */
+	u32 *csb_write;
+
+	/**
+	 * @csb_status: status array for Context Switch buffer
+	 *
+	 * Note these registers may be either mmio or HWSP shadow.
 	 */
-	bool csb_use_mmio;
+	u32 *csb_status;
 
 	/**
 	 * @preempt_complete_status: expected CSB upon completing preemption
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 14/18] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (12 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 13/18] drm/i915/execlists: Unify CSB access pointers Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 15/18] drm/i915: Move rate-limiting request retire to after submission Chris Wilson
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

Back in commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on average! Deferring our work to a tasklet allowed us to do the
processing with irqs enabled, reducing the impact to an average of about
50us.

We have since eradicated the use of forcewaked mmio from inside the CSB
processing and ELSP submission, bringing the impact down to around 5us
(on Kabylake); an order of magnitude better than our measurements 2
years ago on Broadwell and only about 2x worse on average than the
gem_syslatency on an unladen system.

In this iteration of the tasklet-vs-direct submission debate, we seek a
compromise whereby we submit new requests immediately to the HW but
defer processing the CS interrupt onto a tasklet. We gain the advantage
of low latency and ksoftirqd avoidance when waking up the HW, while
avoiding the system-wide starvation caused by our CS irq-storms.
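
The crux of that compromise, condensed from the diff below into an
editor's sketch (locking and GEM_BUG_ONs omitted):

	/* Called with engine->timeline.lock held. */
	static void __submit_queue(struct intel_engine_cs *engine)
	{
		struct intel_engine_execlists * const execlists = &engine->execlists;

		if (reset_in_progress(execlists))
			return; /* defer until we restart the engine following reset */

		if (execlists->tasklet.func == execlists_submission_tasklet)
			__execlists_submission_tasklet(engine); /* direct, low latency */
		else
			tasklet_hi_schedule(&execlists->tasklet); /* punt to whoever owns the tasklet */
	}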

Comparing the impact on the maximum latency observed (that is the time stolen
from an RT process) over a 120s interval, repeated several times (using
gem_syslatency, similar to RT's cyclictest) while the system is fully
laden with i915 nops, we see that direct submission can actually
improve the worst case.

Maximum latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 2)
  x Always using tasklets (a couple of >1000us outliers removed)
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|          +                                                             |
|          +                                                             |
|          +                                                             |
|          +       +                                                     |
|          + +     +                                                     |
|       +  + +     +  x     x     x                                      |
|      +++ + +     +  x  x  x  x  x  x                                   |
|      +++ + ++  + +  *x x  x  x  x  x                                   |
|      +++ + ++  + *  *x x  *  x  x  x                                   |
|    + +++ + ++  * * +*xxx  *  x  x  xx                                  |
|    * +++ + ++++* *x+**xx+ *  x  x xxxx x                               |
|   **x++++*++**+*x*x****x+ * +x xx xxxx x          x                    |
|x* ******+***************++*+***xxxxxx* xx*x     xxx +                x+|
|             |__________MA___________|                                  |
|      |______M__A________|                                              |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 118            91           186           124     125.28814     16.279137
+ 120            92           187           109     112.00833     13.458617
Difference at 95.0% confidence
	-13.2798 +/- 3.79219
	-10.5994% +/- 3.02677%
	(Student's t, pooled s = 14.9237)

However, the mean latency is adversely affected:

Mean latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 1)
  x Always using tasklets
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|           xxxxxx                                        +   ++         |
|           xxxxxx                                        +   ++         |
|           xxxxxx                                      + +++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ +++        |
|           xxxxxxx                                   + ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|          xxxxxxxxxx                                +++++++++++++++     |
|         xxxxxxxxxxx    x                           +++++++++++++++     |
|x       xxxxxxxxxxxxx   x           +            + ++++++++++++++++++  +|
|           |__A__|                                                      |
|                                                      |____A___|        |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120         3.506         3.727         3.631     3.6321417    0.02773109
+ 120         3.834         4.149         4.039     4.0375167   0.041221676
Difference at 95.0% confidence
	0.405375 +/- 0.00888913
	11.1608% +/- 0.244735%
	(Student's t, pooled s = 0.03513)

However, since the mean latency corresponds to the amount of irqsoff
processing we have to do for a CS interrupt, we need only speed that up
to benefit not just system latency but our own throughput.

v2: Remember to defer submissions when under reset.
v4: Only use direct submission for new requests

Testcase: igt/gem_exec_latency/*rthog*
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h         |   5 +
 drivers/gpu/drm/i915/i915_irq.c         |  11 +-
 drivers/gpu/drm/i915/intel_engine_cs.c  |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 147 ++++++++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 -
 5 files changed, 98 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 261da577829a..7892ac773916 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -88,4 +88,9 @@ static inline void __tasklet_enable_sync_once(struct tasklet_struct *t)
 		tasklet_kill(t);
 }
 
+static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
+{
+	return likely(!atomic_read(&t->count));
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..3a8eacab5172 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1462,14 +1462,10 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 static void
 gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	bool tasklet = false;
 
-	if (iir & GT_CONTEXT_SWITCH_INTERRUPT) {
-		if (READ_ONCE(engine->execlists.active))
-			tasklet = !test_and_set_bit(ENGINE_IRQ_EXECLIST,
-						    &engine->irq_posted);
-	}
+	if (iir & GT_CONTEXT_SWITCH_INTERRUPT)
+		tasklet = true;
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
 		notify_ring(engine);
@@ -1477,7 +1473,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 	}
 
 	if (tasklet)
-		tasklet_hi_schedule(&execlists->tasklet);
+		tasklet_hi_schedule(&engine->execlists.tasklet);
 }
 
 static void gen8_gt_irq_ack(struct drm_i915_private *i915,
@@ -2185,7 +2181,6 @@ static irqreturn_t cherryview_irq_handler(int irq, void *arg)
 
 		I915_WRITE(VLV_IER, ier);
 		I915_WRITE(GEN8_MASTER_IRQ, GEN8_MASTER_IRQ_CONTROL);
-		POSTING_READ(GEN8_MASTER_IRQ);
 
 		gen8_gt_irq_handler(dev_priv, master_ctl, gt_iir);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index bc0193199a03..43880bf0f529 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1329,12 +1329,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 		ptr = I915_READ(RING_CONTEXT_STATUS_PTR(engine));
 		read = GEN8_CSB_READ_PTR(ptr);
 		write = GEN8_CSB_WRITE_PTR(ptr);
-		drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], interrupt posted? %s, tasklet queued? %s (%s)\n",
+		drm_printf(m, "\tExeclist CSB read %d [%d cached], write %d [%d from hws], tasklet queued? %s (%s)\n",
 			   read, execlists->csb_head,
 			   write,
 			   intel_read_status_page(engine, intel_hws_csb_write_index(engine->i915)),
-			   yesno(test_bit(ENGINE_IRQ_EXECLIST,
-					  &engine->irq_posted)),
 			   yesno(test_bit(TASKLET_STATE_SCHED,
 					  &engine->execlists.tasklet.state)),
 			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)));
@@ -1515,11 +1513,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	spin_unlock(&b->rb_lock);
 	local_irq_restore(flags);
 
-	drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s) (execlists? %s)\n",
+	drm_printf(m, "IRQ? 0x%lx (breadcrumbs? %s)\n",
 		   engine->irq_posted,
 		   yesno(test_bit(ENGINE_IRQ_BREADCRUMB,
-				  &engine->irq_posted)),
-		   yesno(test_bit(ENGINE_IRQ_EXECLIST,
 				  &engine->irq_posted)));
 
 	drm_printf(m, "HWSP:\n");
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 71693008b5d0..c88ea807945a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -557,13 +557,15 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 {
 	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
 
+	__unwind_incomplete_requests(container_of(execlists,
+						  typeof(struct intel_engine_cs),
+						  execlists));
 	execlists_cancel_port_requests(execlists);
-	execlists_unwind_incomplete_requests(execlists);
 
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
-static void __execlists_dequeue(struct intel_engine_cs *engine)
+static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
@@ -575,7 +577,11 @@ static void __execlists_dequeue(struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->timeline.lock);
 
-	/* Hardware submission is through 2 ports. Conceptually each port
+	GEM_BUG_ON(execlists_is_active(&engine->execlists,
+				       EXECLISTS_ACTIVE_PREEMPT));
+
+	/*
+	 * Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
 	 * static for a context, and unique to each, so we only execute
 	 * requests belonging to a single context from each ring. RING_HEAD
@@ -765,15 +771,6 @@ static void __execlists_dequeue(struct intel_engine_cs *engine)
 		   !port_isset(engine->execlists.port));
 }
 
-static void execlists_dequeue(struct intel_engine_cs *engine)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&engine->timeline.lock, flags);
-	__execlists_dequeue(engine);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
-}
-
 void
 execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 {
@@ -870,14 +867,6 @@ static void reset_irq(struct intel_engine_cs *engine)
 	synchronize_hardirq(engine->i915->drm.irq);
 
 	clear_gtiir(engine);
-
-	/*
-	 * The port is checked prior to scheduling a tasklet, but
-	 * just in case we have suspended the tasklet to do the
-	 * wedging make sure that when it wakes, it decides there
-	 * is no work to do by clearing the irq_posted bit.
-	 */
-	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 }
 
 static void execlists_cancel_requests(struct intel_engine_cs *engine)
@@ -950,6 +939,12 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	local_irq_restore(flags);
 }
 
+static inline bool
+reset_in_progress(const struct intel_engine_execlists *execlists)
+{
+	return unlikely(!__tasklet_is_enabled(&execlists->tasklet));
+}
+
 static void process_csb(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -957,10 +952,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	const u32 * const buf = execlists->csb_status;
 	u8 head, tail;
 
-	/* Clear before reading to catch new interrupts */
-	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-	smp_mb__after_atomic();
-
 	/* Note that csb_write, csb_status may be either in HWSP or mmio */
 	head = execlists->csb_head;
 	tail = READ_ONCE(*execlists->csb_write);
@@ -1090,19 +1081,9 @@ static void process_csb(struct intel_engine_cs *engine)
 	execlists->csb_head = head;
 }
 
-/*
- * Check the unread Context Status Buffers and manage the submission of new
- * contexts to the ELSP accordingly.
- */
-static void execlists_submission_tasklet(unsigned long data)
+static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 {
-	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-
-	GEM_TRACE("%s awake?=%d, active=%x, irq-posted?=%d\n",
-		  engine->name,
-		  engine->i915->gt.awake,
-		  engine->execlists.active,
-		  test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+	lockdep_assert_held(&engine->timeline.lock);
 
 	/*
 	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
@@ -1114,18 +1095,32 @@ static void execlists_submission_tasklet(unsigned long data)
 	 */
 	GEM_BUG_ON(!engine->i915->gt.awake);
 
-	/*
-	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
-	 * imposing the cost of a locked atomic transaction when submitting a
-	 * new request (outside of the context-switch interrupt).
-	 */
-	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-		process_csb(engine);
-
+	process_csb(engine);
 	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
 		execlists_dequeue(engine);
 }
 
+/*
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
+static void execlists_submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
+	unsigned long flags;
+
+	GEM_TRACE("%s awake?=%d, active=%x\n",
+		  engine->name,
+		  engine->i915->gt.awake,
+		  engine->execlists.active);
+
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+
+	__execlists_submission_tasklet(engine);
+
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+}
+
 static void queue_request(struct intel_engine_cs *engine,
 			  struct i915_sched_node *node,
 			  int prio)
@@ -1134,16 +1129,30 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __update_queue(struct intel_engine_cs *engine, int prio)
 {
 	engine->execlists.queue_priority = prio;
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+}
+
+static void __submit_queue(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	if (reset_in_progress(execlists))
+		return; /* defer until we restart the engine following reset */
+
+	if (execlists->tasklet.func == execlists_submission_tasklet)
+		__execlists_submission_tasklet(engine);
+	else
+		tasklet_hi_schedule(&execlists->tasklet);
 }
 
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+	if (prio > engine->execlists.queue_priority) {
+		__update_queue(engine, prio);
+		__submit_queue(engine);
+	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1155,11 +1164,12 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
+	submit_queue(engine, rq_prio(request));
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
@@ -1286,8 +1296,11 @@ static void execlists_schedule(struct i915_request *request,
 		}
 
 		if (prio > engine->execlists.queue_priority &&
-		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+		    i915_sw_fence_done(&sched_to_request(node)->submit)) {
+			/* defer submission until after all of our updates */
+			__update_queue(engine, prio);
+			tasklet_hi_schedule(&engine->execlists.tasklet);
+		}
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
@@ -1831,6 +1844,7 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request *request, *active;
+	unsigned long flags;
 
 	GEM_TRACE("%s\n", engine->name);
 
@@ -1854,8 +1868,10 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 * and avoid blaming an innocent request if the stall was due to the
 	 * preemption itself.
 	 */
-	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
-		process_csb(engine);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+	synchronize_hardirq(engine->i915->drm.irq);
+
+	process_csb(engine);
 
 	/*
 	 * The last active request can then be no later than the last request
@@ -1865,15 +1881,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	active = NULL;
 	request = port_request(execlists->port);
 	if (request) {
-		unsigned long flags;
-
 		/*
 		 * Prevent the breadcrumb from advancing before we decide
 		 * which request is currently active.
 		 */
 		intel_engine_stop_cs(engine);
 
-		spin_lock_irqsave(&engine->timeline.lock, flags);
 		list_for_each_entry_from_reverse(request,
 						 &engine->timeline.requests,
 						 link) {
@@ -1883,12 +1896,28 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 
 			active = request;
 		}
-		spin_unlock_irqrestore(&engine->timeline.lock, flags);
 	}
 
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
 	return active;
 }
 
+static void reset_csb_pointers(struct intel_engine_execlists *execlists)
+{
+	/*
+	 * After a reset, the HW starts writing into CSB entry [0]. We
+	 * therefore have to set our HEAD pointer back one entry so that
+	 * the *first* entry we check is entry 0. To complicate this further,
+	 * as we don't wait for the first interrupt after reset, we have to
+	 * fake the HW write to point back to the last entry so that our
+	 * inline comparison of our cached head position against the last HW
+	 * write works even before the first interrupt.
+	 */
+	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
+	WRITE_ONCE(*execlists->csb_write, (GEN8_CSB_ENTRIES - 1) | 0xff << 16);
+}
+
 static void execlists_reset(struct intel_engine_cs *engine,
 			    struct i915_request *request)
 {
@@ -1918,7 +1947,7 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	__unwind_incomplete_requests(engine);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
-	engine->execlists.csb_head = GEN8_CSB_ENTRIES - 1;
+	reset_csb_pointers(&engine->execlists);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
@@ -2412,7 +2441,6 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 			upper_32_bits(ce->lrc_desc);
 	}
 
-	execlists->csb_head = GEN8_CSB_ENTRIES - 1;
 	execlists->csb_read =
 		i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine));
 	if (csb_force_mmio(i915)) {
@@ -2427,6 +2455,7 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 		execlists->csb_write =
 			&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
 	}
+	reset_csb_pointers(execlists);
 
 	return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b92e76c23843..636a1b1ac0dc 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -358,7 +358,6 @@ struct intel_engine_cs {
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-#define ENGINE_IRQ_EXECLIST 1
 
 	/* Rather than have every client wait upon all user interrupts,
 	 * with the herd waking after every interrupt and each doing the
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 15/18] drm/i915: Move rate-limiting request retire to after submission
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (13 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 14/18] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 16/18] drm/i915: Wait for engines to idle before retiring Chris Wilson
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

Our long-standing defence against a single client flooding the system
with requests (causing mempressure and stalls across the whole system)
is to retire the oldest request on every allocation. (By retiring the
oldest, we try to keep returning requests back to the system in a
steady flow.) This adds an extra step into the submission path that we
can remove simply by moving it to after the submission itself.

We already try to clean up a stale request list after submission, so
always retiring all completed requests there fits in as a natural
extension.
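
The retire step as it now runs at the tail of __i915_request_add(),
condensed from the diff below (an editor's sketch):

	/* Return completed requests on this ring back to the slab-cache,
	 * but always preserve the request just added: the caller may not
	 * expect it to be retired (and freed!) immediately, and keeping
	 * one request per client carries reused state forward onto the
	 * next submission. */
	do {
		prev = list_first_entry(&ring->request_list,
					typeof(*prev), ring_link);
		if (prev == request || !i915_request_completed(prev))
			break;

		i915_request_retire(prev);
	} while (1);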

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f187250e60c6..796dad25e6ba 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -694,12 +694,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	if (ret)
 		goto err_unreserve;
 
-	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
-	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
-	    i915_request_completed(rq))
-		i915_request_retire(rq);
-
 	/*
 	 * Beware: Dragons be flying overhead.
 	 *
@@ -1122,6 +1116,8 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
 	/*
+	 * Move our oldest requests to the slab-cache (if not in use!)
+	 *
 	 * In typical scenarios, we do not expect the previous request on
 	 * the timeline to be still tracked by timeline->last_request if it
 	 * has been completed. If the completed request is still here, that
@@ -1138,8 +1134,22 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * work on behalf of others -- but instead we should benefit from
 	 * improved resource management. (Well, that's the theory at least.)
 	 */
-	if (prev && i915_request_completed(prev))
-		i915_request_retire_upto(prev);
+	do {
+		prev = list_first_entry(&ring->request_list,
+					typeof(*prev), ring_link);
+
+		/*
+		 * Keep the current request, the caller may not be
+		 * expecting it to be retired (and freed!) immediately,
+		 * and preserving one request from the client allows us to
+		 * carry forward frequently reused state onto the next
+		 * submission.
+		 */
+		if (prev == request || !i915_request_completed(prev))
+			break;
+
+		i915_request_retire(prev);
+	} while (1);
 }
 
 static unsigned long local_clock_us(unsigned int *cpu)
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 16/18] drm/i915: Wait for engines to idle before retiring
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (14 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 15/18] drm/i915: Move rate-limiting request retire to after submission Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 17/18] drm/i915: Move engine request retirement to intel_engine_cs Chris Wilson
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will start to defer retiring the request from the
engine list if it is still active on the submission backend. To preserve
the semantics that after wait-for-idle completes the system is idle and
fully retired, we therefore need to wait for the backends to idle before
calling i915_retire_requests().
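
The resulting ordering in i915_gem_wait_for_idle(), condensed from the
diff below (an editor's sketch):

	/* Idle the submission backends first, then retire; only after
	 * both steps may we assert that no requests remain in flight. */
	err = wait_for_engines(i915);
	if (err)
		return err;

	i915_retire_requests(i915);
	GEM_BUG_ON(i915->gt.active_requests);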

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c26f4da77e93..83f02ad43afa 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3764,10 +3764,13 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
 			if (err)
 				return err;
 		}
+
+		err = wait_for_engines(i915);
+		if (err)
+			return err;
+
 		i915_retire_requests(i915);
 		GEM_BUG_ON(i915->gt.active_requests);
-
-		return wait_for_engines(i915);
 	} else {
 		struct intel_engine_cs *engine;
 		enum intel_engine_id id;
@@ -3778,9 +3781,9 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
 			if (err)
 				return err;
 		}
-
-		return 0;
 	}
+
+	return 0;
 }
 
 static void __i915_gem_object_flush_for_display(struct drm_i915_gem_object *obj)
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 17/18] drm/i915: Move engine request retirement to intel_engine_cs
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (15 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 16/18] drm/i915: Wait for engines to idle before retiring Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:32 ` [PATCH 18/18] drm/i915: Hold request reference for submission until retirement Chris Wilson
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will move ownership of the fence reference to the
submission backend and will want to drop its final reference when
retiring it from the submission backend. We will also need a catch-up
when parking the engine to clean up any residual entries in the engine
timeline. In short, move the engine retirement from i915_request.c to
intel_engine_cs.c for future use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c     | 47 +--------------------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 54 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +
 3 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 796dad25e6ba..f91942e4d852 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -344,50 +344,6 @@ static void free_capture_list(struct i915_request *request)
 	}
 }
 
-static void __retire_engine_request(struct intel_engine_cs *engine,
-				    struct i915_request *rq)
-{
-	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
-		  __func__, engine->name,
-		  rq->fence.context, rq->fence.seqno,
-		  rq->global_seqno,
-		  intel_engine_get_seqno(engine));
-
-	GEM_BUG_ON(!i915_request_completed(rq));
-
-	local_irq_disable();
-
-	spin_lock(&engine->timeline.lock);
-	GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests));
-	list_del_init(&rq->link);
-	spin_unlock(&engine->timeline.lock);
-
-	spin_lock(&rq->lock);
-	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
-		dma_fence_signal_locked(&rq->fence);
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
-		intel_engine_cancel_signaling(rq);
-	if (rq->waitboost) {
-		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
-		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
-	}
-	spin_unlock(&rq->lock);
-
-	local_irq_enable();
-
-	/*
-	 * The backing object for the context is done after switching to the
-	 * *next* context. Therefore we cannot retire the previous context until
-	 * the next context has already started running. However, since we
-	 * cannot take the required locks at i915_request_submit() we
-	 * defer the unpinning of the active context to now, retirement of
-	 * the subsequent request.
-	 */
-	if (engine->last_retired_context)
-		intel_context_unpin(engine->last_retired_context);
-	engine->last_retired_context = rq->hw_context;
-}
-
 static void __retire_engine_upto(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
@@ -400,8 +356,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine,
 		tmp = list_first_entry(&engine->timeline.requests,
 				       typeof(*tmp), link);
 
-		GEM_BUG_ON(tmp->engine != engine);
-		__retire_engine_request(engine, tmp);
+		intel_engine_retire_request(engine, tmp);
 	} while (tmp != rq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 43880bf0f529..cce7234b9071 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1064,6 +1064,60 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
 		engine->set_default_submission(engine);
 }
 
+/**
+ * intel_engine_retire_request: drop the request reference from the engine
+ * @engine: the engine
+ * @rq: the request
+ *
+ * This request has been completed and is part of the chain being retired by
+ * the caller, so drop any reference to it from the engine.
+ */
+void intel_engine_retire_request(struct intel_engine_cs *engine,
+				 struct i915_request *rq)
+{
+	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
+		  __func__, engine->name,
+		  rq->fence.context, rq->fence.seqno,
+		  rq->global_seqno,
+		  intel_engine_get_seqno(engine));
+
+	lockdep_assert_held(&engine->i915->drm.struct_mutex);
+	GEM_BUG_ON(rq->engine != engine);
+	GEM_BUG_ON(!i915_request_completed(rq));
+
+	local_irq_disable();
+
+	spin_lock(&engine->timeline.lock);
+	GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests));
+	list_del_init(&rq->link);
+	spin_unlock(&engine->timeline.lock);
+
+	spin_lock(&rq->lock);
+	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+		dma_fence_signal_locked(&rq->fence);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
+		intel_engine_cancel_signaling(rq);
+	if (rq->waitboost) {
+		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
+		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
+	}
+	spin_unlock(&rq->lock);
+
+	local_irq_enable();
+
+	/*
+	 * The backing object for the context is done after switching to the
+	 * *next* context. Therefore we cannot retire the previous context until
+	 * the next context has already started running. However, since we
+	 * cannot take the required locks at i915_request_submit() we
+	 * defer the unpinning of the active context to now, retirement of
+	 * the subsequent request.
+	 */
+	if (engine->last_retired_context)
+		intel_context_unpin(engine->last_retired_context);
+	engine->last_retired_context = rq->hw_context;
+}
+
 /**
  * intel_engines_park: called when the GT is transitioning from busy->idle
  * @i915: the i915 device
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 636a1b1ac0dc..86d99366e7ed 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -888,6 +888,8 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
+void intel_engine_retire_request(struct intel_engine_cs *engine,
+				 struct i915_request *rq);
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 18/18] drm/i915: Hold request reference for submission until retirement
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (16 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 17/18] drm/i915: Move engine request retirement to intel_engine_cs Chris Wilson
@ 2018-05-25  9:32 ` Chris Wilson
  2018-05-25  9:55 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier Patchwork
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25  9:32 UTC (permalink / raw)
  To: intel-gfx

Currently the async submission backends (guc and execlists) hold an
extra reference to the requests being processed, as they are not
serialised with request retirement. If we instead prevent the request
from being dropped from the engine timeline until after submission has
finished processing it, we can use a single reference held for the
entire submission process (currently, it is held only for the
submission fence).

By doing so we remove a few atomics from inside the irqoff path, on the
order of 200ns as measured by gem_syslatency, bringing the impact of
direct submission into line with the previous tasklet implementation.
The tradeoff is that, as we may postpone the retirement, we have to check
for any residual requests after detecting that the engines are idle.
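
As I read the diff below, the request now starts life with three
references (an editor's summary; the enumeration in the comment is
mine):

	/*
	 * 1. the base reference from dma_fence_init(), dropped when the
	 *    fence is finally released;
	 * 2. the reference for the submit fence chain, previously taken
	 *    explicitly via i915_request_get();
	 * 3. the new reference owned by the submission backend, dropped
	 *    by intel_engine_retire_request() once the request has left
	 *    the ELSP ports.
	 */
	refcount_set(&rq->fence.refcount.refcount, 3);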

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c         | 20 ++++++++---------
 drivers/gpu/drm/i915/intel_engine_cs.c      | 24 ++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_guc_submission.c |  4 +---
 drivers/gpu/drm/i915/intel_lrc.c            | 10 +--------
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  2 +-
 5 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f91942e4d852..2cfa6f3a2f16 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -347,17 +347,15 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_upto(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
+	struct list_head * const requests = &engine->timeline.requests;
 	struct i915_request *tmp;
 
 	if (list_empty(&rq->link))
 		return;
 
-	do {
-		tmp = list_first_entry(&engine->timeline.requests,
-				       typeof(*tmp), link);
-
-		intel_engine_retire_request(engine, tmp);
-	} while (tmp != rq);
+	do
+		tmp = list_first_entry(requests, typeof(*tmp), link);
+	while (intel_engine_retire_request(engine, tmp) && tmp != rq);
 }
 
 static void i915_request_retire(struct i915_request *request)
@@ -376,6 +374,8 @@ static void i915_request_retire(struct i915_request *request)
 
 	trace_i915_request_retire(request);
 
+	__retire_engine_upto(request->engine, request);
+
 	advance_ring(request);
 	free_capture_list(request);
 
@@ -414,8 +414,6 @@ static void i915_request_retire(struct i915_request *request)
 	atomic_dec_if_positive(&request->gem_context->ban_score);
 	intel_context_unpin(request->hw_context);
 
-	__retire_engine_upto(request->engine, request);
-
 	unreserve_gt(request->i915);
 
 	i915_sched_node_fini(request->i915, &request->sched);
@@ -722,8 +720,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		       rq->timeline->fence_context,
 		       timeline_get_seqno(rq->timeline));
 
-	/* We bump the ref for the fence chain */
-	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
+	/* We bump the ref for the fence chain and for the submit backend. */
+	refcount_set(&rq->fence.refcount.refcount, 3);
+
+	i915_sw_fence_init(&rq->submit, submit_notify);
 	init_waitqueue_head(&rq->execute);
 
 	i915_sched_node_init(&rq->sched);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index cce7234b9071..b45d6aa7d29d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1071,8 +1071,10 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915)
  *
  * This request has been completed and is part of the chain being retired by
  * the caller, so drop any reference to it from the engine.
+ *
+ * Returns: true if the reference was dropped, false if it was still busy.
  */
-void intel_engine_retire_request(struct intel_engine_cs *engine,
+bool intel_engine_retire_request(struct intel_engine_cs *engine,
 				 struct i915_request *rq)
 {
 	GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
@@ -1085,6 +1087,10 @@ void intel_engine_retire_request(struct intel_engine_cs *engine,
 	GEM_BUG_ON(rq->engine != engine);
 	GEM_BUG_ON(!i915_request_completed(rq));
 
+	/* Don't drop the final ref until after the backend has finished */
+	if (port_request(engine->execlists.port) == rq)
+		return false;
+
 	local_irq_disable();
 
 	spin_lock(&engine->timeline.lock);
@@ -1116,6 +1122,19 @@ void intel_engine_retire_request(struct intel_engine_cs *engine,
 	if (engine->last_retired_context)
 		intel_context_unpin(engine->last_retired_context);
 	engine->last_retired_context = rq->hw_context;
+
+	i915_request_put(rq);
+	return true;
+}
+
+static void engine_retire_requests(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq, *next;
+
+	list_for_each_entry_safe(rq, next, &engine->timeline.requests, link) {
+		if (WARN_ON(!intel_engine_retire_request(engine, rq)))
+			break;
+	}
 }
 
 /**
@@ -1148,6 +1167,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 				"%s is not idle before parking\n",
 				engine->name);
 			intel_engine_dump(engine, &p, NULL);
+			engine->cancel_requests(engine);
 		}
 
 		/* Must be reset upon idling, or we may miss the busy wakeup. */
@@ -1156,6 +1176,8 @@ void intel_engines_park(struct drm_i915_private *i915)
 		if (engine->park)
 			engine->park(engine);
 
+		engine_retire_requests(engine);
+
 		if (engine->pinned_default_state) {
 			i915_gem_object_unpin_map(engine->default_state);
 			engine->pinned_default_state = NULL;
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 133367a17863..6f6223644140 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -669,8 +669,7 @@ static void guc_submit(struct intel_engine_cs *engine)
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(port_isset(port));
-
-	port_set(port, i915_request_get(rq));
+	port_set(port, rq);
 }
 
 static inline int rq_prio(const struct i915_request *rq)
@@ -793,7 +792,6 @@ static void guc_submission_tasklet(unsigned long data)
 	rq = port_request(port);
 	while (rq && i915_request_completed(rq)) {
 		trace_i915_request_out(rq);
-		i915_request_put(rq);
 
 		port = execlists_port_complete(execlists, port);
 		if (port_isset(port)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c88ea807945a..a0b139debe1f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -513,11 +513,7 @@ static bool can_merge_ctx(const struct intel_context *prev,
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
-
-	if (port_isset(port))
-		i915_request_put(port_request(port));
-
-	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
+	port_set(port, port_pack(rq, port_count(port)));
 }
 
 static void inject_preempt_context(struct intel_engine_cs *engine)
@@ -793,8 +789,6 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 					       INTEL_CONTEXT_SCHEDULE_OUT :
 					       INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 
-		i915_request_put(rq);
-
 		memset(port, 0, sizeof(*port));
 		port++;
 	}
@@ -1061,8 +1055,6 @@ static void process_csb(struct intel_engine_cs *engine)
 
 			execlists_context_schedule_out(rq,
 						       INTEL_CONTEXT_SCHEDULE_OUT);
-			i915_request_put(rq);
-
 			GEM_TRACE("%s completed ctx=%d\n",
 				  engine->name, port->context_id);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 86d99366e7ed..1d4847c11d71 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -888,7 +888,7 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine);
 
-void intel_engine_retire_request(struct intel_engine_cs *engine,
+bool intel_engine_retire_request(struct intel_engine_cs *engine,
 				 struct i915_request *rq);
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (17 preceding siblings ...)
  2018-05-25  9:32 ` [PATCH 18/18] drm/i915: Hold request reference for submission until retirement Chris Wilson
@ 2018-05-25  9:55 ` Patchwork
  2018-05-25 10:00 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-25  9:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
URL   : https://patchwork.freedesktop.org/series/43765/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
f6ebcbc1316a drm/i915: Prepare GEM for suspend earlier
14c41b87a902 drm/i915: Switch to kernel context before idling at runtime
5523a66842f6 drm/i915: "Race-to-idle" after switching to the kernel context
a18c5ca5e1be drm/i915: After reset on sanitization, reset the engine backends
13da5d1a87bb drm/i915: Only sanitize GEM from late suspend
6f2dabddf373 drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
-:11: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#11: 
<7>[  239.094843] i915_gem_set_wedged 	current seqno 19a98, last 19a9a, hangcheck 0 [5158 ms]

total: 0 errors, 1 warnings, 0 checks, 40 lines checked
add445ce0814 drm/i915: Be irqsafe inside reset
6ee85e893f58 drm/i915/execlists: Wait for ELSP submission on restart
-:53: WARNING:TYPO_SPELLING: 'mutiple' may be misspelled - perhaps 'multiple'?
#53: FILE: drivers/gpu/drm/i915/intel_lrc.c:2018:
+	 * serialising mutiple attempts to reset so that we know that we

total: 0 errors, 1 warnings, 0 checks, 37 lines checked
7d22a392f8d1 drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
-:41: WARNING:LONG_LINE: line over 100 characters
#41: FILE: drivers/gpu/drm/i915/intel_lrc.c:979:
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));

total: 0 errors, 1 warnings, 0 checks, 52 lines checked
f00e2ca10476 drm/i915/execlists: Pull submit after dequeue under timeline lock
4474beaad02f drm/i915/execlists: Pull CSB reset under the timeline.lock
a1159c99f645 drm/i915/execlists: Process one CSB interrupt at a time
-:35: WARNING:MEMORY_BARRIER: memory barrier without comment
#35: FILE: drivers/gpu/drm/i915/intel_lrc.c:966:
+	smp_mb__after_atomic();

-:77: WARNING:LONG_LINE: line over 100 characters
#77: FILE: drivers/gpu/drm/i915/intel_lrc.c:990:
+		  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",

-:78: WARNING:LONG_LINE: line over 100 characters
#78: FILE: drivers/gpu/drm/i915/intel_lrc.c:991:
+		  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");

-:140: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#140: FILE: drivers/gpu/drm/i915/intel_lrc.c:1022:
+			  status, buf[2*head + 1],
 			               ^

-:176: CHECK:SPACING: spaces preferred around that '*' (ctx:VxV)
#176: FILE: drivers/gpu/drm/i915/intel_lrc.c:1040:
+		    buf[2*head + 1] == execlists->preempt_complete_status) {
 		         ^

total: 0 errors, 3 warnings, 2 checks, 301 lines checked
ce11b28a15fa drm/i915/execlists: Unify CSB access pointers
1d85b2b170a0 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
-:27: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#27: 
Comparing the impact on the maximum latency observed (that is the time stolen

-:100: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")'
#100: 
References: 27af5eea54d1 ("drm/i915: Move execlists irq handler to a bottom half")

total: 1 errors, 1 warnings, 0 checks, 358 lines checked
5f5f78da7433 drm/i915: Move rate-limiting request retire to after submission
44f2b6d34b54 drm/i915: Wait for engines to idle before retiring
408446da1803 drm/i915: Move engine request retirement to intel_engine_cs
66703c7b803c drm/i915: Hold request reference for submission until retirement

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (18 preceding siblings ...)
  2018-05-25  9:55 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier Patchwork
@ 2018-05-25 10:00 ` Patchwork
  2018-05-25 10:16 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-05-25 14:25 ` ✗ Fi.CI.IGT: failure " Patchwork
  21 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-25 10:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
URL   : https://patchwork.freedesktop.org/series/43765/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Prepare GEM for suspend earlier
Okay!

Commit: drm/i915: Switch to kernel context before idling at runtime
Okay!

Commit: drm/i915: "Race-to-idle" after switching to the kernel context
Okay!

Commit: drm/i915: After reset on sanitization, reset the engine backends
Okay!

Commit: drm/i915: Only sanitize GEM from late suspend
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3664:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3665:16: warning: expression using sizeof(void)

Commit: drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
Okay!

Commit: drm/i915: Be irqsafe inside reset
Okay!

Commit: drm/i915/execlists: Wait for ELSP submission on restart
Okay!

Commit: drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
Okay!

Commit: drm/i915/execlists: Pull submit after dequeue under timeline lock
Okay!

Commit: drm/i915/execlists: Pull CSB reset under the timeline.lock
Okay!

Commit: drm/i915/execlists: Process one CSB interrupt at a time
Okay!

Commit: drm/i915/execlists: Unify CSB access pointers
+drivers/gpu/drm/i915/intel_lrc.c:1089:25:    expected void volatile [noderef] <asn:2>*addr
+drivers/gpu/drm/i915/intel_lrc.c:1089:25:    got unsigned int [usertype] *csb_read
+drivers/gpu/drm/i915/intel_lrc.c:1089:25: warning: incorrect type in argument 2 (different address spaces)
+drivers/gpu/drm/i915/intel_lrc.c:2416:29:    expected unsigned int [usertype] *csb_read
+drivers/gpu/drm/i915/intel_lrc.c:2416:29:    got void [noderef] <asn:2>*
+drivers/gpu/drm/i915/intel_lrc.c:2416:29: warning: incorrect type in assignment (different address spaces)
+drivers/gpu/drm/i915/intel_lrc.c:2419:39:    expected unsigned int [usertype] *csb_status
+drivers/gpu/drm/i915/intel_lrc.c:2419:39:    got void [noderef] <asn:2>*
+drivers/gpu/drm/i915/intel_lrc.c:2419:39: warning: incorrect type in assignment (different address spaces)

Commit: drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
-O:drivers/gpu/drm/i915/intel_lrc.c:2416:29:    expected unsigned int [usertype] *csb_read
-O:drivers/gpu/drm/i915/intel_lrc.c:2416:29:    got void [noderef] <asn:2>*
-O:drivers/gpu/drm/i915/intel_lrc.c:2416:29: warning: incorrect type in assignment (different address spaces)
+drivers/gpu/drm/i915/intel_lrc.c:2444:29:    expected unsigned int [usertype] *csb_read
+drivers/gpu/drm/i915/intel_lrc.c:2444:29:    got void [noderef] <asn:2>*
+drivers/gpu/drm/i915/intel_lrc.c:2444:29: warning: incorrect type in assignment (different address spaces)

Commit: drm/i915: Move rate-limiting request retire to after submission
Okay!

Commit: drm/i915: Wait for engines to idle before retiring
Okay!

Commit: drm/i915: Move engine request retirement to intel_engine_cs
Okay!

Commit: drm/i915: Hold request reference for submission until retirement
Okay!

* ✓ Fi.CI.BAT: success for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (19 preceding siblings ...)
  2018-05-25 10:00 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-25 10:16 ` Patchwork
  2018-05-25 14:25 ` ✗ Fi.CI.IGT: failure " Patchwork
  21 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-25 10:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
URL   : https://patchwork.freedesktop.org/series/43765/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4238 -> Patchwork_9120 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43765/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_9120 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-snb-2520m:       INCOMPLETE (fdo#103713) -> PASS

    
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713


== Participating hosts (44 -> 37) ==

  Missing    (7): fi-ilk-m540 fi-byt-j1900 fi-cnl-y3 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4238 -> Patchwork_9120

  CI_DRM_4238: 2771a5e6347eb63e43fdfc432a9f15ffb55ef209 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4498: f9ecb79ad8b02278cfdb5b82495df47061c04f8f @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9120: 66703c7b803c782443434c0c3dc479870dce7df9 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

66703c7b803c drm/i915: Hold request reference for submission until retirement
408446da1803 drm/i915: Move engine request retirement to intel_engine_cs
44f2b6d34b54 drm/i915: Wait for engines to idle before retiring
5f5f78da7433 drm/i915: Move rate-limiting request retire to after submission
1d85b2b170a0 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
ce11b28a15fa drm/i915/execlists: Unify CSB access pointers
a1159c99f645 drm/i915/execlists: Process one CSB interrupt at a time
4474beaad02f drm/i915/execlists: Pull CSB reset under the timeline.lock
f00e2ca10476 drm/i915/execlists: Pull submit after dequeue under timeline lock
7d22a392f8d1 drm/i915/execlists: Reset the CSB head tracking on reset/sanitization
6ee85e893f58 drm/i915/execlists: Wait for ELSP submission on restart
add445ce0814 drm/i915: Be irqsafe inside reset
6f2dabddf373 drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset
13da5d1a87bb drm/i915: Only sanitize GEM from late suspend
a18c5ca5e1be drm/i915: After reset on sanitization, reset the engine backends
5523a66842f6 drm/i915: "Race-to-idle" after switching to the kernel context
14c41b87a902 drm/i915: Switch to kernel context before idling at runtime
f6ebcbc1316a drm/i915: Prepare GEM for suspend earlier

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9120/issues.html

* Re: [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context
  2018-05-25  9:31 ` [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context Chris Wilson
@ 2018-05-25 12:22   ` Mika Kuoppala
  2018-05-25 12:26     ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Mika Kuoppala @ 2018-05-25 12:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> During suspend we want to flush out all active contexts and their
> rendering. To do so we queue a request from the kernel's context; once
> we know that request is done, we know the GPU is completely idle. To
> speed up that switch bump the GPU clocks.
>
> Switching to the kernel context prior to idling is also used to enforce
> a barrier before changing OA properties, and when evicting active
> rendering from the global GTT. All cases where we do want to
> race-to-idle.
>
> v2: Limit the boosting to only the switch before suspend.
> v3: Limit it to the wait-for-idle on suspend.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Weinehall <david.weinehall@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> #v1
> Tested-by: David Weinehall <david.weinehall@linux.intel.com> #v1
> ---
>  drivers/gpu/drm/i915/i915_gem.c     | 27 +++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/i915_request.h |  1 +
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c93f5dcb1d82..7b5544efa0ba 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3704,7 +3704,29 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>  
>  static int wait_for_timeline(struct i915_timeline *tl, unsigned int flags)
>  {
> -	return i915_gem_active_wait(&tl->last_request, flags);
> +	struct i915_request *rq;
> +	long ret;
> +
> +	rq = i915_gem_active_get_unlocked(&tl->last_request);
> +	if (!rq)
> +		return 0;
> +
> +	/*
> +	 * "Race-to-idle".
> +	 *
> +	 * Switching to the kernel context is often used as a synchronous
> +	 * step prior to idling, e.g. in suspend for flushing all
> +	 * current operations to memory before sleeping. These we
> +	 * want to complete as quickly as possible to avoid prolonged
> +	 * stalls, so allow the gpu to boost to maximum clocks.
> +	 */
> +	if (flags & I915_WAIT_FOR_IDLE_BOOST)
> +		gen6_rps_boost(rq, NULL);
> +
> +	ret = i915_request_wait(rq, flags, MAX_SCHEDULE_TIMEOUT);

I pondered the boost flag falling through into the wait.
But they are flags on the wait domain, so there is no reason to split/limit.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> +	i915_request_put(rq);
> +
> +	return ret < 0 ? ret : 0;
>  }
>  
>  static int wait_for_engines(struct drm_i915_private *i915)
> @@ -4979,7 +5001,8 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>  
>  		ret = i915_gem_wait_for_idle(dev_priv,
>  					     I915_WAIT_INTERRUPTIBLE |
> -					     I915_WAIT_LOCKED);
> +					     I915_WAIT_LOCKED |
> +					     I915_WAIT_FOR_IDLE_BOOST);
>  		if (ret && ret != -EIO)
>  			goto err_unlock;
>  
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 1bbbb7a9fa03..491ff81d0fea 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -267,6 +267,7 @@ long i915_request_wait(struct i915_request *rq,
>  #define I915_WAIT_INTERRUPTIBLE	BIT(0)
>  #define I915_WAIT_LOCKED	BIT(1) /* struct_mutex held, handle GPU reset */
>  #define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
> +#define I915_WAIT_FOR_IDLE_BOOST BIT(3)
>  
>  static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine);
>  
> -- 
> 2.17.0

* Re: [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context
  2018-05-25 12:22   ` Mika Kuoppala
@ 2018-05-25 12:26     ` Chris Wilson
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wilson @ 2018-05-25 12:26 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-05-25 13:22:55)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > During suspend we want to flush out all active contexts and their
> > rendering. To do so we queue a request from the kernel's context; once
> > we know that request is done, we know the GPU is completely idle. To
> > speed up that switch bump the GPU clocks.
> >
> > Switching to the kernel context prior to idling is also used to enforce
> > a barrier before changing OA properties, and when evicting active
> > rendering from the global GTT. All cases where we do want to
> > race-to-idle.
> >
> > v2: Limit the boosting to only the switch before suspend.
> > v3: Limit it to the wait-for-idle on suspend.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: David Weinehall <david.weinehall@linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> #v1
> > Tested-by: David Weinehall <david.weinehall@linux.intel.com> #v1
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c     | 27 +++++++++++++++++++++++++--
> >  drivers/gpu/drm/i915/i915_request.h |  1 +
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index c93f5dcb1d82..7b5544efa0ba 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3704,7 +3704,29 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> >  
> >  static int wait_for_timeline(struct i915_timeline *tl, unsigned int flags)
> >  {
> > -     return i915_gem_active_wait(&tl->last_request, flags);
> > +     struct i915_request *rq;
> > +     long ret;
> > +
> > +     rq = i915_gem_active_get_unlocked(&tl->last_request);
> > +     if (!rq)
> > +             return 0;
> > +
> > +     /*
> > +      * "Race-to-idle".
> > +      *
> > +      * Switching to the kernel context is often used as a synchronous
> > +      * step prior to idling, e.g. in suspend for flushing all
> > +      * current operations to memory before sleeping. These we
> > +      * want to complete as quickly as possible to avoid prolonged
> > +      * stalls, so allow the gpu to boost to maximum clocks.
> > +      */
> > +     if (flags & I915_WAIT_FOR_IDLE_BOOST)
> > +             gen6_rps_boost(rq, NULL);
> > +
> > +     ret = i915_request_wait(rq, flags, MAX_SCHEDULE_TIMEOUT);
> 
> I pondered the boost flag falling through into the wait.
> But they are flags on the wait domain, so there is no reason to split/limit.

Indeed,

> >  #define I915_WAIT_INTERRUPTIBLE      BIT(0)
> >  #define I915_WAIT_LOCKED     BIT(1) /* struct_mutex held, handle GPU reset */
> >  #define I915_WAIT_ALL                BIT(2) /* used by i915_gem_object_wait() */
> > +#define I915_WAIT_FOR_IDLE_BOOST BIT(3)

I thought it was a reasonable way to share the same set of flags
with distinct regions.
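
E.g. recapping the hunks above, just to show the two "regions" sharing
one flags word (nothing new here, only the boost bit is consumed before
the wait proper; untested illustration, not a new patch):

	ret = i915_gem_wait_for_idle(dev_priv,
				     I915_WAIT_INTERRUPTIBLE |
				     I915_WAIT_LOCKED |
				     I915_WAIT_FOR_IDLE_BOOST);

and inside wait_for_timeline():

	if (flags & I915_WAIT_FOR_IDLE_BOOST)
		gen6_rps_boost(rq, NULL);

	/* the remaining wait flags pass straight through */
	ret = i915_request_wait(rq, flags, MAX_SCHEDULE_TIMEOUT);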

Thanks for the review and discussion,
-Chris

* Re: [PATCH 08/18] drm/i915/execlists: Wait for ELSP submission on restart
  2018-05-25  9:31 ` [PATCH 08/18] drm/i915/execlists: Wait for ELSP submission on restart Chris Wilson
@ 2018-05-25 12:37   ` Mika Kuoppala
  0 siblings, 0 replies; 31+ messages in thread
From: Mika Kuoppala @ 2018-05-25 12:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> After a reset, we will ensure that there is at least one request
> submitted to HW to ensure that a context is loaded for powersaving.
> Let's wait for this submission via a tasklet to complete before we drop
> our forcewake, ensuring the system is ready for rc6 before we let it
> possible sleep.

s/possible/possibly ?

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.h  |  6 ++++++
>  drivers/gpu/drm/i915/intel_lrc.c | 15 ++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 62ee4e385365..261da577829a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -82,4 +82,10 @@ static inline void __tasklet_disable_sync_once(struct tasklet_struct *t)
>  		tasklet_unlock_wait(t);
>  }
>  
> +static inline void __tasklet_enable_sync_once(struct tasklet_struct *t)
> +{
> +	if (atomic_dec_return(&t->count) == 0)
> +		tasklet_kill(t);
> +}
> +
>  #endif /* __I915_GEM_H__ */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b3a5cfb7689f..f2fb48b1a7b7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1877,6 +1877,8 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
>  
>  	GEM_TRACE("%s\n", engine->name);
>  
> +	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
> +
>  	/*
>  	 * Prevent request submission to the hardware until we have
>  	 * completed the reset in i915_gem_reset_finish(). If a request
> @@ -2007,7 +2009,18 @@ static void execlists_reset(struct intel_engine_cs *engine,
>  
>  static void execlists_reset_finish(struct intel_engine_cs *engine)
>  {
> -	tasklet_enable(&engine->execlists.tasklet);
> +	/*
> +	 * Flush the tasklet while we still have the forcewake to be sure
> +	 * that it is not allowed to sleep before we restart and reload a
> +	 * context.
> +	 *
> +	 * As before (with execlists_reset_prepare) we rely on the caller
> +	 * serialising mutiple attempts to reset so that we know that we

s/mutiple/multiple

> +	 * are the only one manipulating tasklet state.
> +	 */
> +	__tasklet_enable_sync_once(&engine->execlists.tasklet);
> +
> +	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);

as discussed in irc, we already hold forcewakes on behalf of
prepare_engine() above us.


With forcewake_put/get removed,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
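
i.e. an untested sketch of the shape I reviewed, with the redundant
forcewake dance dropped:

	static void execlists_reset_finish(struct intel_engine_cs *engine)
	{
		/*
		 * Flush the tasklet while the caller still holds the
		 * forcewake taken for us by prepare_engine(), so the
		 * restart cannot race with rc6.
		 */
		__tasklet_enable_sync_once(&engine->execlists.tasklet);

		GEM_TRACE("%s\n", engine->name);
	}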

>  
>  	GEM_TRACE("%s\n", engine->name);
>  }
> -- 
> 2.17.0
>

* Re: [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier
  2018-05-25  9:31 ` [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier Chris Wilson
@ 2018-05-25 12:56   ` Mika Kuoppala
  0 siblings, 0 replies; 31+ messages in thread
From: Mika Kuoppala @ 2018-05-25 12:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> In order to prepare the GPU for sleeping, we may want to submit commands
> to it. This is a complicated process that may even require some swapping
> in from shmemfs, if the GPU was in the wrong state. As such, we need to
> do this preparation step synchronously before the rest of the system has
> started to turn off (e.g. swapin fails if scsi is suspended).
> Fortunately, we are provided with such a hook, pm_ops.prepare().
>
> v2: Compile cleanup
> v3: Fewer asserts, fewer problems?
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106640
> Testcase: igt/drv_suspend after igt/gem_tiled_swapping
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.c | 41 +++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9c449b8d8eab..9d6ac7f44812 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1553,12 +1553,24 @@ static bool suspend_to_idle(struct drm_i915_private *dev_priv)
>  	return false;
>  }
>  
> +static int i915_drm_prepare(struct drm_device *dev)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	int err;
> +
> +	err = i915_gem_suspend(i915);
> +	if (err)
> +		dev_err(&i915->drm.pdev->dev,
> +			"GEM idle failed, suspend/resume might fail\n");
> +
> +	return err;
> +}
> +
>  static int i915_drm_suspend(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct pci_dev *pdev = dev_priv->drm.pdev;
>  	pci_power_t opregion_target_state;
> -	int error;
>  
>  	/* ignore lid events during suspend */
>  	mutex_lock(&dev_priv->modeset_restore_lock);
> @@ -1575,13 +1587,6 @@ static int i915_drm_suspend(struct drm_device *dev)
>  
>  	pci_save_state(pdev);
>  
> -	error = i915_gem_suspend(dev_priv);
> -	if (error) {
> -		dev_err(&pdev->dev,
> -			"GEM idle failed, resume might fail\n");
> -		goto out;
> -	}
> -
>  	intel_display_suspend(dev);
>  
>  	intel_dp_mst_suspend(dev);
> @@ -1609,10 +1614,9 @@ static int i915_drm_suspend(struct drm_device *dev)
>  
>  	intel_csr_ucode_suspend(dev_priv);
>  
> -out:
>  	enable_rpm_wakeref_asserts(dev_priv);
>  
> -	return error;
> +	return 0;
>  }
>  
>  static int i915_drm_suspend_late(struct drm_device *dev, bool hibernation)
> @@ -2081,6 +2085,22 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
>  	return ret;
>  }
>  
> +static int i915_pm_prepare(struct device *kdev)
> +{
> +	struct pci_dev *pdev = to_pci_dev(kdev);
> +	struct drm_device *dev = pci_get_drvdata(pdev);
> +
> +	if (!dev) {
> +		dev_err(kdev, "DRM not initialized, aborting suspend.\n");
> +		return -ENODEV;
> +	}
> +
> +	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> +		return 0;
> +
> +	return i915_drm_prepare(dev);
> +}
> +
>  static int i915_pm_suspend(struct device *kdev)
>  {
>  	struct pci_dev *pdev = to_pci_dev(kdev);
> @@ -2731,6 +2751,7 @@ const struct dev_pm_ops i915_pm_ops = {
>  	 * S0ix (via system suspend) and S3 event handlers [PMSG_SUSPEND,
>  	 * PMSG_RESUME]
>  	 */
> +	.prepare = i915_pm_prepare,
>  	.suspend = i915_pm_suspend,
>  	.suspend_late = i915_pm_suspend_late,
>  	.resume_early = i915_pm_resume_early,
> -- 
> 2.17.0
>

* Re: [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends
  2018-05-25  9:31 ` [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends Chris Wilson
@ 2018-05-25 13:13   ` Mika Kuoppala
  2018-05-25 13:17     ` Chris Wilson
  0 siblings, 1 reply; 31+ messages in thread
From: Mika Kuoppala @ 2018-05-25 13:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> As we reset the GPU on suspend/resume, we also do need to reset the
> engine state tracking so call into the engine backends. This is
> especially important so that we can also sanitize the state tracking
> across resume.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7b5544efa0ba..5a7e0b388ad0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>  
>  void i915_gem_sanitize(struct drm_i915_private *i915)
>  {
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	GEM_TRACE("\n");
> +
>  	mutex_lock(&i915->drm.struct_mutex);
> +
> +	intel_runtime_pm_get(i915);
> +	intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> +
> +	/*
> +	 * As we have just resumed the machine and woken the device up from
> +	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> +	 * back to defaults, recovering from whatever wedged state we left it
> +	 * in and so worth trying to use the device once more.
> +	 */
>  	if (i915_terminally_wedged(&i915->gpu_error))
>  		i915_gem_unset_wedged(i915);
>  
> @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>  	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>  		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>  
> +	/* Reset the submission backend after resume as well as the GPU reset */
> +	for_each_engine(engine, i915, id) {
> +		if (engine->reset.reset)
> +			engine->reset.reset(engine, NULL);
> +	}

The NULL guarantees that it won't try to do any funny things
with the incomplete state.

But what guarantees the timeline cleanup, so that
we don't end up unwinding incomplete requests?
-Mika

> +
> +	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
> +	intel_runtime_pm_put(i915);
> +
>  	i915_gem_contexts_lost(i915);
>  	mutex_unlock(&i915->drm.struct_mutex);
>  }
> -- 
> 2.17.0
>

* Re: [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends
  2018-05-25 13:13   ` Mika Kuoppala
@ 2018-05-25 13:17     ` Chris Wilson
  2018-05-25 13:25       ` Mika Kuoppala
  0 siblings, 1 reply; 31+ messages in thread
From: Chris Wilson @ 2018-05-25 13:17 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-05-25 14:13:19)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > As we reset the GPU on suspend/resume, we also do need to reset the
> > engine state tracking so call into the engine backends. This is
> > especially important so that we can also sanitize the state tracking
> > across resume.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 7b5544efa0ba..5a7e0b388ad0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
> >  
> >  void i915_gem_sanitize(struct drm_i915_private *i915)
> >  {
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +
> > +     GEM_TRACE("\n");
> > +
> >       mutex_lock(&i915->drm.struct_mutex);
> > +
> > +     intel_runtime_pm_get(i915);
> > +     intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> > +
> > +     /*
> > +      * As we have just resumed the machine and woken the device up from
> > +      * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> > +      * back to defaults, recovering from whatever wedged state we left it
> > +      * in and so worth trying to use the device once more.
> > +      */
> >       if (i915_terminally_wedged(&i915->gpu_error))
> >               i915_gem_unset_wedged(i915);
> >  
> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
> >       if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
> >               WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
> >  
> > +     /* Reset the submission backend after resume as well as the GPU reset */
> > +     for_each_engine(engine, i915, id) {
> > +             if (engine->reset.reset)
> > +                     engine->reset.reset(engine, NULL);
> > +     }
> 
> The NULL guarantees that it won't try to do any funny things
> with the incomplete state.

The NULL is there because this gets called really, really early before
we've finished setting up the engines.

> But what guarantees the timeline cleanup, so that
> we don't end up unwinding incomplete requests?

To get here we must have gone through at least the start of a suspend.
So we've already cleaned everything up; nicely or forcefully through a
wedge. Whatever is here is garbage, including any internal knowledge in
the backend about what state we left the machine in.
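
Roughly, the ordering that makes the NULL safe:

	i915_gem_suspend()	/* everything idled, or wedged */
	... machine sleeps, HW state is lost ...
	i915_gem_sanitize()	/* GPU reset, then engine->reset.reset(engine, NULL) */

so by construction there is nothing left for the backend to unwind.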
-Chris

* Re: [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends
  2018-05-25 13:17     ` Chris Wilson
@ 2018-05-25 13:25       ` Mika Kuoppala
  0 siblings, 0 replies; 31+ messages in thread
From: Mika Kuoppala @ 2018-05-25 13:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-05-25 14:13:19)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > As we reset the GPU on suspend/resume, we also do need to reset the
>> > engine state tracking so call into the engine backends. This is
>> > especially important so that we can also sanitize the state tracking
>> > across resume.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
>> >  1 file changed, 24 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> > index 7b5544efa0ba..5a7e0b388ad0 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > @@ -4955,7 +4955,22 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>> >  
>> >  void i915_gem_sanitize(struct drm_i915_private *i915)
>> >  {
>> > +     struct intel_engine_cs *engine;
>> > +     enum intel_engine_id id;
>> > +
>> > +     GEM_TRACE("\n");
>> > +
>> >       mutex_lock(&i915->drm.struct_mutex);
>> > +
>> > +     intel_runtime_pm_get(i915);
>> > +     intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
>> > +
>> > +     /*
>> > +      * As we have just resumed the machine and woken the device up from
>> > +      * deep PCI sleep (presumably D3_cold), assume the HW has been reset
>> > +      * back to defaults, recovering from whatever wedged state we left it
>> > +      * in and so worth trying to use the device once more.
>> > +      */
>> >       if (i915_terminally_wedged(&i915->gpu_error))
>> >               i915_gem_unset_wedged(i915);
>> >  
>> > @@ -4970,6 +4985,15 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>> >       if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>> >               WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>> >  
>> > +     /* Reset the submission backend after resume as well as the GPU reset */
>> > +     for_each_engine(engine, i915, id) {
>> > +             if (engine->reset.reset)
>> > +                     engine->reset.reset(engine, NULL);
>> > +     }
>> 
>> The NULL guarantees that it won't try to do any funny things
>> with the incomplete state.
>
> The NULL is there because this gets called really, really early before
> we've finished setting up the engines.
>
>> But what guarantees the timeline cleanup, so that
>> we don't end up unwinding incomplete requests?
>
> To get here we must have gone through at least the start of a suspend.
> So we've already cleaned everything up; nicely or forcefully through a
> wedge. Whatever is here is garbage, including any internal knowledge in
> the backend about what state we left the machine in.

Fair enough,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

* Re: [PATCH 02/18] drm/i915: Switch to kernel context before idling at runtime
  2018-05-25  9:31 ` [PATCH 02/18] drm/i915: Switch to kernel context before idling at runtime Chris Wilson
@ 2018-05-25 13:44   ` Mika Kuoppala
  0 siblings, 0 replies; 31+ messages in thread
From: Mika Kuoppala @ 2018-05-25 13:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> We can reduce our exposure to random neutrinos by resting on the kernel
> context having flushed out the user contexts to system memory and
> beyond. The corollary is that we then we require two passes through the
> idle handler to go to sleep, which on a truly idle system involves an
> extra pass through the slow and irregular retire work handler.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c |  7 +++++--
>  drivers/gpu/drm/i915/i915_gem.c     | 29 ++++++++++++++++++++++++-----
>  2 files changed, 29 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index a8e7761cdc7d..0fb86a8ef192 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4226,8 +4226,11 @@ i915_drop_caches_set(void *data, u64 val)
>  		i915_gem_shrink_all(dev_priv);
>  	fs_reclaim_release(GFP_KERNEL);
>  
> -	if (val & DROP_IDLE)
> -		drain_delayed_work(&dev_priv->gt.idle_work);
> +	if (val & DROP_IDLE) {
> +		do {
> +			drain_delayed_work(&dev_priv->gt.idle_work);
> +		} while (READ_ONCE(dev_priv->gt.awake));
> +	}

Should we do this on the suspend path also? Perhaps it is better
just to warn there, like we do, and let it slide.
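
Untested sketch of what I mean, reusing the names from the patch:

	/* on suspend: a single drain, and just complain if we failed to park */
	drain_delayed_work(&dev_priv->gt.idle_work);
	WARN_ON(READ_ONCE(dev_priv->gt.awake));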

>  
>  	if (val & DROP_FREED)
>  		i915_gem_drain_freed_objects(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 05f44ca35a06..c93f5dcb1d82 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -139,6 +139,8 @@ int i915_mutex_lock_interruptible(struct drm_device *dev)
>  
>  static u32 __i915_gem_park(struct drm_i915_private *i915)
>  {
> +	GEM_TRACE("\n");
> +
>  	lockdep_assert_held(&i915->drm.struct_mutex);
>  	GEM_BUG_ON(i915->gt.active_requests);
>  	GEM_BUG_ON(!list_empty(&i915->gt.active_rings));
> @@ -193,6 +195,8 @@ void i915_gem_park(struct drm_i915_private *i915)
>  
>  void i915_gem_unpark(struct drm_i915_private *i915)
>  {
> +	GEM_TRACE("\n");
> +
>  	lockdep_assert_held(&i915->drm.struct_mutex);
>  	GEM_BUG_ON(!i915->gt.active_requests);
>  
> @@ -3504,6 +3508,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  	if (!READ_ONCE(dev_priv->gt.awake))
>  		return;
>  
> +	/*
> +	 * Flush out the last user context, leaving only the pinned
> +	 * kernel context resident. When we are idling on the kernel_context,
> +	 * no more new requests (with a context switch) are emitted and we
> +	 * can finally rest. A consequence is that the idle work handler is
> +	 * always called at least twice before idling (and if the system is
> +	 * idle that implies a round trip through the retire worker).
> +	 */

We keep the gt awake a little longer and sometimes inject a 'useless'
switch to the kernel ctx.

It is an acceptable price to pay for a sound bed to sleep on.
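
i.e. the park sequence becomes, roughly:

	idle_work #1: switch-to-kernel-context -> emits a request -> gt stays awake
	retire worker: retires that kernel-context request
	idle_work #2: no active requests -> __i915_gem_park()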

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> +	mutex_lock(&dev_priv->drm.struct_mutex);
> +	i915_gem_switch_to_kernel_context(dev_priv);
> +	mutex_unlock(&dev_priv->drm.struct_mutex);
> +
> +	GEM_TRACE("active_requests=%d (after switch-to-kernel-context)\n",
> +		  READ_ONCE(dev_priv->gt.active_requests));
> +
>  	/*
>  	 * Wait for last execlists context complete, but bail out in case a
>  	 * new request is submitted. As we don't trust the hardware, we
> @@ -4914,11 +4933,9 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>  
>  void i915_gem_sanitize(struct drm_i915_private *i915)
>  {
> -	if (i915_terminally_wedged(&i915->gpu_error)) {
> -		mutex_lock(&i915->drm.struct_mutex);
> +	mutex_lock(&i915->drm.struct_mutex);
> +	if (i915_terminally_wedged(&i915->gpu_error))
>  		i915_gem_unset_wedged(i915);
> -		mutex_unlock(&i915->drm.struct_mutex);
> -	}
>  
>  	/*
>  	 * If we inherit context state from the BIOS or earlier occupants
> @@ -4930,6 +4947,9 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>  	 */
>  	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>  		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
> +
> +	i915_gem_contexts_lost(i915);
> +	mutex_unlock(&i915->drm.struct_mutex);
>  }
>  
>  int i915_gem_suspend(struct drm_i915_private *dev_priv)
> @@ -4965,7 +4985,6 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>  
>  		assert_kernel_context_is_current(dev_priv);
>  	}
> -	i915_gem_contexts_lost(dev_priv);
>  	mutex_unlock(&dev->struct_mutex);
>  
>  	intel_uc_suspend(dev_priv);
> -- 
> 2.17.0
>

* ✗ Fi.CI.IGT: failure for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
  2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
                   ` (20 preceding siblings ...)
  2018-05-25 10:16 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-25 14:25 ` Patchwork
  21 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2018-05-25 14:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Prepare GEM for suspend earlier
URL   : https://patchwork.freedesktop.org/series/43765/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4238_full -> Patchwork_9120_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9120_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9120_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/43765/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9120_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@perf_pmu@busy-hang-rcs0:
      shard-kbl:          PASS -> FAIL
      shard-apl:          PASS -> FAIL
      shard-glk:          PASS -> FAIL

    
    ==== Warnings ====

    igt@gem_mocs_settings@mocs-rc6-dirty-render:
      shard-kbl:          SKIP -> PASS

    igt@kms_mmap_write_crc:
      shard-snb:          PASS -> SKIP +1

    
== Known issues ==

  Here are the changes found in Patchwork_9120_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_hangcheck:
      shard-glk:          PASS -> DMESG-FAIL (fdo#106560)

    igt@gem_eio@suspend:
      shard-snb:          PASS -> INCOMPLETE (fdo#105411) +1

    igt@gem_workarounds@suspend-resume:
      shard-kbl:          PASS -> INCOMPLETE (fdo#103665) +4

    igt@kms_flip@2x-plain-flip-fb-recreate:
      shard-glk:          PASS -> FAIL (fdo#100368)

    igt@kms_flip@flip-vs-expired-vblank:
      shard-glk:          PASS -> FAIL (fdo#105707)

    igt@kms_flip_tiling@flip-to-y-tiled:
      shard-glk:          PASS -> FAIL (fdo#103822, fdo#104724) +1

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt:
      shard-glk:          PASS -> FAIL (fdo#103167, fdo#104724)

    
    ==== Possible fixes ====

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking:
      shard-glk:          FAIL (fdo#105703) -> PASS

    
    ==== Warnings ====

    igt@gem_exec_schedule@pi-ringfull-bsd1:
      shard-kbl:          FAIL (fdo#103158) -> INCOMPLETE (fdo#103665)

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103158 https://bugs.freedesktop.org/show_bug.cgi?id=103158
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103822 https://bugs.freedesktop.org/show_bug.cgi?id=103822
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#105707 https://bugs.freedesktop.org/show_bug.cgi?id=105707
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4238 -> Patchwork_9120

  CI_DRM_4238: 2771a5e6347eb63e43fdfc432a9f15ffb55ef209 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4498: f9ecb79ad8b02278cfdb5b82495df47061c04f8f @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9120: 66703c7b803c782443434c0c3dc479870dce7df9 @ git://anongit.freedesktop.org/gfx-ci/linux

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9120/shards.html

end of thread, other threads:[~2018-05-25 14:25 UTC | newest]

Thread overview: 31+ messages
2018-05-25  9:31 RFC avoiding ksoftirqd for first submission Chris Wilson
2018-05-25  9:31 ` [PATCH 01/18] drm/i915: Prepare GEM for suspend earlier Chris Wilson
2018-05-25 12:56   ` Mika Kuoppala
2018-05-25  9:31 ` [PATCH 02/18] drm/i915: Switch to kernel context before idling at runtime Chris Wilson
2018-05-25 13:44   ` Mika Kuoppala
2018-05-25  9:31 ` [PATCH 03/18] drm/i915: "Race-to-idle" after switching to the kernel context Chris Wilson
2018-05-25 12:22   ` Mika Kuoppala
2018-05-25 12:26     ` Chris Wilson
2018-05-25  9:31 ` [PATCH 04/18] drm/i915: After reset on sanitization, reset the engine backends Chris Wilson
2018-05-25 13:13   ` Mika Kuoppala
2018-05-25 13:17     ` Chris Wilson
2018-05-25 13:25       ` Mika Kuoppala
2018-05-25  9:31 ` [PATCH 05/18] drm/i915: Only sanitize GEM from late suspend Chris Wilson
2018-05-25  9:31 ` [PATCH 06/18] drm/i915: Flush the ring stop bit after clearing RING_HEAD in reset Chris Wilson
2018-05-25  9:31 ` [PATCH 07/18] drm/i915: Be irqsafe inside reset Chris Wilson
2018-05-25  9:31 ` [PATCH 08/18] drm/i915/execlists: Wait for ELSP submission on restart Chris Wilson
2018-05-25 12:37   ` Mika Kuoppala
2018-05-25  9:31 ` [PATCH 09/18] drm/i915/execlists: Reset the CSB head tracking on reset/sanitization Chris Wilson
2018-05-25  9:31 ` [PATCH 10/18] drm/i915/execlists: Pull submit after dequeue under timeline lock Chris Wilson
2018-05-25  9:31 ` [PATCH 11/18] drm/i915/execlists: Pull CSB reset under the timeline.lock Chris Wilson
2018-05-25  9:32 ` [PATCH 12/18] drm/i915/execlists: Process one CSB interrupt at a time Chris Wilson
2018-05-25  9:32 ` [PATCH 13/18] drm/i915/execlists: Unify CSB access pointers Chris Wilson
2018-05-25  9:32 ` [PATCH 14/18] drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd) Chris Wilson
2018-05-25  9:32 ` [PATCH 15/18] drm/i915: Move rate-limiting request retire to after submission Chris Wilson
2018-05-25  9:32 ` [PATCH 16/18] drm/i915: Wait for engines to idle before retiring Chris Wilson
2018-05-25  9:32 ` [PATCH 17/18] drm/i915: Move engine request retirement to intel_engine_cs Chris Wilson
2018-05-25  9:32 ` [PATCH 18/18] drm/i915: Hold request reference for submission until retirement Chris Wilson
2018-05-25  9:55 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Prepare GEM for suspend earlier Patchwork
2018-05-25 10:00 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-25 10:16 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-25 14:25 ` ✗ Fi.CI.IGT: failure " Patchwork
