All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arun Siluvery <arun.siluvery@linux.intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: Mika Kuoppala <mika.kuoppala@intel.com>, Tomas Elf <tomas.elf@intel.com>
Subject: [PATCH v2 03/15] drm/i915: Reinstate hang recovery work queue.
Date: Fri, 17 Jun 2016 08:09:03 +0100	[thread overview]
Message-ID: <1466147355-4635-4-git-send-email-arun.siluvery@linux.intel.com> (raw)
In-Reply-To: <1466147355-4635-1-git-send-email-arun.siluvery@linux.intel.com>

From: Tomas Elf <tomas.elf@intel.com>

There used to be a work queue separating the error handler from the hang
recovery path, which was removed a while back in this commit:

commit b8d24a06568368076ebd5a858a011699a97bfa42
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Wed Jan 28 17:03:14 2015 +0200

    drm/i915: Remove nested work in gpu error handling

Now we need to revert most of that commit since the work queue separating
hang detection from hang recovery is needed in preparation for the upcoming
watchdog timeout feature. The watchdog interrupt service routine will be a
second callsite of the error handler alongside the periodic hang checker,
which runs in a work queue context. Seeing as the error handler will be
serving a caller in a hard interrupt execution context that means that the
error handler must never end up in a situation where it needs to grab the
struct_mutex.  Unfortunately, that is exactly what we need to do first at
the start of the hang recovery path, which might potentially sleep if the
struct_mutex is already held by another thread. Not good when you're in a
hard interrupt context.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c |  1 +
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_irq.c | 26 +++++++++++++++++++++-----
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 24b670f..8a94fc5 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1552,6 +1552,7 @@ int i915_driver_unload(struct drm_device *dev)
 	/* Free error state after interrupts are fully disabled. */
 	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
 	i915_destroy_error_state(dev);
+	cancel_work_sync(&dev_priv->gpu_error.work);
 
 	/* Flush any outstanding unpin_work. */
 	flush_workqueue(dev_priv->wq);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8bb05b2..9ad15c7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1372,6 +1372,7 @@ struct i915_gpu_error {
 	spinlock_t lock;
 	/* Protected by the above dev->gpu_error.lock. */
 	struct drm_i915_error_state *first_error;
+	struct work_struct work;
 
 	unsigned long missed_irq_rings;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4378a65..224ff2b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2516,14 +2516,18 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 }
 
 /**
- * i915_reset_and_wakeup - do process context error handling work
- * @dev_priv: i915 device private
+ * i915_error_work_func - do process context error handling work
+ * @work: work item containing error struct, passed by the error handler
  *
  * Fire an error uevent so userspace can see that a hang or error
  * was detected.
  */
-static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
+static void i915_error_work_func(struct work_struct *work)
 {
+	struct i915_gpu_error *error = container_of(work, struct i915_gpu_error,
+	                                            work);
+	struct drm_i915_private *dev_priv =
+	        container_of(error, struct drm_i915_private, gpu_error);
 	struct kobject *kobj = &dev_priv->dev->primary->kdev->kobj;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
@@ -2717,7 +2721,19 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 		i915_error_wake_up(dev_priv, false);
 	}
 
-	i915_reset_and_wakeup(dev_priv);
+	/*
+	 * Our reset work can grab modeset locks (since it needs to reset the
+	 * state of outstanding pageflips). Hence it must not be run on our own
+	 * dev-priv->wq work queue for otherwise the flush_work in the pageflip
+	 * code will deadlock.
+	 * If error_work is already in the work queue then it will not be added
+	 * again. It hasn't yet executed so it will see the reset flags when
+	 * it is scheduled. If it isn't in the queue or it is currently
+	 * executing then this call will add it to the queue again so that
+	 * even if it misses the reset flags during the current call it is
+	 * guaranteed to see them on the next call.
+	 */
+	schedule_work(&dev_priv->gpu_error.work);
 }
 
 /* Called from drm generic code, passed 'crtc' which
@@ -4556,7 +4572,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	struct drm_device *dev = dev_priv->dev;
 
 	intel_hpd_init_work(dev_priv);
-
+	INIT_WORK(&dev_priv->gpu_error.work, i915_error_work_func);
 	INIT_WORK(&dev_priv->rps.work, gen6_pm_rps_work);
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2016-06-17  7:09 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-17  7:09 [PATCH v2 00/15] Execlist based Engine reset and recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 01/15] drm/i915: Update i915.reset to handle engine resets Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 02/15] drm/i915/tdr: Extend the idea of reset_counter to engine reset Arun Siluvery
2016-06-17  7:35   ` Chris Wilson
2016-06-17  7:09 ` Arun Siluvery [this message]
2016-06-17  7:09 ` [PATCH v2 04/15] drm/i915/tdr: Modify error handler for per engine hang recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 05/15] drm/i915/tdr: Prepare execlist submission to handle tdr resubmission after reset Arun Siluvery
2016-06-17  7:39   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 06/15] drm/i915/tdr: Capture engine state before reset Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 07/15] drm/i915/tdr: Restore engine state and start after reset Arun Siluvery
2016-06-17  7:32   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 08/15] drm/i915/tdr: Add support for per engine reset recovery Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 09/15] drm/i915: Skip reset request if there is one already Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 10/15] drm/i915: Extending i915_gem_check_wedge to check engine reset in progress Arun Siluvery
2016-06-17  7:41   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 11/15] drm/i915: Port of Added scheduler support to __wait_request() calls Arun Siluvery
2016-06-17  7:42   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 12/15] drm/i915/tdr: Add engine reset count to error state Arun Siluvery
2016-06-17  7:09 ` [PATCH v2 13/15] drm/i915/tdr: Export reset count info to debugfs Arun Siluvery
2016-06-17  7:21   ` Chris Wilson
2016-06-17  7:09 ` [PATCH v2 14/15] drm/i915/tdr: Enable Engine reset and recovery support Arun Siluvery
2016-06-17  7:09 ` [ONLY FOR BAT v2 15/15] drm/i915: Disable GuC submission for testing Engine reset patches Arun Siluvery
2016-06-17  7:33 ` ✗ Ro.CI.BAT: failure for Execlist based Engine reset and recovery Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1466147355-4635-4-git-send-email-arun.siluvery@linux.intel.com \
    --to=arun.siluvery@linux.intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=mika.kuoppala@intel.com \
    --cc=tomas.elf@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.