All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v3] drm/i915: get a runtime PM ref for the deferred GPU reset work
Date: Tue, 22 Apr 2014 21:38:00 +0200	[thread overview]
Message-ID: <20140422193800.GE10722@phenom.ffwll.local> (raw)
In-Reply-To: <1397825265-6841-1-git-send-email-imre.deak@intel.com>

On Fri, Apr 18, 2014 at 03:47:45PM +0300, Imre Deak wrote:
> Atm we can end up in the GPU reset deferred work in D3 state if the last
> runtime PM reference is dropped between detecting a hang/scheduling the
> work and executing the work. At least one such case I could trigger is
> the simulated reset via the i915_wedged debugfs entry. Fix this by
> disabling RPM before scheduling the work until the end of the work.
> 
> v2:
> - Instead of getting/putting the RPM reference in the reset work itself,
>   get it already before scheduling the work. By this we also prevent
>   going to D3 before the work gets to run, in addition to making sure
>   that we run the work itself in D0. (Ville, Daniel)
> v3:
> - fix inverted logic fail when putting the RPM ref on behalf of a
>   cancelled GPU reset work (Ville)
> 
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c |  8 +++++++-
>  drivers/gpu/drm/i915/i915_irq.c | 21 ++++++++++++++++++++-
>  2 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 0b38f88..f792576 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1823,7 +1823,13 @@ int i915_driver_unload(struct drm_device *dev)
>  
>  	/* Free error state after interrupts are fully disabled. */
>  	del_timer_sync(&dev_priv->gpu_error.hangcheck_timer);
> -	cancel_work_sync(&dev_priv->gpu_error.work);
> +	if (cancel_work_sync(&dev_priv->gpu_error.work))
> +		/*
> +		 * The following won't make any difference in the PM state,
> +		 * since RPM is disabled already, but do it still for
> +		 * consistency.
> +		 */
> +		intel_runtime_pm_put(dev_priv);
>  	i915_destroy_error_state(dev);
>  
>  	if (dev->pdev->msi_enabled)
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index a651d0d..5e079d8 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2210,6 +2210,9 @@ static void i915_error_work_func(struct work_struct *work)
>  		 */
>  		i915_error_wake_up(dev_priv, true);
>  	}
> +
> +	/* Drop the ref we took when scheduling this work. */
> +	intel_runtime_pm_put(dev_priv);
>  }
>  
>  static void i915_report_and_clear_eir(struct drm_device *dev)
> @@ -2353,8 +2356,24 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
>  	 * state of outstanding pagelips). Hence it must not be run on our own
>  	 * dev-priv->wq work queue for otherwise the flush_work in the pageflip
>  	 * code will deadlock.
> +	 *
> +	 * It's guaranteed that here we are in D0 state, since we can only get
> +	 * here via one of the following paths:
> +	 * - From an IRQ handler's error detection. -> The driver must make
> +	 *   sure that IRQs are unmasked only while holding an RPM ref.
> +	 * - From hang-check due to a blocked request. -> The request holds an
> +	 *   RPM ref, that's only released in i915_gpu_idle() which in turn
> +	 *   won't be called until the request is finished.
> +	 * - From hang-check due to a flip hang. -> We have an RPM ref because
> +	 *   of the active modeset.
> +	 * - From debugfs i915_wedged_set(). -> The caller takes an explicit
> +	 *   RPM ref.
> +	 * Take here an atomic RPM ref still to make sure that we don't
> +	 * re-enter D3 until the error work gets to run and completes the
> +	 * reset.
>  	 */
> -	schedule_work(&dev_priv->gpu_error.work);
> +	if (schedule_work(&dev_priv->gpu_error.work))
> +		intel_runtime_pm_get_noresume(dev_priv);

Isn't this a bit racy? A 2nd gpu could immediately execute the work while
this one here gets interrupted by NMI ...

I think we need to grab the ref earlier before the schedule_work. Or we
just grab it in the reset work around the register access. Actually I
think this is the right fix since we can only hit this cause by
fake-injecting a hang through debugfs. For real hangs we'll be holding a
runtime pm ref until the last request is completed by the gpu, which won't
happen if the gpu is hung.

Doing the get/put in the worker only also avoids the need to do a get from
atomic context, which is fairly fragile.

We should have a comment in the reset work explaining why we need it
though.
-Daniel

>  }
>  
>  static void __always_unused i915_pageflip_stall_check(struct drm_device *dev, int pipe)
> -- 
> 1.8.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

  reply	other threads:[~2014-04-22 19:38 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14 17:24 [PATCH v2 00/25] vlv: add support for RPM Imre Deak
2014-04-14 17:24 ` [PATCH v2 01/25] drm/i915: vlv: clean up GTLC wake control/status register macros Imre Deak
2014-04-24 21:04   ` Rodrigo Vivi
2014-04-14 17:24 ` [PATCH v2 02/25] drm/i915: vlv: clear master interrupt flag when disabling interrupts Imre Deak
2014-04-14 17:24 ` [PATCH v2 03/25] drm/i915: vlv: add RC6 residency counters Imre Deak
2014-04-14 17:24 ` [PATCH v2 04/25] drm/i915: fix the RC6 status debug print Imre Deak
2014-04-14 17:24 ` [PATCH v2 05/25] drm/i915: remove the i915_dpio debugfs entry Imre Deak
2014-04-14 17:24 ` [PATCH v2 06/25] drm/i915: get a runtime PM ref for debugfs entries where needed Imre Deak
2014-04-14 17:24 ` [PATCH v2 07/25] drm/i915: move getting struct_mutex lower in the callstack during GPU reset Imre Deak
2014-04-14 17:24 ` [PATCH v2 08/25] drm/i915: get a runtime PM ref for the deferred GT powersave enabling Imre Deak
2014-04-25  7:59   ` Daniel Vetter
2014-04-25  8:14     ` Imre Deak
2014-04-25  9:09       ` Daniel Vetter
2014-04-25  8:01   ` Daniel Vetter
2014-04-14 17:24 ` [PATCH v2 09/25] drm/i915: get a runtime PM ref for the deferred GPU reset work Imre Deak
2014-04-16 12:11   ` Ville Syrjälä
2014-04-18 12:47   ` [PATCH v3] " Imre Deak
2014-04-22 19:38     ` Daniel Vetter [this message]
2014-04-22 20:34       ` Imre Deak
2014-04-22 21:05         ` Daniel Vetter
2014-04-22 22:13     ` [PATCH v4] " Imre Deak
2014-04-23  7:07       ` Daniel Vetter
2014-04-23  7:52         ` Imre Deak
2014-04-25  8:00           ` Daniel Vetter
2014-04-14 17:24 ` [PATCH v2 10/25] drm/i915: gen2: move error capture of IER to its correct place Imre Deak
2014-04-16 12:22   ` Ville Syrjälä
2014-04-16 12:57     ` Imre Deak
2014-04-24 21:06       ` Rodrigo Vivi
2014-04-14 17:24 ` [PATCH v2 11/25] drm/i915: add missing error capturing of the PIPESTAT reg Imre Deak
2014-04-16 12:17   ` Ville Syrjälä
2014-04-18 11:44     ` Imre Deak
2014-04-18 12:55   ` [PATCH v3 " Imre Deak
2014-04-23  7:53     ` Ville Syrjälä
2014-04-14 17:24 ` [PATCH v2 12/25] drm/i915: vlv: check port power domain instead of only D0 for eDP VDD on Imre Deak
2014-04-14 17:24 ` [PATCH v2 13/25] drm/i915: fix unbalanced GT powersave enable / disable calls Imre Deak
2014-04-14 17:24 ` [PATCH v2 14/25] drm/i915: sanitize enable_rc6 option Imre Deak
2014-04-16 12:28   ` Ville Syrjälä
2014-04-16 12:37     ` Imre Deak
2014-04-18 13:01   ` [PATCH v3 " Imre Deak
2014-04-23  7:58     ` Ville Syrjälä
2014-04-14 17:24 ` [PATCH v2 15/25] drm/i915: disable runtime PM if RC6 is disabled Imre Deak
2014-04-14 17:24 ` [PATCH v2 16/25] drm/i915: make runtime PM interrupt enable/disable platform independent Imre Deak
2014-04-14 17:24 ` [PATCH v2 17/25] drm/i915: factor out gen6_update_ring_freq Imre Deak
2014-04-16 17:31   ` Ville Syrjälä
2014-04-18 13:16   ` [PATCH v3 " Imre Deak
2014-04-23  7:59     ` Ville Syrjälä
2014-04-14 17:24 ` [PATCH v2 18/25] drm/i915: make runtime PM swizzling/ring_freq init platform independent Imre Deak
2014-04-14 17:24 ` [PATCH v2 19/25] drm/i915: reinit GT power save during resume Imre Deak
2014-04-16 17:46   ` Ville Syrjälä
2014-04-18 10:51     ` Imre Deak
2014-04-22 17:21   ` [PATCH v3 " Imre Deak
2014-04-23  8:06     ` Ville Syrjälä
2014-04-14 17:24 ` [PATCH v2 20/25] drm/i915: vlv: setup RPS min/max frequencies once during init time Imre Deak
2014-04-14 17:24 ` [PATCH v2 21/25] drm/i915: vlv: factor out vlv_force_gfx_clock Imre Deak
2014-04-16 17:49   ` Ville Syrjälä
2014-04-18 13:35   ` [PATCH v3 21/25] drm/i915: vlv: factor out vlv_force_gfx_clock and check for pending force-off Imre Deak
2014-04-23  8:11     ` Ville Syrjälä
2014-04-14 17:24 ` [PATCH v2 22/25] drm/i915: vlv: increase timeout when forcing on the GFX clock Imre Deak
2014-04-25 14:04   ` Daniel Vetter
2014-04-14 17:24 ` [PATCH v2 23/25] drm/i915: add various missing GTI/Gunit register definitions Imre Deak
2014-04-24 21:17   ` Rodrigo Vivi
2014-04-24 21:49     ` Imre Deak
2014-04-30 14:32   ` Ville Syrjälä
2014-05-05 11:43     ` Imre Deak
2014-05-05 12:13   ` [PATCH v3 " Imre Deak
2014-05-05 12:21     ` Ville Syrjälä
2014-04-14 17:24 ` [PATCH v2 24/25] drm/i915: propagate the error code from runtime PM callbacks Imre Deak
2014-04-15 13:39   ` [PATCH v3 " Imre Deak
2014-04-30 18:05     ` Ville Syrjälä
2014-04-30 18:53       ` Imre Deak
2014-05-05 12:44         ` Ville Syrjälä
2014-05-05 13:18           ` Imre Deak
2014-04-14 17:24 ` [PATCH v2 25/25] drm/i915: vlv: add runtime PM support Imre Deak
2014-04-16 12:39   ` Ville Syrjälä
2014-04-16 14:53   ` Daniel Vetter
2014-04-16 16:01     ` Imre Deak
2014-04-22 17:28   ` [PATCH v3 " Imre Deak
2014-04-22 22:09     ` [PATCH v4] drm/i915: get a runtime PM ref for the deferred GPU reset work Imre Deak
2014-04-24 21:02       ` Rodrigo Vivi
2014-04-30 17:35     ` [PATCH v3 25/25] drm/i915: vlv: add runtime PM support Ville Syrjälä
2014-05-05  9:33       ` Daniel Vetter
2014-05-05 12:19     ` [PATCH v4 " Imre Deak
2014-04-14 17:41 ` [PATCH v2 26/25] drm/i915: vlv: enable runtime PM Imre Deak
2014-05-06 19:39   ` Daniel Vetter
2014-04-17 11:00 ` [PATCH v2 00/25] vlv: add support for RPM Ville Syrjälä

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140422193800.GE10722@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=imre.deak@intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.