[PATCH] drm/i915: Clear pending reset requests during suspend

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drm/i915: Clear pending reset requests during suspend
@ 2016-01-14 10:49 Arun Siluvery
  2016-01-14 11:07 ` kbuild test robot
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Arun Siluvery @ 2016-01-14 10:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Pending reset requests are cleared before suspending, they should be picked up
after resume when new work is submitted.

This is originally added as part of TDR patches for Gen8 from Tomas Elf which
are under review, as suggested by Chris this is extracted as a separate patch
as it can be useful now.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f17a2b0..09ed83e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
 		goto out;
 	}
 
+	/*
+	 * Clear any pending reset requests. They should be picked up
+	 * after resume when new work is submitted
+	 */
+	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
+			  &dev_priv->gpu_error.reset_counter);
+
 	intel_guc_suspend(dev);
 
 	intel_suspend_gt_powersave(dev);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
@ 2016-01-14 11:07 ` kbuild test robot
  2016-01-14 11:19 ` Chris Wilson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2016-01-14 11:07 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, kbuild-all, Mika Kuoppala

[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]

Hi Arun,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on v4.4 next-20160114]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Arun-Siluvery/drm-i915-Clear-pending-reset-requests-during-suspend/20160114-185121
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-x010-01140842 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_suspend':
>> drivers/gpu/drm/i915/i915_drv.c:601:2: warning: 'atomic_clear_mask' is deprecated [-Wdeprecated-declarations]
     atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
     ^
   In file included from include/linux/debug_locks.h:5:0,
                    from include/linux/lockdep.h:23,
                    from include/linux/spinlock_types.h:18,
                    from include/linux/mutex.h:15,
                    from include/linux/kernfs.h:13,
                    from include/linux/sysfs.h:15,
                    from include/linux/kobject.h:21,
                    from include/linux/device.h:17,
                    from drivers/gpu/drm/i915/i915_drv.c:30:
   include/linux/atomic.h:458:33: note: declared here
    static inline __deprecated void atomic_clear_mask(unsigned int mask, atomic_t *v)
                                    ^

vim +/atomic_clear_mask +601 drivers/gpu/drm/i915/i915_drv.c

   585	
   586		drm_kms_helper_poll_disable(dev);
   587	
   588		pci_save_state(dev->pdev);
   589	
   590		error = i915_gem_suspend(dev);
   591		if (error) {
   592			dev_err(&dev->pdev->dev,
   593				"GEM idle failed, resume might fail\n");
   594			goto out;
   595		}
   596	
   597		/*
   598		 * Clear any pending reset requests. They should be picked up
   599		 * after resume when new work is submitted
   600		 */
 > 601		atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
   602				  &dev_priv->gpu_error.reset_counter);
   603	
   604		intel_guc_suspend(dev);
   605	
   606		intel_suspend_gt_powersave(dev);
   607	
   608		/*
   609		 * Disable CRTCs directly since we want to preserve sw state

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22096 bytes --]

[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
  2016-01-14 11:07 ` kbuild test robot
@ 2016-01-14 11:19 ` Chris Wilson
  2016-01-14 12:20 ` ✗ failure: Fi.CI.BAT Patchwork
  2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
  3 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-01-14 11:19 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> Pending reset requests are cleared before suspending, they should be picked up
> after resume when new work is submitted.
> 
> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> are under review, as suggested by Chris this is extracted as a separate patch
> as it can be useful now.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f17a2b0..09ed83e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Clear any pending reset requests. They should be picked up
> +	 * after resume when new work is submitted
> +	 */
> +	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
> +			  &dev_priv->gpu_error.reset_counter);
> +

The comment is slightly wrong. When the error tasklet in progress sees
that the flag is unset, it return (i.e. doesn't perform the reset).

This is ok, because we are putting the device to PCI_D3, we are powering
it down which should be our ultimate reset. So no need for the reset on
resume. Except.... We do need to clean up the bookkeeping. Hmm. so what
we need to do is actually flush the reset task, and pretend it succeeded.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* ✗ failure: Fi.CI.BAT
  2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
  2016-01-14 11:07 ` kbuild test robot
  2016-01-14 11:19 ` Chris Wilson
@ 2016-01-14 12:20 ` Patchwork
  2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-01-14 12:20 UTC (permalink / raw)
  To: arun.siluvery; +Cc: intel-gfx

== Summary ==

Built on 058740f8fced6851aeda34f366f5330322cd585f drm-intel-nightly: 2016y-01m-13d-17h-07m-44s UTC integration manifest

Test gem_ctx_basic:
                pass       -> FAIL       (bdw-ultra)

bdw-nuci7        total:138  pass:128  dwarn:1   dfail:0   fail:0   skip:9  
bdw-ultra        total:138  pass:131  dwarn:0   dfail:0   fail:1   skip:6  
bsw-nuc-2        total:141  pass:115  dwarn:2   dfail:0   fail:0   skip:24 
hsw-brixbox      total:141  pass:134  dwarn:0   dfail:0   fail:0   skip:7  
hsw-gt2          total:141  pass:137  dwarn:0   dfail:0   fail:0   skip:4  
ilk-hp8440p      total:141  pass:100  dwarn:4   dfail:0   fail:0   skip:37 
ivb-t430s        total:135  pass:122  dwarn:3   dfail:4   fail:0   skip:6  
skl-i7k-2        total:141  pass:131  dwarn:2   dfail:0   fail:0   skip:8  
snb-dellxps      total:141  pass:122  dwarn:5   dfail:0   fail:0   skip:14 
snb-x220t        total:141  pass:122  dwarn:5   dfail:0   fail:1   skip:13 

Results at /archive/results/CI_IGT_test/Patchwork_1184/

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
                   ` (2 preceding siblings ...)
  2016-01-14 12:20 ` ✗ failure: Fi.CI.BAT Patchwork
@ 2016-01-19 12:09 ` Daniel Vetter
  2016-01-19 13:48   ` Chris Wilson
  3 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 12:09 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> Pending reset requests are cleared before suspending, they should be picked up
> after resume when new work is submitted.
> 
> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> are under review, as suggested by Chris this is extracted as a separate patch
> as it can be useful now.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>

Pulling in the discussion we had from irc: Imo the right approach is to
simply wait for gpu reset to finish it's job. Since that could in turn
lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
that in a loop around gem_idle. And drop dev->struct_mutex in-between.
E.g.

while (busy) {
	mutex_lock();
	gpu_idle();
	mutex_unlock();

	flush_work(reset_work);
}

Cheers, Daniel
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f17a2b0..09ed83e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
>  		goto out;
>  	}
>  
> +	/*
> +	 * Clear any pending reset requests. They should be picked up
> +	 * after resume when new work is submitted
> +	 */
> +	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
> +			  &dev_priv->gpu_error.reset_counter);
> +
>  	intel_guc_suspend(dev);
>  
>  	intel_suspend_gt_powersave(dev);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
@ 2016-01-19 13:48   ` Chris Wilson
  2016-01-19 14:04     ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-01-19 13:48 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > Pending reset requests are cleared before suspending, they should be picked up
> > after resume when new work is submitted.
> > 
> > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > are under review, as suggested by Chris this is extracted as a separate patch
> > as it can be useful now.
> > 
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> 
> Pulling in the discussion we had from irc: Imo the right approach is to
> simply wait for gpu reset to finish it's job. Since that could in turn
> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> E.g.
> 
> while (busy) {
> 	mutex_lock();
> 	gpu_idle();
> 	mutex_unlock();
> 
> 	flush_work(reset_work);
> }

Where does the requirement for gpu_idle come from? If there is a global
reset in progress, it cannot queue a request to flush the work and
waiting on the old results will be skipped. So just wait for the global
reset to complete, i.e. flush_work().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 13:48   ` Chris Wilson
@ 2016-01-19 14:04     ` Daniel Vetter
  2016-01-19 14:13       ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 14:04 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Arun Siluvery, intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> > On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > > Pending reset requests are cleared before suspending, they should be picked up
> > > after resume when new work is submitted.
> > > 
> > > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > > are under review, as suggested by Chris this is extracted as a separate patch
> > > as it can be useful now.
> > > 
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > 
> > Pulling in the discussion we had from irc: Imo the right approach is to
> > simply wait for gpu reset to finish it's job. Since that could in turn
> > lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> > that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> > E.g.
> > 
> > while (busy) {
> > 	mutex_lock();
> > 	gpu_idle();
> > 	mutex_unlock();
> > 
> > 	flush_work(reset_work);
> > }
> 
> Where does the requirement for gpu_idle come from? If there is a global
> reset in progress, it cannot queue a request to flush the work and
> waiting on the old results will be skipped. So just wait for the global
> reset to complete, i.e. flush_work().

Yes, but the global reset might in turn leave a wrecked gpu behind, or at
least a non-idle one. Hence another gpu_idle on top, to make sure. If we
change init_hw() of engines to be synchronous then we should have at least
a WARN_ON(not_idle_but_i_expected_so()); in there ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 14:04     ` Daniel Vetter
@ 2016-01-19 14:13       ` Chris Wilson
  2016-01-19 15:04         ` Arun Siluvery
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-01-19 14:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> > On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > > > Pending reset requests are cleared before suspending, they should be picked up
> > > > after resume when new work is submitted.
> > > > 
> > > > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > > > are under review, as suggested by Chris this is extracted as a separate patch
> > > > as it can be useful now.
> > > > 
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > > 
> > > Pulling in the discussion we had from irc: Imo the right approach is to
> > > simply wait for gpu reset to finish it's job. Since that could in turn
> > > lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> > > that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> > > E.g.
> > > 
> > > while (busy) {
> > > 	mutex_lock();
> > > 	gpu_idle();
> > > 	mutex_unlock();
> > > 
> > > 	flush_work(reset_work);
> > > }
> > 
> > Where does the requirement for gpu_idle come from? If there is a global
> > reset in progress, it cannot queue a request to flush the work and
> > waiting on the old results will be skipped. So just wait for the global
> > reset to complete, i.e. flush_work().
> 
> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> change init_hw() of engines to be synchronous then we should have at least
> a WARN_ON(not_idle_but_i_expected_so()); in there ...

Does it matter on suspend? We test on resume if the GPU is usable, but
if we wanted to test on suspend then we should do

flush_work();
if (i915_terminally_wedged())
   /* oh noes */;
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 14:13       ` Chris Wilson
@ 2016-01-19 15:04         ` Arun Siluvery
  2016-01-19 16:42           ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Arun Siluvery @ 2016-01-19 15:04 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Mika Kuoppala

On 19/01/2016 14:13, Chris Wilson wrote:
> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>> after resume when new work is submitted.
>>>>>
>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>> as it can be useful now.
>>>>>
>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>
>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>> simply wait for gpu reset to finish it's job. Since that could in turn
>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>> E.g.
>>>>
>>>> while (busy) {
>>>> 	mutex_lock();
>>>> 	gpu_idle();
>>>> 	mutex_unlock();
>>>>
>>>> 	flush_work(reset_work);
>>>> }
>>>
>>> Where does the requirement for gpu_idle come from? If there is a global
>>> reset in progress, it cannot queue a request to flush the work and
>>> waiting on the old results will be skipped. So just wait for the global
>>> reset to complete, i.e. flush_work().
>>
>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>> change init_hw() of engines to be synchronous then we should have at least
>> a WARN_ON(not_idle_but_i_expected_so()); in there ...

gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, 
we are doing reset in hangcheck work itself so I think there is no need 
to flush work.

while (i915_reset_in_progress(gpu_error) &&
        !i915_terminally_wedged(gpu_error)) {
         int ret;

         mutex_lock(&dev->struct_mutex);
         ret = i915_gpu_idle(dev);
         if (ret)
                 DRM_ERROR("GPU is in inconsistent state after reset\n");
         mutex_unlock(&dev->struct_mutex);
}

If the reset is successful we are idle before suspend otherwise in a 
wedged state. is this ok?

regards
Arun

>
> Does it matter on suspend? We test on resume if the GPU is usable, but
> if we wanted to test on suspend then we should do
>
> flush_work();
> if (i915_terminally_wedged())
>     /* oh noes */;
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 15:04         ` Arun Siluvery
@ 2016-01-19 16:42           ` Daniel Vetter
  2016-01-19 17:01             ` Arun Siluvery
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 16:42 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> On 19/01/2016 14:13, Chris Wilson wrote:
> >On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>Pending reset requests are cleared before suspending, they should be picked up
> >>>>>after resume when new work is submitted.
> >>>>>
> >>>>>This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> >>>>>are under review, as suggested by Chris this is extracted as a separate patch
> >>>>>as it can be useful now.
> >>>>>
> >>>>>Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>
> >>>>Pulling in the discussion we had from irc: Imo the right approach is to
> >>>>simply wait for gpu reset to finish it's job. Since that could in turn
> >>>>lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> >>>>that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> >>>>E.g.
> >>>>
> >>>>while (busy) {
> >>>>	mutex_lock();
> >>>>	gpu_idle();
> >>>>	mutex_unlock();
> >>>>
> >>>>	flush_work(reset_work);
> >>>>}
> >>>
> >>>Where does the requirement for gpu_idle come from? If there is a global
> >>>reset in progress, it cannot queue a request to flush the work and
> >>>waiting on the old results will be skipped. So just wait for the global
> >>>reset to complete, i.e. flush_work().
> >>
> >>Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> >>least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> >>change init_hw() of engines to be synchronous then we should have at least
> >>a WARN_ON(not_idle_but_i_expected_so()); in there ...
> 
> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we

git sha1 from your private tree are meaningless in the public. Either link
to some git weburl or mailing lists archive link.

Thanks, Daniel

> are doing reset in hangcheck work itself so I think there is no need to
> flush work.
> 
> while (i915_reset_in_progress(gpu_error) &&
>        !i915_terminally_wedged(gpu_error)) {
>         int ret;
> 
>         mutex_lock(&dev->struct_mutex);
>         ret = i915_gpu_idle(dev);
>         if (ret)
>                 DRM_ERROR("GPU is in inconsistent state after reset\n");
>         mutex_unlock(&dev->struct_mutex);
> }
> 
> If the reset is successful we are idle before suspend otherwise in a wedged
> state. is this ok?
> 
> regards
> Arun
> 
> >
> >Does it matter on suspend? We test on resume if the GPU is usable, but
> >if we wanted to test on suspend then we should do
> >
> >flush_work();
> >if (i915_terminally_wedged())
> >    /* oh noes */;
> >-Chris
> >
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 16:42           ` Daniel Vetter
@ 2016-01-19 17:01             ` Arun Siluvery
  2016-01-19 17:18               ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Arun Siluvery @ 2016-01-19 17:01 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala

On 19/01/2016 16:42, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
>> On 19/01/2016 14:13, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>>>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>>>> after resume when new work is submitted.
>>>>>>>
>>>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>>>> as it can be useful now.
>>>>>>>
>>>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>>
>>>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>>>> simply wait for gpu reset to finish it's job. Since that could in turn
>>>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>>>> E.g.
>>>>>>
>>>>>> while (busy) {
>>>>>> 	mutex_lock();
>>>>>> 	gpu_idle();
>>>>>> 	mutex_unlock();
>>>>>>
>>>>>> 	flush_work(reset_work);
>>>>>> }
>>>>>
>>>>> Where does the requirement for gpu_idle come from? If there is a global
>>>>> reset in progress, it cannot queue a request to flush the work and
>>>>> waiting on the old results will be skipped. So just wait for the global
>>>>> reset to complete, i.e. flush_work().
>>>>
>>>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>>>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>>>> change init_hw() of engines to be synchronous then we should have at least
>>>> a WARN_ON(not_idle_but_i_expected_so()); in there ...
>>
>> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
>
> git sha1 from your private tree are meaningless in the public. Either link
> to some git weburl or mailing lists archive link.

It is from drm-intel repo,
http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42

http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html

regards
Arun

>
> Thanks, Daniel
>
>> are doing reset in hangcheck work itself so I think there is no need to
>> flush work.
>>
>> while (i915_reset_in_progress(gpu_error) &&
>>         !i915_terminally_wedged(gpu_error)) {
>>          int ret;
>>
>>          mutex_lock(&dev->struct_mutex);
>>          ret = i915_gpu_idle(dev);
>>          if (ret)
>>                  DRM_ERROR("GPU is in inconsistent state after reset\n");
>>          mutex_unlock(&dev->struct_mutex);
>> }
>>
>> If the reset is successful we are idle before suspend otherwise in a wedged
>> state. is this ok?
>>
>> regards
>> Arun
>>
>>>
>>> Does it matter on suspend? We test on resume if the GPU is usable, but
>>> if we wanted to test on suspend then we should do
>>>
>>> flush_work();
>>> if (i915_terminally_wedged())
>>>     /* oh noes */;
>>> -Chris
>>>
>>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
  2016-01-19 17:01             ` Arun Siluvery
@ 2016-01-19 17:18               ` Daniel Vetter
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 17:18 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 05:01:00PM +0000, Arun Siluvery wrote:
> On 19/01/2016 16:42, Daniel Vetter wrote:
> >On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> >>On 19/01/2016 14:13, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>>>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>>>Pending reset requests are cleared before suspending, they should be picked up
> >>>>>>>after resume when new work is submitted.
> >>>>>>>
> >>>>>>>This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> >>>>>>>are under review, as suggested by Chris this is extracted as a separate patch
> >>>>>>>as it can be useful now.
> >>>>>>>
> >>>>>>>Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>>>
> >>>>>>Pulling in the discussion we had from irc: Imo the right approach is to
> >>>>>>simply wait for gpu reset to finish it's job. Since that could in turn
> >>>>>>lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> >>>>>>that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> >>>>>>E.g.
> >>>>>>
> >>>>>>while (busy) {
> >>>>>>	mutex_lock();
> >>>>>>	gpu_idle();
> >>>>>>	mutex_unlock();
> >>>>>>
> >>>>>>	flush_work(reset_work);
> >>>>>>}
> >>>>>
> >>>>>Where does the requirement for gpu_idle come from? If there is a global
> >>>>>reset in progress, it cannot queue a request to flush the work and
> >>>>>waiting on the old results will be skipped. So just wait for the global
> >>>>>reset to complete, i.e. flush_work().
> >>>>
> >>>>Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> >>>>least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> >>>>change init_hw() of engines to be synchronous then we should have at least
> >>>>a WARN_ON(not_idle_but_i_expected_so()); in there ...
> >>
> >>gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
> >
> >git sha1 from your private tree are meaningless in the public. Either link
> >to some git weburl or mailing lists archive link.
> 
> It is from drm-intel repo,
> http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42
> 
> http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html

Oh right, forgot that this landed, sorry for the confusion.

Summary of our irc discussion: We idle the gpu and flush the hangcheck
(which should flush the reset work) so at least with current upstream
there shouldn't be a bug. If there is a bug we need to understand it, we
can't just add code without clear explanation and reasons: At best that
confuses, at worst it hides some real bugs.
-Daniel

> 
> regards
> Arun
> 
> >
> >Thanks, Daniel
> >
> >>are doing reset in hangcheck work itself so I think there is no need to
> >>flush work.
> >>
> >>while (i915_reset_in_progress(gpu_error) &&
> >>        !i915_terminally_wedged(gpu_error)) {
> >>         int ret;
> >>
> >>         mutex_lock(&dev->struct_mutex);
> >>         ret = i915_gpu_idle(dev);
> >>         if (ret)
> >>                 DRM_ERROR("GPU is in inconsistent state after reset\n");
> >>         mutex_unlock(&dev->struct_mutex);
> >>}
> >>
> >>If the reset is successful we are idle before suspend otherwise in a wedged
> >>state. is this ok?
> >>
> >>regards
> >>Arun
> >>
> >>>
> >>>Does it matter on suspend? We test on resume if the GPU is usable, but
> >>>if we wanted to test on suspend then we should do
> >>>
> >>>flush_work();
> >>>if (i915_terminally_wedged())
> >>>    /* oh noes */;
> >>>-Chris
> >>>
> >>
> >
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-01-19 17:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
2016-01-14 11:07 ` kbuild test robot
2016-01-14 11:19 ` Chris Wilson
2016-01-14 12:20 ` ✗ failure: Fi.CI.BAT Patchwork
2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
2016-01-19 13:48   ` Chris Wilson
2016-01-19 14:04     ` Daniel Vetter
2016-01-19 14:13       ` Chris Wilson
2016-01-19 15:04         ` Arun Siluvery
2016-01-19 16:42           ` Daniel Vetter
2016-01-19 17:01             ` Arun Siluvery
2016-01-19 17:18               ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.