* [PATCH] drm/i915: Clear pending reset requests during suspend
@ 2016-01-14 10:49 Arun Siluvery
2016-01-14 11:07 ` kbuild test robot
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Arun Siluvery @ 2016-01-14 10:49 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Clear pending reset requests before suspending; they should be picked up
after resume when new work is submitted.
This was originally added as part of the TDR patches for Gen8 from Tomas Elf,
which are under review. As suggested by Chris, it is extracted as a separate
patch because it can be useful now.
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f17a2b0..09ed83e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
 		goto out;
 	}
 
+	/*
+	 * Clear any pending reset requests. They should be picked up
+	 * after resume when new work is submitted
+	 */
+	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
+			  &dev_priv->gpu_error.reset_counter);
+
 	intel_guc_suspend(dev);
 
 	intel_suspend_gt_powersave(dev);
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
@ 2016-01-14 11:07 ` kbuild test robot
2016-01-14 11:19 ` Chris Wilson
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2016-01-14 11:07 UTC (permalink / raw)
To: Arun Siluvery; +Cc: intel-gfx, kbuild-all, Mika Kuoppala
[-- Attachment #1: Type: text/plain, Size: 2447 bytes --]
Hi Arun,
[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on v4.4 next-20160114]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]
url: https://github.com/0day-ci/linux/commits/Arun-Siluvery/drm-i915-Clear-pending-reset-requests-during-suspend/20160114-185121
base: git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-x010-01140842 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All warnings (new ones prefixed by >>):
drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_suspend':
>> drivers/gpu/drm/i915/i915_drv.c:601:2: warning: 'atomic_clear_mask' is deprecated [-Wdeprecated-declarations]
atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
^
In file included from include/linux/debug_locks.h:5:0,
from include/linux/lockdep.h:23,
from include/linux/spinlock_types.h:18,
from include/linux/mutex.h:15,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:15,
from include/linux/kobject.h:21,
from include/linux/device.h:17,
from drivers/gpu/drm/i915/i915_drv.c:30:
include/linux/atomic.h:458:33: note: declared here
static inline __deprecated void atomic_clear_mask(unsigned int mask, atomic_t *v)
^
vim +/atomic_clear_mask +601 drivers/gpu/drm/i915/i915_drv.c
585
586 drm_kms_helper_poll_disable(dev);
587
588 pci_save_state(dev->pdev);
589
590 error = i915_gem_suspend(dev);
591 if (error) {
592 dev_err(&dev->pdev->dev,
593 "GEM idle failed, resume might fail\n");
594 goto out;
595 }
596
597 /*
598 * Clear any pending reset requests. They should be picked up
599 * after resume when new work is submitted
600 */
> 601 atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
602 &dev_priv->gpu_error.reset_counter);
603
604 intel_guc_suspend(dev);
605
606 intel_suspend_gt_powersave(dev);
607
608 /*
609 * Disable CRTCs directly since we want to preserve sw state
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
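A note on the warning above: as far as I know, the deprecated atomic_clear_mask() in kernels of this vintage is a thin wrapper around atomic_andnot(mask, v), so calling that helper directly would likely avoid the warning. The sketch below is not kernel code. It is a userspace C11 analogue of the same "clear these bits atomically" operation, and the names RESET_IN_PROGRESS_FLAG, clear_mask and clear_flag_demo are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag value, chosen for illustration only. */
#define RESET_IN_PROGRESS_FLAG 1u

/* Atomically clear the bits in mask, leaving all other bits untouched. */
static void clear_mask(unsigned int mask, atomic_uint *v)
{
	atomic_fetch_and(v, ~mask);
}

/* Returns the counter value after clearing the flag from the given start. */
static unsigned int clear_flag_demo(unsigned int start)
{
	atomic_uint counter;

	atomic_init(&counter, start);
	clear_mask(RESET_IN_PROGRESS_FLAG, &counter);
	return atomic_load(&counter);
}
```

Clearing the flag only drops bit 0; the remaining bits of the counter are left as they were, which is the behaviour the patch relies on.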
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22096 bytes --]
[-- Attachment #3: Type: text/plain, Size: 159 bytes --]
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
2016-01-14 11:07 ` kbuild test robot
@ 2016-01-14 11:19 ` Chris Wilson
2016-01-14 12:20 ` ✗ failure: Fi.CI.BAT Patchwork
2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
3 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-01-14 11:19 UTC (permalink / raw)
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala
On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> Pending reset requests are cleared before suspending, they should be picked up
> after resume when new work is submitted.
>
> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> are under review, as suggested by Chris this is extracted as a separate patch
> as it can be useful now.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f17a2b0..09ed83e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
> goto out;
> }
>
> + /*
> + * Clear any pending reset requests. They should be picked up
> + * after resume when new work is submitted
> + */
> + atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
> + &dev_priv->gpu_error.reset_counter);
> +
The comment is slightly wrong. When the error tasklet in progress sees
that the flag is unset, it returns (i.e. doesn't perform the reset).
This is ok, because we are putting the device into PCI_D3: we are powering
it down, which should be our ultimate reset. So there is no need for the
reset on resume. Except... we do need to clean up the bookkeeping. Hmm, so
what we need to do is actually flush the reset task, and pretend it succeeded.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
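One possible reading of the bookkeeping concern above, sketched as a small userspace model. This is an assumption, not the driver's code: it supposes that bit 0 of the counter marks a reset in progress, that bit 31 marks a wedged GPU, and that a completed reset is recorded by stepping the counter. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: bit 0 = reset in progress, bit 31 = wedged. */
#define RESET_IN_PROGRESS_FLAG 1u
#define WEDGED (1u << 31)

static bool reset_in_progress(uint32_t counter)
{
	return (counter & (RESET_IN_PROGRESS_FLAG | WEDGED)) != 0;
}

/* A hang was detected: mark the reset as pending. */
static uint32_t request_reset(uint32_t counter)
{
	return counter | RESET_IN_PROGRESS_FLAG;
}

/* What the patch does: drop the flag, as if nothing had been requested. */
static uint32_t clear_pending(uint32_t counter)
{
	return counter & ~RESET_IN_PROGRESS_FLAG;
}

/* "Pretend it succeeded": step the counter so observers see a finished reset. */
static uint32_t complete_reset(uint32_t counter)
{
	return counter + 1;
}
```

Under these assumptions, clearing the flag returns the counter to its pre-hang value, so anyone comparing counter values cannot tell that a reset was ever requested, whereas completing the reset leaves a visibly different value.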
* ✗ failure: Fi.CI.BAT
2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
2016-01-14 11:07 ` kbuild test robot
2016-01-14 11:19 ` Chris Wilson
@ 2016-01-14 12:20 ` Patchwork
2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-01-14 12:20 UTC (permalink / raw)
To: arun.siluvery; +Cc: intel-gfx
== Summary ==
Built on 058740f8fced6851aeda34f366f5330322cd585f drm-intel-nightly: 2016y-01m-13d-17h-07m-44s UTC integration manifest
Test gem_ctx_basic:
pass -> FAIL (bdw-ultra)
bdw-nuci7 total:138 pass:128 dwarn:1 dfail:0 fail:0 skip:9
bdw-ultra total:138 pass:131 dwarn:0 dfail:0 fail:1 skip:6
bsw-nuc-2 total:141 pass:115 dwarn:2 dfail:0 fail:0 skip:24
hsw-brixbox total:141 pass:134 dwarn:0 dfail:0 fail:0 skip:7
hsw-gt2 total:141 pass:137 dwarn:0 dfail:0 fail:0 skip:4
ilk-hp8440p total:141 pass:100 dwarn:4 dfail:0 fail:0 skip:37
ivb-t430s total:135 pass:122 dwarn:3 dfail:4 fail:0 skip:6
skl-i7k-2 total:141 pass:131 dwarn:2 dfail:0 fail:0 skip:8
snb-dellxps total:141 pass:122 dwarn:5 dfail:0 fail:0 skip:14
snb-x220t total:141 pass:122 dwarn:5 dfail:0 fail:1 skip:13
Results at /archive/results/CI_IGT_test/Patchwork_1184/
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-14 10:49 [PATCH] drm/i915: Clear pending reset requests during suspend Arun Siluvery
` (2 preceding siblings ...)
2016-01-14 12:20 ` ✗ failure: Fi.CI.BAT Patchwork
@ 2016-01-19 12:09 ` Daniel Vetter
2016-01-19 13:48 ` Chris Wilson
3 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 12:09 UTC (permalink / raw)
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala
On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> Pending reset requests are cleared before suspending, they should be picked up
> after resume when new work is submitted.
>
> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> are under review, as suggested by Chris this is extracted as a separate patch
> as it can be useful now.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Pulling in the discussion we had on IRC: IMO the right approach is to
simply wait for the GPU reset to finish its job. Since that could in turn
lead to a dead GPU (if we're unlucky and init_hw failed), we'd need to do
that in a loop around gem_idle, dropping dev->struct_mutex in between.
E.g.
	while (busy) {
		mutex_lock();
		gpu_idle();
		mutex_unlock();
		flush_work(reset_work);
	}
Cheers, Daniel
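The loop above can be modelled in userspace to see why more than one pass may be needed. This is only a sketch under stated assumptions: struct fake_gpu and every fake_* helper are hypothetical, the "lock" is a plain flag standing in for dev->struct_mutex, and the reset work is assumed to take that lock itself, which is why the caller must drop it before flushing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state discussed in the thread. */
struct fake_gpu {
	int outstanding;    /* requests still executing */
	bool reset_queued;  /* a reset work item is waiting to run */
	int dirty_resets;   /* how many resets will leave work behind */
	bool locked;        /* models dev->struct_mutex */
};

static void fake_lock(struct fake_gpu *g)   { assert(!g->locked); g->locked = true; }
static void fake_unlock(struct fake_gpu *g) { assert(g->locked); g->locked = false; }

/* Wait for outstanding requests; must be called with the lock held. */
static void fake_gpu_idle(struct fake_gpu *g)
{
	assert(g->locked);
	g->outstanding = 0;
}

/* Run a queued reset to completion. It takes the lock itself. */
static void fake_flush_reset_work(struct fake_gpu *g)
{
	if (!g->reset_queued)
		return;
	fake_lock(g);
	g->reset_queued = false;
	if (g->dirty_resets > 0) {
		g->dirty_resets--;
		g->outstanding = 1; /* the reset left the GPU non-idle */
	}
	fake_unlock(g);
}

/* Returns the number of loop passes needed to reach a quiet state. */
static int quiesce(struct fake_gpu *g)
{
	int passes = 0;

	while (g->outstanding > 0 || g->reset_queued) {
		fake_lock(g);
		fake_gpu_idle(g);
		fake_unlock(g);
		fake_flush_reset_work(g);
		passes++;
	}
	return passes;
}

/* Convenience wrapper that builds the state and runs the loop. */
static int quiesce_demo(int outstanding, bool reset_queued, int dirty_resets)
{
	struct fake_gpu g = { outstanding, reset_queued, dirty_resets, false };

	return quiesce(&g);
}
```

In this model, a reset that leaves work behind forces a second pass, which is the case the loop is meant to cover; an already quiet device needs no pass at all.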
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 12:09 ` [PATCH] drm/i915: Clear pending reset requests during suspend Daniel Vetter
@ 2016-01-19 13:48 ` Chris Wilson
2016-01-19 14:04 ` Daniel Vetter
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-01-19 13:48 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala
On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > Pending reset requests are cleared before suspending, they should be picked up
> > after resume when new work is submitted.
> >
> > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > are under review, as suggested by Chris this is extracted as a separate patch
> > as it can be useful now.
> >
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>
> Pulling in the discussion we had from irc: Imo the right approach is to
> simply wait for gpu reset to finish it's job. Since that could in turn
> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> E.g.
>
> while (busy) {
> mutex_lock();
> gpu_idle();
> mutex_unlock();
>
> flush_work(reset_work);
> }
Where does the requirement for gpu_idle come from? If there is a global
reset in progress, it cannot queue a request to flush the work and
waiting on the old results will be skipped. So just wait for the global
reset to complete, i.e. flush_work().
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 13:48 ` Chris Wilson
@ 2016-01-19 14:04 ` Daniel Vetter
2016-01-19 14:13 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 14:04 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, Arun Siluvery, intel-gfx, Mika Kuoppala
On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> > On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > > Pending reset requests are cleared before suspending, they should be picked up
> > > after resume when new work is submitted.
> > >
> > > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > > are under review, as suggested by Chris this is extracted as a separate patch
> > > as it can be useful now.
> > >
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >
> > Pulling in the discussion we had from irc: Imo the right approach is to
> > simply wait for gpu reset to finish it's job. Since that could in turn
> > lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> > that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> > E.g.
> >
> > while (busy) {
> > mutex_lock();
> > gpu_idle();
> > mutex_unlock();
> >
> > flush_work(reset_work);
> > }
>
> Where does the requirement for gpu_idle come from? If there is a global
> reset in progress, it cannot queue a request to flush the work and
> waiting on the old results will be skipped. So just wait for the global
> reset to complete, i.e. flush_work().
Yes, but the global reset might in turn leave a wrecked gpu behind, or at
least a non-idle one. Hence another gpu_idle on top, to make sure. If we
change init_hw() of engines to be synchronous then we should have at least
a WARN_ON(not_idle_but_i_expected_so()); in there ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 14:04 ` Daniel Vetter
@ 2016-01-19 14:13 ` Chris Wilson
2016-01-19 15:04 ` Arun Siluvery
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-01-19 14:13 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala
On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> > On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > > > Pending reset requests are cleared before suspending, they should be picked up
> > > > after resume when new work is submitted.
> > > >
> > > > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > > > are under review, as suggested by Chris this is extracted as a separate patch
> > > > as it can be useful now.
> > > >
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > >
> > > Pulling in the discussion we had from irc: Imo the right approach is to
> > > simply wait for gpu reset to finish it's job. Since that could in turn
> > > lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> > > that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> > > E.g.
> > >
> > > while (busy) {
> > > mutex_lock();
> > > gpu_idle();
> > > mutex_unlock();
> > >
> > > flush_work(reset_work);
> > > }
> >
> > Where does the requirement for gpu_idle come from? If there is a global
> > reset in progress, it cannot queue a request to flush the work and
> > waiting on the old results will be skipped. So just wait for the global
> > reset to complete, i.e. flush_work().
>
> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> change init_hw() of engines to be synchronous then we should have at least
> a WARN_ON(not_idle_but_i_expected_so()); in there ...
Does it matter on suspend? We test on resume if the GPU is usable, but
if we wanted to test on suspend then we should do
	flush_work();
	if (i915_terminally_wedged())
		/* oh noes */;
-Chris
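The flush-then-check idea above can be sketched as a small userspace model. It rests on assumptions: bit 0 of the counter marks a pending reset, bit 31 marks a terminally wedged GPU, and flushing simply lets the pending reset finish with a known outcome. The names flush_reset, terminally_wedged and check_before_suspend are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: bit 0 = reset pending, bit 31 = terminally wedged. */
#define RESET_IN_PROGRESS_FLAG 1u
#define WEDGED (1u << 31)

enum suspend_check { SUSPEND_OK, SUSPEND_GPU_WEDGED };

/* Let a pending reset finish: success steps the counter, failure wedges. */
static uint32_t flush_reset(uint32_t counter, bool reset_succeeds)
{
	if (!(counter & RESET_IN_PROGRESS_FLAG))
		return counter; /* nothing pending */
	return reset_succeeds ? counter + 1 : counter | WEDGED;
}

static bool terminally_wedged(uint32_t counter)
{
	return (counter & WEDGED) != 0;
}

/* Flush first, then inspect the result, in that order. */
static enum suspend_check check_before_suspend(uint32_t counter,
					       bool reset_succeeds)
{
	counter = flush_reset(counter, reset_succeeds);
	return terminally_wedged(counter) ? SUSPEND_GPU_WEDGED : SUSPEND_OK;
}
```

The ordering matters in this model: checking for a wedged state before the flush would miss a reset that is still pending and about to fail.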
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 14:13 ` Chris Wilson
@ 2016-01-19 15:04 ` Arun Siluvery
2016-01-19 16:42 ` Daniel Vetter
0 siblings, 1 reply; 12+ messages in thread
From: Arun Siluvery @ 2016-01-19 15:04 UTC (permalink / raw)
To: Chris Wilson, Daniel Vetter, intel-gfx, Mika Kuoppala
On 19/01/2016 14:13, Chris Wilson wrote:
> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>> after resume when new work is submitted.
>>>>>
>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>> as it can be useful now.
>>>>>
>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>
>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>> simply wait for gpu reset to finish it's job. Since that could in turn
>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>> E.g.
>>>>
>>>> while (busy) {
>>>> mutex_lock();
>>>> gpu_idle();
>>>> mutex_unlock();
>>>>
>>>> flush_work(reset_work);
>>>> }
>>>
>>> Where does the requirement for gpu_idle come from? If there is a global
>>> reset in progress, it cannot queue a request to flush the work and
>>> waiting on the old results will be skipped. So just wait for the global
>>> reset to complete, i.e. flush_work().
>>
>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>> change init_hw() of engines to be synchronous then we should have at least
>> a WARN_ON(not_idle_but_i_expected_so()); in there ...
gpu_error.work was removed in b8d24a06568368076ebd5a858a011699a97bfa42;
we now do the reset in the hangcheck work itself, so I think there is no
need to flush work.
	while (i915_reset_in_progress(gpu_error) &&
	       !i915_terminally_wedged(gpu_error)) {
		int ret;
		mutex_lock(&dev->struct_mutex);
		ret = i915_gpu_idle(dev);
		if (ret)
			DRM_ERROR("GPU is in inconsistent state after reset\n");
		mutex_unlock(&dev->struct_mutex);
	}
If the reset is successful we are idle before suspend; otherwise we are in a
wedged state. Is this OK?
regards
Arun
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 15:04 ` Arun Siluvery
@ 2016-01-19 16:42 ` Daniel Vetter
2016-01-19 17:01 ` Arun Siluvery
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 16:42 UTC (permalink / raw)
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala
On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> On 19/01/2016 14:13, Chris Wilson wrote:
> >On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>Pending reset requests are cleared before suspending, they should be picked up
> >>>>>after resume when new work is submitted.
> >>>>>
> >>>>>This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> >>>>>are under review, as suggested by Chris this is extracted as a separate patch
> >>>>>as it can be useful now.
> >>>>>
> >>>>>Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>
> >>>>Pulling in the discussion we had from irc: Imo the right approach is to
> >>>>simply wait for gpu reset to finish it's job. Since that could in turn
> >>>>lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> >>>>that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> >>>>E.g.
> >>>>
> >>>>while (busy) {
> >>>> mutex_lock();
> >>>> gpu_idle();
> >>>> mutex_unlock();
> >>>>
> >>>> flush_work(reset_work);
> >>>>}
> >>>
> >>>Where does the requirement for gpu_idle come from? If there is a global
> >>>reset in progress, it cannot queue a request to flush the work and
> >>>waiting on the old results will be skipped. So just wait for the global
> >>>reset to complete, i.e. flush_work().
> >>
> >>Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> >>least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> >>change init_hw() of engines to be synchronous then we should have at least
> >>a WARN_ON(not_idle_but_i_expected_so()); in there ...
>
> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
Git sha1s from your private tree are meaningless in public. Either link
to a git web URL or to a mailing list archive.
Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 16:42 ` Daniel Vetter
@ 2016-01-19 17:01 ` Arun Siluvery
2016-01-19 17:18 ` Daniel Vetter
0 siblings, 1 reply; 12+ messages in thread
From: Arun Siluvery @ 2016-01-19 17:01 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala
On 19/01/2016 16:42, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
>> On 19/01/2016 14:13, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>>>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>>>> after resume when new work is submitted.
>>>>>>>
>>>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>>>> as it can be useful now.
>>>>>>>
>>>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>>
>>>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>>>> simply wait for gpu reset to finish it's job. Since that could in turn
>>>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>>>> E.g.
>>>>>>
>>>>>> while (busy) {
>>>>>> mutex_lock();
>>>>>> gpu_idle();
>>>>>> mutex_unlock();
>>>>>>
>>>>>> flush_work(reset_work);
>>>>>> }
>>>>>
>>>>> Where does the requirement for gpu_idle come from? If there is a global
>>>>> reset in progress, it cannot queue a request to flush the work and
>>>>> waiting on the old results will be skipped. So just wait for the global
>>>>> reset to complete, i.e. flush_work().
>>>>
>>>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>>>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>>>> change init_hw() of engines to be synchronous then we should have at least
>>>> a WARN_ON(not_idle_but_i_expected_so()); in there ...
>>
>> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
>
> git sha1 from your private tree are meaningless in the public. Either link
> to some git weburl or mailing lists archive link.
It is from the drm-intel repo:
http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42
http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html
regards
Arun
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
2016-01-19 17:01 ` Arun Siluvery
@ 2016-01-19 17:18 ` Daniel Vetter
0 siblings, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2016-01-19 17:18 UTC (permalink / raw)
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala
On Tue, Jan 19, 2016 at 05:01:00PM +0000, Arun Siluvery wrote:
> On 19/01/2016 16:42, Daniel Vetter wrote:
> >On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> >>On 19/01/2016 14:13, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>>>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>>>Pending reset requests are cleared before suspending, they should be picked up
> >>>>>>>after resume when new work is submitted.
> >>>>>>>
> >>>>>>>This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> >>>>>>>are under review, as suggested by Chris this is extracted as a separate patch
> >>>>>>>as it can be useful now.
> >>>>>>>
> >>>>>>>Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>>>
> >>>>>>Pulling in the discussion we had from irc: Imo the right approach is to
> >>>>>>simply wait for gpu reset to finish it's job. Since that could in turn
> >>>>>>lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> >>>>>>that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> >>>>>>E.g.
> >>>>>>
> >>>>>>while (busy) {
> >>>>>> mutex_lock();
> >>>>>> gpu_idle();
> >>>>>> mutex_unlock();
> >>>>>>
> >>>>>> flush_work(reset_work);
> >>>>>>}
> >>>>>
> >>>>>Where does the requirement for gpu_idle come from? If there is a global
> >>>>>reset in progress, it cannot queue a request to flush the work and
> >>>>>waiting on the old results will be skipped. So just wait for the global
> >>>>>reset to complete, i.e. flush_work().
> >>>>
> >>>>Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> >>>>least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> >>>>change init_hw() of engines to be synchronous then we should have at least
> >>>>a WARN_ON(not_idle_but_i_expected_so()); in there ...
> >>
> >>gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
> >
> >git sha1 from your private tree are meaningless in the public. Either link
> >to some git weburl or mailing lists archive link.
>
> It is from drm-intel repo,
> http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42
>
> http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html
Oh right, I forgot that this landed, sorry for the confusion.
Summary of our IRC discussion: we idle the GPU and flush the hangcheck
(which should flush the reset work), so at least with current upstream
there shouldn't be a bug. If there is a bug we need to understand it; we
can't just add code without a clear explanation and reasons. At best that
confuses, at worst it hides some real bugs.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch