* [PATCH] drm/i915: Clear pending reset requests during suspend
From: Arun Siluvery @ 2016-01-14 10:49 UTC
To: intel-gfx; +Cc: Mika Kuoppala

Pending reset requests are cleared before suspending, they should be picked up
after resume when new work is submitted.

This is originally added as part of TDR patches for Gen8 from Tomas Elf which
are under review, as suggested by Chris this is extracted as a separate patch
as it can be useful now.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f17a2b0..09ed83e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
 		goto out;
 	}
 
+	/*
+	 * Clear any pending reset requests. They should be picked up
+	 * after resume when new work is submitted
+	 */
+	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
+			  &dev_priv->gpu_error.reset_counter);
+
 	intel_guc_suspend(dev);
 
 	intel_suspend_gt_powersave(dev);
-- 
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: kbuild test robot @ 2016-01-14 11:07 UTC
To: Arun Siluvery; +Cc: intel-gfx, kbuild-all, Mika Kuoppala

Hi Arun,

[auto build test WARNING on drm-intel/for-linux-next]
[also build test WARNING on v4.4 next-20160114]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Arun-Siluvery/drm-i915-Clear-pending-reset-requests-during-suspend/20160114-185121
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: x86_64-randconfig-x010-01140842 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/i915/i915_drv.c: In function 'i915_drm_suspend':
>> drivers/gpu/drm/i915/i915_drv.c:601:2: warning: 'atomic_clear_mask' is deprecated [-Wdeprecated-declarations]
     atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
     ^
   In file included from include/linux/debug_locks.h:5:0,
                    from include/linux/lockdep.h:23,
                    from include/linux/spinlock_types.h:18,
                    from include/linux/mutex.h:15,
                    from include/linux/kernfs.h:13,
                    from include/linux/sysfs.h:15,
                    from include/linux/kobject.h:21,
                    from include/linux/device.h:17,
                    from drivers/gpu/drm/i915/i915_drv.c:30:
   include/linux/atomic.h:458:33: note: declared here
    static inline __deprecated void atomic_clear_mask(unsigned int mask, atomic_t *v)
                                    ^

vim +/atomic_clear_mask +601 drivers/gpu/drm/i915/i915_drv.c

   585
   586		drm_kms_helper_poll_disable(dev);
   587
   588		pci_save_state(dev->pdev);
   589
   590		error = i915_gem_suspend(dev);
   591		if (error) {
   592			dev_err(&dev->pdev->dev,
   593				"GEM idle failed, resume might fail\n");
   594			goto out;
   595		}
   596
   597		/*
   598		 * Clear any pending reset requests. They should be picked up
   599		 * after resume when new work is submitted
   600		 */
 > 601		atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
   602				  &dev_priv->gpu_error.reset_counter);
   603
   604		intel_guc_suspend(dev);
   605
   606		intel_suspend_gt_powersave(dev);
   607
   608		/*
   609		 * Disable CRTCs directly since we want to preserve sw state

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22096 bytes --]
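The warning above concerns atomic_clear_mask(). What follows is a minimal user-space sketch of the same bit-clear, assuming the non-deprecated kernel helper is atomic_andnot(mask, v) and modelling it with C11 atomic_fetch_and(). The flag values and the names model_atomic_andnot() and clear_pending_reset() are hypothetical and chosen for illustration only; this is not the driver's code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed bit layout, for illustration; the real values live in the driver. */
#define RESET_IN_PROGRESS_FLAG	1u
#define WEDGED			(1u << 31)

/* User-space stand-in for the kernel's atomic_andnot(mask, v): v &= ~mask. */
static void model_atomic_andnot(unsigned int mask, atomic_uint *v)
{
	atomic_fetch_and(v, ~mask);
}

/*
 * Clear only the in-progress bit of a counter value and return the result.
 * Every other bit (including the assumed WEDGED bit) must survive.
 */
static unsigned int clear_pending_reset(unsigned int counter_value)
{
	atomic_uint reset_counter;

	atomic_init(&reset_counter, counter_value);
	model_atomic_andnot(RESET_IN_PROGRESS_FLAG, &reset_counter);
	return atomic_load(&reset_counter);
}
```

In the patch itself the equivalent change would presumably be a one-line swap of the helper name, with the arguments unchanged.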
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Chris Wilson @ 2016-01-14 11:19 UTC
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> Pending reset requests are cleared before suspending, they should be picked up
> after resume when new work is submitted.
>
> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> are under review, as suggested by Chris this is extracted as a separate patch
> as it can be useful now.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f17a2b0..09ed83e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
>  		goto out;
>  	}
>
> +	/*
> +	 * Clear any pending reset requests. They should be picked up
> +	 * after resume when new work is submitted
> +	 */
> +	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
> +			  &dev_priv->gpu_error.reset_counter);
> +

The comment is slightly wrong. When the error tasklet that is in progress
sees that the flag is unset, it returns (i.e. doesn't perform the reset).
This is OK: because we are putting the device into PCI_D3, we are powering
it down, which should be our ultimate reset. So no need for the reset on
resume.

Except.... We do need to clean up the bookkeeping. Hmm, so what we need to
do is actually flush the reset task, and pretend it succeeded.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* ✗ failure: Fi.CI.BAT
From: Patchwork @ 2016-01-14 12:20 UTC
To: arun.siluvery; +Cc: intel-gfx

== Summary ==

Built on 058740f8fced6851aeda34f366f5330322cd585f drm-intel-nightly: 2016y-01m-13d-17h-07m-44s UTC integration manifest

Test gem_ctx_basic:
                pass       -> FAIL       (bdw-ultra)

bdw-nuci7        total:138  pass:128  dwarn:1   dfail:0   fail:0   skip:9
bdw-ultra        total:138  pass:131  dwarn:0   dfail:0   fail:1   skip:6
bsw-nuc-2        total:141  pass:115  dwarn:2   dfail:0   fail:0   skip:24
hsw-brixbox      total:141  pass:134  dwarn:0   dfail:0   fail:0   skip:7
hsw-gt2          total:141  pass:137  dwarn:0   dfail:0   fail:0   skip:4
ilk-hp8440p      total:141  pass:100  dwarn:4   dfail:0   fail:0   skip:37
ivb-t430s        total:135  pass:122  dwarn:3   dfail:4   fail:0   skip:6
skl-i7k-2        total:141  pass:131  dwarn:2   dfail:0   fail:0   skip:8
snb-dellxps      total:141  pass:122  dwarn:5   dfail:0   fail:0   skip:14
snb-x220t        total:141  pass:122  dwarn:5   dfail:0   fail:1   skip:13

Results at /archive/results/CI_IGT_test/Patchwork_1184/
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Daniel Vetter @ 2016-01-19 12:09 UTC
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> Pending reset requests are cleared before suspending, they should be picked up
> after resume when new work is submitted.
>
> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> are under review, as suggested by Chris this is extracted as a separate patch
> as it can be useful now.
>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>

Pulling in the discussion we had from irc: Imo the right approach is to
simply wait for gpu reset to finish its job. Since that could in turn
lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
that in a loop around gem_idle. And drop dev->struct_mutex in-between.
E.g.

while (busy) {
	mutex_lock();
	gpu_idle();
	mutex_unlock();

	flush_work(reset_work);
}

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f17a2b0..09ed83e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -594,6 +594,13 @@ static int i915_drm_suspend(struct drm_device *dev)
>  		goto out;
>  	}
>
> +	/*
> +	 * Clear any pending reset requests. They should be picked up
> +	 * after resume when new work is submitted
> +	 */
> +	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
> +			  &dev_priv->gpu_error.reset_counter);
> +
>  	intel_guc_suspend(dev);
>
>  	intel_suspend_gt_powersave(dev);
> --
> 1.9.1

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Chris Wilson @ 2016-01-19 13:48 UTC
To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > Pending reset requests are cleared before suspending, they should be picked up
> > after resume when new work is submitted.
> >
> > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > are under review, as suggested by Chris this is extracted as a separate patch
> > as it can be useful now.
> >
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>
> Pulling in the discussion we had from irc: Imo the right approach is to
> simply wait for gpu reset to finish its job. Since that could in turn
> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> E.g.
>
> while (busy) {
> 	mutex_lock();
> 	gpu_idle();
> 	mutex_unlock();
>
> 	flush_work(reset_work);
> }

Where does the requirement for gpu_idle come from? If there is a global
reset in progress, it cannot queue a request to flush the work and
waiting on the old results will be skipped. So just wait for the global
reset to complete, i.e. flush_work().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Daniel Vetter @ 2016-01-19 14:04 UTC
To: Chris Wilson, Daniel Vetter, Arun Siluvery, intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> > On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > > Pending reset requests are cleared before suspending, they should be picked up
> > > after resume when new work is submitted.
> > >
> > > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > > are under review, as suggested by Chris this is extracted as a separate patch
> > > as it can be useful now.
> > >
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >
> > Pulling in the discussion we had from irc: Imo the right approach is to
> > simply wait for gpu reset to finish its job. Since that could in turn
> > lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> > that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> > E.g.
> >
> > while (busy) {
> > 	mutex_lock();
> > 	gpu_idle();
> > 	mutex_unlock();
> >
> > 	flush_work(reset_work);
> > }
>
> Where does the requirement for gpu_idle come from? If there is a global
> reset in progress, it cannot queue a request to flush the work and
> waiting on the old results will be skipped. So just wait for the global
> reset to complete, i.e. flush_work().

Yes, but the global reset might in turn leave a wrecked gpu behind, or at
least a non-idle one. Hence another gpu_idle on top, to make sure. If we
change init_hw() of engines to be synchronous then we should have at least
a WARN_ON(not_idle_but_i_expected_so()); in there ...
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Chris Wilson @ 2016-01-19 14:13 UTC
To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> > On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> > > On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> > > > Pending reset requests are cleared before suspending, they should be picked up
> > > > after resume when new work is submitted.
> > > >
> > > > This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> > > > are under review, as suggested by Chris this is extracted as a separate patch
> > > > as it can be useful now.
> > > >
> > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > >
> > > Pulling in the discussion we had from irc: Imo the right approach is to
> > > simply wait for gpu reset to finish its job. Since that could in turn
> > > lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> > > that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> > > E.g.
> > >
> > > while (busy) {
> > > 	mutex_lock();
> > > 	gpu_idle();
> > > 	mutex_unlock();
> > >
> > > 	flush_work(reset_work);
> > > }
> >
> > Where does the requirement for gpu_idle come from? If there is a global
> > reset in progress, it cannot queue a request to flush the work and
> > waiting on the old results will be skipped. So just wait for the global
> > reset to complete, i.e. flush_work().
>
> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> change init_hw() of engines to be synchronous then we should have at least
> a WARN_ON(not_idle_but_i_expected_so()); in there ...

Does it matter on suspend? We test on resume if the GPU is usable, but
if we wanted to test on suspend then we should do

flush_work();
if (i915_terminally_wedged())
	/* oh noes */;
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
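The flush-then-check order suggested above can be sketched in user space. This is a toy model under stated assumptions, not kernel code: struct model_work, struct model_gpu, model_flush_work() and the bit values are all hypothetical. The "flush" here simply runs a pending simulated reset synchronously, which mirrors the guarantee flush_work() gives its caller (the work item has finished executing by the time it returns).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, for illustration only. */
#define RESET_IN_PROGRESS	1u
#define WEDGED			(1u << 31)

/* Hypothetical single-slot work item standing in for the reset worker. */
struct model_work {
	bool pending;		/* has the reset been queued but not run? */
	bool reset_succeeds;	/* outcome the simulated reset will have */
};

struct model_gpu {
	unsigned int reset_counter;
	struct model_work reset_work;
};

/* Simulated reset: drop the in-progress bit, mark wedged on failure. */
static void model_run_reset(struct model_gpu *gpu)
{
	gpu->reset_counter &= ~RESET_IN_PROGRESS;
	if (!gpu->reset_work.reset_succeeds)
		gpu->reset_counter |= WEDGED;
}

/* Stand-in for flush_work(): returns only after pending work has run. */
static void model_flush_work(struct model_gpu *gpu)
{
	if (gpu->reset_work.pending) {
		model_run_reset(gpu);
		gpu->reset_work.pending = false;
	}
}

/* Suspend-side check: flush first, then inspect the final state. */
static bool model_suspend_gpu_usable(struct model_gpu *gpu)
{
	model_flush_work(gpu);
	return !(gpu->reset_counter & WEDGED);
}
```

The point of the ordering is that the wedged check reads a settled state: once the flush has returned, no reset can still be changing the counter underneath the check.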
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Arun Siluvery @ 2016-01-19 15:04 UTC
To: Chris Wilson, Daniel Vetter, intel-gfx, Mika Kuoppala

On 19/01/2016 14:13, Chris Wilson wrote:
> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>> after resume when new work is submitted.
>>>>>
>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>> as it can be useful now.
>>>>>
>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>
>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>> simply wait for gpu reset to finish its job. Since that could in turn
>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>> E.g.
>>>>
>>>> while (busy) {
>>>> 	mutex_lock();
>>>> 	gpu_idle();
>>>> 	mutex_unlock();
>>>>
>>>> 	flush_work(reset_work);
>>>> }
>>>
>>> Where does the requirement for gpu_idle come from? If there is a global
>>> reset in progress, it cannot queue a request to flush the work and
>>> waiting on the old results will be skipped. So just wait for the global
>>> reset to complete, i.e. flush_work().
>>
>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>> change init_hw() of engines to be synchronous then we should have at least
>> a WARN_ON(not_idle_but_i_expected_so()); in there ...

gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
are doing reset in hangcheck work itself so I think there is no need to
flush work.

while (i915_reset_in_progress(gpu_error) &&
       !i915_terminally_wedged(gpu_error)) {
	int ret;

	mutex_lock(&dev->struct_mutex);
	ret = i915_gpu_idle(dev);
	if (ret)
		DRM_ERROR("GPU is in inconsistent state after reset\n");
	mutex_unlock(&dev->struct_mutex);
}

If the reset is successful we are idle before suspend otherwise in a wedged
state. Is this OK?

regards
Arun

>
> Does it matter on suspend? We test on resume if the GPU is usable, but
> if we wanted to test on suspend then we should do
>
> flush_work();
> if (i915_terminally_wedged())
> 	/* oh noes */;
> -Chris
>
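To make the loop condition above easier to reason about, here is a small user-space sketch of the two predicates it combines. It assumes the in-progress test covers both an in-progress bit and a wedged bit while the wedged test covers only the latter; the bit values and the model_* names are hypothetical and for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, for illustration only. */
#define RESET_IN_PROGRESS_FLAG	1u
#define WEDGED			(1u << 31)

/* Assumed shape of the in-progress test: either bit counts. */
static bool model_reset_in_progress(unsigned int counter)
{
	return (counter & (RESET_IN_PROGRESS_FLAG | WEDGED)) != 0;
}

/* Assumed shape of the wedged test: only the wedged bit counts. */
static bool model_terminally_wedged(unsigned int counter)
{
	return (counter & WEDGED) != 0;
}

/* The guard of the proposed wait loop, as a pure function of the counter. */
static bool model_should_wait(unsigned int counter)
{
	return model_reset_in_progress(counter) &&
	       !model_terminally_wedged(counter);
}
```

Under these assumptions the guard is true only while a reset is pending on a GPU that is not wedged. If the loop body does not modify the counter itself, the loop ends only when another context (the reset handler) clears the in-progress bit or sets the wedged bit.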
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Daniel Vetter @ 2016-01-19 16:42 UTC
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> On 19/01/2016 14:13, Chris Wilson wrote:
> >On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>Pending reset requests are cleared before suspending, they should be picked up
> >>>>>after resume when new work is submitted.
> >>>>>
> >>>>>This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> >>>>>are under review, as suggested by Chris this is extracted as a separate patch
> >>>>>as it can be useful now.
> >>>>>
> >>>>>Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>
> >>>>Pulling in the discussion we had from irc: Imo the right approach is to
> >>>>simply wait for gpu reset to finish its job. Since that could in turn
> >>>>lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> >>>>that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> >>>>E.g.
> >>>>
> >>>>while (busy) {
> >>>>	mutex_lock();
> >>>>	gpu_idle();
> >>>>	mutex_unlock();
> >>>>
> >>>>	flush_work(reset_work);
> >>>>}
> >>>
> >>>Where does the requirement for gpu_idle come from? If there is a global
> >>>reset in progress, it cannot queue a request to flush the work and
> >>>waiting on the old results will be skipped. So just wait for the global
> >>>reset to complete, i.e. flush_work().
> >>
> >>Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> >>least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> >>change init_hw() of engines to be synchronous then we should have at least
> >>a WARN_ON(not_idle_but_i_expected_so()); in there ...
>
> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we

Git SHA-1s from your private tree are meaningless in public. Either link
to a git web URL or to a mailing list archive.

Thanks, Daniel

> are doing reset in hangcheck work itself so I think there is no need to
> flush work.
>
> while (i915_reset_in_progress(gpu_error) &&
>        !i915_terminally_wedged(gpu_error)) {
> 	int ret;
>
> 	mutex_lock(&dev->struct_mutex);
> 	ret = i915_gpu_idle(dev);
> 	if (ret)
> 		DRM_ERROR("GPU is in inconsistent state after reset\n");
> 	mutex_unlock(&dev->struct_mutex);
> }
>
> If the reset is successful we are idle before suspend otherwise in a wedged
> state. Is this OK?
>
> regards
> Arun
>
> >
> >Does it matter on suspend? We test on resume if the GPU is usable, but
> >if we wanted to test on suspend then we should do
> >
> >flush_work();
> >if (i915_terminally_wedged())
> >	/* oh noes */;
> >-Chris
> >
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Arun Siluvery @ 2016-01-19 17:01 UTC
To: Daniel Vetter; +Cc: intel-gfx, Mika Kuoppala

On 19/01/2016 16:42, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
>> On 19/01/2016 14:13, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>>>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>>>> after resume when new work is submitted.
>>>>>>>
>>>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>>>> as it can be useful now.
>>>>>>>
>>>>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>>
>>>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>>>> simply wait for gpu reset to finish its job. Since that could in turn
>>>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>>>> E.g.
>>>>>>
>>>>>> while (busy) {
>>>>>> 	mutex_lock();
>>>>>> 	gpu_idle();
>>>>>> 	mutex_unlock();
>>>>>>
>>>>>> 	flush_work(reset_work);
>>>>>> }
>>>>>
>>>>> Where does the requirement for gpu_idle come from? If there is a global
>>>>> reset in progress, it cannot queue a request to flush the work and
>>>>> waiting on the old results will be skipped. So just wait for the global
>>>>> reset to complete, i.e. flush_work().
>>>>
>>>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>>>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>>>> change init_hw() of engines to be synchronous then we should have at least
>>>> a WARN_ON(not_idle_but_i_expected_so()); in there ...
>>
>> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
>
> Git SHA-1s from your private tree are meaningless in public. Either link
> to a git web URL or to a mailing list archive.

It is from the drm-intel repo:
http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42

http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html

regards
Arun

>
> Thanks, Daniel
>
>> are doing reset in hangcheck work itself so I think there is no need to
>> flush work.
>>
>> while (i915_reset_in_progress(gpu_error) &&
>>        !i915_terminally_wedged(gpu_error)) {
>> 	int ret;
>>
>> 	mutex_lock(&dev->struct_mutex);
>> 	ret = i915_gpu_idle(dev);
>> 	if (ret)
>> 		DRM_ERROR("GPU is in inconsistent state after reset\n");
>> 	mutex_unlock(&dev->struct_mutex);
>> }
>>
>> If the reset is successful we are idle before suspend otherwise in a wedged
>> state. Is this OK?
>>
>> regards
>> Arun
>>
>>>
>>> Does it matter on suspend? We test on resume if the GPU is usable, but
>>> if we wanted to test on suspend then we should do
>>>
>>> flush_work();
>>> if (i915_terminally_wedged())
>>> 	/* oh noes */;
>>> -Chris
>>>
>>
>
* Re: [PATCH] drm/i915: Clear pending reset requests during suspend
From: Daniel Vetter @ 2016-01-19 17:18 UTC
To: Arun Siluvery; +Cc: intel-gfx, Mika Kuoppala

On Tue, Jan 19, 2016 at 05:01:00PM +0000, Arun Siluvery wrote:
> On 19/01/2016 16:42, Daniel Vetter wrote:
> >On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
> >>On 19/01/2016 14:13, Chris Wilson wrote:
> >>>On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
> >>>>On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
> >>>>>On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
> >>>>>>On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
> >>>>>>>Pending reset requests are cleared before suspending, they should be picked up
> >>>>>>>after resume when new work is submitted.
> >>>>>>>
> >>>>>>>This is originally added as part of TDR patches for Gen8 from Tomas Elf which
> >>>>>>>are under review, as suggested by Chris this is extracted as a separate patch
> >>>>>>>as it can be useful now.
> >>>>>>>
> >>>>>>>Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >>>>>>>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>>>
> >>>>>>Pulling in the discussion we had from irc: Imo the right approach is to
> >>>>>>simply wait for gpu reset to finish its job. Since that could in turn
> >>>>>>lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
> >>>>>>that in a loop around gem_idle. And drop dev->struct_mutex in-between.
> >>>>>>E.g.
> >>>>>>
> >>>>>>while (busy) {
> >>>>>>	mutex_lock();
> >>>>>>	gpu_idle();
> >>>>>>	mutex_unlock();
> >>>>>>
> >>>>>>	flush_work(reset_work);
> >>>>>>}
> >>>>>
> >>>>>Where does the requirement for gpu_idle come from? If there is a global
> >>>>>reset in progress, it cannot queue a request to flush the work and
> >>>>>waiting on the old results will be skipped. So just wait for the global
> >>>>>reset to complete, i.e. flush_work().
> >>>>
> >>>>Yes, but the global reset might in turn leave a wrecked gpu behind, or at
> >>>>least a non-idle one. Hence another gpu_idle on top, to make sure. If we
> >>>>change init_hw() of engines to be synchronous then we should have at least
> >>>>a WARN_ON(not_idle_but_i_expected_so()); in there ...
> >>
> >>gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
> >
> >Git SHA-1s from your private tree are meaningless in public. Either link
> >to a git web URL or to a mailing list archive.
>
> It is from the drm-intel repo:
> http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42
>
> http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html

Oh right, forgot that this landed, sorry for the confusion.

Summary of our irc discussion: We idle the gpu and flush the hangcheck
(which should flush the reset work), so at least with current upstream
there shouldn't be a bug. If there is a bug we need to understand it; we
can't just add code without a clear explanation and reasons: At best that
confuses, at worst it hides some real bugs.
-Daniel

>
> regards
> Arun
>
> >
> >Thanks, Daniel
> >
> >>are doing reset in hangcheck work itself so I think there is no need to
> >>flush work.
> >>
> >>while (i915_reset_in_progress(gpu_error) &&
> >>       !i915_terminally_wedged(gpu_error)) {
> >>	int ret;
> >>
> >>	mutex_lock(&dev->struct_mutex);
> >>	ret = i915_gpu_idle(dev);
> >>	if (ret)
> >>		DRM_ERROR("GPU is in inconsistent state after reset\n");
> >>	mutex_unlock(&dev->struct_mutex);
> >>}
> >>
> >>If the reset is successful we are idle before suspend otherwise in a wedged
> >>state. Is this OK?
> >>
> >>regards
> >>Arun
> >>
> >>>
> >>>Does it matter on suspend? We test on resume if the GPU is usable, but
> >>>if we wanted to test on suspend then we should do
> >>>
> >>>flush_work();
> >>>if (i915_terminally_wedged())
> >>>	/* oh noes */;
> >>>-Chris
> >>>
> >>
> >
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch