* [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
@ 2017-02-20 15:29 Imre Deak
2017-02-20 16:05 ` Chris Wilson
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Imre Deak @ 2017-02-20 15:29 UTC (permalink / raw)
To: intel-gfx; +Cc: Ville Syrjälä, Chris Wilson, stable
After
commit 2c7d0602c815277f7cb7c932b091288710d8aba7
Author: Imre Deak <imre.deak@intel.com>
Date: Mon Dec 5 18:27:37 2016 +0200
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
there is still one report of the CDCLK-change request timing out on a
KBL machine, see the Reference link. On that machine the maximum time
the request took to succeed was 34ms, so increase the timeout to 100ms.
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=99345
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/gpu/drm/i915/intel_drv.h | 2 +-
drivers/gpu/drm/i915/intel_pm.c | 11 ++++++-----
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 821c57c..7970ba8 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -87,7 +87,7 @@
int cpu, ret, timeout = (US) * 1000; \
u64 base; \
_WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
- BUILD_BUG_ON((US) > 50000); \
+ BUILD_BUG_ON((US) > 100000); \
if (!(ATOMIC)) { \
preempt_disable(); \
cpu = smp_processor_id(); \
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index fe243c6..90134b0 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7910,10 +7910,10 @@ static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
* @timeout_base_ms: timeout for polling with preemption enabled
*
* Keep resending the @request to @mbox until PCODE acknowledges it, PCODE
- * reports an error or an overall timeout of @timeout_base_ms+10 ms expires.
+ * reports an error or an overall timeout of @timeout_base_ms+100 ms expires.
* The request is acknowledged once the PCODE reply dword equals @reply after
* applying @reply_mask. Polling is first attempted with preemption enabled
- * for @timeout_base_ms and if this times out for another 10 ms with
+ * for @timeout_base_ms and if this times out for another 100 ms with
* preemption disabled.
*
* Returns 0 on success, %-ETIMEDOUT in case of a timeout, <0 in case of some
@@ -7949,14 +7949,15 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
* worst case) _and_ PCODE was busy for some reason even after a
* (queued) request and @timeout_base_ms delay. As a workaround retry
* the poll with preemption disabled to maximize the number of
- * requests. Increase the timeout from @timeout_base_ms to 10ms to
+ * requests. Increase the timeout from @timeout_base_ms to 100ms to
* account for interrupts that could reduce the number of these
- * requests.
+ * requests, and for any quirks of the PCODE firmware that delays
+ * the request completion.
*/
DRM_DEBUG_KMS("PCODE timeout, retrying with preemption disabled\n");
WARN_ON_ONCE(timeout_base_ms > 3);
preempt_disable();
- ret = wait_for_atomic(COND, 10);
+ ret = wait_for_atomic(COND, 100);
preempt_enable();
out:
--
2.5.0
* Re: [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-20 15:29 [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms Imre Deak
@ 2017-02-20 16:05 ` Chris Wilson
2017-02-20 17:22 ` ✓ Fi.CI.BAT: success for " Patchwork
` (2 subsequent siblings)
3 siblings, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2017-02-20 16:05 UTC (permalink / raw)
To: Imre Deak; +Cc: intel-gfx, Ville Syrjälä, stable
On Mon, Feb 20, 2017 at 05:29:44PM +0200, Imre Deak wrote:
> After
> commit 2c7d0602c815277f7cb7c932b091288710d8aba7
> Author: Imre Deak <imre.deak@intel.com>
> Date: Mon Dec 5 18:27:37 2016 +0200
>
> drm/i915/gen9: Fix PCODE polling during CDCLK change notification
>
> there is still one report of the CDCLK-change request timing out on a
> KBL machine, see the Reference link. On that machine the maximum time
> the request took to succeed was 34ms, so increase the timeout to 100ms.
>
> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=99345
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
> drivers/gpu/drm/i915/intel_drv.h | 2 +-
> drivers/gpu/drm/i915/intel_pm.c | 11 ++++++-----
> 2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 821c57c..7970ba8 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -87,7 +87,7 @@
> int cpu, ret, timeout = (US) * 1000; \
> u64 base; \
> _WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> - BUILD_BUG_ON((US) > 50000); \
> + BUILD_BUG_ON((US) > 100000); \
> if (!(ATOMIC)) { \
> preempt_disable(); \
> cpu = smp_processor_id(); \
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index fe243c6..90134b0 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -7910,10 +7910,10 @@ static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
> * @timeout_base_ms: timeout for polling with preemption enabled
> *
> * Keep resending the @request to @mbox until PCODE acknowledges it, PCODE
> - * reports an error or an overall timeout of @timeout_base_ms+10 ms expires.
> + * reports an error or an overall timeout of @timeout_base_ms+100 ms expires.
> * The request is acknowledged once the PCODE reply dword equals @reply after
> * applying @reply_mask. Polling is first attempted with preemption enabled
> - * for @timeout_base_ms and if this times out for another 10 ms with
> + * for @timeout_base_ms and if this times out for another 100 ms with
> * preemption disabled.
> *
> * Returns 0 on success, %-ETIMEDOUT in case of a timeout, <0 in case of some
> @@ -7949,14 +7949,15 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
> * worst case) _and_ PCODE was busy for some reason even after a
> * (queued) request and @timeout_base_ms delay. As a workaround retry
> * the poll with preemption disabled to maximize the number of
> - * requests. Increase the timeout from @timeout_base_ms to 10ms to
> + * requests. Increase the timeout from @timeout_base_ms to 100ms to
> * account for interrupts that could reduce the number of these
> - * requests.
> + * requests, and for any quirks of the PCODE firmware that delays
> + * the request completion.
> */
> DRM_DEBUG_KMS("PCODE timeout, retrying with preemption disabled\n");
> WARN_ON_ONCE(timeout_base_ms > 3);
> preempt_disable();
> - ret = wait_for_atomic(COND, 10);
> + ret = wait_for_atomic(COND, 100);
> preempt_enable();
Ugh. Straw + camel. How about something like:
__try_request_atomic:
cond_resched();
preempt_disable()
ret = COND ? wait_for_atomic(COND, 10) : 0;
preempt_enable();
return ret;
try_request:
ret = wait_for(__try_request_atomic() == 0, 100);
So that our preempt-off period doesn't grow completely unchecked, or do
we need that 34ms loop?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-20 15:29 [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms Imre Deak
2017-02-20 16:05 ` Chris Wilson
@ 2017-02-20 17:22 ` Patchwork
2017-02-24 14:32 ` [PATCH v2] drm/i915/gen9: Increase PCODE request timeout to 50ms Imre Deak
2017-02-24 15:52 ` ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2) Patchwork
3 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2017-02-20 17:22 UTC (permalink / raw)
To: Imre Deak; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/gen9: Increase PCODE request timeout to 100ms
URL : https://patchwork.freedesktop.org/series/19961/
State : success
== Summary ==
Series 19961v1 drm/i915/gen9: Increase PCODE request timeout to 100ms
https://patchwork.freedesktop.org/api/1.0/series/19961/revisions/1/mbox/
fi-bdw-5557u total:253 pass:242 dwarn:0 dfail:0 fail:0 skip:11
fi-bsw-n3050 total:253 pass:214 dwarn:0 dfail:0 fail:0 skip:39
fi-bxt-j4205 total:253 pass:234 dwarn:0 dfail:0 fail:0 skip:19
fi-bxt-t5700 total:83 pass:70 dwarn:0 dfail:0 fail:0 skip:12
fi-byt-j1900 total:253 pass:226 dwarn:0 dfail:0 fail:0 skip:27
fi-byt-n2820 total:253 pass:222 dwarn:0 dfail:0 fail:0 skip:31
fi-hsw-4770 total:253 pass:237 dwarn:0 dfail:0 fail:0 skip:16
fi-hsw-4770r total:253 pass:237 dwarn:0 dfail:0 fail:0 skip:16
fi-ilk-650 total:253 pass:203 dwarn:0 dfail:0 fail:0 skip:50
fi-ivb-3520m total:253 pass:235 dwarn:0 dfail:0 fail:0 skip:18
fi-ivb-3770 total:253 pass:235 dwarn:0 dfail:0 fail:0 skip:18
fi-kbl-7500u total:253 pass:235 dwarn:0 dfail:0 fail:0 skip:18
fi-skl-6260u total:253 pass:243 dwarn:0 dfail:0 fail:0 skip:10
fi-skl-6700hq total:253 pass:236 dwarn:0 dfail:0 fail:0 skip:17
fi-skl-6700k total:253 pass:231 dwarn:4 dfail:0 fail:0 skip:18
fi-skl-6770hq total:253 pass:243 dwarn:0 dfail:0 fail:0 skip:10
fi-snb-2520m total:253 pass:225 dwarn:0 dfail:0 fail:0 skip:28
fi-snb-2600 total:253 pass:224 dwarn:0 dfail:0 fail:0 skip:29
9cdbb2744ab120d339614f41acb1ff3a2fbd8f80 drm-tip: 2017y-02m-20d-14h-54m-11s UTC integration manifest
23886e9 drm/i915/gen9: Increase PCODE request timeout to 100ms
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3903/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-20 16:05 ` Chris Wilson
@ 2017-02-21 9:22 ` Imre Deak
-1 siblings, 0 replies; 21+ messages in thread
From: Imre Deak @ 2017-02-21 9:22 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, Ville Syrjälä, stable
On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> On Mon, Feb 20, 2017 at 05:29:44PM +0200, Imre Deak wrote:
> > After
> > commit 2c7d0602c815277f7cb7c932b091288710d8aba7
> > Author: Imre Deak <imre.deak@intel.com>
> > Date: Mon Dec 5 18:27:37 2016 +0200
> >
> > drm/i915/gen9: Fix PCODE polling during CDCLK change notification
> >
> > there is still one report of the CDCLK-change request timing out on a
> > KBL machine, see the Reference link. On that machine the maximum time
> > the request took to succeed was 34ms, so increase the timeout to 100ms.
> >
> > Reference: https://bugs.freedesktop.org/show_bug.cgi?id=99345
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_drv.h | 2 +-
> > drivers/gpu/drm/i915/intel_pm.c | 11 ++++++-----
> > 2 files changed, 7 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 821c57c..7970ba8 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -87,7 +87,7 @@
> > int cpu, ret, timeout = (US) * 1000; \
> > u64 base; \
> > _WAIT_FOR_ATOMIC_CHECK(ATOMIC); \
> > - BUILD_BUG_ON((US) > 50000); \
> > + BUILD_BUG_ON((US) > 100000); \
> > if (!(ATOMIC)) { \
> > preempt_disable(); \
> > cpu = smp_processor_id(); \
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index fe243c6..90134b0 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -7910,10 +7910,10 @@ static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
> > * @timeout_base_ms: timeout for polling with preemption enabled
> > *
> > * Keep resending the @request to @mbox until PCODE acknowledges it, PCODE
> > - * reports an error or an overall timeout of @timeout_base_ms+10 ms expires.
> > + * reports an error or an overall timeout of @timeout_base_ms+100 ms expires.
> > * The request is acknowledged once the PCODE reply dword equals @reply after
> > * applying @reply_mask. Polling is first attempted with preemption enabled
> > - * for @timeout_base_ms and if this times out for another 10 ms with
> > + * for @timeout_base_ms and if this times out for another 100 ms with
> > * preemption disabled.
> > *
> > * Returns 0 on success, %-ETIMEDOUT in case of a timeout, <0 in case of some
> > @@ -7949,14 +7949,15 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
> > * worst case) _and_ PCODE was busy for some reason even after a
> > * (queued) request and @timeout_base_ms delay. As a workaround retry
> > * the poll with preemption disabled to maximize the number of
> > - * requests. Increase the timeout from @timeout_base_ms to 10ms to
> > + * requests. Increase the timeout from @timeout_base_ms to 100ms to
> > * account for interrupts that could reduce the number of these
> > - * requests.
> > + * requests, and for any quirks of the PCODE firmware that delays
> > + * the request completion.
> > */
> > DRM_DEBUG_KMS("PCODE timeout, retrying with preemption disabled\n");
> > WARN_ON_ONCE(timeout_base_ms > 3);
> > preempt_disable();
> > - ret = wait_for_atomic(COND, 10);
> > + ret = wait_for_atomic(COND, 100);
> > preempt_enable();
>
> Ugh. Straw + camel. How about something like:
>
> __try_request_atomic:
> cond_resched();
>
> preempt_disable()
> ret = COND ? wait_for_atomic(COND, 10) : 0;
> preempt_enable();
> return ret;
>
> try_request:
> ret = wait_for(__try_request_atomic() == 0, 100);
>
> So that our preempt-off period doesn't grow completely unchecked, or do
> we need that 34ms loop?
Yes, that's at least how I understand it. Scheduling away is what lets
PCODE start servicing some other request than ours or go idle. That's
in a way what we see when the preempt-enabled poll times out.
--Imre
* Re: [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 9:22 ` Imre Deak
(?)
@ 2017-02-21 9:37 ` Chris Wilson
2017-02-21 10:06 ` [Intel-gfx] " Tvrtko Ursulin
-1 siblings, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2017-02-21 9:37 UTC (permalink / raw)
To: Imre Deak; +Cc: intel-gfx, Ville Syrjälä, stable
On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> > So that our preempt-off period doesn't grow completely unchecked, or do
> > we need that 34ms loop?
>
> Yes, that's at least how I understand it. Scheduling away is what let's
> PCODE start servicing some other request than ours or go idle. That's
> in a way what we see when the preempt-enabled poll times out.
I was thinking along the lines of if it was just busy/unavailable for the
first 33ms that particular time, it just needed to sleep until ready.
Once available, the next request ran in the expected 1ms.
Do you not see any value in trying a sleeping loop? Perhaps compromise
and have the preempt-disable timeout increase each iteration.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 9:37 ` Chris Wilson
@ 2017-02-21 10:06 ` Tvrtko Ursulin
2017-02-21 12:43 ` Imre Deak
0 siblings, 1 reply; 21+ messages in thread
From: Tvrtko Ursulin @ 2017-02-21 10:06 UTC (permalink / raw)
To: Chris Wilson, Imre Deak, intel-gfx, Ville Syrjälä, stable
On 21/02/2017 09:37, Chris Wilson wrote:
> On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
>> On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
>>> So that our preempt-off period doesn't grow completely unchecked, or do
>>> we need that 34ms loop?
>>
>> Yes, that's at least how I understand it. Scheduling away is what let's
>> PCODE start servicing some other request than ours or go idle. That's
>> in a way what we see when the preempt-enabled poll times out.
>
> I was thinking along the lines of if it was just busy/unavailable for the
> first 33ms that particular time, it just needed to sleep until ready.
> Once available, the next request ran in the expected 1ms.
>
> Do you not see any value in trying a sleeping loop? Perhaps compromise
> and have the preempt-disable timeout increase each iteration.
Parachuting in so apologies if I misunderstood something.
Is the issue here that we can get starved out of CPU time for more than
33ms while waiting for an event?
Could we play games with sched_setscheduler and maybe temporarily go
SCHED_DEADLINE or something? Would have to look into how to correctly
restore to the old state from that and from which contexts we can
actually end up in this wait.
Regards,
Tvrtko
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 10:06 ` [Intel-gfx] " Tvrtko Ursulin
@ 2017-02-21 12:43 ` Imre Deak
2017-02-21 13:11 ` Tvrtko Ursulin
2017-02-21 13:19 ` [Intel-gfx] " Chris Wilson
0 siblings, 2 replies; 21+ messages in thread
From: Imre Deak @ 2017-02-21 12:43 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Chris Wilson, intel-gfx, Ville Syrjälä, stable
On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
>
> On 21/02/2017 09:37, Chris Wilson wrote:
> >On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> >>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> >>>So that our preempt-off period doesn't grow completely unchecked, or do
> >>>we need that 34ms loop?
> >>
> >>Yes, that's at least how I understand it. Scheduling away is what let's
> >>PCODE start servicing some other request than ours or go idle. That's
> >>in a way what we see when the preempt-enabled poll times out.
> >
> >I was thinking along the lines of if it was just busy/unavailable for the
> >first 33ms that particular time, it just needed to sleep until ready.
> >Once available, the next request ran in the expected 1ms.
>
> >Do you not see any value in trying a sleeping loop? Perhaps compromise
> >and have the preempt-disable timeout increase each iteration.
This fallback method would work too, but imo the worst case is what
matters, and that would anyway be the same in both cases. Because of this
and since it's a WA I'd rather keep it simple.
> Parachuting in so apologies if I misunderstood something.
>
> Is the issue here that we can get starved out of CPU time for more than 33ms
> while waiting for an event?
We need to actively resend the same request for this duration.
> Could we play games with sched_setscheduler and maybe temporarily go
> SCHED_DEADLINE or something? Would have to look into how to correctly
> restore to the old state from that and from which contexts we can actually
> end up in this wait.
What would be the benefit wrt. disabling preemption? Note that since
it's a workaround it would be good to keep it simple and close to how it
worked on previous platforms (SKL/APL).
--Imre
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 12:43 ` Imre Deak
@ 2017-02-21 13:11 ` Tvrtko Ursulin
2017-02-21 13:19 ` [Intel-gfx] " Chris Wilson
1 sibling, 0 replies; 21+ messages in thread
From: Tvrtko Ursulin @ 2017-02-21 13:11 UTC (permalink / raw)
To: imre.deak; +Cc: Chris Wilson, intel-gfx, Ville Syrjälä, stable
On 21/02/2017 12:43, Imre Deak wrote:
> On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
>>
>> On 21/02/2017 09:37, Chris Wilson wrote:
>>> On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
>>>> On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
>>>>> So that our preempt-off period doesn't grow completely unchecked, or do
>>>>> we need that 34ms loop?
>>>>
>>>> Yes, that's at least how I understand it. Scheduling away is what let's
>>>> PCODE start servicing some other request than ours or go idle. That's
>>>> in a way what we see when the preempt-enabled poll times out.
>>>
>>> I was thinking along the lines of if it was just busy/unavailable for the
>>> first 33ms that particular time, it just needed to sleep until ready.
>>> Once available, the next request ran in the expected 1ms.
>>
>>> Do you not see any value in trying a sleeping loop? Perhaps compromise
>>> and have the preempt-disable timeout increase each iteration.
>
> This fallback method would work too, but imo the worst case is what
> matters and that would be anyway the same in both cases. Because of this
> and since it's a WA I'd rather keep it simple.
>
>> Parachuting in so apologies if I misunderstood something.
>>
>> Is the issue here that we can get starved out of CPU time for more than 33ms
>> while waiting for an event?
>
> We need to actively resend the same request for this duration.
>
>> Could we play games with sched_setscheduler and maybe temporarily go
>> SCHED_DEADLINE or something? Would have to look into how to correctly
>> restore to the old state from that and from which contexts we can actually
>> end up in this wait.
>
> What would be the benefit wrt. disabling preemption? Note that since
> it's a workaround it would be good to keep it simple and close to how it
> worked on previous platforms (SKL/APL).
It would be nicer not to relax that BUILD_BUG_ON in the atomic wait_for and,
if the main problem is the scheduler/CPU starvation, to see if it can be
solved differently. Even though the atomic wait here would trigger very
rarely it might be worth coming up with something nicer and generalized.
If I understood it correctly, the difference between this wait_for call
site and the rest is that here it wants a certain number of COND checks
to be guaranteed? The other call sites care more about checking on enter
and exit.
So in this case we want the period parameter to actually be guaranteed
(or close). This sounded like a good candidate for SCHED_DEADLINE to me.
Like wait_for_periodic(COND, TIMEOUT, INTERVAL).
Maybe that could get away with the second atomic loop and be a generic
solution on all platforms.
Regards,
Tvrtko
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 12:43 ` Imre Deak
2017-02-21 13:11 ` Tvrtko Ursulin
@ 2017-02-21 13:19 ` Chris Wilson
2017-02-21 14:18 ` Imre Deak
1 sibling, 1 reply; 21+ messages in thread
From: Chris Wilson @ 2017-02-21 13:19 UTC (permalink / raw)
To: Imre Deak; +Cc: Tvrtko Ursulin, intel-gfx, Ville Syrjälä, stable
On Tue, Feb 21, 2017 at 02:43:30PM +0200, Imre Deak wrote:
> On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> >
> > On 21/02/2017 09:37, Chris Wilson wrote:
> > >On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> > >>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> > >>>So that our preempt-off period doesn't grow completely unchecked, or do
> > >>>we need that 34ms loop?
> > >>
> > >>Yes, that's at least how I understand it. Scheduling away is what lets
> > >>PCODE start servicing some other request than ours or go idle. That's
> > >>in a way what we see when the preempt-enabled poll times out.
> > >
> > >I was thinking along the lines of if it was just busy/unavailable for the
> > >first 33ms that particular time, it just needed to sleep until ready.
> > >Once available, the next request ran in the expected 1ms.
> >
> > >Do you not see any value in trying a sleeping loop? Perhaps compromise
> > >and have the preempt-disable timeout increase each iteration.
>
> This fallback method would work too, but imo the worst case is what
> matters and that would be anyway the same in both cases. Because of this
> and since it's a WA I'd rather keep it simple.
>
> > Parachuting in so apologies if I misunderstood something.
> >
> > Is the issue here that we can get starved out of CPU time for more than 33ms
> > while waiting for an event?
>
> We need to actively resend the same request for this duration.
>
> > Could we play games with sched_setscheduler and maybe temporarily go
> > SCHED_DEADLINE or something? Would have to look into how to correctly
> > restore to the old state from that and from which contexts we can actually
> > end up in this wait.
>
> What would be the benefit wrt. disabling preemption? Note that since
> it's a workaround it would be good to keep it simple and close to how it
> worked on previous platforms (SKL/APL).
Yeah, I'm not happy with busy-spinning for 34ms without any scheduler
interaction at all. Or that we don't handle the failure gracefully. Or
that the hw appears pretty flimsy and the communication method is hit
and miss.
I'd accept a compromise that bumped the timer to 50ms i.e. didn't have
to up the BUILD_BUG_ON. Only a 50% safety factor, but we are already
an order of magnitude beyond the expected response time.
50 I would ack. :|
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 13:11 ` Tvrtko Ursulin
(?)
@ 2017-02-21 14:13 ` Imre Deak
2017-02-21 14:16 ` Tvrtko Ursulin
-1 siblings, 1 reply; 21+ messages in thread
From: Imre Deak @ 2017-02-21 14:13 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Chris Wilson, intel-gfx, Ville Syrjälä, stable
On Tue, Feb 21, 2017 at 01:11:27PM +0000, Tvrtko Ursulin wrote:
>
> On 21/02/2017 12:43, Imre Deak wrote:
> >On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 21/02/2017 09:37, Chris Wilson wrote:
> >>>On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> >>>>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> >>>>>So that our preempt-off period doesn't grow completely unchecked, or do
> >>>>>we need that 34ms loop?
> >>>>
> >>>>Yes, that's at least how I understand it. Scheduling away is what lets
> >>>>PCODE start servicing some other request than ours or go idle. That's
> >>>>in a way what we see when the preempt-enabled poll times out.
> >>>
> >>>I was thinking along the lines of if it was just busy/unavailable for the
> >>>first 33ms that particular time, it just needed to sleep until ready.
> >>>Once available, the next request ran in the expected 1ms.
> >>
> >>>Do you not see any value in trying a sleeping loop? Perhaps compromise
> >>>and have the preempt-disable timeout increase each iteration.
> >
> >This fallback method would work too, but imo the worst case is what
> >matters and that would be anyway the same in both cases. Because of this
> >and since it's a WA I'd rather keep it simple.
> >
> >>Parachuting in so apologies if I misunderstood something.
> >>
> >>Is the issue here that we can get starved out of CPU time for more than 33ms
> >>while waiting for an event?
> >
> >We need to actively resend the same request for this duration.
> >
> >>Could we play games with sched_setscheduler and maybe temporarily go
> >>SCHED_DEADLINE or something? Would have to look into how to correctly
> >>restore to the old state from that and from which contexts we can actually
> >>end up in this wait.
> >
> >What would be the benefit wrt. disabling preemption? Note that since
> >it's a workaround it would be good to keep it simple and close to how it
> >worked on previous platforms (SKL/APL).
>
> It would be nicer not to relax that BUILD_BUG_ON in atomic wait for and, if
> the main problem is the scheduler/CPU starvation, to see if it can be solved
> differently. Even though the atomic wait here would trigger very rarely it
> might be worth coming up with something nicer and generalized.
>
> If I understood it correctly, the difference between this wait_for call site
> and the rest is that here it wants a certain number of COND checks to be
> guaranteed?
Yes.
> The other call sites care more about checking on enter and exit.
>
> So in this case we want the period parameter to actually be guaranteed (or
> close). This sounded like a good candidate for SCHED_DEADLINE to me. Like
> wait_for_periodic(COND, TIMEOUT, INTERVAL).
Could be. But this would give less guarantee than disabling preemption,
as SCHED_DEADLINE still works on a best effort basis. How about
increasing the timeout now (to 50ms) and trying what you suggest as a
follow-up? That way we have also something for -stable.
--Imre
> Maybe that could do away with the second atomic loop and be a generic
> solution on all platforms.
>
> Regards,
>
> Tvrtko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 14:13 ` [Intel-gfx] " Imre Deak
@ 2017-02-21 14:16 ` Tvrtko Ursulin
0 siblings, 0 replies; 21+ messages in thread
From: Tvrtko Ursulin @ 2017-02-21 14:16 UTC (permalink / raw)
To: imre.deak; +Cc: Chris Wilson, intel-gfx, Ville Syrjälä, stable
On 21/02/2017 14:13, Imre Deak wrote:
> On Tue, Feb 21, 2017 at 01:11:27PM +0000, Tvrtko Ursulin wrote:
>>
>> On 21/02/2017 12:43, Imre Deak wrote:
>>> On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/02/2017 09:37, Chris Wilson wrote:
>>>>> On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
>>>>>> On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
>>>>>>> So that our preempt-off period doesn't grow completely unchecked, or do
>>>>>>> we need that 34ms loop?
>>>>>>
>>>>>> Yes, that's at least how I understand it. Scheduling away is what lets
>>>>>> PCODE start servicing some other request than ours or go idle. That's
>>>>>> in a way what we see when the preempt-enabled poll times out.
>>>>>
>>>>> I was thinking along the lines of if it was just busy/unavailable for the
>>>>> first 33ms that particular time, it just needed to sleep until ready.
>>>>> Once available, the next request ran in the expected 1ms.
>>>>
>>>>> Do you not see any value in trying a sleeping loop? Perhaps compromise
>>>>> and have the preempt-disable timeout increase each iteration.
>>>
>>> This fallback method would work too, but imo the worst case is what
>>> matters and that would be anyway the same in both cases. Because of this
>>> and since it's a WA I'd rather keep it simple.
>>>
>>>> Parachuting in so apologies if I misunderstood something.
>>>>
>>>> Is the issue here that we can get starved out of CPU time for more than 33ms
>>>> while waiting for an event?
>>>
>>> We need to actively resend the same request for this duration.
>>>
>>>> Could we play games with sched_setscheduler and maybe temporarily go
>>>> SCHED_DEADLINE or something? Would have to look into how to correctly
>>>> restore to the old state from that and from which contexts we can actually
>>>> end up in this wait.
>>>
>>> What would be the benefit wrt. disabling preemption? Note that since
>>> it's a workaround it would be good to keep it simple and close to how it
>>> worked on previous platforms (SKL/APL).
>>
>> It would be nicer not to relax that BUILD_BUG_ON in atomic wait for and, if
>> the main problem is the scheduler/CPU starvation, to see if it can be solved
>> differently. Even though the atomic wait here would trigger very rarely it
>> might be worth coming up with something nicer and generalized.
>>
>> If I understood it correctly, the difference between this wait_for call site
>> and the rest is that here it wants a certain number of COND checks to be
>> guaranteed?
>
> Yes.
>
>> The other call sites care more about checking on enter and exit.
>>
>> So in this case we want the period parameter to actually be guaranteed (or
>> close). This sounded like a good candidate for SCHED_DEADLINE to me. Like
>> wait_for_periodic(COND, TIMEOUT, INTERVAL).
>
> Could be. But this would give less guarantee than disabling preemption,
> as SCHED_DEADLINE still works on a best effort basis. How about
> increasing the timeout now (to 50ms) and trying what you suggest as a
> follow-up? That way we have also something for -stable.
If 50ms works it is fine by me.
Regards,
Tvrtko
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms
2017-02-21 13:19 ` [Intel-gfx] " Chris Wilson
@ 2017-02-21 14:18 ` Imre Deak
0 siblings, 0 replies; 21+ messages in thread
From: Imre Deak @ 2017-02-21 14:18 UTC (permalink / raw)
To: Chris Wilson, Tvrtko Ursulin, intel-gfx, Ville Syrjälä, stable
On Tue, Feb 21, 2017 at 01:19:37PM +0000, Chris Wilson wrote:
> On Tue, Feb 21, 2017 at 02:43:30PM +0200, Imre Deak wrote:
> > On Tue, Feb 21, 2017 at 10:06:45AM +0000, Tvrtko Ursulin wrote:
> > >
> > > On 21/02/2017 09:37, Chris Wilson wrote:
> > > >On Tue, Feb 21, 2017 at 11:22:12AM +0200, Imre Deak wrote:
> > > >>On Mon, Feb 20, 2017 at 04:05:33PM +0000, Chris Wilson wrote:
> > > >>>So that our preempt-off period doesn't grow completely unchecked, or do
> > > >>>we need that 34ms loop?
> > > >>
> > > >>Yes, that's at least how I understand it. Scheduling away is what lets
> > > >>PCODE start servicing some other request than ours or go idle. That's
> > > >>in a way what we see when the preempt-enabled poll times out.
> > > >
> > > >I was thinking along the lines of if it was just busy/unavailable for the
> > > >first 33ms that particular time, it just needed to sleep until ready.
> > > >Once available, the next request ran in the expected 1ms.
> > >
> > > >Do you not see any value in trying a sleeping loop? Perhaps compromise
> > > >and have the preempt-disable timeout increase each iteration.
> >
> > This fallback method would work too, but imo the worst case is what
> > matters and that would be anyway the same in both cases. Because of this
> > and since it's a WA I'd rather keep it simple.
> >
> > > Parachuting in so apologies if I misunderstood something.
> > >
> > > Is the issue here that we can get starved out of CPU time for more than 33ms
> > > while waiting for an event?
> >
> > We need to actively resend the same request for this duration.
> >
> > > Could we play games with sched_setscheduler and maybe temporarily go
> > > SCHED_DEADLINE or something? Would have to look into how to correctly
> > > restore to the old state from that and from which contexts we can actually
> > > end up in this wait.
> >
> > What would be the benefit wrt. disabling preemption? Note that since
> > it's a workaround it would be good to keep it simple and close to how it
> > worked on previous platforms (SKL/APL).
>
> Yeah, I'm not happy with busy-spinning for 34ms without any scheduler
> interaction at all. Or that we don't handle the failure gracefully. Or
> that the hw appears pretty flimsy and the communication method is hit
> and miss.
Yes, me neither. It's clearly not by design: based on the
specification, two requests 3 ms apart should be enough.
> I'd accept a compromise that bumped the timer to 50ms i.e. didn't have
> to up the BUILD_BUG_ON. Only a 50% safety factor, but we are already
> an order of magnitude beyond the expected response time.
>
> 50 I would ack. :|
Ok, I can resend with that if Tvrtko agrees.
--Imre
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2] drm/i915/gen9: Increase PCODE request timeout to 50ms
2017-02-20 15:29 [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms Imre Deak
2017-02-20 16:05 ` Chris Wilson
2017-02-20 17:22 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-02-24 14:32 ` Imre Deak
2017-02-24 15:52 ` ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2) Patchwork
3 siblings, 0 replies; 21+ messages in thread
From: Imre Deak @ 2017-02-24 14:32 UTC (permalink / raw)
To: intel-gfx; +Cc: Ville Syrjälä, Chris Wilson, Tvrtko Ursulin, stable
After
commit 2c7d0602c815277f7cb7c932b091288710d8aba7
Author: Imre Deak <imre.deak@intel.com>
Date: Mon Dec 5 18:27:37 2016 +0200
drm/i915/gen9: Fix PCODE polling during CDCLK change notification
there is still one report of the CDCLK-change request timing out on a
KBL machine, see the Reference link. On that machine the maximum time
the request took to succeed was 34ms, so increase the timeout to 50ms.
v2:
- Change timeout from 100 to 50 ms to maintain the current 50 ms limit
for atomic waits in the driver. (Chris, Tvrtko)
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=99345
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
drivers/gpu/drm/i915/intel_pm.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 169c490..e7d4c2a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7909,10 +7909,10 @@ static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
* @timeout_base_ms: timeout for polling with preemption enabled
*
* Keep resending the @request to @mbox until PCODE acknowledges it, PCODE
- * reports an error or an overall timeout of @timeout_base_ms+10 ms expires.
+ * reports an error or an overall timeout of @timeout_base_ms+50 ms expires.
* The request is acknowledged once the PCODE reply dword equals @reply after
* applying @reply_mask. Polling is first attempted with preemption enabled
- * for @timeout_base_ms and if this times out for another 10 ms with
+ * for @timeout_base_ms and if this times out for another 50 ms with
* preemption disabled.
*
* Returns 0 on success, %-ETIMEDOUT in case of a timeout, <0 in case of some
@@ -7948,14 +7948,15 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
* worst case) _and_ PCODE was busy for some reason even after a
* (queued) request and @timeout_base_ms delay. As a workaround retry
* the poll with preemption disabled to maximize the number of
- * requests. Increase the timeout from @timeout_base_ms to 10ms to
+ * requests. Increase the timeout from @timeout_base_ms to 50ms to
* account for interrupts that could reduce the number of these
- * requests.
+ * requests, and for any quirks of the PCODE firmware that delays
+ * the request completion.
*/
DRM_DEBUG_KMS("PCODE timeout, retrying with preemption disabled\n");
WARN_ON_ONCE(timeout_base_ms > 3);
preempt_disable();
- ret = wait_for_atomic(COND, 10);
+ ret = wait_for_atomic(COND, 50);
preempt_enable();
out:
--
2.5.0
^ permalink raw reply related [flat|nested] 21+ messages in thread
* ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2)
2017-02-20 15:29 [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms Imre Deak
` (2 preceding siblings ...)
2017-02-24 14:32 ` [PATCH v2] drm/i915/gen9: Increase PCODE request timeout to 50ms Imre Deak
@ 2017-02-24 15:52 ` Patchwork
2017-02-24 19:18 ` Chris Wilson
2017-03-01 11:17 ` Imre Deak
3 siblings, 2 replies; 21+ messages in thread
From: Patchwork @ 2017-02-24 15:52 UTC (permalink / raw)
To: Imre Deak; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2)
URL : https://patchwork.freedesktop.org/series/19961/
State : success
== Summary ==
Series 19961v2 drm/i915/gen9: Increase PCODE request timeout to 100ms
https://patchwork.freedesktop.org/api/1.0/series/19961/revisions/2/mbox/
fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11
fi-bsw-n3050 total:278 pass:239 dwarn:0 dfail:0 fail:0 skip:39
fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19
fi-bxt-t5700 total:108 pass:95 dwarn:0 dfail:0 fail:0 skip:12
fi-byt-j1900 total:278 pass:251 dwarn:0 dfail:0 fail:0 skip:27
fi-byt-n2820 total:278 pass:247 dwarn:0 dfail:0 fail:0 skip:31
fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16
fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16
fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50
fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18
fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18
fi-kbl-7500u total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18
fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10
fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17
fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18
fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10
fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28
fi-snb-2600 total:278 pass:249 dwarn:0 dfail:0 fail:0 skip:29
69fbd8896f5f272550747b763068883326962d76 drm-tip: 2017y-02m-24d-14h-04m-04s UTC integration manifest
11e17c3 drm/i915/gen9: Increase PCODE request timeout to 50ms
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3961/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2)
2017-02-24 15:52 ` ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2) Patchwork
@ 2017-02-24 19:18 ` Chris Wilson
2017-03-01 11:17 ` Imre Deak
1 sibling, 0 replies; 21+ messages in thread
From: Chris Wilson @ 2017-02-24 19:18 UTC (permalink / raw)
To: intel-gfx
On Fri, Feb 24, 2017 at 03:52:15PM -0000, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2)
> URL : https://patchwork.freedesktop.org/series/19961/
> State : success
>
> == Summary ==
>
> Series 19961v2 drm/i915/gen9: Increase PCODE request timeout to 100ms
> https://patchwork.freedesktop.org/api/1.0/series/19961/revisions/2/mbox/
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2)
2017-02-24 15:52 ` ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2) Patchwork
2017-02-24 19:18 ` Chris Wilson
@ 2017-03-01 11:17 ` Imre Deak
1 sibling, 0 replies; 21+ messages in thread
From: Imre Deak @ 2017-03-01 11:17 UTC (permalink / raw)
To: intel-gfx, Chris Wilson, Tvrtko Ursulin
On Fri, Feb 24, 2017 at 03:52:15PM +0000, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2)
> URL : https://patchwork.freedesktop.org/series/19961/
> State : success
>
> == Summary ==
>
> Series 19961v2 drm/i915/gen9: Increase PCODE request timeout to 100ms
> https://patchwork.freedesktop.org/api/1.0/series/19961/revisions/2/mbox/
>
> fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11
> fi-bsw-n3050 total:278 pass:239 dwarn:0 dfail:0 fail:0 skip:39
> fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19
> fi-bxt-t5700 total:108 pass:95 dwarn:0 dfail:0 fail:0 skip:12
> fi-byt-j1900 total:278 pass:251 dwarn:0 dfail:0 fail:0 skip:27
> fi-byt-n2820 total:278 pass:247 dwarn:0 dfail:0 fail:0 skip:31
> fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16
> fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16
> fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50
> fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18
> fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18
> fi-kbl-7500u total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18
> fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10
> fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17
> fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18
> fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10
> fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28
> fi-snb-2600 total:278 pass:249 dwarn:0 dfail:0 fail:0 skip:29
>
> 69fbd8896f5f272550747b763068883326962d76 drm-tip: 2017y-02m-24d-14h-04m-04s UTC integration manifest
> 11e17c3 drm/i915/gen9: Increase PCODE request timeout to 50ms
I pushed this to -dinq with Chris' ack, thanks for the reviews.
--Imre
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2017-03-01 11:17 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-20 15:29 [PATCH] drm/i915/gen9: Increase PCODE request timeout to 100ms Imre Deak
2017-02-20 16:05 ` Chris Wilson
2017-02-21 9:22 ` Imre Deak
2017-02-21 9:37 ` Chris Wilson
2017-02-21 10:06 ` [Intel-gfx] " Tvrtko Ursulin
2017-02-21 12:43 ` Imre Deak
2017-02-21 13:11 ` Tvrtko Ursulin
2017-02-21 14:13 ` [Intel-gfx] " Imre Deak
2017-02-21 14:16 ` Tvrtko Ursulin
2017-02-21 13:19 ` [Intel-gfx] " Chris Wilson
2017-02-21 14:18 ` Imre Deak
2017-02-20 17:22 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-02-24 14:32 ` [PATCH v2] drm/i915/gen9: Increase PCODE request timeout to 50ms Imre Deak
2017-02-24 15:52 ` ✓ Fi.CI.BAT: success for drm/i915/gen9: Increase PCODE request timeout to 100ms (rev2) Patchwork
2017-02-24 19:18 ` Chris Wilson
2017-03-01 11:17 ` Imre Deak