* [PATCH] drm/i915: Apply a cond_resched() to the saturated signaler
@ 2017-04-04 10:10 Chris Wilson
From: Chris Wilson @ 2017-04-04 10:10 UTC
To: intel-gfx; +Cc: Mika Kuoppala
If the engine is continually completing nops, we can saturate the
signaler and keep it working indefinitely. This angers the NMI watchdog!

A good example is to disable semaphores on snb and run igt/gem_exec_nop -
the parallel, multi-engine workloads are more than sufficient to hog the
CPU, preventing the system from even processing ICMP echo replies.

Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/intel_breadcrumbs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 1b14c3be6046..763ed497961f 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -622,6 +622,13 @@ static int intel_breadcrumbs_signaler(void *arg)
 			spin_unlock_irq(&b->rb_lock);
 
 			i915_gem_request_put(request);
+
+			/* If the engine is saturated we may be continually
+			 * processing completed requests. This angers the
+			 * NMI watchdog if we never let anything else
+			 * have access to the CPU. Let's pretend to be nice.
+			 */
+			cond_resched();
 		} else {
 			DEFINE_WAIT(exec);
 
--
2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler
From: Patchwork @ 2017-04-04 10:34 UTC
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Apply a cond_resched() to the saturated signaler
URL : https://patchwork.freedesktop.org/series/22418/
State : success
== Summary ==
Series 22418v1 drm/i915: Apply a cond_resched() to the saturated signaler
https://patchwork.freedesktop.org/api/1.0/series/22418/revisions/1/mbox/
Test gem_exec_flush:
Subgroup basic-batch-kernel-default-uc:
fail -> PASS (fi-snb-2600) fdo#100007
Test kms_cursor_legacy:
Subgroup basic-busy-flip-before-cursor-legacy:
fail -> PASS (fi-snb-2600)
fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time: 438s
fi-bdw-gvtdvm total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time: 426s
fi-bsw-n3050 total:278 pass:239 dwarn:0 dfail:0 fail:0 skip:39 time: 581s
fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time: 507s
fi-bxt-t5700 total:278 pass:258 dwarn:0 dfail:0 fail:0 skip:20 time: 558s
fi-byt-j1900 total:278 pass:251 dwarn:0 dfail:0 fail:0 skip:27 time: 487s
fi-byt-n2820 total:278 pass:247 dwarn:0 dfail:0 fail:0 skip:31 time: 476s
fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time: 412s
fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time: 403s
fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time: 421s
fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time: 488s
fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time: 470s
fi-kbl-7500u total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time: 452s
fi-kbl-7560u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time: 570s
fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time: 454s
fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time: 569s
fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time: 458s
fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time: 488s
fi-skl-gvtdvm total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time: 435s
fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time: 528s
fi-snb-2600 total:278 pass:249 dwarn:0 dfail:0 fail:0 skip:29 time: 400s
abec0508865ae60a2bb30ca3ba5ca2c4ac49afc5 drm-tip: 2017y-04m-04d-08h-52m-39s UTC integration manifest
ba432f5 drm/i915: Apply a cond_resched() to the saturated signaler
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4389/
* [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler
From: Chris Wilson @ 2017-04-04 12:05 UTC
To: intel-gfx; +Cc: Mika Kuoppala
If the engine is continually completing nops, we can saturate the
signaler and keep it working indefinitely. This angers the NMI watchdog!

A good example is to disable semaphores on snb and run igt/gem_exec_nop -
the parallel, multi-engine workloads are more than sufficient to hog the
CPU, preventing the system from even processing ICMP echo replies.

v2: Tvrtko dug into cond_resched() on x86 and found that it only
depended upon preempt_count and not tif_need_resched() - which means
that we would always call schedule() at that point.

Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 308c56a021ab..9ccbf26124c6 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
 	signaler_set_rtpriority();
 
 	do {
+		bool do_schedule = true;
+
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		/* We are either woken up by the interrupt bottom-half,
@@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
 			spin_unlock_irq(&b->rb_lock);
 
 			i915_gem_request_put(request);
-		} else {
+
+			/* If the engine is saturated we may be continually
+			 * processing completed requests. This angers the
+			 * NMI watchdog if we never let anything else
+			 * have access to the CPU. Let's pretend to be nice
+			 * and relinquish the CPU if we burn through the
+			 * entire RT timeslice!
+			 */
+			do_schedule = need_resched();
+		}
+
+		if (unlikely(do_schedule)) {
 			DEFINE_WAIT(exec);
 
 			if (kthread_should_park())
--
2.11.0
* ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
From: Patchwork @ 2017-04-04 12:24 UTC
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
URL : https://patchwork.freedesktop.org/series/22418/
State : success
== Summary ==
Series 22418v2 drm/i915: Apply a cond_resched() to the saturated signaler
https://patchwork.freedesktop.org/api/1.0/series/22418/revisions/2/mbox/
Test gem_exec_flush:
Subgroup basic-batch-kernel-default-uc:
fail -> PASS (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
Subgroup basic-s4-devices:
pass -> DMESG-WARN (fi-kbl-7560u) fdo#100125
pass -> DMESG-WARN (fi-bxt-t5700) fdo#100125
Test kms_cursor_legacy:
Subgroup basic-busy-flip-before-cursor-legacy:
fail -> PASS (fi-snb-2600)
fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
fi-bdw-5557u total:278 pass:267 dwarn:0 dfail:0 fail:0 skip:11 time: 433s
fi-bdw-gvtdvm total:278 pass:256 dwarn:8 dfail:0 fail:0 skip:14 time: 427s
fi-bsw-n3050 total:278 pass:239 dwarn:0 dfail:0 fail:0 skip:39 time: 573s
fi-bxt-j4205 total:278 pass:259 dwarn:0 dfail:0 fail:0 skip:19 time: 509s
fi-bxt-t5700 total:278 pass:257 dwarn:1 dfail:0 fail:0 skip:20 time: 548s
fi-byt-j1900 total:278 pass:251 dwarn:0 dfail:0 fail:0 skip:27 time: 484s
fi-byt-n2820 total:278 pass:247 dwarn:0 dfail:0 fail:0 skip:31 time: 478s
fi-hsw-4770 total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time: 414s
fi-hsw-4770r total:278 pass:262 dwarn:0 dfail:0 fail:0 skip:16 time: 409s
fi-ilk-650 total:278 pass:228 dwarn:0 dfail:0 fail:0 skip:50 time: 416s
fi-ivb-3520m total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time: 483s
fi-ivb-3770 total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time: 468s
fi-kbl-7500u total:278 pass:260 dwarn:0 dfail:0 fail:0 skip:18 time: 462s
fi-kbl-7560u total:278 pass:267 dwarn:1 dfail:0 fail:0 skip:10 time: 566s
fi-skl-6260u total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time: 457s
fi-skl-6700hq total:278 pass:261 dwarn:0 dfail:0 fail:0 skip:17 time: 568s
fi-skl-6700k total:278 pass:256 dwarn:4 dfail:0 fail:0 skip:18 time: 461s
fi-skl-6770hq total:278 pass:268 dwarn:0 dfail:0 fail:0 skip:10 time: 495s
fi-skl-gvtdvm total:278 pass:265 dwarn:0 dfail:0 fail:0 skip:13 time: 438s
fi-snb-2520m total:278 pass:250 dwarn:0 dfail:0 fail:0 skip:28 time: 532s
fi-snb-2600 total:278 pass:249 dwarn:0 dfail:0 fail:0 skip:29 time: 402s
abec0508865ae60a2bb30ca3ba5ca2c4ac49afc5 drm-tip: 2017y-04m-04d-08h-52m-39s UTC integration manifest
e15d53e drm/i915: Apply a cond_resched() to the saturated signaler
== Logs ==
For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4392/
* Re: [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler
From: Tvrtko Ursulin @ 2017-04-04 12:38 UTC
To: Chris Wilson, intel-gfx; +Cc: Mika Kuoppala
On 04/04/2017 13:05, Chris Wilson wrote:
> If the engine is continually completing nops, we can saturate the
> signaler and keep it working indefinitely. This angers the NMI watchdog!
>
> A good example is to disable semaphores on snb and run igt/gem_exec_nop -
> the parallel, multi-engine workloads are more than sufficient to hog the
> CPU, preventing the system from even processing ICMP echo replies.
>
> v2: Tvrtko dug into cond_resched() on x86 and found that it only
> depended upon preempt_count and not tif_need_resched() - which means
> that we would always call schedule() at that point.
>
> Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 308c56a021ab..9ccbf26124c6 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
>  	signaler_set_rtpriority();
>  
>  	do {
> +		bool do_schedule = true;
> +
>  		set_current_state(TASK_INTERRUPTIBLE);
>  
>  		/* We are either woken up by the interrupt bottom-half,
> @@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
>  			spin_unlock_irq(&b->rb_lock);
>  
>  			i915_gem_request_put(request);
> -		} else {
> +
> +			/* If the engine is saturated we may be continually
> +			 * processing completed requests. This angers the
> +			 * NMI watchdog if we never let anything else
> +			 * have access to the CPU. Let's pretend to be nice
> +			 * and relinquish the CPU if we burn through the
> +			 * entire RT timeslice!
> +			 */
> +			do_schedule = need_resched();
> +		}
> +
> +		if (unlikely(do_schedule)) {
>  			DEFINE_WAIT(exec);
>  
>  			if (kthread_should_park())
>
My thinking was to add a check before the request assignment like:

	rcu_read_lock();
	request = ...;
	if (request && !need_resched())
		request = ...;
	else
		request = NULL;
	rcu_read_unlock();

But this looks correct as well, maybe it is just my preference on what
would have been easier to understand.
I trust that you have tested both that it solves the NMI lockup detector
issue and that it doesn't affect the signaller latency much.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* Re: ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
From: Chris Wilson @ 2017-04-04 12:52 UTC
To: intel-gfx
On Tue, Apr 04, 2017 at 12:24:15PM -0000, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
> URL : https://patchwork.freedesktop.org/series/22418/
> State : success
>
> == Summary ==
>
> Series 22418v2 drm/i915: Apply a cond_resched() to the saturated signaler
> https://patchwork.freedesktop.org/api/1.0/series/22418/revisions/2/mbox/
>
> Test gem_exec_flush:
> Subgroup basic-batch-kernel-default-uc:
> fail -> PASS (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
> Subgroup basic-s4-devices:
> pass -> DMESG-WARN (fi-kbl-7560u) fdo#100125
> pass -> DMESG-WARN (fi-bxt-t5700) fdo#100125
> Test kms_cursor_legacy:
> Subgroup basic-busy-flip-before-cursor-legacy:
> fail -> PASS (fi-snb-2600)
>
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125
Unrelated.
From the aether, Tvrtko wrote:
> My thinking was to add a check before the request assignment like:
>
> 	rcu_read_lock();
> 	request = ...;
> 	if (request && !need_resched())
> 		request = ...;
> 	else
> 		request = NULL;
> 	rcu_read_unlock();
>
> But this looks correct as well, maybe it is just my preference on what
> would have been easier to understand.
This has the danger of missing a wake-up reason. After setting
TASK_INTERRUPTIBLE, we must then check all the possible wake-up sources
before calling schedule, or else we risk another kthread_park() style
bug.
Thanks for the review and digging through the cond_resched() mystery.
Pushed,
-Chris
--
Chris Wilson, Intel Open Source Technology Centre