* [PATCH] drm/i915: Apply a cond_resched() to the saturated signaler
@ 2017-04-04 10:10 Chris Wilson
  2017-04-04 10:34 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chris Wilson @ 2017-04-04 10:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

If the engine is continually completing nops, we can saturate the
signaler and keep it working indefinitely. This angers the NMI watchdog!

A good example is to disable semaphores on snb and run igt/gem_exec_nop -
the parallel, multi-engine workloads are more than sufficient to hog the
CPU, preventing the system from even processing ICMP echo replies.

Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 1b14c3be6046..763ed497961f 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -622,6 +622,13 @@ static int intel_breadcrumbs_signaler(void *arg)
 			spin_unlock_irq(&b->rb_lock);
 
 			i915_gem_request_put(request);
+
+			/* If the engine is saturated we may be continually
+			 * processing completed requests. This angers the
+			 * NMI watchdog if we never let anything else
+			 * have access to the CPU. Let's pretend to be nice.
+			 */
+			cond_resched();
 		} else {
 			DEFINE_WAIT(exec);
 
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler
  2017-04-04 10:10 [PATCH] drm/i915: Apply a cond_resched() to the saturated signaler Chris Wilson
@ 2017-04-04 10:34 ` Patchwork
  2017-04-04 12:05 ` [PATCH v2] " Chris Wilson
  2017-04-04 12:24 ` ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2) Patchwork
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2017-04-04 10:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Apply a cond_resched() to the saturated signaler
URL   : https://patchwork.freedesktop.org/series/22418/
State : success

== Summary ==

Series 22418v1 drm/i915: Apply a cond_resched() to the saturated signaler
https://patchwork.freedesktop.org/api/1.0/series/22418/revisions/1/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                fail       -> PASS       (fi-snb-2600) fdo#100007
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-legacy:
                fail       -> PASS       (fi-snb-2600)

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time: 438s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time: 426s
fi-bsw-n3050     total:278  pass:239  dwarn:0   dfail:0   fail:0   skip:39  time: 581s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time: 507s
fi-bxt-t5700     total:278  pass:258  dwarn:0   dfail:0   fail:0   skip:20  time: 558s
fi-byt-j1900     total:278  pass:251  dwarn:0   dfail:0   fail:0   skip:27  time: 487s
fi-byt-n2820     total:278  pass:247  dwarn:0   dfail:0   fail:0   skip:31  time: 476s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 412s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 403s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time: 421s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 488s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 470s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 452s
fi-kbl-7560u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 570s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 454s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time: 569s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time: 458s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 488s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time: 435s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time: 528s
fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29  time: 400s

abec0508865ae60a2bb30ca3ba5ca2c4ac49afc5 drm-tip: 2017y-04m-04d-08h-52m-39s UTC integration manifest
ba432f5 drm/i915: Apply a cond_resched() to the saturated signaler

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4389/

* [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler
  2017-04-04 10:10 [PATCH] drm/i915: Apply a cond_resched() to the saturated signaler Chris Wilson
  2017-04-04 10:34 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-04-04 12:05 ` Chris Wilson
  2017-04-04 12:38   ` Tvrtko Ursulin
  2017-04-04 12:24 ` ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2) Patchwork
  2 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2017-04-04 12:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

If the engine is continually completing nops, we can saturate the
signaler and keep it working indefinitely. This angers the NMI watchdog!

A good example is to disable semaphores on snb and run igt/gem_exec_nop -
the parallel, multi-engine workloads are more than sufficient to hog the
CPU, preventing the system from even processing ICMP echo replies.

v2: Tvrtko dug into cond_resched() on x86 and found that it only
depended upon preempt_count and not tif_need_resched() - which means
that we would always call schedule() at that point.

Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 308c56a021ab..9ccbf26124c6 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
 	signaler_set_rtpriority();
 
 	do {
+		bool do_schedule = true;
+
 		set_current_state(TASK_INTERRUPTIBLE);
 
 		/* We are either woken up by the interrupt bottom-half,
@@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
 			spin_unlock_irq(&b->rb_lock);
 
 			i915_gem_request_put(request);
-		} else {
+
+			/* If the engine is saturated we may be continually
+			 * processing completed requests. This angers the
+			 * NMI watchdog if we never let anything else
+			 * have access to the CPU. Let's pretend to be nice
+			 * and relinquish the CPU if we burn through the
+			 * entire RT timeslice!
+			 */
+			do_schedule = need_resched();
+		}
+
+		if (unlikely(do_schedule)) {
 			DEFINE_WAIT(exec);
 
 			if (kthread_should_park())
-- 
2.11.0
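
The control flow that v2 introduces can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct sim_signaler`, `sim_need_resched()` and `sim_signaler_run()` are hypothetical names, the "timeslice" is a plain counter, and nothing here is kernel code. It shows the idea in the patch: default to sleeping, stay on the fast path while requests keep completing, and leave it only when the stand-in for need_resched() says so.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct sim_signaler {
	int pending;   /* completed requests still waiting to be signalled */
	int slice;     /* requests we may signal before the timeslice is used up */
	int used;      /* requests signalled in the current timeslice */
	int signalled; /* total requests signalled */
	int yields;    /* times we gave up the CPU with work still pending */
};

/* Stand-in for need_resched(): true once the modelled timeslice is used up. */
static bool sim_need_resched(const struct sim_signaler *s)
{
	return s->used >= s->slice;
}

/* Mirrors the v2 shape: sleep by default, keep going only while busy. */
static void sim_signaler_run(struct sim_signaler *s)
{
	for (;;) {
		bool do_schedule = true;

		if (s->pending > 0) {
			/* Signal one completed request. */
			s->pending--;
			s->signalled++;
			s->used++;

			/* Leave the fast path only if the CPU is wanted elsewhere. */
			do_schedule = sim_need_resched(s);
		}

		if (do_schedule) {
			if (s->pending == 0)
				break; /* idle: the real thread would sleep here */

			s->yields++; /* saturated: let something else run */
			s->used = 0; /* modelled as a fresh timeslice on return */
		}
	}
}
```

In this model a saturated queue is still drained completely, but the loop gives up the CPU once per exhausted slice instead of never.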


* ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
  2017-04-04 10:10 [PATCH] drm/i915: Apply a cond_resched() to the saturated signaler Chris Wilson
  2017-04-04 10:34 ` ✓ Fi.CI.BAT: success for " Patchwork
  2017-04-04 12:05 ` [PATCH v2] " Chris Wilson
@ 2017-04-04 12:24 ` Patchwork
  2017-04-04 12:52   ` Chris Wilson
  2 siblings, 1 reply; 6+ messages in thread
From: Patchwork @ 2017-04-04 12:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
URL   : https://patchwork.freedesktop.org/series/22418/
State : success

== Summary ==

Series 22418v2 drm/i915: Apply a cond_resched() to the saturated signaler
https://patchwork.freedesktop.org/api/1.0/series/22418/revisions/2/mbox/

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                fail       -> PASS       (fi-snb-2600) fdo#100007
Test gem_exec_suspend:
        Subgroup basic-s4-devices:
                pass       -> DMESG-WARN (fi-kbl-7560u) fdo#100125
                pass       -> DMESG-WARN (fi-bxt-t5700) fdo#100125
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-legacy:
                fail       -> PASS       (fi-snb-2600)

fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

fi-bdw-5557u     total:278  pass:267  dwarn:0   dfail:0   fail:0   skip:11  time: 433s
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time: 427s
fi-bsw-n3050     total:278  pass:239  dwarn:0   dfail:0   fail:0   skip:39  time: 573s
fi-bxt-j4205     total:278  pass:259  dwarn:0   dfail:0   fail:0   skip:19  time: 509s
fi-bxt-t5700     total:278  pass:257  dwarn:1   dfail:0   fail:0   skip:20  time: 548s
fi-byt-j1900     total:278  pass:251  dwarn:0   dfail:0   fail:0   skip:27  time: 484s
fi-byt-n2820     total:278  pass:247  dwarn:0   dfail:0   fail:0   skip:31  time: 478s
fi-hsw-4770      total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 414s
fi-hsw-4770r     total:278  pass:262  dwarn:0   dfail:0   fail:0   skip:16  time: 409s
fi-ilk-650       total:278  pass:228  dwarn:0   dfail:0   fail:0   skip:50  time: 416s
fi-ivb-3520m     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 483s
fi-ivb-3770      total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 468s
fi-kbl-7500u     total:278  pass:260  dwarn:0   dfail:0   fail:0   skip:18  time: 462s
fi-kbl-7560u     total:278  pass:267  dwarn:1   dfail:0   fail:0   skip:10  time: 566s
fi-skl-6260u     total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 457s
fi-skl-6700hq    total:278  pass:261  dwarn:0   dfail:0   fail:0   skip:17  time: 568s
fi-skl-6700k     total:278  pass:256  dwarn:4   dfail:0   fail:0   skip:18  time: 461s
fi-skl-6770hq    total:278  pass:268  dwarn:0   dfail:0   fail:0   skip:10  time: 495s
fi-skl-gvtdvm    total:278  pass:265  dwarn:0   dfail:0   fail:0   skip:13  time: 438s
fi-snb-2520m     total:278  pass:250  dwarn:0   dfail:0   fail:0   skip:28  time: 532s
fi-snb-2600      total:278  pass:249  dwarn:0   dfail:0   fail:0   skip:29  time: 402s

abec0508865ae60a2bb30ca3ba5ca2c4ac49afc5 drm-tip: 2017y-04m-04d-08h-52m-39s UTC integration manifest
e15d53e drm/i915: Apply a cond_resched() to the saturated signaler

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_4392/

* Re: [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler
  2017-04-04 12:05 ` [PATCH v2] " Chris Wilson
@ 2017-04-04 12:38   ` Tvrtko Ursulin
  0 siblings, 0 replies; 6+ messages in thread
From: Tvrtko Ursulin @ 2017-04-04 12:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Mika Kuoppala


On 04/04/2017 13:05, Chris Wilson wrote:
> If the engine is continually completing nops, we can saturate the
> signaler and keep it working indefinitely. This angers the NMI watchdog!
>
> A good example is to disable semaphores on snb and run igt/gem_exec_nop -
> the parallel, multi-engine workloads are more than sufficient to hog the
> CPU, preventing the system from even processing ICMP echo replies.
>
> v2: Tvrtko dug into cond_resched() on x86 and found that it only
> depended upon preempt_count and not tif_need_resched() - which means
> that we would always call schedule() at that point.
>
> Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 308c56a021ab..9ccbf26124c6 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
>  	signaler_set_rtpriority();
>
>  	do {
> +		bool do_schedule = true;
> +
>  		set_current_state(TASK_INTERRUPTIBLE);
>
>  		/* We are either woken up by the interrupt bottom-half,
> @@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
>  			spin_unlock_irq(&b->rb_lock);
>
>  			i915_gem_request_put(request);
> -		} else {
> +
> +			/* If the engine is saturated we may be continually
> +			 * processing completed requests. This angers the
> +			 * NMI watchdog if we never let anything else
> +			 * have access to the CPU. Let's pretend to be nice
> +			 * and relinquish the CPU if we burn through the
> +			 * entire RT timeslice!
> +			 */
> +			do_schedule = need_resched();
> +		}
> +
> +		if (unlikely(do_schedule)) {
>  			DEFINE_WAIT(exec);
>
>  			if (kthread_should_park())
>

My thinking was to add a check before the request assignment like:

	rcu_read_lock();
	request = ...;
	if (request && !need_resched())
		request = ...;
	else
		request = NULL;
	rcu_read_unlock();

But this looks correct as well, maybe it is just my preference on what 
would have been easier to understand.

I trust that you have tested it both for solving the NMI lockup detector 
issue and that it doesn't affect the signaller latency a lot.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
  2017-04-04 12:24 ` ✓ Fi.CI.BAT: success for drm/i915: Apply a cond_resched() to the saturated signaler (rev2) Patchwork
@ 2017-04-04 12:52   ` Chris Wilson
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2017-04-04 12:52 UTC (permalink / raw)
  To: intel-gfx

On Tue, Apr 04, 2017 at 12:24:15PM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Apply a cond_resched() to the saturated signaler (rev2)
> URL   : https://patchwork.freedesktop.org/series/22418/
> State : success
> 
> == Summary ==
> 
> Series 22418v2 drm/i915: Apply a cond_resched() to the saturated signaler
> https://patchwork.freedesktop.org/api/1.0/series/22418/revisions/2/mbox/
> 
> Test gem_exec_flush:
>         Subgroup basic-batch-kernel-default-uc:
>                 fail       -> PASS       (fi-snb-2600) fdo#100007
> Test gem_exec_suspend:
>         Subgroup basic-s4-devices:
>                 pass       -> DMESG-WARN (fi-kbl-7560u) fdo#100125
>                 pass       -> DMESG-WARN (fi-bxt-t5700) fdo#100125
> Test kms_cursor_legacy:
>         Subgroup basic-busy-flip-before-cursor-legacy:
>                 fail       -> PASS       (fi-snb-2600)
> 
> fdo#100007 https://bugs.freedesktop.org/show_bug.cgi?id=100007
> fdo#100125 https://bugs.freedesktop.org/show_bug.cgi?id=100125

Unrelated.

From the aether, Tvrtko wrote:
> My thinking was to add a check before the request assignment like:
> 
> 	rcu_read_lock();
> 	request = ...;
> 	if (request && !need_resched())
> 		request = ...;
> 	else
> 		request = NULL;
> 	rcu_read_unlock();
> 
> But this looks correct as well, maybe it is just my preference on what 
> would have been easier to understand.

This has the danger of missing a wake-up reason. After setting
TASK_INTERRUPTIBLE, we must then check all the possible wake-up sources
before calling schedule, or else we risk another kthread_park() style
bug.

Thanks for the review and digging through the cond_resched() mystery.
Pushed,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
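
The rule stated above (after setting TASK_INTERRUPTIBLE, check every wake-up source before calling schedule()) can be illustrated with a single-threaded model. This is a sketch under assumptions, not the driver's code: `sim_task`, `sim_wake_up()` and `sim_would_sleep()` are hypothetical names, and the three sources are chosen only to echo the request, park and stop conditions mentioned in the thread.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code. */
enum sim_state { SIM_RUNNING, SIM_INTERRUPTIBLE };

enum sim_source {
	SIM_SRC_REQUEST = 1 << 0, /* a request has completed */
	SIM_SRC_PARK    = 1 << 1, /* someone asked the thread to park */
	SIM_SRC_STOP    = 1 << 2, /* someone asked the thread to stop */
};

struct sim_task {
	enum sim_state state;
	unsigned int pending; /* bitmask of events that were already raised */
};

/* Waker side: publish the event first, then make the task runnable. */
static void sim_wake_up(struct sim_task *t, enum sim_source src)
{
	t->pending |= src;
	t->state = SIM_RUNNING;
}

/*
 * Sleeper side: returns true if the task would block in schedule().
 * 'checked' is the set of sources the loop inspects after it has
 * marked itself interruptible.
 */
static bool sim_would_sleep(struct sim_task *t, unsigned int checked)
{
	/* This overwrites the SIM_RUNNING left behind by any earlier wake-up. */
	t->state = SIM_INTERRUPTIBLE;

	if (t->pending & checked) {
		/* A checked source is pending: handle it instead of sleeping. */
		t->state = SIM_RUNNING;
		return false;
	}

	/* Nothing checked is pending: the task goes to sleep. */
	return true;
}
```

If a park request was raised before the state was set, a loop that checks all three sources stays awake, while a loop that skips the park check sleeps with the event still pending. That second case is the kind of missed wake-up the reply warns about.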