All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest
@ 2021-10-11 17:57 ` Matthew Brost
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Brost @ 2021-10-11 17:57 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: thomas.hellstrom

perf_parallel_engines is micro benchmark to test i915 request
scheduling. The test creates a thread per physical engine and submits
NOP requests and waits the requests to complete in a loop. In execlists
mode this works perfectly fine as powerful CPU has enough cores to feed
each engine and process the CSBs. With GuC submission the uC gets
overwhelmed as all threads feed into a single CTB channel and the GuC
gets bombarded with CSBs as contexts are immediately switched in and out
on the engines due to the zero runtime of the requests. When the GuC is
overwhelmed scheduling of contexts is unfair due to the nature of the
GuC scheduling algorithm. This behavior is understood and deemed
acceptable as this micro benchmark isn't close to real world use case.
Increasing the timeout of wait period for requests to complete. This
makes the test understand that is ok for contexts to get starved in this
scenario.

A future patch / cleanup may just delete these micro benchmark tests as
they basically mean nothing. We care about real workloads not made up
ones.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index d67710d10615..6496671a113c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (i915_request_wait(rq, 0, HZ / 5) < 0)
+		if (i915_request_wait(rq, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(rq);
 		if (err)
@@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
+		if (prev && i915_request_wait(prev, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(prev);
 		prev = rq;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest
@ 2021-10-11 17:57 ` Matthew Brost
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Brost @ 2021-10-11 17:57 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: thomas.hellstrom

perf_parallel_engines is micro benchmark to test i915 request
scheduling. The test creates a thread per physical engine and submits
NOP requests and waits the requests to complete in a loop. In execlists
mode this works perfectly fine as powerful CPU has enough cores to feed
each engine and process the CSBs. With GuC submission the uC gets
overwhelmed as all threads feed into a single CTB channel and the GuC
gets bombarded with CSBs as contexts are immediately switched in and out
on the engines due to the zero runtime of the requests. When the GuC is
overwhelmed scheduling of contexts is unfair due to the nature of the
GuC scheduling algorithm. This behavior is understood and deemed
acceptable as this micro benchmark isn't close to real world use case.
Increasing the timeout of wait period for requests to complete. This
makes the test understand that is ok for contexts to get starved in this
scenario.

A future patch / cleanup may just delete these micro benchmark tests as
they basically mean nothing. We care about real workloads not made up
ones.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index d67710d10615..6496671a113c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (i915_request_wait(rq, 0, HZ / 5) < 0)
+		if (i915_request_wait(rq, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(rq);
 		if (err)
@@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
+		if (prev && i915_request_wait(prev, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(prev);
 		prev = rq;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/selftests: Increase timeout in requests perf selftest
  2021-10-11 17:57 ` [Intel-gfx] " Matthew Brost
  (?)
@ 2021-10-11 19:10 ` Patchwork
  -1 siblings, 0 replies; 5+ messages in thread
From: Patchwork @ 2021-10-11 19:10 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 3301 bytes --]

== Series Details ==

Series: drm/i915/selftests: Increase timeout in requests perf selftest
URL   : https://patchwork.freedesktop.org/series/95688/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10717 -> Patchwork_21307
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21307 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21307, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21307/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21307:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gem:
    - fi-pnv-d510:        [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10717/fi-pnv-d510/igt@i915_selftest@live@gem.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21307/fi-pnv-d510/igt@i915_selftest@live@gem.html

  
Known issues
------------

  Here are the changes found in Patchwork_21307 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
    - fi-snb-2600:        NOTRUN -> [SKIP][3] ([fdo#109271]) +17 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21307/fi-snb-2600/igt@amdgpu/amd_cs_nop@sync-fork-compute0.html

  * igt@runner@aborted:
    - fi-pnv-d510:        NOTRUN -> [FAIL][4] ([fdo#109271] / [i915#2403])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21307/fi-pnv-d510/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [INCOMPLETE][5] ([i915#3921]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10717/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21307/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921


Participating hosts (37 -> 19)
------------------------------

  Missing    (18): fi-kbl-soraka fi-cml-u2 fi-kbl-7567u fi-bxt-dsi fi-bdw-5557u fi-jsl-1 fi-bsw-n3050 fi-glk-dsi fi-icl-u2 fi-bsw-cyan fi-kbl-7500u fi-cfl-8109u fi-skl-6700k2 fi-ehl-2 fi-bsw-kefka fi-bsw-nick fi-skl-6600u fi-kbl-r 


Build changes
-------------

  * Linux: CI_DRM_10717 -> Patchwork_21307

  CI-20190529: 20190529
  CI_DRM_10717: 81e199c3565fe949631d8d08343bd89632a8ec0c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6242: 721fd85ee95225ed5df322f7182bdfa9b86a3e68 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21307: 5a3113b7da092778de3dbfe25fcb183fdf558b8f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

5a3113b7da09 drm/i915/selftests: Increase timeout in requests perf selftest

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21307/index.html

[-- Attachment #2: Type: text/html, Size: 4059 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest
  2021-10-11 17:57 ` [Intel-gfx] " Matthew Brost
  (?)
  (?)
@ 2021-10-20 20:34 ` John Harrison
  2021-10-21  5:36   ` Thomas Hellström
  -1 siblings, 1 reply; 5+ messages in thread
From: John Harrison @ 2021-10-20 20:34 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: thomas.hellstrom

On 10/11/2021 10:57, Matthew Brost wrote:
> perf_parallel_engines is micro benchmark to test i915 request
> scheduling. The test creates a thread per physical engine and submits
> NOP requests and waits the requests to complete in a loop. In execlists
> mode this works perfectly fine as powerful CPU has enough cores to feed
> each engine and process the CSBs. With GuC submission the uC gets
> overwhelmed as all threads feed into a single CTB channel and the GuC
> gets bombarded with CSBs as contexts are immediately switched in and out
> on the engines due to the zero runtime of the requests. When the GuC is
> overwhelmed scheduling of contexts is unfair due to the nature of the
> GuC scheduling algorithm. This behavior is understood and deemed
> acceptable as this micro benchmark isn't close to real world use case.
> Increasing the timeout of wait period for requests to complete. This
> makes the test understand that is ok for contexts to get starved in this
> scenario.
>
> A future patch / cleanup may just delete these micro benchmark tests as
> they basically mean nothing. We care about real workloads not made up
> ones.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index d67710d10615..6496671a113c 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
>   		i915_request_add(rq);
>   
>   		err = 0;
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +		if (i915_request_wait(rq, 0, HZ) < 0)
>   			err = -ETIME;
>   		i915_request_put(rq);
>   		if (err)
> @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
>   		i915_request_add(rq);
>   
>   		err = 0;
> -		if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> +		if (prev && i915_request_wait(prev, 0, HZ) < 0)
>   			err = -ETIME;
>   		i915_request_put(prev);
>   		prev = rq;


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest
  2021-10-20 20:34 ` [Intel-gfx] [PATCH] " John Harrison
@ 2021-10-21  5:36   ` Thomas Hellström
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellström @ 2021-10-21  5:36 UTC (permalink / raw)
  To: John Harrison, Matthew Brost, intel-gfx, dri-devel

On Wed, 2021-10-20 at 13:34 -0700, John Harrison wrote:
> On 10/11/2021 10:57, Matthew Brost wrote:
> > perf_parallel_engines is micro benchmark to test i915 request
> > scheduling. The test creates a thread per physical engine and
> > submits
> > NOP requests and waits the requests to complete in a loop. In
> > execlists
> > mode this works perfectly fine as powerful CPU has enough cores to
> > feed
> > each engine and process the CSBs. With GuC submission the uC gets
> > overwhelmed as all threads feed into a single CTB channel and the
> > GuC
> > gets bombarded with CSBs as contexts are immediately switched in
> > and out
> > on the engines due to the zero runtime of the requests. When the
> > GuC is
> > overwhelmed scheduling of contexts is unfair due to the nature of
> > the
> > GuC scheduling algorithm. This behavior is understood and deemed
> > acceptable as this micro benchmark isn't close to real world use
> > case.
> > Increasing the timeout of wait period for requests to complete.
> > This
> > makes the test understand that is ok for contexts to get starved in
> > this
> > scenario.
> > 
> > A future patch / cleanup may just delete these micro benchmark
> > tests as
> > they basically mean nothing. We care about real workloads not made
> > up
> > ones.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

Also
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

I think one important thing to keep in mind here is that this selftest
actually *did* find a flaw, Albeit it upon analysis turned out not to
be significant.

But given that, user-space clients like, for example, gem_exec_suspend
seems to be able to trigger similar delays as well at least to some
extend with a huge amount of small requests submitted from user-space
we shold probably verify at some point that this isn't exploitable by a
malicious client starving other clients on the same system.

/Thomas


> 
> > ---
> >   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c
> > b/drivers/gpu/drm/i915/selftests/i915_request.c
> > index d67710d10615..6496671a113c 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> > @@ -2805,7 +2805,7 @@ static int p_sync0(void *arg)
> >                 i915_request_add(rq);
> >   
> >                 err = 0;
> > -               if (i915_request_wait(rq, 0, HZ / 5) < 0)
> > +               if (i915_request_wait(rq, 0, HZ) < 0)
> >                         err = -ETIME;
> >                 i915_request_put(rq);
> >                 if (err)
> > @@ -2876,7 +2876,7 @@ static int p_sync1(void *arg)
> >                 i915_request_add(rq);
> >   
> >                 err = 0;
> > -               if (prev && i915_request_wait(prev, 0, HZ / 5) < 0)
> > +               if (prev && i915_request_wait(prev, 0, HZ) < 0)
> >                         err = -ETIME;
> >                 i915_request_put(prev);
> >                 prev = rq;
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-21  5:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-11 17:57 [PATCH] drm/i915/selftests: Increase timeout in requests perf selftest Matthew Brost
2021-10-11 17:57 ` [Intel-gfx] " Matthew Brost
2021-10-11 19:10 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for " Patchwork
2021-10-20 20:34 ` [Intel-gfx] [PATCH] " John Harrison
2021-10-21  5:36   ` Thomas Hellström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.