[PATCH i-g-t] tests/perf_pmu: Use absolute tolerance in accuracy tests

* [PATCH i-g-t] tests/perf_pmu: Use absolute tolerance in accuracy tests
@ 2018-03-07 11:11 ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-03-07 11:11 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We need to use absolute tolerance when asserting on percentages. Relative
tolerance in this case is unfair and inaccurate since its strictness
varies with relative target busyness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/perf_pmu.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 9ebffc64d1f1..8e547338b47c 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -1459,7 +1459,15 @@ static void __rearm_spin_batch(igt_spin_t *spin)
        __sync_synchronize();
 }
 
-#define div_round_up(a, b) (((a) + (b) - 1) / (b))
+#define __assert_within(x, ref, tol_up, tol_down) \
+	igt_assert_f((double)(x) <= ((double)(ref) + (tol_up)) && \
+		     (double)(x) >= ((double)(ref) - (tol_down)), \
+		     "%f not within +%f/-%f of %f! ('%s' vs '%s')\n", \
+		     (double)(x), (double)(tol_up), (double)(tol_down), \
+		     (double)(ref), #x, #ref)
+
+#define assert_within(x, ref, tolerance) \
+	__assert_within(x, ref, tolerance, tolerance)
 
 static void
 accuracy(int gem_fd, const struct intel_execution_engine2 *e,
@@ -1571,7 +1579,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 
 	/* Let the child run. */
 	read(link[0], &expected, sizeof(expected));
-	assert_within_epsilon(expected, target_busy_pct/100., 0.05);
+	assert_within(100.0 * expected, target_busy_pct, 5);
 
 	/* Collect engine busyness for an interesting part of child runtime. */
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
@@ -1590,8 +1598,11 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_info("error=%.2f%% (%.2f%% vs %.2f%%)\n",
 		 __error(busy_r, expected), 100 * busy_r, 100 * expected);
 
-	assert_within_epsilon(busy_r, expected, 0.15);
-	assert_within_epsilon(1 - busy_r, 1 - expected, 0.15);
+	busy_r *= 100.0;
+	expected *= 100.0;
+
+	assert_within(busy_r, expected, 2);
+	assert_within(100.0 - busy_r, 100.0 - expected, 2);
 }
 
 igt_main
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH i-g-t] tests/perf_pmu: Use absolute tolerance in accuracy tests
  2018-03-07 11:11 ` [igt-dev] " Tvrtko Ursulin
@ 2018-03-07 11:34   ` Chris Wilson
  -1 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2018-03-07 11:34 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-03-07 11:11:19)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We need to use absolute tolerance when asserting on percentages. Relative
> tolerance in this case is unfair and inaccurate since its strictness
> varies with relative target busyness.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  tests/perf_pmu.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 9ebffc64d1f1..8e547338b47c 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -1459,7 +1459,15 @@ static void __rearm_spin_batch(igt_spin_t *spin)
>         __sync_synchronize();
>  }
>  
> -#define div_round_up(a, b) (((a) + (b) - 1) / (b))
> +#define __assert_within(x, ref, tol_up, tol_down) \
> +       igt_assert_f((double)(x) <= ((double)(ref) + (tol_up)) && \
> +                    (double)(x) >= ((double)(ref) - (tol_down)), \
> +                    "%f not within +%f/-%f of %f! ('%s' vs '%s')\n", \
> +                    (double)(x), (double)(tol_up), (double)(tol_down), \
> +                    (double)(ref), #x, #ref)
> +
> +#define assert_within(x, ref, tolerance) \
> +       __assert_within(x, ref, tolerance, tolerance)
>  
>  static void
>  accuracy(int gem_fd, const struct intel_execution_engine2 *e,
> @@ -1571,7 +1579,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>  
>         /* Let the child run. */
>         read(link[0], &expected, sizeof(expected));
> -       assert_within_epsilon(expected, target_busy_pct/100., 0.05);
> +       assert_within(100.0 * expected, target_busy_pct, 5);
>  
>         /* Collect engine busyness for an interesting part of child runtime. */
>         fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
> @@ -1590,8 +1598,11 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>         igt_info("error=%.2f%% (%.2f%% vs %.2f%%)\n",
>                  __error(busy_r, expected), 100 * busy_r, 100 * expected);
>  
> -       assert_within_epsilon(busy_r, expected, 0.15);
> -       assert_within_epsilon(1 - busy_r, 1 - expected, 0.15);
> +       busy_r *= 100.0;
> +       expected *= 100.0;
> +
> +       assert_within(busy_r, expected, 2);
> +       assert_within(100.0 - busy_r, 100.0 - expected, 2);

The advantage of switching to absolute here is that we only need the
single test. Ok, using a factor of 100 here should make the output more
readable.

Kill the extra assert_within,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

But I suspect we may need to relax the target for kasan, we will see in
a few days.
-Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for tests/perf_pmu: Use absolute tolerance in accuracy tests
  2018-03-07 11:11 ` [igt-dev] " Tvrtko Ursulin
  (?)
  (?)
@ 2018-03-07 11:48 ` Patchwork
  -1 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-03-07 11:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: tests/perf_pmu: Use absolute tolerance in accuracy tests
URL   : https://patchwork.freedesktop.org/series/39518/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
b4689dce36d0fbd9aec70d5a4b077c43a6b9c254 igt: Remove gen7_forcewake_mt

with latest DRM-Tip kernel build CI_DRM_3885
8a4eb4556f66 drm-tip: 2018y-03m-06d-22h-59m-29s UTC integration manifest

No testlist changes.

---- Known issues:

Test debugfs_test:
        Subgroup read_all_entries:
                pass       -> INCOMPLETE (fi-snb-2520m) fdo#103713
Test prime_vgem:
        Subgroup basic-fence-flip:
                fail       -> PASS       (fi-ilk-650) fdo#104008

fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:423s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:426s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:374s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:507s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:280s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:496s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:485s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:477s
fi-cfl-8700k     total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:404s
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:580s
fi-elk-e7500     total:288  pass:229  dwarn:0   dfail:0   fail:0   skip:59  time:420s
fi-gdg-551       total:288  pass:179  dwarn:0   dfail:0   fail:1   skip:108 time:290s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:520s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:398s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:415s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:458s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:424s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:471s
fi-kbl-7560u     total:108  pass:104  dwarn:0   dfail:0   fail:0   skip:3  
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:459s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:510s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:588s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:434s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:527s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:537s
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:498s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:481s
fi-skl-guc       total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:422s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:434s
fi-snb-2520m     total:3    pass:2    dwarn:0   dfail:0   fail:0   skip:0  
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:399s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1074/issues.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: warning for tests/perf_pmu: Use absolute tolerance in accuracy tests
  2018-03-07 11:11 ` [igt-dev] " Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  (?)
@ 2018-03-07 12:35 ` Patchwork
  2018-03-07 14:27   ` Tvrtko Ursulin
  -1 siblings, 1 reply; 16+ messages in thread
From: Patchwork @ 2018-03-07 12:35 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: tests/perf_pmu: Use absolute tolerance in accuracy tests
URL   : https://patchwork.freedesktop.org/series/39518/
State : warning

== Summary ==

---- Possible new issues:

Test gem_linear_blits:
        Subgroup normal:
                pass       -> SKIP       (shard-apl)

---- Known issues:

Test gem_softpin:
        Subgroup noreloc-s3:
                skip       -> PASS       (shard-snb) fdo#103375
Test kms_chv_cursor_fail:
        Subgroup pipe-b-128x128-right-edge:
                pass       -> DMESG-WARN (shard-snb) fdo#105185 +4
Test kms_flip:
        Subgroup 2x-flip-vs-blocking-wf-vblank:
                pass       -> FAIL       (shard-hsw) fdo#100368 +2
Test kms_rotation_crc:
        Subgroup primary-rotation-180:
                fail       -> PASS       (shard-snb) fdo#103925
Test perf:
        Subgroup blocking:
                fail       -> PASS       (shard-hsw) fdo#102252 +1
Test pm_lpsp:
        Subgroup screens-disabled:
                fail       -> PASS       (shard-hsw) fdo#104941

fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
fdo#104941 https://bugs.freedesktop.org/show_bug.cgi?id=104941

shard-apl        total:3467 pass:1822 dwarn:1   dfail:0   fail:10  skip:1633 time:12344s
shard-hsw        total:3467 pass:1772 dwarn:1   dfail:0   fail:2   skip:1691 time:11981s
shard-snb        total:3467 pass:1363 dwarn:3   dfail:0   fail:1   skip:2100 time:7138s
Blacklisted hosts:
shard-kbl        total:3365 pass:1880 dwarn:1   dfail:0   fail:9   skip:1472 time:8609s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1074/shards.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [igt-dev] ✗ Fi.CI.IGT: warning for tests/perf_pmu: Use absolute tolerance in accuracy tests
  2018-03-07 12:35 ` [igt-dev] ✗ Fi.CI.IGT: warning " Patchwork
@ 2018-03-07 14:27   ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-03-07 14:27 UTC (permalink / raw)
  To: igt-dev, Patchwork, Tvrtko Ursulin


On 07/03/2018 12:35, Patchwork wrote:
> == Series Details ==
> 
> Series: tests/perf_pmu: Use absolute tolerance in accuracy tests
> URL   : https://patchwork.freedesktop.org/series/39518/
> State : warning
> 
> == Summary ==
> 
> ---- Possible new issues:
> 
> Test gem_linear_blits:
>          Subgroup normal:
>                  pass       -> SKIP       (shard-apl)
> 
> ---- Known issues:
> 
> Test gem_softpin:
>          Subgroup noreloc-s3:
>                  skip       -> PASS       (shard-snb) fdo#103375
> Test kms_chv_cursor_fail:
>          Subgroup pipe-b-128x128-right-edge:
>                  pass       -> DMESG-WARN (shard-snb) fdo#105185 +4
> Test kms_flip:
>          Subgroup 2x-flip-vs-blocking-wf-vblank:
>                  pass       -> FAIL       (shard-hsw) fdo#100368 +2
> Test kms_rotation_crc:
>          Subgroup primary-rotation-180:
>                  fail       -> PASS       (shard-snb) fdo#103925
> Test perf:
>          Subgroup blocking:
>                  fail       -> PASS       (shard-hsw) fdo#102252 +1
> Test pm_lpsp:
>          Subgroup screens-disabled:
>                  fail       -> PASS       (shard-hsw) fdo#104941

It doesn't say so here, but the 50% tests are failing on the shards, by the
look of it around 3-4% off the target. Strange. I need to figure out why
50% would be so much worse than the 2% and 98% loads.

Regards,

Tvrtko

> fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
> fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
> fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
> fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
> fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
> fdo#104941 https://bugs.freedesktop.org/show_bug.cgi?id=104941
> 
> shard-apl        total:3467 pass:1822 dwarn:1   dfail:0   fail:10  skip:1633 time:12344s
> shard-hsw        total:3467 pass:1772 dwarn:1   dfail:0   fail:2   skip:1691 time:11981s
> shard-snb        total:3467 pass:1363 dwarn:3   dfail:0   fail:1   skip:2100 time:7138s
> Blacklisted hosts:
> shard-kbl        total:3365 pass:1880 dwarn:1   dfail:0   fail:9   skip:1472 time:8609s
> 
> == Logs ==
> 
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1074/shards.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH i-g-t v2] tests/perf_pmu: Use absolute tolerance in accuracy tests
  2018-03-07 11:34   ` [igt-dev] " Chris Wilson
@ 2018-03-09 11:54     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-03-09 11:54 UTC (permalink / raw)
  To: igt-dev; +Cc: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We need to use absolute tolerance when asserting on percentages. Relative
tolerance in this case is unfair and inaccurate since its strictness
varies with relative target busyness.

v2:
 * Do not include spin batch edit and submit into measured time.
 * Open PMU before child is in test PWM phase.
 * No need to emit test PWM for twice as long with the new explicit
   synchronization via pipe.
 * Log test duration in ms for better readability.
 * Drop inverse assert. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
---
 tests/perf_pmu.c | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 9ebffc64d1f1..ff9f71540ee4 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -1459,7 +1459,15 @@ static void __rearm_spin_batch(igt_spin_t *spin)
        __sync_synchronize();
 }
 
-#define div_round_up(a, b) (((a) + (b) - 1) / (b))
+#define __assert_within(x, ref, tol_up, tol_down) \
+	igt_assert_f((double)(x) <= ((double)(ref) + (tol_up)) && \
+		     (double)(x) >= ((double)(ref) - (tol_down)), \
+		     "%f not within +%f/-%f of %f! ('%s' vs '%s')\n", \
+		     (double)(x), (double)(tol_up), (double)(tol_down), \
+		     (double)(ref), #x, #ref)
+
+#define assert_within(x, ref, tolerance) \
+	__assert_within(x, ref, tolerance, tolerance)
 
 static void
 accuracy(int gem_fd, const struct intel_execution_engine2 *e,
@@ -1493,8 +1501,8 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 	while (test_us < min_test_us)
 		test_us += busy_us + idle_us;
 
-	igt_info("calibration=%luus, test=%luus; ratio=%.2f%% (%luus/%luus)\n",
-		 pwm_calibration_us, test_us,
+	igt_info("calibration=%lums, test=%lums; ratio=%.2f%% (%luus/%luus)\n",
+		 pwm_calibration_us / 1000, test_us / 1000,
 		 (double)busy_us / (busy_us + idle_us) * 100.0,
 		 busy_us, idle_us);
 
@@ -1507,7 +1515,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_fork(child, 1) {
 		struct sched_param rt = { .sched_priority = 99 };
 		const unsigned long timeout[] = {
-			pwm_calibration_us * 1000, test_us * 2 * 1000
+			pwm_calibration_us * 1000, test_us * 1000
 		};
 		struct drm_i915_gem_exec_object2 obj = {};
 		uint64_t total_busy_ns = 0, total_idle_ns = 0;
@@ -1537,19 +1545,16 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 
 			igt_nsec_elapsed(&test_start);
 			do {
-				struct timespec t_busy = { };
-				unsigned int target_idle_us;
-
-				igt_nsec_elapsed(&t_busy);
+				unsigned int target_idle_us, t_busy;
 
 				/* Restart the spinbatch. */
 				__rearm_spin_batch(spin);
 				__submit_spin_batch(gem_fd, &obj, e);
-				measured_usleep(busy_us);
+				t_busy = measured_usleep(busy_us);
 				igt_spin_batch_end(spin);
 				gem_sync(gem_fd, obj.handle);
 
-				total_busy_ns += igt_nsec_elapsed(&t_busy);
+				total_busy_ns += t_busy;
 
 				target_idle_us =
 					(100 * total_busy_ns / target_busy_pct - (total_busy_ns + total_idle_ns)) / 1000;
@@ -1569,12 +1574,13 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 		igt_spin_batch_free(gem_fd, spin);
 	}
 
+	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
+
 	/* Let the child run. */
 	read(link[0], &expected, sizeof(expected));
-	assert_within_epsilon(expected, target_busy_pct/100., 0.05);
+	assert_within(100.0 * expected, target_busy_pct, 5);
 
 	/* Collect engine busyness for an interesting part of child runtime. */
-	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 	val[0] = __pmu_read_single(fd, &ts[0]);
 	read(link[0], &expected, sizeof(expected));
 	val[1] = __pmu_read_single(fd, &ts[1]);
@@ -1590,8 +1596,7 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_info("error=%.2f%% (%.2f%% vs %.2f%%)\n",
 		 __error(busy_r, expected), 100 * busy_r, 100 * expected);
 
-	assert_within_epsilon(busy_r, expected, 0.15);
-	assert_within_epsilon(1 - busy_r, 1 - expected, 0.15);
+	assert_within(100.0 * busy_r, 100.0 * expected, 2);
 }
 
 igt_main
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
  2018-03-07 11:11 ` [igt-dev] " Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  (?)
@ 2018-03-09 12:48 ` Patchwork
  -1 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-03-09 12:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
URL   : https://patchwork.freedesktop.org/series/39518/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
b4689dce36d0fbd9aec70d5a4b077c43a6b9c254 igt: Remove gen7_forcewake_mt

with latest DRM-Tip kernel build CI_DRM_3904
074e834cb3cc drm-tip: 2018y-03m-09d-10h-30m-56s UTC integration manifest

No testlist changes.

---- Known issues:

Test kms_frontbuffer_tracking:
        Subgroup basic:
                fail       -> PASS       (fi-cnl-y3) fdo#103167
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-skl-6700k2) fdo#103191
Test prime_vgem:
        Subgroup basic-fence-flip:
                pass       -> FAIL       (fi-ilk-650) fdo#104008

fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:425s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:429s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:373s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:504s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:279s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:493s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:490s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:483s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:472s
fi-cfl-8700k     total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:406s
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:576s
fi-cnl-y3        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:582s
fi-elk-e7500     total:288  pass:229  dwarn:0   dfail:0   fail:0   skip:59  time:415s
fi-gdg-551       total:288  pass:179  dwarn:0   dfail:0   fail:1   skip:108 time:290s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:517s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:400s
fi-ilk-650       total:288  pass:227  dwarn:0   dfail:0   fail:1   skip:60  time:414s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:464s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:418s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:478s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:469s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:506s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:593s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:430s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:523s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:535s
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:501s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:481s
fi-skl-guc       total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:423s
fi-snb-2520m     total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:515s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:391s
Blacklisted hosts:
fi-cnl-drrs      total:288  pass:257  dwarn:3   dfail:0   fail:0   skip:19  time:521s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1094/issues.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: failure for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
  2018-03-07 11:11 ` [igt-dev] " Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  (?)
@ 2018-03-09 16:20 ` Patchwork
  2018-03-09 16:45   ` Tvrtko Ursulin
  -1 siblings, 1 reply; 16+ messages in thread
From: Patchwork @ 2018-03-09 16:20 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
URL   : https://patchwork.freedesktop.org/series/39518/
State : failure

== Summary ==

---- Possible new issues:

Test kms_cursor_crc:
        Subgroup cursor-64x64-random:
                pass       -> INCOMPLETE (shard-hsw)

---- Known issues:

Test gem_eio:
        Subgroup in-flight:
                incomplete -> PASS       (shard-apl) fdo#105341
Test kms_cursor_crc:
        Subgroup cursor-256x256-suspend:
                incomplete -> PASS       (shard-hsw) fdo#103375 +1
Test kms_vblank:
        Subgroup pipe-a-ts-continuation-dpms-suspend:
                incomplete -> PASS       (shard-hsw) fdo#103540

fdo#105341 https://bugs.freedesktop.org/show_bug.cgi?id=105341
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540

shard-apl        total:3467 pass:1825 dwarn:1   dfail:0   fail:8   skip:1632 time:12322s
shard-hsw        total:3421 pass:1754 dwarn:1   dfail:0   fail:1   skip:1663 time:10826s
shard-snb        total:3467 pass:1364 dwarn:1   dfail:0   fail:2   skip:2100 time:7013s
Blacklisted hosts:
shard-kbl        total:3281 pass:1838 dwarn:9   dfail:0   fail:9   skip:1423 time:8774s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1094/shards.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [igt-dev] ✗ Fi.CI.IGT: failure for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
  2018-03-09 16:20 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-03-09 16:45   ` Tvrtko Ursulin
  2018-03-09 17:09     ` Chris Wilson
  0 siblings, 1 reply; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-03-09 16:45 UTC (permalink / raw)
  To: igt-dev, Patchwork, Tvrtko Ursulin


Much better accuracy with these tweaks.

Looks like WC writes and ioctls were slow and were affecting the
self-calibration. I can't explain, though, why the 50% tests were the
most affected, especially compared with the 2% ones. Shrug.

Whether it will be good enough for KASAN-enabled runs, I don't know.

Regards,

Tvrtko

On 09/03/2018 16:20, Patchwork wrote:
> == Series Details ==
> 
> Series: tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
> URL   : https://patchwork.freedesktop.org/series/39518/
> State : failure
> 
> == Summary ==
> 
> ---- Possible new issues:
> 
> Test kms_cursor_crc:
>          Subgroup cursor-64x64-random:
>                  pass       -> INCOMPLETE (shard-hsw)
> 
> ---- Known issues:
> 
> Test gem_eio:
>          Subgroup in-flight:
>                  incomplete -> PASS       (shard-apl) fdo#105341
> Test kms_cursor_crc:
>          Subgroup cursor-256x256-suspend:
>                  incomplete -> PASS       (shard-hsw) fdo#103375 +1
> Test kms_vblank:
>          Subgroup pipe-a-ts-continuation-dpms-suspend:
>                  incomplete -> PASS       (shard-hsw) fdo#103540
> 
> fdo#105341 https://bugs.freedesktop.org/show_bug.cgi?id=105341
> fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
> fdo#103540 https://bugs.freedesktop.org/show_bug.cgi?id=103540
> 
> shard-apl        total:3467 pass:1825 dwarn:1   dfail:0   fail:8   skip:1632 time:12322s
> shard-hsw        total:3421 pass:1754 dwarn:1   dfail:0   fail:1   skip:1663 time:10826s
> shard-snb        total:3467 pass:1364 dwarn:1   dfail:0   fail:2   skip:2100 time:7013s
> Blacklisted hosts:
> shard-kbl        total:3281 pass:1838 dwarn:9   dfail:0   fail:9   skip:1423 time:8774s
> 
> == Logs ==
> 
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1094/shards.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [igt-dev] ✗ Fi.CI.IGT: failure for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
  2018-03-09 16:45   ` Tvrtko Ursulin
@ 2018-03-09 17:09     ` Chris Wilson
  2018-03-09 17:37       ` Tvrtko Ursulin
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2018-03-09 17:09 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev, Patchwork, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-03-09 16:45:31)
> 
> Much better accuracy with these tweaks.
> 
> Looks like WC writes and ioctls were slow and were affecting the
> self-calibration. I can't explain, though, why the 50% tests were the
> most affected, especially compared with the 2% ones. Shrug.

loop duration for 2%: 2500 + 122500 = 125000us
loop duration for 50%: 2500 + 2500 = 5000us

So 25x more ioctls at 50%.

Bound to be the ioctls, scary. We need to track down the cause of that.
A latency histogram just to see the distribution? kcov tracing for the
outliers? Though ftrace is probably better, if we assume that it's
likely outside forces (the path through i915 should be pretty static --
or is it???).

Something like capture an ftrace snippet for each ioctl (having
reproduced something that shows the spikes or whatever); then throw away
all traces that lie within the normal distribution and look for patterns
in the outliers?
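
A rough sketch of that filtering step, assuming the per-ioctl latencies
have already been collected into an array. find_outliers() is a
hypothetical helper, and the k-sigma cut-off is just one possible
definition of "outlier", not something settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: store the indices of samples lying more than k
 * standard deviations away from the mean into out[] (at most out_max of
 * them) and return how many were stored. Squared deviations are compared
 * so no square root is needed.
 */
static size_t find_outliers(const double *lat_us, size_t n, double k,
			    size_t *out, size_t out_max)
{
	double mean = 0.0, var = 0.0;
	size_t i, found = 0;

	assert(lat_us != NULL || n == 0);
	if (n == 0)
		return 0;

	for (i = 0; i < n; i++)
		mean += lat_us[i];
	mean /= (double)n;

	for (i = 0; i < n; i++)
		var += (lat_us[i] - mean) * (lat_us[i] - mean);
	var /= (double)n;

	for (i = 0; i < n; i++) {
		double d = lat_us[i] - mean;

		if (d * d > k * k * var && found < out_max)
			out[found++] = i;
	}

	return found;
}
```

Only the traces whose indices come back would be kept for a closer look.
Note that a single extreme sample inflates the standard deviation itself,
so a small k (around 2) may be needed for short runs.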

Hmm, my guess would be ksoftirqd. If the submit wasn't immediate then it
would only be run when the RT calibration thread slept (the submit will
be from the same cpu because it's a tasklet). Ho hum, that sounds very
plausible.
-Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [igt-dev] ✗ Fi.CI.IGT: failure for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)
  2018-03-09 17:09     ` Chris Wilson
@ 2018-03-09 17:37       ` Tvrtko Ursulin
  0 siblings, 0 replies; 16+ messages in thread
From: Tvrtko Ursulin @ 2018-03-09 17:37 UTC (permalink / raw)
  To: Chris Wilson, igt-dev, Patchwork, Tvrtko Ursulin


On 09/03/2018 17:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-09 16:45:31)
>>
>> Much better accuracy with these tweaks.
>>
>> Looks like WC writes and ioctls were slow and were affecting the
>> self-calibration. I can't explain, though, why the 50% tests were the
>> most affected, especially compared with the 2% ones. Shrug.
> 
> loop duration for 2%: 2500 + 122500 = 125000us
> loop duration for 50%: 2500 + 2500 = 5000us
> 
> So 25x more ioctls at 50%.
> 
> Bound to be the ioctls, scary. We need to track down the cause of that.
> A latency histogram just to see the distribution? kcov tracing for the
> outliers? Though ftrace is probably better, if we assume that it's
> likely outside forces (the path through i915 should be pretty static --
> or is it???).
> 
> Something like capture an ftrace snippet for each ioctl (having
> reproduced something that shows the spikes or whatever); then throw away
> all traces that lie within the normal distribution and look for patterns
> in the outliers?
> 
> Hmm, my guess would be ksoftirqd. If the submit wasn't immediate then it
> would only be run when the RT calibration thread slept (the submit will
> be from the same cpu because it's a tasklet). Ho hum, that sounds very
> plausible.

Yes, tasklet delays. I even realized, well, suspected that while
tweaking the test, but by the time the shard results came in it was long
forgotten. :(

The huge difference in the number of execbufs did not strike me though,
well spotted. There are fewest towards the edges and most in the middle,
so it makes perfect sense. (98% comes up as 160000us busy, 3264us idle.)
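
For reference, a sketch of period arithmetic that reproduces the numbers
quoted in this thread. The pwm_period_for() helper and its struct are
hypothetical, and the 2500us minimum and the integer rounding are
assumptions picked to match the figures above rather than taken from the
test source.

```c
#include <assert.h>

struct pwm_period {
	unsigned long busy_us;
	unsigned long idle_us;
};

/* Hypothetical helper: busy/idle split of one PWM cycle for a target. */
static struct pwm_period pwm_period_for(unsigned int target_busy_pct)
{
	struct pwm_period p;

	assert(target_busy_pct > 0 && target_busy_pct < 100);

	/* Start from a 2500us busy phase and derive the matching idle time. */
	p.busy_us = 2500;
	p.idle_us = 100 * (p.busy_us - target_busy_pct * p.busy_us / 100) /
		    target_busy_pct;

	/* Assumed rule: stretch the cycle until idle is at least 2500us. */
	while (p.idle_us < 2500) {
		p.busy_us *= 2;
		p.idle_us *= 2;
	}

	return p;
}

/* One batch submission per cycle, so this is also submissions per second. */
static unsigned long cycles_per_second(struct pwm_period p)
{
	return 1000000UL / (p.busy_us + p.idle_us);
}
```

pwm_period_for(2) gives 2500us busy and 122500us idle (8 cycles per
second), while pwm_period_for(50) gives 2500us and 2500us (200 cycles per
second), hence the 25x difference in submissions.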

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH i-g-t v2] tests/perf_pmu: Use absolute tolerance in accuracy tests
  2018-03-09 11:54     ` [Intel-gfx] " Tvrtko Ursulin
@ 2018-03-09 20:49       ` Chris Wilson
  -1 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2018-03-09 20:49 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Intel-gfx

Quoting Tvrtko Ursulin (2018-03-09 11:54:13)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We need to use absolute tolerance when asserting on percentages. Relative
> tolerance in this case is unfair and inaccurate since its strictness
> varies with relative target busyness.
> 
> v2:
>  * Do not include spin batch edit and submit into measured time.
>  * Open PMU before child is in test PWM phase.
>  * No need to emit test PWM for twice as long with the new explicit
>    synchronization via pipe.
>  * Log test duration in ms for better readability.
>  * Drop inverse assert. (Chris Wilson)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Would be nice to add a comment now we have a reasonable suspicion:

> @@ -1537,19 +1545,16 @@ accuracy(int gem_fd, const struct intel_execution_engine2 *e,
>  
>                         igt_nsec_elapsed(&test_start);
>                         do {
> -                               struct timespec t_busy = { };
> -                               unsigned int target_idle_us;
> -
> -                               igt_nsec_elapsed(&t_busy);
> +                               unsigned int target_idle_us, t_busy;
>  
>                                 /* Restart the spinbatch. */
>                                 __rearm_spin_batch(spin);
>                                 __submit_spin_batch(gem_fd, &obj, e);

/*
 * Note that the submission may be delayed to a tasklet (ksoftirqd)
 * which cannot run until we sleep as we hog the cpu (we are RT).
 */

> -                               measured_usleep(busy_us);
> +                               t_busy = measured_usleep(busy_us);
>                                 igt_spin_batch_end(spin);
>                                 gem_sync(gem_fd, obj.handle);

And back to thinking how we can kick the tasklet, or kick the habit.
-Chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

Thread overview: 16+ messages
2018-03-07 11:11 [PATCH i-g-t] tests/perf_pmu: Use absolute tolerance in accuracy tests Tvrtko Ursulin
2018-03-07 11:11 ` [igt-dev] " Tvrtko Ursulin
2018-03-07 11:34 ` Chris Wilson
2018-03-07 11:34   ` [igt-dev] " Chris Wilson
2018-03-09 11:54   ` [PATCH i-g-t v2] " Tvrtko Ursulin
2018-03-09 11:54     ` [Intel-gfx] " Tvrtko Ursulin
2018-03-09 20:49     ` Chris Wilson
2018-03-09 20:49       ` [igt-dev] " Chris Wilson
2018-03-07 11:48 ` [igt-dev] ✓ Fi.CI.BAT: success for " Patchwork
2018-03-07 12:35 ` [igt-dev] ✗ Fi.CI.IGT: warning " Patchwork
2018-03-07 14:27   ` Tvrtko Ursulin
2018-03-09 12:48 ` [igt-dev] ✓ Fi.CI.BAT: success for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2) Patchwork
2018-03-09 16:20 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork
2018-03-09 16:45   ` Tvrtko Ursulin
2018-03-09 17:09     ` Chris Wilson
2018-03-09 17:37       ` Tvrtko Ursulin
