From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, Tvrtko Ursulin <tursulin@ursulin.net>, igt-dev@lists.freedesktop.org
Cc: Intel-gfx@lists.freedesktop.org
Subject: Re: [igt-dev] [PATCH i-g-t 2/3] tests/gem_eio: Speed up test execution
Date: Thu, 22 Mar 2018 12:36:58 +0000
Message-ID: <499c5ae6-900f-0958-149e-036ba0d9de7a@linux.intel.com>
In-Reply-To: <152171875518.23562.12227602056261860847@mail.alporthouse.com>

On 22/03/2018 11:39, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-22 11:17:11)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> If we stop relying on regular GPU hangs to be detected, but trigger them
>> manually as soon as we know our batch of interest is actually executing
>> on the GPU, we can dramatically speed up various subtests.
>>
>> This is enabled by the pollable spin batch added in the previous patch.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Antonio Argenziano <antonio.argenziano@intel.com>
>> ---
>> Note that the 'wait' subtest is mysteriously hanging for me in the no-op
>> batch sent by gem_test_engines, but only on the RCS engine. TBD while I am
>> getting some CI results.
>> ---
>>  lib.tar         | Bin 0 -> 102400 bytes
>>  tests/gem_eio.c |  97 ++++++++++++++++++++++++++++++++++++++++----------------
>>  2 files changed, 70 insertions(+), 27 deletions(-)
>>  create mode 100644 lib.tar
>>
>> diff --git a/lib.tar b/lib.tar
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ea04fad219a87f2e975852989526f8da4c9b7d6d
>> GIT binary patch
>> literal 102400
>> zcmeHw>vkJQlBWMsPf=!{kwJ=gUAkMAJc3A2!kPrQASAWM<5LF&3M57#fW}3X+V;H9
>
> Looks correct. Just backing up in the cloud.
:))

>> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
>> index 4bcc5937db39..93400056124b 100644
>> --- a/tests/gem_eio.c
>> +++ b/tests/gem_eio.c
>> @@ -71,26 +71,23 @@ static void trigger_reset(int fd)
>>  	gem_quiescent_gpu(fd);
>>  }
>>
>> -static void wedge_gpu(int fd)
>> +static void manual_hang(int drm_fd)
>>  {
>> -	/* First idle the GPU then disable GPU resets before injecting a hang */
>> -	gem_quiescent_gpu(fd);
>> -
>> -	igt_require(i915_reset_control(false));
>> +	int dir = igt_debugfs_dir(drm_fd);
>>
>> -	igt_debug("Wedging GPU by injecting hang\n");
>> -	igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT));
>> +	igt_sysfs_set(dir, "i915_wedged", "-1");
>>
>> -	igt_assert(i915_reset_control(true));
>> +	close(dir);
>>  }
>
> Ok.
>
>> -static void wedgeme(int drm_fd)
>> +static void wedge_gpu(int fd)
>>  {
>> -	int dir = igt_debugfs_dir(drm_fd);
>> -
>> -	igt_sysfs_set(dir, "i915_wedged", "-1");
>> +	/* First idle the GPU then disable GPU resets before injecting a hang */
>> +	gem_quiescent_gpu(fd);
>>
>> -	close(dir);
>> +	igt_require(i915_reset_control(false));
>> +	manual_hang(fd);
>> +	igt_assert(i915_reset_control(true));
>>  }
>
> Ok. Well done reading that awful diff!

>>
>>  static int __gem_throttle(int fd)
>> @@ -149,29 +146,66 @@ static int __gem_wait(int fd, uint32_t handle, int64_t timeout)
>>  	return err;
>>  }
>>
>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
>> +{
>> +	if (gem_can_store_dword(fd, flags))
>> +		return __igt_spin_batch_new_poll(fd, ctx, flags);
>> +	else
>> +		return __igt_spin_batch_new(fd, ctx, flags, 0);
>> +}
>> +
>> +static void __spin_wait(int fd, igt_spin_t *spin)
>> +{
>> +	if (spin->running) {
>> +		while (!*((volatile bool *)spin->running))
>> +			;
>> +	} else {
>> +		igt_debug("__spin_wait - usleep mode\n");
>> +		usleep(500e3); /* Better than nothing! */
>> +	}
>> +}
>> +
>> +/*
>> + * Wedge the GPU when we know our batch is running.
>> + */
>> +static void wedge_after_running(int fd, igt_spin_t *spin)
>> +{
>> +	__spin_wait(fd, spin);
>> +	manual_hang(fd);
>> +}
>> +
>>  static void test_wait(int fd)
>>  {
>> -	igt_hang_t hang;
>> +	struct timespec ts = { };
>> +	igt_spin_t *hang;
>>
>>  	igt_require_gem(fd);
>>
>> +	igt_nsec_elapsed(&ts);
>> +
>>  	/* If the request we wait on completes due to a hang (even for
>>  	 * that request), the user expects the return value to 0 (success).
>>  	 */
>> -	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
>> -	igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
>> -	igt_post_hang_ring(fd, hang);
>> +	igt_require(i915_reset_control(true));
>> +	hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
>> +	wedge_after_running(fd, hang);
>> +	igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
>> +	igt_spin_batch_free(fd, hang);
>
>>
>>  	/* If the GPU is wedged during the wait, again we expect the return
>>  	 * value to be 0 (success).
>>  	 */
>>  	igt_require(i915_reset_control(false));
>> -	hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
>> -	igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
>> -	igt_post_hang_ring(fd, hang);
>> +	hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
>> +	wedge_after_running(fd, hang);
>> +	igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
>> +	igt_spin_batch_free(fd, hang);
>>  	igt_require(i915_reset_control(true));
>
> Hmm. These are not equivalent to the original test. The test
> requires hangcheck to kick in while the test is blocked on igt_wait.
> To do a fast equivalent, we need to kick off a timer. (Here we are just
> asking if a wait on an already completed request doesn't block, not how
> we handle the reset in the middle of a wait. Seems a reasonable addition
> though.)
>
> I think that's a general pattern worth repeating for the rest of tests:
> don't immediately inject the hang, but leave it a few milliseconds to
> allow us to block on the subsequent wait.
> I would even repeat the tests
> a few times with different timeouts; 0, 1us, 10ms (thinking of the
> different phases for i915_request_wait).

True, it's not the same. Makes sense to test with different timeouts.
Will do.

>
>>  	trigger_reset(fd);
>> +
>> +	/* HACK for CI */
>> +	igt_assert(igt_nsec_elapsed(&ts) < 5e9);
>
> igt_seconds_elapsed(): the approximation is worth the readability.
>
> In this case you might like to try igt_set_timeout(), as I think each
> subtest and exit handlers are in place to make them robust against
> premature failures.

Well, this was just to see what will happen on the shards. As mentioned
in the commit message, I get that as-yet unexplained GPU hang at subtest
exit here, so the assert above is just there to notice if the same
happens on the shards.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx