From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Tvrtko Ursulin <tursulin@ursulin.net>,
	igt-dev@lists.freedesktop.org
Cc: Intel-gfx@lists.freedesktop.org
Subject: Re: [igt-dev] [PATCH i-g-t 2/3] tests/gem_eio: Speed up test execution
Date: Thu, 22 Mar 2018 12:36:58 +0000	[thread overview]
Message-ID: <499c5ae6-900f-0958-149e-036ba0d9de7a@linux.intel.com> (raw)
In-Reply-To: <152171875518.23562.12227602056261860847@mail.alporthouse.com>


On 22/03/2018 11:39, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-22 11:17:11)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> If we stop relying on regular GPU hangs to be detected, but trigger them
>> manually as soon as we know our batch of interest is actually executing
>> on the GPU, we can dramatically speed up various subtests.
>>
>> This is enabled by the pollable spin batch added in the previous patch.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Antonio Argenziano <antonio.argenziano@intel.com>
>> ---
>> Note that the 'wait' subtest is mysteriously hanging for me in the no-op
>> batch sent by gem_test_engines, but only on the RCS engine. TBD while I am
>> getting some CI results.
>> ---
>>   lib.tar         | Bin 0 -> 102400 bytes
>>   tests/gem_eio.c |  97 ++++++++++++++++++++++++++++++++++++++++----------------
>>   2 files changed, 70 insertions(+), 27 deletions(-)
>>   create mode 100644 lib.tar
>>
>> diff --git a/lib.tar b/lib.tar
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ea04fad219a87f2e975852989526f8da4c9b7d6d
>> GIT binary patch
>> literal 102400
>> zcmeHw>vkJQlBWMsPf=!{kwJ=gUAkMAJc3A2!kPrQASAWM<5LF&3M57#fW}3X+V;H9
> 
> Looks correct.

Just backing up in the cloud. :))

>> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
>> index 4bcc5937db39..93400056124b 100644
>> --- a/tests/gem_eio.c
>> +++ b/tests/gem_eio.c
>> @@ -71,26 +71,23 @@ static void trigger_reset(int fd)
>>          gem_quiescent_gpu(fd);
>>   }
>>   
>> -static void wedge_gpu(int fd)
>> +static void manual_hang(int drm_fd)
>>   {
>> -       /* First idle the GPU then disable GPU resets before injecting a hang */
>> -       gem_quiescent_gpu(fd);
>> -
>> -       igt_require(i915_reset_control(false));
>> +       int dir = igt_debugfs_dir(drm_fd);
>>   
>> -       igt_debug("Wedging GPU by injecting hang\n");
>> -       igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT));
>> +       igt_sysfs_set(dir, "i915_wedged", "-1");
>>   
>> -       igt_assert(i915_reset_control(true));
>> +       close(dir);
>>   }
> 
> Ok.
> 
>> -static void wedgeme(int drm_fd)
>> +static void wedge_gpu(int fd)
>>   {
>> -       int dir = igt_debugfs_dir(drm_fd);
>> -
>> -       igt_sysfs_set(dir, "i915_wedged", "-1");
>> +       /* First idle the GPU then disable GPU resets before injecting a hang */
>> +       gem_quiescent_gpu(fd);
>>   
>> -       close(dir);
>> +       igt_require(i915_reset_control(false));
>> +       manual_hang(fd);
>> +       igt_assert(i915_reset_control(true));
>>   }
> 
> Ok.

Well done reading that awful diff!

>>   
>>   static int __gem_throttle(int fd)
>> @@ -149,29 +146,66 @@ static int __gem_wait(int fd, uint32_t handle, int64_t timeout)
>>          return err;
>>   }
>>   
>> +static igt_spin_t * __spin_poll(int fd, uint32_t ctx, unsigned long flags)
>> +{
>> +       if (gem_can_store_dword(fd, flags))
>> +               return __igt_spin_batch_new_poll(fd, ctx, flags);
>> +       else
>> +               return __igt_spin_batch_new(fd, ctx, flags, 0);
>> +}
>> +
>> +static void __spin_wait(int fd, igt_spin_t *spin)
>> +{
>> +       if (spin->running) {
>> +               while (!*((volatile bool *)spin->running))
>> +                       ;
>> +       } else {
>> +               igt_debug("__spin_wait - usleep mode\n");
>> +               usleep(500e3); /* Better than nothing! */
>> +       }
>> +}
>> +
>> +/*
>> + * Wedge the GPU when we know our batch is running.
>> + */
>> +static void wedge_after_running(int fd, igt_spin_t *spin)
>> +{
>> +       __spin_wait(fd, spin);
>> +       manual_hang(fd);
>> +}
>> +
>>   static void test_wait(int fd)
>>   {
>> -       igt_hang_t hang;
>> +       struct timespec ts = { };
>> +       igt_spin_t *hang;
>>   
>>          igt_require_gem(fd);
>>   
>> +       igt_nsec_elapsed(&ts);
>> +
>>          /* If the request we wait on completes due to a hang (even for
>>           * that request), the user expects the return value to be 0 (success).
>>           */
>> -       hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
>> -       igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
>> -       igt_post_hang_ring(fd, hang);
>> +       igt_require(i915_reset_control(true));
>> +       hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
>> +       wedge_after_running(fd, hang);
>> +       igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
>> +       igt_spin_batch_free(fd, hang);
> 
>>   
>>          /* If the GPU is wedged during the wait, again we expect the return
>>           * value to be 0 (success).
>>           */
>>          igt_require(i915_reset_control(false));
>> -       hang = igt_hang_ring(fd, I915_EXEC_DEFAULT);
>> -       igt_assert_eq(__gem_wait(fd, hang.handle, -1), 0);
>> -       igt_post_hang_ring(fd, hang);
>> +       hang = __spin_poll(fd, 0, I915_EXEC_DEFAULT);
>> +       wedge_after_running(fd, hang);
>> +       igt_assert_eq(__gem_wait(fd, hang->handle, -1), 0);
>> +       igt_spin_batch_free(fd, hang);
>>          igt_require(i915_reset_control(true));
> 
> Hmm. These are not equivalent to the original test. The test
> requires hangcheck to kick in while the test is blocked on igt_wait.
> To do a fast equivalent, we need to kick off a timer. (Here we are just
> asking if a wait on an already completed request doesn't block, not how
> we handle the reset in the middle of a wait. Seems a reasonable addition
> though.)
> 
> I think that's a general pattern worth repeating for the rest of tests:
> don't immediately inject the hang, but leave it a few milliseconds to
> allow us to block on the subsequent wait. I would even repeat the tests
> a few times with different timeouts; 0, 1us, 10ms (thinking of the
> different phases for i915_request_wait).

True, it's not the same.

Makes sense to test with different timeouts. Will do.

> 
>>          trigger_reset(fd);
>> +
>> +       /* HACK for CI */
>> +       igt_assert(igt_nsec_elapsed(&ts) < 5e9);
> 
> igt_seconds_elapsed(); the approximation is worth it for the readability.
> 
> In this case you might like to try igt_set_timeout(), as I think each
> subtest and exithandlers are in place to make them robust against
> premature failures.

Well, this was just to see what will happen on the shards here. As
mentioned in the commit message, I get that yet-unexplained GPU hang at
subtest exit here. So the assert above is just there to notice if the
same happens on the shards.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
