From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
To: Karolina Drobnik <karolina.drobnik@intel.com>
Cc: igt-dev@lists.freedesktop.org,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: Re: [igt-dev] [PATCH i-g-t v2 1/2] tests/gem_exec_fence: Check stored values only for valid workloads
Date: Fri, 29 Jul 2022 17:23:01 +0200
Message-ID: <2595594.X9hSmTKtgW@jkrzyszt-mobl1.ger.corp.intel.com>
In-Reply-To: <c816892d-d233-a825-417f-48df8557d96b@intel.com>

On Friday, 29 July 2022 13:32:37 CEST Karolina Drobnik wrote:
> Hi Janusz,
> 
> On 29.07.2022 10:24, Janusz Krzysztofik wrote:
> > Hi Karolina,
> > 
> > On Friday, 29 July 2022 09:38:43 CEST Karolina Drobnik wrote:
> >> Hi Janusz,
> >>
> >> Thanks a lot for taking a look at the patch.
> >>
> >> On 28.07.2022 18:56, Janusz Krzysztofik wrote:
> >>> On Tuesday, 26 July 2022 12:13:11 CEST Karolina Drobnik wrote:
> >>>> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>
> >>>> test_fence_await verifies if a fence used to pipeline work signals
> >>>> correctly. await-hang and nb-await-hang test cases inject GPU hang,
> >>>> which causes an erroneous state, meaning the fence will be signaled
> >>>> without execution. The error causes an instant reset of the command
> >>>> streamer for the hanging workload. This revealed a problem with how
> >>>> we verify the fence state and results. The test assumes that the
> >>>> error notification happens after a hang is declared, which takes
> >>>> longer than submitting the next set of fences, making the test pass
> >>>> every time. With the immediate reset, this might not happen, so the
> >>>> assertion fails, as there are no active fences in the GPU hang case.
> >>>>
> >>>> Move the check for active fence to the path for non-hanging workload,
> >>>> and verify results only in this scenario.
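A minimal model of the race described in the commit message, in plain C. All function names are hypothetical (this is not IGT code), and the only timings used are the ones quoted in the test's own comment: a 50 ms sleep against a typical 150+ ms preempt reset. It shows why an unconditional fence_busy() assert can fail once a hanging spinner is reset immediately, and why gating it on HANG avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: is the spinner's out-fence still busy when the test
 * checks it, sleep_ms after submission?
 * - a non-hanging spinner runs until igt_spin_end(), so it is always busy;
 * - a hanging spinner's fence signals once the hang is handled, reset_ms
 *   after submission (reset_ms == 0 models an immediate reset).
 */
static bool fence_busy_at_check(bool hang, unsigned int sleep_ms,
				unsigned int reset_ms)
{
	assert(sleep_ms > 0); /* the test always sleeps before checking */
	if (!hang)
		return true;
	return reset_ms > sleep_ms;
}

/* Before the patch: the busy assert ran for every variant. */
static bool old_check_passes(bool hang, unsigned int sleep_ms,
			     unsigned int reset_ms)
{
	return fence_busy_at_check(hang, sleep_ms, reset_ms);
}

/* After the patch: the busy assert only runs when HANG is not set. */
static bool new_check_passes(bool hang, unsigned int sleep_ms,
			     unsigned int reset_ms)
{
	return hang || fence_busy_at_check(hang, sleep_ms, reset_ms);
}
```

With the 50 ms sleep, old_check_passes(true, 50, 150) holds only because the reset is slower than the sleep; old_check_passes(true, 50, 0) fails, while new_check_passes() holds in both cases.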
> >>>>
> >>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>> Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
> >>>> ---
> >>>>    tests/i915/gem_exec_fence.c | 14 ++++++++------
> >>>>    1 file changed, 8 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/tests/i915/gem_exec_fence.c b/tests/i915/gem_exec_fence.c
> >>>> index d46914c2..260aa82c 100644
> >>>> --- a/tests/i915/gem_exec_fence.c
> >>>> +++ b/tests/i915/gem_exec_fence.c
> >>>> @@ -350,18 +350,20 @@ static void test_fence_await(int fd, const intel_ctx_t *ctx,
> >>>>    	/* Long, but not too long to anger preemption disable checks */
> >>>>    	usleep(50 * 1000); /* 50 ms, typical preempt reset is 150+ms */
> >>>>    
> >>>> -	/* Check for invalidly completing the task early */
> >>>> -	igt_assert(fence_busy(spin->out_fence));
> >>>> -	for (int n = 0; n < i; n++)
> >>>> -		igt_assert_eq_u32(out[n], 0);
> >>>> +	if ((flags & HANG) == 0) {
> >>>> +		/* Check for invalidly completing the task early */
> >>>> +		igt_assert(fence_busy(spin->out_fence));
> >>>> +		for (int n = 0; n < i; n++)
> >>>> +			igt_assert_eq_u32(out[n], 0);
> >>>
> >>> AFAICU, in the 'hang' variant of the scenario we skip the above asserts
> >>> because the spin batch could have already hung, its out fence already
> >>> signalled, and the store batches waiting for that signal already executed.
> >>> If that's the case, how does this variant of the gem_exec_fence test assert
> >>> that the fence actually worked as expected?
> >>
> >> With this change, yes, we would skip them. Still, the store batches
> >> wouldn't be executed, as we reset the CS on hang as a part of the error
> >> handling. For valid jobs, we expect to (1) see an active fence at the
> >> beginning of the request, (2) get a signaled fence after the wait, (3)
> >> store out[i] == i. With a hang, (1) and (3) would be false.
> >>
> >> In this particular loop, with a hang we could see garbage here rather than
> >> 0s (although from my tests it looks like they are zeroed, but maybe I got
> >> lucky).
> > 
> > OK, so I missed the fact that the store batches won't be executed at all due
> > to the reset of the whole command streamer, which also kills those batches.
> > But my question is still valid: once we omit those checks as not valid for
> > the *await-hang variants, how do those variants still exercise fencing?  IOW,
> > how are those variants supposed to ever fail should something be wrong with
> > the i915 implementation of fencing specifically?
> 
> They would fail in the case where a hang happened but the fence is still 
> active, so it's this last assert you're referring to.
> 
> >>
> >>>>    
> >>>> -	if ((flags & HANG) == 0)
> >>>>    		igt_spin_end(spin);
> >>>> +	}
> >>>>    
> >>>>    	igt_waitchildren();
> >>>>    
> >>>>    	gem_set_domain(fd, scratch, I915_GEM_DOMAIN_GTT, 0);
> >>>> -	while (i--)
> >>>> +	igt_assert(!fence_busy(spin->out_fence));
> >>>
> >>> We only check that the out fence of the presumably hung spin batch no longer
> >>> blocks execution of store batches.
> >>
> >> This check applies to all workloads; all of them should be done with their
> >> work at this point.
> > 
> > OK, but since that's the only explicit assert in the *-hang processing path,
> > does it tell us anything about fencing working or not?  
> 
> It says that we were given an active fence, we wait on it and hope it
> signals when an error is reported. Like I said, we can't check the
> results themselves, as they are meaningless after the reset. If we have an
> active fence at this point, that's bad, and the test should fail.
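For reference, a sync_file fd polls readable (POLLIN) once its fence has signalled, so "active" can be tested with a zero-timeout poll(). The sketch below assumes the test's fence_busy() helper works along those lines. Because a real sync_file needs the driver, a pipe stands in for it here: the fake_fence type and its helpers are hypothetical, and writing one byte plays the role of signalling.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Busy means "not yet readable": a zero-timeout poll() reports no events. */
static bool fence_busy(int fence)
{
	struct pollfd pfd = { .fd = fence, .events = POLLIN };

	return poll(&pfd, 1, 0) == 0;
}

/* Hypothetical stand-in for a fence: the read end of a pipe. */
struct fake_fence {
	int rd, wr;
};

static bool fake_fence_init(struct fake_fence *f)
{
	int fds[2];

	if (pipe(fds) != 0)
		return false;
	f->rd = fds[0];
	f->wr = fds[1];
	return true;
}

/* "Signal" the fake fence by making its read end readable. */
static void fake_fence_signal(struct fake_fence *f)
{
	char c = 1;
	ssize_t ret = write(f->wr, &c, 1);

	assert(ret == 1);
	(void)ret;
}

static void fake_fence_fini(struct fake_fence *f)
{
	close(f->rd);
	close(f->wr);
}
```

An assert(!fence_busy(fd)) after the wait therefore fails exactly when the fence was never signalled, which is the failure mode discussed above.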
> 
> > I think it doesn't,
> > and as long as I'm not wrong, I think such scenarios hardly belong to
> > gem_exec_fence.  
> 
> Hm, I'm not sure if I follow, but this exact transition (from active -> 
> (record error) -> signal) is one of the possible scenarios we wish to 
> test. 
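The active -> (record error) -> signal transition can be modelled as a tiny state machine. This is a hypothetical model, not the i915 implementation. The status encoding follows the convention the sync_file uapi documents for its status field (0 active, 1 signalled, negative on error); -EIO is used purely as an illustrative error code, not as a claim about which errno i915 records.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative error for a hung request (assumption, see above). */
#define MODEL_HANG_ERROR (-EIO)

/* Hypothetical fence model. */
struct model_fence {
	bool signaled;
	int error; /* 0, or a negative errno recorded before signalling */
};

/* sync_file-style status: 0 active, 1 signalled, <0 signalled with error. */
static int model_fence_status(const struct model_fence *f)
{
	if (!f->signaled)
		return 0;
	return f->error ? f->error : 1;
}

/* On a hang: record the error first, then signal so that waiters wake up. */
static void model_fence_hang(struct model_fence *f, int err)
{
	assert(!f->signaled && err < 0);
	f->error = err;
	f->signaled = true;
}
```

A fence that stays at status 0 after the hang has been handled is the "still active" case the test is meant to catch.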

OK, so we check if an active fence is signalled on error.  But then, what does 
'active' mean here?  Do we consider a fence active as soon as it has been 
exported to userspace?  Or only after it has been imported back from userspace 
by at least one consumer?  Assuming the former (as I guess), what do we need 
the store batches for in these now modified *await-hang scenarios?  What extra 
value do those scenarios provide compared to (nb-)?wait-hang ?

Thanks,
Janusz


> Or, do you mean that this test case doesn't test 
> drm_i915_gem_exec_fence? This test suite exercises different scenarios 
> of using fences implemented with sync_files. Maybe this could be split 
> up, but these seem to be connected, so they ended up in one file.
> 
> > Otherwise, I think we should at least add descriptions of
> > those subtests, providing some information on what is actually exercised.
> 
> The hang cases reuse test_fence_await which has _some_ description in 
> the basic cases. But I agree, it would be nice to have more 
> documentation for other subtests, but it's out of scope of this fix.
> 
> Many thanks,
> Karolina
> 
> > Thanks,
> > Janusz
> > 
> 


Thread overview: 20+ messages
2022-07-26 10:13 [igt-dev] [PATCH i-g-t v2 0/2] tests/gem_exec_fence: Fix test_fence_await for hanging workloads Karolina Drobnik
2022-07-26 10:13 ` [igt-dev] [PATCH i-g-t v2 1/2] tests/gem_exec_fence: Check stored values only for valid workloads Karolina Drobnik
2022-07-26 14:27   ` Kamil Konieczny
2022-07-28 16:56   ` Janusz Krzysztofik
2022-07-29  7:38     ` Karolina Drobnik
2022-07-29  8:24       ` Janusz Krzysztofik
2022-07-29 11:32         ` Karolina Drobnik
2022-07-29 15:23           ` Janusz Krzysztofik [this message]
2022-07-26 10:13 ` [igt-dev] [PATCH i-g-t v2 2/2] tests/gem_exec_fence: Coordinate sleep with the start of the request Karolina Drobnik
2022-07-26 14:28   ` Kamil Konieczny
2022-07-26 10:54 ` [igt-dev] ✗ Fi.CI.BAT: failure for tests/gem_exec_fence: Fix test_fence_await for hanging workloads (rev2) Patchwork
2022-07-26 11:06   ` Karolina Drobnik
2022-07-28 15:17 ` [igt-dev] ✓ Fi.CI.BAT: success " Patchwork
2022-07-28 21:20 ` [igt-dev] ✓ Fi.CI.IGT: " Patchwork
     [not found] ` <6459819.4vTCxPXJkl@jkrzyszt-mobl1.ger.corp.intel.com>
     [not found]   ` <fb564118-4afb-6f4a-03cc-34e255b871ef@intel.com>
2022-08-01 11:54     ` [igt-dev] [PATCH i-g-t v2 1/2] tests/gem_exec_fence: Check stored values only for valid workloads Janusz Krzysztofik
2022-08-01 13:39       ` Karolina Drobnik
2022-08-01 14:43         ` Janusz Krzysztofik
2022-08-02 10:20           ` Karolina Drobnik
2022-08-03  7:21             ` Janusz Krzysztofik
2022-08-03  7:45               ` Karolina Drobnik