* [Intel-gfx] [PATCH i-g-t 0/1] Fix gem_scheduler.manycontexts for GuC submission @ 2021-07-27 18:20 Matthew Brost 2021-07-27 18:20 ` [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts Matthew Brost 0 siblings, 1 reply; 10+ messages in thread From: Matthew Brost @ 2021-07-27 18:20 UTC (permalink / raw) To: igt-dev; +Cc: intel-gfx Patch should explain it all. Will include in [1] when that series is respun. Signed-off-by: Matthew Brost <matthew.brost@intel.com> [1] https://patchwork.freedesktop.org/series/93071/ Matthew Brost (1): i915/gem_scheduler: Ensure submission order in manycontexts tests/i915/gem_exec_schedule.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) -- 2.28.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts
  2021-07-27 18:20 [Intel-gfx] [PATCH i-g-t 0/1] Fix gem_scheduler.manycontexts for GuC submission Matthew Brost
@ 2021-07-27 18:20 ` Matthew Brost
  2021-07-29 23:54   ` [Intel-gfx] [igt-dev] " John Harrison
  2021-07-30  9:58   ` [Intel-gfx] " Tvrtko Ursulin
  0 siblings, 2 replies; 10+ messages in thread
From: Matthew Brost @ 2021-07-27 18:20 UTC (permalink / raw)
To: igt-dev; +Cc: intel-gfx

With GuC submission, contexts can get reordered (compared to submission
order). If contexts get reordered, the sequential nature of the batches
releasing the next batch's semaphore in function timesliceN() gets
broken, resulting in the test taking much longer than it should, e.g.
every context needs to be timesliced to release the next batch. Corking
the first submission until all the batches have been submitted should
ensure submission order.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 tests/i915/gem_exec_schedule.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c
index f03842478..41f2591a5 100644
--- a/tests/i915/gem_exec_schedule.c
+++ b/tests/i915/gem_exec_schedule.c
@@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
 	struct drm_i915_gem_execbuffer2 execbuf = {
 		.buffers_ptr = to_user_pointer(&obj),
 		.buffer_count = 1,
-		.flags = engine | I915_EXEC_FENCE_OUT,
+		.flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT,
 	};
 	uint32_t *result =
 		gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ);
 	const intel_ctx_t *ctx;
 	int fence[count];
+	IGT_CORK_FENCE(cork);

 	/*
 	 * Create a pair of interlocking batches, that ping pong
@@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
 	igt_require(gem_scheduler_has_timeslicing(i915));
 	igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8);

+	/*
+	 * With GuC submission contexts can get reordered (compared to
+	 * submission order), if contexts get reordered the sequential
+	 * nature of the batches releasing the next batch's semaphore gets
+	 * broken resulting in the test taking much longer than it should (e.g.
+	 * every context needs to be timesliced to release the next batch).
+	 * Corking the first submission until all batches have been
+	 * submitted should ensure submission order.
+	 */
+	execbuf.rsvd2 = igt_cork_plug(&cork, i915);
+
 	/* No coupling between requests; free to timeslice */

 	for (int i = 0; i < count; i++) {
@@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
 		intel_ctx_destroy(i915, ctx);

 		fence[i] = execbuf.rsvd2 >> 32;
+		execbuf.rsvd2 >>= 32;
 	}

+	igt_cork_unplug(&cork);
 	gem_sync(i915, obj.handle);
 	gem_close(i915, obj.handle);
--
2.28.0
* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-07-27 18:20 ` [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts Matthew Brost @ 2021-07-29 23:54 ` John Harrison 2021-07-30 0:00 ` Matthew Brost 2021-07-30 9:58 ` [Intel-gfx] " Tvrtko Ursulin 1 sibling, 1 reply; 10+ messages in thread From: John Harrison @ 2021-07-29 23:54 UTC (permalink / raw) To: Matthew Brost, igt-dev; +Cc: intel-gfx On 7/27/2021 11:20, Matthew Brost wrote: > With GuC submission contexts can get reordered (compared to submission > order), if contexts get reordered the sequential nature of the batches > releasing the next batch's semaphore in function timesliceN() get broken > resulting in the test taking much longer than if should. e.g. Every > contexts needs to be timesliced to release the next batch. Corking the > first submission until all the batches have been submitted should ensure > submission order. > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > --- > tests/i915/gem_exec_schedule.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c > index f03842478..41f2591a5 100644 > --- a/tests/i915/gem_exec_schedule.c > +++ b/tests/i915/gem_exec_schedule.c > @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > struct drm_i915_gem_execbuffer2 execbuf = { > .buffers_ptr = to_user_pointer(&obj), > .buffer_count = 1, > - .flags = engine | I915_EXEC_FENCE_OUT, > + .flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT, > }; > uint32_t *result = > gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ); > const intel_ctx_t *ctx; > int fence[count]; > + IGT_CORK_FENCE(cork); > > /* > * Create a pair of interlocking batches, that ping pong > @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > 
igt_require(gem_scheduler_has_timeslicing(i915)); > igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8); > > + /* > + * With GuC submission contexts can get reordered (compared to > + * submission order), if contexts get reordered the sequential > + * nature of the batches releasing the next batch's semaphore gets > + * broken resulting in the test taking much longer than it should (e.g. > + * every context needs to be timesliced to release the next batch). > + * Corking the first submission until all batches have been > + * submitted should ensure submission order. > + */ > + execbuf.rsvd2 = igt_cork_plug(&cork, i915); > + > /* No coupling between requests; free to timeslice */ > > for (int i = 0; i < count; i++) { > @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > intel_ctx_destroy(i915, ctx); > > fence[i] = execbuf.rsvd2 >> 32; > + execbuf.rsvd2 >>= 32; This means you are passing fence_out[A] as fenc_in[B]? I.e. this patch is also changing the behaviour to make each batch dependent upon the previous one. That change is not mentioned in the new comment. It is also the exact opposite of the comment immediately above the loop - 'No coupling between requests'. John. > } > > + igt_cork_unplug(&cork); > gem_sync(i915, obj.handle); > gem_close(i915, obj.handle); > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-07-29 23:54 ` [Intel-gfx] [igt-dev] " John Harrison @ 2021-07-30 0:00 ` Matthew Brost 2021-08-19 23:31 ` John Harrison 0 siblings, 1 reply; 10+ messages in thread From: Matthew Brost @ 2021-07-30 0:00 UTC (permalink / raw) To: John Harrison; +Cc: igt-dev, intel-gfx On Thu, Jul 29, 2021 at 04:54:08PM -0700, John Harrison wrote: > On 7/27/2021 11:20, Matthew Brost wrote: > > With GuC submission contexts can get reordered (compared to submission > > order), if contexts get reordered the sequential nature of the batches > > releasing the next batch's semaphore in function timesliceN() get broken > > resulting in the test taking much longer than if should. e.g. Every > > contexts needs to be timesliced to release the next batch. Corking the > > first submission until all the batches have been submitted should ensure > > submission order. > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > > --- > > tests/i915/gem_exec_schedule.c | 16 +++++++++++++++- > > 1 file changed, 15 insertions(+), 1 deletion(-) > > > > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c > > index f03842478..41f2591a5 100644 > > --- a/tests/i915/gem_exec_schedule.c > > +++ b/tests/i915/gem_exec_schedule.c > > @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > > struct drm_i915_gem_execbuffer2 execbuf = { > > .buffers_ptr = to_user_pointer(&obj), > > .buffer_count = 1, > > - .flags = engine | I915_EXEC_FENCE_OUT, > > + .flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT, > > }; > > uint32_t *result = > > gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ); > > const intel_ctx_t *ctx; > > int fence[count]; > > + IGT_CORK_FENCE(cork); > > /* > > * Create a pair of interlocking batches, that ping pong > > @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > > 
igt_require(gem_scheduler_has_timeslicing(i915)); > > igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8); > > + /* > > + * With GuC submission contexts can get reordered (compared to > > + * submission order), if contexts get reordered the sequential > > + * nature of the batches releasing the next batch's semaphore gets > > + * broken resulting in the test taking much longer than it should (e.g. > > + * every context needs to be timesliced to release the next batch). > > + * Corking the first submission until all batches have been > > + * submitted should ensure submission order. > > + */ > > + execbuf.rsvd2 = igt_cork_plug(&cork, i915); > > + > > /* No coupling between requests; free to timeslice */ > > for (int i = 0; i < count; i++) { > > @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > > intel_ctx_destroy(i915, ctx); > > fence[i] = execbuf.rsvd2 >> 32; > > + execbuf.rsvd2 >>= 32; > This means you are passing fence_out[A] as fenc_in[B]? I.e. this patch is > also changing the behaviour to make each batch dependent upon the previous This is a submission fence, it just ensures they get submitted in order. Corking the first request + the fence, ensures all the requests get submitted basically at the same time compared to execbuf IOCTL time without it. > one. That change is not mentioned in the new comment. It is also the exact Yea, I could explain this better. Will fix. Matt > opposite of the comment immediately above the loop - 'No coupling between > requests'. > > John. > > > > } > > + igt_cork_unplug(&cork); > > gem_sync(i915, obj.handle); > > gem_close(i915, obj.handle); > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-07-30 0:00 ` Matthew Brost @ 2021-08-19 23:31 ` John Harrison 0 siblings, 0 replies; 10+ messages in thread From: John Harrison @ 2021-08-19 23:31 UTC (permalink / raw) To: Matthew Brost; +Cc: igt-dev, intel-gfx On 7/29/2021 17:00, Matthew Brost wrote: > On Thu, Jul 29, 2021 at 04:54:08PM -0700, John Harrison wrote: >> On 7/27/2021 11:20, Matthew Brost wrote: >>> With GuC submission contexts can get reordered (compared to submission >>> order), if contexts get reordered the sequential nature of the batches >>> releasing the next batch's semaphore in function timesliceN() get broken >>> resulting in the test taking much longer than if should. e.g. Every >>> contexts needs to be timesliced to release the next batch. Corking the >>> first submission until all the batches have been submitted should ensure >>> submission order. >>> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com> >>> --- >>> tests/i915/gem_exec_schedule.c | 16 +++++++++++++++- >>> 1 file changed, 15 insertions(+), 1 deletion(-) >>> >>> diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c >>> index f03842478..41f2591a5 100644 >>> --- a/tests/i915/gem_exec_schedule.c >>> +++ b/tests/i915/gem_exec_schedule.c >>> @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, >>> struct drm_i915_gem_execbuffer2 execbuf = { >>> .buffers_ptr = to_user_pointer(&obj), >>> .buffer_count = 1, >>> - .flags = engine | I915_EXEC_FENCE_OUT, >>> + .flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT, >>> }; >>> uint32_t *result = >>> gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ); >>> const intel_ctx_t *ctx; >>> int fence[count]; >>> + IGT_CORK_FENCE(cork); >>> /* >>> * Create a pair of interlocking batches, that ping pong >>> @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, >>> 
>>>   	igt_require(gem_scheduler_has_timeslicing(i915));
>>>   	igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8);
>>> +	/*
>>> +	 * With GuC submission contexts can get reordered (compared to
>>> +	 * submission order), if contexts get reordered the sequential
>>> +	 * nature of the batches releasing the next batch's semaphore gets
>>> +	 * broken resulting in the test taking much longer than it should (e.g.
>>> +	 * every context needs to be timesliced to release the next batch).
>>> +	 * Corking the first submission until all batches have been
>>> +	 * submitted should ensure submission order.
>>> +	 */
>>> +	execbuf.rsvd2 = igt_cork_plug(&cork, i915);
>>> +
>>>   	/* No coupling between requests; free to timeslice */
>>>   	for (int i = 0; i < count; i++) {
>>> @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
>>>   		intel_ctx_destroy(i915, ctx);
>>>   		fence[i] = execbuf.rsvd2 >> 32;
>>> +		execbuf.rsvd2 >>= 32;
>> This means you are passing fence_out[A] as fenc_in[B]? I.e. this patch is
>> also changing the behaviour to make each batch dependent upon the previous
> This is a submission fence, it just ensures they get submitted in order.
> Corking the first request + the fence, ensures all the requests get
> submitted basically at the same time compared to execbuf IOCTL time
> without it.

The input side is the submit fence, but the output side is the completion
fence. You are chaining the out fence of the previous request as the
submit fence of the next request.

Loop 0:
    execbuf.rsvd2 = cork
    submit()
    execbuf.rsvd2 is now the out fence in the upper 32
    fence[0] = execbuf.rsvd2 >> 32;
    execbuf.rsvd2 >>= 32;  <- move new out fence to be the next in fence
Loop 1:
    execbuf.rsvd2 == fence[0]
    submit()
    fence[1] = new out fence
Loop 2:
    execbuf.rsvd2 == fence[1]
    ...

You have changed the parallel requests into a sequential line. Request X
is now waiting for Request Y to *complete* before it can be submitted.
Only the first request is waiting on the cork.

John.

>> one.
That change is not mentioned in the new comment. It is also the exact > Yea, I could explain this better. Will fix. > > Matt > >> opposite of the comment immediately above the loop - 'No coupling between >> requests'. >> >> John. >> >> >>> } >>> + igt_cork_unplug(&cork); >>> gem_sync(i915, obj.handle); >>> gem_close(i915, obj.handle); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-07-27 18:20 ` [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts Matthew Brost 2021-07-29 23:54 ` [Intel-gfx] [igt-dev] " John Harrison @ 2021-07-30 9:58 ` Tvrtko Ursulin 2021-07-30 18:06 ` Matthew Brost 1 sibling, 1 reply; 10+ messages in thread From: Tvrtko Ursulin @ 2021-07-30 9:58 UTC (permalink / raw) To: Matthew Brost, igt-dev; +Cc: intel-gfx On 27/07/2021 19:20, Matthew Brost wrote: > With GuC submission contexts can get reordered (compared to submission > order), if contexts get reordered the sequential nature of the batches > releasing the next batch's semaphore in function timesliceN() get broken > resulting in the test taking much longer than if should. e.g. Every > contexts needs to be timesliced to release the next batch. Corking the > first submission until all the batches have been submitted should ensure > submission order. The explanation sounds suspect. Consider this comment from the test itself: /* * Create a pair of interlocking batches, that ping pong * between each other, and only advance one step at a time. * We require the kernel to preempt at each semaphore and * switch to the other batch in order to advance. */ I'd say the test does not rely on no re-ordering at all, but relies on context switch on an unsatisfied semaphore. In the commit you seem to acknowledge GuC does not do that but instead ends up waiting for the timeslice to expire, did I get that right? If so, why does the GuC does not do that and can we fix it? 
Regards, Tvrtko > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > --- > tests/i915/gem_exec_schedule.c | 16 +++++++++++++++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c > index f03842478..41f2591a5 100644 > --- a/tests/i915/gem_exec_schedule.c > +++ b/tests/i915/gem_exec_schedule.c > @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > struct drm_i915_gem_execbuffer2 execbuf = { > .buffers_ptr = to_user_pointer(&obj), > .buffer_count = 1, > - .flags = engine | I915_EXEC_FENCE_OUT, > + .flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT, > }; > uint32_t *result = > gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ); > const intel_ctx_t *ctx; > int fence[count]; > + IGT_CORK_FENCE(cork); > > /* > * Create a pair of interlocking batches, that ping pong > @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > igt_require(gem_scheduler_has_timeslicing(i915)); > igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8); > > + /* > + * With GuC submission contexts can get reordered (compared to > + * submission order), if contexts get reordered the sequential > + * nature of the batches releasing the next batch's semaphore gets > + * broken resulting in the test taking much longer than it should (e.g. > + * every context needs to be timesliced to release the next batch). > + * Corking the first submission until all batches have been > + * submitted should ensure submission order. 
> + */ > + execbuf.rsvd2 = igt_cork_plug(&cork, i915); > + > /* No coupling between requests; free to timeslice */ > > for (int i = 0; i < count; i++) { > @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, > intel_ctx_destroy(i915, ctx); > > fence[i] = execbuf.rsvd2 >> 32; > + execbuf.rsvd2 >>= 32; > } > > + igt_cork_unplug(&cork); > gem_sync(i915, obj.handle); > gem_close(i915, obj.handle); > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-07-30 9:58 ` [Intel-gfx] " Tvrtko Ursulin @ 2021-07-30 18:06 ` Matthew Brost 2021-08-02 8:59 ` Tvrtko Ursulin 0 siblings, 1 reply; 10+ messages in thread From: Matthew Brost @ 2021-07-30 18:06 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: igt-dev, intel-gfx On Fri, Jul 30, 2021 at 10:58:38AM +0100, Tvrtko Ursulin wrote: > > On 27/07/2021 19:20, Matthew Brost wrote: > > With GuC submission contexts can get reordered (compared to submission > > order), if contexts get reordered the sequential nature of the batches > > releasing the next batch's semaphore in function timesliceN() get broken > > resulting in the test taking much longer than if should. e.g. Every > > contexts needs to be timesliced to release the next batch. Corking the > > first submission until all the batches have been submitted should ensure > > submission order. > > The explanation sounds suspect. > > Consider this comment from the test itself: > > /* > * Create a pair of interlocking batches, that ping pong > * between each other, and only advance one step at a time. > * We require the kernel to preempt at each semaphore and > * switch to the other batch in order to advance. > */ > > I'd say the test does not rely on no re-ordering at all, but relies on > context switch on an unsatisfied semaphore. > Yes, let do a simple example with 5 batches. Batch 0 releases batch's semaphore 1, batch 1 releases batch's 2 semaphore, etc... If the batches are seen in order the test should take 40 timeslices (8 semaphores in each batch have to be released). If the batches are in the below order: 0 2 1 3 4 Now we have 72 timeslices. Now imagine with 67 batches completely out of order, the number timeslices can explode. > In the commit you seem to acknowledge GuC does not do that but instead ends > up waiting for the timeslice to expire, did I get that right? 
> If so, why

I think the GuC waits for the timeslice to expire if a semaphore is
unsatisfied; I have to double check on that. I thought that was what
execlists were doing too, but I now see it has a convoluted algorithm to
yield the timeslice if a subsequent request comes in while the ring is
waiting on a semaphore. Let me check with the GuC team and see if they
can / are doing something similar. I was thinking the only way to switch
out on a semaphore was to clear CTX_CTRL_INHIBIT_SYN_CTX_SWITCH, but
that appears to be incorrect.

For what it's worth, after this change the run times of the test are
pretty similar for execlists & GuC on TGL.

Matt

> does the GuC does not do that and can we fix it?
>
> Regards,
>
> Tvrtko
>
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   tests/i915/gem_exec_schedule.c | 16 +++++++++++++++-
> >   1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c
> > index f03842478..41f2591a5 100644
> > --- a/tests/i915/gem_exec_schedule.c
> > +++ b/tests/i915/gem_exec_schedule.c
> > @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
> >   	struct drm_i915_gem_execbuffer2 execbuf = {
> >   		.buffers_ptr = to_user_pointer(&obj),
> >   		.buffer_count = 1,
> > -		.flags = engine | I915_EXEC_FENCE_OUT,
> > +		.flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT,
> >   	};
> >   	uint32_t *result =
> >   		gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ);
> >   	const intel_ctx_t *ctx;
> >   	int fence[count];
> > +	IGT_CORK_FENCE(cork);
> >
> >   	/*
> >   	 * Create a pair of interlocking batches, that ping pong
> > @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
> >   	igt_require(gem_scheduler_has_timeslicing(i915));
> >   	igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8);
> >
> > +	/*
> > +	 * With GuC submission contexts can get reordered (compared to
> > +	 * submission order), if contexts get reordered the sequential
> > +	 * nature of the batches releasing the next batch's semaphore gets
> > +	 * broken resulting in the test taking much longer than it should (e.g.
> > +	 * every context needs to be timesliced to release the next batch).
> > +	 * Corking the first submission until all batches have been
> > +	 * submitted should ensure submission order.
> > +	 */
> > +	execbuf.rsvd2 = igt_cork_plug(&cork, i915);
> > +
> >   	/* No coupling between requests; free to timeslice */
> >
> >   	for (int i = 0; i < count; i++) {
> > @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
> >   		intel_ctx_destroy(i915, ctx);
> >
> >   		fence[i] = execbuf.rsvd2 >> 32;
> > +		execbuf.rsvd2 >>= 32;
> >   	}
> >
> > +	igt_cork_unplug(&cork);
> >   	gem_sync(i915, obj.handle);
> >   	gem_close(i915, obj.handle);
* Re: [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-07-30 18:06 ` Matthew Brost @ 2021-08-02 8:59 ` Tvrtko Ursulin 2021-08-02 20:10 ` Matthew Brost 0 siblings, 1 reply; 10+ messages in thread From: Tvrtko Ursulin @ 2021-08-02 8:59 UTC (permalink / raw) To: Matthew Brost; +Cc: igt-dev, intel-gfx On 30/07/2021 19:06, Matthew Brost wrote: > On Fri, Jul 30, 2021 at 10:58:38AM +0100, Tvrtko Ursulin wrote: >> >> On 27/07/2021 19:20, Matthew Brost wrote: >>> With GuC submission contexts can get reordered (compared to submission >>> order), if contexts get reordered the sequential nature of the batches >>> releasing the next batch's semaphore in function timesliceN() get broken >>> resulting in the test taking much longer than if should. e.g. Every >>> contexts needs to be timesliced to release the next batch. Corking the >>> first submission until all the batches have been submitted should ensure >>> submission order. >> >> The explanation sounds suspect. >> >> Consider this comment from the test itself: >> >> /* >> * Create a pair of interlocking batches, that ping pong >> * between each other, and only advance one step at a time. >> * We require the kernel to preempt at each semaphore and >> * switch to the other batch in order to advance. >> */ >> >> I'd say the test does not rely on no re-ordering at all, but relies on >> context switch on an unsatisfied semaphore. >> > > Yes, let do a simple example with 5 batches. Batch 0 releases batch's > semaphore 1, batch 1 releases batch's 2 semaphore, etc... If the batches > are seen in order the test should take 40 timeslices (8 semaphores in > each batch have to be released). > > If the batches are in the below order: > 0 2 1 3 4 > > Now we have 72 timeslices. Now imagine with 67 batches completely out of > order, the number timeslices can explode. Yes that part is clear, issue is to understand why is guc waiting for the timeslice to expire.. 
>> In the commit you seem to acknowledge GuC does not do that but instead ends >> up waiting for the timeslice to expire, did I get that right? If so, why > > I think GuC waits for the timeslice to expire if a semaphore is > unsatisfied, I have to double check on that. I thought that was what > execlists were doing too but I now see it has a convoluted algorithm to > yield the timeslice if subsequent request comes in and the ring is > waiting on a timeslice. Let me check with GuC team and see if they can > / are doing something similiar. I was thinking the only to switch a > sempahore was clear CTX_CTRL_INHIBIT_SYN_CTX_SWITCH but that appears to > be incorrect. .. so this will need clarifying with the firmware team. With execlists we enable and react on GT_WAIT_SEMAPHORE_INTERRUPT. If guc does not, or can not, do that that could be worrying since userspace can and does use semaphores legitimately so making it pay the timeslice penalty. Well actually that has an effect to unrelated clients as well, not just the semaphore user. > For what is worth, after this change the run times of test are pretty > similar for execlists & GuC on TGL. Yes, but the test was useful in this case since it found a weakness in guc scheduling so it may not be the best approach to hide that. Regards, Tvrtko > > Matt > >> does the GuC does not do that and can we fix it? 
>> >> Regards, >> >> Tvrtko >> >>> >>> Signed-off-by: Matthew Brost <matthew.brost@intel.com> >>> --- >>> tests/i915/gem_exec_schedule.c | 16 +++++++++++++++- >>> 1 file changed, 15 insertions(+), 1 deletion(-) >>> >>> diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c >>> index f03842478..41f2591a5 100644 >>> --- a/tests/i915/gem_exec_schedule.c >>> +++ b/tests/i915/gem_exec_schedule.c >>> @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, >>> struct drm_i915_gem_execbuffer2 execbuf = { >>> .buffers_ptr = to_user_pointer(&obj), >>> .buffer_count = 1, >>> - .flags = engine | I915_EXEC_FENCE_OUT, >>> + .flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT, >>> }; >>> uint32_t *result = >>> gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ); >>> const intel_ctx_t *ctx; >>> int fence[count]; >>> + IGT_CORK_FENCE(cork); >>> /* >>> * Create a pair of interlocking batches, that ping pong >>> @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, >>> igt_require(gem_scheduler_has_timeslicing(i915)); >>> igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8); >>> + /* >>> + * With GuC submission contexts can get reordered (compared to >>> + * submission order), if contexts get reordered the sequential >>> + * nature of the batches releasing the next batch's semaphore gets >>> + * broken resulting in the test taking much longer than it should (e.g. >>> + * every context needs to be timesliced to release the next batch). >>> + * Corking the first submission until all batches have been >>> + * submitted should ensure submission order. 
>>> + */ >>> + execbuf.rsvd2 = igt_cork_plug(&cork, i915); >>> + >>> /* No coupling between requests; free to timeslice */ >>> for (int i = 0; i < count; i++) { >>> @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg, >>> intel_ctx_destroy(i915, ctx); >>> fence[i] = execbuf.rsvd2 >> 32; >>> + execbuf.rsvd2 >>= 32; >>> } >>> + igt_cork_unplug(&cork); >>> gem_sync(i915, obj.handle); >>> gem_close(i915, obj.handle); >>> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts 2021-08-02 8:59 ` Tvrtko Ursulin @ 2021-08-02 20:10 ` Matthew Brost 2021-08-03 8:54 ` Tvrtko Ursulin 0 siblings, 1 reply; 10+ messages in thread From: Matthew Brost @ 2021-08-02 20:10 UTC (permalink / raw) To: Tvrtko Ursulin; +Cc: igt-dev, intel-gfx On Mon, Aug 02, 2021 at 09:59:01AM +0100, Tvrtko Ursulin wrote: > > > On 30/07/2021 19:06, Matthew Brost wrote: > > On Fri, Jul 30, 2021 at 10:58:38AM +0100, Tvrtko Ursulin wrote: > > > > > > On 27/07/2021 19:20, Matthew Brost wrote: > > > > With GuC submission contexts can get reordered (compared to submission > > > > order), if contexts get reordered the sequential nature of the batches > > > > releasing the next batch's semaphore in function timesliceN() get broken > > > > resulting in the test taking much longer than if should. e.g. Every > > > > contexts needs to be timesliced to release the next batch. Corking the > > > > first submission until all the batches have been submitted should ensure > > > > submission order. > > > > > > The explanation sounds suspect. > > > > > > Consider this comment from the test itself: > > > > > > /* > > > * Create a pair of interlocking batches, that ping pong > > > * between each other, and only advance one step at a time. > > > * We require the kernel to preempt at each semaphore and > > > * switch to the other batch in order to advance. > > > */ > > > > > > I'd say the test does not rely on no re-ordering at all, but relies on > > > context switch on an unsatisfied semaphore. > > > > > > > Yes, let do a simple example with 5 batches. Batch 0 releases batch's > > semaphore 1, batch 1 releases batch's 2 semaphore, etc... If the batches > > are seen in order the test should take 40 timeslices (8 semaphores in > > each batch have to be released). > > > > If the batches are in the below order: > > 0 2 1 3 4 > > > > Now we have 72 timeslices. 
Now imagine with 67 batches completely out of > > order, the number timeslices can explode. > > Yes that part is clear, issue is to understand why is guc waiting for the > timeslice to expire.. > > > > In the commit you seem to acknowledge GuC does not do that but instead ends > > > up waiting for the timeslice to expire, did I get that right? If so, why > > > > I think GuC waits for the timeslice to expire if a semaphore is > > unsatisfied, I have to double check on that. I thought that was what > > execlists were doing too but I now see it has a convoluted algorithm to > > yield the timeslice if subsequent request comes in and the ring is > > waiting on a timeslice. Let me check with GuC team and see if they can > > / are doing something similiar. I was thinking the only to switch a > > sempahore was clear CTX_CTRL_INHIBIT_SYN_CTX_SWITCH but that appears to > > be incorrect. > > .. so this will need clarifying with the firmware team. > They do not use the GT_WAIT_SEMAPHORE_INTERRUPT. However, we can clear CTX_CTRL_INHIBIT_SYN_CTX_SWITCH will result in more or less the same behavior as execlists but I'm suspect if that is the right solution. More on that below. > With execlists we enable and react on GT_WAIT_SEMAPHORE_INTERRUPT. If guc Because execlists does this, doesn't mean it is the spec or is correct. As far as I can tell this behavior is yet another thing just shoehorned into the execlists scheduler without a ton of thought or input from architecture about what the scheduler should look like or what the UMD needs actually are. If we change anything related to GuC scheduling there needs to be a clear need - again saying execlists does this is not an argument. There needs to be an agreement with architecture, the UMD teams, the i915 team, possibly the Windows team, and the GuC team before we make any changes. IMO the correct solution is to use tokens. 
Have a uAPI interface which
distributes tokens to the UMDs, have the i915 clear the context switch
inhibit bit in the LRC if the user opted into tokens, and now semaphores
switch out automatically and get rescheduled when the token is signaled.

> does not, or can not, do that, it could be worrying since userspace can and
> does use semaphores legitimately, so making it pay the timeslice penalty.
> Well, actually that has an effect on unrelated clients as well, not just
> the semaphore user.

Not buying this argument. Any user can submit a long running batch that
always uses its full timeslice and this affects unrelated clients.

> > For what it's worth, after this change the run times of the test are
> > pretty similar for execlists & GuC on TGL.
> 
> Yes, but the test was useful in this case since it found a weakness in guc
> scheduling, so it may not be the best approach to hide that.
> 

Not a weakness, just a difference.

Matt

> Regards,
> 
> Tvrtko
> 
> > 
> > Matt
> > 
> > > does the GuC not do that and can we fix it?
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >   tests/i915/gem_exec_schedule.c | 16 +++++++++++++++-
> > > >   1 file changed, 15 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c
> > > > index f03842478..41f2591a5 100644
> > > > --- a/tests/i915/gem_exec_schedule.c
> > > > +++ b/tests/i915/gem_exec_schedule.c
> > > > @@ -597,12 +597,13 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
> > > >   	struct drm_i915_gem_execbuffer2 execbuf = {
> > > >   		.buffers_ptr = to_user_pointer(&obj),
> > > >   		.buffer_count = 1,
> > > > -		.flags = engine | I915_EXEC_FENCE_OUT,
> > > > +		.flags = engine | I915_EXEC_FENCE_OUT | I915_EXEC_FENCE_SUBMIT,
> > > >   	};
> > > >   	uint32_t *result =
> > > >   		gem_mmap__device_coherent(i915, obj.handle, 0, sz, PROT_READ);
> > > >   	const intel_ctx_t *ctx;
> > > >   	int fence[count];
> > > > +	IGT_CORK_FENCE(cork);
> > > > 
> > > >   	/*
> > > >   	 * Create a pair of interlocking batches, that ping pong
> > > > @@ -614,6 +615,17 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
> > > >   	igt_require(gem_scheduler_has_timeslicing(i915));
> > > >   	igt_require(intel_gen(intel_get_drm_devid(i915)) >= 8);
> > > > 
> > > > +	/*
> > > > +	 * With GuC submission contexts can get reordered (compared to
> > > > +	 * submission order), if contexts get reordered the sequential
> > > > +	 * nature of the batches releasing the next batch's semaphore gets
> > > > +	 * broken resulting in the test taking much longer than it should (e.g.
> > > > +	 * every context needs to be timesliced to release the next batch).
> > > > +	 * Corking the first submission until all batches have been
> > > > +	 * submitted should ensure submission order.
> > > > +	 */
> > > > +	execbuf.rsvd2 = igt_cork_plug(&cork, i915);
> > > > +
> > > >   	/* No coupling between requests; free to timeslice */
> > > > 
> > > >   	for (int i = 0; i < count; i++) {
> > > > @@ -624,8 +636,10 @@ static void timesliceN(int i915, const intel_ctx_cfg_t *cfg,
> > > >   		intel_ctx_destroy(i915, ctx);
> > > >   		fence[i] = execbuf.rsvd2 >> 32;
> > > > +		execbuf.rsvd2 >>= 32;
> > > >   	}
> > > > 
> > > > +	igt_cork_unplug(&cork);
> > > >   	gem_sync(i915, obj.handle);
> > > >   	gem_close(i915, obj.handle);

^ permalink raw reply	[flat|nested] 10+ messages in thread
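The timeslice accounting in the example above (40 slices in order, more when
reordered) can be sketched with a toy scheduler model. This is a hypothetical
simplification, not the GuC or execlists algorithm: it assumes a strict
round-robin scheduler in which a context blocked on an unsatisfied semaphore
still burns a full timeslice, so the exact out-of-order count differs from
the 72 quoted above, but the qualitative blow-up from reordering is visible:

```python
# Toy model of the timesliceN() dependency chain: batch i advances one step
# per timeslice, but only once batch i-1 has completed that step (batch 0 is
# never blocked). A scheduled context whose semaphore is unsatisfied still
# burns a full timeslice. Simplified assumption, not the real i915/GuC logic.

def timeslices(order, steps=8):
    n = len(order)
    progress = [0] * n
    slices = 0
    while min(progress) < steps:
        for b in order:          # scheduler visits contexts in a fixed order
            if progress[b] >= steps:
                continue         # finished batches are no longer scheduled
            slices += 1
            # batch b's next semaphore is released once batch b-1 is ahead
            if b == 0 or progress[b - 1] > progress[b]:
                progress[b] += 1
    return slices

in_order = timeslices([0, 1, 2, 3, 4])
shuffled = timeslices([0, 2, 1, 3, 4])
reversed_ = timeslices([4, 3, 2, 1, 0])
print(in_order, shuffled, reversed_)  # prints "40 43 50"
```

With 8 steps per batch the in-order schedule costs exactly 5 x 8 = 40
slices, while the 0 2 1 3 4 order wastes slices on stalled contexts and a
fully reversed order wastes even more; with 67 batches out of order the
waste grows accordingly.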
* Re: [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts
  2021-08-02 20:10 ` Matthew Brost
@ 2021-08-03  8:54 ` Tvrtko Ursulin
  0 siblings, 0 replies; 10+ messages in thread
From: Tvrtko Ursulin @ 2021-08-03 8:54 UTC (permalink / raw)
To: Matthew Brost; +Cc: igt-dev, intel-gfx

On 02/08/2021 21:10, Matthew Brost wrote:
> On Mon, Aug 02, 2021 at 09:59:01AM +0100, Tvrtko Ursulin wrote:
>>
>>
>> On 30/07/2021 19:06, Matthew Brost wrote:
>>> On Fri, Jul 30, 2021 at 10:58:38AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 27/07/2021 19:20, Matthew Brost wrote:
>>>>> With GuC submission, contexts can get reordered (compared to submission
>>>>> order). If contexts get reordered, the sequential nature of the batches
>>>>> releasing the next batch's semaphore in function timesliceN() gets
>>>>> broken, resulting in the test taking much longer than it should, e.g.
>>>>> every context needs to be timesliced to release the next batch. Corking
>>>>> the first submission until all the batches have been submitted should
>>>>> ensure submission order.
>>>>
>>>> The explanation sounds suspect.
>>>>
>>>> Consider this comment from the test itself:
>>>>
>>>> /*
>>>>  * Create a pair of interlocking batches, that ping pong
>>>>  * between each other, and only advance one step at a time.
>>>>  * We require the kernel to preempt at each semaphore and
>>>>  * switch to the other batch in order to advance.
>>>>  */
>>>>
>>>> I'd say the test does not rely on no re-ordering at all, but relies on
>>>> context switch on an unsatisfied semaphore.
>>>>
>>>
>>> Yes, let's do a simple example with 5 batches. Batch 0 releases batch 1's
>>> semaphore, batch 1 releases batch 2's semaphore, etc. If the batches are
>>> seen in order, the test should take 40 timeslices (8 semaphores in each
>>> batch have to be released).
>>>
>>> If the batches are in the below order:
>>> 0 2 1 3 4
>>>
>>> Now we have 72 timeslices.
>>> Now imagine with 67 batches completely out of
>>> order; the number of timeslices can explode.
>>
>> Yes, that part is clear; the issue is to understand why the GuC is waiting
>> for the timeslice to expire..
>>
>>>> In the commit you seem to acknowledge GuC does not do that but instead
>>>> ends up waiting for the timeslice to expire, did I get that right? If
>>>> so, why
>>>
>>> I think the GuC waits for the timeslice to expire if a semaphore is
>>> unsatisfied, I have to double check on that. I thought that was what
>>> execlists were doing too, but I now see it has a convoluted algorithm to
>>> yield the timeslice if a subsequent request comes in and the ring is
>>> waiting on a semaphore. Let me check with the GuC team and see if they
>>> can / are doing something similar. I was thinking the only way to switch
>>> out on a semaphore was to clear CTX_CTRL_INHIBIT_SYN_CTX_SWITCH, but
>>> that appears to be incorrect.
>>
>> .. so this will need clarifying with the firmware team.
>>
> 
> They do not use the GT_WAIT_SEMAPHORE_INTERRUPT. However, clearing
> CTX_CTRL_INHIBIT_SYN_CTX_SWITCH would result in more or less the same
> behavior as execlists, but I'm not sure that is the right solution. More
> on that below.
> 
>> With execlists we enable and react on GT_WAIT_SEMAPHORE_INTERRUPT. If guc
> 
> Just because execlists does this doesn't mean it is the spec or is
> correct. As far as I can tell this behavior is yet another thing just
> shoehorned into the execlists scheduler without a ton of thought or input
> from architecture about what the scheduler should look like or what the
> actual UMD needs are.
> 
> If we change anything related to GuC scheduling there needs to be a clear
> need; again, saying execlists does this is not an argument. There needs to
> be agreement with architecture, the UMD teams, the i915 team, possibly the
> Windows team, and the GuC team before we make any changes.
> 
> IMO the correct solution is to use tokens.
Have a uAPI interface which
> distributes tokens to the UMDs, have the i915 clear the context switch
> inhibit bit in the LRC if the user opted into tokens, and now semaphores
> switch out automatically and get rescheduled when the token is signaled.

Tokens are Gen12+ right? The downside of that plan would be: what do you do
with earlier platforms?

>> does not, or can not, do that, it could be worrying since userspace can
>> and does use semaphores legitimately, so making it pay the timeslice
>> penalty. Well, actually that has an effect on unrelated clients as well,
>> not just the semaphore user.
> 
> Not buying this argument. Any user can submit a long running batch that
> always uses its full timeslice and this affects unrelated clients.

To an extent, but it's not the same if that batch is long running due to
some work it's doing, or long running because it sits there waiting on an
unsatisfied semaphore, wasting everyone's time. If nothing else, because
that might not be what the userspace expects.

But yes, you will need to figure out if UMDs benefit from this in practical
use cases before you can rip this out. And it will tie back to the thing
about tokens and uapi you mention. (Although I don't immediately see how
exposing a hardware "flavour of the day" thing like tokens makes a good
candidate to be mentioned in the uapi. Especially given their limited
nature.)

>>> For what it's worth, after this change the run times of the test are
>>> pretty similar for execlists & GuC on TGL.
>>
>> Yes, but the test was useful in this case since it found a weakness in
>> guc scheduling, so it may not be the best approach to hide that.
>>
> 
> Not a weakness, just a difference.

Okay, not a weakness, it's just much slower when userspace uses
semaphores. :)

Also, I worry that the submit fence works for you in this patch not by ABI
contract but due to implementation details. Probably both in the case of
execlists and GuC.
Because all it is guaranteeing as part of its ABI contract is that request B
will not enter the backend before request A. But the backend is really free
to execute them in any order. (Assuming no other dependencies.)

So I think that's the second reason this patch as-is is not the best choice.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-08-19 23:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 18:20 [Intel-gfx] [PATCH i-g-t 0/1] Fix gem_scheduler.manycontexts for GuC submission Matthew Brost
2021-07-27 18:20 ` [Intel-gfx] [PATCH i-g-t 1/1] i915/gem_scheduler: Ensure submission order in manycontexts Matthew Brost
2021-07-29 23:54   ` [Intel-gfx] [igt-dev] " John Harrison
2021-07-30  0:00     ` Matthew Brost
2021-08-19 23:31       ` John Harrison
2021-07-30  9:58   ` [Intel-gfx] " Tvrtko Ursulin
2021-07-30 18:06     ` Matthew Brost
2021-08-02  8:59       ` Tvrtko Ursulin
2021-08-02 20:10         ` Matthew Brost
2021-08-03  8:54           ` Tvrtko Ursulin