* [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context
@ 2022-08-15 16:01 Alan Previn
  2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests Alan Previn
  ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Alan Previn @ 2022-08-15 16:01 UTC (permalink / raw)
To: intel-gfx

This is a revival of the same series posted by Matthew Brost back in
October 2021 (https://patchwork.freedesktop.org/series/96167/).
Additional real-world measured metrics are included this time around,
which prove the effectiveness of this series.

This series adds a delay before disabling scheduling of the guc-context
when a context has become idle. The 2nd patch explains it in detail.

This is the 4th rev of this series (counting from the first version by
Matt). Changes from prior revs:

  v4: Fix build error.
  v3: Differentiate and appropriately name the helper functions for
      getting the 'default threshold of num-guc-ids' vs the 'max
      threshold of num-guc-ids' for bypassing sched-disable, and use
      the correct one for the debugfs validation (John Harrison).
  v2: Changed the default of the schedule-disable delay to 34 millisecs
      and added a debugfs to control this timing knob. Also added a
      debugfs to control the bypass for not delaying the
      schedule-disable if we are under pressure with a very low balance
      of remaining guc-ids (John Harrison).
  v1: Drop the trace log for intel_context_close (Chris Wilson).
      Add "Tested-by" into patch 2 (Chris Wilson).
      Add JIRA number into patch 0 (Chris Wilson).
      Summarize patch 2's problem and metrics into the cover letter
      (Chris Wilson).
Matthew Brost (2):
  drm/i915/selftests: Use correct selfest calls for live tests
  drm/i915/guc: Add delay to disable scheduling after pin count goes to
    zero

 drivers/gpu/drm/i915/gem/i915_gem_context.c     |   2 +-
 .../i915/gem/selftests/i915_gem_coherency.c     |   2 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c    |   2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c      |   2 +-
 .../drm/i915/gem/selftests/i915_gem_object.c    |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h         |   9 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h   |   7 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h          |  18 +++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c      |  57 +++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c   | 150 +++++++++++++++---
 drivers/gpu/drm/i915/i915_selftest.h            |   2 +
 drivers/gpu/drm/i915/i915_trace.h               |  10 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |   2 +-
 drivers/gpu/drm/i915/selftests/i915_perf.c      |   2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c   |   2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c       |   2 +-
 16 files changed, 237 insertions(+), 34 deletions(-)

base-commit: 1cb5379e17f93685065d8ec54444f1baf9386ffe
-- 
2.25.1

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests
  2022-08-15 16:01 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn
@ 2022-08-15 16:01 ` Alan Previn
  2022-08-15 22:40   ` John Harrison
  2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Alan Previn @ 2022-08-15 16:01 UTC (permalink / raw)
To: intel-gfx

From: Matthew Brost <matthew.brost@intel.com>

This will help in an upcoming patch where the live selftest wrappers
are extended to do more.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c | 2 +-
 drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c    | 2 +-
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c      | 2 +-
 drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c    | 2 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c           | 2 +-
 drivers/gpu/drm/i915/selftests/i915_perf.c              | 2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c           | 2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c               | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
index 13b088cc787e..a666d7e610f5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
@@ -434,5 +434,5 @@ int i915_gem_coherency_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_gem_coherency),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index 62c61af77a42..51ed824b020c 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -476,5 +476,5 @@ int i915_gem_dmabuf_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_dmabuf_import_same_driver_lmem_smem),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index 3ced9948a331..3cff08f04f6c 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -1844,5 +1844,5 @@ int i915_gem_mman_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_mmap_gpu),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
index fe0a890775e2..bdf5bb40ccf1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object.c
@@ -95,5 +95,5 @@ int i915_gem_object_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_gem_huge),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index ab9f17fc85bc..fb5e61963479 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -2324,5 +2324,5 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915)
 
 	GEM_BUG_ON(offset_in_page(to_gt(i915)->ggtt->vm.total));
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c
index 88db2e3d81d0..429c6d73b159 100644
--- a/drivers/gpu/drm/i915/selftests/i915_perf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_perf.c
@@ -431,7 +431,7 @@ int i915_perf_live_selftests(struct drm_i915_private *i915)
 	if (err)
 		return err;
 
-	err = i915_subtests(tests, i915);
+	err = i915_live_subtests(tests, i915);
 
 	destroy_empty_config(&i915->perf);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index ec05f578a698..818a4909c1f3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1821,7 +1821,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 	if (intel_gt_is_wedged(to_gt(i915)))
 		return 0;
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
 
 static int switch_to_kernel_sync(struct intel_context *ce, int err)
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 6921ba128015..e3821398a5b0 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -1103,5 +1103,5 @@ int i915_vma_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_vma_remapped_gtt),
 	};
 
-	return i915_subtests(tests, i915);
+	return i915_live_subtests(tests, i915);
 }
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests
  2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests Alan Previn
@ 2022-08-15 22:40   ` John Harrison
  2022-08-16  1:54     ` Teres Alexis, Alan Previn
  0 siblings, 1 reply; 13+ messages in thread
From: John Harrison @ 2022-08-15 22:40 UTC (permalink / raw)
To: Alan Previn, intel-gfx

On 8/15/2022 09:01, Alan Previn wrote:
> From: Matthew Brost <matthew.brost@intel.com>
>
> This will help in an upcoming patch where the live selftest wrappers
> are extended to do more.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
As per previous review, you still need to add your s-o-b as well, since
you are posting the patch.

Also, why is the subject "[Intel-gfx X/2]" rather than "[PATCH X/2]"?
The mailing list automatically adds the "[Intel-gfx]" prefix. You just
need to use the standard "PATCH" or "PATCH v###" as appropriate.

John.

> [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests
  2022-08-15 22:40   ` John Harrison
@ 2022-08-16  1:54     ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 13+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-16  1:54 UTC (permalink / raw)
To: Harrison, John C, intel-gfx

Hmmm - I guess I've been doing it wrong all the while (been using the
same script for a while). Will fix - thanks.
...alan

On Mon, 2022-08-15 at 15:40 -0700, Harrison, John C wrote:
> On 8/15/2022 09:01, Alan Previn wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> >
> > This will help in an upcoming patch where the live selftest wrappers
> > are extended to do more.
> >
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> As per previous review, you still need to add your s-o-b as well as you
> are posting the patch.
>
> Also, why is the subject "[Intel-gfx X/2]" rather than "[PATCH X/2]"?
> The mailing list automatically adds the "[Intel-gfx]" prefix. You just
> need to use the standard "PATCH" or "PATCH v###" as appropriate.
>
> John.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
  2022-08-15 16:01 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn
  2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests Alan Previn
@ 2022-08-15 16:01 ` Alan Previn
  2022-08-15 23:57   ` John Harrison
  2022-08-15 16:33 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Delay disabling scheduling on a context (rev4) Patchwork
  2022-08-15 16:55 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  3 siblings, 1 reply; 13+ messages in thread
From: Alan Previn @ 2022-08-15 16:01 UTC (permalink / raw)
To: intel-gfx

From: Matthew Brost <matthew.brost@intel.com>

Add a delay, configurable via debugs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a somewhat costly operation so the idea is that a delay
allows the user to resubmit something before doing this operation.
This delay is only done if the context isn't closed and less than 3/4
of total guc_ids are in use.

As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.

Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and encode
no less than 30fps with a predefined maximum render + encode latency
time.
That means that the totality of all 36 containers and their workloads
was not saturating the utilized hw engines to the max (in order to
maintain just enough headroom to meet the min fps and max latencies of
incoming container submissions).

Problem statement: It was observed that the CPU utilization of the CPU
core that was pinned to i915 soft IRQ work was experiencing severe
load. Using tracelogs and an instrumentation patch to count specific
i915 IRQ events, it was confirmed that the majority of the CPU cycles
were caused by the gen11_other_irq_handler() -> guc_irq_handler() code
path. The vast majority of the cycles was determined to be processing a
specific G2H IRQ, i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These
IRQs are sent by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that
each context was constantly idling between submissions. This was
causing a thrashing of unpinning contexts from GuC at one moment,
followed quickly by repinning them due to incoming workload the very
next moment. These event-pairs were being triggered across multiple
contexts per container, across all containers, at a rate of > 30 times
per sec per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every
10 seconds, or ~10 million times over ~25+ mins. With this patch, the
count reduced to ~480 every 10 seconds, or about ~28K over ~10 mins.
The improvement observed is ~99% for the average counts per 10 seconds.
Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_context.h | 9 ++ drivers/gpu/drm/i915/gt/intel_context_types.h | 7 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 18 +++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 57 +++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 150 +++++++++++++++--- drivers/gpu/drm/i915/i915_selftest.h | 2 + drivers/gpu/drm/i915/i915_trace.h | 10 ++ 8 files changed, 229 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index dabdfe09f5e5..df7fd1b019ec 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1454,7 +1454,7 @@ static void engines_idle_release(struct i915_gem_context *ctx, int err; /* serialises with execbuf */ - set_bit(CONTEXT_CLOSED_BIT, &ce->flags); + intel_context_close(ce); if (!intel_context_pin_if_active(ce)) continue; diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 8e2d70630c49..7cc4bb9ad042 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -276,6 +276,15 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); } +static inline void intel_context_close(struct intel_context *ce) +{ + set_bit(CONTEXT_CLOSED_BIT, &ce->flags); + + trace_intel_context_close(ce); + if (ce->ops->close) + ce->ops->close(ce); +} + static inline bool intel_context_is_closed(const struct intel_context *ce) { return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 04eacae1aca5..86ac84e2edb9 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ 
b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -43,6 +43,8 @@ struct intel_context_ops { void (*revoke)(struct intel_context *ce, struct i915_request *rq, unsigned int preempt_timeout_ms); + void (*close)(struct intel_context *ce); + int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr); int (*pin)(struct intel_context *ce, void *vaddr); void (*unpin)(struct intel_context *ce); @@ -208,6 +210,11 @@ struct intel_context { * each priority bucket */ u32 prio_count[GUC_CLIENT_PRIORITY_NUM]; + /** + * @sched_disable_delay: worker to disable scheduling on this + * context + */ + struct delayed_work sched_disable_delay; } guc_state; struct { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index a7acffbf15d1..2f534baf6f45 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -112,6 +112,10 @@ struct intel_guc { * refs */ struct list_head guc_id_list; + /** + * @guc_ids_in_use: Number single-lrc guc_ids in use + */ + u16 guc_ids_in_use; /** * @destroyed_contexts: list of contexts waiting to be destroyed * (deregistered with the GuC) @@ -132,6 +136,18 @@ struct intel_guc { * @reset_fail_mask: mask of engines that failed to reset */ intel_engine_mask_t reset_fail_mask; +#define SCHED_DISABLE_DELAY_MS 34 + /** + * @sched_disable_delay_ms: schedule disable delay, in ms, for + * contexts + */ + u64 sched_disable_delay_ms; + /** + * @sched_disable_gucid_threshold: threshold of min remaining available + * guc_ids before we start bypassing the schedule disable delay + */ + int sched_disable_gucid_threshold; + } submission_state; /** @@ -464,4 +480,6 @@ void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); void intel_guc_write_barrier(struct intel_guc *guc); +int intel_guc_get_sched_disable_gucid_threshold_max(struct intel_guc *guc); + #endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index 25f09a420561..c0e2fe46050f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -71,12 +71,69 @@ static bool intel_eval_slpc_support(void *data) return intel_guc_slpc_is_used(guc); } +static int guc_sched_disable_delay_ms_get(void *data, u64 *val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + *val = guc->submission_state.sched_disable_delay_ms; + + return 0; +} + +static int guc_sched_disable_delay_ms_set(void *data, u64 val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + guc->submission_state.sched_disable_delay_ms = val; + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ms_fops, + guc_sched_disable_delay_ms_get, + guc_sched_disable_delay_ms_set, "%lld\n"); + +static int guc_sched_disable_gucid_threshold_get(void *data, u64 *val) +{ + struct intel_guc *guc = data; + + *val = guc->submission_state.sched_disable_gucid_threshold; + return 0; +} + +static int guc_sched_disable_gucid_threshold_set(void *data, u64 val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + if (val > intel_guc_get_sched_disable_gucid_threshold_max(guc)) + guc->submission_state.sched_disable_gucid_threshold = + intel_guc_get_sched_disable_gucid_threshold_max(guc); + else + guc->submission_state.sched_disable_gucid_threshold = val; + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_gucid_threshold_fops, + guc_sched_disable_gucid_threshold_get, + guc_sched_disable_gucid_threshold_set, "%lld\n"); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct intel_gt_debugfs_file files[] = { { "guc_info", &guc_info_fops, NULL }, { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, { "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support}, + 
{ "guc_sched_disable_delay_ms", &guc_sched_disable_delay_ms_fops, NULL }, + { "guc_sched_disable_gucid_threshold", &guc_sched_disable_gucid_threshold_fops, + NULL }, }; if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0d17da77e787..30612a6e3a27 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -65,7 +65,14 @@ * corresponding G2H returns indicating the scheduling disable operation has * completed it is safe to unpin the context. While a disable is in flight it * isn't safe to resubmit the context so a fence is used to stall all future - * requests of that context until the G2H is returned. + * requests of that context until the G2H is returned. Because this interaction + * with the GuC takes a non-zero amount of time / GuC cycles we delay the + * disabling of scheduling after the pin count goes to zero by a configurable + * period of time (default 34 ms, see SCHED_DISABLE_DELAY_MS). The thought is + * this gives the user a window of time to resubmit something on the context + * before doing a somewhat costly operation. This delay is only done if it + * isn't closed and the guc_id usage is less than a threshold (default 3/4, + * (see intel_guc_get_sched_disable_gucid_threshold_default()). 
* * Context deregistration: * Before a context can be destroyed or if we steal its guc_id we must @@ -1989,6 +1996,9 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) if (unlikely(ret < 0)) return ret; + if (!intel_context_is_parent(ce)) + ++guc->submission_state.guc_ids_in_use; + ce->guc_id.id = ret; return 0; } @@ -1998,14 +2008,16 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) GEM_BUG_ON(intel_context_is_child(ce)); if (!context_guc_id_invalid(ce)) { - if (intel_context_is_parent(ce)) + if (intel_context_is_parent(ce)) { bitmap_release_region(guc->submission_state.guc_ids_bitmap, ce->guc_id.id, order_base_2(ce->parallel.number_children + 1)); - else + } else { + --guc->submission_state.guc_ids_in_use; ida_simple_remove(&guc->submission_state.guc_ids, ce->guc_id.id); + } clr_ctx_id_mapping(guc, ce->guc_id.id); set_context_guc_id_invalid(ce); } @@ -2993,41 +3005,98 @@ guc_context_revoke(struct intel_context *ce, struct i915_request *rq, } } -static void guc_context_sched_disable(struct intel_context *ce) +static void guc_context_sched_disable(struct intel_context *ce); + +static void do_sched_disable(struct intel_guc *guc, struct intel_context *ce, + unsigned long flags) + __releases(ce->guc_state.lock) { - struct intel_guc *guc = ce_to_guc(ce); - unsigned long flags; struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; intel_wakeref_t wakeref; - u16 guc_id; + lockdep_assert_held(&ce->guc_state.lock); + + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + with_intel_runtime_pm(runtime_pm, wakeref) + guc_context_sched_disable(ce); +} + +static bool bypass_sched_disable(struct intel_guc *guc, + struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); GEM_BUG_ON(intel_context_is_child(ce)); + if (submission_disabled(guc) || context_guc_id_invalid(ce) || + !ctx_id_mapped(guc, ce->guc_id.id)) { + clr_context_enabled(ce); + return true; + } + + return !context_enabled(ce); 
+} + +static void __delay_sched_disable(struct work_struct *wrk) +{ + struct intel_context *ce = + container_of(wrk, typeof(*ce), guc_state.sched_disable_delay.work); + struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; + spin_lock_irqsave(&ce->guc_state.lock, flags); - /* - * We have to check if the context has been disabled by another thread, - * check if submssion has been disabled to seal a race with reset and - * finally check if any more requests have been committed to the - * context ensursing that a request doesn't slip through the - * 'context_pending_disable' fence. - */ - if (unlikely(!context_enabled(ce) || submission_disabled(guc) || - context_has_committed_requests(ce))) { - clr_context_enabled(ce); + if (bypass_sched_disable(guc, ce)) { spin_unlock_irqrestore(&ce->guc_state.lock, flags); - goto unpin; + intel_context_sched_disable_unpin(ce); + } else { + do_sched_disable(guc, ce, flags); } - guc_id = prep_context_pending_disable(ce); +} - spin_unlock_irqrestore(&ce->guc_state.lock, flags); +static bool guc_id_pressure(struct intel_guc *guc, struct intel_context *ce) +{ + /* + * parent contexts are perma-pinned, if we are unpinning do schedule + * disable immediately. + */ + if (intel_context_is_parent(ce)) + return true; - with_intel_runtime_pm(runtime_pm, wakeref) - __guc_context_sched_disable(guc, ce, guc_id); + /* + * If we are beyond the threshold for avail guc_ids, do schedule disable immediately. 
+	 */
+	return guc->submission_state.guc_ids_in_use >
+		guc->submission_state.sched_disable_gucid_threshold;
+}
+
+static void guc_context_sched_disable(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	u64 delay = guc->submission_state.sched_disable_delay_ms;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+	if (bypass_sched_disable(guc, ce)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		intel_context_sched_disable_unpin(ce);
+	} else if (!intel_context_is_closed(ce) && !guc_id_pressure(guc, ce) &&
+		   delay) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		mod_delayed_work(system_unbound_wq,
+				 &ce->guc_state.sched_disable_delay,
+				 msecs_to_jiffies(delay));
+	} else {
+		do_sched_disable(guc, ce, flags);
+	}
+}
 
-	return;
-unpin:
-	intel_context_sched_disable_unpin(ce);
+static void guc_context_close(struct intel_context *ce)
+{
+	if (test_bit(CONTEXT_GUC_INIT, &ce->flags) &&
+	    cancel_delayed_work(&ce->guc_state.sched_disable_delay))
+		__delay_sched_disable(&ce->guc_state.sched_disable_delay.work);
 }
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
@@ -3346,6 +3415,8 @@ static void remove_from_context(struct i915_request *rq)
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
+	.close = guc_context_close,
+
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_context_pin,
 	.unpin = guc_context_unpin,
@@ -3428,6 +3499,10 @@ static void guc_context_init(struct intel_context *ce)
 	rcu_read_unlock();
 
 	ce->guc_state.prio = map_i915_prio_to_guc_prio(prio);
+
+	INIT_DELAYED_WORK(&ce->guc_state.sched_disable_delay,
+			  __delay_sched_disable);
+
 	set_bit(CONTEXT_GUC_INIT, &ce->flags);
 }
 
@@ -3465,6 +3540,9 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags)))
 		guc_context_init(ce);
 
+	if (cancel_delayed_work(&ce->guc_state.sched_disable_delay))
+		intel_context_sched_disable_unpin(ce);
+
 	/*
 	 * Call pin_guc_id here rather than in the pinning step as with
 	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
@@ -3595,6 +3673,8 @@ static int guc_virtual_context_alloc(struct intel_context *ce)
 static const struct intel_context_ops virtual_guc_context_ops = {
 	.alloc = guc_virtual_context_alloc,
 
+	.close = guc_context_close,
+
 	.pre_pin = guc_virtual_context_pre_pin,
 	.pin = guc_virtual_context_pin,
 	.unpin = guc_virtual_context_unpin,
@@ -3684,6 +3764,8 @@ static void guc_child_context_destroy(struct kref *kref)
 static const struct intel_context_ops virtual_parent_context_ops = {
 	.alloc = guc_virtual_context_alloc,
 
+	.close = guc_context_close,
+
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_parent_context_pin,
 	.unpin = guc_parent_context_unpin,
@@ -4207,6 +4289,21 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 	return i915->params.enable_guc & ENABLE_GUC_SUBMISSION;
 }
 
+static int __guc_get_non_lrc_num_guc_ids(struct intel_guc *guc)
+{
+	return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
+}
+
+static int __guc_get_sched_disable_gucid_threshold_default(struct intel_guc *guc)
+{
+	return (__guc_get_non_lrc_num_guc_ids(guc) / 4 * 3);
+}
+
+int intel_guc_get_sched_disable_gucid_threshold_max(struct intel_guc *guc)
+{
+	return __guc_get_non_lrc_num_guc_ids(guc);
+}
+
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
@@ -4223,7 +4320,10 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 	spin_lock_init(&guc->timestamp.lock);
 	INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping);
 
+	guc->submission_state.sched_disable_delay_ms = SCHED_DISABLE_DELAY_MS;
 	guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID;
+	guc->submission_state.sched_disable_gucid_threshold =
+		__guc_get_sched_disable_gucid_threshold_default(guc);
 	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
diff --git a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h
index f54de0499be7..bdf3e22c0a34 100644
--- a/drivers/gpu/drm/i915/i915_selftest.h
+++ b/drivers/gpu/drm/i915/i915_selftest.h
@@ -92,12 +92,14 @@ int __i915_subtests(const char *caller,
 			T, ARRAY_SIZE(T), data)
 #define i915_live_subtests(T, data) ({ \
 	typecheck(struct drm_i915_private *, data); \
+	(data)->gt[0]->uc.guc.submission_state.sched_disable_delay_ms = 0; \
 	__i915_subtests(__func__, \
 			__i915_live_setup, __i915_live_teardown, \
 			T, ARRAY_SIZE(T), data); \
 })
 #define intel_gt_live_subtests(T, data) ({ \
 	typecheck(struct intel_gt *, data); \
+	(data)->uc.guc.submission_state.sched_disable_delay_ms = 0; \
 	__i915_subtests(__func__, \
 			__intel_gt_live_setup, __intel_gt_live_teardown, \
 			T, ARRAY_SIZE(T), data); \
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 37b5c9e9d260..1b85c671935c 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -430,6 +430,11 @@ DEFINE_EVENT(intel_context, intel_context_reset,
 	     TP_ARGS(ce)
 );
 
+DEFINE_EVENT(intel_context, intel_context_close,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_ban,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -532,6 +537,11 @@ trace_intel_context_reset(struct intel_context *ce)
 {
 }
 
+static inline void
+trace_intel_context_close(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_ban(struct intel_context *ce)
 {
-- 
2.25.1

^ permalink raw reply	related	[flat|nested] 13+ messages in thread
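[Editorial note: the branch taken by guc_context_sched_disable() in the patch above can be modeled as a small standalone C sketch. This is a hypothetical userspace model for illustration only — the enum, struct, and field names below are invented and do not exist in i915, and the real code additionally holds ce->guc_state.lock and queues a kernel delayed work item.]

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the decision made in
 * guc_context_sched_disable(). All names here are invented for
 * illustration; none of them exist in the i915 driver.
 */
enum sched_disable_action {
	ACTION_UNPIN_NOW,	/* bypass: context can no longer be scheduled */
	ACTION_DELAY,		/* queue delayed work, hoping for a re-pin */
	ACTION_DISABLE_NOW	/* closed, guc_id pressure, or zero delay */
};

struct ctx_model {
	bool bypass;		/* models bypass_sched_disable() */
	bool closed;		/* models intel_context_is_closed() */
	bool id_pressure;	/* models guc_id_pressure(): parent context,
				 * or guc_ids_in_use above the threshold */
	unsigned long delay_ms;	/* models sched_disable_delay_ms */
};

static enum sched_disable_action pick_action(const struct ctx_model *c)
{
	/* Mirrors the if / else-if / else chain in the patch. */
	if (c->bypass)
		return ACTION_UNPIN_NOW;
	if (!c->closed && !c->id_pressure && c->delay_ms)
		return ACTION_DELAY;
	return ACTION_DISABLE_NOW;
}
```

The delayed path is the common case the series optimizes for: an idle-but-soon-to-be-reused context stays schedulable for the delay window, and a re-pin within it (cancel_delayed_work() in guc_request_alloc()) skips the H2G/G2H round trip entirely.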
* Re: [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero 2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn @ 2022-08-15 23:57 ` John Harrison 2022-08-16 1:02 ` Teres Alexis, Alan Previn 0 siblings, 1 reply; 13+ messages in thread From: John Harrison @ 2022-08-15 23:57 UTC (permalink / raw) To: Alan Previn, intel-gfx On 8/15/2022 09:01, Alan Previn wrote: > From: Matthew Brost <matthew.brost@intel.com> > > Add a delay, configurable via debugs (default 34ms), to disable debugs -> debugfs > scheduling of a context after the pin count goes to zero. Disable > scheduling is a somewhat costly operation so the idea is that a delay costly operation as it requires synchronising with the GuC. So the idea > allows the user to resubmit something before doing this operation. > This delay is only done if the context isn't closed and less than 3/4 less than a given threshold (default is 3/4) of the guc_ids > of total guc_ids are in use. > > As temporary WA disable this feature for the selftests. Selftests are > very timing sensitive and any change in timing can cause failure. A > follow up patch will fixup the selftests to understand this delay. > > Alan Previn: Matt Brost first introduced this series back in Oct 2021. > However no real world workload with measured performance impact was > available to prove the intended results. Today, this series is being > republished in response to a real world workload that benefited greatly > from it along with measured performance improvement. > > Workload description: 36 containers were created on a DG2 device where > each container was performing a combination of 720p 3d game rendering > and 30fps video encoding. 
The workload density was configured in way in a way > that guaranteed each container to ALWAYS be able to render and > encode no less than 30fps with a predefined maximum render + encode > latency time. That means that the totality of all 36 containers and its and their > workloads were not saturating the utilized hw engines to its max "not saturating the hardware to its max" or "not saturating the engines to their max" > (in order to maintain just enough headrooom to meet the min fps and > max latencies of incoming container submissions). > > Problem statement: It was observed that the CPU utilization of the CPU > core that was pinned to i915 soft IRQ work was experiencing severe load. Lots of repetition. Would be better as just: It was observed that the CPU core processing the i915 soft IRQ work was experiencing severe load. > Using tracelogs and an instrumentation patch to count specific i915 IRQ > events, it was confirmed that the majority of the CPU cycles were caused > by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast > majority of the cycles was determined to be processing a specific G2H > IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent > by GuC in response to i915 KMD sending H2G requests: > INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G request are sent requests > whenever a context is idle so that we can unpin the context from GuC. a context goes idle > The high CPU utilization % symptom was limiting density scaling. > > Root Cause Analysis: Because the incoming execution buffers were spread > across 36 different containers (each with multiple contexts) but the > system in totality was NOT saturated to the max, it was assumed that each > context was constantly idling between submissions. This was causing > a thrashing of unpinning contexts from GuC at one moment, followed quickly > by repinning them due to incoming workload the very next moment. 
These > event-pairs were being triggered across multiple contexts per container, > across all containers at the rate of > 30 times per sec per context. > > Metrics: When running this workload without this patch, we measured an > average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 > seconds or ~10 million times over ~25+ mins. With this patch, the count > reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The > improvement observed is ~99% for the average counts per 10 seconds. > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- > drivers/gpu/drm/i915/gt/intel_context.h | 9 ++ > drivers/gpu/drm/i915/gt/intel_context_types.h | 7 + > drivers/gpu/drm/i915/gt/uc/intel_guc.h | 18 +++ > .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 57 +++++++ > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 150 +++++++++++++++--- > drivers/gpu/drm/i915/i915_selftest.h | 2 + > drivers/gpu/drm/i915/i915_trace.h | 10 ++ > 8 files changed, 229 insertions(+), 26 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index dabdfe09f5e5..df7fd1b019ec 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -1454,7 +1454,7 @@ static void engines_idle_release(struct i915_gem_context *ctx, > int err; > > /* serialises with execbuf */ > - set_bit(CONTEXT_CLOSED_BIT, &ce->flags); > + intel_context_close(ce); > if (!intel_context_pin_if_active(ce)) > continue; > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h > index 8e2d70630c49..7cc4bb9ad042 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context.h > +++ b/drivers/gpu/drm/i915/gt/intel_context.h > @@ -276,6 +276,15 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) > return 
test_bit(CONTEXT_BARRIER_BIT, &ce->flags); > } > > +static inline void intel_context_close(struct intel_context *ce) > +{ > + set_bit(CONTEXT_CLOSED_BIT, &ce->flags); > + > + trace_intel_context_close(ce); Doesn't the v1 commit message update say that this trace has been dropped? Or was it dropped from somewhere else? IMHO, it looks plausible here. > + if (ce->ops->close) > + ce->ops->close(ce); > +} > + > static inline bool intel_context_is_closed(const struct intel_context *ce) > { > return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h > index 04eacae1aca5..86ac84e2edb9 100644 > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h > @@ -43,6 +43,8 @@ struct intel_context_ops { > void (*revoke)(struct intel_context *ce, struct i915_request *rq, > unsigned int preempt_timeout_ms); > > + void (*close)(struct intel_context *ce); > + > int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr); > int (*pin)(struct intel_context *ce, void *vaddr); > void (*unpin)(struct intel_context *ce); > @@ -208,6 +210,11 @@ struct intel_context { > * each priority bucket > */ > u32 prio_count[GUC_CLIENT_PRIORITY_NUM]; > + /** > + * @sched_disable_delay: worker to disable scheduling on this > + * context > + */ > + struct delayed_work sched_disable_delay; > } guc_state; > > struct { > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > index a7acffbf15d1..2f534baf6f45 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > @@ -112,6 +112,10 @@ struct intel_guc { > * refs > */ > struct list_head guc_id_list; > + /** > + * @guc_ids_in_use: Number single-lrc guc_ids in use > + */ > + u16 guc_ids_in_use; > /** > * @destroyed_contexts: list of contexts waiting to be destroyed > * (deregistered with the GuC) > @@ -132,6 
+136,18 @@ struct intel_guc { > * @reset_fail_mask: mask of engines that failed to reset > */ > intel_engine_mask_t reset_fail_mask; > +#define SCHED_DISABLE_DELAY_MS 34 This magic number seems like it should have some kind of explanation comment. Also, given that it is only used (and only intended to be used) by the init code in intel_guc_sbmissions.c, it seems wrong to put it in a header file. Including it here implies that random other bits of the driver are expected to use it. > + /** > + * @sched_disable_delay_ms: schedule disable delay, in ms, for > + * contexts > + */ > + u64 sched_disable_delay_ms; > + /** > + * @sched_disable_gucid_threshold: threshold of min remaining available > + * guc_ids before we start bypassing the schedule disable delay > + */ > + int sched_disable_gucid_threshold; > + Extra blank line. > } submission_state; > > /** > @@ -464,4 +480,6 @@ void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); > > void intel_guc_write_barrier(struct intel_guc *guc); > > +int intel_guc_get_sched_disable_gucid_threshold_max(struct intel_guc *guc); > + > #endif > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c > index 25f09a420561..c0e2fe46050f 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c > @@ -71,12 +71,69 @@ static bool intel_eval_slpc_support(void *data) > return intel_guc_slpc_is_used(guc); > } > > +static int guc_sched_disable_delay_ms_get(void *data, u64 *val) > +{ > + struct intel_guc *guc = data; > + > + if (!intel_guc_submission_is_used(guc)) > + return -ENODEV; > + > + *val = guc->submission_state.sched_disable_delay_ms; > + > + return 0; > +} > + > +static int guc_sched_disable_delay_ms_set(void *data, u64 val) > +{ > + struct intel_guc *guc = data; > + > + if (!intel_guc_submission_is_used(guc)) > + return -ENODEV; > + > + guc->submission_state.sched_disable_delay_ms = val; > + > + return 0; 
> +} > +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ms_fops, > + guc_sched_disable_delay_ms_get, > + guc_sched_disable_delay_ms_set, "%lld\n"); > + > +static int guc_sched_disable_gucid_threshold_get(void *data, u64 *val) > +{ > + struct intel_guc *guc = data; > + Why no check on submission_is_used here? > + *val = guc->submission_state.sched_disable_gucid_threshold; > + return 0; > +} > + > +static int guc_sched_disable_gucid_threshold_set(void *data, u64 val) > +{ > + struct intel_guc *guc = data; > + > + if (!intel_guc_submission_is_used(guc)) > + return -ENODEV; > + > + if (val > intel_guc_get_sched_disable_gucid_threshold_max(guc)) > + guc->submission_state.sched_disable_gucid_threshold = > + intel_guc_get_sched_disable_gucid_threshold_max(guc); > + else > + guc->submission_state.sched_disable_gucid_threshold = val; > + > + return 0; > +} > +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_gucid_threshold_fops, > + guc_sched_disable_gucid_threshold_get, > + guc_sched_disable_gucid_threshold_set, "%lld\n"); > + > void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) > { > static const struct intel_gt_debugfs_file files[] = { > { "guc_info", &guc_info_fops, NULL }, > { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, > { "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support}, > + { "guc_sched_disable_delay_ms", &guc_sched_disable_delay_ms_fops, NULL }, > + { "guc_sched_disable_gucid_threshold", &guc_sched_disable_gucid_threshold_fops, > + NULL }, > }; > > if (!intel_guc_is_supported(guc)) > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index 0d17da77e787..30612a6e3a27 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -65,7 +65,14 @@ > * corresponding G2H returns indicating the scheduling disable operation has > * completed it is safe to unpin the context. 
While a disable is in flight it > * isn't safe to resubmit the context so a fence is used to stall all future > - * requests of that context until the G2H is returned. > + * requests of that context until the G2H is returned. Because this interaction > + * with the GuC takes a non-zero amount of time / GuC cycles we delay the I would drop the '/ GuC cycles'. Even if the GuC was instantaneous with its processing, there is still the non-zero round trip time of H2G/G2H. So just saying 'non-zero time' is sufficient - where that time is being spent doesn't really matter. > + * disabling of scheduling after the pin count goes to zero by a configurable > + * period of time (default 34 ms, see SCHED_DISABLE_DELAY_MS). The thought is > + * this gives the user a window of time to resubmit something on the context > + * before doing a somewhat costly operation. This delay is only done if it 'doing this costly operation'. 'only done if the context isn't closed' > + * isn't closed and the guc_id usage is less than a threshold (default 3/4, > + * (see intel_guc_get_sched_disable_gucid_threshold_default()). The function name is incorrect. 
> * > * Context deregistration: > * Before a context can be destroyed or if we steal its guc_id we must > @@ -1989,6 +1996,9 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) > if (unlikely(ret < 0)) > return ret; > > + if (!intel_context_is_parent(ce)) > + ++guc->submission_state.guc_ids_in_use; > + > ce->guc_id.id = ret; > return 0; > } > @@ -1998,14 +2008,16 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) > GEM_BUG_ON(intel_context_is_child(ce)); > > if (!context_guc_id_invalid(ce)) { > - if (intel_context_is_parent(ce)) > + if (intel_context_is_parent(ce)) { > bitmap_release_region(guc->submission_state.guc_ids_bitmap, > ce->guc_id.id, > order_base_2(ce->parallel.number_children > + 1)); > - else > + } else { > + --guc->submission_state.guc_ids_in_use; > ida_simple_remove(&guc->submission_state.guc_ids, > ce->guc_id.id); > + } > clr_ctx_id_mapping(guc, ce->guc_id.id); > set_context_guc_id_invalid(ce); > } > @@ -2993,41 +3005,98 @@ guc_context_revoke(struct intel_context *ce, struct i915_request *rq, > } > } > > -static void guc_context_sched_disable(struct intel_context *ce) > +static void guc_context_sched_disable(struct intel_context *ce); > + > +static void do_sched_disable(struct intel_guc *guc, struct intel_context *ce, > + unsigned long flags) > + __releases(ce->guc_state.lock) > { > - struct intel_guc *guc = ce_to_guc(ce); > - unsigned long flags; > struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; > intel_wakeref_t wakeref; > - u16 guc_id; > > + lockdep_assert_held(&ce->guc_state.lock); > + > + spin_unlock_irqrestore(&ce->guc_state.lock, flags); > + > + with_intel_runtime_pm(runtime_pm, wakeref) > + guc_context_sched_disable(ce); > +} > + > +static bool bypass_sched_disable(struct intel_guc *guc, > + struct intel_context *ce) > +{ > + lockdep_assert_held(&ce->guc_state.lock); > GEM_BUG_ON(intel_context_is_child(ce)); > > + if (submission_disabled(guc) || 
context_guc_id_invalid(ce) || > + !ctx_id_mapped(guc, ce->guc_id.id)) { > + clr_context_enabled(ce); > + return true; > + } > + > + return !context_enabled(ce); > +} > + > +static void __delay_sched_disable(struct work_struct *wrk) > +{ > + struct intel_context *ce = > + container_of(wrk, typeof(*ce), guc_state.sched_disable_delay.work); > + struct intel_guc *guc = ce_to_guc(ce); > + unsigned long flags; > + > spin_lock_irqsave(&ce->guc_state.lock, flags); > > - /* > - * We have to check if the context has been disabled by another thread, > - * check if submssion has been disabled to seal a race with reset and > - * finally check if any more requests have been committed to the > - * context ensursing that a request doesn't slip through the > - * 'context_pending_disable' fence. > - */ > - if (unlikely(!context_enabled(ce) || submission_disabled(guc) || > - context_has_committed_requests(ce))) { > - clr_context_enabled(ce); > + if (bypass_sched_disable(guc, ce)) { > spin_unlock_irqrestore(&ce->guc_state.lock, flags); > - goto unpin; > + intel_context_sched_disable_unpin(ce); > + } else { > + do_sched_disable(guc, ce, flags); > } > - guc_id = prep_context_pending_disable(ce); > +} > > - spin_unlock_irqrestore(&ce->guc_state.lock, flags); > +static bool guc_id_pressure(struct intel_guc *guc, struct intel_context *ce) > +{ > + /* > + * parent contexts are perma-pinned, if we are unpinning do schedule > + * disable immediately. > + */ > + if (intel_context_is_parent(ce)) > + return true; > > - with_intel_runtime_pm(runtime_pm, wakeref) > - __guc_context_sched_disable(guc, ce, guc_id); > + /* > + * If we are beyond the threshold for avail guc_ids, do schedule disable immediately. 
> + */ > + return guc->submission_state.guc_ids_in_use > > + guc->submission_state.sched_disable_gucid_threshold; > +} > + > +static void guc_context_sched_disable(struct intel_context *ce) > +{ > + struct intel_guc *guc = ce_to_guc(ce); > + u64 delay = guc->submission_state.sched_disable_delay_ms; > + unsigned long flags; > + > + spin_lock_irqsave(&ce->guc_state.lock, flags); > + > + if (bypass_sched_disable(guc, ce)) { > + spin_unlock_irqrestore(&ce->guc_state.lock, flags); > + intel_context_sched_disable_unpin(ce); > + } else if (!intel_context_is_closed(ce) && !guc_id_pressure(guc, ce) && > + delay) { > + spin_unlock_irqrestore(&ce->guc_state.lock, flags); > + mod_delayed_work(system_unbound_wq, > + &ce->guc_state.sched_disable_delay, > + msecs_to_jiffies(delay)); > + } else { > + do_sched_disable(guc, ce, flags); > + } > +} > > - return; > -unpin: > - intel_context_sched_disable_unpin(ce); > +static void guc_context_close(struct intel_context *ce) > +{ > + if (test_bit(CONTEXT_GUC_INIT, &ce->flags) && > + cancel_delayed_work(&ce->guc_state.sched_disable_delay)) > + __delay_sched_disable(&ce->guc_state.sched_disable_delay.work); > } > > static inline void guc_lrc_desc_unpin(struct intel_context *ce) > @@ -3346,6 +3415,8 @@ static void remove_from_context(struct i915_request *rq) > static const struct intel_context_ops guc_context_ops = { > .alloc = guc_context_alloc, > > + .close = guc_context_close, > + > .pre_pin = guc_context_pre_pin, > .pin = guc_context_pin, > .unpin = guc_context_unpin, > @@ -3428,6 +3499,10 @@ static void guc_context_init(struct intel_context *ce) > rcu_read_unlock(); > > ce->guc_state.prio = map_i915_prio_to_guc_prio(prio); > + > + INIT_DELAYED_WORK(&ce->guc_state.sched_disable_delay, > + __delay_sched_disable); > + > set_bit(CONTEXT_GUC_INIT, &ce->flags); > } > > @@ -3465,6 +3540,9 @@ static int guc_request_alloc(struct i915_request *rq) > if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags))) > guc_context_init(ce); > > + if 
(cancel_delayed_work(&ce->guc_state.sched_disable_delay)) > + intel_context_sched_disable_unpin(ce); > + > /* > * Call pin_guc_id here rather than in the pinning step as with > * dma_resv, contexts can be repeatedly pinned / unpinned trashing the > @@ -3595,6 +3673,8 @@ static int guc_virtual_context_alloc(struct intel_context *ce) > static const struct intel_context_ops virtual_guc_context_ops = { > .alloc = guc_virtual_context_alloc, > > + .close = guc_context_close, > + > .pre_pin = guc_virtual_context_pre_pin, > .pin = guc_virtual_context_pin, > .unpin = guc_virtual_context_unpin, > @@ -3684,6 +3764,8 @@ static void guc_child_context_destroy(struct kref *kref) > static const struct intel_context_ops virtual_parent_context_ops = { > .alloc = guc_virtual_context_alloc, > > + .close = guc_context_close, > + > .pre_pin = guc_context_pre_pin, > .pin = guc_parent_context_pin, > .unpin = guc_parent_context_unpin, > @@ -4207,6 +4289,21 @@ static bool __guc_submission_selected(struct intel_guc *guc) > return i915->params.enable_guc & ENABLE_GUC_SUBMISSION; > } > > +static int __guc_get_non_lrc_num_guc_ids(struct intel_guc *guc) > +{ > + return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc); > +} > + > +static int __guc_get_sched_disable_gucid_threshold_default(struct intel_guc *guc) > +{ > + return (__guc_get_non_lrc_num_guc_ids(guc) / 4 * 3); Should do the multiply first. The calculation is more likely to lose important precision than overflow. E.g. 15/4*3 -> 9 but 15*3/4 -> 11. Also, why make this a function? Shouldn't it just be a #define as per NUMBER_MULTI_LRC_GUC_ID, SCHED_DISABLE_DELAY_MS, etc.? 
> +} > + > +int intel_guc_get_sched_disable_gucid_threshold_max(struct intel_guc *guc) > +{ > + return __guc_get_non_lrc_num_guc_ids(guc); > +} > + > void intel_guc_submission_init_early(struct intel_guc *guc) > { > xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); > @@ -4223,7 +4320,10 @@ void intel_guc_submission_init_early(struct intel_guc *guc) > spin_lock_init(&guc->timestamp.lock); > INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping); > > + guc->submission_state.sched_disable_delay_ms = SCHED_DISABLE_DELAY_MS; > guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID; > + guc->submission_state.sched_disable_gucid_threshold = > + __guc_get_sched_disable_gucid_threshold_default(guc); > guc->submission_supported = __guc_submission_supported(guc); > guc->submission_selected = __guc_submission_selected(guc); > } > diff --git a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h > index f54de0499be7..bdf3e22c0a34 100644 > --- a/drivers/gpu/drm/i915/i915_selftest.h > +++ b/drivers/gpu/drm/i915/i915_selftest.h > @@ -92,12 +92,14 @@ int __i915_subtests(const char *caller, > T, ARRAY_SIZE(T), data) > #define i915_live_subtests(T, data) ({ \ > typecheck(struct drm_i915_private *, data); \ > + (data)->gt[0]->uc.guc.submission_state.sched_disable_delay_ms = 0; \ > __i915_subtests(__func__, \ > __i915_live_setup, __i915_live_teardown, \ > T, ARRAY_SIZE(T), data); \ > }) > #define intel_gt_live_subtests(T, data) ({ \ > typecheck(struct intel_gt *, data); \ > + (data)->uc.guc.submission_state.sched_disable_delay_ms = 0; \ > __i915_subtests(__func__, \ > __intel_gt_live_setup, __intel_gt_live_teardown, \ > T, ARRAY_SIZE(T), data); \ > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h > index 37b5c9e9d260..1b85c671935c 100644 > --- a/drivers/gpu/drm/i915/i915_trace.h > +++ b/drivers/gpu/drm/i915/i915_trace.h > @@ -430,6 +430,11 @@ DEFINE_EVENT(intel_context, intel_context_reset, > TP_ARGS(ce) > ); > > 
+DEFINE_EVENT(intel_context, intel_context_close, > + TP_PROTO(struct intel_context *ce), > + TP_ARGS(ce) > +); > + > DEFINE_EVENT(intel_context, intel_context_ban, > TP_PROTO(struct intel_context *ce), > TP_ARGS(ce) > @@ -532,6 +537,11 @@ trace_intel_context_reset(struct intel_context *ce) > { > } > > +static inline void > +trace_intel_context_close(struct intel_context *ce) > +{ > +} > + More trace stuff despite the claim the trace was dropped? John. > static inline void > trace_intel_context_ban(struct intel_context *ce) > { ^ permalink raw reply [flat|nested] 13+ messages in thread
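[Editorial note: John's point about operand order in the threshold default is easy to demonstrate. With C integer arithmetic, dividing before multiplying truncates the remainder early, so `n / 4 * 3` systematically undershoots `n * 3 / 4`. A minimal demonstration, with helper names invented for this sketch:]

```c
/*
 * Two ways to take 3/4 of an integer guc_id count. Helper names are
 * invented for this demonstration; they are not i915 functions.
 */
static int threshold_div_first(int non_lrc_ids)
{
	/* As posted in the patch: truncates before the multiply. */
	return non_lrc_ids / 4 * 3;
}

static int threshold_mul_first(int non_lrc_ids)
{
	/* Order suggested in the review: multiply, then divide. */
	return non_lrc_ids * 3 / 4;
}
```

For counts that are multiples of 4 the two agree; otherwise the divide-first form loses up to two guc_ids' worth of threshold, exactly the 9-vs-11 example above.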
* Re: [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero 2022-08-15 23:57 ` John Harrison @ 2022-08-16 1:02 ` Teres Alexis, Alan Previn 0 siblings, 0 replies; 13+ messages in thread From: Teres Alexis, Alan Previn @ 2022-08-16 1:02 UTC (permalink / raw) To: Harrison, John C, intel-gfx

Shall fix all of the cosmetics and repost.

WRT the magic number 34 milliseconds: the intent was to have a 1 millisecond buffer on top of the ~33 ms frame time of a 30 fps workload, which is how the issue was found in the first place (that workload had hundreds of these 30 fps contexts running, and that is why the impact was so great). I can add that reasoning next to the #define. However, I doubt any metrics-based choice can ever perfectly justify one value over another, since the impact/improvement will always depend on the type of workload and its latency requirements.

WRT the trace on intel_context_close: apologies, that was a mistake. I realize I carried it over from the downstream reviews, but it was never mentioned upstream, and I had actually missed that fix since the start of the upstream series postings. Even though you said it does look plausible, I will just remove it on the next post, since it is a UAPI-visible change and I would rather not delay this series on account of one. We can add it back later if needed.

WRT the threshold calculation (the 3/4 of remaining guc-ids): yes, I will fix the integer-truncation error (multiply before dividing) and change it to a macro. Thanks for catching this.

> +static int guc_sched_disable_gucid_threshold_get(void *data, u64 *val)
> +{
> +	struct intel_guc *guc = data;
> +
Why no check on submission_is_used here?

...alan

On Mon, 2022-08-15 at 16:57 -0700, Harrison, John C wrote:
> On 8/15/2022 09:01, Alan Previn wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> >
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread
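[Editorial note: the 34 ms rationale described in the reply above works out to one integer frame period of a 30 fps workload plus a 1 ms margin. A small sketch — the helper name is hypothetical and not part of i915:]

```c
/*
 * Hypothetical helper mirroring the 34 ms rationale: one integer frame
 * period of an fps-locked workload plus a small safety margin, so a
 * steady per-frame resubmission always lands inside the delay window.
 */
static int sched_disable_delay_for_fps(int fps, int margin_ms)
{
	return 1000 / fps + margin_ms;	/* integer ms per frame + margin */
}
```

With fps = 30 and margin_ms = 1 this yields the SCHED_DISABLE_DELAY_MS default of 34.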
* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Delay disabling scheduling on a context (rev4) 2022-08-15 16:01 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn 2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests Alan Previn 2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn @ 2022-08-15 16:33 ` Patchwork 2022-08-15 16:55 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork 3 siblings, 0 replies; 13+ messages in thread From: Patchwork @ 2022-08-15 16:33 UTC (permalink / raw) To: Matthew Brost; +Cc: intel-gfx == Series Details == Series: Delay disabling scheduling on a context (rev4) URL : https://patchwork.freedesktop.org/series/96167/ State : warning == Summary == Error: dim sparse failed Sparse version: v0.6.2 Fast mode used, each commit won't be checked separately. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BAT: failure for Delay disabling scheduling on a context (rev4) 2022-08-15 16:01 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn ` (2 preceding siblings ...) 2022-08-15 16:33 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Delay disabling scheduling on a context (rev4) Patchwork @ 2022-08-15 16:55 ` Patchwork 3 siblings, 0 replies; 13+ messages in thread From: Patchwork @ 2022-08-15 16:55 UTC (permalink / raw) To: Matthew Brost; +Cc: intel-gfx [-- Attachment #1: Type: text/plain, Size: 7545 bytes --] == Series Details == Series: Delay disabling scheduling on a context (rev4) URL : https://patchwork.freedesktop.org/series/96167/ State : failure == Summary == CI Bug Log - changes from CI_DRM_11989 -> Patchwork_96167v4 ==================================================== Summary ------- **FAILURE** Serious unknown changes coming with Patchwork_96167v4 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_96167v4, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. 
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/index.html

Participating hosts (29 -> 29)
------------------------------

  Additional (1): fi-kbl-soraka
  Missing    (1): bat-rpls-2

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_96167v4:

### IGT changes ###

#### Possible regressions ####

  * igt@core_auth@basic-auth:
    - fi-rkl-guc:         [PASS][1] -> [TIMEOUT][2] +4 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/fi-rkl-guc/igt@core_auth@basic-auth.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-rkl-guc/igt@core_auth@basic-auth.html
    - bat-dg1-6:          [PASS][3] -> [TIMEOUT][4] +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/bat-dg1-6/igt@core_auth@basic-auth.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/bat-dg1-6/igt@core_auth@basic-auth.html

  * igt@debugfs_test@read_all_entries:
    - bat-dg1-6:          [PASS][5] -> [INCOMPLETE][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/bat-dg1-6/igt@debugfs_test@read_all_entries.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/bat-dg1-6/igt@debugfs_test@read_all_entries.html

Known issues
------------

  Here are the changes found in Patchwork_96167v4 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_gttfill@basic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][7] ([fdo#109271]) +8 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-kbl-soraka/igt@gem_exec_gttfill@basic.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][8] ([fdo#109271] / [i915#2190])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#4613]) +3 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-kbl-soraka/igt@gem_lmem_swapping@basic.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][10] ([i915#1886])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - fi-bdw-gvtdvm:      NOTRUN -> [INCOMPLETE][11] ([i915#4817])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-bdw-gvtdvm/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-bsw-kefka:       NOTRUN -> [SKIP][12] ([fdo#109271] / [fdo#111827])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-bsw-kefka/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-blb-e6850:       NOTRUN -> [SKIP][13] ([fdo#109271])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-blb-e6850/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-apl-guc:         NOTRUN -> [SKIP][14] ([fdo#109271] / [fdo#111827])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-apl-guc/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][15] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-kbl-soraka/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@runner@aborted:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][16] ([i915#4312])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-bdw-5557u/igt@runner@aborted.html

#### Possible fixes ####

  * igt@i915_selftest@live@execlists:
    - fi-bdw-gvtdvm:      [INCOMPLETE][17] ([i915#2940]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/fi-bdw-gvtdvm/igt@i915_selftest@live@execlists.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-bdw-gvtdvm/igt@i915_selftest@live@execlists.html
    - fi-bsw-kefka:       [INCOMPLETE][19] ([i915#2940]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-bsw-kefka/igt@i915_selftest@live@execlists.html

  * igt@i915_selftest@live@hangcheck:
    - {fi-ehl-2}:         [INCOMPLETE][21] ([i915#6106]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/fi-ehl-2/igt@i915_selftest@live@hangcheck.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-ehl-2/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [DMESG-FAIL][23] ([i915#4528]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@i915_selftest@live@reset:
    - fi-apl-guc:         [INCOMPLETE][25] -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11989/fi-apl-guc/igt@i915_selftest@live@reset.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/fi-apl-guc/igt@i915_selftest@live@reset.html

  {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2940]: https://gitlab.freedesktop.org/drm/intel/issues/2940
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4817]: https://gitlab.freedesktop.org/drm/intel/issues/4817
  [i915#6106]: https://gitlab.freedesktop.org/drm/intel/issues/6106

Build changes
-------------

  * Linux: CI_DRM_11989 -> Patchwork_96167v4

  CI-20190529: 20190529
  CI_DRM_11989: 8953e41fa70d4507c6f5508e030347f7eda3ba8a @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6625: d47beef9b01595f721c584070940c95be1cf11e8 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_96167v4: 8953e41fa70d4507c6f5508e030347f7eda3ba8a @ git://anongit.freedesktop.org/gfx-ci/linux

### Linux commits

edebdf6a1a62 drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
2f3ba4dfcc08 drm/i915/selftests: Use correct selfest calls for live tests

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_96167v4/index.html

[-- Attachment #2: Type: text/html, Size: 9208 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context
@ 2022-08-14 22:43 Alan Previn
  2022-08-14 22:43 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Previn @ 2022-08-14 22:43 UTC (permalink / raw)
To: intel-gfx

This is a revival of the same series posted by Matthew Brost back in
October 2021 (https://patchwork.freedesktop.org/series/96167/).
Additional real-world measured metrics are included this time around,
which have proven the effectiveness of this series.

This series adds a delay before disabling scheduling of the guc-context
once the context has become idle. The 2nd patch should explain it quite
well.

This is the 3rd rev of this series (counting from the first version by
Matt). Changes from prior revs:

  v3: Differentiate and appropriately name the helper functions for
      getting the 'default threshold of num-guc-ids' vs the 'max
      threshold of num-guc-ids' for bypassing sched-disable, and use the
      correct one for the debugfs validation (John Harrison).
  v2: Changed the default schedule-disable delay to 34 milliseconds and
      added a debugfs to control this timing knob. Also added a debugfs
      to control the bypass that skips delaying the schedule-disable
      when we are under pressure with a very low balance of remaining
      guc-ids (John Harrison).
  v1: Drop the trace log for intel_context_close (Chris Wilson).
      Add "Tested-by" into patch-2 (Chris Wilson).
      Add JIRA number into patch-0 (Chris Wilson).
      Summarize patch-2's problem and metrics in the cover letter
      (Chris Wilson).
Matthew Brost (2):
  drm/i915/selftests: Use correct selfest calls for live tests
  drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 .../i915/gem/selftests/i915_gem_coherency.c   |   2 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |   2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   2 +-
 .../drm/i915/gem/selftests/i915_gem_object.c  |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   9 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  18 +++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |  57 +++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 150 +++++++++++++++---
 drivers/gpu/drm/i915/i915_selftest.h          |   2 +
 drivers/gpu/drm/i915/i915_trace.h             |  10 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
 drivers/gpu/drm/i915/selftests/i915_perf.c    |   2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |   2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   2 +-
 16 files changed, 237 insertions(+), 34 deletions(-)

base-commit: 1cb5379e17f93685065d8ec54444f1baf9386ffe
--
2.25.1

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
  2022-08-14 22:43 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn
@ 2022-08-14 22:43 ` Alan Previn
  2022-08-15  0:51   ` kernel test robot
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Previn @ 2022-08-14 22:43 UTC (permalink / raw)
To: intel-gfx

From: Matthew Brost <matthew.brost@intel.com>

Add a delay, configurable via debugfs (default 34 ms), before disabling
scheduling of a context after its pin count goes to zero. Disabling
scheduling is a somewhat costly operation, so the delay gives the user a
window to resubmit something before we do it. The delay is applied only
if the context isn't closed and less than 3/4 of the total guc_ids are
in use.

As a temporary WA, disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow-up patch will fix up the selftests to understand this delay.

Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However, no real-world workload with a measured performance impact was
available then to prove the intended results. Today, this series is
being republished in response to a real-world workload that benefited
greatly from it, along with the measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3D game rendering
and 30 fps video encoding. The workload density was configured in a way
that guaranteed each container would ALWAYS be able to render and encode
no less than 30 fps with a predefined maximum render + encode latency.
That means the totality of all 36 containers and their workloads was not
saturating the utilized hw engines to the max (in order to maintain just
enough headroom to meet the min fps and max latencies of incoming
container submissions).
Problem statement: It was observed that the CPU core pinned to i915 soft
IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was
confirmed that the majority of the CPU cycles were spent in the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of those cycles went to processing one specific G2H IRQ:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by the GuC
in response to the i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from the
GuC. The high CPU utilization was limiting density scaling.

Root cause analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) while the
system in totality was NOT saturated, it was assumed that each context
was constantly idling between submissions. This was causing thrashing: a
context was unpinned from the GuC one moment, then repinned due to
incoming workload the very next. These event-pairs were being triggered
across multiple contexts per container, across all containers, at a rate
of > 30 times per second per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds, or ~10 million over ~25+ mins. With this patch, the count
dropped to ~480 every 10 seconds, or ~28K over ~10 mins. That is a ~99%
reduction in the average count per 10 seconds.
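The delayed schedule-disable described above is essentially a debounce. The decision taken when a context's pin count hits zero can be modeled in plain userspace C as below; the default values and the field names mirror the patch, but the struct and standalone function are illustrative only, not driver code (the parent-context special case is omitted for brevity):

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes mirroring guc_context_sched_disable() in the patch. */
enum action {
	UNPIN_IMMEDIATELY,	/* bypass: scheduling already off in GuC */
	DELAY_SCHED_DISABLE,	/* queue delayed work (default 34 ms) */
	SCHED_DISABLE_NOW	/* closed / guc_id pressure / zero delay */
};

/* Illustrative stand-in for the relevant intel_guc/intel_context state. */
struct model {
	bool context_enabled;		/* GuC scheduling currently enabled */
	bool context_closed;		/* CONTEXT_CLOSED_BIT set */
	unsigned int guc_ids_in_use;	/* single-lrc guc_ids in use */
	unsigned int threshold;		/* sched_disable_gucid_threshold */
	unsigned int delay_ms;		/* sched_disable_delay_ms */
};

static enum action on_pin_count_zero(const struct model *m)
{
	/* bypass_sched_disable(): nothing to tell the GuC, just unpin */
	if (!m->context_enabled)
		return UNPIN_IMMEDIATELY;
	/* open context, no guc_id pressure, non-zero delay: debounce */
	if (!m->context_closed &&
	    m->guc_ids_in_use <= m->threshold &&
	    m->delay_ms)
		return DELAY_SCHED_DISABLE;
	/* otherwise disable scheduling right away */
	return SCHED_DISABLE_NOW;
}
```

A resubmission arriving inside the delay window simply cancels the queued work (the patch does this via cancel_delayed_work() in guc_request_alloc()), which is what removes the disable/re-enable thrash the metrics describe.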
Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com> --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_context.h | 9 ++ drivers/gpu/drm/i915/gt/intel_context_types.h | 7 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 18 +++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 57 +++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 150 +++++++++++++++--- drivers/gpu/drm/i915/i915_selftest.h | 2 + drivers/gpu/drm/i915/i915_trace.h | 10 ++ 8 files changed, 229 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index dabdfe09f5e5..df7fd1b019ec 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1454,7 +1454,7 @@ static void engines_idle_release(struct i915_gem_context *ctx, int err; /* serialises with execbuf */ - set_bit(CONTEXT_CLOSED_BIT, &ce->flags); + intel_context_close(ce); if (!intel_context_pin_if_active(ce)) continue; diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 8e2d70630c49..7cc4bb9ad042 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -276,6 +276,15 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); } +static inline void intel_context_close(struct intel_context *ce) +{ + set_bit(CONTEXT_CLOSED_BIT, &ce->flags); + + trace_intel_context_close(ce); + if (ce->ops->close) + ce->ops->close(ce); +} + static inline bool intel_context_is_closed(const struct intel_context *ce) { return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 04eacae1aca5..86ac84e2edb9 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ 
b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -43,6 +43,8 @@ struct intel_context_ops { void (*revoke)(struct intel_context *ce, struct i915_request *rq, unsigned int preempt_timeout_ms); + void (*close)(struct intel_context *ce); + int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr); int (*pin)(struct intel_context *ce, void *vaddr); void (*unpin)(struct intel_context *ce); @@ -208,6 +210,11 @@ struct intel_context { * each priority bucket */ u32 prio_count[GUC_CLIENT_PRIORITY_NUM]; + /** + * @sched_disable_delay: worker to disable scheduling on this + * context + */ + struct delayed_work sched_disable_delay; } guc_state; struct { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index a7acffbf15d1..2f534baf6f45 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -112,6 +112,10 @@ struct intel_guc { * refs */ struct list_head guc_id_list; + /** + * @guc_ids_in_use: Number single-lrc guc_ids in use + */ + u16 guc_ids_in_use; /** * @destroyed_contexts: list of contexts waiting to be destroyed * (deregistered with the GuC) @@ -132,6 +136,18 @@ struct intel_guc { * @reset_fail_mask: mask of engines that failed to reset */ intel_engine_mask_t reset_fail_mask; +#define SCHED_DISABLE_DELAY_MS 34 + /** + * @sched_disable_delay_ms: schedule disable delay, in ms, for + * contexts + */ + u64 sched_disable_delay_ms; + /** + * @sched_disable_gucid_threshold: threshold of min remaining available + * guc_ids before we start bypassing the schedule disable delay + */ + int sched_disable_gucid_threshold; + } submission_state; /** @@ -464,4 +480,6 @@ void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); void intel_guc_write_barrier(struct intel_guc *guc); +int intel_guc_get_sched_disable_gucid_threshold_max(struct intel_guc *guc); + #endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c 
b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index 25f09a420561..c0e2fe46050f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -71,12 +71,69 @@ static bool intel_eval_slpc_support(void *data) return intel_guc_slpc_is_used(guc); } +static int guc_sched_disable_delay_ms_get(void *data, u64 *val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + *val = guc->submission_state.sched_disable_delay_ms; + + return 0; +} + +static int guc_sched_disable_delay_ms_set(void *data, u64 val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + guc->submission_state.sched_disable_delay_ms = val; + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ms_fops, + guc_sched_disable_delay_ms_get, + guc_sched_disable_delay_ms_set, "%lld\n"); + +static int guc_sched_disable_gucid_threshold_get(void *data, u64 *val) +{ + struct intel_guc *guc = data; + + *val = guc->submission_state.sched_disable_gucid_threshold; + return 0; +} + +static int guc_sched_disable_gucid_threshold_set(void *data, u64 val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + if (val > intel_guc_get_sched_disable_gucid_threshold_max(guc)) + guc->submission_state.sched_disable_gucid_threshold = + intel_guc_get_sched_disable_gucid_threshold_max(guc); + else + guc->submission_state.sched_disable_gucid_threshold = val; + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_gucid_threshold_fops, + guc_sched_disable_gucid_threshold_get, + guc_sched_disable_gucid_threshold_set, "%lld\n"); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct intel_gt_debugfs_file files[] = { { "guc_info", &guc_info_fops, NULL }, { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, { "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support}, + 
{ "guc_sched_disable_delay_ms", &guc_sched_disable_delay_ms_fops, NULL }, + { "guc_sched_disable_gucid_threshold", &guc_sched_disable_gucid_threshold_fops, + NULL }, }; if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0d17da77e787..24db9c1963bd 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -65,7 +65,14 @@ * corresponding G2H returns indicating the scheduling disable operation has * completed it is safe to unpin the context. While a disable is in flight it * isn't safe to resubmit the context so a fence is used to stall all future - * requests of that context until the G2H is returned. + * requests of that context until the G2H is returned. Because this interaction + * with the GuC takes a non-zero amount of time / GuC cycles we delay the + * disabling of scheduling after the pin count goes to zero by a configurable + * period of time (default 34 ms, see SCHED_DISABLE_DELAY_MS). The thought is + * this gives the user a window of time to resubmit something on the context + * before doing a somewhat costly operation. This delay is only done if it + * isn't closed and the guc_id usage is less than a threshold (default 3/4, + * (see intel_guc_get_sched_disable_gucid_threshold_default()). 
* * Context deregistration: * Before a context can be destroyed or if we steal its guc_id we must @@ -1989,6 +1996,9 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) if (unlikely(ret < 0)) return ret; + if (!intel_context_is_parent(ce)) + ++guc->submission_state.guc_ids_in_use; + ce->guc_id.id = ret; return 0; } @@ -1998,14 +2008,16 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) GEM_BUG_ON(intel_context_is_child(ce)); if (!context_guc_id_invalid(ce)) { - if (intel_context_is_parent(ce)) + if (intel_context_is_parent(ce)) { bitmap_release_region(guc->submission_state.guc_ids_bitmap, ce->guc_id.id, order_base_2(ce->parallel.number_children + 1)); - else + } else { + --guc->submission_state.guc_ids_in_use; ida_simple_remove(&guc->submission_state.guc_ids, ce->guc_id.id); + } clr_ctx_id_mapping(guc, ce->guc_id.id); set_context_guc_id_invalid(ce); } @@ -2993,41 +3005,98 @@ guc_context_revoke(struct intel_context *ce, struct i915_request *rq, } } -static void guc_context_sched_disable(struct intel_context *ce) +static void guc_context_sched_disable(struct intel_context *ce); + +static void do_sched_disable(struct intel_guc *guc, struct intel_context *ce, + unsigned long flags) + __releases(ce->guc_state.lock) { - struct intel_guc *guc = ce_to_guc(ce); - unsigned long flags; struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; intel_wakeref_t wakeref; - u16 guc_id; + lockdep_assert_held(&ce->guc_state.lock); + + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + with_intel_runtime_pm(runtime_pm, wakeref) + guc_context_sched_disable(ce); +} + +static bool bypass_sched_disable(struct intel_guc *guc, + struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); GEM_BUG_ON(intel_context_is_child(ce)); + if (submission_disabled(guc) || context_guc_id_invalid(ce) || + !ctx_id_mapped(guc, ce->guc_id.id)) { + clr_context_enabled(ce); + return true; + } + + return !context_enabled(ce); 
+} + +static void __delay_sched_disable(struct work_struct *wrk) +{ + struct intel_context *ce = + container_of(wrk, typeof(*ce), guc_state.sched_disable_delay.work); + struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; + spin_lock_irqsave(&ce->guc_state.lock, flags); - /* - * We have to check if the context has been disabled by another thread, - * check if submssion has been disabled to seal a race with reset and - * finally check if any more requests have been committed to the - * context ensursing that a request doesn't slip through the - * 'context_pending_disable' fence. - */ - if (unlikely(!context_enabled(ce) || submission_disabled(guc) || - context_has_committed_requests(ce))) { - clr_context_enabled(ce); + if (bypass_sched_disable(guc, ce)) { spin_unlock_irqrestore(&ce->guc_state.lock, flags); - goto unpin; + intel_context_sched_disable_unpin(ce); + } else { + do_sched_disable(guc, ce, flags); } - guc_id = prep_context_pending_disable(ce); +} - spin_unlock_irqrestore(&ce->guc_state.lock, flags); +static bool guc_id_pressure(struct intel_guc *guc, struct intel_context *ce) +{ + /* + * parent contexts are perma-pinned, if we are unpinning do schedule + * disable immediately. + */ + if (intel_context_is_parent(ce)) + return true; - with_intel_runtime_pm(runtime_pm, wakeref) - __guc_context_sched_disable(guc, ce, guc_id); + /* + * If we are beyond the threshold for avail guc_ids, do schedule disable immediately. 
+ */ + return guc->submission_state.guc_ids_in_use > + guc->submission_state.sched_disable_gucid_threshold; +} + +static void guc_context_sched_disable(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + u64 delay = guc->submission_state.sched_disable_delay_ms; + unsigned long flags; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + + if (bypass_sched_disable(guc, ce)) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + intel_context_sched_disable_unpin(ce); + } else if (!intel_context_is_closed(ce) && !guc_id_pressure(guc, ce) && + delay) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + mod_delayed_work(system_unbound_wq, + &ce->guc_state.sched_disable_delay, + msecs_to_jiffies(delay)); + } else { + do_sched_disable(guc, ce, flags); + } +} - return; -unpin: - intel_context_sched_disable_unpin(ce); +static void guc_context_close(struct intel_context *ce) +{ + if (test_bit(CONTEXT_GUC_INIT, &ce->flags) && + cancel_delayed_work(&ce->guc_state.sched_disable_delay)) + __delay_sched_disable(&ce->guc_state.sched_disable_delay.work); } static inline void guc_lrc_desc_unpin(struct intel_context *ce) @@ -3346,6 +3415,8 @@ static void remove_from_context(struct i915_request *rq) static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc, + .close = guc_context_close, + .pre_pin = guc_context_pre_pin, .pin = guc_context_pin, .unpin = guc_context_unpin, @@ -3428,6 +3499,10 @@ static void guc_context_init(struct intel_context *ce) rcu_read_unlock(); ce->guc_state.prio = map_i915_prio_to_guc_prio(prio); + + INIT_DELAYED_WORK(&ce->guc_state.sched_disable_delay, + __delay_sched_disable); + set_bit(CONTEXT_GUC_INIT, &ce->flags); } @@ -3465,6 +3540,9 @@ static int guc_request_alloc(struct i915_request *rq) if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags))) guc_context_init(ce); + if (cancel_delayed_work(&ce->guc_state.sched_disable_delay)) + intel_context_sched_disable_unpin(ce); + /* * Call pin_guc_id here rather 
than in the pinning step as with * dma_resv, contexts can be repeatedly pinned / unpinned trashing the @@ -3595,6 +3673,8 @@ static int guc_virtual_context_alloc(struct intel_context *ce) static const struct intel_context_ops virtual_guc_context_ops = { .alloc = guc_virtual_context_alloc, + .close = guc_context_close, + .pre_pin = guc_virtual_context_pre_pin, .pin = guc_virtual_context_pin, .unpin = guc_virtual_context_unpin, @@ -3684,6 +3764,8 @@ static void guc_child_context_destroy(struct kref *kref) static const struct intel_context_ops virtual_parent_context_ops = { .alloc = guc_virtual_context_alloc, + .close = guc_context_close, + .pre_pin = guc_context_pre_pin, .pin = guc_parent_context_pin, .unpin = guc_parent_context_unpin, @@ -4207,6 +4289,21 @@ static bool __guc_submission_selected(struct intel_guc *guc) return i915->params.enable_guc & ENABLE_GUC_SUBMISSION; } +int __guc_get_non_lrc_num_guc_ids(struct intel_guc *guc) +{ + return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc); +} + +int __guc_get_sched_disable_gucid_threshold_default(struct intel_guc *guc) +{ + return (__guc_get_non_lrc_num_guc_ids(guc) / 4 * 3); +} + +int intel_guc_get_sched_disable_gucid_threshold_max(struct intel_guc *guc) +{ + return __guc_get_non_lrc_num_guc_ids(guc); +} + void intel_guc_submission_init_early(struct intel_guc *guc) { xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ); @@ -4223,7 +4320,10 @@ void intel_guc_submission_init_early(struct intel_guc *guc) spin_lock_init(&guc->timestamp.lock); INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping); + guc->submission_state.sched_disable_delay_ms = SCHED_DISABLE_DELAY_MS; guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID; + guc->submission_state.sched_disable_gucid_threshold = + __guc_get_sched_disable_gucid_threshold_default(guc); guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); } diff --git 
a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h index f54de0499be7..bdf3e22c0a34 100644 --- a/drivers/gpu/drm/i915/i915_selftest.h +++ b/drivers/gpu/drm/i915/i915_selftest.h @@ -92,12 +92,14 @@ int __i915_subtests(const char *caller, T, ARRAY_SIZE(T), data) #define i915_live_subtests(T, data) ({ \ typecheck(struct drm_i915_private *, data); \ + (data)->gt[0]->uc.guc.submission_state.sched_disable_delay_ms = 0; \ __i915_subtests(__func__, \ __i915_live_setup, __i915_live_teardown, \ T, ARRAY_SIZE(T), data); \ }) #define intel_gt_live_subtests(T, data) ({ \ typecheck(struct intel_gt *, data); \ + (data)->uc.guc.submission_state.sched_disable_delay_ms = 0; \ __i915_subtests(__func__, \ __intel_gt_live_setup, __intel_gt_live_teardown, \ T, ARRAY_SIZE(T), data); \ diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 37b5c9e9d260..1b85c671935c 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -430,6 +430,11 @@ DEFINE_EVENT(intel_context, intel_context_reset, TP_ARGS(ce) ); +DEFINE_EVENT(intel_context, intel_context_close, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + DEFINE_EVENT(intel_context, intel_context_ban, TP_PROTO(struct intel_context *ce), TP_ARGS(ce) @@ -532,6 +537,11 @@ trace_intel_context_reset(struct intel_context *ce) { } +static inline void +trace_intel_context_close(struct intel_context *ce) +{ +} + static inline void trace_intel_context_ban(struct intel_context *ce) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
  2022-08-14 22:43 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn
@ 2022-08-15  0:51   ` kernel test robot
  0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2022-08-15  0:51 UTC (permalink / raw)
To: Alan Previn; +Cc: llvm, kbuild-all

Hi Alan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on 1cb5379e17f93685065d8ec54444f1baf9386ffe]

url:    https://github.com/intel-lab-lkp/linux/commits/Alan-Previn/Delay-disabling-scheduling-on-a-context/20220815-064329
base:   1cb5379e17f93685065d8ec54444f1baf9386ffe
config: x86_64-randconfig-a005 (https://download.01.org/0day-ci/archive/20220815/202208150850.JJ4hGJX2-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 6afcc4a459ead8809a0d6d9b4bf7b64bcc13582b)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/521d8255b73015ef55e0466508f9f9accf64aba6
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Alan-Previn/Delay-disabling-scheduling-on-a-context/20220815-064329
        git checkout 521d8255b73015ef55e0466508f9f9accf64aba6
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/i915/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4292:5: warning: no previous prototype for function '__guc_get_non_lrc_num_guc_ids' [-Wmissing-prototypes]
   int __guc_get_non_lrc_num_guc_ids(struct intel_guc *guc)
       ^
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4292:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __guc_get_non_lrc_num_guc_ids(struct intel_guc *guc)
   ^
   static
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4297:5: warning: no previous prototype for function '__guc_get_sched_disable_gucid_threshold_default' [-Wmissing-prototypes]
   int __guc_get_sched_disable_gucid_threshold_default(struct intel_guc *guc)
       ^
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:4297:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __guc_get_sched_disable_gucid_threshold_default(struct intel_guc *guc)
   ^
   static
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:353:20: warning: unused function 'context_has_committed_requests' [-Wunused-function]
   static inline bool context_has_committed_requests(struct intel_context *ce)
                      ^
   3 warnings generated.

vim +/__guc_get_non_lrc_num_guc_ids +4292 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

  4291	
> 4292	int __guc_get_non_lrc_num_guc_ids(struct intel_guc *guc)
  4293	{
  4294		return guc->submission_state.num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
  4295	}
  4296	
> 4297	int __guc_get_sched_disable_gucid_threshold_default(struct intel_guc *guc)
  4298	{
  4299		return (__guc_get_non_lrc_num_guc_ids(guc) / 4 * 3);
  4300	}
  4301	

--
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 13+ messages in thread
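Per the robot's note, the straightforward remedy is to mark file-local helpers `static` (the v4 cover letter mentions a build-error fix, which plausibly addresses this warning). A minimal standalone illustration follows: the simplified signature below is hypothetical, but the 3/4 arithmetic is taken from the patch's `__guc_get_sched_disable_gucid_threshold_default()`:

```c
#include <assert.h>

/* Helpers used only within one translation unit should be declared
 * static: this documents the intent and silences -Wmissing-prototypes
 * under W=1 builds. The parameter here stands in for the driver's
 * "number of non-multi-LRC guc_ids" and is illustrative only. */
static int sched_disable_gucid_threshold_default(int non_lrc_num_guc_ids)
{
	/* default threshold: 3/4 of the non-multi-LRC guc_ids */
	return non_lrc_num_guc_ids / 4 * 3;
}
```

With the `static` qualifier in place, the compiler no longer expects a prior prototype in a header, which is exactly what the two W=1 warnings above complain about.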
* [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context
@ 2022-06-28  5:51 Alan Previn
  2022-06-28  5:51 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Previn @ 2022-06-28  5:51 UTC (permalink / raw)
To: intel-gfx

This is a revival of the same series posted by Matthew Brost back in
October 2021 (https://patchwork.freedesktop.org/series/96167/).
Additional real-world measured metrics are included this time around,
which have proven the effectiveness of this series.

This series adds a delay before disabling scheduling on a context. The
2nd patch should explain it quite well.

Matthew Brost (2):
  drm/i915/selftests: Use correct selfest calls for live tests
  drm/i915/guc: Add delay to disable scheduling after pin count goes to zero

 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 .../i915/gem/selftests/i915_gem_coherency.c   |   2 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |   2 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   2 +-
 .../drm/i915/gem/selftests/i915_gem_object.c  |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   9 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |   8 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  10 ++
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |  28 ++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 132 ++++++++++++++----
 drivers/gpu/drm/i915/i915_selftest.h          |   2 +
 drivers/gpu/drm/i915/i915_trace.h             |  10 ++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
 drivers/gpu/drm/i915/selftests/i915_perf.c    |   2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |   2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c     |   2 +-
 16 files changed, 183 insertions(+), 34 deletions(-)

base-commit: ab91e54f0a9cec3fba39a2beae980ffbb8f7cea7
--
2.25.1

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero
  2022-06-28  5:51 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn
@ 2022-06-28  5:51 ` Alan Previn
  2022-07-28 20:19   ` John Harrison
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Previn @ 2022-06-28  5:51 UTC (permalink / raw)
To: intel-gfx

From: Matthew Brost <matthew.brost@intel.com>

Add a delay, configurable via debugfs (default 100 ms), before disabling
scheduling of a context after its pin count goes to zero. Disabling
scheduling is a somewhat costly operation, so the delay allows the user
to resubmit something before we do it. The delay is applied only if the
context isn't closed and less than 3/4 of the guc_ids are in use.

As a temporary WA, disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow-up patch will fix up the selftests to understand this delay.

Alan Previn: Matt Brost first introduced this series back in Oct 2021.
However, no real-world workload with a measured performance impact was
available then to prove the intended results. Today, this series is
being republished in response to a real-world workload that benefited
greatly from it, along with the measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3D game rendering
and 30 fps video encoding. The workload density was configured in a way
that guaranteed each container would ALWAYS be able to render and encode
no less than 30 fps with a predefined maximum render + encode latency.
That means the totality of all 36 containers and their workloads was not
saturating the utilized hw engines to the max (in order to maintain just
enough headroom to meet the minimum fps and latencies of incoming
container submissions).
Problem statement: It was observed that the CPU utilization of the CPU core that was pinned to i915 soft IRQ work was experiencing severe load. Using tracelogs and an instrumentation patch to count specific i915 IRQ events, it was confirmed that the majority of the CPU cycles were caused by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast majority of the cycles were determined to be processing a specific G2H IRQ, which was INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. This IRQ is sent by the GuC in response to the i915 KMD sending the H2G request INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET to the GuC. That request is sent when the context is idle to unpin the context from any GuC access. The high CPU utilization % symptom was limiting the density scaling. Root Cause Analysis: Because the incoming execution buffers were spread across 36 different containers (each with multiple contexts) but the system in totality was NOT saturated to the max, it was assumed that each context was constantly idling between submissions. This was causing thrashing: unpinning a context from the GuC at one moment, followed by repinning it due to incoming workload the very next moment. Both of these event-pairs were being triggered across multiple contexts per container, across all containers, at a rate of > 30 times per sec per context. Metrics: When running this workload without this patch, we measured an average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The improvement observed is ~99% for the average counts per 10 seconds. 
Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Alan Previn <alan.previn.teres.alexis@intel.com> --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_context.h | 9 ++ drivers/gpu/drm/i915/gt/intel_context_types.h | 8 ++ drivers/gpu/drm/i915/gt/uc/intel_guc.h | 10 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 28 ++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 132 ++++++++++++++---- drivers/gpu/drm/i915/i915_selftest.h | 2 + drivers/gpu/drm/i915/i915_trace.h | 10 ++ 8 files changed, 175 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index dabdfe09f5e5..df7fd1b019ec 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1454,7 +1454,7 @@ static void engines_idle_release(struct i915_gem_context *ctx, int err; /* serialises with execbuf */ - set_bit(CONTEXT_CLOSED_BIT, &ce->flags); + intel_context_close(ce); if (!intel_context_pin_if_active(ce)) continue; diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 8e2d70630c49..7cc4bb9ad042 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -276,6 +276,15 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); } +static inline void intel_context_close(struct intel_context *ce) +{ + set_bit(CONTEXT_CLOSED_BIT, &ce->flags); + + trace_intel_context_close(ce); + if (ce->ops->close) + ce->ops->close(ce); +} + static inline bool intel_context_is_closed(const struct intel_context *ce) { return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index d2d75d9c0c8d..13b9f4875dfe 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ 
b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -43,6 +43,8 @@ struct intel_context_ops { void (*revoke)(struct intel_context *ce, struct i915_request *rq, unsigned int preempt_timeout_ms); + void (*close)(struct intel_context *ce); + int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr); int (*pin)(struct intel_context *ce, void *vaddr); void (*unpin)(struct intel_context *ce); @@ -236,6 +238,12 @@ struct intel_context { */ struct list_head destroyed_link; + /** + * @guc_sched_disable_delay: worker to disable scheduling on this + * context + */ + struct delayed_work guc_sched_disable_delay; + /** @parallel: sub-structure for parallel submission members */ struct { union { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index d0d99f178f2d..398365cf204f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -112,6 +112,10 @@ struct intel_guc { * refs */ struct list_head guc_id_list; + /** + * @guc_ids_in_use: Number single-lrc guc_ids in use + */ + u16 guc_ids_in_use; /** * @destroyed_contexts: list of contexts waiting to be destroyed * (deregistered with the GuC) @@ -132,6 +136,12 @@ struct intel_guc { * @reset_fail_mask: mask of engines that failed to reset */ intel_engine_mask_t reset_fail_mask; +#define SCHED_DISABLE_DELAY_MS 100 + /** + * @sched_disable_delay_ms: schedule disable delay, in ms, for + * contexts + */ + u64 sched_disable_delay_ms; } submission_state; /** diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index 25f09a420561..28470fc67b6d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -71,12 +71,40 @@ static bool intel_eval_slpc_support(void *data) return intel_guc_slpc_is_used(guc); } +static int guc_sched_disable_delay_ms_get(void *data, u64 *val) +{ + struct intel_guc *guc = data; + + if 
(!intel_guc_submission_is_used(guc)) + return -ENODEV; + + *val = guc->submission_state.sched_disable_delay_ms; + + return 0; +} + +static int guc_sched_disable_delay_ms_set(void *data, u64 val) +{ + struct intel_guc *guc = data; + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + guc->submission_state.sched_disable_delay_ms = val; + + return 0; +} +DEFINE_SIMPLE_ATTRIBUTE(guc_sched_disable_delay_ms_fops, + guc_sched_disable_delay_ms_get, + guc_sched_disable_delay_ms_set, "%lld\n"); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct intel_gt_debugfs_file files[] = { { "guc_info", &guc_info_fops, NULL }, { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, { "guc_slpc_info", &guc_slpc_info_fops, &intel_eval_slpc_support}, + { "guc_sched_disable_delay_ms", &guc_sched_disable_delay_ms_fops, NULL }, }; if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c9f167b80910..24f6ffc93edb 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -65,7 +65,13 @@ * corresponding G2H returns indicating the scheduling disable operation has * completed it is safe to unpin the context. While a disable is in flight it * isn't safe to resubmit the context so a fence is used to stall all future - * requests of that context until the G2H is returned. + * requests of that context until the G2H is returned. Because this interaction + * with the GuC takes a non-zero amount of time / GuC cycles we delay the + * disabling of scheduling after the pin count goes to zero by a configurable + * (default 100 ms) period of time. The thought is this gives the user a window + * of time to resubmit something on the context before doing a somewhat costly + * operation. This delay is only done if the context isn't closed and less than + * 3/4 of the guc_ids are in use. 
* * Context deregistration: * Before a context can be destroyed or if we steal its guc_id we must @@ -1992,6 +1998,9 @@ static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) if (unlikely(ret < 0)) return ret; + if (!intel_context_is_parent(ce)) + ++guc->submission_state.guc_ids_in_use; + ce->guc_id.id = ret; return 0; } @@ -2001,14 +2010,16 @@ static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) GEM_BUG_ON(intel_context_is_child(ce)); if (!context_guc_id_invalid(ce)) { - if (intel_context_is_parent(ce)) + if (intel_context_is_parent(ce)) { bitmap_release_region(guc->submission_state.guc_ids_bitmap, ce->guc_id.id, order_base_2(ce->parallel.number_children + 1)); - else + } else { ida_simple_remove(&guc->submission_state.guc_ids, ce->guc_id.id); + --guc->submission_state.guc_ids_in_use; + } clr_ctx_id_mapping(guc, ce->guc_id.id); set_context_guc_id_invalid(ce); } @@ -2858,41 +2869,98 @@ guc_context_revoke(struct intel_context *ce, struct i915_request *rq, } } -static void guc_context_sched_disable(struct intel_context *ce) +static void guc_context_sched_disable(struct intel_context *ce); + +static void do_sched_disable(struct intel_guc *guc, struct intel_context *ce, + unsigned long flags) + __releases(ce->guc_state.lock) { - struct intel_guc *guc = ce_to_guc(ce); - unsigned long flags; struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; intel_wakeref_t wakeref; - u16 guc_id; + lockdep_assert_held(&ce->guc_state.lock); + + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + with_intel_runtime_pm(runtime_pm, wakeref) + guc_context_sched_disable(ce); +} + +static bool bypass_sched_disable(struct intel_guc *guc, + struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); GEM_BUG_ON(intel_context_is_child(ce)); + if (submission_disabled(guc) || context_guc_id_invalid(ce) || + !ctx_id_mapped(guc, ce->guc_id.id)) { + clr_context_enabled(ce); + return true; + } + + return !context_enabled(ce); 
+} + +static void __delay_sched_disable(struct work_struct *wrk) +{ + struct intel_context *ce = + container_of(wrk, typeof(*ce), guc_sched_disable_delay.work); + struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; + spin_lock_irqsave(&ce->guc_state.lock, flags); - /* - * We have to check if the context has been disabled by another thread, - * check if submssion has been disabled to seal a race with reset and - * finally check if any more requests have been committed to the - * context ensursing that a request doesn't slip through the - * 'context_pending_disable' fence. - */ - if (unlikely(!context_enabled(ce) || submission_disabled(guc) || - context_has_committed_requests(ce))) { - clr_context_enabled(ce); + if (bypass_sched_disable(guc, ce)) { spin_unlock_irqrestore(&ce->guc_state.lock, flags); - goto unpin; + intel_context_sched_disable_unpin(ce); + } else { + do_sched_disable(guc, ce, flags); } - guc_id = prep_context_pending_disable(ce); +} - spin_unlock_irqrestore(&ce->guc_state.lock, flags); +static bool guc_id_pressure(struct intel_guc *guc, struct intel_context *ce) +{ + /* + * parent contexts are perma-pinned, if we are unpinning do schedule + * disable immediately. + */ + if (intel_context_is_parent(ce)) + return true; - with_intel_runtime_pm(runtime_pm, wakeref) - __guc_context_sched_disable(guc, ce, guc_id); + /* + * If more than 3/4 of guc_ids in use, do schedule disable immediately. 
+ */ + return guc->submission_state.guc_ids_in_use > + ((GUC_MAX_CONTEXT_ID - NUMBER_MULTI_LRC_GUC_ID(guc)) / 4) * 3; +} + +static void guc_context_sched_disable(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + u64 delay = guc->submission_state.sched_disable_delay_ms; + unsigned long flags; - return; -unpin: - intel_context_sched_disable_unpin(ce); + spin_lock_irqsave(&ce->guc_state.lock, flags); + + if (bypass_sched_disable(guc, ce)) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + intel_context_sched_disable_unpin(ce); + } else if (!intel_context_is_closed(ce) && !guc_id_pressure(guc, ce) && + delay) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + mod_delayed_work(system_unbound_wq, + &ce->guc_sched_disable_delay, + msecs_to_jiffies(delay)); + } else { + do_sched_disable(guc, ce, flags); + } +} + +static void guc_context_close(struct intel_context *ce) +{ + if (test_bit(CONTEXT_GUC_INIT, &ce->flags) && + cancel_delayed_work(&ce->guc_sched_disable_delay)) + __delay_sched_disable(&ce->guc_sched_disable_delay.work); } static inline void guc_lrc_desc_unpin(struct intel_context *ce) @@ -3201,6 +3269,8 @@ static void remove_from_context(struct i915_request *rq) static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc, + .close = guc_context_close, + .pre_pin = guc_context_pre_pin, .pin = guc_context_pin, .unpin = guc_context_unpin, @@ -3283,6 +3353,10 @@ static void guc_context_init(struct intel_context *ce) rcu_read_unlock(); ce->guc_state.prio = map_i915_prio_to_guc_prio(prio); + + INIT_DELAYED_WORK(&ce->guc_sched_disable_delay, + __delay_sched_disable); + set_bit(CONTEXT_GUC_INIT, &ce->flags); } @@ -3320,6 +3394,9 @@ static int guc_request_alloc(struct i915_request *rq) if (unlikely(!test_bit(CONTEXT_GUC_INIT, &ce->flags))) guc_context_init(ce); + if (cancel_delayed_work(&ce->guc_sched_disable_delay)) + intel_context_sched_disable_unpin(ce); + /* * Call pin_guc_id here rather than in the pinning 
step as with * dma_resv, contexts can be repeatedly pinned / unpinned trashing the @@ -3450,6 +3527,8 @@ static int guc_virtual_context_alloc(struct intel_context *ce) static const struct intel_context_ops virtual_guc_context_ops = { .alloc = guc_virtual_context_alloc, + .close = guc_context_close, + .pre_pin = guc_virtual_context_pre_pin, .pin = guc_virtual_context_pin, .unpin = guc_virtual_context_unpin, @@ -3539,6 +3618,8 @@ static void guc_child_context_destroy(struct kref *kref) static const struct intel_context_ops virtual_parent_context_ops = { .alloc = guc_virtual_context_alloc, + .close = guc_context_close, + .pre_pin = guc_context_pre_pin, .pin = guc_parent_context_pin, .unpin = guc_parent_context_unpin, @@ -4064,6 +4145,7 @@ void intel_guc_submission_init_early(struct intel_guc *guc) spin_lock_init(&guc->timestamp.lock); INIT_DELAYED_WORK(&guc->timestamp.work, guc_timestamp_ping); + guc->submission_state.sched_disable_delay_ms = SCHED_DISABLE_DELAY_MS; guc->submission_state.num_guc_ids = GUC_MAX_CONTEXT_ID; guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); diff --git a/drivers/gpu/drm/i915/i915_selftest.h b/drivers/gpu/drm/i915/i915_selftest.h index f54de0499be7..b716854246d0 100644 --- a/drivers/gpu/drm/i915/i915_selftest.h +++ b/drivers/gpu/drm/i915/i915_selftest.h @@ -91,12 +91,14 @@ int __i915_subtests(const char *caller, __i915_nop_setup, __i915_nop_teardown, \ T, ARRAY_SIZE(T), data) #define i915_live_subtests(T, data) ({ \ + (data)->gt[0]->uc.guc.submission_state.sched_disable_delay_ms = 0; \ typecheck(struct drm_i915_private *, data); \ __i915_subtests(__func__, \ __i915_live_setup, __i915_live_teardown, \ T, ARRAY_SIZE(T), data); \ }) #define intel_gt_live_subtests(T, data) ({ \ + (data)->uc.guc.submission_state.sched_disable_delay_ms = 0; \ typecheck(struct intel_gt *, data); \ __i915_subtests(__func__, \ __intel_gt_live_setup, __intel_gt_live_teardown, \ diff --git 
a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 37b5c9e9d260..1b85c671935c 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -430,6 +430,11 @@ DEFINE_EVENT(intel_context, intel_context_reset, TP_ARGS(ce) ); +DEFINE_EVENT(intel_context, intel_context_close, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + DEFINE_EVENT(intel_context, intel_context_ban, TP_PROTO(struct intel_context *ce), TP_ARGS(ce) @@ -532,6 +537,11 @@ trace_intel_context_reset(struct intel_context *ce) { } +static inline void +trace_intel_context_close(struct intel_context *ce) +{ +} + static inline void trace_intel_context_ban(struct intel_context *ce) { -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero 2022-06-28 5:51 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn @ 2022-07-28 20:19 ` John Harrison 0 siblings, 0 replies; 13+ messages in thread From: John Harrison @ 2022-07-28 20:19 UTC (permalink / raw) To: Alan Previn, intel-gfx On 6/27/2022 22:51, Alan Previn wrote: > From: Matthew Brost <matthew.brost@intel.com> > > Add a delay, configurable via debugs (default 100ms), to disable debugs -> debugfs Default is now 34ms? > scheduling of a context after the pin count goes to zero. Disable > scheduling is somewhat costly operation so the idea is a delay allows costly operation as it requires synchronising with the GuC. So the idea > the resubmit something before doing this operation. This delay is only the user to resubmit > done if the context isn't close and less than 3/4 of the guc_ids are in close -> closed less than a given threshold (default is 3/4) of the guc_ids > use. > > As temporary WA disable this feature for the selftests. Selftests are > very timing sensitive and any change in timing can cause failure. A > follow up patch will fixup the selftests to understand this delay. > > Alan Previn: Matt Brost first introduced this series back in Oct 2021. > However no real world workload with measured performance impact was > available to prove the intended results. Today, this series is being > republished in response to a real world workload that benefited greatly > from it along with measured performance improvement. > > Workload description: 36 containers were created on a DG2 device where > each container was performing a combination of 720p 3d game rendering > and 30fps video encoding. The workload density was configured in way > that guaranteed each container to ALWAYS be able to render and > encode no less than 30fps with a predefined maximum render + encode > latency time. 
That means that the totality of all 36 containers and its > workloads were not saturating the utilized hw engines to its max > (in order to maintain just enough headrooom to meet the minimum fps and > latencies of incoming container submissions). > > Problem statement: It was observed that the CPU utilization of the CPU > core that was pinned to i915 soft IRQ work was experiencing severe load. > Using tracelogs and an instrumentation patch to count specific i915 IRQ > events, it was confirmed that the majority of the CPU cycles were caused > by the gen11_other_irq_handler() -> guc_irq_handler() code path. The vast > majority of the cycles was determined to be processing a specific G2H IRQ > which was INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. This IRQ is send by send -> sent > the GuC in response to the i915 KMD sending the H2G requests > INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET to the GuC. That request is sent > when the context is idle to unpin the context from any GuC access. The > high CPU utilization % symptom was limiting the density scaling. > > Root Cause Analysis: Because the incoming execution buffers were spread > across 36 different containers (each with multiple contexts) but the > system in totality was NOT saturated to the max, it was assumed that each > context was constantly idling between submissions. This was causing thrashing > of unpinning a context from GuC at one moment, followed by repinning it > due to incoming workload the very next moment. Both of these event-pairs > were being triggered across multiple contexts per container, across all > containers at the rate of > 30 times per sec per context. > > Metrics: When running this workload without this patch, we measured an average > of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10 seconds or > ~10 million times over ~25+ mins. With this patch, the count reduced to ~480 > every 10 seconds or about ~28K over ~10 mins. 
The improvement observed is > ~99% for the average counts per 10 seconds. > > Signed-off-by: Matthew Brost <matthew.brost@intel.com> > Acked-by: Alan Previn <alan.previn.teres.alexis@intel.com> Needs your s-o-b as you are posting the patch. The code below looks to be the old rev of the patch? This still needs updating with the cleanup work? John. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-08-16 1:55 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-15 16:01 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn 2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 1/2] drm/i915/selftests: Use correct selfest calls for live tests Alan Previn 2022-08-15 22:40 ` John Harrison 2022-08-16 1:54 ` Teres Alexis, Alan Previn 2022-08-15 16:01 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn 2022-08-15 23:57 ` John Harrison 2022-08-16 1:02 ` Teres Alexis, Alan Previn 2022-08-15 16:33 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Delay disabling scheduling on a context (rev4) Patchwork 2022-08-15 16:55 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork -- strict thread matches above, loose matches on Subject: below -- 2022-08-14 22:43 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn 2022-08-14 22:43 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn 2022-08-15 0:51 ` kernel test robot 2022-06-28 5:51 [Intel-gfx] [Intel-gfx 0/2] Delay disabling scheduling on a context Alan Previn 2022-06-28 5:51 ` [Intel-gfx] [Intel-gfx 2/2] drm/i915/guc: Add delay to disable scheduling after pin count goes to zero Alan Previn 2022-07-28 20:19 ` John Harrison