Francisco Jerez writes:

> Chris Wilson writes:
>
>> Quoting Francisco Jerez (2020-03-10 21:41:55)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index b9b3f78f1324..a5d7a80b826d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1577,6 +1577,11 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>>>  	/* we need to manually load the submit queue */
>>>  	if (execlists->ctrl_reg)
>>>  		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
>>> +
>>> +	if (execlists_num_ports(execlists) > 1 &&
>>
>> pending[1] is always defined, the minimum submission is one slot, with
>> pending[1] as the sentinel NULL.
>>
>>> +	    execlists->pending[1] &&
>>> +	    !atomic_xchg(&execlists->overload, 1))
>>> +		intel_gt_pm_active_begin(&engine->i915->gt);
>>
>> engine->gt
>>
>
> Applied your suggestions above locally; I'll probably wait until I have
> a few more changes batched up before sending a v2.
>
>>>  }
>>>
>>>  static bool ctx_single_port_submission(const struct intel_context *ce)
>>> @@ -2213,6 +2218,12 @@ cancel_port_requests(struct intel_engine_execlists * const execlists)
>>>  	clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight));
>>>
>>>  	WRITE_ONCE(execlists->active, execlists->inflight);
>>> +
>>> +	if (atomic_xchg(&execlists->overload, 0)) {
>>> +		struct intel_engine_cs *engine =
>>> +			container_of(execlists, typeof(*engine), execlists);
>>> +		intel_gt_pm_active_end(&engine->i915->gt);
>>> +	}
>>>  }
>>>
>>>  static inline void
>>> @@ -2386,6 +2397,9 @@ static void process_csb(struct intel_engine_cs *engine)
>>>  			/* port0 completed, advanced to port1 */
>>>  			trace_ports(execlists, "completed", execlists->active);
>>>
>>> +			if (atomic_xchg(&execlists->overload, 0))
>>> +				intel_gt_pm_active_end(&engine->i915->gt);
>>
>> So this loses track if we preempt a dual-ELSP submission with a
>> single-ELSP submission (and never go back to dual).
>>
>
> Yes, good point. You're right that if a dual-ELSP submission gets
> preempted by a single-ELSP submission, "overload" will remain signaled
> until the first completion interrupt arrives (e.g. from the preempting
> submission).
>
>> If you move this to the end of the loop and check
>>
>> 	if (!execlists->active[1] && atomic_xchg(&execlists->overload, 0))
>> 		intel_gt_pm_active_end(engine->gt);
>>
>> it covers both preemption/promotion and completion.
>>
>
> That sounds reasonable.
>
>> However, that will fluctuate quite rapidly. (And runs the risk of
>> exceeding the sentinel.)
>>
>> An alternative approach would be to couple along
>> schedule_in/schedule_out:
>>
>> atomic_set(overload, -1);
>>
>> __execlists_schedule_in:
>> 	if (!atomic_fetch_inc(overload))
>> 		intel_gt_pm_active_begin(engine->gt);
>>
>> __execlists_schedule_out:
>> 	if (!atomic_dec_return(overload))
>> 		intel_gt_pm_active_end(engine->gt);
>>
>> which would mean we are overloaded as soon as we try to submit an
>> overlapping ELSP.
>>
>
> That sounds good to me too, and AFAICT it would have roughly the same
> behavior as this metric, except for the preemption corner case you
> mention above. I'll try this and verify that I get approximately the
> same performance numbers.
>

This suggestion seems to lead to some minor regressions; I'm
investigating the issue. I'll send a v2 as soon as I have something
along the lines of what you suggested running with performance
equivalent to v1.
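
For the archives, while I rework this, here is a self-contained
userspace mock-up of the counting semantics I understood from your
schedule_in/schedule_out suggestion. This is only a sketch, not the
actual i915 change: C11 stdatomic stands in for the kernel's atomic_t,
and mock_active_begin()/mock_active_end() are hypothetical stand-ins
for intel_gt_pm_active_begin()/intel_gt_pm_active_end().

#include <stdatomic.h>
#include <stdio.h>

/* Stands in for the execlists->overload field; -1 means idle. */
static atomic_int overload = -1;

/* Hypothetical stand-ins for intel_gt_pm_active_begin()/_end(). */
static void mock_active_begin(void)
{
	puts("active_begin: overlapping ELSP submission");
}

static void mock_active_end(void)
{
	puts("active_end: back to a single ELSP");
}

/*
 * Equivalent of the __execlists_schedule_in() hook suggested above:
 * an old value of 0 means a second context just went in flight.
 */
static void mock_schedule_in(void)
{
	if (!atomic_fetch_add(&overload, 1))
		mock_active_begin();
}

/*
 * Equivalent of the __execlists_schedule_out() hook: an old value of
 * 1 (i.e. a new value of 0, what atomic_dec_return() would report)
 * means we dropped back to a single context in flight.
 */
static void mock_schedule_out(void)
{
	if (atomic_fetch_sub(&overload, 1) == 1)
		mock_active_end();
}

int main(void)
{
	mock_schedule_in();   /* first context: nothing fires      */
	mock_schedule_in();   /* overlap: active_begin fires       */
	mock_schedule_out();  /* back to one: active_end fires     */
	mock_schedule_in();   /* overlap again: active_begin fires */
	mock_schedule_out();  /* active_end fires                  */
	mock_schedule_out();  /* idle: counter back at -1          */
	return 0;
}

Compiles with "cc -std=c11 overload-mock.c"; begin fires exactly when
a second context overlaps and end fires when we drop back to one, with
the counter returning to -1 once everything has been scheduled out.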