From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 06/19] drm/i915/gt: Schedule request retirement when submission idles
Date: Tue, 19 Nov 2019 16:42:28 +0000	[thread overview]
Message-ID: <157418174819.12093.10574958764232498040@skylake-alporthouse-com> (raw)
In-Reply-To: <f8d09a9a-b45a-7960-d584-3315ca0c80f3@linux.intel.com>

Quoting Tvrtko Ursulin (2019-11-19 16:33:18)
> 
> On 19/11/2019 16:20, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-11-19 15:04:46)
> >>
> >> On 18/11/2019 23:02, Chris Wilson wrote:
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 33ce258d484f..f7c8fec436a9 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -142,6 +142,7 @@
> >>>    #include "intel_engine_pm.h"
> >>>    #include "intel_gt.h"
> >>>    #include "intel_gt_pm.h"
> >>> +#include "intel_gt_requests.h"
> >>>    #include "intel_lrc_reg.h"
> >>>    #include "intel_mocs.h"
> >>>    #include "intel_reset.h"
> >>> @@ -2278,6 +2279,18 @@ static void execlists_submission_tasklet(unsigned long data)
> >>>                if (timeout && preempt_timeout(engine))
> >>>                        preempt_reset(engine);
> >>>        }
> >>> +
> >>> +     /*
> >>> +      * If the GPU is currently idle, retire the outstanding completed
> >>> +      * requests. This will allow us to enter soft-rc6 as soon as possible,
> >>> +      * albeit at the cost of running the retire worker much more frequently
> >>> +      * (over the entire GT not just this engine) and emitting more idle
> >>> +      * barriers (i.e. kernel context switches unpinning all that went
> >>> +      * before) which may add some extra latency.
> >>> +      */
> >>> +     if (intel_engine_pm_is_awake(engine) &&
> >>> +         !execlists_active(&engine->execlists))
> >>> +             intel_gt_schedule_retire_requests(engine->gt);
> >>
> >> I am still not a fan of doing this for all platforms.
> > 
> > I understand. I think it makes a fair amount of sense to do early
> > retires, and wish to pursue that if I can show there is no harm.
> 
> It's also a bit of a layering problem.

Them's fighting words! :)
 
> >> It's not just the cost of retirement but there is
> >> intel_engine_flush_submission on all engines in there as well which we
> >> cannot avoid triggering from this path.
> >>
> >> Would it be worth experimenting with additional per-engine retire
> >> workers? Most of the code could be shared, just a little bit of
> >> specialization to filter on engine.
> > 
> > I haven't sketched out anything more than peeking at the last request on
> > the timeline and doing a rq->engine == engine filter. Walking the global
> > timeline.active_list in that case is also a nuisance.
> 
> That together with:
> 
>         flush_submission(gt, engine ? engine->mask : ALL_ENGINES);
> 
> Might be enough? At least to satisfy my concern.

Aye, flushing all others when we know we only care about this one being
idle is definitely a weak point of the current scheme.

> Apart from that, layering is still bad. And I'd still limit it to when the RC6 WA is 
> active unless it can be shown there is no perf/power impact across 
> GPU/CPU to do this everywhere.

Bah, keep tuning until it's a win for everyone!
 
> At which point it becomes easier to just limit it because we have to 
> have it there.
> 
> I also wonder if the current flush_submission wasn't the reason for 
> performance regression you were seeing with this? It makes this tasklet 
> wait for all other engines, if they are busy. But not sure.. perhaps it 
> is work which would be done anyway.

I haven't finished yet; but the baseline took a big nose dive so it
might be enough to hide a lot of evil.

Too bad I don't have an Icelake to cross-check against an unaffected
platform.

> > There's definitely scope here for us using some more information from
> > process_csb() about which context completed and limit work to that
> > timeline. Hmm, something along those lines maybe...
> 
> But you want to retire all timelines which have work on this particular 
> physical engine. Otherwise it doesn't get parked, no?

There I was suggesting being even more proactive, and say keeping an
llist of completed timelines. Nothing concrete yet, plenty of existing
races found already that need fixing.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
