From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd
Date: Mon, 30 Nov 2015 14:18:29 +0000	[thread overview]
Message-ID: <20151130141829.GC14582@nuc-i3427.alporthouse.com> (raw)
In-Reply-To: <565C4FE0.8000403@linux.intel.com>

On Mon, Nov 30, 2015 at 01:32:16PM +0000, Tvrtko Ursulin wrote:
> 
> On 30/11/15 12:30, Chris Wilson wrote:
> >On Mon, Nov 30, 2015 at 12:05:50PM +0000, Tvrtko Ursulin wrote:
> >>>+	struct intel_breadcrumbs {
> >>>+		spinlock_t lock; /* protects the per-engine request list */
> >>
> >>s/list/tree/
> >
> >list. We use the tree as a list!
> 
> Maybe, but it confuses the reader a bit, who then goes looking for
> some list.
> 
> Related, when I was playing with this I was piggy-backing on the
> existing per-engine request list.

A separate list, with separate locking, that is only used when waiting:
that's the appeal for me.
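
To illustrate the tree-as-a-list point: insertion keeps the tree sorted
by seqno, so the in-order walk the waker does is exactly a list
traversal. A rough sketch only (the wait_node name is illustrative, not
from the patch; caller holds b->lock):

static void insert_waiter(struct intel_breadcrumbs_engine *be,
                          struct drm_i915_gem_request *request)
{
        struct rb_node **p = &be->requests.rb_node, *parent = NULL;

        /* Keep the tree sorted in seqno (i.e. retirement) order */
        while (*p) {
                struct drm_i915_gem_request *it;

                parent = *p;
                it = rb_entry(parent, struct drm_i915_gem_request, wait_node);
                if (i915_seqno_passed(request->seqno, it->seqno))
                        p = &parent->rb_right;
                else
                        p = &parent->rb_left;
        }
        rb_link_node(&request->wait_node, parent, p);
        rb_insert_color(&request->wait_node, &be->requests);

        /* The oldest outstanding seqno is the next to be completed */
        if (be->first_waiter == NULL ||
            i915_seqno_passed(be->first_waiter->seqno, request->seqno))
                be->first_waiter = request;
}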
 
> If they are seqno-ordered there, and I think they are, then the
> waker thread just traverses it in order until hitting the first not
> yet completed one, waking up every req->wait_queue as it goes.
>
> In this case it doesn't matter that the waiters arrive in
> non-deterministic order, so we don't need a (new) tree.
> 
> But the downside is that the thread needs struct_mutex. And that the
> retirement from that list is so lazy...

And we can't even take struct_mutex!

> 
> >>>+		struct task_struct *task; /* bh for all user interrupts */
> >>>+		struct intel_breadcrumbs_engine {
> >>>+			struct rb_root requests; /* sorted by retirement */
> >>>+			struct drm_i915_gem_request *first;
> >>
> >>/* First request in the tree to be completed next by the hw. */
> >>
> >>?
> >
> >/* first */ ?
> 
> Funny? :) First what? Why make the reader reverse engineer it?

struct drm_i915_gem_request *first_waiter; ?

Please don't make me add /* ^^^ */
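
i.e. roughly, with the comments you asked for spelled out:

struct intel_breadcrumbs {
        spinlock_t lock; /* protects the per-engine request tree */
        struct task_struct *task; /* bh for all user interrupts */
        struct intel_breadcrumbs_engine {
                struct rb_root requests; /* sorted by seqno (retirement order) */
                /* first request in the tree, i.e. next to be completed by hw */
                struct drm_i915_gem_request *first_waiter;
        } engine[I915_NUM_RINGS];
};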

> >>For a later extension:
> >>
> >>How would you view, some time in the future, adding a bool return to
> >>ring->put_irq() which would say to the thread that it failed to
> >>disable interrupts?
> >>
> >>In this case the thread would exit and would need to be restarted
> >>when the next waiter queues up. Possibly extending the
> >>idle->put_irq->exit durations as well then.
> >
> >Honestly, I am not keen on the idea that the low-level put_irq can fail
> >(it makes cleanup much more fraught). I don't see what improvement you
> >intend here; maybe clarifying that would help explain what you mean.
> 
> Don't know, maybe reflecting the fact that it wasn't the last put_irq
> call, so let the caller handle it if they want. We can leave this
> discussion for the future.

Hmm. The only other use is the trace_irq. It would be nice to eliminate
that and the ring->irq_count.
> 
> >>>@@ -2986,16 +2981,16 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >>>  			if (ring_idle(ring, seqno)) {
> >>>  				ring->hangcheck.action = HANGCHECK_IDLE;
> >>>
> >>>-				if (waitqueue_active(&ring->irq_queue)) {
> >>>+				if (READ_ONCE(dev_priv->breadcrumbs.engine[ring->id].first)) {
> >>
> >>READ_ONCE is again strange, but other than that I don't know the
> >>hangcheck code well enough to really evaluate it.
> >>
> >>Perhaps this also means this line needs a comment describing what
> >>condition/state the check is actually approximating.
> >
> >All we are doing is asking if anyone is waiting on the GPU, because the
> >GPU has finished all of its work. If so, the IRQ handling is suspect
> >(modulo the pre-existing race condition clearly earmarked by READ_ONCE).
> 
> Cool, /* Check if someone is waiting on the GPU */ then would make me happy.

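+/* Check if someone is waiting on the GPU */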
+static bool any_waiters_on_engine(struct intel_engine_cs *engine)
+{
+       return READ_ONCE(engine->i915->breadcrumbs.engine[engine->id].first_waiter);
+}
+
 static bool any_waiters(struct drm_i915_private *dev_priv)
 {
        struct intel_engine_cs *ring;
        int i;
 
        for_each_ring(ring, dev_priv, i)
-               if (ring->irq_refcount)
+               if (any_waiters_on_engine(ring))
                        return true;
 
        return false;
 }

> >>>+	while (!kthread_should_stop()) {
> >>>+		unsigned reset_counter = i915_reset_counter(&i915->gpu_error);
> >>>+		unsigned long timeout_jiffies;
> >>>+		bool idling = false;
> >>>+
> >>>+		/* On every tick, walk the seqno-ordered list of requests
> >>>+		 * and for each retired request wakeup its clients. If we
> >>>+		 * find an unfinished request, go back to sleep.
> >>>+		 */
> >>
> >>s/tick/loop/ ?
> >s/tick/iteration/ I'm sticking with tick
> 
> Tick makes me think of a scheduler tick, so to me it is the worst of
> the three. Iteration sounds really good. Whether that will be a
> tick/jiffie/or something is decided a bit lower in the code.

Hmm, that was the connection I liked with tick.
 
> >>And if we don't find an unfinished request, we still go back to sleep. :)
> >
> >Ok.
> >
> >>>+		set_current_state(TASK_INTERRUPTIBLE);
> >>>+
> >>>+		/* Note carefully that we do not hold a reference for the
> >>>+		 * requests on this list. Therefore we only inspect them
> >>>+		 * whilst holding the spinlock to ensure that they are not
> >>>+		 * freed in the meantime, and the client must remove the
> >>>+		 * request from the list if it is interrupted (before it
> >>>+		 * itself releases its reference).
> >>>+		 */
> >>>+		spin_lock(&b->lock);
> >>
> >>This lock is more per-engine than global in nature, unless you
> >>think it is more efficient to do fewer lock atomics vs the potential
> >>overlap of waiter activity between engines?
> >
> >Indeed. I decided not to care about making it per-engine.
> >
> >>Would need a new lock for req->irq_count, per request. So
> >>req->lock/req->irq_count and be->lock for the tree operations.
> >
> >Why? A request is tied to an individual engine, therefore we can use
> >that breadcrumb_engine.lock for the request.
> 
> Just so 2nd+ waiters would not touch the per-engine lock.

Ah. I am not expecting that much contention and even fewer multiple
waiters for a request.
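
For clarity, the per-iteration walk I have in mind under b->lock is
roughly this (a sketch only: the wait_node/wait_queue names are
illustrative, and get_seqno is the existing lazy-coherency hook):

        spin_lock(&b->lock);
        for (i = 0; i < I915_NUM_RINGS; i++) {
                struct intel_engine_cs *engine = &i915->ring[i];
                struct intel_breadcrumbs_engine *be = &b->engine[i];
                struct drm_i915_gem_request *request = be->first_waiter;
                u32 seqno = engine->get_seqno(engine, false);

                /* Wake the clients of every completed request, stopping
                 * at the first request the hw has yet to finish. The
                 * requests are only inspected whilst we hold b->lock;
                 * clients unlink themselves if interrupted, before
                 * dropping their reference.
                 */
                while (request && i915_seqno_passed(seqno, request->seqno)) {
                        struct rb_node *next = rb_next(&request->wait_node);

                        rb_erase(&request->wait_node, &be->requests);
                        wake_up_all(&request->wait_queue);

                        request = next ? rb_entry(next,
                                                  struct drm_i915_gem_request,
                                                  wait_node) : NULL;
                }
                be->first_waiter = request;
        }
        spin_unlock(&b->lock);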

> >>>+		for (i = 0; i < I915_NUM_RINGS; i++) {
> >>>+			struct intel_engine_cs *engine = &i915->ring[i];
> >>>+			struct intel_breadcrumbs_engine *be = &b->engine[i];
> >>>+			struct drm_i915_gem_request *request = be->first;
> >>>+
> >>>+			if (request == NULL) {
> >>>+				if ((irq_get & (1 << i))) {
> >>>+					if (irq_enabled & (1 << i)) {
> >>>+						engine->irq_put(engine);
> >>>+						irq_enabled &= ~ (1 << i);
> >>>+					}
> >>>+					intel_runtime_pm_put(i915);
> >>>+					irq_get &= ~(1 << i);
> >>>+				}
> >>>+				continue;
> >>>+			}
> >>>+
> >>>+			if ((irq_get & (1 << i)) == 0) {
> >>>+				intel_runtime_pm_get(i915);
> >>>+				irq_enabled |= __irq_enable(engine) << i;
> >>>+				irq_get |= 1 << i;
> >>>+			}
> >>
> >>irq_get and irq_enabled are creating a little bit of a headache for
> >>me, and that probably means a comment is in order to explain
> >>what they are for and how they work.
> >>
> >>As it is, I don't see the need for irq_get to persist across iterations?
> >
> >Because getting the irq is quite slow, we add a jiffie shadow.
> 
> But it is not a shadow in the sense of the same bitmask from the
> previous iteration. The bits are different depending on special
> cases in __irq_enable.
> 
> So it needs some comments I think.

Didn't you get that from idling and its comments?

First, how about engines_active and irqs_enabled?

...
                        /* When enabling waiting upon an engine, we need
                         * to unmask the user interrupt, and before we
                         * do so we need to hold a wakelock for our
                         * hardware access (and the interrupts thereafter).
                         */
                        if ((engines_active & (1 << i)) == 0) {
...
                        if (request == NULL) {
                                /* No more waiters, mask the user interrupt
                                 * (if we were able to enable it!) and
                                 * release the wakelock.
                                 */
                                if (engines_active & (1 << i)) {


With a bit of work, I can then get the irq and rpm references out of the
spinlock.
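
Roughly along these lines (a sketch only: engines_active/irqs_enabled as
proposed above, __irq_enable() returning whether the interrupt was
actually unmasked, as in the patch):

        for (i = 0; i < I915_NUM_RINGS; i++) {
                struct intel_engine_cs *engine = &i915->ring[i];
                bool waiters = READ_ONCE(b->engine[i].first_waiter) != NULL;

                if (waiters && (engines_active & (1 << i)) == 0) {
                        /* When enabling waiting upon an engine, we need
                         * to unmask the user interrupt, and before we do
                         * so we need to hold a wakelock for our hardware
                         * access (and the interrupts thereafter).
                         */
                        intel_runtime_pm_get(i915);
                        if (__irq_enable(engine))
                                irqs_enabled |= 1 << i;
                        engines_active |= 1 << i;
                } else if (!waiters && (engines_active & (1 << i))) {
                        /* No more waiters, mask the user interrupt (if we
                         * were able to enable it!) and release the wakelock.
                         */
                        if (irqs_enabled & (1 << i)) {
                                engine->irq_put(engine);
                                irqs_enabled &= ~(1 << i);
                        }
                        intel_runtime_pm_put(i915);
                        engines_active &= ~(1 << i);
                }
        }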
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
