From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
Date: Wed, 8 Jun 2016 14:47:49 +0100	[thread overview]
Message-ID: <20160608134749.GF32344@nuc-i3427.alporthouse.com> (raw)
In-Reply-To: <5758132B.4050409@linux.intel.com>

On Wed, Jun 08, 2016 at 01:44:27PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 13:34, Chris Wilson wrote:
> >On Wed, Jun 08, 2016 at 12:47:28PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 08/06/16 12:24, Chris Wilson wrote:
> >>>On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>>And why such long delays? It looks pretty lightweight to me.
> >>>>
> >>>>>>One alternative could perhaps be to add a waiter->wake_up vfunc and
> >>>>>>signalers could then potentially use a tasklet?
> >>>>>
> >>>>>Hmm, I did find that in order to reduce execlists latency, I had to
> >>>>>drive the tasklet processing from the signaler.
> >>>>
> >>>>What do you mean? The existing execlists tasklet? How would that work?
> >>>
> >>>Due to how dma-fence signals, the softirq is never kicked
> >>>(spin_lock_irq doesn't handle local_bh_enable()) and so we would only
> >>>submit a new task via execlists on a reschedule. That latency added
> >>>about 30% (30s on bsw) to gem_exec_parallel.
> >>
> >>I don't follow. User interrupts are separate from context complete
> >>which drives the submission. How do fences interfere with the
> >>latter?
> >
> >The biggest user benchmark (ala sysmark) regression we have for
> >execlists is the latency in submitting the first request to hardware via
> >elsp (or at least the hw responding to and executing that batch,
> >the per-bb and per-ctx w/a are not free either). If we incur extra
> >latency in the driver in even adding the request to the queue for an
> >idle GPU, that is easily felt by userspace.
> 
> I still don't see how fences tie into that. But it is not so
> important since it was all along the lines of "do we really need a
> thread".

I was just mentioning in passing an issue I noticed when mixing fences
and tasklets! It boils down to spin_unlock_irq() not doing
local_bh_enable(), so trying to schedule a tasklet from inside a
fence callback incurs more latency than you would expect. Entirely
unrelated except for the signaling, fencing and their uses ;)

> >>>>>>>+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >>>>>>>+{
> >>>>>>>+	struct intel_engine_cs *engine = request->engine;
> >>>>>>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>>>>>+	struct rb_node *parent, **p;
> >>>>>>>+	struct signal *signal;
> >>>>>>>+	bool first, wakeup;
> >>>>>>>+
> >>>>>>>+	if (unlikely(IS_ERR(b->signaler)))
> >>>>>>>+		return PTR_ERR(b->signaler);
> >>>>>>
> >>>>>>I don't see that there is a fallback if kthread creation failed. It
> >>>>>>should just fail in intel_engine_init_breadcrumbs if that happens.
> >>>>>
> >>>>>Because it is not fatal to using the GPU, just one optional function.
> >>>>
> >>>>But we never expect it to fail and it is not even dependent on
> >>>>anything user controllable. Just a random error which would cause
> >>>>user experience to degrade. If thread creation failed it means
> >>>>system is in such a poor shape I would just fail the driver init.
> >>>
> >>>A minimally functional system is better than nothing at all.
> >>>GEM is not required for driver loading, interrupt driven dma-fences less
> >>>so.
> >>
> >>If you are so hot for that, how about vfuncing enable_signaling in
> >>that case? Because I find the "have we created our kthread at driver
> >>init time successfully" question for every fence a bit too much.
> >
> >read + conditional that pulls in the cacheline we want? You can place
> >the test after the spinlock if you want to avoid the cost, I suppose.
> >Or we just mark the GPU as wedged.
> 
> What I meant was to pass in different fence_ops at fence_init time
> depending on whether or not the signaler thread was created. If the
> driver is wanted to be functional in that case, and
> fence->enable_signaling needs to keep returning errors, it sounds
> like a much more elegant solution than repeating the check on
> every fence->enable_signaling call.

Actually, looking at it, the code was broken for !thread as there is
no automatic fallback to polling by dma-fence. So the choice is between
doing that ourselves for an impossible failure case or just marking the
GPU as wedged on init.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

