From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: matthew.auld@intel.com
Subject: Re: [PATCH 04/43] drm/i915: Do a synchronous switch-to-kernel-context on idling
Date: Fri, 08 Mar 2019 08:59:55 +0000
Message-ID: <155203559518.27405.3468094099541620703@skylake-alporthouse-com>
In-Reply-To: <289293e2-e074-8b19-b42e-ff42ddf31a26@linux.intel.com>

Quoting Tvrtko Ursulin (2019-03-08 06:46:52)
> 
> On 07/03/2019 22:24, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-07 17:06:58)
> >>
> >> On 07/03/2019 13:29, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2019-03-07 13:07:18)
> >>>>
> >>>> On 06/03/2019 14:24, Chris Wilson wrote:
> >>>>> +static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
> >>>>> +{
> >>>>> +     if (i915_gem_switch_to_kernel_context(i915))
> >>>>> +             return false;
> >>>>
> >>>> Is it worth still trying to idle if this fails? Since the timeout is
> >>>> short, maybe reset in idle state bring less havoc than not. It can only
> >>>> fail on memory allocations I think, okay and terminally wedged. In which
> >>>> case it is still okay.
> >>>
> >>> Terminally wedged is hard wired to return 0 (this, the next patch?) so
> >>> that we don't bail during i915_gem_suspend() for this reason.
> >>>
> >>> We do still idle if this fails, as we mark the driver/GPU as wedged.
> >>> Perform a GPU reset so that it hopefully isn't shooting kittens any
> >>> more, and pull a pillow over our heads.
> >>
> >> I didn't find a path which idles before wedging if
> >> switch_to_kernel_context_sync fails due failing
> >> i915_gem_switch_to_kernel_context. Where is it?
> > 
> > Wedging implies idling. When we are wedged, the GPU is reset and left
> > pointing into the void (effectively idle, GT powersaving should be
> > unaffected by the wedge, don't ask how that works on ilk). All the
> > callers do
> > 
> > if (!switch_to_kernel_context_sync())
> >       i915_gem_set_wedged()
> > 
> > with more or less intermediate steps. Hmm, given that is true why not
> > pull it into switch_to_kernel_context_sync()...
> 
> If all callers follow up with a wedge maybe, yes.

The problem is that we don't reach that point until a couple of patches
later. Sigh.

> > 
> >> It is a minor concern don't get me wrong. It is unlikely to fail like
> >> this. I was simply thinking why not try and wait for the current work to
> >> finish before suspending in this case. Might be a better experience
> >> after resume.
> > 
> > For the desktop use case, it's immaterial as the hotplug, reconfigure
> > and redraw take care of that. (Fbcon is also cleared.)
> 
> But a matter of whether context state is sane or corrupt I think comes 
> into play. A short wait for idle before suspend might still work if the 
> extremely unlikely fail in i915_gem_switch_to_kernel_context happens.

But then their context may be corrupted by the suspend itself, since the
state may still be in the GPU as we save the pages.

Speaking of which, since we have the default context state now, we should
restore the kernel contexts across resume.
 
> Seems more robust to me to try regardless since the timeout is short.

We don't trust suspend not to lose updates to the resident context. And
wedging itself isn't aware that it may be damaging a pinned but inactive
context, as reset itself is ignorant of that (we hope that the requests
for the HW to save itself take effect before the reset does!)

Accept the compromise of

	bool success = true;

	/* Try the switch, but still wait for idle if it fails. */
	if (switch_to_kernel_context() < 0)
		success = false;

	if (wait_for_idle(I915_GEM_IDLE_TIMEOUT) < 0)
		success = false;

	return success;

with the callers then doing

	if (!success)
		i915_gem_set_wedged();

So anyone who was able to save themselves before the ship sank, does.
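
As a rough illustration only, the compromise can be sketched as a
standalone C fragment. The driver calls are replaced by hypothetical
stubs (fake_switch_result, fake_idle_result, idle_waits, wedged and
idle_or_wedge() are invented names, not i915 symbols), so this shows
only the control flow: both steps are always attempted, and wedging
happens once, afterwards, if either failed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver calls; negative means failure. */
static int fake_switch_result;	/* what the stubbed switch returns */
static int fake_idle_result;	/* what the stubbed wait returns */
static int idle_waits;		/* how many times we waited for idle */
static bool wedged;		/* set once the stubbed wedge is invoked */

static int switch_to_kernel_context(void)
{
	return fake_switch_result;
}

static int wait_for_idle(void)
{
	idle_waits++;
	return fake_idle_result;
}

static void set_wedged(void)
{
	wedged = true;
}

/* Attempt both steps unconditionally; report whether both succeeded. */
static bool switch_to_kernel_context_sync(void)
{
	bool success = true;

	if (switch_to_kernel_context() < 0)
		success = false;

	/* Wait even if the switch failed, so in-flight work can retire. */
	if (wait_for_idle() < 0)
		success = false;

	return success;
}

/* Caller pattern: wedge only after both steps have been tried. */
static bool idle_or_wedge(void)
{
	bool success = switch_to_kernel_context_sync();

	if (!success)
		set_wedged();

	return success;
}
```

With the stubbed switch set to fail and the stubbed wait set to succeed,
idle_or_wedge() still performs the wait once before wedging, which is
the point of the compromise.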

> >>>>>     static void
> >>>>>     i915_gem_idle_work_handler(struct work_struct *work)
> >>>>>     {
> >>>>> -     struct drm_i915_private *dev_priv =
> >>>>> -             container_of(work, typeof(*dev_priv), gt.idle_work.work);
> >>>>> +     struct drm_i915_private *i915 =
> >>>>> +             container_of(work, typeof(*i915), gt.idle_work.work);
> >>>>> +     typeof(i915->gt) *gt = &i915->gt;
> >>>>
> >>>> I am really not sure about the typeof idiom in normal C code. :( It
> >>>> saves a little bit of typing, and a little bit of churn if type name
> >>>> changes, but just feels weird to use it somewhere and somewhere not.
> >>>
> >>> But then we have to name thing! We're not sold on gt; it means quite a
> >>> few different things around the bspec. This bit is actually part that
> >>> I'm earmarking for i915_gem itself (high level idle/power/user management)
> >>> and I'm contemplating i915_gpu for the bits that beneath us but still a
> >>> management layer over hardware (and intel_foo for the bits that talk to
> >>> hardware. Maybe that too will change if we completely split out into
> >>> different modules.)
> >>
> >> So you could have left it as is for now and have a smaller diff. But
> >> okay.. have it if you insist.
> > 
> > No, you've stated on a few occasions that you don't like gt->X so I'll
> > have to find a new strategy and fixup patches as I remember your
> > distaste.
> 
> I know I don't like sprinkling of typeof to declare locals, but don't 
> remember if I disliked something more. Not sure what your "no" refers to 
> now. That you feel you had to do this in this patch, or that you don't 
> accept mine "have it if you insist"?

I think you have a reasonable objection to using typeof(i915->gt) / auto
locals, and I will rework the patches to not introduce them.
-Chris
