From: John Harrison <John.C.Harrison@Intel.com>
To: Dave Gordon <david.s.gordon@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [CI-ping 15/15] drm/i915: Late request cancellations are harmful
Date: Thu, 21 Apr 2016 14:04:29 +0100
Message-ID: <5718CFDD.2050306@Intel.com>
In-Reply-To: <57162614.6030507@intel.com>

On 19/04/2016 13:35, Dave Gordon wrote:
> On 13/04/16 15:21, John Harrison wrote:
>> On 13/04/2016 10:57, Daniel Vetter wrote:
>>> On Tue, Apr 12, 2016 at 09:03:09PM +0100, Chris Wilson wrote:
>>>> Conceptually, each request is a record of a hardware transaction - we
>>>> build up a list of pending commands and then either commit them to
>>>> hardware, or cancel them. However, whilst building up the list of
>>>> pending commands, we may modify state outside of the request and make
>>>> references to the pending request. If we do so and then cancel that
>>>> request, external objects then point to the deleted request leading to
>>>> both graphical and memory corruption.
>>>>
>>>> The easiest example is to consider object/VMA tracking. When we mark an
>>>> object as active in a request, we store a pointer to this, the most
>>>> recent request, in the object. Then we want to free that object, we wait
>>>> for the most recent request to be idle before proceeding (otherwise the
>>>> hardware will write to pages now owned by the system, or we will attempt
>>>> to read from those pages before the hardware is finished writing). If
>>>> the request was cancelled instead, that wait completes immediately. As a
>>>> result, all requests must be committed and not cancelled if the external
>>>> state is unknown.
>>>>
>>>> All that remains of i915_gem_request_cancel() users are just a couple of
>>>> extremely unlikely allocation failures, so remove the API entirely.
>>>>
>>>> A consequence of committing all incomplete requests is that we generate
>>>> excess breadcrumbs and fill the ring much more often with dummy work. We
>>>> have completely undone the outstanding_last_seqno optimisation.
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>> Cc: stable@vger.kernel.org
>>> Cc: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> I'd like John's ack on this one too, but patch itself looks sound. Fast
>>> r-b since we've discussed this a while ago already ...
>>
>> I think this is going to cause a problem with the scheduler. You are
>> effectively saying that the execbuf call cannot fail beyond the point of
>> allocating a request. If it gets that far then it must go all the way
>> and submit the request to the hardware. With a scheduler, that means
>> adding it to the scheduler's queues and tracking it all the way through
>> the system to completion. If nothing else, that sounds like a lot of
>> extra overhead for no actual work. Or worse if the failure is at a point
>> where the request cannot be sent further through the system because it
>> was something critical that failed then you are really stuffed.
>>
>> I'm not sure what the other option would be though, short of being able
>> to undo the last read/write object tracking updates.
>
> With the chained-ownership code we have in the scheduler, it becomes 
> perfectly possible to undo the last-read/write tracking changes.
>
> I'd much rather see any failure during submission rewound and undone, 
> so we can just return -EAGAIN at any point and let someone retry if 
> required.
>
> This just looks like a hack to work around not having a properly 
> transactional model of request submission :(
>
> .Dave.

I was wondering whether it would be possible to delay the tracking
updates until later in the execbuf process, i.e. only do them after all
potential failure points. That would be a much simpler change than
putting in chained ownership.
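
To make the idea concrete, here is a rough sketch of what I mean. It is
not the real i915 code: every structure and function name below
(struct pending_tracking, stage_tracking(), commit_tracking(),
toy_execbuf()) is made up for illustration, and the fail_late flag just
simulates a failure somewhere after the objects have been looked up. The
point is only that nothing outside the request is modified until no
further failure is possible, so an early exit leaves no object pointing
at a request that never ran.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real i915 structures. */
struct request {
	int seqno;
};

struct object {
	struct request *last_request;	/* most recent request using this object */
};

#define MAX_OBJECTS 8

/* Tracking updates staged during submission but not yet published. */
struct pending_tracking {
	struct object *objs[MAX_OBJECTS];
	size_t count;
};

/* Stage an update; no state outside the pending list is touched yet. */
static int stage_tracking(struct pending_tracking *p, struct object *obj)
{
	if (p->count >= MAX_OBJECTS)
		return -ENOMEM;
	p->objs[p->count++] = obj;
	return 0;
}

/* Publish the staged updates once no further failure is possible. */
static void commit_tracking(struct pending_tracking *p, struct request *req)
{
	assert(req != NULL);
	for (size_t i = 0; i < p->count; i++)
		p->objs[i]->last_request = req;
	p->count = 0;
}

/*
 * Toy execbuf path. fail_late simulates a failure after the objects
 * have been collected but before the request is handed on.
 */
static int toy_execbuf(struct request *req, struct object **objs, size_t n,
		       int fail_late)
{
	struct pending_tracking pending = { .count = 0 };
	int ret;

	for (size_t i = 0; i < n; i++) {
		ret = stage_tracking(&pending, objs[i]);
		if (ret)
			return ret;
	}

	if (fail_late)
		return -EAGAIN;	/* safe: no object points at req yet */

	/* Past the last failure point: now make the updates visible. */
	commit_tracking(&pending, req);
	return 0;
}
```

With something along these lines, a failing call leaves every object's
last_request untouched and the caller can simply retry, whereas a
successful call updates all of them in one go at the end.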

However, it seems that the patch has already been merged despite this
discussion and despite Daniel Vetter wanting an ack first. Is that
correct?

John.
