From: Tomas Elf <tomas.elf@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [PATCH 6/8] drm/i915: Use safe list iterators
Date: Fri, 09 Oct 2015 13:00:49 +0100
Message-ID: <5617AC71.6010502@intel.com>
In-Reply-To: <20151009103856.GN27939@nuc-i3427.alporthouse.com>

On 09/10/2015 11:38, Chris Wilson wrote:
> On Fri, Oct 09, 2015 at 11:27:48AM +0100, Tomas Elf wrote:
>> On 09/10/2015 08:41, Chris Wilson wrote:
>>> On Thu, Oct 08, 2015 at 07:31:38PM +0100, Tomas Elf wrote:
>>>> Error state capture is dependent on i915_gem_active_request() and
>>>> i915_gem_obj_is_pinned(). Since there is no synchronization between error state
>>>> capture and the driver state we at least need to use safe list iterators in
>>>> these functions to alleviate the problem of request/vma lists changing during
>>>> error state capture.
>>>
>>> Does not make it safe.
>>> -Chris
>>>
>>
>> I know it doesn't make it safe, I didn't say this patch makes it safe.
>>
>> Maybe I was unclear but I chose the word "alleviate" rather than
>> "solve" to indicate that there are still problems but this change
>> reduces the problem scope and makes crashes less likely to occur. I
>> also used the formulation "at least" to indicate that we're not
>> solving everything but we can do certain things to improve things
>> somewhat.
>>
>> The problem I've been seeing has been that the list state changes
>> during iteration and that the error capture tries to read elements
>> that are no longer part of the list - not that elements that the
>> error capture code is dereferencing are deallocated by the driver or
>> whatever. Using a safe iterator helps with that very particular
>> problem. Or maybe I guess I've just been incredibly lucky for the
>> last 2 months when I've been running this code as I've been able to
>> get 12+ hours of stability during my tests instead of less than one
>> hour in between crashes that was the case before I introduced these
>> changes.
>
> You have been incredibly lucky (probably due to how the requests are
> being cached now), but that the requests can be modified whilst error
> capture runs and oops is well known. Just pretending the list iterator
> is safe does nothing.
> -Chris
>

Does nothing except consistently extend mean time between failures from 
less than one hour to more than 12 hours during my TDR tests. That's a 
lot of luck right there. On a consistent and predictable basis. Also, 
I'm not trying to pretend, I thought I communicated clearly that this is 
not a solution but rather an improvement that makes certain types of 
tests actually possible to run, tests that are needed for the TDR 
validation.
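
For readers who have not used the pattern under discussion, here is a 
minimal userspace sketch of what a "safe" list iterator does. It is not 
the i915 code: struct list_node, struct request, request_add() and 
retire_up_to() are hypothetical names made up for this illustration, and 
the loop only mirrors the idea behind the kernel's list_for_each_safe() 
family. The iterator caches the next pointer before the loop body runs, 
so the body may unlink the current element. It does not add any locking, 
so it gives no protection against another thread modifying the list 
concurrently, which is the limitation raised earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for a circular doubly linked list head/node. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical element type, for illustration only. */
struct request {
	int seqno;
	struct list_node link;
};

/* Recover the containing request from its embedded list node. */
#define node_to_request(n) \
	((struct request *)((char *)(n) - offsetof(struct request, link)))

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct list_node *head, struct list_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* following a stale pointer now faults */
}

static struct request *request_add(struct list_node *head, int seqno)
{
	struct request *rq = malloc(sizeof(*rq));

	assert(rq != NULL);
	rq->seqno = seqno;
	list_add_tail(head, &rq->link);
	return rq;
}

static int list_count(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *cur = head->next; cur != head; cur = cur->next)
		n++;
	return n;
}

/*
 * Safe-iterator pattern: 'next' is read before the body can unlink and
 * free 'cur', so the walk never touches a removed node. A plain walk
 * that read cur->next after list_del(cur) would follow a NULL pointer.
 * Returns the number of requests removed.
 */
static int retire_up_to(struct list_node *head, int seqno)
{
	struct list_node *cur, *next;
	int retired = 0;

	for (cur = head->next, next = cur->next; cur != head;
	     cur = next, next = cur->next) {
		struct request *rq = node_to_request(cur);

		if (rq->seqno <= seqno) {
			list_del(cur);
			free(rq);
			retired++;
		}
	}
	return retired;
}
```

Note that the cached next pointer is itself only valid as long as nobody 
else removes that element in the meantime, so in a concurrent setting this 
narrows the window rather than closing it.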

But if you have another solution then let's go with that.

Thanks,
Tomas
