From: Daniel Vetter <daniel@ffwll.ch>
To: Tomas Elf <tomas.elf@intel.com>
Cc: Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [PATCH 8/8] drm/i915: NULL check of unpin_work
Date: Tue, 13 Oct 2015 13:51:47 +0200	[thread overview]
Message-ID: <20151013115147.GR26718@phenom.ffwll.local> (raw)
In-Reply-To: <5617ADC0.6080305@intel.com>

On Fri, Oct 09, 2015 at 01:06:24PM +0100, Tomas Elf wrote:
> On 09/10/2015 11:44, Chris Wilson wrote:
> >On Fri, Oct 09, 2015 at 11:30:34AM +0100, Tomas Elf wrote:
> >>On 09/10/2015 08:46, Chris Wilson wrote:
> >>>On Thu, Oct 08, 2015 at 07:31:40PM +0100, Tomas Elf wrote:
> >>>>Avoid NULL pointer exceptions in the display driver for certain critical cases
> >>>>when unpin_work has turned out to be NULL.
> >>>
> >>>Nope, the machine reached a point where it cannot possibly reach, we
> >>>want the OOPS.
> >>>-Chris
> >>>
> >>
> >>Really? Because if I have this in there I can actually run the
> >>long-duration stability tests for 12+ hours rather than just a few
> >>hours (it doesn't happen all the time but I've seen it at least
> >>once). But, hey, if you want to investigate the reason why we have a
> >>NULL there then go ahead. I guess I'll just have to carry this patch
> >>for as long as my machine crashes during my TDR testing.
> >
> >You've sat on a critical bug for how long? There's been one recent
> >report that appears to be fallout from Tvrtko's context cleanup, but
> >nothing older, in public at least.
> >-Chris
> >
> 
> Do people typically try to actively hang their machines several times a
> minute for a 12+ hours at a time? If not then maybe that's why they haven't
> seen this.

The problem is that the world loves to conspire against races like these,
and somewhere out there is a machine which hits this ridiculously
reliably. And if this is a machine sitting on an OEM's product engineer
bench we've just lost a sale ;-)

Just because you can't repro it fast doesn't mean that your reproduction
rate is the lower limit.

> I haven't sat on anything, this has been part of my error state capture
> improvement patches which I've intended to upstream for several months now.
> I consider this part of the overall stability issue and grouped it together
> with all the other patches necessary to make the machine not crash for the
> duration of my test. I would've upstreamed this series a long time ago if I
> had actually been given time to work on it but that's another discussion.

I guess Chris' reaction is because we've had tons of cases where bugfixes
like this were stuck in android trees because they were tied up with
some feature work. But I also know it's really hard to spot them, so in
case of doubt please upstream small fixes aggressively.

Since this one here is outside of the error capture itself, and so in code
where our existing assumption that error capture only runs once the driver
has been dead for seconds is invalid, it's a prime candidate.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
