All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Tomas Elf <tomas.elf@intel.com>
Cc: Intel-GFX@Lists.FreeDesktop.Org
Subject: Re: [PATCH 3/8] drm/i915: Cope with request list state change during error state capture
Date: Wed, 14 Oct 2015 14:45:32 +0200	[thread overview]
Message-ID: <20151014124532.GA26718@phenom.ffwll.local> (raw)
In-Reply-To: <561E40AB.7080809@intel.com>

On Wed, Oct 14, 2015 at 12:46:51PM +0100, Tomas Elf wrote:
> On 13/10/2015 12:39, Daniel Vetter wrote:
> >On Fri, Oct 09, 2015 at 12:25:26PM +0100, Tomas Elf wrote:
> >>On 09/10/2015 08:48, Chris Wilson wrote:
> >>>On Thu, Oct 08, 2015 at 07:31:35PM +0100, Tomas Elf wrote:
> >>>>Since we're not synchronizing the ring request list during error state capture
> >>>>the request list state might change between the time the corresponding error
> >>>>request list was allocated and dimensioned to the time when the ring request
> >>>>list is actually captured into the error state. If this happens, throw a
> >>>>WARNING and do early exit and be aware that the captured error state might not
> >>>>be fully reliable.
> >>>>
> >>>>Signed-off-by: Tomas Elf <tomas.elf@intel.com>
> >>>>---
> >>>>  drivers/gpu/drm/i915/i915_gpu_error.c | 13 +++++++++++++
> >>>>  1 file changed, 13 insertions(+)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> >>>>index 32c1799..cc75ca4 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_gpu_error.c
> >>>>+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> >>>>@@ -1071,6 +1071,19 @@ static void i915_gem_record_rings(struct drm_device *dev,
> >>>>  		list_for_each_entry_safe(request, tmpreq, &ring->request_list, list) {
> >>>>  			struct drm_i915_error_request *erq;
> >>>>
> >>>>+			if (WARN_ON(!request || count >= error->ring[i].num_requests)) {
> >>>
> >>>Request cannot be null, count can legitmately be more, the WARN on is
> >>>inappropriate. Again, I sent several patches over the past couple of
> >>>years to fix this.
> >>>-Chris
> >>>
> >>
> >>Ok, so having the driver request list change grow in between the point where
> >>we allocate and set up the error state request list to the same size as the
> >>driver request list (since that's what count being larger than the list size
> >>implies) is legitimate? Traversing into unallocated memory seems pretty
> >>dodgy to me but if you say so.
> >
> >We still need to handle it ofc, but just not WARN on this condition since
> >it can happen.
> >-Daniel
> >
> 
> With the RCU discussion ongoing I guess we should we just drop this patch? I
> agree that what I've been seeing looks like a side-effect of concurrent
> memory deallocation. Whatever solution you reach should make this patch
> pointless.

Depending upon the solution we pick we might still need it, since if we
only make sure we don't chase freed memory but still allow the list itself
to get modified (which would be the case with kfree_rcu), then you need
it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-10-14 12:42 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-08 18:31 [PATCH 0/8] Stability improvements to error state capture Tomas Elf
2015-10-08 18:31 ` [PATCH 1/8] drm/i915: Early exit from semaphore_waits_for for execlist mode Tomas Elf
2015-10-08 18:31 ` [PATCH 2/8] drm/i915: Migrate to safe iterators in error state capture Tomas Elf
2015-10-09  7:49   ` Chris Wilson
2015-10-09 11:38     ` Tomas Elf
2015-10-09  8:27   ` Daniel Vetter
2015-10-09 11:40     ` Tomas Elf
2015-10-13 11:37       ` Daniel Vetter
2015-10-13 11:47         ` Chris Wilson
2015-10-08 18:31 ` [PATCH 3/8] drm/i915: Cope with request list state change during " Tomas Elf
2015-10-09  7:48   ` Chris Wilson
2015-10-09 11:25     ` Tomas Elf
2015-10-13 11:39       ` Daniel Vetter
2015-10-14 11:46         ` Tomas Elf
2015-10-14 12:45           ` Daniel Vetter [this message]
2015-10-09  8:28   ` Daniel Vetter
2015-10-09 11:45     ` Tomas Elf
2015-10-13 11:40       ` Daniel Vetter
2015-10-08 18:31 ` [PATCH 4/8] drm/i915: NULL checking when capturing buffer objects " Tomas Elf
2015-10-09  7:49   ` Chris Wilson
2015-10-09 11:34     ` Tomas Elf
2015-10-09  8:32   ` Daniel Vetter
2015-10-09  8:47     ` Chris Wilson
2015-10-09 11:52       ` Tomas Elf
2015-10-09 11:45     ` Tomas Elf
2015-10-08 18:31 ` [PATCH 5/8] drm/i915: vma NULL pointer check Tomas Elf
2015-10-09  7:48   ` Chris Wilson
2015-10-09 11:30     ` Tomas Elf
2015-10-09 11:59       ` Chris Wilson
2015-10-13 11:43         ` Daniel Vetter
2015-10-09  8:33   ` Daniel Vetter
2015-10-09 11:46     ` Tomas Elf
2015-10-08 18:31 ` [PATCH 6/8] drm/i915: Use safe list iterators Tomas Elf
2015-10-09  7:41   ` Chris Wilson
2015-10-09 10:27     ` Tomas Elf
2015-10-09 10:38       ` Chris Wilson
2015-10-09 12:00         ` Tomas Elf
2015-10-08 18:31 ` [PATCH 7/8] drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues Tomas Elf
2015-10-09  7:45   ` Chris Wilson
2015-10-09 10:28     ` Tomas Elf
2015-10-09  8:38   ` Daniel Vetter
2015-10-09  8:45     ` Chris Wilson
2015-10-13 11:46       ` Daniel Vetter
2015-10-13 11:45         ` Chris Wilson
2015-10-13 13:46           ` Daniel Vetter
2015-10-13 14:00             ` Chris Wilson
2015-10-19 15:32   ` [PATCH v2 " Tomas Elf
2015-10-22 16:49     ` Dave Gordon
2015-10-22 17:35       ` Daniel Vetter
2015-10-23  8:42     ` Tvrtko Ursulin
2015-10-23  8:59       ` Daniel Vetter
2015-10-23 11:02         ` Tomas Elf
2015-10-23 12:49           ` Dave Gordon
2015-10-23 13:08     ` [PATCH v3 " Tomas Elf
2015-10-23 14:53       ` Daniel, Thomas
2015-10-23 17:02     ` [PATCH] drm/i915: Update to post-reset execlist queue clean-up Tomas Elf
2015-12-01 11:46       ` Tvrtko Ursulin
2015-12-11 14:14         ` Dave Gordon
2015-12-11 16:40           ` Daniel Vetter
2015-12-14 10:21           ` Mika Kuoppala
2015-10-08 18:31 ` [PATCH 8/8] drm/i915: NULL check of unpin_work Tomas Elf
2015-10-09  7:46   ` Chris Wilson
2015-10-09  8:39     ` Daniel Vetter
2015-10-09 11:50       ` Tomas Elf
2015-10-09 10:30     ` Tomas Elf
2015-10-09 10:44       ` Chris Wilson
2015-10-09 12:06         ` Tomas Elf
2015-10-13 11:51           ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151014124532.GA26718@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=tomas.elf@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.