From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: [PATCH 0/6] robustify reset transitions
Date: Wed, 14 Nov 2012 17:14:02 +0100
Message-ID: <1352909648-21514-1-git-send-email-daniel.vetter@ffwll.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from mail-ee0-f49.google.com (mail-ee0-f49.google.com [74.125.83.49])
	by gabe.freedesktop.org (Postfix) with ESMTP id AB394A0304
	for <intel-gfx@lists.freedesktop.org>;
	Wed, 14 Nov 2012 08:25:18 -0800 (PST)
Received: by mail-ee0-f49.google.com with SMTP id c1so385161eek.36
	for <intel-gfx@lists.freedesktop.org>;
	Wed, 14 Nov 2012 08:25:17 -0800 (PST)
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
List-Id: intel-gfx@lists.freedesktop.org

Hi all,

Pretty much still the same approach, but with a few changes compared to last
time around:
- split out the throttle fix
- fixed a bug in the wait_error EXIT_COND
- drop a unecessary barrier and update the comments&commit message to explain
  why that's safe

Chris Wilson expressed some concerns about the approach, pushing more for his
rwlock trick to enforce proper ordering. I still think that handling this as a
generational event which invalidates all seqno waiting makes more sense, instead
of playing tricks with locking. Imo grabbing the generation number together with
the data we're blocking on makes it much clearer that reset is an exceptional
event, and also (with the explicit passing of the reset_counter down the
callchain) obvious from where to where exactly the critical section is. Abusing
locking just doesn't quite feel right.

His second concern is about the double atomic_read now required. Atomic reads
don't insert any barriers or other stuff (they only differ in the typechecking
compared to normal loads), and with the updated patches no additional barriers
are inserted in any fastpaths. So imo no concern for performance, and imo it
yields cleaner code to separate the states of the reset machine from the
generational "invalidate everthing" counter.

YMMV, so comments&flames on the approach highly welcome. Also, if people have
ideas how to better test this ...

Cheers, Daniel

Daniel Vetter (6):
  drm/i915: move dev_priv->mm out of line
  drm/i915: extract hangcheck/reset/error_state state into substruct
  drm/i915: move wedged to the other gpu error handling stuff
  drm/i915: fix reset handling in the throttle ioctl
  drm/i915: clear up wedged transitions
  drm/i915: create a race-free reset detection

 drivers/gpu/drm/i915/i915_debugfs.c     |  12 +-
 drivers/gpu/drm/i915/i915_dma.c         |   9 +-
 drivers/gpu/drm/i915/i915_drv.c         |   8 +-
 drivers/gpu/drm/i915/i915_drv.h         | 274 ++++++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gem.c         | 104 ++++++------
 drivers/gpu/drm/i915/i915_irq.c         |  89 +++++++----
 drivers/gpu/drm/i915/intel_display.c    |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   8 +-
 8 files changed, 291 insertions(+), 217 deletions(-)

-- 
1.7.11.4