From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Vetter Subject: [PATCH 0/6] robustify reset transitions Date: Wed, 14 Nov 2012 17:14:02 +0100 Message-ID: <1352909648-21514-1-git-send-email-daniel.vetter@ffwll.ch> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mail-ee0-f49.google.com (mail-ee0-f49.google.com [74.125.83.49]) by gabe.freedesktop.org (Postfix) with ESMTP id AB394A0304 for ; Wed, 14 Nov 2012 08:25:18 -0800 (PST) Received: by mail-ee0-f49.google.com with SMTP id c1so385161eek.36 for ; Wed, 14 Nov 2012 08:25:17 -0800 (PST) List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Intel Graphics Development Cc: Daniel Vetter List-Id: intel-gfx@lists.freedesktop.org Hi all, Pretty much still the same approach, but with a few changes compared to last time around: - split out the throttle fix - fixed a bug in the wait_error EXIT_COND - drop a unecessary barrier and update the comments&commit message to explain why that's safe Chris Wilson expressed some concerns about the approach, pushing more for his rwlock trick to enforce proper ordering. I still think that handling this as a generational event which invalidates all seqno waiting makes more sense, instead of playing tricks with locking. Imo grabbing the generation number together with the data we're blocking on makes it much clearer that reset is an exceptional event, and also (with the explicit passing of the reset_counter down the callchain) obvious from where to where exactly the critical section is. Abusing locking just doesn't quite feel right. His second concern is about the double atomic_read now required. Atomic reads don't insert any barriers or other stuff (they only differ in the typechecking compared to normal loads), and with the updated patches no additional barriers are inserted in any fastpaths. So imo no concern for performance, and imo it yields cleaner code to separate the states of the reset machine from the generational "invalidate everthing" counter. YMMV, so comments&flames on the approach highly welcome. Also, if people have ideas how to better test this ... Cheers, Daniel Daniel Vetter (6): drm/i915: move dev_priv->mm out of line drm/i915: extract hangcheck/reset/error_state state into substruct drm/i915: move wedged to the other gpu error handling stuff drm/i915: fix reset handling in the throttle ioctl drm/i915: clear up wedged transitions drm/i915: create a race-free reset detection drivers/gpu/drm/i915/i915_debugfs.c | 12 +- drivers/gpu/drm/i915/i915_dma.c | 9 +- drivers/gpu/drm/i915/i915_drv.c | 8 +- drivers/gpu/drm/i915/i915_drv.h | 274 ++++++++++++++++++-------------- drivers/gpu/drm/i915/i915_gem.c | 104 ++++++------ drivers/gpu/drm/i915/i915_irq.c | 89 +++++++---- drivers/gpu/drm/i915/intel_display.c | 4 +- drivers/gpu/drm/i915/intel_ringbuffer.c | 8 +- 8 files changed, 291 insertions(+), 217 deletions(-) -- 1.7.11.4