All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] A collection of cleanups, version 4
@ 2016-01-29 19:19 Dave Gordon
  2016-01-29 19:19 ` [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy) Dave Gordon
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx

Various things can go wrong during initialisation and teardown, but they
usually don't, so the error-handling paths go largely untested.  This
collection of patches fixes some things I recently noticed. Some might
lead to a kernel OOPS, but mostly they're leaks and other inconsistencies.

Includes Nick Hoath's patch [6/6] to swap context/engine teardown, because
that helps make setup & teardown more consistent too; but more importantly,
that patch [6/6] is dependent on patch [5/6] of this set to work correctly.
Applying [6/6] without at least [5/6] as well will cause a fault on driver
unload!

v2:
    Patches reordered and the previous [1/4] split into [4..6/6],
    so that the bugs that it's fixing are more clearly visible
    (Mika Kuoppala, although he said it wasn't actually *necessary*).

v3:
    Patches reordered again; the bug fixes are now in patches 3 and 5.
    The former could stand alone; the latter depends on patch 4 and is
    a prerequisite for Nick's patch [6/6] to function.

v4:
    Rebased

Dave Gordon (5):
  drm/i915: tidy up initialisation failure paths (legacy)
  drm/i915: tidy up initialisation failure paths (GEM & LRC)
  drm/i915: unmap the correct page in intel_logical_ring_cleanup()
  drm/i915: consolidate LRC mode HWSP setup & teardown
  drm/i915: HWSP should be unmapped earlier in LRC teardown sequence

Nick Hoath (1):
  drm/i915: fix context/engine cleanup order

 drivers/gpu/drm/i915/i915_dma.c         |  4 +-
 drivers/gpu/drm/i915/i915_drv.h         |  2 +-
 drivers/gpu/drm/i915/i915_gem.c         | 28 ++++++++------
 drivers/gpu/drm/i915/intel_lrc.c        | 65 ++++++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 49 ++++++++++++-------------
 5 files changed, 82 insertions(+), 66 deletions(-)

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy)
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
@ 2016-01-29 19:19 ` Dave Gordon
  2016-01-30 10:50   ` Chris Wilson
  2016-01-29 19:19 ` [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC) Dave Gordon
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx

1. Fix intel_cleanup_ring_buffer() to handle the error cleanup
   case where the ringbuffer has been allocated but map-and-pin
   failed. Unpin it iff it's previously been mapped-and-pinned.

2. Fix the error path in intel_init_ring_buffer(), which already
   called intel_destroy_ringbuffer_obj(), but failed to free the
   actual ringbuffer structure. Calling intel_ringbuffer_free()
   instead does both in one go.

3. With the above change, intel_destroy_ringbuffer_obj() is only
   called in one place (intel_ringbuffer_free()), so flatten it
   into that function.

4. move low-level register accesses from intel_cleanup_ring_buffer()
   (which calls intel_stop_ring_buffer(ring) which calls stop_ring())
   down into stop_ring() itself), which is already doing low-level
   register accesses. Then, intel_cleanup_ring_buffer() no longer
   needs 'dev_priv'.

v3:
   Make intel_unpin_ringbuffer_obj() return early if ringbuffer is
   not mapped, simplifying matters for the caller. [Chris Wilson,
   Daniel Vetter]
   Don't bother to clear a pointer in an object about to be freed.
   [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 49 ++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6f5b511..db02893 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -551,6 +551,8 @@ static bool stop_ring(struct intel_engine_cs *ring)
 		I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
 	}
 
+	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
+
 	return (I915_READ_HEAD(ring) & HEAD_ADDR) == 0;
 }
 
@@ -2055,6 +2057,9 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 
 void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 {
+	if (!ringbuf->virtual_start)
+		return;
+
 	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
 		vunmap(ringbuf->virtual_start);
 	else
@@ -2132,12 +2137,6 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 	return 0;
 }
 
-static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
-{
-	drm_gem_object_unreference(&ringbuf->obj->base);
-	ringbuf->obj = NULL;
-}
-
 static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 				      struct intel_ringbuffer *ringbuf)
 {
@@ -2200,11 +2199,13 @@ struct intel_ringbuffer *
 }
 
 void
-intel_ringbuffer_free(struct intel_ringbuffer *ring)
+intel_ringbuffer_free(struct intel_ringbuffer *ringbuf)
 {
-	intel_destroy_ringbuffer_obj(ring);
-	list_del(&ring->link);
-	kfree(ring);
+	if (ringbuf->obj)
+		drm_gem_object_unreference(&ringbuf->obj->base);
+
+	list_del(&ringbuf->link);
+	kfree(ringbuf);
 }
 
 static int intel_init_ring_buffer(struct drm_device *dev,
@@ -2232,6 +2233,13 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	}
 	ring->buffer = ringbuf;
 
+	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+	if (ret) {
+		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
+				ring->name, ret);
+		goto error;
+	}
+
 	if (I915_NEED_GFX_HWS(dev)) {
 		ret = init_status_page(ring);
 		if (ret)
@@ -2243,14 +2251,6 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
-	if (ret) {
-		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
-				ring->name, ret);
-		intel_destroy_ringbuffer_obj(ringbuf);
-		goto error;
-	}
-
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		goto error;
@@ -2264,19 +2264,16 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv;
+	struct intel_ringbuffer *ringbuf;
 
 	if (!intel_ring_initialized(ring))
 		return;
 
-	dev_priv = to_i915(ring->dev);
-
-	if (ring->buffer) {
+	ringbuf = ring->buffer;
+	if (ringbuf) {
 		intel_stop_ring_buffer(ring);
-		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
-
-		intel_unpin_ringbuffer_obj(ring->buffer);
-		intel_ringbuffer_free(ring->buffer);
+		intel_unpin_ringbuffer_obj(ringbuf);
+		intel_ringbuffer_free(ringbuf);
 		ring->buffer = NULL;
 	}
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC)
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
  2016-01-29 19:19 ` [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy) Dave Gordon
@ 2016-01-29 19:19 ` Dave Gordon
  2016-01-30 10:56   ` Chris Wilson
  2016-01-29 19:19 ` [PATCH v4 3/6] drm/i915: unmap the correct page in intel_logical_ring_cleanup() Dave Gordon
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx

1. add call to i915_gem_context_fini() to deallocate the default
   context(s) if the call to init_rings() fails, so that we don't
   leak the context in that situation.

2. remove useless code in intel_logical_ring_cleanup(), presumably
   copypasted from legacy ringbuffer version at creation.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c  |  5 ++++-
 drivers/gpu/drm/i915/intel_lrc.c | 10 ++--------
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a928823..5a4d468 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4986,8 +4986,11 @@ int i915_gem_init(struct drm_device *dev)
 		goto out_unlock;
 
 	ret = dev_priv->gt.init_rings(dev);
-	if (ret)
+	if (ret) {
+		i915_gem_context_fini(dev);
+		/* XXX: anything else to be undone here? */
 		goto out_unlock;
+	}
 
 	ret = i915_gem_init_hw(dev);
 	if (ret == -EIO) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3a03646..60d7cdd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1993,17 +1993,11 @@ static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
  */
 void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv;
-
 	if (!intel_ring_initialized(ring))
 		return;
 
-	dev_priv = ring->dev->dev_private;
-
-	if (ring->buffer) {
-		intel_logical_ring_stop(ring);
-		WARN_ON((I915_READ_MODE(ring) & MODE_IDLE) == 0);
-	}
+	/* should not be set in LRC mode */
+	WARN_ON(ring->buffer);
 
 	if (ring->cleanup)
 		ring->cleanup(ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 3/6] drm/i915: unmap the correct page in intel_logical_ring_cleanup()
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
  2016-01-29 19:19 ` [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy) Dave Gordon
  2016-01-29 19:19 ` [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC) Dave Gordon
@ 2016-01-29 19:19 ` Dave Gordon
  2016-01-30 11:01   ` Chris Wilson
  2016-01-29 19:19 ` [PATCH v4 4/6] drm/i915: consolidate LRC mode HWSP setup & teardown Dave Gordon
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx

The kunmap() call here didn't match the corresponding kmap().
The kmap()ing was changed with the introduction of the GuC-compatible
layout of context objects and the introduction of "LRC_PPHWSP_PN", in

    d167519 drm/i915: Integrate GuC-based command submission

but the corresponding kunmap() wasn't, probably because the old code
dug into the underlying sgl implementation instead of than calling
"i915_gem_object_get_page(ring->status_page.obj, 0)", which might more
easily have been noticed as containing an assumption about page 0.

v3:
    Use kmap_to_page() rather than repeat the mapping calculation.
    [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 60d7cdd..0fa2497 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2006,7 +2006,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	i915_gem_batch_pool_fini(&ring->batch_pool);
 
 	if (ring->status_page.obj) {
-		kunmap(sg_page(ring->status_page.obj->pages->sgl));
+		kunmap(kmap_to_page(ring->status_page.page_addr));
 		ring->status_page.obj = NULL;
 	}
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 4/6] drm/i915: consolidate LRC mode HWSP setup & teardown
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
                   ` (2 preceding siblings ...)
  2016-01-29 19:19 ` [PATCH v4 3/6] drm/i915: unmap the correct page in intel_logical_ring_cleanup() Dave Gordon
@ 2016-01-29 19:19 ` Dave Gordon
  2016-01-30 11:11   ` Chris Wilson
  2016-01-29 19:19 ` [PATCH v4 5/6] drm/i915: HWSP should be unmapped earlier in LRC teardown sequence Dave Gordon
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx

In legacy ringbuffer mode, the HWSP is a separate GEM object with its
own pinning and reference counts. In LRC mode, however, it's not;
instead its part of the default context object. The LRC-mode setup &
teardown code therefore needs to handle this specially; the presence
of the two bugs fixed in this patchset suggests that this code is not
well-understood or maintained at present.

So, this patch:
    moves the (newly-fixed!) LRC-mode HWSP teardown code to its own
        (trivial) function lrc_teardown_hardware_status_page(), and
    changes the call signature of lrc_setup_hardware_status_page()
        to match
so that all knowledge of this special arrangement is local to these
two functions.

It will also help with efforts in progress to eliminate special
handling of the default (kernel) context elsewhere in LRC code :)

v3: Rebased

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 41 +++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0fa2497..ff38e57 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -227,9 +227,8 @@ enum {
 
 static int intel_lr_context_pin(struct intel_context *ctx,
 				struct intel_engine_cs *engine);
-static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *default_ctx_obj);
-
+static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring);
+static void lrc_teardown_hardware_status_page(struct intel_engine_cs *ring);
 
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
@@ -1555,8 +1554,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	u8 next_context_status_buffer_hw;
 
-	lrc_setup_hardware_status_page(ring,
-				dev_priv->kernel_context->engine[ring->id].state);
+	lrc_setup_hardware_status_page(ring);
 
 	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
 	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
@@ -2005,10 +2003,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
 
-	if (ring->status_page.obj) {
-		kunmap(kmap_to_page(ring->status_page.page_addr));
-		ring->status_page.obj = NULL;
-	}
+	lrc_teardown_hardware_status_page(ring);
 
 	ring->disable_lite_restore_wa = false;
 	ring->ctx_desc_template = 0;
@@ -2500,24 +2495,36 @@ uint32_t intel_lr_context_size(struct intel_engine_cs *ring)
 	return ret;
 }
 
-static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *default_ctx_obj)
+static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	struct intel_context *dctx = dev_priv->kernel_context;
+	struct drm_i915_gem_object *dctx_obj = dctx->engine[ring->id].state;
+	u64 dctx_addr = i915_gem_obj_ggtt_offset(dctx_obj);
 	struct page *page;
 
-	/* The HWSP is part of the default context object in LRC mode. */
-	ring->status_page.gfx_addr = i915_gem_obj_ggtt_offset(default_ctx_obj)
-			+ LRC_PPHWSP_PN * PAGE_SIZE;
-	page = i915_gem_object_get_page(default_ctx_obj, LRC_PPHWSP_PN);
+	/*
+	 * The HWSP is part of the default context object in LRC mode.
+	 * Note that it doesn't contribute to the refcount!
+	 */
+	page = i915_gem_object_get_page(dctx_obj, LRC_PPHWSP_PN);
 	ring->status_page.page_addr = kmap(page);
-	ring->status_page.obj = default_ctx_obj;
+	ring->status_page.gfx_addr = dctx_addr + LRC_PPHWSP_PN * PAGE_SIZE;
+	ring->status_page.obj = dctx_obj;
 
 	I915_WRITE(RING_HWS_PGA(ring->mmio_base),
 			(u32)ring->status_page.gfx_addr);
 	POSTING_READ(RING_HWS_PGA(ring->mmio_base));
 }
 
+static void lrc_teardown_hardware_status_page(struct intel_engine_cs *ring)
+{
+	if (ring->status_page.obj) {
+		kunmap(kmap_to_page(ring->status_page.page_addr));
+		ring->status_page.obj = NULL;
+	}
+}
+
 /**
  * intel_lr_context_deferred_alloc() - create the LRC specific bits of a context
  * @ctx: LR context to create.
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 5/6] drm/i915: HWSP should be unmapped earlier in LRC teardown sequence
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
                   ` (3 preceding siblings ...)
  2016-01-29 19:19 ` [PATCH v4 4/6] drm/i915: consolidate LRC mode HWSP setup & teardown Dave Gordon
@ 2016-01-29 19:19 ` Dave Gordon
  2016-01-30 11:13   ` Chris Wilson
  2016-01-29 19:19 ` [PATCH v4 6/6] drm/i915: fix context/engine cleanup order Dave Gordon
  2016-02-01  8:59 ` ✓ Fi.CI.BAT: success for A collection of cleanups, version 4 Patchwork
  6 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx

In LRC mode, the HWSP is part of the default context object, and
therefore does not exist independently. Worse, it doesn't contribute
to the refcount on the default context object either.

Currently, the default context is deallocated in intel_lr_context_free(),
but the HWSP kmapping is not torn down until the subsequent call to
intel_logical_ring_cleanup(). Between these calls, ring->status_page.obj
continues to point to the (now non-existent) default context object,
and the kernel mapping likewise points to a page which is now free.

The solution is to dispose of the HWSP kmapping and pointer before the
object itself is freed, so this patch moves the call to the teardown
code from intel_lr_context_free() to intel_logical_ring_cleanup().

This code was introduced in

    48d8238 drm/i915/bdw: Generic logical ring init and cleanup

i.e. it has been there ever since LRC mode was first implemented.

v3:
    Rebased.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ff38e57..947ef15 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2003,7 +2003,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
 
-	lrc_teardown_hardware_status_page(ring);
+	/* Status page should have gone already */
+	WARN_ON(ring->status_page.page_addr);
+	WARN_ON(ring->status_page.obj);
 
 	ring->disable_lite_restore_wa = false;
 	ring->ctx_desc_template = 0;
@@ -2447,6 +2449,11 @@ void intel_lr_context_free(struct intel_context *ctx)
 			continue;
 
 		if (ctx == ctx->i915->kernel_context) {
+			/*
+			 * The HWSP is part of the kernel context
+			 * object in LRC mode, so tear it down now.
+			 */
+			lrc_teardown_hardware_status_page(ringbuf->ring);
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 		}
@@ -2517,12 +2524,19 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring)
 	POSTING_READ(RING_HWS_PGA(ring->mmio_base));
 }
 
+/* This should be called *before* the default context is destroyed */
 static void lrc_teardown_hardware_status_page(struct intel_engine_cs *ring)
 {
-	if (ring->status_page.obj) {
+	struct drm_i915_gem_object *dctx_obj = ring->status_page.obj;
+
+	WARN_ON(!dctx_obj);
+
+	if (ring->status_page.page_addr) {
 		kunmap(kmap_to_page(ring->status_page.page_addr));
-		ring->status_page.obj = NULL;
+		ring->status_page.page_addr = NULL;
 	}
+
+	ring->status_page.obj = NULL;
 }
 
 /**
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 6/6] drm/i915: fix context/engine cleanup order
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
                   ` (4 preceding siblings ...)
  2016-01-29 19:19 ` [PATCH v4 5/6] drm/i915: HWSP should be unmapped earlier in LRC teardown sequence Dave Gordon
@ 2016-01-29 19:19 ` Dave Gordon
  2016-01-30 11:17   ` Chris Wilson
  2016-02-01  8:59 ` ✓ Fi.CI.BAT: success for A collection of cleanups, version 4 Patchwork
  6 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-01-29 19:19 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

From: Nick Hoath <nicholas.hoath@intel.com>

Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This in
turn can cause a problem in the context code, which (used to) check the
ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.

Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace
issue.

v2: Also make the fix in i915_load_modeset_init, not just in
    i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
    Rebased and updated commentary above (Dave Gordon).

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: David Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h |  2 +-
 drivers/gpu/drm/i915/i915_gem.c | 23 ++++++++++++-----------
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index db9b0c6..3eb6844 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -444,8 +444,8 @@ static int i915_load_modeset_init(struct drm_device *dev)
 
 cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
+	i915_gem_cleanup_engines(dev);
 	mutex_unlock(&dev->struct_mutex);
 cleanup_irq:
 	intel_guc_ucode_fini(dev);
@@ -1220,8 +1220,8 @@ int i915_driver_unload(struct drm_device *dev)
 
 	intel_guc_ucode_fini(dev);
 	mutex_lock(&dev->struct_mutex);
-	i915_gem_cleanup_ringbuffer(dev);
 	i915_gem_context_fini(dev);
+	i915_gem_cleanup_engines(dev);
 	mutex_unlock(&dev->struct_mutex);
 	intel_fbc_cleanup_cfb(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 905e90f..8baaca7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,7 +3020,7 @@ int i915_gem_init_rings(struct drm_device *dev);
 int __must_check i915_gem_init_hw(struct drm_device *dev);
 int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
-void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
+void i915_gem_cleanup_engines(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
 int __must_check i915_gem_suspend(struct drm_device *dev);
 void __i915_add_request(struct drm_i915_gem_request *req,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5a4d468..4ee6df9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4912,7 +4912,7 @@ int i915_gem_init_rings(struct drm_device *dev)
 		req = i915_gem_request_alloc(ring, NULL);
 		if (IS_ERR(req)) {
 			ret = PTR_ERR(req);
-			i915_gem_cleanup_ringbuffer(dev);
+			i915_gem_cleanup_engines(dev);
 			goto out;
 		}
 
@@ -4925,7 +4925,7 @@ int i915_gem_init_rings(struct drm_device *dev)
 		if (ret && ret != -EIO) {
 			DRM_ERROR("PPGTT enable ring #%d failed %d\n", i, ret);
 			i915_gem_request_cancel(req);
-			i915_gem_cleanup_ringbuffer(dev);
+			i915_gem_cleanup_engines(dev);
 			goto out;
 		}
 
@@ -4933,7 +4933,7 @@ int i915_gem_init_rings(struct drm_device *dev)
 		if (ret && ret != -EIO) {
 			DRM_ERROR("Context enable ring #%d failed %d\n", i, ret);
 			i915_gem_request_cancel(req);
-			i915_gem_cleanup_ringbuffer(dev);
+			i915_gem_cleanup_engines(dev);
 			goto out;
 		}
 
@@ -5011,7 +5011,7 @@ int i915_gem_init(struct drm_device *dev)
 }
 
 void
-i915_gem_cleanup_ringbuffer(struct drm_device *dev)
+i915_gem_cleanup_engines(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
@@ -5020,13 +5020,14 @@ int i915_gem_init(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i)
 		dev_priv->gt.cleanup_ring(ring);
 
-    if (i915.enable_execlists)
-            /*
-             * Neither the BIOS, ourselves or any other kernel
-             * expects the system to be in execlists mode on startup,
-             * so we need to reset the GPU back to legacy mode.
-             */
-            intel_gpu_reset(dev);
+	if (i915.enable_execlists) {
+		/*
+		 * Neither the BIOS, ourselves or any other kernel
+		 * expects the system to be in execlists mode on startup,
+		 * so we need to reset the GPU back to legacy mode.
+		 */
+		intel_gpu_reset(dev);
+	}
 }
 
 static void
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy)
  2016-01-29 19:19 ` [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy) Dave Gordon
@ 2016-01-30 10:50   ` Chris Wilson
  2016-02-01  9:38     ` Dave Gordon
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 10:50 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jan 29, 2016 at 07:19:26PM +0000, Dave Gordon wrote:
> 1. Fix intel_cleanup_ring_buffer() to handle the error cleanup
>    case where the ringbuffer has been allocated but map-and-pin
>    failed. Unpin it iff it's previously been mapped-and-pinned.
> 
> 2. Fix the error path in intel_init_ring_buffer(), which already
>    called intel_destroy_ringbuffer_obj(), but failed to free the
>    actual ringbuffer structure. Calling intel_ringbuffer_free()
>    instead does both in one go.
> 
> 3. With the above change, intel_destroy_ringbuffer_obj() is only
>    called in one place (intel_ringbuffer_free()), so flatten it
>    into that function.
> 
> 4. move low-level register accesses from intel_cleanup_ring_buffer()
>    (which calls intel_stop_ring_buffer(ring) which calls stop_ring())
>    down into stop_ring() itself), which is already doing low-level
>    register accesses. Then, intel_cleanup_ring_buffer() no longer
>    needs 'dev_priv'.
> 
> v3:
>    Make intel_unpin_ringbuffer_obj() return early if ringbuffer is
>    not mapped, simplifying matters for the caller. [Chris Wilson,
>    Daniel Vetter]
>    Don't bother to clear a pointer in an object about to be freed.
>    [Chris Wilson]
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 49 ++++++++++++++++-----------------
>  1 file changed, 23 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 6f5b511..db02893 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -551,6 +551,8 @@ static bool stop_ring(struct intel_engine_cs *ring)
>  		I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
>  	}
>  
> +	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);

This warn was guarding the release of the backing pages, to verify that
we had stopped the rings first.

> +
>  	return (I915_READ_HEAD(ring) & HEAD_ADDR) == 0;
>  }
>  
> @@ -2055,6 +2057,9 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
>  
>  void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
>  {
> +	if (!ringbuf->virtual_start)
> +		return;
> +
>  	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
>  		vunmap(ringbuf->virtual_start);
>  	else
> @@ -2132,12 +2137,6 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>  	return 0;
>  }
>  
> -static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> -{
> -	drm_gem_object_unreference(&ringbuf->obj->base);
> -	ringbuf->obj = NULL;
> -}
> -
>  static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
>  				      struct intel_ringbuffer *ringbuf)
>  {
> @@ -2200,11 +2199,13 @@ struct intel_ringbuffer *
>  }
>  
>  void
> -intel_ringbuffer_free(struct intel_ringbuffer *ring)
> +intel_ringbuffer_free(struct intel_ringbuffer *ringbuf)
>  {
> -	intel_destroy_ringbuffer_obj(ring);
> -	list_del(&ring->link);
> -	kfree(ring);
> +	if (ringbuf->obj)
> +		drm_gem_object_unreference(&ringbuf->obj->base);

No, let's not add duplicate code. If you worry,
replace to_intel_bo() with

static inline struct drm_i915_gem_object *
to_intel_bo(struct drm_gem_object *gem_obj)
{
	BUILD_BUG_ON(offsetof(struct drm_i915_gem_object, base));
	return container_of(gem_obj, struct drm_i915_gem_object, base);
}

and we can begin erradicating the half-baked checks (some places assume,
others do no-op conversions). Hindsight.

> +
> +	list_del(&ringbuf->link);
> +	kfree(ringbuf);
>  }
>  
>  static int intel_init_ring_buffer(struct drm_device *dev,
> @@ -2232,6 +2233,13 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	}
>  	ring->buffer = ringbuf;
>  
> +	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
> +	if (ret) {
> +		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
> +				ring->name, ret);
> +		goto error;
> +	}
> +
>  	if (I915_NEED_GFX_HWS(dev)) {
>  		ret = init_status_page(ring);
>  		if (ret)
> @@ -2243,14 +2251,6 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  			goto error;
>  	}
>  
> -	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
> -	if (ret) {
> -		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
> -				ring->name, ret);
> -		intel_destroy_ringbuffer_obj(ringbuf);
> -		goto error;
> -	}
> -
>  	ret = i915_cmd_parser_init_ring(ring);
>  	if (ret)
>  		goto error;
> @@ -2264,19 +2264,16 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  
>  void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
>  {
> -	struct drm_i915_private *dev_priv;
> +	struct intel_ringbuffer *ringbuf;
>  
>  	if (!intel_ring_initialized(ring))
>  		return;
>  
> -	dev_priv = to_i915(ring->dev);
> -
> -	if (ring->buffer) {
> +	ringbuf = ring->buffer;
> +	if (ringbuf) {
>  		intel_stop_ring_buffer(ring);
> -		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
> -
> -		intel_unpin_ringbuffer_obj(ring->buffer);
> -		intel_ringbuffer_free(ring->buffer);
> +		intel_unpin_ringbuffer_obj(ringbuf);
> +		intel_ringbuffer_free(ringbuf);
>  		ring->buffer = NULL;

Conversions to a horrible name for no reason. ringbuf was a mistake,
please no more
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC)
  2016-01-29 19:19 ` [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC) Dave Gordon
@ 2016-01-30 10:56   ` Chris Wilson
  2016-01-30 11:28     ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 10:56 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jan 29, 2016 at 07:19:27PM +0000, Dave Gordon wrote:
> 1. add call to i915_gem_context_fini() to deallocate the default
>    context(s) if the call to init_rings() fails, so that we don't
>    leak the context in that situation.
> 
> 2. remove useless code in intel_logical_ring_cleanup(), presumably
>    copypasted from legacy ringbuffer version at creation.
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c  |  5 ++++-
>  drivers/gpu/drm/i915/intel_lrc.c | 10 ++--------
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a928823..5a4d468 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4986,8 +4986,11 @@ int i915_gem_init(struct drm_device *dev)
>  		goto out_unlock;
>  
>  	ret = dev_priv->gt.init_rings(dev);
> -	if (ret)
> +	if (ret) {
> +		i915_gem_context_fini(dev);
> +		/* XXX: anything else to be undone here? */

Yes. Make this a separate patch and begin the onion unwind.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 3/6] drm/i915: unmap the correct page in intel_logical_ring_cleanup()
  2016-01-29 19:19 ` [PATCH v4 3/6] drm/i915: unmap the correct page in intel_logical_ring_cleanup() Dave Gordon
@ 2016-01-30 11:01   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 11:01 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jan 29, 2016 at 07:19:28PM +0000, Dave Gordon wrote:
> The kunmap() call here didn't match the corresponding kmap().
> The kmap()ing was changed with the introduction of the GuC-compatible
> layout of context objects and the introduction of "LRC_PPHWSP_PN", in
> 
>     d167519 drm/i915: Integrate GuC-based command submission
> 
> but the corresponding kunmap() wasn't, probably because the old code
> dug into the underlying sgl implementation instead of than calling
> "i915_gem_object_get_page(ring->status_page.obj, 0)", which might more
> easily have been noticed as containing an assumption about page 0.
> 
> v3:
>     Use kmap_to_page() rather than repeat the mapping calculation.
>     [Chris Wilson]
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

I think we are going end up with a kummap_addr() feature request, we
have a handful of sites where we reconstruct the orginal page.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 4/6] drm/i915: consolidate LRC mode HWSP setup & teardown
  2016-01-29 19:19 ` [PATCH v4 4/6] drm/i915: consolidate LRC mode HWSP setup & teardown Dave Gordon
@ 2016-01-30 11:11   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 11:11 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jan 29, 2016 at 07:19:29PM +0000, Dave Gordon wrote:
> In legacy ringbuffer mode, the HWSP is a separate GEM object with its
> own pinning and reference counts. In LRC mode, however, it's not;
> instead its part of the default context object. The LRC-mode setup &
> teardown code therefore needs to handle this specially; the presence
> of the two bugs fixed in this patchset suggests that this code is not
> well-understood or maintained at present.
> 
> So, this patch:
>     moves the (newly-fixed!) LRC-mode HWSP teardown code to its own
>         (trivial) function lrc_teardown_hardware_status_page(), and
>     changes the call signature of lrc_setup_hardware_status_page()
>         to match
> so that all knowledge of this special arrangement is local to these
> two functions.

On the other hand, you now hide that information which makes the
relationship between engine init/fini and the default context even more
opaque.

(There is still zero reason why we use the first page of an unused context
object as the HWS page.)
 
> It will also help with efforts in progress to eliminate special
> handling of the default (kernel) context elsewhere in LRC code :)

Non sequitur.
 
> v3: Rebased
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>

The patch is an improvement, so
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

>  drivers/gpu/drm/i915/intel_lrc.c | 41 +++++++++++++++++++++++-----------------
>  1 file changed, 24 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0fa2497..ff38e57 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -227,9 +227,8 @@ enum {
>  
>  static int intel_lr_context_pin(struct intel_context *ctx,
>  				struct intel_engine_cs *engine);
> -static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
> -		struct drm_i915_gem_object *default_ctx_obj);
> -
> +static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring);
> +static void lrc_teardown_hardware_status_page(struct intel_engine_cs *ring);
>  
>  /**
>   * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
> @@ -1555,8 +1554,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	u8 next_context_status_buffer_hw;
>  
> -	lrc_setup_hardware_status_page(ring,
> -				dev_priv->kernel_context->engine[ring->id].state);
> +	lrc_setup_hardware_status_page(ring);
>  
>  	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
>  	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
> @@ -2005,10 +2003,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  	i915_cmd_parser_fini_ring(ring);
>  	i915_gem_batch_pool_fini(&ring->batch_pool);
>  
> -	if (ring->status_page.obj) {
> -		kunmap(kmap_to_page(ring->status_page.page_addr));
> -		ring->status_page.obj = NULL;
> -	}
> +	lrc_teardown_hardware_status_page(ring);
>  
>  	ring->disable_lite_restore_wa = false;
>  	ring->ctx_desc_template = 0;
> @@ -2500,24 +2495,36 @@ uint32_t intel_lr_context_size(struct intel_engine_cs *ring)
>  	return ret;
>  }
>  
> -static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
> -		struct drm_i915_gem_object *default_ctx_obj)
> +static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring)
>  {
> -	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> +	struct intel_context *dctx = dev_priv->kernel_context;
> +	struct drm_i915_gem_object *dctx_obj = dctx->engine[ring->id].state;
> +	u64 dctx_addr = i915_gem_obj_ggtt_offset(dctx_obj);

This is known via the ctx->engine[id].lrc_vma. This is actually quite
important as it stresses that the caller must have acquire the context-pin
for us.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 5/6] drm/i915: HWSP should be unmapped earlier in LRC teardown sequence
  2016-01-29 19:19 ` [PATCH v4 5/6] drm/i915: HWSP should be unmapped earlier in LRC teardown sequence Dave Gordon
@ 2016-01-30 11:13   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 11:13 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jan 29, 2016 at 07:19:30PM +0000, Dave Gordon wrote:
> In LRC mode, the HWSP is part of the default context object, and
> therefore does not exist independently. Worse, it doesn't contribute
> to the refcount on the default context object either.
> 
> Currently, the default context is deallocated in intel_lr_context_free(),
> but the HWSP kmapping is not torn down until the subsequent call to
> intel_logical_ring_cleanup(). Between these calls, ring->status_page.obj
> continues to point to the (now non-existent) default context object,
> and the kernel mapping likewise points to a page which is now free.
> 
> The solution is to dispose of the HWSP kmapping and pointer before the
> object itself is freed, so this patch moves the call to the teardown
> code from intel_lr_context_free() to intel_logical_ring_cleanup().
> 
> This code was introduced in
> 
>     48d8238 drm/i915/bdw: Generic logical ring init and cleanup
> 
> i.e. it has been there ever since LRC mode was first implemented.
> 
> v3:
>     Rebased.
> 
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ff38e57..947ef15 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2003,7 +2003,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  	i915_cmd_parser_fini_ring(ring);
>  	i915_gem_batch_pool_fini(&ring->batch_pool);
>  
> -	lrc_teardown_hardware_status_page(ring);
> +	/* Status page should have gone already */
> +	WARN_ON(ring->status_page.page_addr);
> +	WARN_ON(ring->status_page.obj);
>  
>  	ring->disable_lite_restore_wa = false;
>  	ring->ctx_desc_template = 0;
> @@ -2447,6 +2449,11 @@ void intel_lr_context_free(struct intel_context *ctx)
>  			continue;
>  
>  		if (ctx == ctx->i915->kernel_context) {
> +			/*
> +			 * The HWSP is part of the kernel context
> +			 * object in LRC mode, so tear it down now.
> +			 */
> +			lrc_teardown_hardware_status_page(ringbuf->ring);

No. This is tieing the engine specific detail into the already present
layering violation in the context that I insist be removed.

All you have to do is to correctly pin/unpin the default context during
engine setup/teardown to fix the code and elimiate all the silly checks
against the kernel_context that execlists does.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 6/6] drm/i915: fix context/engine cleanup order
  2016-01-29 19:19 ` [PATCH v4 6/6] drm/i915: fix context/engine cleanup order Dave Gordon
@ 2016-01-30 11:17   ` Chris Wilson
  2016-02-11 13:36     ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 11:17 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, Daniel Vetter

On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
> From: Nick Hoath <nicholas.hoath@intel.com>
> 
> Swap the order of context & engine cleanup, so that contexts are cleaned
> up first, and *then* engines. This is a more sensible order anyway, but
> in particular has become necessary since the 'intel_ring_initialized()
> must be simple and inline' patch, which now uses ring->dev as an
> 'initialised' flag, so it can now be NULL after engine teardown. This in
> turn can cause a problem in the context code, which (used to) check the
> ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
> 
> Also rename the cleanup function to reflect what it actually does
> (cleanup engines, not a ringbuffer), and fix an annoying whitespace
> issue.
> 
> v2: Also make the fix in i915_load_modeset_init, not just in
>     i915_driver_unload (Chris Wilson)
> v3: Had extra stuff in it.
> v4: Reverted extra stuff (so we're back to v2).
>     Rebased and updated commentary above (Dave Gordon).
> 
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> Signed-off-by: David Gordon <david.s.gordon@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Have to drop that, with the recent context changes.

You have to move the gpu-reset now for execlists.

Basically pull it out into:

static void i915_unload_gem(struct drm_device *dev)
{
       /*
        * Neither the BIOS, ourselves or any other kernel
        * expects the system to be in execlists mode on startup,
        * so we need to reset the GPU back to legacy mode. And the only
        * known way to disable logical contexts is through a GPU reset.
        *
        * So in order to leave the system in a known default configration,
        * always reset the GPU upon unload. This also cleans up the GEM
        * state tracking, flushing off the requests and leaving the system
        * idle.
        *
        * Note that is of the upmost importance that the GPU is idle and
        * all stray writes are flushed *before* we dismantle the backing
        * storage for the pinned objects.
        */
       i915_reset(dev);

       mutex_lock(&dev->struct_mutex);
       i915_gem_context_fini(dev);
       i915_gem_cleanup_ringbuffer(dev);
       mutex_unlock(&dev->struct_mutex);
}

and then kill the intel_gpu_reset along both the cleanup pathsh
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC)
  2016-01-30 10:56   ` Chris Wilson
@ 2016-01-30 11:28     ` Chris Wilson
  2016-02-01  9:45       ` Dave Gordon
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2016-01-30 11:28 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx

On Sat, Jan 30, 2016 at 10:56:27AM +0000, Chris Wilson wrote:
> On Fri, Jan 29, 2016 at 07:19:27PM +0000, Dave Gordon wrote:
> > 1. add call to i915_gem_context_fini() to deallocate the default
> >    context(s) if the call to init_rings() fails, so that we don't
> >    leak the context in that situation.
> > 
> > 2. remove useless code in intel_logical_ring_cleanup(), presumably
> >    copypasted from legacy ringbuffer version at creation.
> > 
> > Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c  |  5 ++++-
> >  drivers/gpu/drm/i915/intel_lrc.c | 10 ++--------
> >  2 files changed, 6 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index a928823..5a4d468 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4986,8 +4986,11 @@ int i915_gem_init(struct drm_device *dev)
> >  		goto out_unlock;
> >  
> >  	ret = dev_priv->gt.init_rings(dev);
> > -	if (ret)
> > +	if (ret) {
> > +		i915_gem_context_fini(dev);
> > +		/* XXX: anything else to be undone here? */
> 
> Yes. Make this a separate patch and begin the onion unwind.

Hmm. Actually, we have to make sure that we can still modeset if this
function fails - that is anything but utter catastrophe should just
result in loss of functionality (no stolen, no GEM execution etc) but we
can still drive the displays so the user can see how bad the damage is.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.BAT: success for A collection of cleanups, version 4
  2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
                   ` (5 preceding siblings ...)
  2016-01-29 19:19 ` [PATCH v4 6/6] drm/i915: fix context/engine cleanup order Dave Gordon
@ 2016-02-01  8:59 ` Patchwork
  6 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2016-02-01  8:59 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

== Summary ==

Series 2932v1 A collection of cleanups, version 4
http://patchwork.freedesktop.org/api/1.0/series/2932/revisions/1/mbox/

Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                incomplete -> PASS       (hsw-gt2)
        Subgroup suspend-read-crc-pipe-c:
                dmesg-warn -> PASS       (bsw-nuc-2)

bdw-nuci7        total:156  pass:147  dwarn:0   dfail:0   fail:0   skip:9  
bdw-ultra        total:159  pass:147  dwarn:0   dfail:0   fail:0   skip:12 
bsw-nuc-2        total:159  pass:129  dwarn:0   dfail:0   fail:0   skip:30 
byt-nuc          total:159  pass:136  dwarn:0   dfail:0   fail:0   skip:23 
hsw-brixbox      total:159  pass:146  dwarn:0   dfail:0   fail:0   skip:13 
hsw-gt2          total:159  pass:149  dwarn:0   dfail:0   fail:0   skip:10 
ilk-hp8440p      total:159  pass:111  dwarn:0   dfail:0   fail:0   skip:48 
ivb-t430s        total:159  pass:145  dwarn:0   dfail:0   fail:0   skip:14 
skl-i5k-2        total:159  pass:144  dwarn:1   dfail:0   fail:0   skip:14 
snb-dellxps      total:159  pass:137  dwarn:0   dfail:0   fail:0   skip:22 

Results at /archive/results/CI_IGT_test/Patchwork_1331/

6b1049b84dcd979f631d15b2ada325d8e5b2c4e1 drm-intel-nightly: 2016y-01m-29d-22h-50m-57s UTC integration manifest
6bf638cf4dc6373f0bab5dbce5be38f2e9c943da drm/i915: fix context/engine cleanup order
4c6115f5160b43fbb5fe38b12d8178b13419d7b9 drm/i915: HWSP should be unmapped earlier in LRC teardown sequence
ad013b3d9dfe416e4ad32df69633d0e975590a29 drm/i915: consolidate LRC mode HWSP setup & teardown
09810b92b61347175e98f41c8bc0225e74d5f6aa drm/i915: unmap the correct page in intel_logical_ring_cleanup()
646526b035394ea1e853e4e71a783825402674a1 drm/i915: tidy up initialisation failure paths (GEM & LRC)
8f1130534d6a7f9e7bfe899f69735612bc4038a9 drm/i915: tidy up initialisation failure paths (legacy)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy)
  2016-01-30 10:50   ` Chris Wilson
@ 2016-02-01  9:38     ` Dave Gordon
  2016-02-11 13:35       ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-02-01  9:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 30/01/16 10:50, Chris Wilson wrote:
> On Fri, Jan 29, 2016 at 07:19:26PM +0000, Dave Gordon wrote:
>> 1. Fix intel_cleanup_ring_buffer() to handle the error cleanup
>>     case where the ringbuffer has been allocated but map-and-pin
>>     failed. Unpin it iff it's previously been mapped-and-pinned.
>>
>> 2. Fix the error path in intel_init_ring_buffer(), which already
>>     called intel_destroy_ringbuffer_obj(), but failed to free the
>>     actual ringbuffer structure. Calling intel_ringbuffer_free()
>>     instead does both in one go.
>>
>> 3. With the above change, intel_destroy_ringbuffer_obj() is only
>>     called in one place (intel_ringbuffer_free()), so flatten it
>>     into that function.
>>
>> 4. move low-level register accesses from intel_cleanup_ring_buffer()
>>     (which calls intel_stop_ring_buffer(ring) which calls stop_ring())
>>     down into stop_ring() itself), which is already doing low-level
>>     register accesses. Then, intel_cleanup_ring_buffer() no longer
>>     needs 'dev_priv'.
>>
>> v3:
>>     Make intel_unpin_ringbuffer_obj() return early if ringbuffer is
>>     not mapped, simplifying matters for the caller. [Chris Wilson,
>>     Daniel Vetter]
>>     Don't bother to clear a pointer in an object about to be freed.
>>     [Chris Wilson]
>>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 49 ++++++++++++++++-----------------
>>   1 file changed, 23 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 6f5b511..db02893 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -551,6 +551,8 @@ static bool stop_ring(struct intel_engine_cs *ring)
>>   		I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
>>   	}
>>
>> +	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
>
> This warn was guarding the release of the backing pages, to verify that
> we had stopped the rings first.

It's still effectively at the same point in the execution sequence, just 
not intruding into a function that has no business peeking at registers. 
It's now at the end of the called subfunctions, rather than just after 
the call at the top-level, where it was a layering violation.

Or we could put it in a separate function, and say
	WARN_ON(!ring_is_idle(ring));

>> +
>>   	return (I915_READ_HEAD(ring) & HEAD_ADDR) == 0;
>>   }
>>
>> @@ -2055,6 +2057,9 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
>>
>>   void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
>>   {
>> +	if (!ringbuf->virtual_start)
>> +		return;
>> +
>>   	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
>>   		vunmap(ringbuf->virtual_start);
>>   	else
>> @@ -2132,12 +2137,6 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>>   	return 0;
>>   }
>>
>> -static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
>> -{
>> -	drm_gem_object_unreference(&ringbuf->obj->base);
>> -	ringbuf->obj = NULL;
>> -}
>> -
>>   static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
>>   				      struct intel_ringbuffer *ringbuf)
>>   {
>> @@ -2200,11 +2199,13 @@ struct intel_ringbuffer *
>>   }
>>
>>   void
>> -intel_ringbuffer_free(struct intel_ringbuffer *ring)
>> +intel_ringbuffer_free(struct intel_ringbuffer *ringbuf)
>>   {
>> -	intel_destroy_ringbuffer_obj(ring);
>> -	list_del(&ring->link);
>> -	kfree(ring);
>> +	if (ringbuf->obj)
>> +		drm_gem_object_unreference(&ringbuf->obj->base);
>
> No, let's not add duplicate code. If you worry,
> replace to_intel_bo() with

I suspect the above doesn't actually duplicate the test, 'cos 
drm_gem_object_unreference() is inline, and common subexpression 
elimination should determine that ringbuf->obj and &ringbuf->obj->base 
are the same thing and then the compiler can collapse the explicit test 
here with the one inside drm_gem_object_unreference().

> static inline struct drm_i915_gem_object *
> to_intel_bo(struct drm_gem_object *gem_obj)
> {
> 	BUILD_BUG_ON(offsetof(struct drm_i915_gem_object, base));
> 	return container_of(gem_obj, struct drm_i915_gem_object, base);
> }

I would quite like to get rid of the "&(...)->base" that intrudes 
everywhere - we should be able to just (un)reference the object we're 
working with, without caring that it's a subclass of some other type. So 
we'd want the opposite function as well, for passing i915 objects to drm 
functions (as here), maybe call it to_drm_obj()?

     static inline struct drm_gem_object *
     to_drm_obj(struct drm_i915_gem_object *i915_obj)
     {
	BUILD_BUG_ON(offsetof(struct drm_i915_gem_object, base));
	return &i915_obj->base;
     }

> and we can begin erradicating the half-baked checks (some places assume,
> others do no-op conversions). Hindsight.

Or we could make drm_gem_object_unreference() accept a (struct 
drm_i915_gem_object *) as an alternative to a (struct drm_gem_object *)?

>> +
>> +	list_del(&ringbuf->link);
>> +	kfree(ringbuf);
>>   }
>>
>>   static int intel_init_ring_buffer(struct drm_device *dev,
>> @@ -2232,6 +2233,13 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>   	}
>>   	ring->buffer = ringbuf;
>>
>> +	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
>> +	if (ret) {
>> +		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
>> +				ring->name, ret);
>> +		goto error;
>> +	}
>> +
>>   	if (I915_NEED_GFX_HWS(dev)) {
>>   		ret = init_status_page(ring);
>>   		if (ret)
>> @@ -2243,14 +2251,6 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>   			goto error;
>>   	}
>>
>> -	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
>> -	if (ret) {
>> -		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
>> -				ring->name, ret);
>> -		intel_destroy_ringbuffer_obj(ringbuf);
>> -		goto error;
>> -	}
>> -
>>   	ret = i915_cmd_parser_init_ring(ring);
>>   	if (ret)
>>   		goto error;
>> @@ -2264,19 +2264,16 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>
>>   void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
>>   {
>> -	struct drm_i915_private *dev_priv;
>> +	struct intel_ringbuffer *ringbuf;
>>
>>   	if (!intel_ring_initialized(ring))
>>   		return;
>>
>> -	dev_priv = to_i915(ring->dev);
>> -
>> -	if (ring->buffer) {
>> +	ringbuf = ring->buffer;
>> +	if (ringbuf) {
>>   		intel_stop_ring_buffer(ring);
>> -		WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
>> -
>> -		intel_unpin_ringbuffer_obj(ring->buffer);
>> -		intel_ringbuffer_free(ring->buffer);
>> +		intel_unpin_ringbuffer_obj(ringbuf);
>> +		intel_ringbuffer_free(ringbuf);
>>   		ring->buffer = NULL;
>
> Conversions to a horrible name for no reason. ringbuf was a mistake,
> please no more
> -Chris

'ringbuf' is a perfectly good name for a (struct intel_ringbuffer *). 
What we *can't* call it is 'ring', because that name is used here and 
everywhere else for a (struct intel_engine_cs *). Would you prefer 'rbp' 
(*r*ing*b*uffer*p*ointer) ? Other suggestions considered, but NOT 'ring'

Or are you simply objecting to having a local variable at all? It's 
generally a good idea to take a local copy of any structure member 
that's going to be referenced more than once, because it lets the 
compiler know that this function owns the value and it doesn't have to 
be fetched from memory more than once, even after some function has been 
called that could in theory have modified the structure member.

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC)
  2016-01-30 11:28     ` Chris Wilson
@ 2016-02-01  9:45       ` Dave Gordon
  2016-02-11  8:47         ` Daniel Vetter
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2016-02-01  9:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 30/01/16 11:28, Chris Wilson wrote:
> On Sat, Jan 30, 2016 at 10:56:27AM +0000, Chris Wilson wrote:
>> On Fri, Jan 29, 2016 at 07:19:27PM +0000, Dave Gordon wrote:
>>> 1. add call to i915_gem_context_fini() to deallocate the default
>>>     context(s) if the call to init_rings() fails, so that we don't
>>>     leak the context in that situation.
>>>
>>> 2. remove useless code in intel_logical_ring_cleanup(), presumably
>>>     copypasted from legacy ringbuffer version at creation.
>>>
>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c  |  5 ++++-
>>>   drivers/gpu/drm/i915/intel_lrc.c | 10 ++--------
>>>   2 files changed, 6 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index a928823..5a4d468 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -4986,8 +4986,11 @@ int i915_gem_init(struct drm_device *dev)
>>>   		goto out_unlock;
>>>
>>>   	ret = dev_priv->gt.init_rings(dev);
>>> -	if (ret)
>>> +	if (ret) {
>>> +		i915_gem_context_fini(dev);
>>> +		/* XXX: anything else to be undone here? */
>>
>> Yes. Make this a separate patch and begin the onion unwind.
>
> Hmm. Actually, we have to make sure that we can still modeset if this
> function fails - that is anything but utter catastrophe should just
> result in loss of functionality (no stolen, no GEM execution etc) but we
> can still drive the displays so the user can see how bad the damage is.
> -Chris

Yes, Mika said that's why (he thought) there wasn't a complete reversal 
of everything the driver has done up to this point.

The addition of the context_fini() seems reasonable, that's going to 
make it leak a bit less, while still leaving basic framebuffer working.

Could be a separate patch if you like, but hardly seems worth splitting 
from the other chunk, which after all only replaces unreachable code 
with a WARNing.

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC)
  2016-02-01  9:45       ` Dave Gordon
@ 2016-02-11  8:47         ` Daniel Vetter
  2016-02-15 11:55           ` Dave Gordon
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2016-02-11  8:47 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Feb 01, 2016 at 09:45:25AM +0000, Dave Gordon wrote:
> On 30/01/16 11:28, Chris Wilson wrote:
> >On Sat, Jan 30, 2016 at 10:56:27AM +0000, Chris Wilson wrote:
> >>On Fri, Jan 29, 2016 at 07:19:27PM +0000, Dave Gordon wrote:
> >>>1. add call to i915_gem_context_fini() to deallocate the default
> >>>    context(s) if the call to init_rings() fails, so that we don't
> >>>    leak the context in that situation.
> >>>
> >>>2. remove useless code in intel_logical_ring_cleanup(), presumably
> >>>    copypasted from legacy ringbuffer version at creation.
> >>>

If your commit message has a list of independent changes ...

> >>>Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c  |  5 ++++-
> >>>  drivers/gpu/drm/i915/intel_lrc.c | 10 ++--------
> >>>  2 files changed, 6 insertions(+), 9 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>index a928823..5a4d468 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -4986,8 +4986,11 @@ int i915_gem_init(struct drm_device *dev)
> >>>  		goto out_unlock;
> >>>
> >>>  	ret = dev_priv->gt.init_rings(dev);
> >>>-	if (ret)
> >>>+	if (ret) {
> >>>+		i915_gem_context_fini(dev);
> >>>+		/* XXX: anything else to be undone here? */
> >>
> >>Yes. Make this a separate patch and begin the onion unwind.
> >
> >Hmm. Actually, we have to make sure that we can still modeset if this
> >function fails - that is anything but utter catastrophe should just
> >result in loss of functionality (no stolen, no GEM execution etc) but we
> >can still drive the displays so the user can see how bad the damage is.
> >-Chris
> 
> Yes, Mika said that's why (he thought) there wasn't a complete reversal of
> everything the driver has done up to this point.
> 
> The addition of the context_fini() seems reasonable, that's going to make it
> leak a bit less, while still leaving basic framebuffer working.
> 
> Could be a separate patch if you like, but hardly seems worth splitting from
> the other chunk, which after all only replaces unreachable code with a
> WARNing.

... it should be split. As a rule of thumb at least. I don't really care
all that much here though, so jut for the future.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy)
  2016-02-01  9:38     ` Dave Gordon
@ 2016-02-11 13:35       ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2016-02-11 13:35 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Mon, Feb 01, 2016 at 09:38:02AM +0000, Dave Gordon wrote:
> On 30/01/16 10:50, Chris Wilson wrote:
> >On Fri, Jan 29, 2016 at 07:19:26PM +0000, Dave Gordon wrote:
> >>1. Fix intel_cleanup_ring_buffer() to handle the error cleanup
> >>    case where the ringbuffer has been allocated but map-and-pin
> >>    failed. Unpin it iff it's previously been mapped-and-pinned.
> >>
> >>2. Fix the error path in intel_init_ring_buffer(), which already
> >>    called intel_destroy_ringbuffer_obj(), but failed to free the
> >>    actual ringbuffer structure. Calling intel_ringbuffer_free()
> >>    instead does both in one go.
> >>
> >>3. With the above change, intel_destroy_ringbuffer_obj() is only
> >>    called in one place (intel_ringbuffer_free()), so flatten it
> >>    into that function.
> >>
> >>4. move low-level register accesses from intel_cleanup_ring_buffer()
> >>    (which calls intel_stop_ring_buffer(ring) which calls stop_ring())
> >>    down into stop_ring() itself), which is already doing low-level
> >>    register accesses. Then, intel_cleanup_ring_buffer() no longer
> >>    needs 'dev_priv'.
> >>
> >>v3:
> >>    Make intel_unpin_ringbuffer_obj() return early if ringbuffer is
> >>    not mapped, simplifying matters for the caller. [Chris Wilson,
> >>    Daniel Vetter]
> >>    Don't bother to clear a pointer in an object about to be freed.
> >>    [Chris Wilson]
> >>
> >>Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> >>Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >>---
> >>  drivers/gpu/drm/i915/intel_ringbuffer.c | 49 ++++++++++++++++-----------------
> >>  1 file changed, 23 insertions(+), 26 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>index 6f5b511..db02893 100644
> >>--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >>@@ -551,6 +551,8 @@ static bool stop_ring(struct intel_engine_cs *ring)
> >>  		I915_WRITE_MODE(ring, _MASKED_BIT_DISABLE(STOP_RING));
> >>  	}
> >>
> >>+	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
> >
> >This warn was guarding the release of the backing pages, to verify that
> >we had stopped the rings first.
> 
> It's still effectively at the same point in the execution sequence,
> just not intruding into a function that has no business peeking at
> registers. It's now at the end of the called subfunctions, rather
> than just after the call at the top-level, where it was a layering
> violation.

Nope, semantically it is an important difference and the assertion that
the hardware is not reading from the object as we return the pages back
to the system is very, very important documentation.
 
> Or we could put it in a separate function, and say
> 	WARN_ON(!ring_is_idle(ring));
> 
> >>+
> >>  	return (I915_READ_HEAD(ring) & HEAD_ADDR) == 0;
> >>  }
> >>
> >>@@ -2055,6 +2057,9 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
> >>
> >>  void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> >>  {
> >>+	if (!ringbuf->virtual_start)
> >>+		return;
> >>+
> >>  	if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
> >>  		vunmap(ringbuf->virtual_start);
> >>  	else
> >>@@ -2132,12 +2137,6 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> >>  	return 0;
> >>  }
> >>
> >>-static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> >>-{
> >>-	drm_gem_object_unreference(&ringbuf->obj->base);
> >>-	ringbuf->obj = NULL;
> >>-}
> >>-
> >>  static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
> >>  				      struct intel_ringbuffer *ringbuf)
> >>  {
> >>@@ -2200,11 +2199,13 @@ struct intel_ringbuffer *
> >>  }
> >>
> >>  void
> >>-intel_ringbuffer_free(struct intel_ringbuffer *ring)
> >>+intel_ringbuffer_free(struct intel_ringbuffer *ringbuf)
> >>  {
> >>-	intel_destroy_ringbuffer_obj(ring);
> >>-	list_del(&ring->link);
> >>-	kfree(ring);
> >>+	if (ringbuf->obj)
> >>+		drm_gem_object_unreference(&ringbuf->obj->base);
> >
> >No, let's not add duplicate code. If you worry,
> >replace to_intel_bo() with
> 
> I suspect the above doesn't actually duplicate the test, 'cos
> drm_gem_object_unreference() is inline, and common subexpression
> elimination should determine that ringbuf->obj and
> &ringbuf->obj->base are the same thing and then the compiler can
> collapse the explicit test here with the one inside
> drm_gem_object_unreference().

So why did you write it? Since it clearly duplicates the existing code,
the only reason I could think of was if you did not trust

&ringbuf->obj == &ringbuf->obj->base.
 
> >static inline struct drm_i915_gem_object *
> >to_intel_bo(struct drm_gem_object *gem_obj)
> >{
> >	BUILD_BUG_ON(offsetof(struct drm_i915_gem_object, base));
> >	return container_of(gem_obj, struct drm_i915_gem_object, base);
> >}
> 
> I would quite like to get rid of the "&(...)->base" that intrudes
> everywhere - we should be able to just (un)reference the object
> we're working with, without caring that it's a subclass of some
> other type. So we'd want the opposite function as well, for passing
> i915 objects to drm functions (as here), maybe call it to_drm_obj()?

See above.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 6/6] drm/i915: fix context/engine cleanup order
  2016-01-30 11:17   ` Chris Wilson
@ 2016-02-11 13:36     ` Chris Wilson
  2016-02-11 15:02       ` Mika Kuoppala
  2016-02-15 13:43       ` Dave Gordon
  0 siblings, 2 replies; 24+ messages in thread
From: Chris Wilson @ 2016-02-11 13:36 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx, Nick Hoath, Mika Kuoppala, Daniel Vetter

On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
> > From: Nick Hoath <nicholas.hoath@intel.com>
> > 
> > Swap the order of context & engine cleanup, so that contexts are cleaned
> > up first, and *then* engines. This is a more sensible order anyway, but
> > in particular has become necessary since the 'intel_ring_initialized()
> > must be simple and inline' patch, which now uses ring->dev as an
> > 'initialised' flag, so it can now be NULL after engine teardown. This in
> > turn can cause a problem in the context code, which (used to) check the
> > ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
> > 
> > Also rename the cleanup function to reflect what it actually does
> > (cleanup engines, not a ringbuffer), and fix an annoying whitespace
> > issue.
> > 
> > v2: Also make the fix in i915_load_modeset_init, not just in
> >     i915_driver_unload (Chris Wilson)
> > v3: Had extra stuff in it.
> > v4: Reverted extra stuff (so we're back to v2).
> >     Rebased and updated commentary above (Dave Gordon).
> > 
> > Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> > Signed-off-by: David Gordon <david.s.gordon@intel.com>
> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Have to drop that, with the recent context changes.
> 
> You have to move the gpu-reset now for execlists.
> 
> Basically pull it out into:
> 
> static void i915_unload_gem(struct drm_device *dev)
> {
>        /*
>         * Neither the BIOS, ourselves or any other kernel
>         * expects the system to be in execlists mode on startup,
>         * so we need to reset the GPU back to legacy mode. And the only
>         * known way to disable logical contexts is through a GPU reset.
>         *
>         * So in order to leave the system in a known default configration,
>         * always reset the GPU upon unload. This also cleans up the GEM
>         * state tracking, flushing off the requests and leaving the system
>         * idle.
>         *
>         * Note that is of the upmost importance that the GPU is idle and
>         * all stray writes are flushed *before* we dismantle the backing
>         * storage for the pinned objects.
>         */
>        i915_reset(dev);
> 
>        mutex_lock(&dev->struct_mutex);
>        i915_gem_context_fini(dev);
>        i915_gem_cleanup_ringbuffer(dev);
>        mutex_unlock(&dev->struct_mutex);
> }
> 
> and then kill the intel_gpu_reset along both the cleanup pathsh

It appears this patch was applied without dropping my r-b for the issue
I pointed out above.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 6/6] drm/i915: fix context/engine cleanup order
  2016-02-11 13:36     ` Chris Wilson
@ 2016-02-11 15:02       ` Mika Kuoppala
  2016-02-15 12:00         ` Dave Gordon
  2016-02-15 13:43       ` Dave Gordon
  1 sibling, 1 reply; 24+ messages in thread
From: Mika Kuoppala @ 2016-02-11 15:02 UTC (permalink / raw)
  To: Chris Wilson, Dave Gordon, intel-gfx, Nick Hoath, Daniel Vetter

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>> > From: Nick Hoath <nicholas.hoath@intel.com>
>> > 
>> > Swap the order of context & engine cleanup, so that contexts are cleaned
>> > up first, and *then* engines. This is a more sensible order anyway, but
>> > in particular has become necessary since the 'intel_ring_initialized()
>> > must be simple and inline' patch, which now uses ring->dev as an
>> > 'initialised' flag, so it can now be NULL after engine teardown. This in
>> > turn can cause a problem in the context code, which (used to) check the
>> > ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
>> > 
>> > Also rename the cleanup function to reflect what it actually does
>> > (cleanup engines, not a ringbuffer), and fix an annoying whitespace
>> > issue.
>> > 
>> > v2: Also make the fix in i915_load_modeset_init, not just in
>> >     i915_driver_unload (Chris Wilson)
>> > v3: Had extra stuff in it.
>> > v4: Reverted extra stuff (so we're back to v2).
>> >     Rebased and updated commentary above (Dave Gordon).
>> > 
>> > Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
>> > Signed-off-by: David Gordon <david.s.gordon@intel.com>
>> > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>> 
>> Have to drop that, with the recent context changes.
>> 
>> You have to move the gpu-reset now for execlists.
>> 
>> Basically pull it out into:
>> 
>> static void i915_unload_gem(struct drm_device *dev)
>> {
>>        /*
>>         * Neither the BIOS, ourselves or any other kernel
>>         * expects the system to be in execlists mode on startup,
>>         * so we need to reset the GPU back to legacy mode. And the only
>>         * known way to disable logical contexts is through a GPU reset.
>>         *
>>         * So in order to leave the system in a known default configration,
>>         * always reset the GPU upon unload. This also cleans up the GEM
>>         * state tracking, flushing off the requests and leaving the system
>>         * idle.
>>         *
>>         * Note that is of the upmost importance that the GPU is idle and
>>         * all stray writes are flushed *before* we dismantle the backing
>>         * storage for the pinned objects.
>>         */
>>        i915_reset(dev);
>> 
>>        mutex_lock(&dev->struct_mutex);
>>        i915_gem_context_fini(dev);
>>        i915_gem_cleanup_ringbuffer(dev);
>>        mutex_unlock(&dev->struct_mutex);
>> }
>> 
>> and then kill the intel_gpu_reset along both the cleanup pathsh
>
> It appears this patch was applied without dropping my r-b for the issue
> I pointed out above.

Now causes a splat in intel_logical_ring_cleanup when unloading module.

Best to revert and rework on top of Dave's cleanup set?
-Mika



[   58.170369] BUG: unable to handle kernel NULL pointer dereference at
(null)
[   58.170374] IP: [<ffffffffa00e04d3>]
intel_logical_ring_cleanup+0x83/0x100 [i915]
[   58.170389] PGD 0 
[   58.170391] Oops: 0000 [#1] PREEMPT SMP 
[   58.170394] Modules linked in: x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915(-)
mei_me mei i2c_hid e1000e ptp pps_core
[   58.170404] CPU: 0 PID: 5766 Comm: rmmod Not tainted 4.5.0-rc3+ #43
[   58.170407] Hardware name: System manufacturer System Product
Name/Z170M-PLUS, BIOS 0505 11/16/2015
[   58.170410] task: ffff880036aab380 ti: ffff8802374c0000 task.ti:
ffff8802374c0000
[   58.170412] RIP: 0010:[<ffffffffa00e04d3>]  [<ffffffffa00e04d3>]
intel_logical_ring_cleanup+0x83/0x100 [i915]
[   58.170424] RSP: 0018:ffff8802374c3d30  EFLAGS: 00010282
[   58.170425] RAX: 0000000000000000 RBX: ffff880237203788 RCX:
0000000000000001
[   58.170428] RDX: 0000000087654321 RSI: 000000000000000d RDI:
ffff8802372037c0
[   58.170430] RBP: ffff8802374c3d40 R08: 0000000000000000 R09:
0000000ad856c000
[   58.170432] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff880237200000
[   58.170434] R13: ffff880237209638 R14: ffff880237323000 R15:
0000558a770bd090
[   58.170438] FS:  00007f8b439ff700(0000) GS:ffff880239800000(0000)
knlGS:0000000000000000
[   58.170442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.170444] CR2: 0000000000000000 CR3: 000000023532d000 CR4:
00000000003406f0
[   58.170447] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   58.170450] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7:
0000000000000400
[   58.170453] Stack:
[   58.170455]  ffff880237203788 ffff880237200000 ffff8802374c3d70
ffffffffa00d0ed4
[   58.170460]  ffff880237200000 ffff880237323000 ffff880237323060
ffffffffa0194360
[   58.170465]  ffff8802374c3d98 ffffffffa0154520 ffff880237323000
ffff880237323000
[   58.170469] Call Trace:
[   58.170479]  [<ffffffffa00d0ed4>] i915_gem_cleanup_engines+0x34/0x60
[i915]
[   58.170493]  [<ffffffffa0154520>] i915_driver_unload+0x140/0x220
[i915]
[   58.170497]  [<ffffffff8154a4f4>] drm_dev_unregister+0x24/0xa0
[   58.170501]  [<ffffffff8154aace>] drm_put_dev+0x1e/0x60
[   58.170506]  [<ffffffffa00912a0>] i915_pci_remove+0x10/0x20 [i915]
[   58.170510]  [<ffffffff814766e4>] pci_device_remove+0x34/0xb0
[   58.170514]  [<ffffffff8156e7d5>] __device_release_driver+0x95/0x140
[   58.170518]  [<ffffffff8156e97c>] driver_detach+0xbc/0xc0
[   58.170521]  [<ffffffff8156d883>] bus_remove_driver+0x53/0xd0
[   58.170525]  [<ffffffff8156f3a7>] driver_unregister+0x27/0x50
[   58.170528]  [<ffffffff81475725>] pci_unregister_driver+0x25/0x70
[   58.170531]  [<ffffffff8154c274>] drm_pci_exit+0x74/0x90
[   58.170543]  [<ffffffffa0154cb0>] i915_exit+0x20/0x1aa [i915]
[   58.170548]  [<ffffffff8111846f>] SyS_delete_module+0x18f/0x1f0


> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC)
  2016-02-11  8:47         ` Daniel Vetter
@ 2016-02-15 11:55           ` Dave Gordon
  0 siblings, 0 replies; 24+ messages in thread
From: Dave Gordon @ 2016-02-15 11:55 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 11/02/16 08:47, Daniel Vetter wrote:
> On Mon, Feb 01, 2016 at 09:45:25AM +0000, Dave Gordon wrote:
>> On 30/01/16 11:28, Chris Wilson wrote:
>>> On Sat, Jan 30, 2016 at 10:56:27AM +0000, Chris Wilson wrote:
>>>> On Fri, Jan 29, 2016 at 07:19:27PM +0000, Dave Gordon wrote:
>>>>> 1. add call to i915_gem_context_fini() to deallocate the default
>>>>>     context(s) if the call to init_rings() fails, so that we don't
>>>>>     leak the context in that situation.
>>>>>
>>>>> 2. remove useless code in intel_logical_ring_cleanup(), presumably
>>>>>     copypasted from legacy ringbuffer version at creation.
>>>>>
>
> If your commit message has a list of independent changes ...
>
>>>>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem.c  |  5 ++++-
>>>>>   drivers/gpu/drm/i915/intel_lrc.c | 10 ++--------
>>>>>   2 files changed, 6 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>> index a928823..5a4d468 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>> @@ -4986,8 +4986,11 @@ int i915_gem_init(struct drm_device *dev)
>>>>>   		goto out_unlock;
>>>>>
>>>>>   	ret = dev_priv->gt.init_rings(dev);
>>>>> -	if (ret)
>>>>> +	if (ret) {
>>>>> +		i915_gem_context_fini(dev);
>>>>> +		/* XXX: anything else to be undone here? */
>>>>
>>>> Yes. Make this a separate patch and begin the onion unwind.
>>>
>>> Hmm. Actually, we have to make sure that we can still modeset if this
>>> function fails - that is anything but utter catastrophe should just
>>> result in loss of functionality (no stolen, no GEM execution etc) but we
>>> can still drive the displays so the user can see how bad the damage is.
>>> -Chris
>>
>> Yes, Mika said that's why (he thought) there wasn't a complete reversal of
>> everything the driver has done up to this point.
>>
>> The addition of the context_fini() seems reasonable, that's going to make it
>> leak a bit less, while still leaving basic framebuffer working.
>>
>> Could be a separate patch if you like, but hardly seems worth splitting from
>> the other chunk, which after all only replaces unreachable code with a
>> WARNing.
>
> ... it should be split. As a rule of thumb at least. I don't really care
> all that much here though, so jut for the future.
> -Daniel

That was already done in the updated (v5) patchset posted 2016-02-05.
This (v4) sequence is therefore already obsolete.

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 6/6] drm/i915: fix context/engine cleanup order
  2016-02-11 15:02       ` Mika Kuoppala
@ 2016-02-15 12:00         ` Dave Gordon
  0 siblings, 0 replies; 24+ messages in thread
From: Dave Gordon @ 2016-02-15 12:00 UTC (permalink / raw)
  To: Mika Kuoppala, Chris Wilson, intel-gfx, Nick Hoath, Daniel Vetter

On 11/02/16 15:02, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
>> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>>>> From: Nick Hoath <nicholas.hoath@intel.com>
>>>>
>>>> Swap the order of context & engine cleanup, so that contexts are cleaned
>>>> up first, and *then* engines. This is a more sensible order anyway, but
>>>> in particular has become necessary since the 'intel_ring_initialized()
>>>> must be simple and inline' patch, which now uses ring->dev as an
>>>> 'initialised' flag, so it can now be NULL after engine teardown. This in
>>>> turn can cause a problem in the context code, which (used to) check the
>>>> ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
>>>>
>>>> Also rename the cleanup function to reflect what it actually does
>>>> (cleanup engines, not a ringbuffer), and fix an annoying whitespace
>>>> issue.
>>>>
>>>> v2: Also make the fix in i915_load_modeset_init, not just in
>>>>      i915_driver_unload (Chris Wilson)
>>>> v3: Had extra stuff in it.
>>>> v4: Reverted extra stuff (so we're back to v2).
>>>>      Rebased and updated commentary above (Dave Gordon).
>>>>
>>>> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
>>>> Signed-off-by: David Gordon <david.s.gordon@intel.com>
>>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> Have to drop that, with the recent context changes.
>>>
>>> You have to move the gpu-reset now for execlists.
>>>
>>> Basically pull it out into:
>>>
>>> static void i915_unload_gem(struct drm_device *dev)
>>> {
>>>         /*
>>>          * Neither the BIOS, ourselves or any other kernel
>>>          * expects the system to be in execlists mode on startup,
>>>          * so we need to reset the GPU back to legacy mode. And the only
>>>          * known way to disable logical contexts is through a GPU reset.
>>>          *
>>>          * So in order to leave the system in a known default configration,
>>>          * always reset the GPU upon unload. This also cleans up the GEM
>>>          * state tracking, flushing off the requests and leaving the system
>>>          * idle.
>>>          *
>>>          * Note that is of the upmost importance that the GPU is idle and
>>>          * all stray writes are flushed *before* we dismantle the backing
>>>          * storage for the pinned objects.
>>>          */
>>>         i915_reset(dev);
>>>
>>>         mutex_lock(&dev->struct_mutex);
>>>         i915_gem_context_fini(dev);
>>>         i915_gem_cleanup_ringbuffer(dev);
>>>         mutex_unlock(&dev->struct_mutex);
>>> }
>>>
>>> and then kill the intel_gpu_reset along both the cleanup pathsh
>>
>> It appears this patch was applied without dropping my r-b for the issue
>> I pointed out above.
>
> Now causes a splat in intel_logical_ring_cleanup when unloading module.
>
> Best to revert and rework on top of Dave's cleanup set?
> -Mika

This whole patchset was already superseded by a newer version before 
Daniel merged it. The newer version doesn't have Chris's R-B on this 
(context/engine) patch, only on two others that are still as they were
when he reviewed them.

Please go and look instead at [v5, 11 patches] posted 2016-02-05.

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 6/6] drm/i915: fix context/engine cleanup order
  2016-02-11 13:36     ` Chris Wilson
  2016-02-11 15:02       ` Mika Kuoppala
@ 2016-02-15 13:43       ` Dave Gordon
  1 sibling, 0 replies; 24+ messages in thread
From: Dave Gordon @ 2016-02-15 13:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Nick Hoath, Mika Kuoppala, Daniel Vetter

On 11/02/16 13:36, Chris Wilson wrote:
> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>>> From: Nick Hoath <nicholas.hoath@intel.com>
>>>
>>> Swap the order of context & engine cleanup, so that contexts are cleaned
>>> up first, and *then* engines. This is a more sensible order anyway, but
>>> in particular has become necessary since the 'intel_ring_initialized()
>>> must be simple and inline' patch, which now uses ring->dev as an
>>> 'initialised' flag, so it can now be NULL after engine teardown. This in
>>> turn can cause a problem in the context code, which (used to) check the
>>> ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.

[snip]

> It appears this patch was applied without dropping my r-b for the issue
> I pointed out above.
> -Chris

Not only that, but it was plucked from the end of the set of 6 patches 
without the earlier ones in the sequence, despite the cover letter [0/6] 
saying this:

     Patches reordered again; the bug fixes are now in patches 3 and 5.
     The former could stand alone; the latter depends on patch 4 and is
     a prerequisite for Nick's patch [6/6] to function.

So there's no chance that this one alone could be expected to work :(

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-02-15 13:43 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-29 19:19 [PATCH v4 0/6] A collection of cleanups, version 4 Dave Gordon
2016-01-29 19:19 ` [PATCH v4 1/6] drm/i915: tidy up initialisation failure paths (legacy) Dave Gordon
2016-01-30 10:50   ` Chris Wilson
2016-02-01  9:38     ` Dave Gordon
2016-02-11 13:35       ` Chris Wilson
2016-01-29 19:19 ` [PATCH v4 2/6] drm/i915: tidy up initialisation failure paths (GEM & LRC) Dave Gordon
2016-01-30 10:56   ` Chris Wilson
2016-01-30 11:28     ` Chris Wilson
2016-02-01  9:45       ` Dave Gordon
2016-02-11  8:47         ` Daniel Vetter
2016-02-15 11:55           ` Dave Gordon
2016-01-29 19:19 ` [PATCH v4 3/6] drm/i915: unmap the correct page in intel_logical_ring_cleanup() Dave Gordon
2016-01-30 11:01   ` Chris Wilson
2016-01-29 19:19 ` [PATCH v4 4/6] drm/i915: consolidate LRC mode HWSP setup & teardown Dave Gordon
2016-01-30 11:11   ` Chris Wilson
2016-01-29 19:19 ` [PATCH v4 5/6] drm/i915: HWSP should be unmapped earlier in LRC teardown sequence Dave Gordon
2016-01-30 11:13   ` Chris Wilson
2016-01-29 19:19 ` [PATCH v4 6/6] drm/i915: fix context/engine cleanup order Dave Gordon
2016-01-30 11:17   ` Chris Wilson
2016-02-11 13:36     ` Chris Wilson
2016-02-11 15:02       ` Mika Kuoppala
2016-02-15 12:00         ` Dave Gordon
2016-02-15 13:43       ` Dave Gordon
2016-02-01  8:59 ` ✓ Fi.CI.BAT: success for A collection of cleanups, version 4 Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.