All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] xf86-video-intel: enable hw-generated binding tables
@ 2014-04-22 15:16 Abdiel Janulgue
  2014-04-22 15:16 ` [RFC PATCH 1/2] xf86-video-intel: Add "ResourceStreamer" option Abdiel Janulgue
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-22 15:16 UTC (permalink / raw)
  To: intel-gfx

This patch series enables resource streamer for xf86-video-intel UXA.

Fixes i965 Mesa driver that makes use of resource streamer optimization
to survive a full piglit run without bricking the machine. Previously,
I get random hungs when doing a full piglit round when enabling RS. 
After months of head-scratching, I noticed I didn't quite pay attention
to this small detail in bspec:

 "The binding table generator feature has a simple all or nothing
  model. If HW generated binding tables are enabled, the driver must enable
  the pool and use 3D_HW_BINDING_TABLE_POINTER_* commands."

I realized that our xf86-video-intel driver is piping out 3D commands
as well that used the older manual-generated binding table format
that caused a conflict when Mesa is using the hw-generated binding table 
format.

I didn't touch the SNA and video acceleration paths yet. Please let me know
if I'm on the right track.

To use the feature, set in xorg.conf:

Option "ResourceStreamer" "true"

 src/intel_options.c         |    1 +
 src/intel_options.h         |    1 +
 src/uxa/i965_3d.c           |    5 ++-
 src/uxa/i965_reg.h          |    8 +++++
 src/uxa/i965_render.c       |   78 +++++++++++++++++++++++++++++++++++--------
 src/uxa/intel.h             |    3 ++
 src/uxa/intel_batchbuffer.c |    7 ++--
 src/uxa/intel_driver.c      |   10 +++++-
 8 files changed, 94 insertions(+), 19 deletions(-)

Abdiel Janulgue (2):

[PATCH 1/2] xf86-video-intel: Add "ResourceStreamer" option
[PATCH 2/2] xf86-video-intel: Enable hw-generated binding tables for

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/2] xf86-video-intel: Add "ResourceStreamer" option
  2014-04-22 15:16 [RFC] xf86-video-intel: enable hw-generated binding tables Abdiel Janulgue
@ 2014-04-22 15:16 ` Abdiel Janulgue
  2014-04-22 15:16 ` [RFC PATCH 2/2] xf86-video-intel: Enable hw-generated binding tables for UXA Abdiel Janulgue
  2014-04-22 15:23 ` [RFC] xf86-video-intel: enable hw-generated binding tables Chris Wilson
  2 siblings, 0 replies; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-22 15:16 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
---
 src/intel_options.c    |    1 +
 src/intel_options.h    |    1 +
 src/uxa/intel.h        |    3 +++
 src/uxa/intel_driver.c |   10 +++++++++-
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/intel_options.c b/src/intel_options.c
index 02a4ae1..cc418d1 100644
--- a/src/intel_options.c
+++ b/src/intel_options.c
@@ -35,6 +35,7 @@ const OptionInfoRec intel_options[] = {
 	{OPTION_DEBUG_FLUSH_CACHES, "DebugFlushCaches", OPTV_BOOLEAN, {0}, 0},
 	{OPTION_DEBUG_WAIT, "DebugWait", OPTV_BOOLEAN, {0}, 0},
 	{OPTION_BUFFER_CACHE,	"BufferCache",	OPTV_BOOLEAN,   {0},    1},
+	{OPTION_RESOURCESTREAMER, "ResourceStreamer",	OPTV_BOOLEAN,   {0},    0},
 #endif
 	{-1,			NULL,		OPTV_NONE,	{0},	0}
 };
diff --git a/src/intel_options.h b/src/intel_options.h
index 77f0c45..4867419 100644
--- a/src/intel_options.h
+++ b/src/intel_options.h
@@ -43,6 +43,7 @@ enum intel_options {
 	OPTION_DEBUG_FLUSH_CACHES,
 	OPTION_DEBUG_WAIT,
 	OPTION_BUFFER_CACHE,
+	OPTION_RESOURCESTREAMER,
 #endif
 	NUM_OPTIONS,
 };
diff --git a/src/uxa/intel.h b/src/uxa/intel.h
index 6ac770e..611eeea 100644
--- a/src/uxa/intel.h
+++ b/src/uxa/intel.h
@@ -349,6 +349,9 @@ typedef struct intel_screen_private {
 	 */
 	Bool fallback_debug;
 	unsigned debug_flush;
+
+	dri_bo *hw_bt_pool_bo;
+	Bool use_resource_streamer;
 #if HAVE_UDEV
 	struct udev_monitor *uevent_monitor;
 	pointer uevent_handler;
diff --git a/src/uxa/intel_driver.c b/src/uxa/intel_driver.c
index 0a4fe2a..5574ffe 100644
--- a/src/uxa/intel_driver.c
+++ b/src/uxa/intel_driver.c
@@ -196,6 +196,15 @@ static Bool I830GetEarlyOptions(ScrnInfoPtr scrn)
 	intel->fallback_debug = xf86ReturnOptValBool(intel->Options,
 						     OPTION_FALLBACKDEBUG,
 						     FALSE);
+	intel->use_resource_streamer = xf86ReturnOptValBool(intel->Options,
+						     OPTION_RESOURCESTREAMER,
+						     FALSE);
+	if (intel->use_resource_streamer)
+		xf86DrvMsg(scrn->scrnIndex, X_CONFIG,
+			   "Enabling Resource Streamer\n");
+	else
+		xf86DrvMsg(scrn->scrnIndex, X_CONFIG,
+			   "Disabling Resource Streamer\n");
 
 	intel->debug_flush = 0;
 
@@ -870,7 +879,6 @@ I830ScreenInit(SCREEN_INIT_ARGS_DECL)
 	 * appropriate.
 	 */
 	intel->XvEnabled = TRUE;
-
 #ifdef DRI2
 	if (intel->directRenderingType == DRI_NONE
 	    && I830DRI2ScreenInit(screen))
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/2] xf86-video-intel: Enable hw-generated binding tables for UXA
  2014-04-22 15:16 [RFC] xf86-video-intel: enable hw-generated binding tables Abdiel Janulgue
  2014-04-22 15:16 ` [RFC PATCH 1/2] xf86-video-intel: Add "ResourceStreamer" option Abdiel Janulgue
@ 2014-04-22 15:16 ` Abdiel Janulgue
  2014-04-22 15:23 ` [RFC] xf86-video-intel: enable hw-generated binding tables Chris Wilson
  2 siblings, 0 replies; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-22 15:16 UTC (permalink / raw)
  To: intel-gfx

Code is based on my hw-generated binding table code for Mesa
adapted to i965_composite path in UXA.

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
---
 src/uxa/i965_3d.c           |    5 ++-
 src/uxa/i965_reg.h          |    8 +++++
 src/uxa/i965_render.c       |   78 +++++++++++++++++++++++++++++++++++--------
 src/uxa/intel_batchbuffer.c |    7 ++--
 4 files changed, 80 insertions(+), 18 deletions(-)

diff --git a/src/uxa/i965_3d.c b/src/uxa/i965_3d.c
index 757a979..afbb5a7 100644
--- a/src/uxa/i965_3d.c
+++ b/src/uxa/i965_3d.c
@@ -406,7 +406,10 @@ gen7_upload_binding_table(intel_screen_private *intel,
 			  uint32_t ps_binding_table_offset)
 {
 	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS | (2 - 2));
-	OUT_BATCH(ps_binding_table_offset);
+	if (intel->use_resource_streamer)
+		OUT_BATCH(ps_binding_table_offset >> 1);
+	else
+		OUT_BATCH(ps_binding_table_offset);
 }
 
 void
diff --git a/src/uxa/i965_reg.h b/src/uxa/i965_reg.h
index a934a67..157b212 100644
--- a/src/uxa/i965_reg.h
+++ b/src/uxa/i965_reg.h
@@ -296,6 +296,14 @@
 /* DW1 */
 # define GEN7_PUSH_CONSTANT_BUFFER_OFFSET_SHIFT 16
 
+/* GEN7+ resource streamer */
+#define GEN7_3DSTATE_BINDING_TABLE_POOL_ALLOC           BRW_3D(3, 1, 0x19)
+# define BINDING_TABLE_POOL_ENABLE              0x0860
+#define GEN7_3DSTATE_BINDING_TABLE_EDIT_VS              BRW_3D(3, 0, 0x43)
+#define GEN7_3DSTATE_BINDING_TABLE_EDIT_GS              BRW_3D(3, 0, 0x44)
+#define GEN7_3DSTATE_BINDING_TABLE_EDIT_HS              BRW_3D(3, 0, 0x45)
+#define GEN7_3DSTATE_BINDING_TABLE_EDIT_DS              BRW_3D(3, 0, 0x46)
+#define GEN7_3DSTATE_BINDING_TABLE_EDIT_PS              BRW_3D(3, 0, 0x47)
 
 #define PIPELINE_SELECT_3D		0
 #define PIPELINE_SELECT_MEDIA		1
diff --git a/src/uxa/i965_render.c b/src/uxa/i965_render.c
index 74f57af..d5225dd 100644
--- a/src/uxa/i965_render.c
+++ b/src/uxa/i965_render.c
@@ -1783,6 +1783,10 @@ static void i965_surface_flush(struct intel_screen_private *intel)
 				   sizeof(intel->surface_data), 4096);
 	assert(intel->surface_bo);
 
+	drm_intel_bo_unreference(intel->hw_bt_pool_bo);
+	intel->hw_bt_pool_bo = drm_intel_bo_alloc(intel->bufmgr, "hw_bt",
+						  131072, 4096);
+	assert(intel->hw_bt_pool_bo);
 	return;
 	(void)ret;
 }
@@ -2217,32 +2221,70 @@ static void i965_select_vertex_buffer(struct intel_screen_private *intel)
 static void i965_bind_surfaces(struct intel_screen_private *intel)
 {
 	uint32_t *binding_table;
+	uint32_t surf0 = 0, surf1 = 0, surf2 = 0;
 
 	assert(intel->surface_used + 4 * SURFACE_STATE_PADDED_SIZE <= sizeof(intel->surface_data));
 
-	binding_table = (uint32_t*) (intel->surface_data + intel->surface_used);
-	intel->surface_table = intel->surface_used;
-	intel->surface_used += SURFACE_STATE_PADDED_SIZE;
-
-	binding_table[0] =
-		i965_set_picture_surface_state(intel,
+	surf0 = i965_set_picture_surface_state(intel,
 					       intel->render_dest_picture,
 					       intel->render_dest,
 					       TRUE);
-	binding_table[1] =
-		i965_set_picture_surface_state(intel,
+	surf1 = i965_set_picture_surface_state(intel,
 					       intel->render_source_picture,
 					       intel->render_source,
 					       FALSE);
 	if (intel->render_mask) {
-		binding_table[2] =
-			i965_set_picture_surface_state(intel,
-						       intel->render_mask_picture,
-						       intel->render_mask,
-						       FALSE);
+		surf2  = i965_set_picture_surface_state(intel,
+							intel->render_mask_picture,
+							intel->render_mask,
+							FALSE);
+	}
+
+	if (intel->use_resource_streamer) {
+		intel->surface_table += (256 * sizeof(uint16_t));
+		OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_EDIT_PS | (5 - 2));
+		OUT_BATCH(0x3);
+		{
+			OUT_BATCH(0 << 16 | surf0 >> 5);
+			OUT_BATCH(1 << 16 | surf1 >> 5);
+			OUT_BATCH(2 << 16 | surf2 >> 5);
+		}
+	} else {
+		binding_table = (uint32_t*) (intel->surface_data + intel->surface_used);
+		intel->surface_table = intel->surface_used;
+		intel->surface_used += SURFACE_STATE_PADDED_SIZE;
+
+		binding_table[0] = surf0;
+		binding_table[1] = surf1;
+		binding_table[2] = surf2;
 	}
 }
 
+static void i965_enable_hw_binding_table(struct intel_screen_private *intel)
+{
+	if (!intel->use_resource_streamer)
+		return;
+
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POOL_ALLOC | (3 - 2));
+	OUT_RELOC(intel->hw_bt_pool_bo, I915_GEM_DOMAIN_SAMPLER, 0,
+		  BINDING_TABLE_POOL_ENABLE);
+	OUT_RELOC(intel->hw_bt_pool_bo, I915_GEM_DOMAIN_SAMPLER, 0,
+		  intel->hw_bt_pool_bo->size);
+
+	OUT_BATCH(BRW_PIPE_CONTROL | (4 - 2));
+	OUT_BATCH(BRW_PIPE_CONTROL_GLOBAL_GTT);
+	OUT_BATCH(0); /* address */
+	OUT_BATCH(0); /* write data */
+
+	/* Do a block clear for existing on-chip binding table entries
+	   that might have stuck from the old batch. Otherwise, this
+	   causes GPU hungs
+	*/
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_EDIT_PS | (3 - 2));
+	OUT_BATCH(0xffff << 16 | 0x3 );
+	OUT_BATCH(0);
+}
+
 void
 i965_composite(PixmapPtr dest, int srcX, int srcY, int maskX, int maskY,
 	       int dstX, int dstY, int w, int h)
@@ -2252,6 +2294,7 @@ i965_composite(PixmapPtr dest, int srcX, int srcY, int maskX, int maskY,
 
 	intel_batch_start_atomic(scrn, 200);
 	if (intel->needs_render_state_emit) {
+		i965_enable_hw_binding_table(intel);
 		i965_bind_surfaces(intel);
 
 		if (INTEL_INFO(intel)->gen >= 060)
@@ -2349,6 +2392,8 @@ void gen4_render_state_init(ScrnInfoPtr scrn)
 		drm_intel_bo_alloc(intel->bufmgr, "surface data",
 				   sizeof(intel->surface_data), 4096);
 	assert(intel->surface_bo);
+	intel->hw_bt_pool_bo = drm_intel_bo_alloc(intel->bufmgr, "hw_bt",
+						  131072, 4096);
 
 	intel->surface_used = 0;
 
@@ -2445,6 +2490,7 @@ void gen4_render_state_cleanup(ScrnInfoPtr scrn)
 	int i, j, k, l, m;
 
 	drm_intel_bo_unreference(intel->surface_bo);
+	drm_intel_bo_unreference(intel->hw_bt_pool_bo);
 	drm_intel_bo_unreference(render_state->vs_state_bo);
 	drm_intel_bo_unreference(render_state->sf_state_bo);
 	drm_intel_bo_unreference(render_state->sf_mask_state_bo);
@@ -2571,9 +2617,13 @@ gen6_composite_create_depth_stencil_state(intel_screen_private *intel)
 	(void)ret;
 }
 
+#define MI_RS_CONTROL                           (0x6 << 23)
+
 static void
 gen6_composite_state_base_address(intel_screen_private *intel)
 {
+	OUT_BATCH(MI_RS_CONTROL | 0x0);
+
 	OUT_BATCH(BRW_STATE_BASE_ADDRESS | (10 - 2));
 	OUT_BATCH(BASE_ADDRESS_MODIFY); /* General state base address */
 	intel->surface_reloc = intel->batch_used;
@@ -2586,6 +2636,8 @@ gen6_composite_state_base_address(intel_screen_private *intel)
 	OUT_BATCH(BASE_ADDRESS_MODIFY); /* Dynamic state upper bound */
 	OUT_BATCH(BASE_ADDRESS_MODIFY); /* Indirect object upper bound */
 	OUT_BATCH(BASE_ADDRESS_MODIFY); /* Instruction access upper bound */
+
+	OUT_BATCH(MI_RS_CONTROL | 0x1);
 }
 
 static void
diff --git a/src/uxa/intel_batchbuffer.c b/src/uxa/intel_batchbuffer.c
index dedf7f8..347413b 100644
--- a/src/uxa/intel_batchbuffer.c
+++ b/src/uxa/intel_batchbuffer.c
@@ -260,13 +260,12 @@ void intel_batch_submit(ScrnInfoPtr scrn)
 	}
 
 	ret = dri_bo_subdata(intel->batch_bo, 0, intel->batch_used*4, intel->batch_ptr);
+	uint32_t flags = HAS_BLT(intel) ? intel->current_batch: I915_EXEC_DEFAULT;
+	flags |= intel->use_resource_streamer ? I915_EXEC_RESOURCE_STREAMER : 0;
 	if (ret == 0) {
 		ret = drm_intel_bo_mrb_exec(intel->batch_bo,
 				intel->batch_used*4,
-				NULL, 0, 0xffffffff,
-				(HAS_BLT(intel) ?
-				 intel->current_batch:
-				 I915_EXEC_DEFAULT));
+				NULL, 0, 0xffffffff, flags);
 	}
 
 	if (ret != 0) {
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-22 15:16 [RFC] xf86-video-intel: enable hw-generated binding tables Abdiel Janulgue
  2014-04-22 15:16 ` [RFC PATCH 1/2] xf86-video-intel: Add "ResourceStreamer" option Abdiel Janulgue
  2014-04-22 15:16 ` [RFC PATCH 2/2] xf86-video-intel: Enable hw-generated binding tables for UXA Abdiel Janulgue
@ 2014-04-22 15:23 ` Chris Wilson
  2014-04-22 17:20   ` Daniel Vetter
  2 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2014-04-22 15:23 UTC (permalink / raw)
  To: Abdiel Janulgue; +Cc: intel-gfx

On Tue, Apr 22, 2014 at 06:16:34PM +0300, Abdiel Janulgue wrote:
> This patch series enables resource streamer for xf86-video-intel UXA.
> 
> Fixes i965 Mesa driver that makes use of resource streamer optimization
> to survive a full piglit run without bricking the machine. Previously,
> I get random hungs when doing a full piglit round when enabling RS. 
> After months of head-scratching, I noticed I didn't quite pay attention
> to this small detail in bspec:
> 
>  "The binding table generator feature has a simple all or nothing
>   model. If HW generated binding tables are enabled, the driver must enable
>   the pool and use 3D_HW_BINDING_TABLE_POINTER_* commands."
> 
> I realized that our xf86-video-intel driver is piping out 3D commands
> as well that used the older manual-generated binding table format
> that caused a conflict when Mesa is using the hw-generated binding table 
> format.

This has to be per-context right? Otherwise how do you intend to
coordinate multiple clients submitting to the kernel using incompatible
command sets? Even in the restricted X/DRI sense, how do you intend to
negotiate which to use?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-22 15:23 ` [RFC] xf86-video-intel: enable hw-generated binding tables Chris Wilson
@ 2014-04-22 17:20   ` Daniel Vetter
  2014-04-23 11:21     ` Abdiel Janulgue
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Vetter @ 2014-04-22 17:20 UTC (permalink / raw)
  To: Chris Wilson, Abdiel Janulgue, intel-gfx

On Tue, Apr 22, 2014 at 04:23:12PM +0100, Chris Wilson wrote:
> On Tue, Apr 22, 2014 at 06:16:34PM +0300, Abdiel Janulgue wrote:
> > This patch series enables resource streamer for xf86-video-intel UXA.
> > 
> > Fixes i965 Mesa driver that makes use of resource streamer optimization
> > to survive a full piglit run without bricking the machine. Previously,
> > I get random hungs when doing a full piglit round when enabling RS. 
> > After months of head-scratching, I noticed I didn't quite pay attention
> > to this small detail in bspec:
> > 
> >  "The binding table generator feature has a simple all or nothing
> >   model. If HW generated binding tables are enabled, the driver must enable
> >   the pool and use 3D_HW_BINDING_TABLE_POINTER_* commands."
> > 
> > I realized that our xf86-video-intel driver is piping out 3D commands
> > as well that used the older manual-generated binding table format
> > that caused a conflict when Mesa is using the hw-generated binding table 
> > format.
> 
> This has to be per-context right? Otherwise how do you intend to
> coordinate multiple clients submitting to the kernel using incompatible
> command sets? Even in the restricted X/DRI sense, how do you intend to
> negotiate which to use?

Hm, I've thought that the MI_BB_START should synchronize with the
asynchronous RS parsing/processing? Is there no way to disable RS again
for the next batch in a different context?

If there's no way to solve this with contexts or some manual reset trick
using LRIs then we're pretty decently screwed up. Worst case we need to
stop the gpu and reset it to keep backwards compat :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-22 17:20   ` Daniel Vetter
@ 2014-04-23 11:21     ` Abdiel Janulgue
  2014-04-23 16:52       ` Daniel Vetter
  0 siblings, 1 reply; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-23 11:21 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tuesday, April 22, 2014 07:20:58 PM Daniel Vetter wrote:
> On Tue, Apr 22, 2014 at 04:23:12PM +0100, Chris Wilson wrote:
> > On Tue, Apr 22, 2014 at 06:16:34PM +0300, Abdiel Janulgue wrote:
> > > This patch series enables resource streamer for xf86-video-intel UXA.
> > > 
> > > Fixes i965 Mesa driver that makes use of resource streamer optimization
> > > to survive a full piglit run without bricking the machine. Previously,
> > > I get random hungs when doing a full piglit round when enabling RS.
> > > After months of head-scratching, I noticed I didn't quite pay attention
> > > 
> > > to this small detail in bspec:
> > >  "The binding table generator feature has a simple all or nothing
> > >  
> > >   model. If HW generated binding tables are enabled, the driver must
> > >   enable
> > >   the pool and use 3D_HW_BINDING_TABLE_POINTER_* commands."
> > > 
> > > I realized that our xf86-video-intel driver is piping out 3D commands
> > > as well that used the older manual-generated binding table format
> > > that caused a conflict when Mesa is using the hw-generated binding table
> > > format.
> > 
> > This has to be per-context right? Otherwise how do you intend to
> > coordinate multiple clients submitting to the kernel using incompatible
> > command sets? Even in the restricted X/DRI sense, how do you intend to
> > negotiate which to use?
> 
> Hm, I've thought that the MI_BB_START should synchronize with the
> asynchronous RS parsing/processing? Is there no way to disable RS again
> for the next batch in a different context?

I've already tried disabling RS at the end of every batch so that next batch 
in different context can continue to use older non-RS format. That does not 
work either and still causes hangs. 

What I've seen so far, it seems GPU does not like switching the format of 
commands from RS-format to non-RS format. It's either one way or the other. If 
switched on, it affects subsequent contexes henceforth expecting RS-format 
commands until the GPU gets reset. That's probably the note in bspec:
"the binding table generator feature has a simple all or nothing model".

> 
> If there's no way to solve this with contexts or some manual reset trick
> using LRIs then we're pretty decently screwed up. Worst case we need to
> stop the gpu and reset it to keep backwards compat :(

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-23 11:21     ` Abdiel Janulgue
@ 2014-04-23 16:52       ` Daniel Vetter
  2014-04-23 17:50         ` Ville Syrjälä
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Vetter @ 2014-04-23 16:52 UTC (permalink / raw)
  To: Abdiel Janulgue; +Cc: intel-gfx

On Wed, Apr 23, 2014 at 1:21 PM, Abdiel Janulgue
<abdiel.janulgue@linux.intel.com> wrote:
> I've already tried disabling RS at the end of every batch so that next batch
> in different context can continue to use older non-RS format. That does not
> work either and still causes hangs.
>
> What I've seen so far, it seems GPU does not like switching the format of
> commands from RS-format to non-RS format. It's either one way or the other. If
> switched on, it affects subsequent contexes henceforth expecting RS-format
> commands until the GPU gets reset. That's probably the note in bspec:
>
> "the binding table generator feature has a simple all or nothing model".

Oh hooray, that's just awesome :( So we indeed need to stop the gpu
and reset it if there's a non-RS render batch after any RS render
batch.

Which also means that we need to enable this for _all_ userspace to
avoid completely disastrous performance. So uxa, sna, libva, maybe
opencl ...

I guess before we engage in this endeavor we need to track this down
with the hardware people.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-23 16:52       ` Daniel Vetter
@ 2014-04-23 17:50         ` Ville Syrjälä
  2014-04-24  6:08           ` Abdiel Janulgue
  0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2014-04-23 17:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Apr 23, 2014 at 06:52:09PM +0200, Daniel Vetter wrote:
> On Wed, Apr 23, 2014 at 1:21 PM, Abdiel Janulgue
> <abdiel.janulgue@linux.intel.com> wrote:
> > I've already tried disabling RS at the end of every batch so that next batch
> > in different context can continue to use older non-RS format. That does not
> > work either and still causes hangs.
> >
> > What I've seen so far, it seems GPU does not like switching the format of
> > commands from RS-format to non-RS format. It's either one way or the other. If
> > switched on, it affects subsequent contexes henceforth expecting RS-format
> > commands until the GPU gets reset. That's probably the note in bspec:
> >
> > "the binding table generator feature has a simple all or nothing model".
> 
> Oh hooray, that's just awesome :( So we indeed need to stop the gpu
> and reset it if there's a non-RS render batch after any RS render
> batch.

I'm not sure I buy it. There's an example in the spec showing how to
disable the RS around eg. GPGPU workloads and re-enable it for 3D
workloads even within one batch. I guess it's possible that the GPGPU
pipeline mode is somehow special here, but since the RS state is
supposed to be saved to the context I'm thinking you should be able to
disable it whenever you want.

What's really confusing with that example is the fact that it first
re-enables the binding tables and then the RS, but then there's a note
after that saying you have to those steps in the opposite order.

Anyhoo, I'm wondering if the problems are simply due to disabling RS but
leaving the binding tables enabled?

So one idea might be that when we switch from executing an RS batch to a
non-RS batch, we should disable the RS and binding tables manually. We
would then have to demand that any user batch which needs the binding
tables enabled must enable them, even if if we just executed a batch that
already used the binding tables. And when we need to disable the RS and
binding tables we could either have the kernel do that autmagically when
it notices a RS->non-RS transition, or we could also demand that all user
batches must disable the binding tables at the end. But maybe demanding
that from every batch would incur some extra overhead? In any case it
should be fairly easy to try that and see if it cures the hangs. Or are
you already doing things that way?

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-24  6:08           ` Abdiel Janulgue
@ 2014-04-24  6:06             ` Chris Wilson
  2014-04-24  8:25               ` Abdiel Janulgue
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2014-04-24  6:06 UTC (permalink / raw)
  To: Abdiel Janulgue; +Cc: intel-gfx

On Thu, Apr 24, 2014 at 09:08:14AM +0300, Abdiel Janulgue wrote:
> Anyway I haven't tried the work-around where we explictly only disable the BT 
> and RS on the other user-space clients (xorg driver in this case) when Mesa is 
> using RS instead of forcing the reset of the clients to use RS format.  I'll 
> try that first and let you know if it works. If it does, it might be more 
> efficient to do that in the kernel?

It has to be done in the kernel in order for interoperability with third
party clients.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-23 17:50         ` Ville Syrjälä
@ 2014-04-24  6:08           ` Abdiel Janulgue
  2014-04-24  6:06             ` Chris Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-24  6:08 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Wednesday, April 23, 2014 08:50:21 PM Ville Syrjälä wrote:
> On Wed, Apr 23, 2014 at 06:52:09PM +0200, Daniel Vetter wrote:
> > On Wed, Apr 23, 2014 at 1:21 PM, Abdiel Janulgue
> > 
> > <abdiel.janulgue@linux.intel.com> wrote:
> > > I've already tried disabling RS at the end of every batch so that next
> > > batch in different context can continue to use older non-RS format.
> > > That does not work either and still causes hangs.
> > > 
> > > What I've seen so far, it seems GPU does not like switching the format
> > > of
> > > commands from RS-format to non-RS format. It's either one way or the
> > > other. If switched on, it affects subsequent contexes henceforth
> > > expecting RS-format commands until the GPU gets reset. That's probably
> > > the note in bspec:
> > > 
> > > "the binding table generator feature has a simple all or nothing model".
> > 
> > Oh hooray, that's just awesome :( So we indeed need to stop the gpu
> > and reset it if there's a non-RS render batch after any RS render
> > batch.
> 
> I'm not sure I buy it. There's an example in the spec showing how to
> disable the RS around eg. GPGPU workloads and re-enable it for 3D
> workloads even within one batch. I guess it's possible that the GPGPU
> pipeline mode is somehow special here, but since the RS state is
> supposed to be saved to the context I'm thinking you should be able to
> disable it whenever you want.
> 
> What's really confusing with that example is the fact that it first
> re-enables the binding tables and then the RS, but then there's a note
> after that saying you have to those steps in the opposite order.
> 
> Anyhoo, I'm wondering if the problems are simply due to disabling RS but
> leaving the binding tables enabled?

Yes, the work-around I tried previously is that at the end of each batch both 
hw-binding tables and RS are disabled in that particular order.

> 
> So one idea might be that when we switch from executing an RS batch to a
> non-RS batch, we should disable the RS and binding tables manually. We
> would then have to demand that any user batch which needs the binding
> tables enabled must enable them, even if if we just executed a batch that
> already used the binding tables. And when we need to disable the RS and
> binding tables we could either have the kernel do that autmagically when
> it notices a RS->non-RS transition, or we could also demand that all user
> batches must disable the binding tables at the end. But maybe demanding
> that from every batch would incur some extra overhead? In any case it
> should be fairly easy to try that and see if it cures the hangs. Or are
> you already doing things that way?

This is actually what current RS implementation in Mesa does. It explicitly 
enables RS then hw-binding tables for each batch start and then the tear-down 
work-around at the batch end I mentioned  above; which didn't help at all.

Anyway I haven't tried the work-around where we explictly only disable the BT 
and RS on the other user-space clients (xorg driver in this case) when Mesa is 
using RS instead of forcing the reset of the clients to use RS format.  I'll 
try that first and let you know if it works. If it does, it might be more 
efficient to do that in the kernel?

--Abdiel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-24  6:06             ` Chris Wilson
@ 2014-04-24  8:25               ` Abdiel Janulgue
  2014-04-24 10:04                 ` Daniel Vetter
  2014-04-24 10:17                 ` Ville Syrjälä
  0 siblings, 2 replies; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-24  8:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thursday, April 24, 2014 07:06:34 AM Chris Wilson wrote:
> On Thu, Apr 24, 2014 at 09:08:14AM +0300, Abdiel Janulgue wrote:
> > Anyway I haven't tried the work-around where we explictly only disable the
> > BT and RS on the other user-space clients (xorg driver in this case) when
> > Mesa is using RS instead of forcing the reset of the clients to use RS
> > format.  I'll try that first and let you know if it works. 

I hate to break the bad news. Tried this just now - still get hangs :(

So I guess, all userspace clients* does need to use RS-format if we use this 
feature. GPGPU workloads seems to be special use-case where the RS hwbinding 
table format can be disabled. Otherwise, I guess we are stuck with this 
inflexibility.

[1]
On the other hand, it doesn't seem all that bad though. The RS hw-binding 
table format are only needed for clients that submit vertex and pixel shader 
commands. I've identified currently just UXA and SNA that seem to use this 
besides Mesa. OpenCL is not affected.


> > If it does, it
> > might be more efficient to do that in the kernel?
> 
> It has to be done in the kernel in order for interoperability with third
> party clients.
> -Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-24  8:25               ` Abdiel Janulgue
@ 2014-04-24 10:04                 ` Daniel Vetter
  2014-04-24 10:17                 ` Ville Syrjälä
  1 sibling, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2014-04-24 10:04 UTC (permalink / raw)
  To: Abdiel Janulgue; +Cc: intel-gfx

On Thu, Apr 24, 2014 at 11:25:15AM +0300, Abdiel Janulgue wrote:
> On Thursday, April 24, 2014 07:06:34 AM Chris Wilson wrote:
> > On Thu, Apr 24, 2014 at 09:08:14AM +0300, Abdiel Janulgue wrote:
> > > Anyway I haven't tried the work-around where we explictly only disable the
> > > BT and RS on the other user-space clients (xorg driver in this case) when
> > > Mesa is using RS instead of forcing the reset of the clients to use RS
> > > format.  I'll try that first and let you know if it works. 
> 
> I hate to break the bad news. Tried this just now - still get hangs :(
> 
> So I guess, all userspace clients* does need to use RS-format if we use this 
> feature. GPGPU workloads seems to be special use-case where the RS hwbinding 
> table format can be disabled. Otherwise, I guess we are stuck with this 
> inflexibility.
> 
> [1]
> On the other hand, it doesn't seem all that bad though. The RS hw-binding 
> table format are only needed for clients that submit vertex and pixel shader 
> commands. I've identified currently just UXA and SNA that seem to use this 
> besides Mesa. OpenCL is not affected.

libva? rendercopy in igt? This kind of abi break is really ugly :(
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-24  8:25               ` Abdiel Janulgue
  2014-04-24 10:04                 ` Daniel Vetter
@ 2014-04-24 10:17                 ` Ville Syrjälä
  2014-04-24 11:22                   ` Abdiel Janulgue
  2014-05-02 18:24                   ` Abdiel Janulgue
  1 sibling, 2 replies; 15+ messages in thread
From: Ville Syrjälä @ 2014-04-24 10:17 UTC (permalink / raw)
  To: Abdiel Janulgue; +Cc: intel-gfx

On Thu, Apr 24, 2014 at 11:25:15AM +0300, Abdiel Janulgue wrote:
> On Thursday, April 24, 2014 07:06:34 AM Chris Wilson wrote:
> > On Thu, Apr 24, 2014 at 09:08:14AM +0300, Abdiel Janulgue wrote:
> > > Anyway I haven't tried the work-around where we explictly only disable the
> > > BT and RS on the other user-space clients (xorg driver in this case) when
> > > Mesa is using RS instead of forcing the reset of the clients to use RS
> > > format.  I'll try that first and let you know if it works. 
> 
> I hate to break the bad news. Tried this just now - still get hangs :(
> 
> So I guess, all userspace clients* does need to use RS-format if we use this 
> feature. GPGPU workloads seems to be special use-case where the RS hwbinding 
> table format can be disabled. Otherwise, I guess we are stuck with this 
> inflexibility.

The spec also says this:
"When switching between HW and SW binding table generation,
SW must issue a state cache invalidate."

So it does look like they were expecting people to switch between the
two modes.

Do you have any igt test for RS? Maybe add an option into rendercopy to
make use of RS? Then we could write some tests that try to hit this
problem w/o requiring Mesa.

> 
> [1]
> On the other hand, it doesn't seem all that bad though. The RS hw-binding 
> table format are only needed for clients that submit vertex and pixel shader 
> commands. I've identified currently just UXA and SNA that seem to use this 
> besides Mesa. OpenCL is not affected.
> 
> 
> > > If it does, it
> > > might be more efficient to do that in the kernel?
> > 
> > It has to be done in the kernel in order for interoperability with third
> > party clients.
> > -Chris

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-24 10:17                 ` Ville Syrjälä
@ 2014-04-24 11:22                   ` Abdiel Janulgue
  2014-05-02 18:24                   ` Abdiel Janulgue
  1 sibling, 0 replies; 15+ messages in thread
From: Abdiel Janulgue @ 2014-04-24 11:22 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Thursday, April 24, 2014 01:17:14 PM Ville Syrjälä wrote:
> On Thu, Apr 24, 2014 at 11:25:15AM +0300, Abdiel Janulgue wrote:
> > On Thursday, April 24, 2014 07:06:34 AM Chris Wilson wrote:
> > > On Thu, Apr 24, 2014 at 09:08:14AM +0300, Abdiel Janulgue wrote:
> > > > Anyway I haven't tried the work-around where we explictly only disable
> > > > the
> > > > BT and RS on the other user-space clients (xorg driver in this case)
> > > > when
> > > > Mesa is using RS instead of forcing the reset of the clients to use RS
> > > > format.  I'll try that first and let you know if it works.
> > 
> > I hate to break the bad news. Tried this just now - still get hangs :(
> > 
> > So I guess, all userspace clients* does need to use RS-format if we use
> > this feature. GPGPU workloads seems to be special use-case where the RS
> > hwbinding table format can be disabled. Otherwise, I guess we are stuck
> > with this inflexibility.
> 
> The spec also says this:
> "When switching between HW and SW binding table generation,
> SW must issue a state cache invalidate."
> 
> So it does look like they were expecting people to switch between the
> two modes.

Yep, the previous work-around did include a state cache invalidate.

> 
> Do you have any igt test for RS? Maybe add an option into rendercopy to
> make use of RS? Then we could write some tests that try to hit this
> problem w/o requiring Mesa.

Seems to be a good idea. I'll try to reproduce the problem with igt


> 
> > [1]
> > On the other hand, it doesn't seem all that bad though. The RS hw-binding
> > table format are only needed for clients that submit vertex and pixel
> > shader commands. I've identified currently just UXA and SNA that seem to
> > use this besides Mesa. OpenCL is not affected.
> > 
> > > > If it does, it
> > > > might be more efficient to do that in the kernel?
> > > 
> > > It has to be done in the kernel in order for interoperability with third
> > > party clients.
> > > -Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] xf86-video-intel: enable hw-generated binding tables
  2014-04-24 10:17                 ` Ville Syrjälä
  2014-04-24 11:22                   ` Abdiel Janulgue
@ 2014-05-02 18:24                   ` Abdiel Janulgue
  1 sibling, 0 replies; 15+ messages in thread
From: Abdiel Janulgue @ 2014-05-02 18:24 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

Hi all!

On Thursday, April 24, 2014 01:17:14 PM Ville Syrjälä wrote:
> On Thu, Apr 24, 2014 at 11:25:15AM +0300, Abdiel Janulgue wrote:
> > On Thursday, April 24, 2014 07:06:34 AM Chris Wilson wrote:
> > > On Thu, Apr 24, 2014 at 09:08:14AM +0300, Abdiel Janulgue wrote:
> > > > Anyway I haven't tried the work-around where we explictly only disable
> > > > the
> > > > BT and RS on the other user-space clients (xorg driver in this case)
> > > > when
> > > > Mesa is using RS instead of forcing the reset of the clients to use RS
> > > > format.  I'll try that first and let you know if it works.
> > 
> > I hate to break the bad news. Tried this just now - still get hangs :(
> > 
> > So I guess, all userspace clients* does need to use RS-format if we use
> > this feature. GPGPU workloads seems to be special use-case where the RS
> > hwbinding table format can be disabled. Otherwise, I guess we are stuck
> > with this inflexibility.
> 
> The spec also says this:
> "When switching between HW and SW binding table generation,
> SW must issue a state cache invalidate."
> 
> So it does look like they were expecting people to switch between the
> two modes.
> 
> Do you have any igt test for RS? Maybe add an option into rendercopy to
> make use of RS? Then we could write some tests that try to hit this
> problem w/o requiring Mesa.

I've modified igt rendercopy to test switching RS format on and off.

Good news and bad news!

Good news is that switching between RS binding tables and SW format
works seamlessly in BDW! Bad news for HSW because the problems I describe 
above only happens on that chip.

This is a HW issue then.

=Abdiel


> 
> > [1]
> > On the other hand, it doesn't seem all that bad though. The RS hw-binding
> > table format are only needed for clients that submit vertex and pixel
> > shader commands. I've identified currently just UXA and SNA that seem to
> > use this besides Mesa. OpenCL is not affected.
> > 
> > > > If it does, it
> > > > might be more efficient to do that in the kernel?
> > > 
> > > It has to be done in the kernel in order for interoperability with third
> > > party clients.
> > > -Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-05-02 18:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-22 15:16 [RFC] xf86-video-intel: enable hw-generated binding tables Abdiel Janulgue
2014-04-22 15:16 ` [RFC PATCH 1/2] xf86-video-intel: Add "ResourceStreamer" option Abdiel Janulgue
2014-04-22 15:16 ` [RFC PATCH 2/2] xf86-video-intel: Enable hw-generated binding tables for UXA Abdiel Janulgue
2014-04-22 15:23 ` [RFC] xf86-video-intel: enable hw-generated binding tables Chris Wilson
2014-04-22 17:20   ` Daniel Vetter
2014-04-23 11:21     ` Abdiel Janulgue
2014-04-23 16:52       ` Daniel Vetter
2014-04-23 17:50         ` Ville Syrjälä
2014-04-24  6:08           ` Abdiel Janulgue
2014-04-24  6:06             ` Chris Wilson
2014-04-24  8:25               ` Abdiel Janulgue
2014-04-24 10:04                 ` Daniel Vetter
2014-04-24 10:17                 ` Ville Syrjälä
2014-04-24 11:22                   ` Abdiel Janulgue
2014-05-02 18:24                   ` Abdiel Janulgue

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.