All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Fixed VS thread distribution between slices
@ 2012-04-10  4:10 Ben Widawsky
  2012-04-10 18:38 ` Kenneth Graunke
  2012-04-10 18:39 ` Eugeni Dodonov
  0 siblings, 2 replies; 4+ messages in thread
From: Ben Widawsky @ 2012-04-10  4:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, bernard.r.kilarski, Ben Widawsky

From: "bernard.r.kilarski" <your-user-name@intel.com>

Full disclosure: my IVB hangs on nexuiz both before, and after this patch, and
I haven't done any debug

This patch was given to me by Bernard, by way of Daniel. The patch came
with very little description, and I haven't really spent too much time
in the spec to make much sense out of it.

The improvement (IVB only) with my nexuiz configuration is almost
unbelievable. It has a more humble improvement on OpenArena. I'll throw
other tests at it, and try to debug the hangs a bit, but given the
nexuiz results, I was compelled to post this sooner, rather than later.

I see about a 2% improvement on OA
and about a 40$% improvement on nexuiz

CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h      |    3 +++
 drivers/gpu/drm/i915/intel_display.c |    3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index cb55444..755bdff 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -682,6 +682,9 @@
 
 #define GEN6_BSD_RNCID			0x12198
 
+#define GEN7_FF_THREAD_MODE		0x20a0
+#define GEN7_FF_THREAD_SIMPLE_SCHED	0x28a00010
+
 /*
  * Framebuffer compression (915+ only)
  */
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 6ec81b4..bbd5f46 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -8893,6 +8893,9 @@ static void ivybridge_init_clock_gating(struct drm_device *dev)
 			   DISPPLANE_TRICKLE_FEED_DISABLE);
 		intel_flush_display_plane(dev_priv, pipe);
 	}
+
+	/* Fix vertex load balancing */
+	I915_WRITE(GEN7_FF_THREAD_MODE,GEN7_FF_THREAD_SIMPLE_SCHED);
 }
 
 static void valleyview_init_clock_gating(struct drm_device *dev)
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Fixed VS thread distribution between slices
  2012-04-10  4:10 [PATCH] drm/i915: Fixed VS thread distribution between slices Ben Widawsky
@ 2012-04-10 18:38 ` Kenneth Graunke
  2012-04-10 21:08   ` Ben Widawsky
  2012-04-10 18:39 ` Eugeni Dodonov
  1 sibling, 1 reply; 4+ messages in thread
From: Kenneth Graunke @ 2012-04-10 18:38 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx

On 04/09/2012 09:10 PM, Ben Widawsky wrote:
> From: "bernard.r.kilarski"<your-user-name@intel.com>
>
> Full disclosure: my IVB hangs on nexuiz both before, and after this patch, and
> I haven't done any debug
>
> This patch was given to me by Bernard, by way of Daniel. The patch came
> with very little description, and I haven't really spent too much time
> in the spec to make much sense out of it.
>
> The improvement (IVB only) with my nexuiz configuration is almost
> unbelievable. It has a more humble improvement on OpenArena. I'll throw
> other tests at it, and try to debug the hangs a bit, but given the
> nexuiz results, I was compelled to post this sooner, rather than later.
>
> I see about a 2% improvement on OA
> and about a 40$% improvement on nexuiz
>
> CC: Daniel Vetter<daniel.vetter@ffwll.ch>
> Signed-off-by: Ben Widawsky<benjamin.widawsky@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_reg.h      |    3 +++
>   drivers/gpu/drm/i915/intel_display.c |    3 +++
>   2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index cb55444..755bdff 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -682,6 +682,9 @@
>
>   #define GEN6_BSD_RNCID			0x12198
>
> +#define GEN7_FF_THREAD_MODE		0x20a0
> +#define GEN7_FF_THREAD_SIMPLE_SCHED	0x28a00010
> +
>   /*
>    * Framebuffer compression (915+ only)
>    */
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 6ec81b4..bbd5f46 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -8893,6 +8893,9 @@ static void ivybridge_init_clock_gating(struct drm_device *dev)
>   			   DISPPLANE_TRICKLE_FEED_DISABLE);
>   		intel_flush_display_plane(dev_priv, pipe);
>   	}
> +
> +	/* Fix vertex load balancing */
> +	I915_WRITE(GEN7_FF_THREAD_MODE,GEN7_FF_THREAD_SIMPLE_SCHED);
>   }
>
>   static void valleyview_init_clock_gating(struct drm_device *dev)

I definitely approve of this patch.  It appears to make the hardware use 
a simple scheduler, as opposed to a complex one which is default.

The default scheduler has this lovely comment: "Note: this will cause 
possible corruption if input handles are reused due to instancing or 
topologies that reuse vertices (i.e. strips and fans)".

That is frightening.  We use instancing, tristrips, and trifans all the 
time.  While I haven't observed a problem, I don't want this to come 
back and bite us later.  The other scheduling options in VS Thread 
Dispatch Mode are useless.  Simple scheduling appears to be the only 
option that's safe.  And apparently it performs well too :)

So, as is, I think this qualifies for a:

Cc: stable@kernel.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

However!  I would like to be more paranoid and program it to 0x28a00000.
In other words, the IVB default minus bit 4 and bit 12:

     0x28a01010 & ~(1 << 4 | 1 << 12)

Bit 12 is the VS Thread Dispatch Override, which Bernard's patch 
cleared.  Bit 4 is the equivalent bit for DS (Domain Shaders).  It also 
has the scary comment about instancing and strips/fans not working.

Likewise, bit 16 also should be off (but is already off by default).

Since domain shaders are an OpenGL 4.0 feature, this shouldn't matter in 
the short term...but I doubt any of us will remember to come back here 
and change these bits when we eventually need them.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Fixed VS thread distribution between slices
  2012-04-10  4:10 [PATCH] drm/i915: Fixed VS thread distribution between slices Ben Widawsky
  2012-04-10 18:38 ` Kenneth Graunke
@ 2012-04-10 18:39 ` Eugeni Dodonov
  1 sibling, 0 replies; 4+ messages in thread
From: Eugeni Dodonov @ 2012-04-10 18:39 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx, Ben Widawsky


[-- Attachment #1.1: Type: text/plain, Size: 478 bytes --]

On Tue, Apr 10, 2012 at 01:10, Ben Widawsky <ben@bwidawsk.net> wrote:

> +#define GEN7_FF_THREAD_MODE            0x20a0

+#define GEN7_FF_THREAD_SIMPLE_SCHED    0x28a00010
>

Would it be too much asking you to check with the 0x28a00000 value as well
please, and with a patch which only cleans the bit 12 and 4?

I think we don't need to write the entire pre-defined message here, only
those 2 bits seem to matter in this case..

-- 
Eugeni Dodonov
 <http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 924 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Fixed VS thread distribution between slices
  2012-04-10 18:38 ` Kenneth Graunke
@ 2012-04-10 21:08   ` Ben Widawsky
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Widawsky @ 2012-04-10 21:08 UTC (permalink / raw)
  To: Kenneth Graunke; +Cc: Daniel Vetter, intel-gfx

On Tue, Apr 10, 2012 at 11:38:32AM -0700, Kenneth Graunke wrote:
> On 04/09/2012 09:10 PM, Ben Widawsky wrote:
> >From: "bernard.r.kilarski"<your-user-name@intel.com>
> >
> >Full disclosure: my IVB hangs on nexuiz both before, and after this patch, and
> >I haven't done any debug
> >
> >This patch was given to me by Bernard, by way of Daniel. The patch came
> >with very little description, and I haven't really spent too much time
> >in the spec to make much sense out of it.
> >
> >The improvement (IVB only) with my nexuiz configuration is almost
> >unbelievable. It has a more humble improvement on OpenArena. I'll throw
> >other tests at it, and try to debug the hangs a bit, but given the
> >nexuiz results, I was compelled to post this sooner, rather than later.
> >
> >I see about a 2% improvement on OA
> >and about a 40$% improvement on nexuiz
> >
> >CC: Daniel Vetter<daniel.vetter@ffwll.ch>
> >Signed-off-by: Ben Widawsky<benjamin.widawsky@intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_reg.h      |    3 +++
> >  drivers/gpu/drm/i915/intel_display.c |    3 +++
> >  2 files changed, 6 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> >index cb55444..755bdff 100644
> >--- a/drivers/gpu/drm/i915/i915_reg.h
> >+++ b/drivers/gpu/drm/i915/i915_reg.h
> >@@ -682,6 +682,9 @@
> >
> >  #define GEN6_BSD_RNCID			0x12198
> >
> >+#define GEN7_FF_THREAD_MODE		0x20a0
> >+#define GEN7_FF_THREAD_SIMPLE_SCHED	0x28a00010
> >+
> >  /*
> >   * Framebuffer compression (915+ only)
> >   */
> >diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> >index 6ec81b4..bbd5f46 100644
> >--- a/drivers/gpu/drm/i915/intel_display.c
> >+++ b/drivers/gpu/drm/i915/intel_display.c
> >@@ -8893,6 +8893,9 @@ static void ivybridge_init_clock_gating(struct drm_device *dev)
> >  			   DISPPLANE_TRICKLE_FEED_DISABLE);
> >  		intel_flush_display_plane(dev_priv, pipe);
> >  	}
> >+
> >+	/* Fix vertex load balancing */
> >+	I915_WRITE(GEN7_FF_THREAD_MODE,GEN7_FF_THREAD_SIMPLE_SCHED);
> >  }
> >
> >  static void valleyview_init_clock_gating(struct drm_device *dev)
> 
> I definitely approve of this patch.  It appears to make the hardware
> use a simple scheduler, as opposed to a complex one which is
> default.
> 
> The default scheduler has this lovely comment: "Note: this will
> cause possible corruption if input handles are reused due to
> instancing or topologies that reuse vertices (i.e. strips and
> fans)".
> 
> That is frightening.  We use instancing, tristrips, and trifans all
> the time.  While I haven't observed a problem, I don't want this to
> come back and bite us later.  The other scheduling options in VS
> Thread Dispatch Mode are useless.  Simple scheduling appears to be
> the only option that's safe.  And apparently it performs well too :)
> 
> So, as is, I think this qualifies for a:
> 
> Cc: stable@kernel.org
> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
> 
> However!  I would like to be more paranoid and program it to 0x28a00000.
> In other words, the IVB default minus bit 4 and bit 12:
> 
>     0x28a01010 & ~(1 << 4 | 1 << 12)
> 
> Bit 12 is the VS Thread Dispatch Override, which Bernard's patch
> cleared.  Bit 4 is the equivalent bit for DS (Domain Shaders).  It
> also has the scary comment about instancing and strips/fans not
> working.
> 
> Likewise, bit 16 also should be off (but is already off by default).
> 
> Since domain shaders are an OpenGL 4.0 feature, this shouldn't
> matter in the short term...but I doubt any of us will remember to
> come back here and change these bits when we eventually need them.


After moving to -queued per Daniel's recommendation, something goes
seriously wrong with the patch, and I think it will require some debug.

With -queued I have no issues anymore, and with the patch *something*
hangs every time. No netconsole atm.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-04-10 21:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-10  4:10 [PATCH] drm/i915: Fixed VS thread distribution between slices Ben Widawsky
2012-04-10 18:38 ` Kenneth Graunke
2012-04-10 21:08   ` Ben Widawsky
2012-04-10 18:39 ` Eugeni Dodonov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.