All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
@ 2017-01-08  7:31 Francisco Jerez
  2017-01-08 11:57 ` kbuild test robot
  2017-01-09 21:07 ` [PATCHv2] " Francisco Jerez
  0 siblings, 2 replies; 18+ messages in thread
From: Francisco Jerez @ 2017-01-08  7:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula, Eero Tamminen, beignet, Mika Kuoppala

The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
---
 drivers/gpu/drm/i915/intel_lrc.c        | 9 ---------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 8 --------
 2 files changed, 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6db246a..b1ac74e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -973,15 +973,6 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 	struct drm_i915_private *dev_priv = engine->i915;
 	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
 
-	/*
-	 * WaDisableLSQCROPERFforOCL:kbl
-	 * This WA is implemented in skl_init_clock_gating() but since
-	 * this batch updates GEN8_L3SQCREG4 with default value we need to
-	 * set this bit here to retain the WA during flush.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
-
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0971ac3..7cb2ab4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
 		WA_SET_BIT_MASKED(HDC_CHICKEN0,
 				  HDC_FENCE_DEST_SLM_DISABLE);
 
-	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
-	 * involving this register should also be added to WA batch as required.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		/* WaDisableLSQCROPERFforOCL:kbl */
-		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
-			   GEN8_LQSC_RO_PERF_DIS);
-
 	/* WaToEnableHwFixForPushConstHWBug:kbl */
 	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
 		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
-- 
2.10.2

_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-08  7:31 [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround Francisco Jerez
@ 2017-01-08 11:57 ` kbuild test robot
  2017-01-09 21:07 ` [PATCHv2] " Francisco Jerez
  1 sibling, 0 replies; 18+ messages in thread
From: kbuild test robot @ 2017-01-08 11:57 UTC (permalink / raw)
  To: Francisco Jerez
  Cc: Jani Nikula, intel-gfx, beignet, kbuild-all, Eero Tamminen,
	Mika Kuoppala

[-- Attachment #1: Type: text/plain, Size: 2172 bytes --]

Hi Francisco,

[auto build test ERROR on drm-intel/for-linux-next]
[also build test ERROR on v4.10-rc2 next-20170106]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Francisco-Jerez/drm-i915-Remove-WaDisableLSQCROPERFforOCL-KBL-workaround/20170108-193533
base:   git://anongit.freedesktop.org/drm-intel for-linux-next
config: i386-randconfig-x012-201702 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/intel_lrc.c: In function 'gen8_emit_flush_coherentl3_wa':
>> drivers/gpu/drm/i915/intel_lrc.c:972:27: error: unused variable 'dev_priv' [-Werror=unused-variable]
     struct drm_i915_private *dev_priv = engine->i915;
                              ^~~~~~~~
   cc1: all warnings being treated as errors

vim +/dev_priv +972 drivers/gpu/drm/i915/intel_lrc.c

9e000847 Arun Siluvery  2015-07-03  966   * code duplication.
9e000847 Arun Siluvery  2015-07-03  967   */
0bc40be8 Tvrtko Ursulin 2016-03-16  968  static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
6e5248b5 Daniel Vetter  2016-07-15  969  						uint32_t *batch,
9e000847 Arun Siluvery  2015-07-03  970  						uint32_t index)
9e000847 Arun Siluvery  2015-07-03  971  {
5e580523 Dave Airlie    2016-07-26 @972  	struct drm_i915_private *dev_priv = engine->i915;
9e000847 Arun Siluvery  2015-07-03  973  	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
9e000847 Arun Siluvery  2015-07-03  974  
f1afe24f Arun Siluvery  2015-08-04  975  	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |

:::::: The code at line 972 was first introduced by commit
:::::: 5e580523d9128a4d8364fe89d36c38fc7819c8dd Backmerge tag 'v4.7' into drm-next

:::::: TO: Dave Airlie <airlied@redhat.com>
:::::: CC: Dave Airlie <airlied@redhat.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25598 bytes --]

[-- Attachment #3: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-08  7:31 [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround Francisco Jerez
  2017-01-08 11:57 ` kbuild test robot
@ 2017-01-09 21:07 ` Francisco Jerez
  2017-01-11  8:17   ` [Intel-gfx] " Daniel Vetter
  1 sibling, 1 reply; 18+ messages in thread
From: Francisco Jerez @ 2017-01-09 21:07 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula, Eero Tamminen, beignet, Mika Kuoppala

The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
---
 drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
 2 files changed, 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6db246a..656e0a3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 						uint32_t *batch,
 						uint32_t index)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
 	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
 
-	/*
-	 * WaDisableLSQCROPERFforOCL:kbl
-	 * This WA is implemented in skl_init_clock_gating() but since
-	 * this batch updates GEN8_L3SQCREG4 with default value we need to
-	 * set this bit here to retain the WA during flush.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
-
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0971ac3..7cb2ab4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
 		WA_SET_BIT_MASKED(HDC_CHICKEN0,
 				  HDC_FENCE_DEST_SLM_DISABLE);
 
-	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
-	 * involving this register should also be added to WA batch as required.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		/* WaDisableLSQCROPERFforOCL:kbl */
-		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
-			   GEN8_LQSC_RO_PERF_DIS);
-
 	/* WaToEnableHwFixForPushConstHWBug:kbl */
 	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
 		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
-- 
2.10.2

_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-09 21:07 ` [PATCHv2] " Francisco Jerez
@ 2017-01-11  8:17   ` Daniel Vetter
  2017-01-11 12:07     ` Mika Kuoppala
  2017-01-12  0:03     ` [Intel-gfx] " Francisco Jerez
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Vetter @ 2017-01-11  8:17 UTC (permalink / raw)
  To: Francisco Jerez
  Cc: Jani Nikula, intel-gfx, Eero Tamminen, beignet, Mika Kuoppala

On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
> The WaDisableLSQCROPERFforOCL workaround has the side effect of
> disabling an L3SQ optimization that has huge performance implications
> and is unlikely to be necessary for the correct functioning of usual
> graphic workloads.  Userspace is free to re-enable the workaround on
> demand, and is generally in a better position to determine whether the
> workaround is necessary than the DRM is (e.g. only during the
> execution of compute kernels that rely on both L3 fences and HDC R/W
> requests).
> 
> The same workaround seems to apply to BDW (at least to production
> stepping G1) and SKL as well (the internal workaround database claims
> that it does for all steppings, while the BSpec workaround table only
> mentions pre-production steppings), but the DRM doesn't do anything
> beyond whitelisting the L3SQCREG4 register so userspace can enable it
> when it sees fit.  Do the same on KBL platforms.
> 
> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> This is followed by a regression of 35% and 10% respectively for the
> same benchmarks and platform caused by my recent patch series
> switching userspace to use the dataport constant cache instead of the
> sampler to implement uniform pull constant loads, which caused us to
> hit more heavily the L3 cache (and on platforms other than KBL had the
> opposite effect of improving performance of the same two benchmarks).
> The overall effect on KBL of this change combined with the recent
> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> was affected by the constant cache changes (though it improved as it
> did on other platforms rather than regressing), but is not
> significantly affected by this patch (with statistical significance of
> 5% and sample size 20).
> 
> v2: Drop some more code to avoid unused variable warning.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: beignet@lists.freedesktop.org

Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
for compute kernels? Are the patches for mesa compute/beignet
ready&reviewed?
-Daniel

> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
>  2 files changed, 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 6db246a..656e0a3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
>  						uint32_t *batch,
>  						uint32_t index)
>  {
> -	struct drm_i915_private *dev_priv = engine->i915;
>  	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
>  
> -	/*
> -	 * WaDisableLSQCROPERFforOCL:kbl
> -	 * This WA is implemented in skl_init_clock_gating() but since
> -	 * this batch updates GEN8_L3SQCREG4 with default value we need to
> -	 * set this bit here to retain the WA during flush.
> -	 */
> -	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
> -		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
> -
>  	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
>  				   MI_SRM_LRM_GLOBAL_GTT));
>  	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0971ac3..7cb2ab4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
>  		WA_SET_BIT_MASKED(HDC_CHICKEN0,
>  				  HDC_FENCE_DEST_SLM_DISABLE);
>  
> -	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
> -	 * involving this register should also be added to WA batch as required.
> -	 */
> -	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
> -		/* WaDisableLSQCROPERFforOCL:kbl */
> -		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
> -			   GEN8_LQSC_RO_PERF_DIS);
> -
>  	/* WaToEnableHwFixForPushConstHWBug:kbl */
>  	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
>  		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
> -- 
> 2.10.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-11  8:17   ` [Intel-gfx] " Daniel Vetter
@ 2017-01-11 12:07     ` Mika Kuoppala
  2017-01-11 12:24       ` [Intel-gfx] " Chris Wilson
  2017-01-12  0:03     ` [Intel-gfx] " Francisco Jerez
  1 sibling, 1 reply; 18+ messages in thread
From: Mika Kuoppala @ 2017-01-11 12:07 UTC (permalink / raw)
  To: Daniel Vetter, Francisco Jerez
  Cc: Jani Nikula, intel-gfx, Eero Tamminen, beignet

Daniel Vetter <daniel@ffwll.ch> writes:

> On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> disabling an L3SQ optimization that has huge performance implications
>> and is unlikely to be necessary for the correct functioning of usual
>> graphic workloads.  Userspace is free to re-enable the workaround on
>> demand, and is generally in a better position to determine whether the
>> workaround is necessary than the DRM is (e.g. only during the
>> execution of compute kernels that rely on both L3 fences and HDC R/W
>> requests).
>> 
>> The same workaround seems to apply to BDW (at least to production
>> stepping G1) and SKL as well (the internal workaround database claims
>> that it does for all steppings, while the BSpec workaround table only
>> mentions pre-production steppings), but the DRM doesn't do anything
>> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> when it sees fit.  Do the same on KBL platforms.
>> 
>> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> This is followed by a regression of 35% and 10% respectively for the
>> same benchmarks and platform caused by my recent patch series
>> switching userspace to use the dataport constant cache instead of the
>> sampler to implement uniform pull constant loads, which caused us to
>> hit more heavily the L3 cache (and on platforms other than KBL had the
>> opposite effect of improving performance of the same two benchmarks).
>> The overall effect on KBL of this change combined with the recent
>> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>> was affected by the constant cache changes (though it improved as it
>> did on other platforms rather than regressing), but is not
>> significantly affected by this patch (with statistical significance of
>> 5% and sample size 20).
>> 
>> v2: Drop some more code to avoid unused variable warning.
>> 
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> Cc: Jani Nikula <jani.nikula@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: beignet@lists.freedesktop.org
>
> Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> for compute kernels? Are the patches for mesa compute/beignet
> ready&reviewed?

This is explicit setting on kbl/E0 only. So one could argue
that unless they filter based on PCI-IDs, things would already
blow up across the skl/kbl population, if they forgot
to set it. The whitelisting is in place and looks sane
so this E0 exception is a wart that got in by me reading wa
database slavishly without thinking.

-Mika

> -Daniel
>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
>>  drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
>>  2 files changed, 18 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 6db246a..656e0a3 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
>>  						uint32_t *batch,
>>  						uint32_t index)
>>  {
>> -	struct drm_i915_private *dev_priv = engine->i915;
>>  	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
>>  
>> -	/*
>> -	 * WaDisableLSQCROPERFforOCL:kbl
>> -	 * This WA is implemented in skl_init_clock_gating() but since
>> -	 * this batch updates GEN8_L3SQCREG4 with default value we need to
>> -	 * set this bit here to retain the WA during flush.
>> -	 */
>> -	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> -		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
>> -
>>  	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
>>  				   MI_SRM_LRM_GLOBAL_GTT));
>>  	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 0971ac3..7cb2ab4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
>>  		WA_SET_BIT_MASKED(HDC_CHICKEN0,
>>  				  HDC_FENCE_DEST_SLM_DISABLE);
>>  
>> -	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
>> -	 * involving this register should also be added to WA batch as required.
>> -	 */
>> -	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> -		/* WaDisableLSQCROPERFforOCL:kbl */
>> -		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
>> -			   GEN8_LQSC_RO_PERF_DIS);
>> -
>>  	/* WaToEnableHwFixForPushConstHWBug:kbl */
>>  	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
>>  		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
>> -- 
>> 2.10.2
>> 
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-11 12:07     ` Mika Kuoppala
@ 2017-01-11 12:24       ` Chris Wilson
  2017-01-11 12:40         ` Mika Kuoppala
  2017-01-11 13:24         ` [Intel-gfx] " Daniel Vetter
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2017-01-11 12:24 UTC (permalink / raw)
  To: Mika Kuoppala
  Cc: Jani Nikula, intel-gfx, beignet, Daniel Vetter, Francisco Jerez,
	Eero Tamminen

On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
> Daniel Vetter <daniel@ffwll.ch> writes:
> 
> > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
> >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
> >> disabling an L3SQ optimization that has huge performance implications
> >> and is unlikely to be necessary for the correct functioning of usual
> >> graphic workloads.  Userspace is free to re-enable the workaround on
> >> demand, and is generally in a better position to determine whether the
> >> workaround is necessary than the DRM is (e.g. only during the
> >> execution of compute kernels that rely on both L3 fences and HDC R/W
> >> requests).
> >> 
> >> The same workaround seems to apply to BDW (at least to production
> >> stepping G1) and SKL as well (the internal workaround database claims
> >> that it does for all steppings, while the BSpec workaround table only
> >> mentions pre-production steppings), but the DRM doesn't do anything
> >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
> >> when it sees fit.  Do the same on KBL platforms.
> >> 
> >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> >> This is followed by a regression of 35% and 10% respectively for the
> >> same benchmarks and platform caused by my recent patch series
> >> switching userspace to use the dataport constant cache instead of the
> >> sampler to implement uniform pull constant loads, which caused us to
> >> hit more heavily the L3 cache (and on platforms other than KBL had the
> >> opposite effect of improving performance of the same two benchmarks).
> >> The overall effect on KBL of this change combined with the recent
> >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> >> was affected by the constant cache changes (though it improved as it
> >> did on other platforms rather than regressing), but is not
> >> significantly affected by this patch (with statistical significance of
> >> 5% and sample size 20).
> >> 
> >> v2: Drop some more code to avoid unused variable warning.
> >> 
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
> >> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> >> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> >> Cc: Jani Nikula <jani.nikula@intel.com>
> >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> >> Cc: beignet@lists.freedesktop.org
> >
> > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> > for compute kernels? Are the patches for mesa compute/beignet
> > ready&reviewed?
> 
> This is explicit setting on kbl/E0 only. So one could argue
> that unless they filter based on PCI-IDs, things would already
> blow up across the skl/kbl population, if they forgot
> to set it. The whitelisting is in place and looks sane
> so this E0 exception is a wart that got in by me reading wa
> database slavishly without thinking.

Add Fixes then?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-11 12:24       ` [Intel-gfx] " Chris Wilson
@ 2017-01-11 12:40         ` Mika Kuoppala
  2017-01-11 13:24         ` [Intel-gfx] " Daniel Vetter
  1 sibling, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2017-01-11 12:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Jani Nikula, intel-gfx, beignet, Eero Tamminen

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
>> Daniel Vetter <daniel@ffwll.ch> writes:
>> 
>> > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> >> disabling an L3SQ optimization that has huge performance implications
>> >> and is unlikely to be necessary for the correct functioning of usual
>> >> graphic workloads.  Userspace is free to re-enable the workaround on
>> >> demand, and is generally in a better position to determine whether the
>> >> workaround is necessary than the DRM is (e.g. only during the
>> >> execution of compute kernels that rely on both L3 fences and HDC R/W
>> >> requests).
>> >> 
>> >> The same workaround seems to apply to BDW (at least to production
>> >> stepping G1) and SKL as well (the internal workaround database claims
>> >> that it does for all steppings, while the BSpec workaround table only
>> >> mentions pre-production steppings), but the DRM doesn't do anything
>> >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> >> when it sees fit.  Do the same on KBL platforms.
>> >> 
>> >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> >> This is followed by a regression of 35% and 10% respectively for the
>> >> same benchmarks and platform caused by my recent patch series
>> >> switching userspace to use the dataport constant cache instead of the
>> >> sampler to implement uniform pull constant loads, which caused us to
>> >> hit more heavily the L3 cache (and on platforms other than KBL had the
>> >> opposite effect of improving performance of the same two benchmarks).
>> >> The overall effect on KBL of this change combined with the recent
>> >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>> >> was affected by the constant cache changes (though it improved as it
>> >> did on other platforms rather than regressing), but is not
>> >> significantly affected by this patch (with statistical significance of
>> >> 5% and sample size 20).
>> >> 
>> >> v2: Drop some more code to avoid unused variable warning.
>> >> 
>> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> >> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> >> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> >> Cc: Jani Nikula <jani.nikula@intel.com>
>> >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> >> Cc: beignet@lists.freedesktop.org
>> >
>> > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
>> > for compute kernels? Are the patches for mesa compute/beignet
>> > ready&reviewed?
>> 
>> This is explicit setting on kbl/E0 only. So one could argue
>> that unless they filter based on PCI-IDs, things would already
>> blow up across the skl/kbl population, if they forgot
>> to set it. The whitelisting is in place and looks sane
>> so this E0 exception is a wart that got in by me reading wa
>> database slavishly without thinking.
>
> Add Fixes then?

Fixes: a4106a782d11 ("drm/i915/gen9: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround")

Looking at beignet source, they don't care about this register/bit (yet).

Also we need to get rid of KBL_REVID_E0 as there is no such thing.
Oddly kbl doesnt follow the logical x0->rev mapping but leave
holes. Were they afraid of running out of revids or what...

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-11 12:24       ` [Intel-gfx] " Chris Wilson
  2017-01-11 12:40         ` Mika Kuoppala
@ 2017-01-11 13:24         ` Daniel Vetter
  2017-01-12  0:05           ` Francisco Jerez
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2017-01-11 13:24 UTC (permalink / raw)
  To: Chris Wilson, Mika Kuoppala, Daniel Vetter, Francisco Jerez,
	Jani Nikula, intel-gfx, Eero Tamminen, beignet

On Wed, Jan 11, 2017 at 12:24:59PM +0000, Chris Wilson wrote:
> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
> > Daniel Vetter <daniel@ffwll.ch> writes:
> > 
> > > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
> > >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
> > >> disabling an L3SQ optimization that has huge performance implications
> > >> and is unlikely to be necessary for the correct functioning of usual
> > >> graphic workloads.  Userspace is free to re-enable the workaround on
> > >> demand, and is generally in a better position to determine whether the
> > >> workaround is necessary than the DRM is (e.g. only during the
> > >> execution of compute kernels that rely on both L3 fences and HDC R/W
> > >> requests).
> > >> 
> > >> The same workaround seems to apply to BDW (at least to production
> > >> stepping G1) and SKL as well (the internal workaround database claims
> > >> that it does for all steppings, while the BSpec workaround table only
> > >> mentions pre-production steppings), but the DRM doesn't do anything
> > >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
> > >> when it sees fit.  Do the same on KBL platforms.
> > >> 
> > >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> > >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> > >> This is followed by a regression of 35% and 10% respectively for the
> > >> same benchmarks and platform caused by my recent patch series
> > >> switching userspace to use the dataport constant cache instead of the
> > >> sampler to implement uniform pull constant loads, which caused us to
> > >> hit more heavily the L3 cache (and on platforms other than KBL had the
> > >> opposite effect of improving performance of the same two benchmarks).
> > >> The overall effect on KBL of this change combined with the recent
> > >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> > >> was affected by the constant cache changes (though it improved as it
> > >> did on other platforms rather than regressing), but is not
> > >> significantly affected by this patch (with statistical significance of
> > >> 5% and sample size 20).
> > >> 
> > >> v2: Drop some more code to avoid unused variable warning.
> > >> 
> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
> > >> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> > >> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > >> Cc: Jani Nikula <jani.nikula@intel.com>
> > >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > >> Cc: beignet@lists.freedesktop.org
> > >
> > > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> > > for compute kernels? Are the patches for mesa compute/beignet
> > > ready&reviewed?
> > 
> > This is explicit setting on kbl/E0 only. So one could argue
> > that unless they filter based on PCI-IDs, things would already
> > blow up across the skl/kbl population, if they forgot
> > to set it. The whitelisting is in place and looks sane
> > so this E0 exception is a wart that got in by me reading wa
> > database slavishly without thinking.
> 
> Add Fixes then?

Yeah, cc: stable would be good to make sure it shows up in all supported
kernels, fast. Otherwise we'll get some good wtf bug reports.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-11  8:17   ` [Intel-gfx] " Daniel Vetter
  2017-01-11 12:07     ` Mika Kuoppala
@ 2017-01-12  0:03     ` Francisco Jerez
  1 sibling, 0 replies; 18+ messages in thread
From: Francisco Jerez @ 2017-01-12  0:03 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jani Nikula, intel-gfx, Eero Tamminen, beignet, Mika Kuoppala


[-- Attachment #1.1.1: Type: text/plain, Size: 5186 bytes --]

Daniel Vetter <daniel@ffwll.ch> writes:

> On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> disabling an L3SQ optimization that has huge performance implications
>> and is unlikely to be necessary for the correct functioning of usual
>> graphic workloads.  Userspace is free to re-enable the workaround on
>> demand, and is generally in a better position to determine whether the
>> workaround is necessary than the DRM is (e.g. only during the
>> execution of compute kernels that rely on both L3 fences and HDC R/W
>> requests).
>> 
>> The same workaround seems to apply to BDW (at least to production
>> stepping G1) and SKL as well (the internal workaround database claims
>> that it does for all steppings, while the BSpec workaround table only
>> mentions pre-production steppings), but the DRM doesn't do anything
>> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> when it sees fit.  Do the same on KBL platforms.
>> 
>> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> This is followed by a regression of 35% and 10% respectively for the
>> same benchmarks and platform caused by my recent patch series
>> switching userspace to use the dataport constant cache instead of the
>> sampler to implement uniform pull constant loads, which caused us to
>> hit more heavily the L3 cache (and on platforms other than KBL had the
>> opposite effect of improving performance of the same two benchmarks).
>> The overall effect on KBL of this change combined with the recent
>> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>> was affected by the constant cache changes (though it improved as it
>> did on other platforms rather than regressing), but is not
>> significantly affected by this patch (with statistical significance of
>> 5% and sample size 20).
>> 
>> v2: Drop some more code to avoid unused variable warning.
>> 
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> Cc: Jani Nikula <jani.nikula@intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: beignet@lists.freedesktop.org
>
> Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> for compute kernels? Are the patches for mesa compute/beignet
> ready&reviewed?

I don't think upstream userspace (neither Beignet nor Mesa) relies on
the workaround at this point, though Beignet devs may want to keep this
hardware bug in mind when they start doing SVM.

> -Daniel
>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
>>  drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
>>  2 files changed, 18 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 6db246a..656e0a3 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
>>  						uint32_t *batch,
>>  						uint32_t index)
>>  {
>> -	struct drm_i915_private *dev_priv = engine->i915;
>>  	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
>>  
>> -	/*
>> -	 * WaDisableLSQCROPERFforOCL:kbl
>> -	 * This WA is implemented in skl_init_clock_gating() but since
>> -	 * this batch updates GEN8_L3SQCREG4 with default value we need to
>> -	 * set this bit here to retain the WA during flush.
>> -	 */
>> -	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> -		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
>> -
>>  	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
>>  				   MI_SRM_LRM_GLOBAL_GTT));
>>  	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 0971ac3..7cb2ab4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
>>  		WA_SET_BIT_MASKED(HDC_CHICKEN0,
>>  				  HDC_FENCE_DEST_SLM_DISABLE);
>>  
>> -	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
>> -	 * involving this register should also be added to WA batch as required.
>> -	 */
>> -	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
>> -		/* WaDisableLSQCROPERFforOCL:kbl */
>> -		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
>> -			   GEN8_LQSC_RO_PERF_DIS);
>> -
>>  	/* WaToEnableHwFixForPushConstHWBug:kbl */
>>  	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
>>  		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
>> -- 
>> 2.10.2
>> 
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-11 13:24         ` [Intel-gfx] " Daniel Vetter
@ 2017-01-12  0:05           ` Francisco Jerez
  2017-01-12 13:58             ` Mika Kuoppala
  0 siblings, 1 reply; 18+ messages in thread
From: Francisco Jerez @ 2017-01-12  0:05 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, Mika Kuoppala


[-- Attachment #1.1.1: Type: text/plain, Size: 3680 bytes --]

Daniel Vetter <daniel@ffwll.ch> writes:

> On Wed, Jan 11, 2017 at 12:24:59PM +0000, Chris Wilson wrote:
>> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
>> > Daniel Vetter <daniel@ffwll.ch> writes:
>> > 
>> > > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>> > >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> > >> disabling an L3SQ optimization that has huge performance implications
>> > >> and is unlikely to be necessary for the correct functioning of usual
>> > >> graphic workloads.  Userspace is free to re-enable the workaround on
>> > >> demand, and is generally in a better position to determine whether the
>> > >> workaround is necessary than the DRM is (e.g. only during the
>> > >> execution of compute kernels that rely on both L3 fences and HDC R/W
>> > >> requests).
>> > >> 
>> > >> The same workaround seems to apply to BDW (at least to production
>> > >> stepping G1) and SKL as well (the internal workaround database claims
>> > >> that it does for all steppings, while the BSpec workaround table only
>> > >> mentions pre-production steppings), but the DRM doesn't do anything
>> > >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> > >> when it sees fit.  Do the same on KBL platforms.
>> > >> 
>> > >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> > >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> > >> This is followed by a regression of 35% and 10% respectively for the
>> > >> same benchmarks and platform caused by my recent patch series
>> > >> switching userspace to use the dataport constant cache instead of the
>> > >> sampler to implement uniform pull constant loads, which caused us to
>> > >> hit more heavily the L3 cache (and on platforms other than KBL had the
>> > >> opposite effect of improving performance of the same two benchmarks).
>> > >> The overall effect on KBL of this change combined with the recent
>> > >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>> > >> was affected by the constant cache changes (though it improved as it
>> > >> did on other platforms rather than regressing), but is not
>> > >> significantly affected by this patch (with statistical significance of
>> > >> 5% and sample size 20).
>> > >> 
>> > >> v2: Drop some more code to avoid unused variable warning.
>> > >> 
>> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> > >> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> > >> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> > >> Cc: Jani Nikula <jani.nikula@intel.com>
>> > >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > >> Cc: beignet@lists.freedesktop.org
>> > >
>> > > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
>> > > for compute kernels? Are the patches for mesa compute/beignet
>> > > ready&reviewed?
>> > 
>> > This is explicit setting on kbl/E0 only. So one could argue
>> > that unless they filter based on PCI-IDs, things would already
>> > blow up across the skl/kbl population, if they forgot
>> > to set it. The whitelisting is in place and looks sane
>> > so this E0 exception is a wart that got in by me reading wa
>> > database slavishly without thinking.
>> 
>> Add Fixes then?
>
> Yeah, cc: stable would be good to make sure it shows up in all supported
> kernels, fast. Otherwise we'll get some good wtf bug reports.

Agreed -- It would be nice for this to get to stable kernel branches.

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-12  0:05           ` Francisco Jerez
@ 2017-01-12 13:58             ` Mika Kuoppala
  0 siblings, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2017-01-12 13:58 UTC (permalink / raw)
  To: Francisco Jerez, Daniel Vetter, Chris Wilson

Francisco Jerez <currojerez@riseup.net> writes:

> Daniel Vetter <daniel@ffwll.ch> writes:
>
>> On Wed, Jan 11, 2017 at 12:24:59PM +0000, Chris Wilson wrote:
>>> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
>>> > Daniel Vetter <daniel@ffwll.ch> writes:
>>> > 
>>> > > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>>> > >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>>> > >> disabling an L3SQ optimization that has huge performance implications
>>> > >> and is unlikely to be necessary for the correct functioning of usual
>>> > >> graphic workloads.  Userspace is free to re-enable the workaround on
>>> > >> demand, and is generally in a better position to determine whether the
>>> > >> workaround is necessary than the DRM is (e.g. only during the
>>> > >> execution of compute kernels that rely on both L3 fences and HDC R/W
>>> > >> requests).
>>> > >> 
>>> > >> The same workaround seems to apply to BDW (at least to production
>>> > >> stepping G1) and SKL as well (the internal workaround database claims
>>> > >> that it does for all steppings, while the BSpec workaround table only
>>> > >> mentions pre-production steppings), but the DRM doesn't do anything
>>> > >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>>> > >> when it sees fit.  Do the same on KBL platforms.
>>> > >> 
>>> > >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>>> > >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>>> > >> This is followed by a regression of 35% and 10% respectively for the
>>> > >> same benchmarks and platform caused by my recent patch series
>>> > >> switching userspace to use the dataport constant cache instead of the
>>> > >> sampler to implement uniform pull constant loads, which caused us to
>>> > >> hit more heavily the L3 cache (and on platforms other than KBL had the
>>> > >> opposite effect of improving performance of the same two benchmarks).
>>> > >> The overall effect on KBL of this change combined with the recent
>>> > >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>>> > >> was affected by the constant cache changes (though it improved as it
>>> > >> did on other platforms rather than regressing), but is not
>>> > >> significantly affected by this patch (with statistical significance of
>>> > >> 5% and sample size 20).
>>> > >> 
>>> > >> v2: Drop some more code to avoid unused variable warning.
>>> > >> 
>>> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>>> > >> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>>> > >> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>>> > >> Cc: Jani Nikula <jani.nikula@intel.com>
>>> > >> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>> > >> Cc: beignet@lists.freedesktop.org
>>> > >
>>> > > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
>>> > > for compute kernels? Are the patches for mesa compute/beignet
>>> > > ready&reviewed?
>>> > 
>>> > This is explicit setting on kbl/E0 only. So one could argue
>>> > that unless they filter based on PCI-IDs, things would already
>>> > blow up across the skl/kbl population, if they forgot
>>> > to set it. The whitelisting is in place and looks sane
>>> > so this E0 exception is a wart that got in by me reading wa
>>> > database slavishly without thinking.
>>> 
>>> Add Fixes then?
>>
>> Yeah, cc: stable would be good to make sure it shows up in all supported
>> kernels, fast. Otherwise we'll get some good wtf bug reports.
>
> Agreed -- It would be nice for this to get to stable kernel branches.
>

Added Fixes and stable tags and pushed to drm-intel-next-queued.

Thanks for patch,
-Mika

>> -Daniel
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-31  5:11   ` Greg KH
@ 2017-01-31  5:11     ` Greg KH
  2017-01-31  5:11       ` Francisco Jerez
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2017-01-31  5:11 UTC (permalink / raw)
  To: Francisco Jerez
  Cc: stable, eero.t.tamminen, jani.nikula, matthew.william.auld,
	mika.kuoppala, beignet

On Tue, Jan 31, 2017 at 06:11:25AM +0100, Greg KH wrote:
> On Mon, Jan 30, 2017 at 12:24:45PM -0800, Francisco Jerez wrote:
> > The WaDisableLSQCROPERFforOCL workaround has the side effect of
> > disabling an L3SQ optimization that has huge performance implications
> > and is unlikely to be necessary for the correct functioning of usual
> > graphic workloads.  Userspace is free to re-enable the workaround on
> > demand, and is generally in a better position to determine whether the
> > workaround is necessary than the DRM is (e.g. only during the
> > execution of compute kernels that rely on both L3 fences and HDC R/W
> > requests).
> > 
> > The same workaround seems to apply to BDW (at least to production
> > stepping G1) and SKL as well (the internal workaround database claims
> > that it does for all steppings, while the BSpec workaround table only
> > mentions pre-production steppings), but the DRM doesn't do anything
> > beyond whitelisting the L3SQCREG4 register so userspace can enable it
> > when it sees fit.  Do the same on KBL platforms.
> > 
> > Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> > and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> > This is followed by a regression of 35% and 10% respectively for the
> > same benchmarks and platform caused by my recent patch series
> > switching userspace to use the dataport constant cache instead of the
> > sampler to implement uniform pull constant loads, which caused us to
> > hit more heavily the L3 cache (and on platforms other than KBL had the
> > opposite effect of improving performance of the same two benchmarks).
> > The overall effect on KBL of this change combined with the recent
> > userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> > was affected by the constant cache changes (though it improved as it
> > did on other platforms rather than regressing), but is not
> > significantly affected by this patch (with statistical significance of
> > 5% and sample size 20).
> > 
> > v2: Drop some more code to avoid unused variable warning.
> > 
> > Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
> > Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> > Cc: Matthew Auld <matthew.william.auld@gmail.com>
> > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > Cc: Jani Nikula <jani.nikula@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Cc: beignet@lists.freedesktop.org
> > Cc: <stable@vger.kernel.org> # v4.7+
> > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > [Removed double Fixes tag]
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
> > (cherry picked from commit 8726f2faa371514fba2f594d799db95203dfeee0)
> > Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> > [ Francisco Jerez: Rebase on v4.9 branch. ]
> > Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        | 3 +--
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 8 --------
> >  2 files changed, 1 insertion(+), 10 deletions(-)
> 
> What is the commit id of this patch in Linus's tree?

Ah, nevermind, it's 4fc020d864647ea3ae8cb8f17d63e48e87ebd0bf, right?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-31  5:11     ` Greg KH
@ 2017-01-31  5:11       ` Francisco Jerez
  0 siblings, 0 replies; 18+ messages in thread
From: Francisco Jerez @ 2017-01-31  5:11 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, eero.t.tamminen, jani.nikula, matthew.william.auld,
	mika.kuoppala, beignet


[-- Attachment #1.1: Type: text/plain, Size: 3610 bytes --]

Greg KH <gregkh@linuxfoundation.org> writes:

> On Tue, Jan 31, 2017 at 06:11:25AM +0100, Greg KH wrote:
>> On Mon, Jan 30, 2017 at 12:24:45PM -0800, Francisco Jerez wrote:
>> > The WaDisableLSQCROPERFforOCL workaround has the side effect of
>> > disabling an L3SQ optimization that has huge performance implications
>> > and is unlikely to be necessary for the correct functioning of usual
>> > graphic workloads.  Userspace is free to re-enable the workaround on
>> > demand, and is generally in a better position to determine whether the
>> > workaround is necessary than the DRM is (e.g. only during the
>> > execution of compute kernels that rely on both L3 fences and HDC R/W
>> > requests).
>> > 
>> > The same workaround seems to apply to BDW (at least to production
>> > stepping G1) and SKL as well (the internal workaround database claims
>> > that it does for all steppings, while the BSpec workaround table only
>> > mentions pre-production steppings), but the DRM doesn't do anything
>> > beyond whitelisting the L3SQCREG4 register so userspace can enable it
>> > when it sees fit.  Do the same on KBL platforms.
>> > 
>> > Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>> > and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>> > This is followed by a regression of 35% and 10% respectively for the
>> > same benchmarks and platform caused by my recent patch series
>> > switching userspace to use the dataport constant cache instead of the
>> > sampler to implement uniform pull constant loads, which caused us to
>> > hit more heavily the L3 cache (and on platforms other than KBL had the
>> > opposite effect of improving performance of the same two benchmarks).
>> > The overall effect on KBL of this change combined with the recent
>> > userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
>> > was affected by the constant cache changes (though it improved as it
>> > did on other platforms rather than regressing), but is not
>> > significantly affected by this patch (with statistical significance of
>> > 5% and sample size 20).
>> > 
>> > v2: Drop some more code to avoid unused variable warning.
>> > 
>> > Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>> > Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> > Cc: Matthew Auld <matthew.william.auld@gmail.com>
>> > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> > Cc: Jani Nikula <jani.nikula@intel.com>
>> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Cc: beignet@lists.freedesktop.org
>> > Cc: <stable@vger.kernel.org> # v4.7+
>> > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> > [Removed double Fixes tag]
>> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
>> > (cherry picked from commit 8726f2faa371514fba2f594d799db95203dfeee0)
>> > Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>> > [ Francisco Jerez: Rebase on v4.9 branch. ]
>> > Signed-off-by: Francisco Jerez <currojerez@riseup.net>
>> > ---
>> >  drivers/gpu/drm/i915/intel_lrc.c        | 3 +--
>> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 8 --------
>> >  2 files changed, 1 insertion(+), 10 deletions(-)
>> 
>> What is the commit id of this patch in Linus's tree?
>
> Ah, nevermind, it's 4fc020d864647ea3ae8cb8f17d63e48e87ebd0bf, right?
>

Oops, yes, that's right.  Thanks!

> thanks,
>
> greg k-h

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-30 20:24 ` [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround Francisco Jerez
@ 2017-01-31  5:11   ` Greg KH
  2017-01-31  5:11     ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2017-01-31  5:11 UTC (permalink / raw)
  To: Francisco Jerez
  Cc: stable, eero.t.tamminen, jani.nikula, matthew.william.auld,
	mika.kuoppala, beignet

On Mon, Jan 30, 2017 at 12:24:45PM -0800, Francisco Jerez wrote:
> The WaDisableLSQCROPERFforOCL workaround has the side effect of
> disabling an L3SQ optimization that has huge performance implications
> and is unlikely to be necessary for the correct functioning of usual
> graphic workloads.  Userspace is free to re-enable the workaround on
> demand, and is generally in a better position to determine whether the
> workaround is necessary than the DRM is (e.g. only during the
> execution of compute kernels that rely on both L3 fences and HDC R/W
> requests).
> 
> The same workaround seems to apply to BDW (at least to production
> stepping G1) and SKL as well (the internal workaround database claims
> that it does for all steppings, while the BSpec workaround table only
> mentions pre-production steppings), but the DRM doesn't do anything
> beyond whitelisting the L3SQCREG4 register so userspace can enable it
> when it sees fit.  Do the same on KBL platforms.
> 
> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> This is followed by a regression of 35% and 10% respectively for the
> same benchmarks and platform caused by my recent patch series
> switching userspace to use the dataport constant cache instead of the
> sampler to implement uniform pull constant loads, which caused us to
> hit more heavily the L3 cache (and on platforms other than KBL had the
> opposite effect of improving performance of the same two benchmarks).
> The overall effect on KBL of this change combined with the recent
> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> was affected by the constant cache changes (though it improved as it
> did on other platforms rather than regressing), but is not
> significantly affected by this patch (with statistical significance of
> 5% and sample size 20).
> 
> v2: Drop some more code to avoid unused variable warning.
> 
> Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: beignet@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v4.7+
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> [Removed double Fixes tag]
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
> (cherry picked from commit 8726f2faa371514fba2f594d799db95203dfeee0)
> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> [ Francisco Jerez: Rebase on v4.9 branch. ]
> Signed-off-by: Francisco Jerez <currojerez@riseup.net>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 3 +--
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 8 --------
>  2 files changed, 1 insertion(+), 10 deletions(-)

What is the commit id of this patch in Linus's tree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-30 13:19 FAILED: patch "[PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround." failed to apply to 4.9-stable tree gregkh
@ 2017-01-30 20:24 ` Francisco Jerez
  2017-01-31  5:11   ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Francisco Jerez @ 2017-01-30 20:24 UTC (permalink / raw)
  To: stable
  Cc: gregkh, eero.t.tamminen, jani.nikula, matthew.william.auld,
	mika.kuoppala, beignet

The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[Removed double Fixes tag]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
(cherry picked from commit 8726f2faa371514fba2f594d799db95203dfeee0)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[ Francisco Jerez: Rebase on v4.9 branch. ]
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 3 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 8 --------
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0adb879..67db157 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -858,8 +858,7 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 	 * this batch updates GEN8_L3SQCREG4 with default value we need to
 	 * set this bit here to retain the WA during flush.
 	 */
-	if (IS_SKL_REVID(dev_priv, 0, SKL_REVID_E0) ||
-	    IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
+	if (IS_SKL_REVID(dev_priv, 0, SKL_REVID_E0))
 		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
 
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ed9955d..8babfe0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1153,14 +1153,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
 		WA_SET_BIT_MASKED(HDC_CHICKEN0,
 				  HDC_FENCE_DEST_SLM_DISABLE);
 
-	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
-	 * involving this register should also be added to WA batch as required.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		/* WaDisableLSQCROPERFforOCL:kbl */
-		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
-			   GEN8_LQSC_RO_PERF_DIS);
-
 	/* WaToEnableHwFixForPushConstHWBug:kbl */
 	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
 		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
-- 
2.10.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  2017-01-12 10:44 ` Mika Kuoppala
  (?)
@ 2017-01-12 10:56 ` Chris Wilson
  -1 siblings, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2017-01-12 10:56 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Jani Nikula, # v4 . 7+, intel-gfx, Eero Tamminen, beignet

On Thu, Jan 12, 2017 at 12:44:54PM +0200, Mika Kuoppala wrote:
> From: Francisco Jerez <currojerez@riseup.net>
> 
> The WaDisableLSQCROPERFforOCL workaround has the side effect of
> disabling an L3SQ optimization that has huge performance implications
> and is unlikely to be necessary for the correct functioning of usual
> graphic workloads.  Userspace is free to re-enable the workaround on
> demand, and is generally in a better position to determine whether the
> workaround is necessary than the DRM is (e.g. only during the
> execution of compute kernels that rely on both L3 fences and HDC R/W
> requests).
> 
> The same workaround seems to apply to BDW (at least to production
> stepping G1) and SKL as well (the internal workaround database claims
> that it does for all steppings, while the BSpec workaround table only
> mentions pre-production steppings), but the DRM doesn't do anything
> beyond whitelisting the L3SQCREG4 register so userspace can enable it
> when it sees fit.  Do the same on KBL platforms.
> 
> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> This is followed by a regression of 35% and 10% respectively for the
> same benchmarks and platform caused by my recent patch series
> switching userspace to use the dataport constant cache instead of the
> sampler to implement uniform pull constant loads, which caused us to
> hit more heavily the L3 cache (and on platforms other than KBL had the
> opposite effect of improving performance of the same two benchmarks).
> The overall effect on KBL of this change combined with the recent
> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> was affected by the constant cache changes (though it improved as it
> did on other platforms rather than regressing), but is not
> significantly affected by this patch (with statistical significance of
> 5% and sample size 20).
> 
> v2: Drop some more code to avoid unused variable warning.
> 
> Fixes: Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Once is enough :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
@ 2017-01-12 10:44 ` Mika Kuoppala
  0 siblings, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2017-01-12 10:44 UTC (permalink / raw)
  To: intel-gfx
  Cc: Francisco Jerez, Matthew Auld, Eero Tamminen, Jani Nikula,
	Mika Kuoppala, beignet, # v4 . 7+

From: Francisco Jerez <currojerez@riseup.net>

The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Fixes: Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
 2 files changed, 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db714dc..8acab87 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 						uint32_t *batch,
 						uint32_t index)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
 	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
 
-	/*
-	 * WaDisableLSQCROPERFforOCL:kbl
-	 * This WA is implemented in skl_init_clock_gating() but since
-	 * this batch updates GEN8_L3SQCREG4 with default value we need to
-	 * set this bit here to retain the WA during flush.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
-
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ab83fc2..49fa800 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
 		WA_SET_BIT_MASKED(HDC_CHICKEN0,
 				  HDC_FENCE_DEST_SLM_DISABLE);
 
-	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
-	 * involving this register should also be added to WA batch as required.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		/* WaDisableLSQCROPERFforOCL:kbl */
-		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
-			   GEN8_LQSC_RO_PERF_DIS);
-
 	/* WaToEnableHwFixForPushConstHWBug:kbl */
 	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
 		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
@ 2017-01-12 10:44 ` Mika Kuoppala
  0 siblings, 0 replies; 18+ messages in thread
From: Mika Kuoppala @ 2017-01-12 10:44 UTC (permalink / raw)
  To: intel-gfx; +Cc: Jani Nikula, Eero Tamminen, beignet, # v4 . 7+

From: Francisco Jerez <currojerez@riseup.net>

The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Fixes: Fixes: 738fa1b3123f ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 10 ----------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  8 --------
 2 files changed, 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db714dc..8acab87 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -970,18 +970,8 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 						uint32_t *batch,
 						uint32_t index)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
 	uint32_t l3sqc4_flush = (0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES);
 
-	/*
-	 * WaDisableLSQCROPERFforOCL:kbl
-	 * This WA is implemented in skl_init_clock_gating() but since
-	 * this batch updates GEN8_L3SQCREG4 with default value we need to
-	 * set this bit here to retain the WA during flush.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		l3sqc4_flush |= GEN8_LQSC_RO_PERF_DIS;
-
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ab83fc2..49fa800 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1095,14 +1095,6 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
 		WA_SET_BIT_MASKED(HDC_CHICKEN0,
 				  HDC_FENCE_DEST_SLM_DISABLE);
 
-	/* GEN8_L3SQCREG4 has a dependency with WA batch so any new changes
-	 * involving this register should also be added to WA batch as required.
-	 */
-	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_E0))
-		/* WaDisableLSQCROPERFforOCL:kbl */
-		I915_WRITE(GEN8_L3SQCREG4, I915_READ(GEN8_L3SQCREG4) |
-			   GEN8_LQSC_RO_PERF_DIS);
-
 	/* WaToEnableHwFixForPushConstHWBug:kbl */
 	if (IS_KBL_REVID(dev_priv, KBL_REVID_C0, REVID_FOREVER))
 		WA_SET_BIT_MASKED(COMMON_SLICE_CHICKEN2,
-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-01-31  5:18 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-08  7:31 [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround Francisco Jerez
2017-01-08 11:57 ` kbuild test robot
2017-01-09 21:07 ` [PATCHv2] " Francisco Jerez
2017-01-11  8:17   ` [Intel-gfx] " Daniel Vetter
2017-01-11 12:07     ` Mika Kuoppala
2017-01-11 12:24       ` [Intel-gfx] " Chris Wilson
2017-01-11 12:40         ` Mika Kuoppala
2017-01-11 13:24         ` [Intel-gfx] " Daniel Vetter
2017-01-12  0:05           ` Francisco Jerez
2017-01-12 13:58             ` Mika Kuoppala
2017-01-12  0:03     ` [Intel-gfx] " Francisco Jerez
2017-01-12 10:44 [PATCH] " Mika Kuoppala
2017-01-12 10:44 ` Mika Kuoppala
2017-01-12 10:56 ` Chris Wilson
2017-01-30 13:19 FAILED: patch "[PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround." failed to apply to 4.9-stable tree gregkh
2017-01-30 20:24 ` [PATCH] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround Francisco Jerez
2017-01-31  5:11   ` Greg KH
2017-01-31  5:11     ` Greg KH
2017-01-31  5:11       ` Francisco Jerez

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.