All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
@ 2017-05-04 15:02 ` David Weinehall
  0 siblings, 0 replies; 10+ messages in thread
From: David Weinehall @ 2017-05-04 15:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Arkadiusz Hiler, Tvrtko Ursulin, stable

On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> A good default for garbage entries from the user is to follow the
> default setting of the object (i.e. the PTE). Currently they use the
> uncached entry, and now the only way to accidentally hit uncached
> performance is via explicit use of the uncached MOCS or setting the
> object to uncached. Note that these entries are currently undefined in
> the ABI and we reserve the right to change them. We originally chose
> uncached to eliminate any problem with reducing the caching level in
> future, but the object is a much better definition of the minimum
> caching level.
> 
> Fixes: 3bbaba0ceaa2 ("drm/i915: Added Programming of the MOCS")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Weinehall <david.weinehall@linux.intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: stable@vger.kernel.org

LGTM, and passes our nightly msdk test case.

Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/intel_mocs.c | 39 +++++++++++++++------------------------
>  1 file changed, 15 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
> index 92e461c68385..e7a7781ca457 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -85,10 +85,7 @@ struct drm_i915_mocs_table {
>   *
>   * Entries not part of the following tables are undefined as far as
>   * userspace is concerned and shouldn't be relied upon.  For the time
> - * being they will be implicitly initialized to the strictest caching
> - * configuration (uncached) to guarantee forwards compatibility with
> - * userspace programs written against more recent kernels providing
> - * additional MOCS entries.
> + * being they will be implicitly initialized to follow the PTE.
>   *
>   * NOTE: These tables MUST start with being uncached and the length
>   *       MUST be less than 63 as the last two registers are reserved
> @@ -249,16 +246,13 @@ int intel_mocs_init_engine(struct intel_engine_cs *engine)
>  			   table.table[index].control_value);
>  
>  	/*
> -	 * Ok, now set the unused entries to uncached. These entries
> +	 * Ok, now set the unused entries to follow the PTE. These entries
>  	 * are officially undefined and no contract for the contents
>  	 * and settings is given for these entries.
> -	 *
> -	 * Entry 0 in the table is uncached - so we are just writing
> -	 * that value to all the used entries.
>  	 */
>  	for (; index < GEN9_NUM_MOCS_ENTRIES; index++)
>  		I915_WRITE(mocs_register(engine->id, index),
> -			   table.table[0].control_value);
> +			   table.table[I915_MOCS_PTE].control_value);
>  
>  	return 0;
>  }
> @@ -295,16 +289,13 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
>  	}
>  
>  	/*
> -	 * Ok, now set the unused entries to uncached. These entries
> +	 * Ok, now set the unused entries to follow the PTE. These entries
>  	 * are officially undefined and no contract for the contents
>  	 * and settings is given for these entries.
> -	 *
> -	 * Entry 0 in the table is uncached - so we are just writing
> -	 * that value to all the used entries.
>  	 */
>  	for (; index < GEN9_NUM_MOCS_ENTRIES; index++) {
>  		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
> -		*cs++ = table->table[0].control_value;
> +		*cs++ = table->table[I915_MOCS_PTE].control_value;
>  	}
>  
>  	*cs++ = MI_NOOP;
> @@ -355,18 +346,17 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
>  	if (table->size & 0x01) {
>  		/* Odd table size - 1 left over */
>  		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(i));
> -		*cs++ = l3cc_combine(table, 2 * i, 0);
> +		*cs++ = l3cc_combine(table, 2 * i, I915_MOCS_PTE);
>  		i++;
>  	}
>  
>  	/*
> -	 * Now set the rest of the table to uncached - use entry 0 as
> -	 * this will be uncached. Leave the last pair uninitialised as
> -	 * they are reserved by the hardware.
> +	 * Now set the rest of the table to follow the PTE.
> +	 * Leave the last pair as they are reserved by the hardware.
>  	 */
>  	for (; i < GEN9_NUM_MOCS_ENTRIES / 2; i++) {
>  		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(i));
> -		*cs++ = l3cc_combine(table, 0, 0);
> +		*cs++ = l3cc_combine(table, I915_MOCS_PTE, I915_MOCS_PTE);
>  	}
>  
>  	*cs++ = MI_NOOP;
> @@ -402,17 +392,18 @@ void intel_mocs_init_l3cc_table(struct drm_i915_private *dev_priv)
>  
>  	/* Odd table size - 1 left over */
>  	if (table.size & 0x01) {
> -		I915_WRITE(GEN9_LNCFCMOCS(i), l3cc_combine(&table, 2*i, 0));
> +		I915_WRITE(GEN9_LNCFCMOCS(i),
> +			   l3cc_combine(&table, 2*i, I915_MOCS_PTE));
>  		i++;
>  	}
>  
>  	/*
> -	 * Now set the rest of the table to uncached - use entry 0 as
> -	 * this will be uncached. Leave the last pair as initialised as
> -	 * they are reserved by the hardware.
> +	 * Now set the rest of the table to follow the PTE.
> +	 * Leave the last pair as they are reserved by the hardware.
>  	 */
>  	for (; i < (GEN9_NUM_MOCS_ENTRIES / 2); i++)
> -		I915_WRITE(GEN9_LNCFCMOCS(i), l3cc_combine(&table, 0, 0));
> +		I915_WRITE(GEN9_LNCFCMOCS(i),
> +			   l3cc_combine(&table, I915_MOCS_PTE, I915_MOCS_PTE));
>  }
>  
>  /**
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
@ 2017-05-04 15:02 ` David Weinehall
  0 siblings, 0 replies; 10+ messages in thread
From: David Weinehall @ 2017-05-04 15:02 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> A good default for garbage entries from the user is to follow the
> default setting of the object (i.e. the PTE). Currently they use the
> uncached entry, and now the only way to accidentally hit uncached
> performance is via explicit use of the uncached MOCS or setting the
> object to uncached. Note that these entries are currently undefined in
> the ABI and we reserve the right to change them. We originally chose
> uncached to eliminate any problem with reducing the caching level in
> future, but the object is a much better definition of the minimum
> caching level.
> 
> Fixes: 3bbaba0ceaa2 ("drm/i915: Added Programming of the MOCS")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: David Weinehall <david.weinehall@linux.intel.com>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: stable@vger.kernel.org

LGTM, and passes our nightly msdk test case.

Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/intel_mocs.c | 39 +++++++++++++++------------------------
>  1 file changed, 15 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
> index 92e461c68385..e7a7781ca457 100644
> --- a/drivers/gpu/drm/i915/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/intel_mocs.c
> @@ -85,10 +85,7 @@ struct drm_i915_mocs_table {
>   *
>   * Entries not part of the following tables are undefined as far as
>   * userspace is concerned and shouldn't be relied upon.  For the time
> - * being they will be implicitly initialized to the strictest caching
> - * configuration (uncached) to guarantee forwards compatibility with
> - * userspace programs written against more recent kernels providing
> - * additional MOCS entries.
> + * being they will be implicitly initialized to follow the PTE.
>   *
>   * NOTE: These tables MUST start with being uncached and the length
>   *       MUST be less than 63 as the last two registers are reserved
> @@ -249,16 +246,13 @@ int intel_mocs_init_engine(struct intel_engine_cs *engine)
>  			   table.table[index].control_value);
>  
>  	/*
> -	 * Ok, now set the unused entries to uncached. These entries
> +	 * Ok, now set the unused entries to follow the PTE. These entries
>  	 * are officially undefined and no contract for the contents
>  	 * and settings is given for these entries.
> -	 *
> -	 * Entry 0 in the table is uncached - so we are just writing
> -	 * that value to all the used entries.
>  	 */
>  	for (; index < GEN9_NUM_MOCS_ENTRIES; index++)
>  		I915_WRITE(mocs_register(engine->id, index),
> -			   table.table[0].control_value);
> +			   table.table[I915_MOCS_PTE].control_value);
>  
>  	return 0;
>  }
> @@ -295,16 +289,13 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
>  	}
>  
>  	/*
> -	 * Ok, now set the unused entries to uncached. These entries
> +	 * Ok, now set the unused entries to follow the PTE. These entries
>  	 * are officially undefined and no contract for the contents
>  	 * and settings is given for these entries.
> -	 *
> -	 * Entry 0 in the table is uncached - so we are just writing
> -	 * that value to all the used entries.
>  	 */
>  	for (; index < GEN9_NUM_MOCS_ENTRIES; index++) {
>  		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
> -		*cs++ = table->table[0].control_value;
> +		*cs++ = table->table[I915_MOCS_PTE].control_value;
>  	}
>  
>  	*cs++ = MI_NOOP;
> @@ -355,18 +346,17 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
>  	if (table->size & 0x01) {
>  		/* Odd table size - 1 left over */
>  		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(i));
> -		*cs++ = l3cc_combine(table, 2 * i, 0);
> +		*cs++ = l3cc_combine(table, 2 * i, I915_MOCS_PTE);
>  		i++;
>  	}
>  
>  	/*
> -	 * Now set the rest of the table to uncached - use entry 0 as
> -	 * this will be uncached. Leave the last pair uninitialised as
> -	 * they are reserved by the hardware.
> +	 * Now set the rest of the table to follow the PTE.
> +	 * Leave the last pair as they are reserved by the hardware.
>  	 */
>  	for (; i < GEN9_NUM_MOCS_ENTRIES / 2; i++) {
>  		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(i));
> -		*cs++ = l3cc_combine(table, 0, 0);
> +		*cs++ = l3cc_combine(table, I915_MOCS_PTE, I915_MOCS_PTE);
>  	}
>  
>  	*cs++ = MI_NOOP;
> @@ -402,17 +392,18 @@ void intel_mocs_init_l3cc_table(struct drm_i915_private *dev_priv)
>  
>  	/* Odd table size - 1 left over */
>  	if (table.size & 0x01) {
> -		I915_WRITE(GEN9_LNCFCMOCS(i), l3cc_combine(&table, 2*i, 0));
> +		I915_WRITE(GEN9_LNCFCMOCS(i),
> +			   l3cc_combine(&table, 2*i, I915_MOCS_PTE));
>  		i++;
>  	}
>  
>  	/*
> -	 * Now set the rest of the table to uncached - use entry 0 as
> -	 * this will be uncached. Leave the last pair as initialised as
> -	 * they are reserved by the hardware.
> +	 * Now set the rest of the table to follow the PTE.
> +	 * Leave the last pair as they are reserved by the hardware.
>  	 */
>  	for (; i < (GEN9_NUM_MOCS_ENTRIES / 2); i++)
> -		I915_WRITE(GEN9_LNCFCMOCS(i), l3cc_combine(&table, 0, 0));
> +		I915_WRITE(GEN9_LNCFCMOCS(i),
> +			   l3cc_combine(&table, I915_MOCS_PTE, I915_MOCS_PTE));
>  }
>  
>  /**
> -- 
> 2.11.0
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to    follow PTE
  2017-05-04 15:02 ` David Weinehall
  (?)
@ 2017-05-04 17:56 ` Francisco Jerez
  2017-05-04 20:03   ` Chris Wilson
  -1 siblings, 1 reply; 10+ messages in thread
From: Francisco Jerez @ 2017-05-04 17:56 UTC (permalink / raw)
  To: David Weinehall, Chris Wilson; +Cc: intel-gfx, stable


[-- Attachment #1.1: Type: text/plain, Size: 7416 bytes --]

David Weinehall <david.weinehall@linux.intel.com> writes:

> On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
>> A good default for garbage entries from the user is to follow the
>> default setting of the object (i.e. the PTE). Currently they use the
>> uncached entry, and now the only way to accidentally hit uncached
>> performance is via explicit use of the uncached MOCS or setting the
>> object to uncached. Note that these entries are currently undefined in
>> the ABI and we reserve the right to change them. We originally chose
>> uncached to eliminate any problem with reducing the caching level in
>> future, but the object is a much better definition of the minimum
>> caching level.
>> 

NAK.  The reason for the default being UC is that it's the only setting
that guarantees full forwards compatibility with any other entry that
might be added in the future.  If you default to PTE on (e)LLC and WB on
L3, userspace will no longer be able to use any newly introduced entry
with stricter coherency guarantees than that (e.g. any L3-uncached
entry) in a backwards-compatible way.  Attempting to do so may break
memory coherency assumptions of the application and lead to misrendering
when run on older kernel versions (which to my judgment is a scarier
failure mode than reduced performance).

My other concern is that this change may make inadvertent use of
undefined MOCS entries extremely difficult to detect in some cases -- UC
gives userspace a pretty obvious (if functionally harmless) indicative
that it's got its caching settings wrong, and is a strong motivation for
userspace developers to contribute MOCS table changes to the kernel
instead of blindly making assumptions about them (e.g. that they match
the Android kernel as media-sdk was probably doing).  With this change
checked in, the performance drawback from using media-sdk on an upstream
kernel may have been subtle enough that David would never have bothered
to look into the issue.  People may have started shipping copies of
media-sdk making bogus MOCS table assumptions (with potential
correctness implications), at which point you would have to deal with
userspace regressions anytime the MOCS table is extended in the future.

>> Fixes: 3bbaba0ceaa2 ("drm/i915: Added Programming of the MOCS")
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: David Weinehall <david.weinehall@linux.intel.com>
>> Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: stable@vger.kernel.org
>
> LGTM, and passes our nightly msdk test case.
>
> Tested-by: David Weinehall <david.weinehall@linux.intel.com>
> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
>
>> ---
>>  drivers/gpu/drm/i915/intel_mocs.c | 39 +++++++++++++++------------------------
>>  1 file changed, 15 insertions(+), 24 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_mocs.c b/drivers/gpu/drm/i915/intel_mocs.c
>> index 92e461c68385..e7a7781ca457 100644
>> --- a/drivers/gpu/drm/i915/intel_mocs.c
>> +++ b/drivers/gpu/drm/i915/intel_mocs.c
>> @@ -85,10 +85,7 @@ struct drm_i915_mocs_table {
>>   *
>>   * Entries not part of the following tables are undefined as far as
>>   * userspace is concerned and shouldn't be relied upon.  For the time
>> - * being they will be implicitly initialized to the strictest caching
>> - * configuration (uncached) to guarantee forwards compatibility with
>> - * userspace programs written against more recent kernels providing
>> - * additional MOCS entries.
>> + * being they will be implicitly initialized to follow the PTE.
>>   *
>>   * NOTE: These tables MUST start with being uncached and the length
>>   *       MUST be less than 63 as the last two registers are reserved
>> @@ -249,16 +246,13 @@ int intel_mocs_init_engine(struct intel_engine_cs *engine)
>>  			   table.table[index].control_value);
>>  
>>  	/*
>> -	 * Ok, now set the unused entries to uncached. These entries
>> +	 * Ok, now set the unused entries to follow the PTE. These entries
>>  	 * are officially undefined and no contract for the contents
>>  	 * and settings is given for these entries.
>> -	 *
>> -	 * Entry 0 in the table is uncached - so we are just writing
>> -	 * that value to all the used entries.
>>  	 */
>>  	for (; index < GEN9_NUM_MOCS_ENTRIES; index++)
>>  		I915_WRITE(mocs_register(engine->id, index),
>> -			   table.table[0].control_value);
>> +			   table.table[I915_MOCS_PTE].control_value);
>>  
>>  	return 0;
>>  }
>> @@ -295,16 +289,13 @@ static int emit_mocs_control_table(struct drm_i915_gem_request *req,
>>  	}
>>  
>>  	/*
>> -	 * Ok, now set the unused entries to uncached. These entries
>> +	 * Ok, now set the unused entries to follow the PTE. These entries
>>  	 * are officially undefined and no contract for the contents
>>  	 * and settings is given for these entries.
>> -	 *
>> -	 * Entry 0 in the table is uncached - so we are just writing
>> -	 * that value to all the used entries.
>>  	 */
>>  	for (; index < GEN9_NUM_MOCS_ENTRIES; index++) {
>>  		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
>> -		*cs++ = table->table[0].control_value;
>> +		*cs++ = table->table[I915_MOCS_PTE].control_value;
>>  	}
>>  
>>  	*cs++ = MI_NOOP;
>> @@ -355,18 +346,17 @@ static int emit_mocs_l3cc_table(struct drm_i915_gem_request *req,
>>  	if (table->size & 0x01) {
>>  		/* Odd table size - 1 left over */
>>  		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(i));
>> -		*cs++ = l3cc_combine(table, 2 * i, 0);
>> +		*cs++ = l3cc_combine(table, 2 * i, I915_MOCS_PTE);
>>  		i++;
>>  	}
>>  
>>  	/*
>> -	 * Now set the rest of the table to uncached - use entry 0 as
>> -	 * this will be uncached. Leave the last pair uninitialised as
>> -	 * they are reserved by the hardware.
>> +	 * Now set the rest of the table to follow the PTE.
>> +	 * Leave the last pair as they are reserved by the hardware.
>>  	 */
>>  	for (; i < GEN9_NUM_MOCS_ENTRIES / 2; i++) {
>>  		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(i));
>> -		*cs++ = l3cc_combine(table, 0, 0);
>> +		*cs++ = l3cc_combine(table, I915_MOCS_PTE, I915_MOCS_PTE);
>>  	}
>>  
>>  	*cs++ = MI_NOOP;
>> @@ -402,17 +392,18 @@ void intel_mocs_init_l3cc_table(struct drm_i915_private *dev_priv)
>>  
>>  	/* Odd table size - 1 left over */
>>  	if (table.size & 0x01) {
>> -		I915_WRITE(GEN9_LNCFCMOCS(i), l3cc_combine(&table, 2*i, 0));
>> +		I915_WRITE(GEN9_LNCFCMOCS(i),
>> +			   l3cc_combine(&table, 2*i, I915_MOCS_PTE));
>>  		i++;
>>  	}
>>  
>>  	/*
>> -	 * Now set the rest of the table to uncached - use entry 0 as
>> -	 * this will be uncached. Leave the last pair as initialised as
>> -	 * they are reserved by the hardware.
>> +	 * Now set the rest of the table to follow the PTE.
>> +	 * Leave the last pair as they are reserved by the hardware.
>>  	 */
>>  	for (; i < (GEN9_NUM_MOCS_ENTRIES / 2); i++)
>> -		I915_WRITE(GEN9_LNCFCMOCS(i), l3cc_combine(&table, 0, 0));
>> +		I915_WRITE(GEN9_LNCFCMOCS(i),
>> +			   l3cc_combine(&table, I915_MOCS_PTE, I915_MOCS_PTE));
>>  }
>>  
>>  /**
>> -- 
>> 2.11.0
>> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
  2017-05-04 17:56 ` [Intel-gfx] " Francisco Jerez
@ 2017-05-04 20:03   ` Chris Wilson
  2017-05-04 20:59     ` Francisco Jerez
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Wilson @ 2017-05-04 20:03 UTC (permalink / raw)
  To: Francisco Jerez; +Cc: David Weinehall, intel-gfx, stable

On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
> David Weinehall <david.weinehall@linux.intel.com> writes:
> 
> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> >> A good default for garbage entries from the user is to follow the
> >> default setting of the object (i.e. the PTE). Currently they use the
> >> uncached entry, and now the only way to accidentally hit uncached
> >> performance is via explicit use of the uncached MOCS or setting the
> >> object to uncached. Note that these entries are currently undefined in
> >> the ABI and we reserve the right to change them. We originally chose
> >> uncached to eliminate any problem with reducing the caching level in
> >> future, but the object is a much better definition of the minimum
> >> caching level.
> >> 
> 
> NAK.  The reason for the default being UC is that it's the only setting
> that guarantees full forwards compatibility with any other entry that
> might be added in the future.  If you default to PTE on (e)LLC and WB on
> L3, userspace will no longer be able to use any newly introduced entry
> with stricter coherency guarantees than that (e.g. any L3-uncached
> entry) in a backwards-compatible way.  Attempting to do so may break
> memory coherency assumptions of the application and lead to misrendering
> when run on older kernel versions (which to my judgment is a scarier
> failure mode than reduced performance).

You can't use a weaker coherency model in mocs than that specified for
the object as you can't control other uses of the object (even just
memory pressure will break your assumptions).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
  2017-05-04 20:03   ` Chris Wilson
@ 2017-05-04 20:59     ` Francisco Jerez
  2017-06-28 10:19       ` Chris Wilson
  0 siblings, 1 reply; 10+ messages in thread
From: Francisco Jerez @ 2017-05-04 20:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: David Weinehall, intel-gfx, stable


[-- Attachment #1.1: Type: text/plain, Size: 2012 bytes --]

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
>> David Weinehall <david.weinehall@linux.intel.com> writes:
>> 
>> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
>> >> A good default for garbage entries from the user is to follow the
>> >> default setting of the object (i.e. the PTE). Currently they use the
>> >> uncached entry, and now the only way to accidentally hit uncached
>> >> performance is via explicit use of the uncached MOCS or setting the
>> >> object to uncached. Note that these entries are currently undefined in
>> >> the ABI and we reserve the right to change them. We originally chose
>> >> uncached to eliminate any problem with reducing the caching level in
>> >> future, but the object is a much better definition of the minimum
>> >> caching level.
>> >> 
>> 
>> NAK.  The reason for the default being UC is that it's the only setting
>> that guarantees full forwards compatibility with any other entry that
>> might be added in the future.  If you default to PTE on (e)LLC and WB on
>> L3, userspace will no longer be able to use any newly introduced entry
>> with stricter coherency guarantees than that (e.g. any L3-uncached
>> entry) in a backwards-compatible way.  Attempting to do so may break
>> memory coherency assumptions of the application and lead to misrendering
>> when run on older kernel versions (which to my judgment is a scarier
>> failure mode than reduced performance).
>
> You can't use a weaker coherency model in mocs than that specified for
> the object as you can't control other uses of the object (even just
> memory pressure will break your assumptions).

Exactly, but you can use a stronger coherency model than the application
requested, which is why falling back to UC should generally work for
unknown entries but falling back to PTE+WB isn't guaranteed to.

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
  2017-05-04 20:59     ` Francisco Jerez
@ 2017-06-28 10:19       ` Chris Wilson
  2017-06-28 21:10           ` Francisco Jerez
  2017-06-29  8:35           ` Daniel Vetter
  0 siblings, 2 replies; 10+ messages in thread
From: Chris Wilson @ 2017-06-28 10:19 UTC (permalink / raw)
  To: Francisco Jerez; +Cc: intel-gfx, stable

Quoting Francisco Jerez (2017-05-04 21:59:44)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
> >> David Weinehall <david.weinehall@linux.intel.com> writes:
> >> 
> >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> >> >> A good default for garbage entries from the user is to follow the
> >> >> default setting of the object (i.e. the PTE). Currently they use the
> >> >> uncached entry, and now the only way to accidentally hit uncached
> >> >> performance is via explicit use of the uncached MOCS or setting the
> >> >> object to uncached. Note that these entries are currently undefined in
> >> >> the ABI and we reserve the right to change them. We originally chose
> >> >> uncached to eliminate any problem with reducing the caching level in
> >> >> future, but the object is a much better definition of the minimum
> >> >> caching level.
> >> >> 
> >> 
> >> NAK.  The reason for the default being UC is that it's the only setting
> >> that guarantees full forwards compatibility with any other entry that
> >> might be added in the future.  If you default to PTE on (e)LLC and WB on
> >> L3, userspace will no longer be able to use any newly introduced entry
> >> with stricter coherency guarantees than that (e.g. any L3-uncached
> >> entry) in a backwards-compatible way.  Attempting to do so may break
> >> memory coherency assumptions of the application and lead to misrendering
> >> when run on older kernel versions (which to my judgment is a scarier
> >> failure mode than reduced performance).
> >
> > You can't use a weaker coherency model in mocs than that specified for
> > the object as you can't control other uses of the object (even just
> > memory pressure will break your assumptions).
> 
> Exactly, but you can use a stronger coherency model than the application
> requested, which is why falling back to UC should generally work for
> unknown entries but falling back to PTE+WB isn't guaranteed to.

Still wrong. GEM will write into the CPU cache believing the object is
coherent. The GPU will read from memory bypassing the CPU cache
following the UC mocs. The only safe option is for it to follow PTE.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
  2017-06-28 10:19       ` Chris Wilson
@ 2017-06-28 21:10           ` Francisco Jerez
  2017-06-29  8:35           ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Francisco Jerez @ 2017-06-28 21:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: David Weinehall, intel-gfx, stable


[-- Attachment #1.1: Type: text/plain, Size: 2896 bytes --]

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2017-05-04 21:59:44)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
>> >> David Weinehall <david.weinehall@linux.intel.com> writes:
>> >> 
>> >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
>> >> >> A good default for garbage entries from the user is to follow the
>> >> >> default setting of the object (i.e. the PTE). Currently they use the
>> >> >> uncached entry, and now the only way to accidentally hit uncached
>> >> >> performance is via explicit use of the uncached MOCS or setting the
>> >> >> object to uncached. Note that these entries are currently undefined in
>> >> >> the ABI and we reserve the right to change them. We originally chose
>> >> >> uncached to eliminate any problem with reducing the caching level in
>> >> >> future, but the object is a much better definition of the minimum
>> >> >> caching level.
>> >> >> 
>> >> 
>> >> NAK.  The reason for the default being UC is that it's the only setting
>> >> that guarantees full forwards compatibility with any other entry that
>> >> might be added in the future.  If you default to PTE on (e)LLC and WB on
>> >> L3, userspace will no longer be able to use any newly introduced entry
>> >> with stricter coherency guarantees than that (e.g. any L3-uncached
>> >> entry) in a backwards-compatible way.  Attempting to do so may break
>> >> memory coherency assumptions of the application and lead to misrendering
>> >> when run on older kernel versions (which to my judgment is a scarier
>> >> failure mode than reduced performance).
>> >
>> > You can't use a weaker coherency model in mocs than that specified for
>> > the object as you can't control other uses of the object (even just
>> > memory pressure will break your assumptions).
>> 
>> Exactly, but you can use a stronger coherency model than the application
>> requested, which is why falling back to UC should generally work for
>> unknown entries but falling back to PTE+WB isn't guaranteed to.
>
> Still wrong. GEM will write into the CPU cache believing the object is
> coherent. The GPU will read from memory bypassing the CPU cache
> following the UC mocs.

I agree that this is a plausible scenario.

> The only safe option is for it to follow PTE.

Except you don't know whether the client reading or writing at the other
end is the CPU, or whether the client at the other end is (set up to be)
LLC-coherent.  There's likely no 100% safe option on the LLC side of
things.

I could probably be convinced that in a number of scenarios PTE on LLC
has somewhat better chances of success, but on the L3 side of things
this patch enables WB which is AFAIA strictly more weakly coherent than
UC, so it still gets my NAK.

> -Chris


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
@ 2017-06-28 21:10           ` Francisco Jerez
  0 siblings, 0 replies; 10+ messages in thread
From: Francisco Jerez @ 2017-06-28 21:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable


[-- Attachment #1.1.1: Type: text/plain, Size: 2896 bytes --]

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2017-05-04 21:59:44)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
>> >> David Weinehall <david.weinehall@linux.intel.com> writes:
>> >> 
>> >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
>> >> >> A good default for garbage entries from the user is to follow the
>> >> >> default setting of the object (i.e. the PTE). Currently they use the
>> >> >> uncached entry, and now the only way to accidentally hit uncached
>> >> >> performance is via explicit use of the uncached MOCS or setting the
>> >> >> object to uncached. Note that these entries are currently undefined in
>> >> >> the ABI and we reserve the right to change them. We originally chose
>> >> >> uncached to eliminate any problem with reducing the caching level in
>> >> >> future, but the object is a much better definition of the minimum
>> >> >> caching level.
>> >> >> 
>> >> 
>> >> NAK.  The reason for the default being UC is that it's the only setting
>> >> that guarantees full forwards compatibility with any other entry that
>> >> might be added in the future.  If you default to PTE on (e)LLC and WB on
>> >> L3, userspace will no longer be able to use any newly introduced entry
>> >> with stricter coherency guarantees than that (e.g. any L3-uncached
>> >> entry) in a backwards-compatible way.  Attempting to do so may break
>> >> memory coherency assumptions of the application and lead to misrendering
>> >> when run on older kernel versions (which to my judgment is a scarier
>> >> failure mode than reduced performance).
>> >
>> > You can't use a weaker coherency model in mocs than that specified for
>> > the object as you can't control other uses of the object (even just
>> > memory pressure will break your assumptions).
>> 
>> Exactly, but you can use a stronger coherency model than the application
>> requested, which is why falling back to UC should generally work for
>> unknown entries but falling back to PTE+WB isn't guaranteed to.
>
> Still wrong. GEM will write into the CPU cache believing the object is
> coherent. The GPU will read from memory bypassing the CPU cache
> following the UC mocs.

I agree that this is a plausible scenario.

> The only safe option is for it to follow PTE.

Except you don't know whether the client reading or writing at the other
end is the CPU, or whether the client at the other end is (set up to be)
LLC-coherent.  There's likely no 100% safe option on the LLC side of
things.

I could probably be convinced that in a number of scenarios PTE on LLC
has somewhat better chances of success, but on the L3 side of things
this patch enables WB which is AFAIA strictly more weakly coherent than
UC, so it still gets my NAK.

> -Chris


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 212 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
  2017-06-28 10:19       ` Chris Wilson
@ 2017-06-29  8:35           ` Daniel Vetter
  2017-06-29  8:35           ` Daniel Vetter
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Vetter @ 2017-06-29  8:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Francisco Jerez, intel-gfx, stable

On Wed, Jun 28, 2017 at 11:19:21AM +0100, Chris Wilson wrote:
> Quoting Francisco Jerez (2017-05-04 21:59:44)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> > > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
> > >> David Weinehall <david.weinehall@linux.intel.com> writes:
> > >> 
> > >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> > >> >> A good default for garbage entries from the user is to follow the
> > >> >> default setting of the object (i.e. the PTE). Currently they use the
> > >> >> uncached entry, and now the only way to accidentally hit uncached
> > >> >> performance is via explicit use of the uncached MOCS or setting the
> > >> >> object to uncached. Note that these entries are currently undefined in
> > >> >> the ABI and we reserve the right to change them. We originally chose
> > >> >> uncached to eliminate any problem with reducing the caching level in
> > >> >> future, but the object is a much better definition of the minimum
> > >> >> caching level.
> > >> >> 
> > >> 
> > >> NAK.  The reason for the default being UC is that it's the only setting
> > >> that guarantees full forwards compatibility with any other entry that
> > >> might be added in the future.  If you default to PTE on (e)LLC and WB on
> > >> L3, userspace will no longer be able to use any newly introduced entry
> > >> with stricter coherency guarantees than that (e.g. any L3-uncached
> > >> entry) in a backwards-compatible way.  Attempting to do so may break
> > >> memory coherency assumptions of the application and lead to misrendering
> > >> when run on older kernel versions (which to my judgment is a scarier
> > >> failure mode than reduced performance).

Tbh, if you use a MOCS entry that's not set that's a userspace bug and
needs to be fixed there. I don't think we should frob the settings of the
unused entries. The rough plan is to have a GETPARAM to give you the
version of the MOCS table, with the idea we'll only add new entries (on
any giving pci id) at the end (and hope it'll never get full, but that
seems to be the case looking at the bsepc recommended mocs entries). If
userspace wants to use more than what's there at power on (or when we
declare the i915.ko support stable and enable it by default), then it must
consult the mocs table. Imo mocs entries shouldn't change once we've made
that slot part of the uapi, that's just too much headaches.

media-sdk userspace is in the process of getting fixed to work like that.

Still a reason not to move forward with this patch, but a different one
:-)

> > > You can't use a weaker coherency model in mocs than that specified for
> > > the object as you can't control other uses of the object (even just
> > > memory pressure will break your assumptions).
> > 
> > Exactly, but you can use a stronger coherency model than the application
> > requested, which is why falling back to UC should generally work for
> > unknown entries but falling back to PTE+WB isn't guaranteed to.
> 
> Still wrong. GEM will write into the CPU cache believing the object is
> coherent. The GPU will read from memory bypassing the CPU cache
> following the UC mocs. The only safe option is for it to follow PTE.

Userspace is already allowed to shoot it's feet off with corrupted
cachelines when e.g. not calling set_domain when it should. Same applies
if it plays games on the gpu side.

The only thing we need to ensure from the kernel pov is that gpu caches
are fully flushed (either to memory or cpu coherent caches) when the batch
is over, to make sure the gpu doesn't scribble  over memory after we've
swapped it out. We might swap out garbage, but that's not the kernel's
problem.

Or do I miss something, and userspace can (ab)use this to pull the kernel over
the table?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE
@ 2017-06-29  8:35           ` Daniel Vetter
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Vetter @ 2017-06-29  8:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

On Wed, Jun 28, 2017 at 11:19:21AM +0100, Chris Wilson wrote:
> Quoting Francisco Jerez (2017-05-04 21:59:44)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> > > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote:
> > >> David Weinehall <david.weinehall@linux.intel.com> writes:
> > >> 
> > >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote:
> > >> >> A good default for garbage entries from the user is to follow the
> > >> >> default setting of the object (i.e. the PTE). Currently they use the
> > >> >> uncached entry, and now the only way to accidentally hit uncached
> > >> >> performance is via explicit use of the uncached MOCS or setting the
> > >> >> object to uncached. Note that these entries are currently undefined in
> > >> >> the ABI and we reserve the right to change them. We originally chose
> > >> >> uncached to eliminate any problem with reducing the caching level in
> > >> >> future, but the object is a much better definition of the minimum
> > >> >> caching level.
> > >> >> 
> > >> 
> > >> NAK.  The reason for the default being UC is that it's the only setting
> > >> that guarantees full forwards compatibility with any other entry that
> > >> might be added in the future.  If you default to PTE on (e)LLC and WB on
> > >> L3, userspace will no longer be able to use any newly introduced entry
> > >> with stricter coherency guarantees than that (e.g. any L3-uncached
> > >> entry) in a backwards-compatible way.  Attempting to do so may break
> > >> memory coherency assumptions of the application and lead to misrendering
> > >> when run on older kernel versions (which to my judgment is a scarier
> > >> failure mode than reduced performance).

Tbh, if you use a MOCS entry that's not set that's a userspace bug and
needs to be fixed there. I don't think we should frob the settings of the
unused entries. The rough plan is to have a GETPARAM to give you the
version of the MOCS table, with the idea we'll only add new entries (on
any giving pci id) at the end (and hope it'll never get full, but that
seems to be the case looking at the bsepc recommended mocs entries). If
userspace wants to use more than what's there at power on (or when we
declare the i915.ko support stable and enable it by default), then it must
consult the mocs table. Imo mocs entries shouldn't change once we've made
that slot part of the uapi, that's just too much headaches.

media-sdk userspace is in the process of getting fixed to work like that.

Still a reason not to move forward with this patch, but a different one
:-)

> > > You can't use a weaker coherency model in mocs than that specified for
> > > the object as you can't control other uses of the object (even just
> > > memory pressure will break your assumptions).
> > 
> > Exactly, but you can use a stronger coherency model than the application
> > requested, which is why falling back to UC should generally work for
> > unknown entries but falling back to PTE+WB isn't guaranteed to.
> 
> Still wrong. GEM will write into the CPU cache believing the object is
> coherent. The GPU will read from memory bypassing the CPU cache
> following the UC mocs. The only safe option is for it to follow PTE.

Userspace is already allowed to shoot it's feet off with corrupted
cachelines when e.g. not calling set_domain when it should. Same applies
if it plays games on the gpu side.

The only thing we need to ensure from the kernel pov is that gpu caches
are fully flushed (either to memory or cpu coherent caches) when the batch
is over, to make sure the gpu doesn't scribble  over memory after we've
swapped it out. We might swap out garbage, but that's not the kernel's
problem.

Or do I miss something, and userspace can (ab)use this to pull the kernel over
the table?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-06-29  8:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04 15:02 [PATCH] drm/i915: Set all undefined MOCS entries to follow PTE David Weinehall
2017-05-04 15:02 ` David Weinehall
2017-05-04 17:56 ` [Intel-gfx] " Francisco Jerez
2017-05-04 20:03   ` Chris Wilson
2017-05-04 20:59     ` Francisco Jerez
2017-06-28 10:19       ` Chris Wilson
2017-06-28 21:10         ` [Intel-gfx] " Francisco Jerez
2017-06-28 21:10           ` Francisco Jerez
2017-06-29  8:35         ` [Intel-gfx] " Daniel Vetter
2017-06-29  8:35           ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.