intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
@ 2023-02-08 10:51 Andrzej Hajda
  2023-02-08 11:03 ` Matthew Auld
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-08 10:51 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Andrzej Hajda, Rodrigo Vivi,
	chris.p.wilson

Write-combining memory allows speculative reads by CPU.
ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
to prefetch memory beyond the error_capture, ie it tries
to read memory pointed by next PTE in GGTT.
If this PTE points to invalid address DMAR errors will occur.
This behaviour was observed on ADL, RPL, DG2 platforms.
To avoid it, guard scratch page should be added after error_capture.
The patch fixes the most annoying issue with error capture but
since WC reads are used also in other places there is a risk similar
problem can affect them as well.

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
---
This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patch vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patch fixes all common PTE read errors.

In previous version there were different numbers due to less exact grepping,
"grep DMAR" catched write errors and "DMAR: DRHD: handling fault status reg"
lines, anyway the actual number of errors is much bigger - DMAR errors
are rate-limited.

[1]: http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt

Changelog:
v2:
    - modified commit message (I hope the diagnosis is correct),
    - added bug checks to ensure scratch is initialized on gen3 platforms.
      CI produces strange stacktrace for it suggesting scratch[0] is NULL,
      to be removed after resolving the issue with gen3 platforms.
v3:
    - removed bug checks, replaced with gen check.
v4:
    - change code for scratch page insertion to support all platforms,
    - add info in commit message there could be more similar issues

Regards
Andrzej
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 842e69c7b21e49..6566d2066f1f8b 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
 	mutex_destroy(&ggtt->error_mutex);
 }
 
+static void
+ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
+{
+	struct i915_address_space *vm = &ggtt->vm;
+
+	if (GRAPHICS_VER(ggtt->vm.i915) < 8)
+		return vm->clear_range(vm, offset, length);
+	/* clear_range since gen8 is nop */
+	while (length > 0) {
+		vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
+		offset += I915_GTT_PAGE_SIZE;
+		length -= I915_GTT_PAGE_SIZE;
+	}
+}
+
 static int init_ggtt(struct i915_ggtt *ggtt)
 {
 	/*
@@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 		 * paths, and we trust that 0 will remain reserved. However,
 		 * the only likely reason for failure to insert is a driver
 		 * bug, which we expect to cause other failures...
+		 *
+		 * Since CPU can perform speculative reads on error capture
+		 * (write-combining allows it) add scratch page after error
+		 * capture to avoid DMAR errors.
 		 */
-		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
+		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
 		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
 		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
 			drm_mm_insert_node_in_range(&ggtt->vm.mm,
@@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
 						    0, ggtt->mappable_end,
 						    DRM_MM_INSERT_LOW);
 	}
-	if (drm_mm_node_allocated(&ggtt->error_capture))
+	if (drm_mm_node_allocated(&ggtt->error_capture)) {
+		u64 start = ggtt->error_capture.start;
+		u64 size = ggtt->error_capture.size;
+
+		ggtt_insert_scratch_pages(ggtt, start, size);
 		drm_dbg(&ggtt->vm.i915->drm,
 			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
-			ggtt->error_capture.start,
-			ggtt->error_capture.start + ggtt->error_capture.size);
+			start, start + size);
+	}
 
 	/*
 	 * The upper portion of the GuC address space has a sizeable hole
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
@ 2023-02-08 11:03 ` Matthew Auld
  2023-02-08 11:17   ` Andrzej Hajda
  2023-02-08 18:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev5) Patchwork
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Matthew Auld @ 2023-02-08 11:03 UTC (permalink / raw)
  To: Andrzej Hajda, intel-gfx
  Cc: Thomas Hellström, Rodrigo Vivi, chris.p.wilson

On 08/02/2023 10:51, Andrzej Hajda wrote:
> Write-combining memory allows speculative reads by CPU.
> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
> to prefetch memory beyond the error_capture, ie it tries
> to read memory pointed by next PTE in GGTT.
> If this PTE points to invalid address DMAR errors will occur.
> This behaviour was observed on ADL, RPL, DG2 platforms.

Note that DG2 doesn't use this path for error capture, since it lacks 
mappable aperture. Do you know if CI sees any DMAR errors related to 
error capture on DG2/DG1?

> To avoid it, guard scratch page should be added after error_capture.
> The patch fixes the most annoying issue with error capture but
> since WC reads are used also in other places there is a risk similar
> problem can affect them as well.
> 
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
> This patch tries to diminish plague of DMAR read errors present
> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
> CI is usually tolerant for these errors, so the scale of the problem
> is not really visible.
> To show it I have counted lines containing DMAR read errors in dmesgs
> produced by CI for all three versions of the patch, but in contrast to v2
> I have grepped only for lines containing "PTE Read access".
> Below stats for kernel w/o patch vs patched one.
> v1: 210 vs 0
> v2: 201 vs 0
> v3: 214 vs 0
> Apparently the patch fixes all common PTE read errors.
> 
> In previous version there were different numbers due to less exact grepping,
> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault status reg"
> lines, anyway the actual number of errors is much bigger - DMAR errors
> are rate-limited.
> 
> [1]: http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
> 
> Changelog:
> v2:
>      - modified commit message (I hope the diagnosis is correct),
>      - added bug checks to ensure scratch is initialized on gen3 platforms.
>        CI produces strange stacktrace for it suggesting scratch[0] is NULL,
>        to be removed after resolving the issue with gen3 platforms.
> v3:
>      - removed bug checks, replaced with gen check.
> v4:
>      - change code for scratch page insertion to support all platforms,
>      - add info in commit message there could be more similar issues
> 
> Regards
> Andrzej
> ---
>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>   1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 842e69c7b21e49..6566d2066f1f8b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
>   	mutex_destroy(&ggtt->error_mutex);
>   }
>   
> +static void
> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
> +{
> +	struct i915_address_space *vm = &ggtt->vm;
> +
> +	if (GRAPHICS_VER(ggtt->vm.i915) < 8)
> +		return vm->clear_range(vm, offset, length);
> +	/* clear_range since gen8 is nop */
> +	while (length > 0) {
> +		vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
> +		offset += I915_GTT_PAGE_SIZE;
> +		length -= I915_GTT_PAGE_SIZE;
> +	}
> +}
> +
>   static int init_ggtt(struct i915_ggtt *ggtt)
>   {
>   	/*
> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   		 * paths, and we trust that 0 will remain reserved. However,
>   		 * the only likely reason for failure to insert is a driver
>   		 * bug, which we expect to cause other failures...
> +		 *
> +		 * Since CPU can perform speculative reads on error capture
> +		 * (write-combining allows it) add scratch page after error
> +		 * capture to avoid DMAR errors.
>   		 */
> -		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
> +		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>   		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>   		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>   			drm_mm_insert_node_in_range(&ggtt->vm.mm,
> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   						    0, ggtt->mappable_end,
>   						    DRM_MM_INSERT_LOW);
>   	}
> -	if (drm_mm_node_allocated(&ggtt->error_capture))
> +	if (drm_mm_node_allocated(&ggtt->error_capture)) {
> +		u64 start = ggtt->error_capture.start;
> +		u64 size = ggtt->error_capture.size;
> +
> +		ggtt_insert_scratch_pages(ggtt, start, size);
>   		drm_dbg(&ggtt->vm.i915->drm,
>   			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
> -			ggtt->error_capture.start,
> -			ggtt->error_capture.start + ggtt->error_capture.size);
> +			start, start + size);
> +	}
>   
>   	/*
>   	 * The upper portion of the GuC address space has a sizeable hole

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-08 11:03 ` Matthew Auld
@ 2023-02-08 11:17   ` Andrzej Hajda
  2023-02-08 11:29     ` Andrzej Hajda
  0 siblings, 1 reply; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-08 11:17 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Thomas Hellström, Rodrigo Vivi, chris.p.wilson



On 08.02.2023 12:03, Matthew Auld wrote:
> On 08/02/2023 10:51, Andrzej Hajda wrote:
>> Write-combining memory allows speculative reads by CPU.
>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>> to prefetch memory beyond the error_capture, ie it tries
>> to read memory pointed by next PTE in GGTT.
>> If this PTE points to invalid address DMAR errors will occur.
>> This behaviour was observed on ADL, RPL, DG2 platforms.
>
> Note that DG2 doesn't use this path for error capture, since it lacks 
> mappable aperture. Do you know if CI sees any DMAR errors related to 
> error capture on DG2/DG1?

I have not tested personally but CI confirms it [1] (grep for DMAR:), 
but only on bat-dg2-11.
I am not sure why only this one, but my patch silences DMAR errors on it.
[1]: 
http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12680/bat-dg2-11/dmesg0.txt

Regards
Andrzej


>
>> To avoid it, guard scratch page should be added after error_capture.
>> The patch fixes the most annoying issue with error capture but
>> since WC reads are used also in other places there is a risk similar
>> problem can affect them as well.
>>
>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>> ---
>> This patch tries to diminish plague of DMAR read errors present
>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>> CI is usually tolerant for these errors, so the scale of the problem
>> is not really visible.
>> To show it I have counted lines containing DMAR read errors in dmesgs
>> produced by CI for all three versions of the patch, but in contrast 
>> to v2
>> I have grepped only for lines containing "PTE Read access".
>> Below stats for kernel w/o patch vs patched one.
>> v1: 210 vs 0
>> v2: 201 vs 0
>> v3: 214 vs 0
>> Apparently the patch fixes all common PTE read errors.
>>
>> In previous version there were different numbers due to less exact 
>> grepping,
>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault 
>> status reg"
>> lines, anyway the actual number of errors is much bigger - DMAR errors
>> are rate-limited.
>>
>> [1]: 
>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>
>> Changelog:
>> v2:
>>      - modified commit message (I hope the diagnosis is correct),
>>      - added bug checks to ensure scratch is initialized on gen3 
>> platforms.
>>        CI produces strange stacktrace for it suggesting scratch[0] is 
>> NULL,
>>        to be removed after resolving the issue with gen3 platforms.
>> v3:
>>      - removed bug checks, replaced with gen check.
>> v4:
>>      - change code for scratch page insertion to support all platforms,
>>      - add info in commit message there could be more similar issues
>>
>> Regards
>> Andrzej
>> ---
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index 842e69c7b21e49..6566d2066f1f8b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt 
>> *ggtt)
>>       mutex_destroy(&ggtt->error_mutex);
>>   }
>>   +static void
>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 
>> length)
>> +{
>> +    struct i915_address_space *vm = &ggtt->vm;
>> +
>> +    if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>> +        return vm->clear_range(vm, offset, length);
>> +    /* clear_range since gen8 is nop */
>> +    while (length > 0) {
>> +        vm->insert_page(vm, px_dma(vm->scratch[0]), offset, 
>> I915_CACHE_NONE, 0);
>> +        offset += I915_GTT_PAGE_SIZE;
>> +        length -= I915_GTT_PAGE_SIZE;
>> +    }
>> +}
>> +
>>   static int init_ggtt(struct i915_ggtt *ggtt)
>>   {
>>       /*
>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>            * paths, and we trust that 0 will remain reserved. However,
>>            * the only likely reason for failure to insert is a driver
>>            * bug, which we expect to cause other failures...
>> +         *
>> +         * Since CPU can perform speculative reads on error capture
>> +         * (write-combining allows it) add scratch page after error
>> +         * capture to avoid DMAR errors.
>>            */
>> -        ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>> +        ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>           ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>           if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>               drm_mm_insert_node_in_range(&ggtt->vm.mm,
>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>                               0, ggtt->mappable_end,
>>                               DRM_MM_INSERT_LOW);
>>       }
>> -    if (drm_mm_node_allocated(&ggtt->error_capture))
>> +    if (drm_mm_node_allocated(&ggtt->error_capture)) {
>> +        u64 start = ggtt->error_capture.start;
>> +        u64 size = ggtt->error_capture.size;
>> +
>> +        ggtt_insert_scratch_pages(ggtt, start, size);
>>           drm_dbg(&ggtt->vm.i915->drm,
>>               "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>> -            ggtt->error_capture.start,
>> -            ggtt->error_capture.start + ggtt->error_capture.size);
>> +            start, start + size);
>> +    }
>>         /*
>>        * The upper portion of the GuC address space has a sizeable hole


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-08 11:17   ` Andrzej Hajda
@ 2023-02-08 11:29     ` Andrzej Hajda
  2023-02-08 11:35       ` Matthew Auld
  0 siblings, 1 reply; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-08 11:29 UTC (permalink / raw)
  To: Matthew Auld, intel-gfx
  Cc: Thomas Hellström, Rodrigo Vivi, chris.p.wilson



On 08.02.2023 12:17, Andrzej Hajda wrote:
>
>
> On 08.02.2023 12:03, Matthew Auld wrote:
>> On 08/02/2023 10:51, Andrzej Hajda wrote:
>>> Write-combining memory allows speculative reads by CPU.
>>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>>> to prefetch memory beyond the error_capture, ie it tries
>>> to read memory pointed by next PTE in GGTT.
>>> If this PTE points to invalid address DMAR errors will occur.
>>> This behaviour was observed on ADL, RPL, DG2 platforms.
>>
>> Note that DG2 doesn't use this path for error capture, since it lacks 
>> mappable aperture. Do you know if CI sees any DMAR errors related to 
>> error capture on DG2/DG1?
>
> I have not tested personally but CI confirms it [1] (grep for DMAR:), 
> but only on bat-dg2-11.
> I am not sure why only this one, but my patch silences DMAR errors on it.
> [1]: 
> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12680/bat-dg2-11/dmesg0.txt

OK, bat-dg2-11 has DG2 and ADL-P :)

>
> Regards
> Andrzej
>
>
>>
>>> To avoid it, guard scratch page should be added after error_capture.
>>> The patch fixes the most annoying issue with error capture but
>>> since WC reads are used also in other places there is a risk similar
>>> problem can affect them as well.
>>>
>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> ---
>>> This patch tries to diminish plague of DMAR read errors present
>>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>>> CI is usually tolerant for these errors, so the scale of the problem
>>> is not really visible.
>>> To show it I have counted lines containing DMAR read errors in dmesgs
>>> produced by CI for all three versions of the patch, but in contrast 
>>> to v2
>>> I have grepped only for lines containing "PTE Read access".
>>> Below stats for kernel w/o patch vs patched one.
>>> v1: 210 vs 0
>>> v2: 201 vs 0
>>> v3: 214 vs 0
>>> Apparently the patch fixes all common PTE read errors.
>>>
>>> In previous version there were different numbers due to less exact 
>>> grepping,
>>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault 
>>> status reg"
>>> lines, anyway the actual number of errors is much bigger - DMAR errors
>>> are rate-limited.
>>>
>>> [1]: 
>>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>>
>>> Changelog:
>>> v2:
>>>      - modified commit message (I hope the diagnosis is correct),
>>>      - added bug checks to ensure scratch is initialized on gen3 
>>> platforms.
>>>        CI produces strange stacktrace for it suggesting scratch[0] 
>>> is NULL,
>>>        to be removed after resolving the issue with gen3 platforms.
>>> v3:
>>>      - removed bug checks, replaced with gen check.
>>> v4:
>>>      - change code for scratch page insertion to support all platforms,
>>>      - add info in commit message there could be more similar issues
>>>
>>> Regards
>>> Andrzej
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 
>>> ++++++++++++++++++++++++----
>>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> index 842e69c7b21e49..6566d2066f1f8b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt 
>>> *ggtt)
>>>       mutex_destroy(&ggtt->error_mutex);
>>>   }
>>>   +static void
>>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 
>>> length)
>>> +{
>>> +    struct i915_address_space *vm = &ggtt->vm;
>>> +
>>> +    if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>>> +        return vm->clear_range(vm, offset, length);
>>> +    /* clear_range since gen8 is nop */
>>> +    while (length > 0) {
>>> +        vm->insert_page(vm, px_dma(vm->scratch[0]), offset, 
>>> I915_CACHE_NONE, 0);
>>> +        offset += I915_GTT_PAGE_SIZE;
>>> +        length -= I915_GTT_PAGE_SIZE;
>>> +    }
>>> +}
>>> +
>>>   static int init_ggtt(struct i915_ggtt *ggtt)
>>>   {
>>>       /*
>>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>            * paths, and we trust that 0 will remain reserved. However,
>>>            * the only likely reason for failure to insert is a driver
>>>            * bug, which we expect to cause other failures...
>>> +         *
>>> +         * Since CPU can perform speculative reads on error capture
>>> +         * (write-combining allows it) add scratch page after error
>>> +         * capture to avoid DMAR errors.
>>>            */
>>> -        ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>>> +        ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>>           ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>>           if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>>               drm_mm_insert_node_in_range(&ggtt->vm.mm,
>>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>                               0, ggtt->mappable_end,
>>>                               DRM_MM_INSERT_LOW);
>>>       }
>>> -    if (drm_mm_node_allocated(&ggtt->error_capture))
>>> +    if (drm_mm_node_allocated(&ggtt->error_capture)) {
>>> +        u64 start = ggtt->error_capture.start;
>>> +        u64 size = ggtt->error_capture.size;
>>> +
>>> +        ggtt_insert_scratch_pages(ggtt, start, size);
>>>           drm_dbg(&ggtt->vm.i915->drm,
>>>               "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>>> -            ggtt->error_capture.start,
>>> -            ggtt->error_capture.start + ggtt->error_capture.size);
>>> +            start, start + size);
>>> +    }
>>>         /*
>>>        * The upper portion of the GuC address space has a sizeable hole
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-08 11:29     ` Andrzej Hajda
@ 2023-02-08 11:35       ` Matthew Auld
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Auld @ 2023-02-08 11:35 UTC (permalink / raw)
  To: Andrzej Hajda, intel-gfx
  Cc: Thomas Hellström, Rodrigo Vivi, chris.p.wilson

On 08/02/2023 11:29, Andrzej Hajda wrote:
> 
> 
> On 08.02.2023 12:17, Andrzej Hajda wrote:
>>
>>
>> On 08.02.2023 12:03, Matthew Auld wrote:
>>> On 08/02/2023 10:51, Andrzej Hajda wrote:
>>>> Write-combining memory allows speculative reads by CPU.
>>>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>>>> to prefetch memory beyond the error_capture, ie it tries
>>>> to read memory pointed by next PTE in GGTT.
>>>> If this PTE points to invalid address DMAR errors will occur.
>>>> This behaviour was observed on ADL, RPL, DG2 platforms.
>>>
>>> Note that DG2 doesn't use this path for error capture, since it lacks 
>>> mappable aperture. Do you know if CI sees any DMAR errors related to 
>>> error capture on DG2/DG1?
>>
>> I have not tested personally but CI confirms it [1] (grep for DMAR:), 
>> but only on bat-dg2-11.
>> I am not sure why only this one, but my patch silences DMAR errors on it.
>> [1]: 
>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12680/bat-dg2-11/dmesg0.txt
> 
> OK, bat-dg2-11 has DG2 and ADL-P :)

Yeah, it looks like it's running all the tests twice, once for DG2 and 
then again on the ADL, for that CI host. And DMAR errors only appear on 
ADL. I totally didn't realise that CI was doing that. Also seems 
potentially wasteful, since we already have dedicated ADL hosts...

> 
>>
>> Regards
>> Andrzej
>>
>>
>>>
>>>> To avoid it, guard scratch page should be added after error_capture.
>>>> The patch fixes the most annoying issue with error capture but
>>>> since WC reads are used also in other places there is a risk similar
>>>> problem can affect them as well.
>>>>
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>> ---
>>>> This patch tries to diminish plague of DMAR read errors present
>>>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>>>> CI is usually tolerant for these errors, so the scale of the problem
>>>> is not really visible.
>>>> To show it I have counted lines containing DMAR read errors in dmesgs
>>>> produced by CI for all three versions of the patch, but in contrast 
>>>> to v2
>>>> I have grepped only for lines containing "PTE Read access".
>>>> Below stats for kernel w/o patch vs patched one.
>>>> v1: 210 vs 0
>>>> v2: 201 vs 0
>>>> v3: 214 vs 0
>>>> Apparently the patch fixes all common PTE read errors.
>>>>
>>>> In previous version there were different numbers due to less exact 
>>>> grepping,
>>>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault 
>>>> status reg"
>>>> lines, anyway the actual number of errors is much bigger - DMAR errors
>>>> are rate-limited.
>>>>
>>>> [1]: 
>>>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>>>
>>>> Changelog:
>>>> v2:
>>>>      - modified commit message (I hope the diagnosis is correct),
>>>>      - added bug checks to ensure scratch is initialized on gen3 
>>>> platforms.
>>>>        CI produces strange stacktrace for it suggesting scratch[0] 
>>>> is NULL,
>>>>        to be removed after resolving the issue with gen3 platforms.
>>>> v3:
>>>>      - removed bug checks, replaced with gen check.
>>>> v4:
>>>>      - change code for scratch page insertion to support all platforms,
>>>>      - add info in commit message there could be more similar issues
>>>>
>>>> Regards
>>>> Andrzej
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 
>>>> ++++++++++++++++++++++++----
>>>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>> index 842e69c7b21e49..6566d2066f1f8b 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt 
>>>> *ggtt)
>>>>       mutex_destroy(&ggtt->error_mutex);
>>>>   }
>>>>   +static void
>>>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 
>>>> length)
>>>> +{
>>>> +    struct i915_address_space *vm = &ggtt->vm;
>>>> +
>>>> +    if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>>>> +        return vm->clear_range(vm, offset, length);
>>>> +    /* clear_range since gen8 is nop */
>>>> +    while (length > 0) {
>>>> +        vm->insert_page(vm, px_dma(vm->scratch[0]), offset, 
>>>> I915_CACHE_NONE, 0);
>>>> +        offset += I915_GTT_PAGE_SIZE;
>>>> +        length -= I915_GTT_PAGE_SIZE;
>>>> +    }
>>>> +}
>>>> +
>>>>   static int init_ggtt(struct i915_ggtt *ggtt)
>>>>   {
>>>>       /*
>>>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>>            * paths, and we trust that 0 will remain reserved. However,
>>>>            * the only likely reason for failure to insert is a driver
>>>>            * bug, which we expect to cause other failures...
>>>> +         *
>>>> +         * Since CPU can perform speculative reads on error capture
>>>> +         * (write-combining allows it) add scratch page after error
>>>> +         * capture to avoid DMAR errors.
>>>>            */
>>>> -        ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>>>> +        ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>>>           ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>>>           if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>>>               drm_mm_insert_node_in_range(&ggtt->vm.mm,
>>>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>>                               0, ggtt->mappable_end,
>>>>                               DRM_MM_INSERT_LOW);
>>>>       }
>>>> -    if (drm_mm_node_allocated(&ggtt->error_capture))
>>>> +    if (drm_mm_node_allocated(&ggtt->error_capture)) {
>>>> +        u64 start = ggtt->error_capture.start;
>>>> +        u64 size = ggtt->error_capture.size;
>>>> +
>>>> +        ggtt_insert_scratch_pages(ggtt, start, size);
>>>>           drm_dbg(&ggtt->vm.i915->drm,
>>>>               "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>>>> -            ggtt->error_capture.start,
>>>> -            ggtt->error_capture.start + ggtt->error_capture.size);
>>>> +            start, start + size);
>>>> +    }
>>>>         /*
>>>>        * The upper portion of the GuC address space has a sizeable hole
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev5)
  2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
  2023-02-08 11:03 ` Matthew Auld
@ 2023-02-08 18:39 ` Patchwork
  2023-02-22 11:45   ` Andrzej Hajda
  2023-02-08 19:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev6) Patchwork
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Patchwork @ 2023-02-08 18:39 UTC (permalink / raw)
  To: Andrzej Hajda; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 3268 bytes --]

== Series Details ==

Series: drm/i915: add guard page to ggtt->error_capture (rev5)
URL   : https://patchwork.freedesktop.org/series/113560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12713 -> Patchwork_113560v5
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/index.html

Participating hosts (37 -> 35)
------------------------------

  Missing    (2): fi-blb-e6850 fi-snb-2520m 

Known issues
------------

  Here are the changes found in Patchwork_113560v5 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions:
    - fi-bsw-n3050:       [PASS][1] -> [FAIL][2] ([i915#6298])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@hangcheck:
    - {bat-adlm-1}:       [ABORT][3] -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/bat-adlm-1/igt@i915_selftest@live@hangcheck.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/bat-adlm-1/igt@i915_selftest@live@hangcheck.html
    - fi-skl-guc:         [DMESG-WARN][5] -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/fi-skl-guc/igt@i915_selftest@live@hangcheck.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/fi-skl-guc/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@requests:
    - {bat-rpls-1}:       [ABORT][7] ([i915#7911]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/bat-rpls-1/igt@i915_selftest@live@requests.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/bat-rpls-1/igt@i915_selftest@live@requests.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913


Build changes
-------------

  * Linux: CI_DRM_12713 -> Patchwork_113560v5

  CI-20190529: 20190529
  CI_DRM_12713: 5180055794b438ce688a6804afb0af08e626ebab @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7153: f47f859f13376958a2bd199423b1f0ff53dddbe0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_113560v5: 5180055794b438ce688a6804afb0af08e626ebab @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

13f608422386 drm/i915: add guard page to ggtt->error_capture

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/index.html

[-- Attachment #2: Type: text/html, Size: 3700 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev6)
  2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
  2023-02-08 11:03 ` Matthew Auld
  2023-02-08 18:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev5) Patchwork
@ 2023-02-08 19:52 ` Patchwork
  2023-02-22 11:35   ` Andrzej Hajda
  2023-02-09 20:04 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Patchwork @ 2023-02-08 19:52 UTC (permalink / raw)
  To: Andrzej Hajda; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 3022 bytes --]

== Series Details ==

Series: drm/i915: add guard page to ggtt->error_capture (rev6)
URL   : https://patchwork.freedesktop.org/series/113560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12715 -> Patchwork_113560v6
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html

Participating hosts (36 -> 35)
------------------------------

  Missing    (1): fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_113560v6:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@reset:
    - {bat-rpls-2}:       [PASS][1] -> [ABORT][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/bat-rpls-2/igt@i915_selftest@live@reset.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/bat-rpls-2/igt@i915_selftest@live@reset.html

  
Known issues
------------

  Here are the changes found in Patchwork_113560v6 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@i915_selftest@live@gt_pm:
    - {bat-rpls-2}:       [DMESG-FAIL][3] ([i915#4258]) -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/bat-rpls-2/igt@i915_selftest@live@gt_pm.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/bat-rpls-2/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@reset:
    - {bat-rpls-1}:       [ABORT][5] ([i915#4983]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/bat-rpls-1/igt@i915_selftest@live@reset.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/bat-rpls-1/igt@i915_selftest@live@reset.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#4258]: https://gitlab.freedesktop.org/drm/intel/issues/4258
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5251]: https://gitlab.freedesktop.org/drm/intel/issues/5251
  [i915#6311]: https://gitlab.freedesktop.org/drm/intel/issues/6311
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913


Build changes
-------------

  * Linux: CI_DRM_12715 -> Patchwork_113560v6

  CI-20190529: 20190529
  CI_DRM_12715: 07d1c54bcdf3fddd723f31d605b4e8c49328affa @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7153: f47f859f13376958a2bd199423b1f0ff53dddbe0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_113560v6: 07d1c54bcdf3fddd723f31d605b4e8c49328affa @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

3b4ea4d26bfb drm/i915: add guard page to ggtt->error_capture

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html

[-- Attachment #2: Type: text/html, Size: 3498 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: add guard page to ggtt->error_capture (rev6)
  2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
                   ` (2 preceding siblings ...)
  2023-02-08 19:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev6) Patchwork
@ 2023-02-09 20:04 ` Patchwork
  2023-02-22 12:05 ` [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
  2023-03-02 10:43 ` Tvrtko Ursulin
  5 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2023-02-09 20:04 UTC (permalink / raw)
  To: Andrzej Hajda; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 21244 bytes --]

== Series Details ==

Series: drm/i915: add guard page to ggtt->error_capture (rev6)
URL   : https://patchwork.freedesktop.org/series/113560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12715_full -> Patchwork_113560v6_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html

Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in Patchwork_113560v6_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk:          [PASS][1] -> [FAIL][2] ([i915#2846])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-glk3/igt@gem_exec_fair@basic-deadline.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk2/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-glk:          NOTRUN -> [FAIL][3] ([i915#2842])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk1/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [PASS][4] -> [FAIL][5] ([i915#2842])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-glk8/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk9/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_lmem_swapping@massive:
    - shard-glk:          NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk1/igt@gem_lmem_swapping@massive.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a:
    - shard-glk:          NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#1937])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk1/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html

  * igt@kms_big_fb@linear-16bpp-rotate-90:
    - shard-glk:          NOTRUN -> [SKIP][8] ([fdo#109271]) +16 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk1/igt@kms_big_fb@linear-16bpp-rotate-90.html

  * igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_mc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#3886])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk1/igt@kms_ccs@pipe-a-random-ccs-data-y_tiled_gen12_mc_ccs.html

  
#### Possible fixes ####

  * igt@fbdev@nullptr:
    - {shard-rkl}:        [SKIP][10] ([i915#2582]) -> [PASS][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@fbdev@nullptr.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@fbdev@nullptr.html

  * igt@gem_exec_fair@basic-none-vip@rcs0:
    - {shard-rkl}:        [FAIL][12] ([i915#2842]) -> [PASS][13] +4 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@gem_exec_fair@basic-none-vip@rcs0.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-5/igt@gem_exec_fair@basic-none-vip@rcs0.html

  * igt@gem_exec_reloc@basic-gtt-read-noreloc:
    - {shard-rkl}:        [SKIP][14] ([i915#3281]) -> [PASS][15] +13 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-6/igt@gem_exec_reloc@basic-gtt-read-noreloc.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-5/igt@gem_exec_reloc@basic-gtt-read-noreloc.html

  * igt@gem_partial_pwrite_pread@writes-after-reads:
    - {shard-rkl}:        [SKIP][16] ([i915#3282]) -> [PASS][17] +3 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-6/igt@gem_partial_pwrite_pread@writes-after-reads.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-5/igt@gem_partial_pwrite_pread@writes-after-reads.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-glk:          [ABORT][18] ([i915#5566]) -> [PASS][19]
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-glk4/igt@gen9_exec_parse@allowed-single.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk1/igt@gen9_exec_parse@allowed-single.html

  * igt@i915_hangman@engine-engine-error@bcs0:
    - {shard-rkl}:        [SKIP][20] ([i915#6258]) -> [PASS][21] +1 similar issue
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-5/igt@i915_hangman@engine-engine-error@bcs0.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-3/igt@i915_hangman@engine-engine-error@bcs0.html

  * igt@i915_pm_dc@dc5-dpms:
    - {shard-rkl}:        [FAIL][22] ([i915#7330]) -> [PASS][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-5/igt@i915_pm_dc@dc5-dpms.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-3/igt@i915_pm_dc@dc5-dpms.html

  * igt@i915_pm_rpm@dpms-lpsp:
    - {shard-rkl}:        [SKIP][24] ([i915#1397]) -> [PASS][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@i915_pm_rpm@dpms-lpsp.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@i915_pm_rpm@dpms-lpsp.html

  * igt@i915_pm_rpm@pm-tiling:
    - {shard-rkl}:        [SKIP][26] ([fdo#109308]) -> [PASS][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@i915_pm_rpm@pm-tiling.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@i915_pm_rpm@pm-tiling.html

  * igt@kms_flip@2x-flip-vs-expired-vblank@bc-hdmi-a1-hdmi-a2:
    - shard-glk:          [FAIL][28] ([i915#79]) -> [PASS][29]
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-glk2/igt@kms_flip@2x-flip-vs-expired-vblank@bc-hdmi-a1-hdmi-a2.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk6/igt@kms_flip@2x-flip-vs-expired-vblank@bc-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@bc-hdmi-a1-hdmi-a2:
    - shard-glk:          [FAIL][30] ([i915#2122]) -> [PASS][31] +1 similar issue
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-glk9/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@bc-hdmi-a1-hdmi-a2.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-glk3/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@bc-hdmi-a1-hdmi-a2.html

  * igt@kms_frontbuffer_tracking@fbc-tiling-linear:
    - {shard-rkl}:        [SKIP][32] ([i915#1849] / [i915#4098]) -> [PASS][33] +15 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@kms_frontbuffer_tracking@fbc-tiling-linear.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-tiling-linear.html

  * igt@kms_plane@pixel-format@pipe-b-planes:
    - {shard-rkl}:        [SKIP][34] ([i915#1849]) -> [PASS][35] +1 similar issue
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@kms_plane@pixel-format@pipe-b-planes.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@kms_plane@pixel-format@pipe-b-planes.html

  * igt@kms_psr@cursor_blt:
    - {shard-rkl}:        [SKIP][36] ([i915#1072]) -> [PASS][37] +1 similar issue
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@kms_psr@cursor_blt.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@kms_psr@cursor_blt.html

  * igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-b:
    - {shard-rkl}:        [SKIP][38] ([i915#4098]) -> [PASS][39]
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-b.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@kms_universal_plane@universal-plane-pageflip-windowed-pipe-b.html

  * igt@kms_vblank@pipe-a-query-forked-busy-hang:
    - {shard-rkl}:        [SKIP][40] ([i915#1845] / [i915#4098]) -> [PASS][41] +14 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@kms_vblank@pipe-a-query-forked-busy-hang.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-6/igt@kms_vblank@pipe-a-query-forked-busy-hang.html

  * igt@perf@gen8-unprivileged-single-ctx-counters:
    - {shard-rkl}:        [SKIP][42] ([i915#2436]) -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-6/igt@perf@gen8-unprivileged-single-ctx-counters.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-5/igt@perf@gen8-unprivileged-single-ctx-counters.html

  * igt@prime_vgem@basic-read:
    - {shard-rkl}:        [SKIP][44] ([fdo#109295] / [i915#3291] / [i915#3708]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-3/igt@prime_vgem@basic-read.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-5/igt@prime_vgem@basic-read.html

  * igt@prime_vgem@coherency-gtt:
    - {shard-rkl}:        [SKIP][46] ([fdo#109295] / [fdo#111656] / [i915#3708]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/shard-rkl-6/igt@prime_vgem@coherency-gtt.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/shard-rkl-5/igt@prime_vgem@coherency-gtt.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [IGT#2]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/2
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109279]: https://bugs.freedesktop.org/show_bug.cgi?id=109279
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109302]: https://bugs.freedesktop.org/show_bug.cgi?id=109302
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#109312]: https://bugs.freedesktop.org/show_bug.cgi?id=109312
  [fdo#109314]: https://bugs.freedesktop.org/show_bug.cgi?id=109314
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#110542]: https://bugs.freedesktop.org/show_bug.cgi?id=110542
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111644]: https://bugs.freedesktop.org/show_bug.cgi?id=111644
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112054]: https://bugs.freedesktop.org/show_bug.cgi?id=112054
  [fdo#112283]: https://bugs.freedesktop.org/show_bug.cgi?id=112283
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1755]: https://gitlab.freedesktop.org/drm/intel/issues/1755
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1937]: https://gitlab.freedesktop.org/drm/intel/issues/1937
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2434]: https://gitlab.freedesktop.org/drm/intel/issues/2434
  [i915#2436]: https://gitlab.freedesktop.org/drm/intel/issues/2436
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2532]: https://gitlab.freedesktop.org/drm/intel/issues/2532
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2681]: https://gitlab.freedesktop.org/drm/intel/issues/2681
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2846]: https://gitlab.freedesktop.org/drm/intel/issues/2846
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#315]: https://gitlab.freedesktop.org/drm/intel/issues/315
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3299]: https://gitlab.freedesktop.org/drm/intel/issues/3299
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3318]: https://gitlab.freedesktop.org/drm/intel/issues/3318
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3469]: https://gitlab.freedesktop.org/drm/intel/issues/3469
  [i915#3528]: https://gitlab.freedesktop.org/drm/intel/issues/3528
  [i915#3536]: https://gitlab.freedesktop.org/drm/intel/issues/3536
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3804]: https://gitlab.freedesktop.org/drm/intel/issues/3804
  [i915#3825]: https://gitlab.freedesktop.org/drm/intel/issues/3825
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3952]: https://gitlab.freedesktop.org/drm/intel/issues/3952
  [i915#3966]: https://gitlab.freedesktop.org/drm/intel/issues/3966
  [i915#404]: https://gitlab.freedesktop.org/drm/intel/issues/404
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#433]: https://gitlab.freedesktop.org/drm/intel/issues/433
  [i915#4391]: https://gitlab.freedesktop.org/drm/intel/issues/4391
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#4565]: https://gitlab.freedesktop.org/drm/intel/issues/4565
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4771]: https://gitlab.freedesktop.org/drm/intel/issues/4771
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4884]: https://gitlab.freedesktop.org/drm/intel/issues/4884
  [i915#4885]: https://gitlab.freedesktop.org/drm/intel/issues/4885
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5030]: https://gitlab.freedesktop.org/drm/intel/issues/5030
  [i915#5115]: https://gitlab.freedesktop.org/drm/intel/issues/5115
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#5723]: https://gitlab.freedesktop.org/drm/intel/issues/5723
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6117]: https://gitlab.freedesktop.org/drm/intel/issues/6117
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#6245]: https://gitlab.freedesktop.org/drm/intel/issues/6245
  [i915#6247]: https://gitlab.freedesktop.org/drm/intel/issues/6247
  [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248
  [i915#6258]: https://gitlab.freedesktop.org/drm/intel/issues/6258
  [i915#6335]: https://gitlab.freedesktop.org/drm/intel/issues/6335
  [i915#6344]: https://gitlab.freedesktop.org/drm/intel/issues/6344
  [i915#6433]: https://gitlab.freedesktop.org/drm/intel/issues/6433
  [i915#6493]: https://gitlab.freedesktop.org/drm/intel/issues/6493
  [i915#6497]: https://gitlab.freedesktop.org/drm/intel/issues/6497
  [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6590]: https://gitlab.freedesktop.org/drm/intel/issues/6590
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6768]: https://gitlab.freedesktop.org/drm/intel/issues/6768
  [i915#6944]: https://gitlab.freedesktop.org/drm/intel/issues/6944
  [i915#6946]: https://gitlab.freedesktop.org/drm/intel/issues/6946
  [i915#7052]: https://gitlab.freedesktop.org/drm/intel/issues/7052
  [i915#7116]: https://gitlab.freedesktop.org/drm/intel/issues/7116
  [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118
  [i915#7128]: https://gitlab.freedesktop.org/drm/intel/issues/7128
  [i915#7276]: https://gitlab.freedesktop.org/drm/intel/issues/7276
  [i915#7330]: https://gitlab.freedesktop.org/drm/intel/issues/7330
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561
  [i915#7582]: https://gitlab.freedesktop.org/drm/intel/issues/7582
  [i915#7651]: https://gitlab.freedesktop.org/drm/intel/issues/7651
  [i915#7697]: https://gitlab.freedesktop.org/drm/intel/issues/7697
  [i915#7701]: https://gitlab.freedesktop.org/drm/intel/issues/7701
  [i915#7707]: https://gitlab.freedesktop.org/drm/intel/issues/7707
  [i915#7711]: https://gitlab.freedesktop.org/drm/intel/issues/7711
  [i915#7742]: https://gitlab.freedesktop.org/drm/intel/issues/7742
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#7949]: https://gitlab.freedesktop.org/drm/intel/issues/7949
  [i915#7957]: https://gitlab.freedesktop.org/drm/intel/issues/7957
  [i915#7981]: https://gitlab.freedesktop.org/drm/intel/issues/7981


Build changes
-------------

  * Linux: CI_DRM_12715 -> Patchwork_113560v6

  CI-20190529: 20190529
  CI_DRM_12715: 07d1c54bcdf3fddd723f31d605b4e8c49328affa @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7153: f47f859f13376958a2bd199423b1f0ff53dddbe0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_113560v6: 07d1c54bcdf3fddd723f31d605b4e8c49328affa @ git://anongit.freedesktop.org/gfx-ci/linux

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html

[-- Attachment #2: Type: text/html, Size: 13916 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx]  ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev6)
  2023-02-08 19:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev6) Patchwork
@ 2023-02-22 11:35   ` Andrzej Hajda
  0 siblings, 0 replies; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-22 11:35 UTC (permalink / raw)
  To: intel-gfx, Patchwork

On 08.02.2023 20:52, Patchwork wrote:
> *Patch Details*
> *Series:*	drm/i915: add guard page to ggtt->error_capture (rev6)
> *URL:*	https://patchwork.freedesktop.org/series/113560/ 
> <https://patchwork.freedesktop.org/series/113560/>
> *State:*	success
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html 
> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html>
> 
> 
>   CI Bug Log - changes from CI_DRM_12715 -> Patchwork_113560v6
> 
> 
>     Summary
> 
> *SUCCESS*
> 
> No regressions found.
> 
> External URL: 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/index.html
> 
> 
>     Participating hosts (36 -> 35)
> 
> Missing (1): fi-snb-2520m
> 
> 
>     Possible new issues
> 
> Here are the unknown changes that may have been introduced in 
> Patchwork_113560v6:
> 
> 
>       IGT changes
> 
> 
>         Suppressed
> 
> The following results come from untrusted machines, tests, or statuses.
> They do not affect the overall result.
> 
>   * igt@i915_selftest@live@reset:
>       o {bat-rpls-2}: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/bat-rpls-2/igt@i915_selftest@live@reset.html> -> ABORT <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/bat-rpls-2/igt@i915_selftest@live@reset.html>

This 'issue' and below 'possible fixes' are due to know issue with 
"timed out waiting for forcewake ack request", not related.

Regards
Andrzej

> 
> 
>     Known issues
> 
> Here are the changes found in Patchwork_113560v6 that come from known 
> issues:
> 
> 
>       IGT changes
> 
> 
>         Possible fixes
> 
>   *
> 
>     igt@i915_selftest@live@gt_pm:
> 
>       o {bat-rpls-2}: DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/bat-rpls-2/igt@i915_selftest@live@gt_pm.html> (i915#4258 <https://gitlab.freedesktop.org/drm/intel/issues/4258>) -> PASS <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/bat-rpls-2/igt@i915_selftest@live@gt_pm.html>
>   *
> 
>     igt@i915_selftest@live@reset:
> 
>       o {bat-rpls-1}: ABORT
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12715/bat-rpls-1/igt@i915_selftest@live@reset.html> (i915#4983 <https://gitlab.freedesktop.org/drm/intel/issues/4983>) -> PASS <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v6/bat-rpls-1/igt@i915_selftest@live@reset.html>
> 
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
> 
>     Build changes
> 
>   * Linux: CI_DRM_12715 -> Patchwork_113560v6
> 
> CI-20190529: 20190529
> CI_DRM_12715: 07d1c54bcdf3fddd723f31d605b4e8c49328affa @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> IGT_7153: f47f859f13376958a2bd199423b1f0ff53dddbe0 @ 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> Patchwork_113560v6: 07d1c54bcdf3fddd723f31d605b4e8c49328affa @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> 
> 
>       Linux commits
> 
> 3b4ea4d26bfb drm/i915: add guard page to ggtt->error_capture
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx]  ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev5)
  2023-02-08 18:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev5) Patchwork
@ 2023-02-22 11:45   ` Andrzej Hajda
  0 siblings, 0 replies; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-22 11:45 UTC (permalink / raw)
  To: intel-gfx, Patchwork

On 08.02.2023 19:39, Patchwork wrote:
> *Patch Details*
> *Series:*	drm/i915: add guard page to ggtt->error_capture (rev5)
> *URL:*	https://patchwork.freedesktop.org/series/113560/ 
> <https://patchwork.freedesktop.org/series/113560/>
> *State:*	success
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/index.html 
> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/index.html>
> 
> 
>   CI Bug Log - changes from CI_DRM_12713 -> Patchwork_113560v5
> 
> 
>     Summary
> 
> *SUCCESS*
> 
> No regressions found.
> 
> External URL: 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/index.html
> 
> 
>     Participating hosts (37 -> 35)
> 
> Missing (2): fi-blb-e6850 fi-snb-2520m
> 
> 
>     Known issues
> 
> Here are the changes found in Patchwork_113560v5 that come from known 
> issues:
> 
> 
>       IGT changes
> 
> 
>         Issues hit
> 
>   * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions:
>       o fi-bsw-n3050: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html> -> FAIL <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html> (i915#6298 <https://gitlab.freedesktop.org/drm/intel/issues/6298>)


Very common error: 
http://gfx-ci.igk.intel.com/cibuglog-ng/results/all?query_key=a35c73578e5625e7c07ee8d0cbc3a8bd1645592c

Not related

Regards
Andrzej

> 
> 
>         Possible fixes
> 
>   *
> 
>     igt@i915_selftest@live@hangcheck:
> 
>       o
> 
>         {bat-adlm-1}: ABORT
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/bat-adlm-1/igt@i915_selftest@live@hangcheck.html> -> PASS <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/bat-adlm-1/igt@i915_selftest@live@hangcheck.html>
> 
>       o
> 
>         fi-skl-guc: DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/fi-skl-guc/igt@i915_selftest@live@hangcheck.html> -> PASS <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/fi-skl-guc/igt@i915_selftest@live@hangcheck.html>
> 
>   *
> 
>     igt@i915_selftest@live@requests:
> 
>       o {bat-rpls-1}: ABORT
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12713/bat-rpls-1/igt@i915_selftest@live@requests.html> (i915#7911 <https://gitlab.freedesktop.org/drm/intel/issues/7911>) -> PASS <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113560v5/bat-rpls-1/igt@i915_selftest@live@requests.html>
> 
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
> 
>     Build changes
> 
>   * Linux: CI_DRM_12713 -> Patchwork_113560v5
> 
> CI-20190529: 20190529
> CI_DRM_12713: 5180055794b438ce688a6804afb0af08e626ebab @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> IGT_7153: f47f859f13376958a2bd199423b1f0ff53dddbe0 @ 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> Patchwork_113560v5: 5180055794b438ce688a6804afb0af08e626ebab @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> 
> 
>       Linux commits
> 
> 13f608422386 drm/i915: add guard page to ggtt->error_capture
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
                   ` (3 preceding siblings ...)
  2023-02-09 20:04 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
@ 2023-02-22 12:05 ` Andrzej Hajda
  2023-02-22 20:48   ` Rodrigo Vivi
  2023-03-02 10:43 ` Tvrtko Ursulin
  5 siblings, 1 reply; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-22 12:05 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi,
	chris.p.wilson, Das, Nirmoy

Hi all,

Gently ping on the patch. CI pollution is quite high:
$ grep 'PTE Read access' CI/drm-tip/CI_DRM_12768/*/dmesg* | wc -l
308

Regards
Andrzej

On 08.02.2023 11:51, Andrzej Hajda wrote:
> Write-combining memory allows speculative reads by CPU.
> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
> to prefetch memory beyond the error_capture, ie it tries
> to read memory pointed by next PTE in GGTT.
> If this PTE points to invalid address DMAR errors will occur.
> This behaviour was observed on ADL, RPL, DG2 platforms.
> To avoid it, guard scratch page should be added after error_capture.
> The patch fixes the most annoying issue with error capture but
> since WC reads are used also in other places there is a risk similar
> problem can affect them as well.
> 
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> ---
> This patch tries to diminish plague of DMAR read errors present
> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
> CI is usually tolerant for these errors, so the scale of the problem
> is not really visible.
> To show it I have counted lines containing DMAR read errors in dmesgs
> produced by CI for all three versions of the patch, but in contrast to v2
> I have grepped only for lines containing "PTE Read access".
> Below stats for kernel w/o patch vs patched one.
> v1: 210 vs 0
> v2: 201 vs 0
> v3: 214 vs 0
> Apparently the patch fixes all common PTE read errors.
> 
> In previous version there were different numbers due to less exact grepping,
> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault status reg"
> lines, anyway the actual number of errors is much bigger - DMAR errors
> are rate-limited.
> 
> [1]: http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
> 
> Changelog:
> v2:
>      - modified commit message (I hope the diagnosis is correct),
>      - added bug checks to ensure scratch is initialized on gen3 platforms.
>        CI produces strange stacktrace for it suggesting scratch[0] is NULL,
>        to be removed after resolving the issue with gen3 platforms.
> v3:
>      - removed bug checks, replaced with gen check.
> v4:
>      - change code for scratch page insertion to support all platforms,
>      - add info in commit message there could be more similar issues
> 
> Regards
> Andrzej
> ---
>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>   1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 842e69c7b21e49..6566d2066f1f8b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
>   	mutex_destroy(&ggtt->error_mutex);
>   }
>   
> +static void
> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
> +{
> +	struct i915_address_space *vm = &ggtt->vm;
> +
> +	if (GRAPHICS_VER(ggtt->vm.i915) < 8)
> +		return vm->clear_range(vm, offset, length);
> +	/* clear_range since gen8 is nop */
> +	while (length > 0) {
> +		vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
> +		offset += I915_GTT_PAGE_SIZE;
> +		length -= I915_GTT_PAGE_SIZE;
> +	}
> +}
> +
>   static int init_ggtt(struct i915_ggtt *ggtt)
>   {
>   	/*
> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   		 * paths, and we trust that 0 will remain reserved. However,
>   		 * the only likely reason for failure to insert is a driver
>   		 * bug, which we expect to cause other failures...
> +		 *
> +		 * Since CPU can perform speculative reads on error capture
> +		 * (write-combining allows it) add scratch page after error
> +		 * capture to avoid DMAR errors.
>   		 */
> -		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
> +		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>   		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>   		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>   			drm_mm_insert_node_in_range(&ggtt->vm.mm,
> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   						    0, ggtt->mappable_end,
>   						    DRM_MM_INSERT_LOW);
>   	}
> -	if (drm_mm_node_allocated(&ggtt->error_capture))
> +	if (drm_mm_node_allocated(&ggtt->error_capture)) {
> +		u64 start = ggtt->error_capture.start;
> +		u64 size = ggtt->error_capture.size;
> +
> +		ggtt_insert_scratch_pages(ggtt, start, size);
>   		drm_dbg(&ggtt->vm.i915->drm,
>   			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
> -			ggtt->error_capture.start,
> -			ggtt->error_capture.start + ggtt->error_capture.size);
> +			start, start + size);
> +	}
>   
>   	/*
>   	 * The upper portion of the GuC address space has a sizeable hole


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-22 12:05 ` [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
@ 2023-02-22 20:48   ` Rodrigo Vivi
  2023-02-27 15:35     ` Andrzej Hajda
  0 siblings, 1 reply; 17+ messages in thread
From: Rodrigo Vivi @ 2023-02-22 20:48 UTC (permalink / raw)
  To: Andrzej Hajda
  Cc: intel-gfx, Thomas Hellström, chris.p.wilson, Matthew Auld,
	Das, Nirmoy


On Wed, Feb 22, 2023 at 01:05:16PM +0100, Andrzej Hajda wrote:
> Hi all,
> 
> Gently ping on the patch. CI pollution is quite high:
> $ grep 'PTE Read access' CI/drm-tip/CI_DRM_12768/*/dmesg* | wc -l
> 308
> 
> Regards
> Andrzej
> 
> On 08.02.2023 11:51, Andrzej Hajda wrote:
> > Write-combining memory allows speculative reads by CPU.
> > ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
> > to prefetch memory beyond the error_capture, ie it tries
> > to read memory pointed by next PTE in GGTT.
> > If this PTE points to invalid address DMAR errors will occur.
> > This behaviour was observed on ADL, RPL, DG2 platforms.
> > To avoid it, guard scratch page should be added after error_capture.
> > The patch fixes the most annoying issue with error capture but
> > since WC reads are used also in other places there is a risk similar
> > problem can affect them as well.
> > 
> > Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> > ---
> > This patch tries to diminish plague of DMAR read errors present
> > in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
> > CI is usually tolerant for these errors, so the scale of the problem
> > is not really visible.

is this info not relevant to the commit msg?

> > To show it I have counted lines containing DMAR read errors in dmesgs
> > produced by CI for all three versions of the patch, but in contrast to v2
> > I have grepped only for lines containing "PTE Read access".
> > Below stats for kernel w/o patch vs patched one.
> > v1: 210 vs 0
> > v2: 201 vs 0
> > v3: 214 vs 0
> > Apparently the patch fixes all common PTE read errors.
> > 
> > In previous version there were different numbers due to less exact grepping,
> > "grep DMAR" catched write errors and "DMAR: DRHD: handling fault status reg"
> > lines, anyway the actual number of errors is much bigger - DMAR errors
> > are rate-limited.
> > 
> > [1]: http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
> > 
> > Changelog:
> > v2:
> >      - modified commit message (I hope the diagnosis is correct),
> >      - added bug checks to ensure scratch is initialized on gen3 platforms.
> >        CI produces strange stacktrace for it suggesting scratch[0] is NULL,
> >        to be removed after resolving the issue with gen3 platforms.
> > v3:
> >      - removed bug checks, replaced with gen check.
> > v4:
> >      - change code for scratch page insertion to support all platforms,
> >      - add info in commit message there could be more similar issues

in general in i915 we keep the history in the commit msg itself as well...

But with this CI history behind I would like Tvrtko or Joonas to take a look
and merge to drm-intel-gt-next.

> > 
> > Regards
> > Andrzej
> > ---
> >   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
> >   1 file changed, 27 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index 842e69c7b21e49..6566d2066f1f8b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
> >   	mutex_destroy(&ggtt->error_mutex);
> >   }
> > +static void
> > +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
> > +{
> > +	struct i915_address_space *vm = &ggtt->vm;
> > +
> > +	if (GRAPHICS_VER(ggtt->vm.i915) < 8)
> > +		return vm->clear_range(vm, offset, length);
> > +	/* clear_range since gen8 is nop */
> > +	while (length > 0) {
> > +		vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
> > +		offset += I915_GTT_PAGE_SIZE;
> > +		length -= I915_GTT_PAGE_SIZE;
> > +	}
> > +}
> > +
> >   static int init_ggtt(struct i915_ggtt *ggtt)
> >   {
> >   	/*
> > @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
> >   		 * paths, and we trust that 0 will remain reserved. However,
> >   		 * the only likely reason for failure to insert is a driver
> >   		 * bug, which we expect to cause other failures...
> > +		 *
> > +		 * Since CPU can perform speculative reads on error capture
> > +		 * (write-combining allows it) add scratch page after error
> > +		 * capture to avoid DMAR errors.
> >   		 */
> > -		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
> > +		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
> >   		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
> >   		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
> >   			drm_mm_insert_node_in_range(&ggtt->vm.mm,
> > @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
> >   						    0, ggtt->mappable_end,
> >   						    DRM_MM_INSERT_LOW);
> >   	}
> > -	if (drm_mm_node_allocated(&ggtt->error_capture))
> > +	if (drm_mm_node_allocated(&ggtt->error_capture)) {
> > +		u64 start = ggtt->error_capture.start;
> > +		u64 size = ggtt->error_capture.size;
> > +
> > +		ggtt_insert_scratch_pages(ggtt, start, size);
> >   		drm_dbg(&ggtt->vm.i915->drm,
> >   			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
> > -			ggtt->error_capture.start,
> > -			ggtt->error_capture.start + ggtt->error_capture.size);
> > +			start, start + size);
> > +	}
> >   	/*
> >   	 * The upper portion of the GuC address space has a sizeable hole
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-22 20:48   ` Rodrigo Vivi
@ 2023-02-27 15:35     ` Andrzej Hajda
  0 siblings, 0 replies; 17+ messages in thread
From: Andrzej Hajda @ 2023-02-27 15:35 UTC (permalink / raw)
  To: Rodrigo Vivi, Tvrtko Ursulin, Joonas Lahtinen
  Cc: intel-gfx, Thomas Hellström, chris.p.wilson, Matthew Auld,
	Das, Nirmoy

Hi all,

As suggested by Rodrigo added CC: Tvrtko and Joonas, see below.


On 22.02.2023 21:48, Rodrigo Vivi wrote:
> 
> On Wed, Feb 22, 2023 at 01:05:16PM +0100, Andrzej Hajda wrote:
>> Hi all,
>>
>> Gently ping on the patch. CI pollution is quite high:
>> $ grep 'PTE Read access' CI/drm-tip/CI_DRM_12768/*/dmesg* | wc -l
>> 308
>>
>> Regards
>> Andrzej
>>
>> On 08.02.2023 11:51, Andrzej Hajda wrote:
>>> Write-combining memory allows speculative reads by CPU.
>>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>>> to prefetch memory beyond the error_capture, ie it tries
>>> to read memory pointed by next PTE in GGTT.
>>> If this PTE points to invalid address DMAR errors will occur.
>>> This behaviour was observed on ADL, RPL, DG2 platforms.
>>> To avoid it, guard scratch page should be added after error_capture.
>>> The patch fixes the most annoying issue with error capture but
>>> since WC reads are used also in other places there is a risk similar
>>> problem can affect them as well.
>>>
>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>> ---
>>> This patch tries to diminish plague of DMAR read errors present
>>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>>> CI is usually tolerant for these errors, so the scale of the problem
>>> is not really visible.
> 
> is this info not relevant to the commit msg?

It is, but also seems slightly redundant for me, except the alarming tone :)

> 
>>> To show it I have counted lines containing DMAR read errors in dmesgs
>>> produced by CI for all three versions of the patch, but in contrast to v2
>>> I have grepped only for lines containing "PTE Read access".
>>> Below stats for kernel w/o patch vs patched one.
>>> v1: 210 vs 0
>>> v2: 201 vs 0
>>> v3: 214 vs 0
>>> Apparently the patch fixes all common PTE read errors.
>>>
>>> In previous version there were different numbers due to less exact grepping,
>>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault status reg"
>>> lines, anyway the actual number of errors is much bigger - DMAR errors
>>> are rate-limited.
>>>
>>> [1]: http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>>
>>> Changelog:
>>> v2:
>>>       - modified commit message (I hope the diagnosis is correct),
>>>       - added bug checks to ensure scratch is initialized on gen3 platforms.
>>>         CI produces strange stacktrace for it suggesting scratch[0] is NULL,
>>>         to be removed after resolving the issue with gen3 platforms.
>>> v3:
>>>       - removed bug checks, replaced with gen check.
>>> v4:
>>>       - change code for scratch page insertion to support all platforms,
>>>       - add info in commit message there could be more similar issues
> 
> in general in i915 we keep the history in the commit msg itself as well...
> 
> But with this CI history behind I would like Tvrtko or Joonas to take a look
> and merge to drm-intel-gt-next.

Tvrtko, Joonas any comments?

Regards
Andrzej

> 
>>>
>>> Regards
>>> Andrzej
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>>>    1 file changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> index 842e69c7b21e49..6566d2066f1f8b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
>>>    	mutex_destroy(&ggtt->error_mutex);
>>>    }
>>> +static void
>>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
>>> +{
>>> +	struct i915_address_space *vm = &ggtt->vm;
>>> +
>>> +	if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>>> +		return vm->clear_range(vm, offset, length);
>>> +	/* clear_range since gen8 is nop */
>>> +	while (length > 0) {
>>> +		vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
>>> +		offset += I915_GTT_PAGE_SIZE;
>>> +		length -= I915_GTT_PAGE_SIZE;
>>> +	}
>>> +}
>>> +
>>>    static int init_ggtt(struct i915_ggtt *ggtt)
>>>    {
>>>    	/*
>>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>    		 * paths, and we trust that 0 will remain reserved. However,
>>>    		 * the only likely reason for failure to insert is a driver
>>>    		 * bug, which we expect to cause other failures...
>>> +		 *
>>> +		 * Since CPU can perform speculative reads on error capture
>>> +		 * (write-combining allows it) add scratch page after error
>>> +		 * capture to avoid DMAR errors.
>>>    		 */
>>> -		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>>> +		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>>    		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>>    		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>>    			drm_mm_insert_node_in_range(&ggtt->vm.mm,
>>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>    						    0, ggtt->mappable_end,
>>>    						    DRM_MM_INSERT_LOW);
>>>    	}
>>> -	if (drm_mm_node_allocated(&ggtt->error_capture))
>>> +	if (drm_mm_node_allocated(&ggtt->error_capture)) {
>>> +		u64 start = ggtt->error_capture.start;
>>> +		u64 size = ggtt->error_capture.size;
>>> +
>>> +		ggtt_insert_scratch_pages(ggtt, start, size);
>>>    		drm_dbg(&ggtt->vm.i915->drm,
>>>    			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
>>> -			ggtt->error_capture.start,
>>> -			ggtt->error_capture.start + ggtt->error_capture.size);
>>> +			start, start + size);
>>> +	}
>>>    	/*
>>>    	 * The upper portion of the GuC address space has a sizeable hole
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
                   ` (4 preceding siblings ...)
  2023-02-22 12:05 ` [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
@ 2023-03-02 10:43 ` Tvrtko Ursulin
  2023-03-02 11:00   ` Andrzej Hajda
  5 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2023-03-02 10:43 UTC (permalink / raw)
  To: Andrzej Hajda, intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi, chris.p.wilson


On 08/02/2023 10:51, Andrzej Hajda wrote:
> Write-combining memory allows speculative reads by CPU.
> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
> to prefetch memory beyond the error_capture, ie it tries
> to read memory pointed by next PTE in GGTT.
> If this PTE points to invalid address DMAR errors will occur.
> This behaviour was observed on ADL, RPL, DG2 platforms.
> To avoid it, guard scratch page should be added after error_capture.
> The patch fixes the most annoying issue with error capture but
> since WC reads are used also in other places there is a risk similar
> problem can affect them as well.
> 
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Research tells the explanation is plausible so:

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

On patch details below.

What about "simiar risk in other places" - are there any plans to asses 
the potential impact elsewhere?

> ---
> This patch tries to diminish plague of DMAR read errors present
> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
> CI is usually tolerant for these errors, so the scale of the problem
> is not really visible.
> To show it I have counted lines containing DMAR read errors in dmesgs
> produced by CI for all three versions of the patch, but in contrast to v2
> I have grepped only for lines containing "PTE Read access".
> Below stats for kernel w/o patch vs patched one.
> v1: 210 vs 0
> v2: 201 vs 0
> v3: 214 vs 0
> Apparently the patch fixes all common PTE read errors.
> 
> In previous version there were different numbers due to less exact grepping,
> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault status reg"
> lines, anyway the actual number of errors is much bigger - DMAR errors
> are rate-limited.
> 
> [1]: http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
> 
> Changelog:
> v2:
>      - modified commit message (I hope the diagnosis is correct),
>      - added bug checks to ensure scratch is initialized on gen3 platforms.
>        CI produces strange stacktrace for it suggesting scratch[0] is NULL,
>        to be removed after resolving the issue with gen3 platforms.
> v3:
>      - removed bug checks, replaced with gen check.
> v4:
>      - change code for scratch page insertion to support all platforms,
>      - add info in commit message there could be more similar issues
> 
> Regards
> Andrzej
> ---
>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>   1 file changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 842e69c7b21e49..6566d2066f1f8b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt *ggtt)
>   	mutex_destroy(&ggtt->error_mutex);
>   }
>   
> +static void
> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 length)
> +{
> +	struct i915_address_space *vm = &ggtt->vm;
> +
> +	if (GRAPHICS_VER(ggtt->vm.i915) < 8)
> +		return vm->clear_range(vm, offset, length);
> +	/* clear_range since gen8 is nop */

It would also work to simply drop the <gen8 case and just do the loop 
below, right? If so would that be clearer?

> +	while (length > 0) {

Maybe add a GEM_BUG_ON if length is not aligned? Granted it may be a 
huge stretch of imagination..

Regards,

Tvrtko

> +		vm->insert_page(vm, px_dma(vm->scratch[0]), offset, I915_CACHE_NONE, 0);
> +		offset += I915_GTT_PAGE_SIZE;
> +		length -= I915_GTT_PAGE_SIZE;
> +	}
> +}
> +
>   static int init_ggtt(struct i915_ggtt *ggtt)
>   {
>   	/*
> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   		 * paths, and we trust that 0 will remain reserved. However,
>   		 * the only likely reason for failure to insert is a driver
>   		 * bug, which we expect to cause other failures...
> +		 *
> +		 * Since CPU can perform speculative reads on error capture
> +		 * (write-combining allows it) add scratch page after error
> +		 * capture to avoid DMAR errors.
>   		 */
> -		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
> +		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>   		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>   		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>   			drm_mm_insert_node_in_range(&ggtt->vm.mm,
> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   						    0, ggtt->mappable_end,
>   						    DRM_MM_INSERT_LOW);
>   	}
> -	if (drm_mm_node_allocated(&ggtt->error_capture))
> +	if (drm_mm_node_allocated(&ggtt->error_capture)) {
> +		u64 start = ggtt->error_capture.start;
> +		u64 size = ggtt->error_capture.size;
> +
> +		ggtt_insert_scratch_pages(ggtt, start, size);
>   		drm_dbg(&ggtt->vm.i915->drm,
>   			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
> -			ggtt->error_capture.start,
> -			ggtt->error_capture.start + ggtt->error_capture.size);
> +			start, start + size);
> +	}
>   
>   	/*
>   	 * The upper portion of the GuC address space has a sizeable hole

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-03-02 10:43 ` Tvrtko Ursulin
@ 2023-03-02 11:00   ` Andrzej Hajda
  2023-03-03 12:01     ` Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Andrzej Hajda @ 2023-03-02 11:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi, chris.p.wilson



On 02.03.2023 11:43, Tvrtko Ursulin wrote:
>
> On 08/02/2023 10:51, Andrzej Hajda wrote:
>> Write-combining memory allows speculative reads by CPU.
>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>> to prefetch memory beyond the error_capture, ie it tries
>> to read memory pointed by next PTE in GGTT.
>> If this PTE points to invalid address DMAR errors will occur.
>> This behaviour was observed on ADL, RPL, DG2 platforms.
>> To avoid it, guard scratch page should be added after error_capture.
>> The patch fixes the most annoying issue with error capture but
>> since WC reads are used also in other places there is a risk similar
>> problem can affect them as well.
>>
>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>
> Research tells the explanation is plausible so:
>
> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>

Thanks for looking at it.

> On patch details below.
>
> What about "simiar risk in other places" - are there any plans to 
> asses the potential impact elsewhere?

Yes, merging this patch is the 1st step, as error_capture is responsible 
for most (maybe even all) regular
DMAR read errors. After removing this noise it will be much easier to 
spot other DMAR read errors.
There are also multiple regular DMAR write errors (less frequent, but 
still), which I hope to debug if time permits.

Regards
Andrzej

>
>> ---
>> This patch tries to diminish plague of DMAR read errors present
>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>> CI is usually tolerant for these errors, so the scale of the problem
>> is not really visible.
>> To show it I have counted lines containing DMAR read errors in dmesgs
>> produced by CI for all three versions of the patch, but in contrast 
>> to v2
>> I have grepped only for lines containing "PTE Read access".
>> Below stats for kernel w/o patch vs patched one.
>> v1: 210 vs 0
>> v2: 201 vs 0
>> v3: 214 vs 0
>> Apparently the patch fixes all common PTE read errors.
>>
>> In previous version there were different numbers due to less exact 
>> grepping,
>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault 
>> status reg"
>> lines, anyway the actual number of errors is much bigger - DMAR errors
>> are rate-limited.
>>
>> [1]: 
>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>
>> Changelog:
>> v2:
>>      - modified commit message (I hope the diagnosis is correct),
>>      - added bug checks to ensure scratch is initialized on gen3 
>> platforms.
>>        CI produces strange stacktrace for it suggesting scratch[0] is 
>> NULL,
>>        to be removed after resolving the issue with gen3 platforms.
>> v3:
>>      - removed bug checks, replaced with gen check.
>> v4:
>>      - change code for scratch page insertion to support all platforms,
>>      - add info in commit message there could be more similar issues
>>
>> Regards
>> Andrzej
>> ---
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> index 842e69c7b21e49..6566d2066f1f8b 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt 
>> *ggtt)
>>       mutex_destroy(&ggtt->error_mutex);
>>   }
>>   +static void
>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 
>> length)
>> +{
>> +    struct i915_address_space *vm = &ggtt->vm;
>> +
>> +    if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>> +        return vm->clear_range(vm, offset, length);
>> +    /* clear_range since gen8 is nop */
>
> It would also work to simply drop the <gen8 case and just do the loop 
> below, right? If so would that be clearer?
>
>> +    while (length > 0) {
>
> Maybe add a GEM_BUG_ON if length is not aligned? Granted it may be a 
> huge stretch of imagination..
>
> Regards,
>
> Tvrtko
>
>> +        vm->insert_page(vm, px_dma(vm->scratch[0]), offset, 
>> I915_CACHE_NONE, 0);
>> +        offset += I915_GTT_PAGE_SIZE;
>> +        length -= I915_GTT_PAGE_SIZE;
>> +    }
>> +}
>> +
>>   static int init_ggtt(struct i915_ggtt *ggtt)
>>   {
>>       /*
>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>            * paths, and we trust that 0 will remain reserved. However,
>>            * the only likely reason for failure to insert is a driver
>>            * bug, which we expect to cause other failures...
>> +         *
>> +         * Since CPU can perform speculative reads on error capture
>> +         * (write-combining allows it) add scratch page after error
>> +         * capture to avoid DMAR errors.
>>            */
>> -        ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>> +        ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>           ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>           if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>               drm_mm_insert_node_in_range(&ggtt->vm.mm,
>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>                               0, ggtt->mappable_end,
>>                               DRM_MM_INSERT_LOW);
>>       }
>> -    if (drm_mm_node_allocated(&ggtt->error_capture))
>> +    if (drm_mm_node_allocated(&ggtt->error_capture)) {
>> +        u64 start = ggtt->error_capture.start;
>> +        u64 size = ggtt->error_capture.size;
>> +
>> +        ggtt_insert_scratch_pages(ggtt, start, size);
>>           drm_dbg(&ggtt->vm.i915->drm,
>>               "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>> -            ggtt->error_capture.start,
>> -            ggtt->error_capture.start + ggtt->error_capture.size);
>> +            start, start + size);
>> +    }
>>         /*
>>        * The upper portion of the GuC address space has a sizeable hole


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-03-02 11:00   ` Andrzej Hajda
@ 2023-03-03 12:01     ` Tvrtko Ursulin
  2023-03-03 12:54       ` Andrzej Hajda
  0 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2023-03-03 12:01 UTC (permalink / raw)
  To: Andrzej Hajda, intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi, chris.p.wilson


On 02/03/2023 11:00, Andrzej Hajda wrote:
> On 02.03.2023 11:43, Tvrtko Ursulin wrote:
>>
>> On 08/02/2023 10:51, Andrzej Hajda wrote:
>>> Write-combining memory allows speculative reads by CPU.
>>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>>> to prefetch memory beyond the error_capture, ie it tries
>>> to read memory pointed by next PTE in GGTT.
>>> If this PTE points to invalid address DMAR errors will occur.
>>> This behaviour was observed on ADL, RPL, DG2 platforms.
>>> To avoid it, guard scratch page should be added after error_capture.
>>> The patch fixes the most annoying issue with error capture but
>>> since WC reads are used also in other places there is a risk similar
>>> problem can affect them as well.
>>>
>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> Research tells the explanation is plausible so:
>>
>> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
> 
> Thanks for looking at it.
> 
>> On patch details below.
>>
>> What about "simiar risk in other places" - are there any plans to 
>> asses the potential impact elsewhere?
> 
> Yes, merging this patch is the 1st step, as error_capture is responsible 
> for most (maybe even all) regular
> DMAR read errors. After removing this noise it will be much easier to 
> spot other DMAR read errors.
> There are also multiple regular DMAR write errors (less frequent, but 
> still), which I hope to debug if time permits.
> 
> Regards
> Andrzej
> 
>>
>>> ---
>>> This patch tries to diminish plague of DMAR read errors present
>>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>>> CI is usually tolerant for these errors, so the scale of the problem
>>> is not really visible.
>>> To show it I have counted lines containing DMAR read errors in dmesgs
>>> produced by CI for all three versions of the patch, but in contrast 
>>> to v2
>>> I have grepped only for lines containing "PTE Read access".
>>> Below stats for kernel w/o patch vs patched one.
>>> v1: 210 vs 0
>>> v2: 201 vs 0
>>> v3: 214 vs 0
>>> Apparently the patch fixes all common PTE read errors.
>>>
>>> In previous version there were different numbers due to less exact 
>>> grepping,
>>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault 
>>> status reg"
>>> lines, anyway the actual number of errors is much bigger - DMAR errors
>>> are rate-limited.
>>>
>>> [1]: 
>>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>>
>>> Changelog:
>>> v2:
>>>      - modified commit message (I hope the diagnosis is correct),
>>>      - added bug checks to ensure scratch is initialized on gen3 
>>> platforms.
>>>        CI produces strange stacktrace for it suggesting scratch[0] is 
>>> NULL,
>>>        to be removed after resolving the issue with gen3 platforms.
>>> v3:
>>>      - removed bug checks, replaced with gen check.
>>> v4:
>>>      - change code for scratch page insertion to support all platforms,
>>>      - add info in commit message there could be more similar issues
>>>
>>> Regards
>>> Andrzej
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 ++++++++++++++++++++++++----
>>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> index 842e69c7b21e49..6566d2066f1f8b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt 
>>> *ggtt)
>>>       mutex_destroy(&ggtt->error_mutex);
>>>   }
>>>   +static void
>>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 
>>> length)
>>> +{
>>> +    struct i915_address_space *vm = &ggtt->vm;
>>> +
>>> +    if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>>> +        return vm->clear_range(vm, offset, length);
>>> +    /* clear_range since gen8 is nop */
>>
>> It would also work to simply drop the <gen8 case and just do the loop 
>> below, right? If so would that be clearer?

Alternatively, if you want to keep the dual path here, perhaps it would 
be better to replace the gen check with "clear_range !=/== 
nop_clear_range". That way maybe more of the platform knowledge remains 
at a central location.

Another thing - are there any suspend-resume considerations 
(i915_ggtt_resume_vm)? Maybe this padding needs to be restored too.

Regards,

Tvrtko

>>
>>> +    while (length > 0) {
>>
>> Maybe add a GEM_BUG_ON if length is not aligned? Granted it may be a 
>> huge stretch of imagination..
>>
>> Regards,
>>
>> Tvrtko
>>
>>> +        vm->insert_page(vm, px_dma(vm->scratch[0]), offset, 
>>> I915_CACHE_NONE, 0);
>>> +        offset += I915_GTT_PAGE_SIZE;
>>> +        length -= I915_GTT_PAGE_SIZE;
>>> +    }
>>> +}
>>> +
>>>   static int init_ggtt(struct i915_ggtt *ggtt)
>>>   {
>>>       /*
>>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>            * paths, and we trust that 0 will remain reserved. However,
>>>            * the only likely reason for failure to insert is a driver
>>>            * bug, which we expect to cause other failures...
>>> +         *
>>> +         * Since CPU can perform speculative reads on error capture
>>> +         * (write-combining allows it) add scratch page after error
>>> +         * capture to avoid DMAR errors.
>>>            */
>>> -        ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>>> +        ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>>           ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>>           if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>>               drm_mm_insert_node_in_range(&ggtt->vm.mm,
>>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>                               0, ggtt->mappable_end,
>>>                               DRM_MM_INSERT_LOW);
>>>       }
>>> -    if (drm_mm_node_allocated(&ggtt->error_capture))
>>> +    if (drm_mm_node_allocated(&ggtt->error_capture)) {
>>> +        u64 start = ggtt->error_capture.start;
>>> +        u64 size = ggtt->error_capture.size;
>>> +
>>> +        ggtt_insert_scratch_pages(ggtt, start, size);
>>>           drm_dbg(&ggtt->vm.i915->drm,
>>>               "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>>> -            ggtt->error_capture.start,
>>> -            ggtt->error_capture.start + ggtt->error_capture.size);
>>> +            start, start + size);
>>> +    }
>>>         /*
>>>        * The upper portion of the GuC address space has a sizeable hole
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture
  2023-03-03 12:01     ` Tvrtko Ursulin
@ 2023-03-03 12:54       ` Andrzej Hajda
  0 siblings, 0 replies; 17+ messages in thread
From: Andrzej Hajda @ 2023-03-03 12:54 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: Thomas Hellström, Matthew Auld, Rodrigo Vivi, chris.p.wilson



On 03.03.2023 13:01, Tvrtko Ursulin wrote:
>
> On 02/03/2023 11:00, Andrzej Hajda wrote:
>> On 02.03.2023 11:43, Tvrtko Ursulin wrote:
>>>
>>> On 08/02/2023 10:51, Andrzej Hajda wrote:
>>>> Write-combining memory allows speculative reads by CPU.
>>>> ggtt->error_capture is WC mapped to CPU, so CPU/MMU can try
>>>> to prefetch memory beyond the error_capture, ie it tries
>>>> to read memory pointed by next PTE in GGTT.
>>>> If this PTE points to invalid address DMAR errors will occur.
>>>> This behaviour was observed on ADL, RPL, DG2 platforms.
>>>> To avoid it, guard scratch page should be added after error_capture.
>>>> The patch fixes the most annoying issue with error capture but
>>>> since WC reads are used also in other places there is a risk similar
>>>> problem can affect them as well.
>>>>
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>
>>> Research tells the explanation is plausible so:
>>>
>>> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>
>> Thanks for looking at it.
>>
>>> On patch details below.
>>>
>>> What about "simiar risk in other places" - are there any plans to 
>>> asses the potential impact elsewhere?
>>
>> Yes, merging this patch is the 1st step, as error_capture is 
>> responsible for most (maybe even all) regular
>> DMAR read errors. After removing this noise it will be much easier to 
>> spot other DMAR read errors.
>> There are also multiple regular DMAR write errors (less frequent, but 
>> still), which I hope to debug if time permits.
>>
>> Regards
>> Andrzej
>>
>>>
>>>> ---
>>>> This patch tries to diminish plague of DMAR read errors present
>>>> in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
>>>> CI is usually tolerant for these errors, so the scale of the problem
>>>> is not really visible.
>>>> To show it I have counted lines containing DMAR read errors in dmesgs
>>>> produced by CI for all three versions of the patch, but in contrast 
>>>> to v2
>>>> I have grepped only for lines containing "PTE Read access".
>>>> Below stats for kernel w/o patch vs patched one.
>>>> v1: 210 vs 0
>>>> v2: 201 vs 0
>>>> v3: 214 vs 0
>>>> Apparently the patch fixes all common PTE read errors.
>>>>
>>>> In previous version there were different numbers due to less exact 
>>>> grepping,
>>>> "grep DMAR" catched write errors and "DMAR: DRHD: handling fault 
>>>> status reg"
>>>> lines, anyway the actual number of errors is much bigger - DMAR errors
>>>> are rate-limited.
>>>>
>>>> [1]: 
>>>> http://gfx-ci.igk.intel.com/tree/drm-tip/CI_DRM_12678/bat-adln-1/dmesg0.txt
>>>>
>>>> Changelog:
>>>> v2:
>>>>      - modified commit message (I hope the diagnosis is correct),
>>>>      - added bug checks to ensure scratch is initialized on gen3 
>>>> platforms.
>>>>        CI produces strange stacktrace for it suggesting scratch[0] 
>>>> is NULL,
>>>>        to be removed after resolving the issue with gen3 platforms.
>>>> v3:
>>>>      - removed bug checks, replaced with gen check.
>>>> v4:
>>>>      - change code for scratch page insertion to support all 
>>>> platforms,
>>>>      - add info in commit message there could be more similar issues
>>>>
>>>> Regards
>>>> Andrzej
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 31 
>>>> ++++++++++++++++++++++++----
>>>>   1 file changed, 27 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
>>>> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>> index 842e69c7b21e49..6566d2066f1f8b 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
>>>> @@ -503,6 +503,21 @@ static void cleanup_init_ggtt(struct i915_ggtt 
>>>> *ggtt)
>>>>       mutex_destroy(&ggtt->error_mutex);
>>>>   }
>>>>   +static void
>>>> +ggtt_insert_scratch_pages(struct i915_ggtt *ggtt, u64 offset, u64 
>>>> length)
>>>> +{
>>>> +    struct i915_address_space *vm = &ggtt->vm;
>>>> +
>>>> +    if (GRAPHICS_VER(ggtt->vm.i915) < 8)
>>>> +        return vm->clear_range(vm, offset, length);
>>>> +    /* clear_range since gen8 is nop */
>>>
>>> It would also work to simply drop the <gen8 case and just do the 
>>> loop below, right? If so would that be clearer?
>

Apparently I have missed this comment, loop does not work for gen3, 
vm->scratch[0] is not initialized, it uses some code from 
drivers/char/agp/intel-gtt.c for clearing range.

> Alternatively, if you want to keep the dual path here, perhaps it 
> would be better to replace the gen check with "clear_range !=/== 
> nop_clear_range". That way maybe more of the platform knowledge 
> remains at a central location.

There is vm->error_range in internal branch which does the same as 
clear_range, which 'when upstreamed (?)' should simplify this code. On 
the other side unstaticising nop_clear_range would allow to drop 
mock_clear_range, dpt_clear_range.

>
> Another thing - are there any suspend-resume considerations 
> (i915_ggtt_resume_vm)? Maybe this padding needs to be restored too.

Didn't think about it, I will add code there too.

Regards
Andrzej

>
> Regards,
>
> Tvrtko
>
>>>
>>>> +    while (length > 0) {
>>>
>>> Maybe add a GEM_BUG_ON if length is not aligned? Granted it may be a 
>>> huge stretch of imagination..
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>> +        vm->insert_page(vm, px_dma(vm->scratch[0]), offset, 
>>>> I915_CACHE_NONE, 0);
>>>> +        offset += I915_GTT_PAGE_SIZE;
>>>> +        length -= I915_GTT_PAGE_SIZE;
>>>> +    }
>>>> +}
>>>> +
>>>>   static int init_ggtt(struct i915_ggtt *ggtt)
>>>>   {
>>>>       /*
>>>> @@ -551,8 +566,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>>            * paths, and we trust that 0 will remain reserved. However,
>>>>            * the only likely reason for failure to insert is a driver
>>>>            * bug, which we expect to cause other failures...
>>>> +         *
>>>> +         * Since CPU can perform speculative reads on error capture
>>>> +         * (write-combining allows it) add scratch page after error
>>>> +         * capture to avoid DMAR errors.
>>>>            */
>>>> -        ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
>>>> +        ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>>>>           ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>>>>           if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>>>> drm_mm_insert_node_in_range(&ggtt->vm.mm,
>>>> @@ -562,11 +581,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>>>>                               0, ggtt->mappable_end,
>>>>                               DRM_MM_INSERT_LOW);
>>>>       }
>>>> -    if (drm_mm_node_allocated(&ggtt->error_capture))
>>>> +    if (drm_mm_node_allocated(&ggtt->error_capture)) {
>>>> +        u64 start = ggtt->error_capture.start;
>>>> +        u64 size = ggtt->error_capture.size;
>>>> +
>>>> +        ggtt_insert_scratch_pages(ggtt, start, size);
>>>>           drm_dbg(&ggtt->vm.i915->drm,
>>>>               "Reserved GGTT:[%llx, %llx] for use by error capture\n",
>>>> -            ggtt->error_capture.start,
>>>> -            ggtt->error_capture.start + ggtt->error_capture.size);
>>>> +            start, start + size);
>>>> +    }
>>>>         /*
>>>>        * The upper portion of the GuC address space has a sizeable 
>>>> hole
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-03-03 12:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-08 10:51 [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-02-08 11:03 ` Matthew Auld
2023-02-08 11:17   ` Andrzej Hajda
2023-02-08 11:29     ` Andrzej Hajda
2023-02-08 11:35       ` Matthew Auld
2023-02-08 18:39 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev5) Patchwork
2023-02-22 11:45   ` Andrzej Hajda
2023-02-08 19:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: add guard page to ggtt->error_capture (rev6) Patchwork
2023-02-22 11:35   ` Andrzej Hajda
2023-02-09 20:04 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2023-02-22 12:05 ` [Intel-gfx] [PATCH v4] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-02-22 20:48   ` Rodrigo Vivi
2023-02-27 15:35     ` Andrzej Hajda
2023-03-02 10:43 ` Tvrtko Ursulin
2023-03-02 11:00   ` Andrzej Hajda
2023-03-03 12:01     ` Tvrtko Ursulin
2023-03-03 12:54       ` Andrzej Hajda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).