From: "Das, Nirmoy" <nirmoy.das@linux.intel.com>
To: Andrzej Hajda <andrzej.hajda@intel.com>,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Chris Wilson <chris.p.wilson@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [Intel-gfx] [PATCH v6 2/2] drm/i915: add guard page to ggtt->error_capture
Date: Mon, 13 Mar 2023 13:58:53 +0100
Message-ID: <a07ac81c-f234-a68b-0d68-225259e534a8@linux.intel.com>
In-Reply-To: <20230308-guard_error_capture-v6-2-1b5f31422563@intel.com>


On 3/10/2023 10:23 AM, Andrzej Hajda wrote:
> Write-combining memory allows speculative reads by the CPU.
> ggtt->error_capture is WC mapped to the CPU, so the CPU/MMU can try
> to prefetch memory beyond the error_capture, i.e. it tries
> to read memory pointed to by the next PTE in the GGTT.
> If this PTE points to an invalid address, DMAR errors will occur.
> This behaviour was observed on ADL and RPL platforms.
> To avoid it, a guard scratch page should be added after error_capture.
> The patch fixes the most annoying issue with error capture, but
> since WC reads are also used in other places, there is a risk that a similar
> problem can affect them as well.
>
> v2:
>    - modified commit message (I hope the diagnosis is correct),
>    - added bug checks to ensure scratch is initialized on gen3 platforms.
>      CI produces strange stacktrace for it suggesting scratch[0] is NULL,
>      to be removed after resolving the issue with gen3 platforms.
> v3:
>    - removed bug checks, replaced with gen check.
> v4:
>    - change code for scratch page insertion to support all platforms,
>    - add info in commit message there could be more similar issues
> v5:
>    - check for nop_clear_range instead of gen8 (Tvrtko),
>    - re-insert scratch pages on resume (Tvrtko)
> v6:
>    - use scratch_range callback to set scratch pages (Chris)
>
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_ggtt.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 38e6f0b207fe0c..5ef7e03b11c8e6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -572,8 +572,12 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   		 * paths, and we trust that 0 will remain reserved. However,
>   		 * the only likely reason for failure to insert is a driver
>   		 * bug, which we expect to cause other failures...
> +		 *
> +		 * Since CPU can perform speculative reads on error capture
> +		 * (write-combining allows it) add scratch page after error
> +		 * capture to avoid DMAR errors.
>   		 */
> -		ggtt->error_capture.size = I915_GTT_PAGE_SIZE;
> +		ggtt->error_capture.size = 2 * I915_GTT_PAGE_SIZE;
>   		ggtt->error_capture.color = I915_COLOR_UNEVICTABLE;
>   		if (drm_mm_reserve_node(&ggtt->vm.mm, &ggtt->error_capture))
>   			drm_mm_insert_node_in_range(&ggtt->vm.mm,
> @@ -583,11 +587,15 @@ static int init_ggtt(struct i915_ggtt *ggtt)
>   						    0, ggtt->mappable_end,
>   						    DRM_MM_INSERT_LOW);
>   	}
> -	if (drm_mm_node_allocated(&ggtt->error_capture))
> +	if (drm_mm_node_allocated(&ggtt->error_capture)) {
> +		u64 start = ggtt->error_capture.start;
> +		u64 size = ggtt->error_capture.size;
> +
> +		ggtt->vm.scratch_range(&ggtt->vm, start, size);
>   		drm_dbg(&ggtt->vm.i915->drm,
>   			"Reserved GGTT:[%llx, %llx] for use by error capture\n",
> -			ggtt->error_capture.start,
> -			ggtt->error_capture.start + ggtt->error_capture.size);
> +			start, start + size);
> +	}
>   
>   	/*
>   	 * The upper portion of the GuC address space has a sizeable hole
> @@ -1280,6 +1288,10 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
>   
>   	flush = i915_ggtt_resume_vm(&ggtt->vm);
>   
> +	if (drm_mm_node_allocated(&ggtt->error_capture))
> +		ggtt->vm.scratch_range(&ggtt->vm, ggtt->error_capture.start,
> +				       ggtt->error_capture.size);
> +
>   	ggtt->invalidate(ggtt);
>   
>   	if (flush)
>
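
For readers who want the idea of the quoted patch without the driver
context, here is a minimal userspace sketch. It is not the i915 code:
the toy_* names, the 16-entry table and the PTE values are all
hypothetical. It only models what the diff does, namely reserving two
pages instead of one, pointing every PTE in the reservation at a
scratch page, and repeating that step on resume.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
#define TOY_PAGE_SIZE   4096u        /* plays the role of I915_GTT_PAGE_SIZE */
#define TOY_GGTT_PAGES  16u          /* tiny address space for illustration */
#define TOY_INVALID_PTE 0u           /* PTE that would fault if dereferenced */
#define TOY_SCRATCH_PTE 0xABCD000u   /* PTE pointing at a harmless scratch page */

struct toy_ggtt {
	uint64_t pte[TOY_GGTT_PAGES];
	uint64_t capture_start;      /* byte offset of the reservation */
	uint64_t capture_size;       /* byte size, including the guard page */
};

/* Point every PTE in [start, start + size) at the scratch page. */
static void toy_scratch_range(struct toy_ggtt *g, uint64_t start, uint64_t size)
{
	uint64_t first = start / TOY_PAGE_SIZE;
	uint64_t count = size / TOY_PAGE_SIZE;

	assert(start % TOY_PAGE_SIZE == 0 && size % TOY_PAGE_SIZE == 0);
	assert(first + count <= TOY_GGTT_PAGES);

	for (uint64_t i = 0; i < count; i++)
		g->pte[first + i] = TOY_SCRATCH_PTE;
}

/* Reserve the capture page plus one guard page right after it. */
static void toy_reserve_error_capture(struct toy_ggtt *g, uint64_t start)
{
	g->capture_start = start;
	g->capture_size = 2 * TOY_PAGE_SIZE;
	toy_scratch_range(g, g->capture_start, g->capture_size);
}

/* Model resume: PTE contents are lost, so the scratch range is rewritten. */
static void toy_resume(struct toy_ggtt *g)
{
	memset(g->pte, 0, sizeof(g->pte));
	if (g->capture_size)
		toy_scratch_range(g, g->capture_start, g->capture_size);
}

/* A read one page past the capture slot is safe if that PTE is valid. */
static int toy_guard_is_valid(const struct toy_ggtt *g)
{
	uint64_t guard = g->capture_start / TOY_PAGE_SIZE + 1;

	return g->pte[guard] != TOY_INVALID_PTE;
}
```

Calling toy_reserve_error_capture() on a zeroed struct toy_ggtt and
then toy_guard_is_valid() shows the point of the guard page: a
speculative read that runs past the first page lands on a scratch PTE
rather than on whatever entry happens to follow the reservation.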
