All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
	intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Chris Wilson <chris.p.wilson@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture
Date: Fri, 10 Mar 2023 10:23:49 +0100	[thread overview]
Message-ID: <20230308-guard_error_capture-v6-0-1b5f31422563@intel.com> (raw)

This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patchset vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.

Changelog:
v2:
    - modified commit message (I hope the diagnosis is correct),
    - added bug checks to ensure scratch is initialized on gen3 platforms.
      CI produces strange stacktrace for it suggesting scratch[0] is NULL,
      to be removed after resolving the issue with gen3 platforms.
v3:
    - removed bug checks, replaced with gen check.
v4:
    - change code for scratch page insertion to support all platforms,
    - add info in commit message there could be more similar issues
v5:
    - changed to patchset adding nop_clear_range related code,
    - re-insert scratch PTEs on resume
v6:
    - use scratch_range

To: Jani Nikula <jani.nikula@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>

---
- Link to v5: https://lore.kernel.org/r/20230308-guard_error_capture-v5-0-6d1410d13540@intel.com

---
Andrzej Hajda (2):
      drm/i915/gt: introduce vm->scratch_range callback
      drm/i915: add guard page to ggtt->error_capture

 drivers/gpu/drm/i915/gt/intel_ggtt.c      | 43 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |  1 +
 drivers/gpu/drm/i915/gt/intel_gtt.h       |  2 ++
 3 files changed, 42 insertions(+), 4 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f

Best regards,
-- 
Andrzej Hajda <andrzej.hajda@intel.com>

WARNING: multiple messages have this Message-ID (diff)
From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
	intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	Chris Wilson <chris.p.wilson@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: [Intel-gfx] [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture
Date: Fri, 10 Mar 2023 10:23:49 +0100	[thread overview]
Message-ID: <20230308-guard_error_capture-v6-0-1b5f31422563@intel.com> (raw)

This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patchset vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.

Changelog:
v2:
    - modified commit message (I hope the diagnosis is correct),
    - added bug checks to ensure scratch is initialized on gen3 platforms.
      CI produces strange stacktrace for it suggesting scratch[0] is NULL,
      to be removed after resolving the issue with gen3 platforms.
v3:
    - removed bug checks, replaced with gen check.
v4:
    - change code for scratch page insertion to support all platforms,
    - add info in commit message there could be more similar issues
v5:
    - changed to patchset adding nop_clear_range related code,
    - re-insert scratch PTEs on resume
v6:
    - use scratch_range

To: Jani Nikula <jani.nikula@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>

---
- Link to v5: https://lore.kernel.org/r/20230308-guard_error_capture-v5-0-6d1410d13540@intel.com

---
Andrzej Hajda (2):
      drm/i915/gt: introduce vm->scratch_range callback
      drm/i915: add guard page to ggtt->error_capture

 drivers/gpu/drm/i915/gt/intel_ggtt.c      | 43 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |  1 +
 drivers/gpu/drm/i915/gt/intel_gtt.h       |  2 ++
 3 files changed, 42 insertions(+), 4 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f

Best regards,
-- 
Andrzej Hajda <andrzej.hajda@intel.com>

WARNING: multiple messages have this Message-ID (diff)
From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Chris Wilson <chris.p.wilson@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>,
	Andrzej Hajda <andrzej.hajda@intel.com>
Subject: [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture
Date: Fri, 10 Mar 2023 10:23:49 +0100	[thread overview]
Message-ID: <20230308-guard_error_capture-v6-0-1b5f31422563@intel.com> (raw)

This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patchset vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.

Changelog:
v2:
    - modified commit message (I hope the diagnosis is correct),
    - added bug checks to ensure scratch is initialized on gen3 platforms.
      CI produces strange stacktrace for it suggesting scratch[0] is NULL,
      to be removed after resolving the issue with gen3 platforms.
v3:
    - removed bug checks, replaced with gen check.
v4:
    - change code for scratch page insertion to support all platforms,
    - add info in commit message there could be more similar issues
v5:
    - changed to patchset adding nop_clear_range related code,
    - re-insert scratch PTEs on resume
v6:
    - use scratch_range

To: Jani Nikula <jani.nikula@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>

---
- Link to v5: https://lore.kernel.org/r/20230308-guard_error_capture-v5-0-6d1410d13540@intel.com

---
Andrzej Hajda (2):
      drm/i915/gt: introduce vm->scratch_range callback
      drm/i915: add guard page to ggtt->error_capture

 drivers/gpu/drm/i915/gt/intel_ggtt.c      | 43 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c |  1 +
 drivers/gpu/drm/i915/gt/intel_gtt.h       |  2 ++
 3 files changed, 42 insertions(+), 4 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f

Best regards,
-- 
Andrzej Hajda <andrzej.hajda@intel.com>

             reply	other threads:[~2023-03-10  9:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-10  9:23 Andrzej Hajda [this message]
2023-03-10  9:23 ` [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-03-10  9:23 ` [Intel-gfx] " Andrzej Hajda
2023-03-10  9:23 ` [PATCH v6 1/2] drm/i915/gt: introduce vm->scratch_range callback Andrzej Hajda
2023-03-10  9:23   ` Andrzej Hajda
2023-03-10  9:23   ` [Intel-gfx] " Andrzej Hajda
2023-03-13 12:58   ` Das, Nirmoy
2023-03-13 12:58     ` [Intel-gfx] " Das, Nirmoy
2023-03-14 17:14   ` Andi Shyti
2023-03-14 17:14     ` [Intel-gfx] " Andi Shyti
2023-03-14 17:14     ` Andi Shyti
2023-03-10  9:23 ` [PATCH v6 2/2] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-03-10  9:23   ` Andrzej Hajda
2023-03-10  9:23   ` [Intel-gfx] " Andrzej Hajda
2023-03-13 12:58   ` Das, Nirmoy
2023-03-13 12:58     ` Das, Nirmoy
2023-03-10 11:37 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: add guard page to ggtt->error_capture (rev8) Patchwork
2023-03-10 11:56 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-03-13  6:03 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2023-03-16 18:18 ` [Intel-gfx] [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-03-16 18:18   ` Andrzej Hajda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230308-guard_error_capture-v6-0-1b5f31422563@intel.com \
    --to=andrzej.hajda@intel.com \
    --cc=andi.shyti@linux.intel.com \
    --cc=chris.p.wilson@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nirmoy.das@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.