From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Jani Nikula <jani.nikula@linux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
dri-devel@lists.freedesktop.org,
Andi Shyti <andi.shyti@linux.intel.com>,
Chris Wilson <chris.p.wilson@linux.intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture
Date: Fri, 10 Mar 2023 10:23:49 +0100 [thread overview]
Message-ID: <20230308-guard_error_capture-v6-0-1b5f31422563@intel.com> (raw)
This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patchset vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.
Changelog:
v2:
- modified commit message (I hope the diagnosis is correct),
- added bug checks to ensure scratch is initialized on gen3 platforms.
CI produces strange stacktrace for it suggesting scratch[0] is NULL,
to be removed after resolving the issue with gen3 platforms.
v3:
- removed bug checks, replaced with gen check.
v4:
- change code for scratch page insertion to support all platforms,
- add info in commit message there could be more similar issues
v5:
- changed to patchset adding nop_clear_range related code,
- re-insert scratch PTEs on resume
v6:
- use scratch_range
To: Jani Nikula <jani.nikula@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
- Link to v5: https://lore.kernel.org/r/20230308-guard_error_capture-v5-0-6d1410d13540@intel.com
---
Andrzej Hajda (2):
drm/i915/gt: introduce vm->scratch_range callback
drm/i915: add guard page to ggtt->error_capture
drivers/gpu/drm/i915/gt/intel_ggtt.c | 43 ++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 2 ++
3 files changed, 42 insertions(+), 4 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f
Best regards,
--
Andrzej Hajda <andrzej.hajda@intel.com>
WARNING: multiple messages have this Message-ID (diff)
From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Jani Nikula <jani.nikula@linux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
dri-devel@lists.freedesktop.org,
Chris Wilson <chris.p.wilson@linux.intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: [Intel-gfx] [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture
Date: Fri, 10 Mar 2023 10:23:49 +0100 [thread overview]
Message-ID: <20230308-guard_error_capture-v6-0-1b5f31422563@intel.com> (raw)
This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patchset vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.
Changelog:
v2:
- modified commit message (I hope the diagnosis is correct),
- added bug checks to ensure scratch is initialized on gen3 platforms.
CI produces strange stacktrace for it suggesting scratch[0] is NULL,
to be removed after resolving the issue with gen3 platforms.
v3:
- removed bug checks, replaced with gen check.
v4:
- change code for scratch page insertion to support all platforms,
- add info in commit message there could be more similar issues
v5:
- changed to patchset adding nop_clear_range related code,
- re-insert scratch PTEs on resume
v6:
- use scratch_range
To: Jani Nikula <jani.nikula@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
- Link to v5: https://lore.kernel.org/r/20230308-guard_error_capture-v5-0-6d1410d13540@intel.com
---
Andrzej Hajda (2):
drm/i915/gt: introduce vm->scratch_range callback
drm/i915: add guard page to ggtt->error_capture
drivers/gpu/drm/i915/gt/intel_ggtt.c | 43 ++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 2 ++
3 files changed, 42 insertions(+), 4 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f
Best regards,
--
Andrzej Hajda <andrzej.hajda@intel.com>
WARNING: multiple messages have this Message-ID (diff)
From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Jani Nikula <jani.nikula@linux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
linux-kernel@vger.kernel.org,
Andi Shyti <andi.shyti@linux.intel.com>,
Chris Wilson <chris.p.wilson@linux.intel.com>,
Nirmoy Das <nirmoy.das@intel.com>,
Andrzej Hajda <andrzej.hajda@intel.com>
Subject: [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture
Date: Fri, 10 Mar 2023 10:23:49 +0100 [thread overview]
Message-ID: <20230308-guard_error_capture-v6-0-1b5f31422563@intel.com> (raw)
This patch tries to diminish plague of DMAR read errors present
in CI for ADL*, RPL*, DG2 platforms, see for example [1] (grep DMAR).
CI is usually tolerant for these errors, so the scale of the problem
is not really visible.
To show it I have counted lines containing DMAR read errors in dmesgs
produced by CI for all three versions of the patch, but in contrast to v2
I have grepped only for lines containing "PTE Read access".
Below stats for kernel w/o patchset vs patched one.
v1: 210 vs 0
v2: 201 vs 0
v3: 214 vs 0
Apparently the patchset fixes all common PTE read errors.
Changelog:
v2:
- modified commit message (I hope the diagnosis is correct),
- added bug checks to ensure scratch is initialized on gen3 platforms.
CI produces strange stacktrace for it suggesting scratch[0] is NULL,
to be removed after resolving the issue with gen3 platforms.
v3:
- removed bug checks, replaced with gen check.
v4:
- change code for scratch page insertion to support all platforms,
- add info in commit message there could be more similar issues
v5:
- changed to patchset adding nop_clear_range related code,
- re-insert scratch PTEs on resume
v6:
- use scratch_range
To: Jani Nikula <jani.nikula@linux.intel.com>
To: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
- Link to v5: https://lore.kernel.org/r/20230308-guard_error_capture-v5-0-6d1410d13540@intel.com
---
Andrzej Hajda (2):
drm/i915/gt: introduce vm->scratch_range callback
drm/i915: add guard page to ggtt->error_capture
drivers/gpu/drm/i915/gt/intel_ggtt.c | 43 ++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c | 1 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 2 ++
3 files changed, 42 insertions(+), 4 deletions(-)
---
base-commit: 3cd6c251f39c14df9ab711e3eb56e703b359ff54
change-id: 20230308-guard_error_capture-f3f334eec85f
Best regards,
--
Andrzej Hajda <andrzej.hajda@intel.com>
next reply other threads:[~2023-03-10 9:24 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-10 9:23 Andrzej Hajda [this message]
2023-03-10 9:23 ` [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-03-10 9:23 ` [Intel-gfx] " Andrzej Hajda
2023-03-10 9:23 ` [PATCH v6 1/2] drm/i915/gt: introduce vm->scratch_range callback Andrzej Hajda
2023-03-10 9:23 ` Andrzej Hajda
2023-03-10 9:23 ` [Intel-gfx] " Andrzej Hajda
2023-03-13 12:58 ` Das, Nirmoy
2023-03-13 12:58 ` [Intel-gfx] " Das, Nirmoy
2023-03-14 17:14 ` Andi Shyti
2023-03-14 17:14 ` [Intel-gfx] " Andi Shyti
2023-03-14 17:14 ` Andi Shyti
2023-03-10 9:23 ` [PATCH v6 2/2] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-03-10 9:23 ` Andrzej Hajda
2023-03-10 9:23 ` [Intel-gfx] " Andrzej Hajda
2023-03-13 12:58 ` Das, Nirmoy
2023-03-13 12:58 ` Das, Nirmoy
2023-03-10 11:37 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: add guard page to ggtt->error_capture (rev8) Patchwork
2023-03-10 11:56 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-03-13 6:03 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2023-03-16 18:18 ` [Intel-gfx] [PATCH v6 0/2] drm/i915: add guard page to ggtt->error_capture Andrzej Hajda
2023-03-16 18:18 ` Andrzej Hajda
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230308-guard_error_capture-v6-0-1b5f31422563@intel.com \
--to=andrzej.hajda@intel.com \
--cc=andi.shyti@linux.intel.com \
--cc=chris.p.wilson@linux.intel.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=intel-gfx@lists.freedesktop.org \
--cc=jani.nikula@linux.intel.com \
--cc=joonas.lahtinen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nirmoy.das@intel.com \
--cc=rodrigo.vivi@intel.com \
--cc=tvrtko.ursulin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.