From: John.C.Harrison@Intel.com
To: Intel-GFX@Lists.FreeDesktop.Org
Cc: John Harrison <John.C.Harrison@Intel.com>, DRI-Devel@Lists.FreeDesktop.Org
Subject: [PATCH v4 0/7] Allow error capture without a request & fix locking issues
Date: Fri, 20 Jan 2023 15:28:24 -0800
Message-ID: <20230120232831.28177-1-John.C.Harrison@Intel.com>

From: John Harrison <John.C.Harrison@Intel.com>

It is technically possible to get a hung context without a valid
request. In such a situation, try to provide as much information in
the error capture as possible rather than just aborting and capturing
nothing.

Similarly, in the case of an engine reset failure the GuC is not able
to report the guilty context. So try a manual search instead of
reporting nothing.

While doing all this, it was noticed that the locking was broken in a
number of places when searching for hung requests and dumping request
info. So fix all that up as well.

v2: Tidy up code flow in error capture. Reword some comments/messages.
    (review feedback from Tvrtko)
    Also fix up request locking issues from earlier changes noticed
    during code review of this change.
v3: Fix some potential null pointer derefs and a reference leak.
    Add new patch to refactor the duplicated hung request search code
    into a common backend agnostic wrapper function and use the correct
    spinlocks for the correct lists.
    Also tweak some of the patch descriptions for better accuracy.
v4: Shuffle some code around to more appropriate source files. Fix
    potential leak of GuC capture object after code flow re-org and
    pull improved info message earlier (Daniele). Also rename the GuC
    capture object to be more consistent.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

John Harrison (7):
  drm/i915: Fix request locking during error capture & debugfs dump
  drm/i915: Fix up locking around dumping requests lists
  drm/i915: Allow error capture without a request
  drm/i915: Allow error capture of a pending request
  drm/i915/guc: Look for a guilty context when an engine reset fails
  drm/i915/guc: Add a debug print on GuC triggered reset
  drm/i915/guc: Rename GuC register state capture node to be more obvious

 drivers/gpu/drm/i915/gt/intel_context.c       |  4 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  3 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |  4 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 74 ++++++++-------
 .../drm/i915/gt/intel_execlists_submission.c  | 27 ++++++
 .../drm/i915/gt/intel_execlists_submission.h  |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 ++++++-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 92 ++++++++++---------
 drivers/gpu/drm/i915/i915_gpu_error.h         |  2 +-
 10 files changed, 160 insertions(+), 93 deletions(-)

-- 
2.39.0