* [PATCH 0/7] Add GuC Error Capture Support
From: Alan Previn @ 2022-01-18 10:03 UTC
  To: intel-gfx
  Cc: Matthew Brost, Tvrtko Ursulin, John Harrison, dri-devel, Alan Previn

This series:
  1. Enables GuC to perform error-state capture based on a list
     of MMIO registers that the driver registers with GuC; GuC
     dumps these registers and reports them back right before a
     GuC-triggered engine reset.
  2. Updates the ADS blob creation to register lists of global
     and engine registers with GuC.
  3. Defines tables of register lists that are global,
     engine-class or engine-instance in scope.
  4. Separates the GuC log-buffer access locks used for relay
     logging from those used for the new error-state-capture
     region.
  5. Allocates an additional interim circular buffer store to
     copy snapshots of newly reported GuC error-state-capture
     dumps in response to the G2H notification.
  6. Connects the i915_gpu_coredump reporting function to the
     GuC error capture module so that all reported GuC error
     state capture dumps are printed.
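
To picture item 5, here is a minimal user-space sketch of an interim circular store, assuming the G2H handler copies each new capture snapshot in as a whole and the coredump printer later drains it. The names capture_ring, ring_push and ring_pop and the 64-byte capacity are hypothetical; the driver's actual store, its sizing and its locking are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; must be a power of two for the index masking. */
#define CAPTURE_RING_SIZE 64u
_Static_assert((CAPTURE_RING_SIZE & (CAPTURE_RING_SIZE - 1)) == 0,
	       "ring size must be a power of two");

struct capture_ring {
	uint8_t buf[CAPTURE_RING_SIZE];
	size_t head; /* total bytes ever written (monotonic) */
	size_t tail; /* total bytes ever read (monotonic) */
};

static size_t ring_used(const struct capture_ring *r)
{
	return r->head - r->tail;
}

static size_t ring_space(const struct capture_ring *r)
{
	return CAPTURE_RING_SIZE - ring_used(r);
}

/* Copy a whole snapshot in, or nothing at all if it does not fit. */
static int ring_push(struct capture_ring *r, const void *src, size_t len)
{
	const uint8_t *p = src;

	assert(src != NULL || len == 0);
	if (len > ring_space(r))
		return -1;
	for (size_t i = 0; i < len; i++)
		r->buf[(r->head + i) & (CAPTURE_RING_SIZE - 1)] = p[i];
	r->head += len;
	return 0;
}

/* Drain up to len bytes into dst; returns the number of bytes copied. */
static size_t ring_pop(struct capture_ring *r, void *dst, size_t len)
{
	uint8_t *p = dst;
	size_t n = ring_used(r);

	if (len < n)
		n = len;
	for (size_t i = 0; i < n; i++)
		p[i] = r->buf[(r->tail + i) & (CAPTURE_RING_SIZE - 1)];
	r->tail += n;
	return n;
}
```

Rejecting a snapshot that does not fit, rather than writing part of it, means a reader never sees a truncated dump; whether the driver drops new data or overwrites old data in that case is not stated here.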

This is the 4th revision of this series; the first 3 revisions
were labelled as RFC.

Previously received Reviewed-by tags:
  - Patch #4 has received an R-v-b from Matthew Brost
    <matthew.brost@intel.com>

Changes from prior revs:
  v4:
      - Rebased on the latest drm-tip, which has been merged with
        support for GuC firmware version 69.0.3; that version is
        required for GuC error-state capture to work.
      - Added a register list for DG2, which is the same as XE_LP
        except for an additional set of steering registers.
      - Fixed a bug at the end of the capture-parsing loop in
        intel_guc_capture_out_print_next_group that was not
        properly comparing the engine-instance and engine-class
        being parsed against those of the engine that triggered
        the i915_gpu_coredump.
  v3:
      - Fixed all review comments from rev2 except the following:
          - Michal Wajdeczko proposed adding a separate function
            to look up register string names (based on offset),
            but we decided against it because of offset conflicts
            and because the current table layout is easier to
            maintain.
          - The last set of checkpatch errors pertaining to
            "COMPLEX MACROS" should be fixed in the next rev.
      - Abstracted internal-to-guc-capture information into a new
        __guc_state_capture_priv structure, which allows
        intel_guc.h and intel_guc_fwif.h to be excluded from
        intel_guc_capture.h. Now only the first 2 patches have a
        wider build-time impact (because of the changes to
        intel_guc_fwif.h); subsequent changes to guc-capture
        internal structures, or to firmware interfaces used solely
        by the guc-capture module, shouldn't impact the rest of
        the driver build.
      - Added missing Gen12LP registers and added slice+subslice
        indices when reporting extended steered registers.
      - Added additional checks to ensure that the GuC-reported
        error capture information matches the i915_gpu_coredump
        being printed before printing the corresponding VMA
        dumps, such as the batch buffer.
  v2:
      - Ignore - failed CI retest.

Alan Previn (7):
  drm/i915/guc: Update GuC ADS size for error capture lists
  drm/i915/guc: Add XE_LP registers for GuC error state capture.
  drm/i915/guc: Add DG2 registers for GuC error state capture.
  drm/i915/guc: Add GuC's error state capture output structures.
  drm/i915/guc: Update GuC's log-buffer-state access for error capture.
  drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
  drm/i915/guc: Print the GuC error capture output register list.

 drivers/gpu/drm/i915/Makefile                 |    1 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |    4 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |    7 +
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |   85 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   13 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   13 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |   36 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 1310 +++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   30 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   21 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    |  155 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |   20 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   14 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |   65 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   14 +
 15 files changed, 1669 insertions(+), 119 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h

-- 
2.25.1



* [Intel-gfx] [PATCH 1/7] drm/i915/guc: Update GuC ADS size for error capture lists
From: Alan Previn @ 2022-01-18 10:03 UTC
  To: intel-gfx; +Cc: Alan Previn

Update GuC ADS size allocation to include space for
the lists of error state capture register descriptors.
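
In this patch, intel_guc_ads_create() first calls intel_guc_capture_prep_lists() with a NULL blob to learn the page-aligned size, and __guc_ads_init() later calls it again with the allocated blob to fill the lists in. A simplified sketch of that size-then-fill idea follows; prep_lists, list_hdr, reg_descr and the 4 KiB page are assumptions for illustration, and the sketch omits details of the real layout such as the reserved first page that disabled lists point at.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1))

/* Hypothetical in-buffer layout: a header followed by descriptors. */
struct list_hdr { uint32_t num_descr; };
struct reg_descr { uint32_t offset, value, flags, mask; };

/*
 * With buf == NULL, only compute the space needed; otherwise write each
 * list into its own page-aligned slot. Both passes run the same arithmetic,
 * so the size returned by the NULL pass matches what the fill pass consumes.
 */
static size_t prep_lists(uint8_t *buf, const uint32_t *counts, size_t nlists)
{
	size_t used = 0;

	assert(counts != NULL || nlists == 0);
	for (size_t i = 0; i < nlists; i++) {
		size_t slot = SKETCH_PAGE_ALIGN(sizeof(struct list_hdr) +
						counts[i] * sizeof(struct reg_descr));

		if (buf) {
			struct list_hdr *hdr = (struct list_hdr *)(buf + used);

			hdr->num_descr = counts[i];
			/* Descriptors would be populated here; zero them. */
			memset(hdr + 1, 0, slot - sizeof(*hdr));
		}
		used += slot;
	}
	return used;
}
```

The caller would allocate exactly the size returned by the first call and then pass the buffer to the second call.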

Also, populate the lists of registers we want GuC to report back to
the host on engine-reset events. These lists should include global,
engine-class and engine-instance registers for every engine-class
type on the current hardware.
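
The matching rule behind these scopes, as implemented by guc_capture_get_one_list() in this patch, is that owner and type must match and the engine class only matters for non-global lists. A self-contained sketch of that rule follows; the types, enum values and register offsets are placeholders, not the real tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the GuC capture list types. */
enum list_type { LIST_GLOBAL, LIST_ENGINE_CLASS, LIST_ENGINE_INSTANCE };

/* Illustrative stand-in for struct __guc_mmio_reg_descr_group. */
struct reg_group {
	const uint32_t *offsets; /* NULL terminates the table */
	uint32_t num_regs;
	uint32_t owner;
	enum list_type type;
	uint32_t engine_class;
};

static const struct reg_group *
find_group(const struct reg_group *groups, uint32_t owner,
	   enum list_type type, uint32_t engine_class)
{
	assert(groups != NULL);
	for (size_t i = 0; groups[i].offsets; i++) {
		const struct reg_group *g = &groups[i];

		/* Global lists are not tied to any engine class. */
		if (g->owner == owner && g->type == type &&
		    (type == LIST_GLOBAL || g->engine_class == engine_class))
			return g;
	}
	return NULL;
}
```

The patch writes the global case as a check on reglists[i].type, which is equivalent here because the type has already been matched.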

NOTE: Start with a sample table of register lists to lay out the
framework before adding real registers in a subsequent patch.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |  36 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  11 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  36 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 450 ++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |  20 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  17 +
 8 files changed, 555 insertions(+), 29 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
 create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index aa86ac33effc..92fe5302e35f 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -183,6 +183,7 @@ i915-y += gt/uc/intel_uc.o \
 	  gt/uc/intel_uc_fw.o \
 	  gt/uc/intel_guc.o \
 	  gt/uc/intel_guc_ads.o \
+	  gt/uc/intel_guc_capture.o \
 	  gt/uc/intel_guc_ct.o \
 	  gt/uc/intel_guc_debugfs.o \
 	  gt/uc/intel_guc_fw.o \
diff --git a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
new file mode 100644
index 000000000000..15b8c02b8a76
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021-2021 Intel Corporation
+ */
+
+#ifndef _INTEL_GUC_CAPTURE_FWIF_H
+#define _INTEL_GUC_CAPTURE_FWIF_H
+
+#include <linux/types.h>
+#include "intel_guc_fwif.h"
+
+struct intel_guc;
+
+struct __guc_mmio_reg_descr {
+	i915_reg_t reg;
+	u32 flags;
+	u32 mask;
+	const char *regname;
+};
+
+struct __guc_mmio_reg_descr_group {
+	struct __guc_mmio_reg_descr *list;
+	u32 num_regs;
+	u32 owner; /* see enum guc_capture_owner */
+	u32 type; /* see enum guc_capture_type */
+	u32 engine; /* as per MAX_ENGINE_CLASS */
+};
+
+struct __guc_state_capture_priv {
+	struct __guc_mmio_reg_descr_group *reglists;
+	u16 num_instance_regs[GUC_CAPTURE_LIST_INDEX_MAX][GUC_MAX_ENGINE_CLASSES];
+	u16 num_class_regs[GUC_CAPTURE_LIST_INDEX_MAX][GUC_MAX_ENGINE_CLASSES];
+	u16 num_global_regs[GUC_CAPTURE_LIST_INDEX_MAX];
+};
+
+#endif /* _INTEL_GUC_CAPTURE_FWIF_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index ba2a67f9e500..d035a3ba8700 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -8,8 +8,9 @@
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm_irq.h"
 #include "intel_guc.h"
-#include "intel_guc_slpc.h"
 #include "intel_guc_ads.h"
+#include "intel_guc_capture.h"
+#include "intel_guc_slpc.h"
 #include "intel_guc_submission.h"
 #include "i915_drv.h"
 #include "i915_irq.h"
@@ -361,9 +362,14 @@ int intel_guc_init(struct intel_guc *guc)
 	if (ret)
 		goto err_fw;
 
-	ret = intel_guc_ads_create(guc);
+	ret = intel_guc_capture_init(guc);
 	if (ret)
 		goto err_log;
+
+	ret = intel_guc_ads_create(guc);
+	if (ret)
+		goto err_capture;
+
 	GEM_BUG_ON(!guc->ads_vma);
 
 	ret = intel_guc_ct_init(&guc->ct);
@@ -402,6 +408,8 @@ int intel_guc_init(struct intel_guc *guc)
 	intel_guc_ct_fini(&guc->ct);
 err_ads:
 	intel_guc_ads_destroy(guc);
+err_capture:
+	intel_guc_capture_destroy(guc);
 err_log:
 	intel_guc_log_destroy(&guc->log);
 err_fw:
@@ -429,6 +437,7 @@ void intel_guc_fini(struct intel_guc *guc)
 	intel_guc_ct_fini(&guc->ct);
 
 	intel_guc_ads_destroy(guc);
+	intel_guc_capture_destroy(guc);
 	intel_guc_log_destroy(&guc->log);
 	intel_uc_fw_fini(&guc->fw);
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 9d26a86fe557..9542db6fda0d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -9,18 +9,19 @@
 #include <linux/xarray.h>
 #include <linux/delay.h>
 
-#include "intel_uncore.h"
+#include "intel_guc_ct.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
-#include "intel_guc_ct.h"
 #include "intel_guc_log.h"
 #include "intel_guc_reg.h"
 #include "intel_guc_slpc_types.h"
 #include "intel_uc_fw.h"
+#include "intel_uncore.h"
 #include "i915_utils.h"
 #include "i915_vma.h"
 
 struct __guc_ads_blob;
+struct __guc_state_capture_priv;
 
 /**
  * struct intel_guc - Top level structure of GuC.
@@ -37,6 +38,10 @@ struct intel_guc {
 	struct intel_guc_ct ct;
 	/** @slpc: sub-structure containing SLPC related data and objects */
 	struct intel_guc_slpc slpc;
+	/** @capture: the error-state-capture module's data and objects */
+	struct intel_guc_state_capture {
+		struct __guc_state_capture_priv *priv;
+	} capture;
 
 	/** @sched_engine: Global engine used to submit requests to GuC */
 	struct i915_sched_engine *sched_engine;
@@ -143,6 +148,8 @@ struct intel_guc {
 	u32 ads_regset_size;
 	/** @ads_golden_ctxt_size: size of the golden contexts in the ADS */
 	u32 ads_golden_ctxt_size;
+	/** @ads_capture_size: size of register lists in the ADS used for error capture */
+	u32 ads_capture_size;
 	/** @ads_engine_usage_size: size of engine usage in the ADS */
 	u32 ads_engine_usage_size;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 668bf4ac9b0c..4597ba0a4177 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -10,6 +10,7 @@
 #include "gt/intel_lrc.h"
 #include "gt/shmem_utils.h"
 #include "intel_guc_ads.h"
+#include "intel_guc_capture.h"
 #include "intel_guc_fwif.h"
 #include "intel_uc.h"
 #include "i915_drv.h"
@@ -72,8 +73,7 @@ static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
 
 static u32 guc_ads_capture_size(struct intel_guc *guc)
 {
-	/* FIXME: Allocate a proper capture list */
-	return PAGE_ALIGN(PAGE_SIZE);
+	return PAGE_ALIGN(guc->ads_capture_size);
 }
 
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
@@ -520,26 +520,6 @@ static void guc_init_golden_context(struct intel_guc *guc)
 	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
 }
 
-static void guc_capture_list_init(struct intel_guc *guc, struct __guc_ads_blob *blob)
-{
-	int i, j;
-	u32 addr_ggtt, offset;
-
-	offset = guc_ads_capture_offset(guc);
-	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
-
-	/* FIXME: Populate a proper capture list */
-
-	for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) {
-		for (j = 0; j < GUC_MAX_ENGINE_CLASSES; j++) {
-			blob->ads.capture_instance[i][j] = addr_ggtt;
-			blob->ads.capture_class[i][j] = addr_ggtt;
-		}
-
-		blob->ads.capture_global[i] = addr_ggtt;
-	}
-}
-
 static void __guc_ads_init(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
@@ -573,9 +553,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 
 	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
 
-	/* Capture list for hang debug */
-	guc_capture_list_init(guc, blob);
-
+	/* Lists for error capture debug */
+	intel_guc_capture_prep_lists(guc, (struct guc_ads *)blob, base,
+				     guc_ads_capture_offset(guc), &blob->system_info);
 	/* ADS */
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
 	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
@@ -615,6 +595,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
 		return ret;
 	guc->ads_golden_ctxt_size = ret;
 
+	/* Likewise the capture lists: */
+	ret = intel_guc_capture_prep_lists(guc, NULL, 0, 0, NULL);
+	if (ret < 0)
+		return ret;
+	guc->ads_capture_size = ret;
+
 	/* Now the total size can be determined: */
 	size = guc_ads_blob_size(guc);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
new file mode 100644
index 000000000000..20c537274e60
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -0,0 +1,450 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021-2021 Intel Corporation
+ */
+
+#include <linux/types.h>
+
+#include <drm/drm_print.h>
+
+#include "gt/intel_engine_regs.h"
+#include "gt/intel_gt.h"
+#include "guc_capture_fwif.h"
+#include "intel_guc_fwif.h"
+#include "i915_drv.h"
+#include "i915_memcpy.h"
+
+/*
+ * Define all device tables of GuC error capture register lists
+ * NOTE: For engine-registers, GuC only needs the register offsets
+ *       from the engine-mmio-base
+ */
+/* XE_LPD - Global */
+static struct __guc_mmio_reg_descr xe_lpd_global_regs[] = {
+	{GEN12_RING_FAULT_REG,     0,      0, "GEN12_RING_FAULT_REG"}
+};
+
+/* XE_LPD - Render / Compute Per-Class */
+static struct __guc_mmio_reg_descr xe_lpd_rc_class_regs[] = {
+	{EIR,                      0,      0, "EIR"}
+};
+
+/* XE_LPD - Render / Compute Per-Engine-Instance */
+static struct __guc_mmio_reg_descr xe_lpd_rc_inst_regs[] = {
+	{RING_HEAD(0),             0,      0, "RING_HEAD"},
+	{RING_TAIL(0),             0,      0, "RING_TAIL"},
+};
+
+/* XE_LPD - Media Decode/Encode Per-Class */
+static struct __guc_mmio_reg_descr xe_lpd_vd_class_regs[] = {
+};
+
+/* XE_LPD - Media Decode/Encode Per-Engine-Instance */
+static struct __guc_mmio_reg_descr xe_lpd_vd_inst_regs[] = {
+	{RING_HEAD(0),             0,      0, "RING_HEAD"},
+	{RING_TAIL(0),             0,      0, "RING_TAIL"},
+};
+
+/* XE_LPD - Video Enhancement Per-Class */
+static struct __guc_mmio_reg_descr xe_lpd_vec_class_regs[] = {
+};
+
+/* XE_LPD - Video Enhancement Per-Engine-Instance */
+static struct __guc_mmio_reg_descr xe_lpd_vec_inst_regs[] = {
+	{RING_HEAD(0),             0,      0, "RING_HEAD"},
+	{RING_TAIL(0),             0,      0, "RING_TAIL"},
+};
+
+#define TO_GCAP_DEF_OWNER(x) (GUC_CAPTURE_LIST_INDEX_##x)
+#define TO_GCAP_DEF_TYPE(x) (GUC_CAPTURE_LIST_TYPE_##x)
+#define MAKE_REGLIST(regslist, regsowner, regstype, class) \
+	{ \
+		.list = regslist, \
+		.num_regs = ARRAY_SIZE(regslist), \
+		.owner = TO_GCAP_DEF_OWNER(regsowner), \
+		.type = TO_GCAP_DEF_TYPE(regstype), \
+		.engine = class, \
+	}
+
+/* List of lists */
+static struct __guc_mmio_reg_descr_group xe_lpd_lists[] = {
+	MAKE_REGLIST(xe_lpd_global_regs, PF, GLOBAL, 0),
+	MAKE_REGLIST(xe_lpd_rc_class_regs, PF, ENGINE_CLASS, GUC_RENDER_CLASS),
+	MAKE_REGLIST(xe_lpd_rc_inst_regs, PF, ENGINE_INSTANCE, GUC_RENDER_CLASS),
+	MAKE_REGLIST(xe_lpd_vd_class_regs, PF, ENGINE_CLASS, GUC_VIDEO_CLASS),
+	MAKE_REGLIST(xe_lpd_vd_inst_regs, PF, ENGINE_INSTANCE, GUC_VIDEO_CLASS),
+	MAKE_REGLIST(xe_lpd_vec_class_regs, PF, ENGINE_CLASS, GUC_VIDEOENHANCE_CLASS),
+	MAKE_REGLIST(xe_lpd_vec_inst_regs, PF, ENGINE_INSTANCE, GUC_VIDEOENHANCE_CLASS),
+	{}
+};
+
+static struct __guc_mmio_reg_descr_group *
+guc_capture_get_device_reglist(struct intel_guc *guc)
+{
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+
+	if (IS_TIGERLAKE(i915) || IS_ROCKETLAKE(i915) ||
+	    IS_ALDERLAKE_S(i915) || IS_ALDERLAKE_P(i915)) {
+		/*
+		 * For certain engine classes, there are slice and subslice
+		 * level registers requiring steering. We allocate and populate
+		 * these at init time based on the hw config and add them as an
+		 * extension list at the end of the pre-populated render list.
+		 */
+		return xe_lpd_lists;
+	}
+
+	return NULL;
+}
+
+static struct __guc_mmio_reg_descr_group *
+guc_capture_get_one_list(struct __guc_mmio_reg_descr_group *reglists, u32 owner, u32 type, u32 id)
+{
+	int i;
+
+	if (!reglists)
+		return NULL;
+
+	for (i = 0; reglists[i].list; i++) {
+		if (reglists[i].owner == owner && reglists[i].type == type &&
+		    (reglists[i].engine == id || reglists[i].type == GUC_CAPTURE_LIST_TYPE_GLOBAL))
+			return &reglists[i];
+	}
+
+	return NULL;
+}
+
+static const char *
+guc_capture_stringify_owner(u32 owner)
+{
+	switch (owner) {
+	case GUC_CAPTURE_LIST_INDEX_PF:
+		return "PF";
+	case GUC_CAPTURE_LIST_INDEX_VF:
+		return "VF";
+	default:
+		return "unknown";
+	}
+
+	return "";
+}
+
+static const char *
+guc_capture_stringify_type(u32 type)
+{
+	switch (type) {
+	case GUC_CAPTURE_LIST_TYPE_GLOBAL:
+		return "Global";
+	case GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS:
+		return "Class";
+	case GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE:
+		return "Instance";
+	default:
+		return "unknown";
+	}
+
+	return "";
+}
+
+static const char *
+guc_capture_stringify_engclass(u32 class)
+{
+	switch (class) {
+	case GUC_RENDER_CLASS:
+		return "Render";
+	case GUC_VIDEO_CLASS:
+		return "Video";
+	case GUC_VIDEOENHANCE_CLASS:
+		return "VideoEnhance";
+	case GUC_BLITTER_CLASS:
+		return "Blitter";
+	case GUC_RESERVED_CLASS:
+		return "Reserved";
+	default:
+		return "unknown";
+	}
+
+	return "";
+}
+
+static void
+guc_capture_warn_with_list_info(struct drm_i915_private *i915, char *msg,
+				u32 owner, u32 type, u32 classid)
+{
+	if (type == GUC_CAPTURE_LIST_TYPE_GLOBAL)
+		drm_dbg(&i915->drm, "GuC-capture: %s for %s %s-Registers.\n", msg,
+			 guc_capture_stringify_owner(owner), guc_capture_stringify_type(type));
+	else
+		drm_dbg(&i915->drm, "GuC-capture: %s for %s %s-Registers on %s-Engine\n", msg,
+			 guc_capture_stringify_owner(owner), guc_capture_stringify_type(type),
+			 guc_capture_stringify_engclass(classid));
+}
+
+static int
+guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
+		      struct guc_mmio_reg *ptr, u16 num_entries)
+{
+	u32 j = 0;
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	struct __guc_mmio_reg_descr_group *reglists = guc->capture.priv->reglists;
+	struct __guc_mmio_reg_descr_group *match;
+
+	if (!reglists)
+		return -ENODEV;
+
+	match = guc_capture_get_one_list(reglists, owner, type, classid);
+	if (match) {
+		for (j = 0; j < num_entries && j < match->num_regs; ++j) {
+			ptr[j].offset = match->list[j].reg.reg;
+			ptr[j].value = 0xDEADF00D;
+			ptr[j].flags = match->list[j].flags;
+			ptr[j].mask = match->list[j].mask;
+		}
+		return 0;
+	}
+
+	guc_capture_warn_with_list_info(i915, "Missing register list init", owner, type,
+					classid);
+
+	return -ENODATA;
+}
+
+static int
+guc_capture_fill_reglist(struct intel_guc *guc, struct guc_ads *ads,
+			 u32 owner, int type, int classid, u16 numregs,
+			 u8 **p_virt, u32 *p_ggtt, u32 null_ggtt)
+{
+	struct guc_debug_capture_list *listnode;
+	u32 *p_capturelist_ggtt;
+	int size = 0;
+
+	/*
+	 * For enabled capture lists, we not only need to call capture module to help
+	 * populate the list-descriptor into the correct ads capture structures, but
+	 * we also need to increment the virtual pointers and ggtt offsets so that
+	 * caller has the subsequent gfx memory location.
+	 */
+	size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) +
+			  (numregs * sizeof(struct guc_mmio_reg)));
+	/* if caller hasn't allocated ADS blob, return size and counts, we're done */
+	if (!ads)
+		return size;
+
+	/*
+	 * If caller allocated ADS blob, populate the capture register descriptors into
+	 * the designated ADS location based on list-owner, list-type and engine-classid
+	 */
+	if (type == GUC_CAPTURE_LIST_TYPE_GLOBAL)
+		p_capturelist_ggtt = &ads->capture_global[owner];
+	else if (type == GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS)
+		p_capturelist_ggtt = &ads->capture_class[owner][classid];
+	else /* GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE */
+		p_capturelist_ggtt = &ads->capture_instance[owner][classid];
+
+	if (!numregs) {
+		*p_capturelist_ggtt = null_ggtt;
+	} else {
+		/* get ptr and populate header info: */
+		*p_capturelist_ggtt = *p_ggtt;
+		listnode = (struct guc_debug_capture_list *)*p_virt;
+		*p_ggtt += sizeof(struct guc_debug_capture_list);
+		*p_virt += sizeof(struct guc_debug_capture_list);
+		listnode->header.info = FIELD_PREP(GUC_CAPTURELISTHDR_NUMDESCR, numregs);
+
+		/* get ptr and populate register descriptor list: */
+		guc_capture_list_init(guc, owner, type, classid,
+				      (struct guc_mmio_reg *)*p_virt,
+				      numregs);
+
+		/* increment ptrs for that header: */
+		*p_ggtt += size - sizeof(struct guc_debug_capture_list);
+		*p_virt += size - sizeof(struct guc_debug_capture_list);
+	}
+
+	return size;
+}
+
+static int
+guc_capture_list_count(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
+		       u16 *num_entries)
+{
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	struct __guc_mmio_reg_descr_group *reglists = guc->capture.priv->reglists;
+	struct __guc_mmio_reg_descr_group *match;
+
+	if (!reglists)
+		return -ENODEV;
+
+	match = guc_capture_get_one_list(reglists, owner, type, classid);
+	if (!match) {
+		guc_capture_warn_with_list_info(i915, "Missing register list size",
+						owner, type, classid);
+		return -ENODATA;
+	}
+
+	*num_entries = match->num_regs;
+	return 0;
+}
+
+static void
+guc_capture_fill_engine_enable_masks(struct intel_gt *gt,
+				     struct guc_gt_system_info *info)
+{
+	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
+	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+}
+
+int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u32 blob_ggtt,
+				 u32 capture_offset, struct guc_gt_system_info *sysinfo)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct guc_gt_system_info *info, local_info;
+	struct guc_debug_capture_list *listnode;
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	struct __guc_state_capture_priv *gc = guc->capture.priv;
+	int i, j, size;
+	u32 ggtt = 0, null_ggtt = 0, alloc_size = 0;
+	u16 tmpnumreg = 0;
+	u8 *ptr = NULL;
+
+	GEM_BUG_ON(!gc);
+
+	if (blob) {
+		ptr = ((u8 *)blob) + capture_offset;
+		ggtt = blob_ggtt + capture_offset;
+		GEM_BUG_ON(!sysinfo);
+		info = sysinfo;
+	} else {
+		memset(&local_info, 0, sizeof(local_info));
+		info = &local_info;
+		guc_capture_fill_engine_enable_masks(gt, info);
+	}
+
+	/* first, set aside the first page for a capture_list with zero descriptors */
+	alloc_size = PAGE_SIZE;
+	if (blob) {
+		listnode = (struct guc_debug_capture_list *)ptr;
+		listnode->header.info = FIELD_PREP(GUC_CAPTURELISTHDR_NUMDESCR, 0);
+		null_ggtt = ggtt;
+		ggtt += PAGE_SIZE;
+		ptr += PAGE_SIZE;
+	}
+
+#define COUNT_REGS guc_capture_list_count
+#define FILL_REGS guc_capture_fill_reglist
+#define TYPE_GLOBAL GUC_CAPTURE_LIST_TYPE_GLOBAL
+#define TYPE_CLASS GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS
+#define TYPE_INSTANCE GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE
+#define OWNER2STR guc_capture_stringify_owner
+#define ENGCLS2STR guc_capture_stringify_engclass
+#define TYPE2STR guc_capture_stringify_type
+
+	for (i = 0; i < GUC_CAPTURE_LIST_INDEX_MAX; i++) {
+		for (j = 0; j < GUC_MAX_ENGINE_CLASSES; j++) {
+			if (!info->engine_enabled_masks[j]) {
+				if (gc->num_class_regs[i][j])
+					drm_warn(&i915->drm, "GuC-Cap %s's %s class-"
+						 "list enable mismatch was=%d now off\n",
+						 OWNER2STR(i), ENGCLS2STR(j),
+						 gc->num_class_regs[i][j]);
+				if (gc->num_instance_regs[i][j])
+					drm_warn(&i915->drm, "GuC-Cap %s's %s inst-"
+						 "list enable mismatch was=%d now off!\n",
+						 OWNER2STR(i), ENGCLS2STR(j),
+						 gc->num_instance_regs[i][j]);
+				gc->num_class_regs[i][j] = 0;
+				gc->num_instance_regs[i][j] = 0;
+				if (blob) {
+					blob->capture_class[i][j] = null_ggtt;
+					blob->capture_instance[i][j] = null_ggtt;
+				}
+			} else {
+				if (!COUNT_REGS(guc, i, TYPE_CLASS, j, &tmpnumreg)) {
+					if (blob && tmpnumreg > gc->num_class_regs[i][j]) {
+						drm_warn(&i915->drm, "GuC-Cap %s's %s-%s-list "
+							 "count overflow cap from %d to %d\n",
+							 OWNER2STR(i), ENGCLS2STR(j),
+							 TYPE2STR(TYPE_CLASS),
+							 gc->num_class_regs[i][j], tmpnumreg);
+						tmpnumreg = gc->num_class_regs[i][j];
+					}
+					size = FILL_REGS(guc, blob, i, TYPE_CLASS, j,
+							 tmpnumreg, &ptr, &ggtt, null_ggtt);
+					alloc_size += size;
+					gc->num_class_regs[i][j] = tmpnumreg;
+				} else {
+					gc->num_class_regs[i][j] = 0;
+					if (blob)
+						blob->capture_class[i][j] = null_ggtt;
+				}
+				if (!COUNT_REGS(guc, i, TYPE_INSTANCE, j, &tmpnumreg)) {
+					if (blob && tmpnumreg > gc->num_instance_regs[i][j]) {
+						drm_warn(&i915->drm, "GuC-Cap %s's %s-%s-list "
+							 "count overflow cap from %d to %d\n",
+							 OWNER2STR(i), ENGCLS2STR(j),
+							 TYPE2STR(TYPE_INSTANCE),
+							 gc->num_instance_regs[i][j], tmpnumreg);
+						tmpnumreg = gc->num_instance_regs[i][j];
+					}
+					size = FILL_REGS(guc, blob, i, TYPE_INSTANCE, j,
+							 tmpnumreg, &ptr, &ggtt, null_ggtt);
+					alloc_size += size;
+					gc->num_instance_regs[i][j] = tmpnumreg;
+				} else {
+					gc->num_instance_regs[i][j] = 0;
+					if (blob)
+						blob->capture_instance[i][j] = null_ggtt;
+				}
+			}
+		}
+		if (!COUNT_REGS(guc, i, TYPE_GLOBAL, 0, &tmpnumreg)) {
+			if (blob && tmpnumreg > gc->num_global_regs[i]) {
+				drm_warn(&i915->drm, "GuC-Cap %s's %s-list count increased from %d to %d\n",
+					 OWNER2STR(i), TYPE2STR(TYPE_GLOBAL),
+					 gc->num_global_regs[i], tmpnumreg);
+				tmpnumreg = gc->num_global_regs[i];
+			}
+			size = FILL_REGS(guc, blob, i, TYPE_GLOBAL, 0, tmpnumreg,
+					 &ptr, &ggtt, null_ggtt);
+			alloc_size += size;
+			gc->num_global_regs[i] = tmpnumreg;
+		} else {
+			gc->num_global_regs[i] = 0;
+			if (blob)
+				blob->capture_global[i] = null_ggtt;
+		}
+	}
+
+#undef COUNT_REGS
+#undef FILL_REGS
+#undef TYPE_GLOBAL
+#undef TYPE_CLASS
+#undef TYPE_INSTANCE
+#undef OWNER2STR
+#undef ENGCLS2STR
+#undef TYPE2STR
+
+	if (guc->ads_capture_size && guc->ads_capture_size != PAGE_ALIGN(alloc_size))
+		drm_warn(&i915->drm, "GuC->ADS->Capture alloc size changed from %d to %d\n",
+			 guc->ads_capture_size, PAGE_ALIGN(alloc_size));
+
+	return PAGE_ALIGN(alloc_size);
+}
+
+void intel_guc_capture_destroy(struct intel_guc *guc)
+{
+	kfree(guc->capture.priv);
+	guc->capture.priv = NULL;
+}
+
+int intel_guc_capture_init(struct intel_guc *guc)
+{
+	guc->capture.priv = kzalloc(sizeof(*guc->capture.priv), GFP_KERNEL);
+	if (!guc->capture.priv)
+		return -ENOMEM;
+	guc->capture.priv->reglists = guc_capture_get_device_reglist(guc);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
new file mode 100644
index 000000000000..6b5594ca529d
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2021-2021 Intel Corporation
+ */
+
+#ifndef _INTEL_GUC_CAPTURE_H
+#define _INTEL_GUC_CAPTURE_H
+
+#include <linux/types.h>
+
+struct intel_guc;
+struct guc_ads;
+struct guc_gt_system_info;
+
+int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u32 blob_ggtt,
+				 u32 capture_offset, struct guc_gt_system_info *sysinfo);
+void intel_guc_capture_destroy(struct intel_guc *guc);
+int intel_guc_capture_init(struct intel_guc *guc);
+
+#endif /* _INTEL_GUC_CAPTURE_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 6a4612a852e2..92bfe25a5e85 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -297,6 +297,23 @@ enum {
 	GUC_CAPTURE_LIST_INDEX_MAX = 2,
 };
 
+/* Register-types of GuC capture register lists */
+enum guc_capture_type {
+	GUC_CAPTURE_LIST_TYPE_GLOBAL = 0,
+	GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS,
+	GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE,
+	GUC_CAPTURE_LIST_TYPE_MAX,
+};
+
+struct guc_debug_capture_list_header {
+	u32 info;
+		#define GUC_CAPTURELISTHDR_NUMDESCR GENMASK(15, 0)
+} __packed;
+
+struct guc_debug_capture_list {
+	struct guc_debug_capture_list_header header;
+} __packed;
+
 /* GuC Additional Data Struct */
 struct guc_ads {
 	struct guc_mmio_reg_set reg_state_list[GUC_MAX_ENGINE_CLASSES][GUC_MAX_INSTANCES_PER_CLASS];
-- 
2.25.1


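Patch 1's intel_guc_capture_prep_lists() above appears to be run twice: once with a NULL blob to size the ADS capture region, then again with a real blob to fill it. On the fill pass it caps any list that has grown since the sizing pass. Below is a minimal userspace sketch of that size-then-fill pattern for a single list. struct reglist_slot, prep_one_list() and page_align() are hypothetical names, and a 4 KiB page is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical per-list bookkeeping, like gc->num_class_regs[i][j]. */
struct reglist_slot {
	uint16_t recorded; /* entry count saved by the previous pass */
};

/*
 * Sizing pass (out == NULL): record the live count and return bytes needed.
 * Fill pass (out != NULL): write at most the count recorded by the sizing
 * pass, warning and capping if the list has grown in the meantime.
 */
static size_t prep_one_list(struct reglist_slot *slot, uint16_t live_count,
			    uint32_t *out, size_t entry_size)
{
	uint16_t n = live_count;

	if (out && n > slot->recorded) {
		fprintf(stderr, "list count overflow, capping %u to %u\n",
			(unsigned int)n, (unsigned int)slot->recorded);
		n = slot->recorded;
	}
	for (uint16_t k = 0; out && k < n; k++)
		out[k] = k; /* stand-in for writing one register descriptor */
	slot->recorded = n;
	return (size_t)n * entry_size;
}

/* Round a byte count up to whole pages, like PAGE_ALIGN(). */
static size_t page_align(size_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}
```

Capping rather than failing keeps the fill pass inside the region that was already sized and allocated. The sketch prints the same kind of warning that the patch emits through drm_warn(). The 16-byte entry size used when calling it corresponds to struct guc_mmio_reg (four u32 fields).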

* [Intel-gfx] [PATCH 2/7] drm/i915/guc: Add XE_LP registers for GuC error state capture.
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
@ 2022-01-18 10:03 ` Alan Previn
  2022-01-24 19:33   ` Teres Alexis, Alan Previn
  -1 siblings, 1 reply; 14+ messages in thread
From: Alan Previn @ 2022-01-18 10:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Alan Previn

Add device-specific tables and register lists to cover the different
engine class types for GuC error state capture on XE_LP products.
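The per-device tables are sentinel-terminated arrays of register groups, looked up by owner, type and engine class, with global lists matching any engine (as guc_capture_get_one_list() does). Below is a minimal, self-contained sketch of that lookup. The struct names, table contents and register offsets are illustrative assumptions, not the driver's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

enum cap_type { CAP_GLOBAL = 0, CAP_ENGINE_CLASS, CAP_ENGINE_INSTANCE };
enum { RENDER_CLASS = 0, VIDEO_CLASS = 1 }; /* illustrative class ids */

struct reg_descr {
	uint32_t offset; /* illustrative values, not real hardware offsets */
	const char *name;
};

struct reg_group {
	const struct reg_descr *list; /* NULL list marks the end of the table */
	size_t num_regs;
	uint32_t owner, type, engine;
};

static const struct reg_descr global_regs[] = { { 0x100, "GLOBAL_A" } };
static const struct reg_descr rc_class_regs[] = {
	{ 0x200, "RC_CLASS_A" }, { 0x204, "RC_CLASS_B" },
};
static const struct reg_descr vd_inst_regs[] = {
	{ 0x34, "RING_HEAD" }, { 0x30, "RING_TAIL" },
};

#define REGLIST(arr, own, typ, eng) { arr, ARRAY_LEN(arr), own, typ, eng }

static const struct reg_group lists[] = {
	REGLIST(global_regs, 0, CAP_GLOBAL, 0),
	REGLIST(rc_class_regs, 0, CAP_ENGINE_CLASS, RENDER_CLASS),
	REGLIST(vd_inst_regs, 0, CAP_ENGINE_INSTANCE, VIDEO_CLASS),
	{ NULL, 0, 0, 0, 0 }, /* sentinel */
};

/* Global lists match any engine; class/instance lists must match exactly. */
static const struct reg_group *
find_group(const struct reg_group *g, uint32_t owner, uint32_t type,
	   uint32_t engine)
{
	for (; g && g->list; g++)
		if (g->owner == owner && g->type == type &&
		    (g->engine == engine || g->type == CAP_GLOBAL))
			return g;
	return NULL;
}
```

A lookup for a class with no table entry (here, the video class list) returns NULL, which the patch's callers treat as "no registers of this type".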

Also, add runtime allocation and freeing of extended register lists
for registers that need steering identifiers that depend on
the detected HW config.
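The steering identifiers are packed into the flags word of each extension entry, using the GUC_REGSET_STEERING_GROUP (bits 15:12) and GUC_REGSET_STEERING_INSTANCE (bits 23:20) fields that this patch adds to intel_guc_fwif.h. The sketch below reimplements GENMASK()/FIELD_PREP()-style helpers in plain C and builds one entry per steered register per enabled (slice, subslice) pair. The helper names, the per-slice subslice bitmask input and the 8-subslice limit are assumptions made for illustration.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK() and FIELD_PREP(). */
#define GENMASK32(h, l) ((~0u >> (31 - (h))) & (~0u << (l)))
#define STEERING_GROUP GENMASK32(15, 12)
#define STEERING_INSTANCE GENMASK32(23, 20)

static uint32_t field_prep(uint32_t mask, uint32_t val)
{
	unsigned int shift = 0;

	while (!((mask >> shift) & 1u)) /* mask must be non-zero */
		shift++;
	return (val << shift) & mask;
}

struct ext_reg {
	uint32_t offset;
	uint32_t flags;
	const char *name;
};

static uint32_t steer_flags(unsigned int slice, unsigned int subslice)
{
	return field_prep(STEERING_GROUP, slice) |
	       field_prep(STEERING_INSTANCE, subslice);
}

/*
 * Emit one copy of every steered register for each enabled
 * (slice, subslice) pair; ss_mask[s] holds slice s's subslice bits.
 * Returns the number of entries written (at most max_out).
 */
static int build_ext_list(const uint8_t *ss_mask, int num_slices,
			  const struct ext_reg *steer, int num_steer,
			  struct ext_reg *out, int max_out)
{
	int n = 0;

	for (int s = 0; s < num_slices; s++) {
		for (int ss = 0; ss < 8; ss++) {
			if (!(ss_mask[s] & (1u << ss)))
				continue;
			for (int i = 0; i < num_steer && n < max_out; i++) {
				out[n] = steer[i];
				out[n].flags = steer_flags(s, ss);
				n++;
			}
		}
	}
	return n;
}
```

For example, slice 1 / subslice 2 packs to flags 0x201000. Counting first and filling second, as the patch does, lets the extension array be allocated once at the exact size.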

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 208 +++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   4 +-
 3 files changed, 186 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
index 15b8c02b8a76..a2f97d04ff18 100644
--- a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
@@ -24,6 +24,8 @@ struct __guc_mmio_reg_descr_group {
 	u32 owner; /* see enum guc_capture_owner */
 	u32 type; /* see enum guc_capture_type */
 	u32 engine; /* as per MAX_ENGINE_CLASS */
+	int num_ext;
+	struct __guc_mmio_reg_descr *ext;
 };
 
 struct __guc_state_capture_priv {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 20c537274e60..6adfb5c07bcf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -19,20 +19,84 @@
  * NOTE: For engine-registers, GuC only needs the register offsets
  *       from the engine-mmio-base
  */
+#define COMMON_GEN12BASE_GLOBAL() \
+	{GEN12_FAULT_TLB_DATA0,    0,      0, "GEN12_FAULT_TLB_DATA0"}, \
+	{GEN12_FAULT_TLB_DATA1,    0,      0, "GEN12_FAULT_TLB_DATA1"}, \
+	{FORCEWAKE_MT,             0,      0, "FORCEWAKE_MT"}, \
+	{DERRMR,                   0,      0, "DERRMR"}, \
+	{GEN12_AUX_ERR_DBG,        0,      0, "GEN12_AUX_ERR_DBG"}, \
+	{GEN12_GAM_DONE,           0,      0, "GEN12_GAM_DONE"}, \
+	{GEN11_GUC_SG_INTR_ENABLE, 0,      0, "GEN11_GUC_SG_INTR_ENABLE"}, \
+	{GEN11_CRYPTO_RSVD_INTR_ENABLE, 0, 0, "GEN11_CRYPTO_RSVD_INTR_ENABLE"}, \
+	{GEN11_GUNIT_CSME_INTR_ENABLE, 0,  0, "GEN11_GUNIT_CSME_INTR_ENABLE"}, \
+	{GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0, 0, "GEN11_GPM_WGBOXPERF_INTR_ENABLE"}, \
+	{GEN8_DE_MISC_IER,         0,      0, "GEN8_DE_MISC_IER"}, \
+	{GEN12_RING_FAULT_REG,     0,      0, "GEN12_RING_FAULT_REG"}
+
+#define COMMON_GEN12BASE_ENGINE_INSTANCE() \
+	{RING_PSMI_CTL(0),         0,      0, "RING_PSMI_CTL"}, \
+	{RING_ESR(0),              0,      0, "RING_ESR"}, \
+	{RING_DMA_FADD(0),         0,      0, "RING_DMA_FADD_LOW32"}, \
+	{RING_DMA_FADD_UDW(0),     0,      0, "RING_DMA_FADD_UP32"}, \
+	{RING_IPEIR(0),            0,      0, "RING_IPEIR"}, \
+	{RING_IPEHR(0),            0,      0, "RING_IPEHR"}, \
+	{RING_INSTPS(0),           0,      0, "RING_INSTPS"}, \
+	{RING_BBADDR(0),           0,      0, "RING_BBADDR_LOW32"}, \
+	{RING_BBADDR_UDW(0),       0,      0, "RING_BBADDR_UP32"}, \
+	{RING_BBSTATE(0),          0,      0, "RING_BBSTATE"}, \
+	{CCID(0),                  0,      0, "CCID"}, \
+	{RING_ACTHD(0),            0,      0, "RING_ACTHD_LOW32"}, \
+	{RING_ACTHD_UDW(0),        0,      0, "RING_ACTHD_UP32"}, \
+	{RING_INSTPM(0),           0,      0, "RING_INSTPM"}, \
+	{RING_NOPID(0),            0,      0, "RING_NOPID"}, \
+	{RING_START(0),            0,      0, "RING_START"}, \
+	{RING_HEAD(0),             0,      0, "RING_HEAD"}, \
+	{RING_TAIL(0),             0,      0, "RING_TAIL"}, \
+	{RING_CTL(0),              0,      0, "RING_CTL"}, \
+	{RING_MI_MODE(0),          0,      0, "RING_MI_MODE"}, \
+	{RING_CONTEXT_CONTROL(0),  0,      0, "RING_CONTEXT_CONTROL"}, \
+	{RING_INSTDONE(0),         0,      0, "RING_INSTDONE"}, \
+	{RING_HWS_PGA(0),          0,      0, "RING_HWS_PGA"}, \
+	{RING_MODE_GEN7(0),        0,      0, "RING_MODE_GEN7"}, \
+	{GEN8_RING_PDP_LDW(0, 0),  0,      0, "GEN8_RING_PDP0_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 0),  0,      0, "GEN8_RING_PDP0_UDW"}, \
+	{GEN8_RING_PDP_LDW(0, 1),  0,      0, "GEN8_RING_PDP1_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 1),  0,      0, "GEN8_RING_PDP1_UDW"}, \
+	{GEN8_RING_PDP_LDW(0, 2),  0,      0, "GEN8_RING_PDP2_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 2),  0,      0, "GEN8_RING_PDP2_UDW"}, \
+	{GEN8_RING_PDP_LDW(0, 3),  0,      0, "GEN8_RING_PDP3_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 3),  0,      0, "GEN8_RING_PDP3_UDW"}
+
+#define COMMON_GEN12BASE_HAS_EU() \
+	{EIR,                      0,      0, "EIR"}
+
+#define COMMON_GEN12BASE_RENDER() \
+	{GEN7_SC_INSTDONE,         0,      0, "GEN7_SC_INSTDONE"}, \
+	{GEN12_SC_INSTDONE_EXTRA,  0,      0, "GEN12_SC_INSTDONE_EXTRA"}, \
+	{GEN12_SC_INSTDONE_EXTRA2, 0,      0, "GEN12_SC_INSTDONE_EXTRA2"}
+
+#define COMMON_GEN12BASE_VEC() \
+	{GEN11_VCS_VECS_INTR_ENABLE, 0,    0, "GEN11_VCS_VECS_INTR_ENABLE"}, \
+	{GEN12_SFC_DONE(0),        0,      0, "GEN12_SFC_DONE0"}, \
+	{GEN12_SFC_DONE(1),        0,      0, "GEN12_SFC_DONE1"}, \
+	{GEN12_SFC_DONE(2),        0,      0, "GEN12_SFC_DONE2"}, \
+	{GEN12_SFC_DONE(3),        0,      0, "GEN12_SFC_DONE3"}
+
 /* XE_LPD - Global */
 static struct __guc_mmio_reg_descr xe_lpd_global_regs[] = {
-	{GEN12_RING_FAULT_REG,     0,      0, "GEN12_RING_FAULT_REG"}
+	COMMON_GEN12BASE_GLOBAL(),
 };
 
 /* XE_LPD - Render / Compute Per-Class */
 static struct __guc_mmio_reg_descr xe_lpd_rc_class_regs[] = {
-	{EIR,                      0,      0, "EIR"}
+	COMMON_GEN12BASE_HAS_EU(),
+	COMMON_GEN12BASE_RENDER(),
+	{GEN11_RENDER_COPY_INTR_ENABLE, 0, 0, "GEN11_RENDER_COPY_INTR_ENABLE"},
 };
 
 /* XE_LPD - Render / Compute Per-Engine-Instance */
 static struct __guc_mmio_reg_descr xe_lpd_rc_inst_regs[] = {
-	{RING_HEAD(0),             0,      0, "RING_HEAD"},
-	{RING_TAIL(0),             0,      0, "RING_TAIL"},
+	COMMON_GEN12BASE_ENGINE_INSTANCE(),
 };
 
 /* XE_LPD - Media Decode/Encode Per-Class */
@@ -41,18 +105,26 @@ static struct __guc_mmio_reg_descr xe_lpd_vd_class_regs[] = {
 
 /* XE_LPD - Media Decode/Encode Per-Engine-Instance */
 static struct __guc_mmio_reg_descr xe_lpd_vd_inst_regs[] = {
-	{RING_HEAD(0),             0,      0, "RING_HEAD"},
-	{RING_TAIL(0),             0,      0, "RING_TAIL"},
+	COMMON_GEN12BASE_ENGINE_INSTANCE(),
 };
 
 /* XE_LPD - Video Enhancement Per-Class */
 static struct __guc_mmio_reg_descr xe_lpd_vec_class_regs[] = {
+	COMMON_GEN12BASE_VEC(),
 };
 
 /* XE_LPD - Video Enhancement Per-Engine-Instance */
 static struct __guc_mmio_reg_descr xe_lpd_vec_inst_regs[] = {
-	{RING_HEAD(0),             0,      0, "RING_HEAD"},
-	{RING_TAIL(0),             0,      0, "RING_TAIL"},
+	COMMON_GEN12BASE_ENGINE_INSTANCE(),
+};
+
+/* XE_LPD - Blitter Per-Class */
+static struct __guc_mmio_reg_descr xe_lpd_blt_class_regs[] = {
+};
+
+/* XE_LPD - Blitter Per-Engine-Instance */
+static struct __guc_mmio_reg_descr xe_lpd_blt_inst_regs[] = {
+	COMMON_GEN12BASE_ENGINE_INSTANCE(),
 };
 
 #define TO_GCAP_DEF_OWNER(x) (GUC_CAPTURE_LIST_INDEX_##x)
@@ -64,6 +136,8 @@ static struct __guc_mmio_reg_descr xe_lpd_vec_inst_regs[] = {
 		.owner = TO_GCAP_DEF_OWNER(regsowner), \
 		.type = TO_GCAP_DEF_TYPE(regstype), \
 		.engine = class, \
+		.num_ext = 0, \
+		.ext = NULL, \
 	}
 
 /* List of lists */
@@ -75,9 +149,92 @@ static struct __guc_mmio_reg_descr_group xe_lpd_lists[] = {
 	MAKE_REGLIST(xe_lpd_vd_inst_regs, PF, ENGINE_INSTANCE, GUC_VIDEO_CLASS),
 	MAKE_REGLIST(xe_lpd_vec_class_regs, PF, ENGINE_CLASS, GUC_VIDEOENHANCE_CLASS),
 	MAKE_REGLIST(xe_lpd_vec_inst_regs, PF, ENGINE_INSTANCE, GUC_VIDEOENHANCE_CLASS),
+	MAKE_REGLIST(xe_lpd_blt_class_regs, PF, ENGINE_CLASS, GUC_BLITTER_CLASS),
+	MAKE_REGLIST(xe_lpd_blt_inst_regs, PF, ENGINE_INSTANCE, GUC_BLITTER_CLASS),
 	{}
 };
 
+static struct __guc_mmio_reg_descr_group *
+guc_capture_get_one_list(struct __guc_mmio_reg_descr_group *reglists, u32 owner, u32 type, u32 id)
+{
+	int i;
+
+	if (!reglists)
+		return NULL;
+
+	for (i = 0; reglists[i].list; i++) {
+		if (reglists[i].owner == owner && reglists[i].type == type &&
+		    (reglists[i].engine == id || reglists[i].type == GUC_CAPTURE_LIST_TYPE_GLOBAL))
+			return &reglists[i];
+	}
+
+	return NULL;
+}
+
+static void guc_capture_clear_ext_regs(struct __guc_mmio_reg_descr_group *lists)
+{
+	while (lists->list) {
+		kfree(lists->ext);
+		lists->ext = NULL;
+		++lists;
+	}
+}
+
+struct __ext_steer_reg {
+	const char *name;
+	i915_reg_t reg;
+};
+
+static struct __ext_steer_reg xelpd_extregs[] = {
+	{"GEN7_SAMPLER_INSTDONE", GEN7_SAMPLER_INSTDONE},
+	{"GEN7_ROW_INSTDONE", GEN7_ROW_INSTDONE}
+};
+
+static void
+guc_capture_alloc_steered_list_xelpd(struct intel_guc *guc,
+				     struct __guc_mmio_reg_descr_group *lists)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	struct sseu_dev_info *sseu;
+	int slice, subslice, i, num_tot_regs = 0;
+	struct __guc_mmio_reg_descr_group *list;
+	struct __guc_mmio_reg_descr *extarray;
+	int num_steer_regs = ARRAY_SIZE(xelpd_extregs);
+
+	/* In XE_LP we only care about render-class steering registers during error-capture */
+	list = guc_capture_get_one_list(lists, GUC_CAPTURE_LIST_INDEX_PF,
+					GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS, GUC_RENDER_CLASS);
+	if (!list)
+		return;
+
+	if (list->ext)
+		return; /* already populated */
+
+	sseu = &gt->info.sseu;
+	for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
+		num_tot_regs += num_steer_regs;
+	}
+	if (!num_tot_regs)
+		return;
+
+	list->ext = kcalloc(num_tot_regs, sizeof(struct __guc_mmio_reg_descr), GFP_KERNEL);
+	if (!list->ext)
+		return;
+
+	extarray = list->ext;
+	for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
+		for (i = 0; i < num_steer_regs; i++) {
+			extarray->reg = xelpd_extregs[i].reg;
+			extarray->flags = FIELD_PREP(GUC_REGSET_STEERING_GROUP, slice);
+			extarray->flags |= FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, subslice);
+			extarray->regname = xelpd_extregs[i].name;
+			++extarray;
+		}
+	}
+	list->num_ext = num_tot_regs;
+}
+
 static struct __guc_mmio_reg_descr_group *
 guc_capture_get_device_reglist(struct intel_guc *guc)
 {
@@ -91,29 +248,13 @@ guc_capture_get_device_reglist(struct intel_guc *guc)
 		 * these at init time based on hw config add it as an extension
 		 * list at the end of the pre-populated render list.
 		 */
+		guc_capture_alloc_steered_list_xelpd(guc, xe_lpd_lists);
 		return xe_lpd_lists;
 	}
 
 	return NULL;
 }
 
-static struct __guc_mmio_reg_descr_group *
-guc_capture_get_one_list(struct __guc_mmio_reg_descr_group *reglists, u32 owner, u32 type, u32 id)
-{
-	int i;
-
-	if (!reglists)
-		return NULL;
-
-	for (i = 0; reglists[i].list; i++) {
-		if (reglists[i].owner == owner && reglists[i].type == type &&
-		    (reglists[i].engine == id || reglists[i].type == GUC_CAPTURE_LIST_TYPE_GLOBAL))
-		return &reglists[i];
-	}
-
-	return NULL;
-}
-
 static const char *
 guc_capture_stringify_owner(u32 owner)
 {
@@ -184,7 +325,7 @@ static int
 guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
 		      struct guc_mmio_reg *ptr, u16 num_entries)
 {
-	u32 j = 0;
+	u32 j = 0, k = 0;
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
 	struct __guc_mmio_reg_descr_group *reglists = guc->capture.priv->reglists;
 	struct __guc_mmio_reg_descr_group *match;
@@ -200,6 +341,18 @@ guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
 			ptr[j].flags = match->list[j].flags;
 			ptr[j].mask = match->list[j].mask;
 		}
+		if (match->ext) {
+			for (j = match->num_regs, k = 0; j < num_entries &&
+			     j < (match->num_regs + match->num_ext); ++j, ++k) {
+				ptr[j].offset = match->ext[k].reg.reg;
+				ptr[j].value = 0xDEADF00D;
+				ptr[j].flags = match->ext[k].flags;
+				ptr[j].mask = match->ext[k].mask;
+			}
+		}
+		if (j < num_entries)
+			drm_dbg(&i915->drm, "GuC-capture: Init reglist short %d out %d.\n",
+				(int)j, (int)num_entries);
 		return 0;
 	}
 
@@ -282,7 +435,7 @@ guc_capture_list_count(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
 		return -ENODATA;
 	}
 
-	*num_entries = match->num_regs;
+	*num_entries = match->num_regs + match->num_ext;
 	return 0;
 }
 
@@ -435,6 +588,7 @@ int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u3
 
 void intel_guc_capture_destroy(struct intel_guc *guc)
 {
+	guc_capture_clear_ext_regs(guc->capture.priv->reglists);
 	kfree(guc->capture.priv);
 	guc->capture.priv = NULL;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 92bfe25a5e85..50fcd987f2a2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -271,10 +271,12 @@ struct guc_mmio_reg {
 	u32 offset;
 	u32 value;
 	u32 flags;
-	u32 mask;
 #define GUC_REGSET_MASKED		BIT(0)
 #define GUC_REGSET_MASKED_WITH_VALUE	BIT(2)
 #define GUC_REGSET_RESTORE_ONLY		BIT(3)
+#define GUC_REGSET_STEERING_GROUP       GENMASK(15, 12)
+#define GUC_REGSET_STEERING_INSTANCE    GENMASK(23, 20)
+	u32 mask;
 } __packed;
 
 /* GuC register sets */
-- 
2.25.1



* [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add DG2 registers for GuC error state capture.
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
@ 2022-01-18 10:03 ` Alan Previn
  -1 siblings, 0 replies; 14+ messages in thread
From: Alan Previn @ 2022-01-18 10:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Alan Previn

Add additional DG2 registers for GuC error state capture.
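For DG2, the render-class extension list grows by the XEHPG_INSTDONE_GEOM_SVG entry per (gslice, DSS) pair when the IP version is 12.55 or newer. A minimal sketch of that sizing and fill logic follows. IP_VER() here is assumed to mirror the i915 macro's (version << 8 | release) encoding, while the table contents and the single "pair" index are simplifications. Each entry takes its name from the same table it is being filled from.

```c
#include <stddef.h>

#define IP_VER(ver, rel) (((ver) << 8) | (rel)) /* assumed i915-style encoding */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

struct ext_entry {
	const char *name;
	unsigned int pair; /* simplified stand-in for (gslice, dss) steering */
};

static const char *const xelp_regs[] = { "GEN7_SAMPLER_INSTDONE", "GEN7_ROW_INSTDONE" };
static const char *const xehpg_regs[] = { "XEHPG_INSTDONE_GEOM_SVG" };

static size_t steered_per_pair(unsigned int ipver)
{
	return ARRAY_LEN(xelp_regs) +
	       (ipver >= IP_VER(12, 55) ? ARRAY_LEN(xehpg_regs) : 0);
}

/* Append every name in tbl for one pair; names come from tbl itself. */
static struct ext_entry *append(struct ext_entry *e, const char *const *tbl,
				size_t n, unsigned int pair)
{
	for (size_t i = 0; i < n; i++, e++) {
		e->name = tbl[i];
		e->pair = pair;
	}
	return e;
}

static size_t fill_ext(unsigned int ipver, unsigned int num_pairs,
		       struct ext_entry *out)
{
	struct ext_entry *e = out;

	for (unsigned int p = 0; p < num_pairs; p++) {
		e = append(e, xelp_regs, ARRAY_LEN(xelp_regs), p);
		if (ipver >= IP_VER(12, 55))
			e = append(e, xehpg_regs, ARRAY_LEN(xehpg_regs), p);
	}
	return (size_t)(e - out);
}
```

The number of entries filled always equals num_pairs * steered_per_pair(ipver), so the count used for the allocation and the fill loop cannot disagree.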

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 64 ++++++++++++++-----
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 6adfb5c07bcf..3df396c72b4c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -190,19 +190,23 @@ static struct __ext_steer_reg xelpd_extregs[] = {
 	{"GEN7_ROW_INSTDONE", GEN7_ROW_INSTDONE}
 };
 
+static struct __ext_steer_reg xehpg_extregs[] = {
+	{"XEHPG_INSTDONE_GEOM_SVG", XEHPG_INSTDONE_GEOM_SVG}
+};
+
 static void
-guc_capture_alloc_steered_list_xelpd(struct intel_guc *guc,
-				     struct __guc_mmio_reg_descr_group *lists)
+guc_capture_alloc_steered_list_xe_lpd_hpg(struct intel_guc *guc,
+					  struct __guc_mmio_reg_descr_group *lists,
+					  u32 ipver)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
 	struct sseu_dev_info *sseu;
-	int slice, subslice, i, num_tot_regs = 0;
+	int slice, subslice, i, iter, num_steer_regs, num_tot_regs = 0;
 	struct __guc_mmio_reg_descr_group *list;
 	struct __guc_mmio_reg_descr *extarray;
-	int num_steer_regs = ARRAY_SIZE(xelpd_extregs);
 
-	/* In XE_LP we only care about render-class steering registers during error-capture */
+	/* In XE_LP / HPG we only have render-class steering registers during error-capture */
 	list = guc_capture_get_one_list(lists, GUC_CAPTURE_LIST_INDEX_PF,
 					GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS, GUC_RENDER_CLASS);
 	if (!list)
@@ -211,10 +215,21 @@ guc_capture_alloc_steered_list_xelpd(struct intel_guc *guc,
 	if (list->ext)
 		return; /* already populated */
 
+	num_steer_regs = ARRAY_SIZE(xelpd_extregs);
+	if (ipver >= IP_VER(12, 55))
+		num_steer_regs += ARRAY_SIZE(xehpg_extregs);
+
 	sseu = &gt->info.sseu;
-	for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
-		num_tot_regs += num_steer_regs;
+	if (ipver >= IP_VER(12, 50)) {
+		for_each_instdone_gslice_dss_xehp(i915, sseu, iter, slice, subslice) {
+			num_tot_regs += num_steer_regs;
+		}
+	} else {
+		for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
+			num_tot_regs += num_steer_regs;
+		}
 	}
+
 	if (!num_tot_regs)
 		return;
 
@@ -223,15 +238,31 @@ guc_capture_alloc_steered_list_xelpd(struct intel_guc *guc,
 		return;
 
 	extarray = list->ext;
-	for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
-		for (i = 0; i < num_steer_regs; i++) {
-			extarray->reg = xelpd_extregs[i].reg;
-			extarray->flags = FIELD_PREP(GUC_REGSET_STEERING_GROUP, slice);
-			extarray->flags |= FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, subslice);
-			extarray->regname = xelpd_extregs[i].name;
-			++extarray;
+
+#define POPULATE_NEXT_EXTREG(ext, list, idx, slicenum, subslicenum) \
+	{ \
+		ext->reg = list[idx].reg; \
+		ext->flags = FIELD_PREP(GUC_REGSET_STEERING_GROUP, slicenum); \
+		ext->flags |= FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, subslicenum); \
+		ext->regname = list[idx].name; \
+		++ext; \
+	}
+	if (ipver >= IP_VER(12, 50)) {
+		for_each_instdone_gslice_dss_xehp(i915, sseu, iter, slice, subslice) {
+			for (i = 0; i < ARRAY_SIZE(xelpd_extregs); i++)
+				POPULATE_NEXT_EXTREG(extarray, xelpd_extregs, i, slice, subslice)
+			for (i = 0; i < ARRAY_SIZE(xehpg_extregs) && ipver >= IP_VER(12, 55);
+			     i++)
+				POPULATE_NEXT_EXTREG(extarray, xehpg_extregs, i, slice, subslice)
+		}
+	} else {
+		for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
+			for (i = 0; i < num_steer_regs; i++)
+				POPULATE_NEXT_EXTREG(extarray, xelpd_extregs, i, slice, subslice)
 		}
 	}
+#undef POPULATE_NEXT_EXTREG
+
 	list->num_ext = num_tot_regs;
 }
 
@@ -248,7 +279,10 @@ guc_capture_get_device_reglist(struct intel_guc *guc)
 		 * these at init time based on hw config add it as an extension
 		 * list at the end of the pre-populated render list.
 		 */
-		guc_capture_alloc_steered_list_xelpd(guc, xe_lpd_lists);
+		guc_capture_alloc_steered_list_xe_lpd_hpg(guc, xe_lpd_lists, IP_VER(12, 0));
+		return xe_lpd_lists;
+	} else if (IS_DG2(i915)) {
+		guc_capture_alloc_steered_list_xe_lpd_hpg(guc, xe_lpd_lists, IP_VER(12, 55));
 		return xe_lpd_lists;
 	}
 
-- 
2.25.1



* [Intel-gfx] [PATCH 4/7] drm/i915/guc: Add GuC's error state capture output structures.
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
@ 2022-01-18 10:03 ` Alan Previn
  -1 siblings, 0 replies; 14+ messages in thread
From: Alan Previn @ 2022-01-18 10:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Alan Previn

Add GuC's error-capture output structures and definitions as
they appear in the GuC log buffer's error-capture subregion after
an error-state-capture G2H event notification.
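To read these structures back, a consumer walks a group header and then, for each capture, a fixed header followed by num_mmios register entries. The sketch below performs that walk over a raw dword buffer. The dword counts come from the packed structs in this patch (group header 2 dwords, capture header 5, guc_mmio_reg 4). walk_group(), field_get() and the bounds-checking policy are illustrative assumptions, not the driver's parser.

```c
#include <stddef.h>
#include <stdint.h>

#define GRP_NUM_CAPTURES 0x000000ffu /* CAP_GRP_HDR_NUM_CAPTURES, bits 7:0 */
#define GRP_CAPTURE_TYPE 0x0000ff00u /* CAP_GRP_HDR_CAPTURE_TYPE, bits 15:8 */
#define CAP_TYPE         0x0000000fu /* CAP_HDR_CAPTURE_TYPE, bits 3:0 */
#define CAP_CLASS        0x000000f0u /* CAP_HDR_ENGINE_CLASS, bits 7:4 */
#define CAP_INSTANCE     0x00000f00u /* CAP_HDR_ENGINE_INSTANCE, bits 11:8 */
#define CAP_NUM_MMIOS    0x000003ffu /* CAP_HDR_NUM_MMIOS, bits 9:0 */

#define GRP_HDR_DW 2 /* reserved1, info */
#define CAP_HDR_DW 5 /* reserved1, info, lrca, guc_id, num_mmios */
#define MMIO_DW    4 /* offset, value, flags, mask */

/* Userspace stand-in for FIELD_GET(). */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	unsigned int shift = 0;

	while (!((mask >> shift) & 1u)) /* mask must be non-zero */
		shift++;
	return (reg & mask) >> shift;
}

/*
 * Walk one capture group at buf[0..len). Returns the total number of
 * register entries found, or -1 if the buffer ends before the data does.
 */
static long walk_group(const uint32_t *buf, size_t len)
{
	size_t pos = GRP_HDR_DW;
	long total = 0;
	uint32_t ncap;

	if (len < GRP_HDR_DW)
		return -1;
	ncap = field_get(GRP_NUM_CAPTURES, buf[1]);
	for (uint32_t c = 0; c < ncap; c++) {
		uint32_t nmmio;

		if (len - pos < CAP_HDR_DW)
			return -1;
		nmmio = field_get(CAP_NUM_MMIOS, buf[pos + 4]);
		pos += CAP_HDR_DW;
		if ((len - pos) / MMIO_DW < nmmio)
			return -1;
		pos += (size_t)nmmio * MMIO_DW;
		total += nmmio;
	}
	return total;
}
```

Checking the remaining length before each header and each register run means a truncated or corrupted capture is rejected instead of being read past the end of the buffer.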

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h | 35 +++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
index a2f97d04ff18..495cdb0228c6 100644
--- a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
@@ -28,6 +28,41 @@ struct __guc_mmio_reg_descr_group {
 	struct __guc_mmio_reg_descr *ext;
 };
 
+struct guc_state_capture_header_t {
+	u32 reserved1;
+	u32 info;
+		#define CAP_HDR_CAPTURE_TYPE GENMASK(3, 0) /* see enum guc_capture_type */
+		#define CAP_HDR_ENGINE_CLASS GENMASK(7, 4) /* see GUC_MAX_ENGINE_CLASSES */
+		#define CAP_HDR_ENGINE_INSTANCE GENMASK(11, 8)
+	u32 lrca; /* if type-instance, LRCA (address) that hung, else set to ~0 */
+	u32 guc_id; /* if type-instance, context index of hung context, else set to ~0 */
+	u32 num_mmios;
+		#define CAP_HDR_NUM_MMIOS GENMASK(9, 0)
+} __packed;
+
+struct guc_state_capture_t {
+	struct guc_state_capture_header_t header;
+	struct guc_mmio_reg mmio_entries[0];
+} __packed;
+
+enum guc_capture_group_types {
+	GUC_STATE_CAPTURE_GROUP_TYPE_FULL,
+	GUC_STATE_CAPTURE_GROUP_TYPE_PARTIAL,
+	GUC_STATE_CAPTURE_GROUP_TYPE_MAX,
+};
+
+struct guc_state_capture_group_header_t {
+	u32 reserved1;
+	u32 info;
+		#define CAP_GRP_HDR_NUM_CAPTURES GENMASK(7, 0)
+		#define CAP_GRP_HDR_CAPTURE_TYPE GENMASK(15, 8) /* guc_capture_group_types */
+} __packed;
+
+struct guc_state_capture_group_t {
+	struct guc_state_capture_group_header_t grp_header;
+	struct guc_state_capture_t capture_entries[0];
+} __packed;
+
 struct __guc_state_capture_priv {
 	struct __guc_mmio_reg_descr_group *reglists;
 	u16 num_instance_regs[GUC_CAPTURE_LIST_INDEX_MAX][GUC_MAX_ENGINE_CLASSES];
-- 
2.25.1



* [Intel-gfx] [PATCH 5/7] drm/i915/guc: Update GuC's log-buffer-state access for error capture.
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
@ 2022-01-18 10:03 ` Alan Previn
  -1 siblings, 0 replies; 14+ messages in thread
From: Alan Previn @ 2022-01-18 10:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Alan Previn

GuC log buffer regions for debug-log-events, crash-dumps and
error-state-capture are all part of a single bo allocation that
includes the guc_log_buffer_state structures.

Since the error-capture region is accessed with high priority at
non-deterministic times (as part of GPU coredump), while the
debug-log-event region is populated and accessed with different
priorities, timings and consumers, split out separate locks for
buffer-state accesses of each region.
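The per-region locking idea can be modelled in userspace with one pthread mutex per log region, each guarding only that region's overflow bookkeeping. This is an illustrative model, not the driver code. The region enum and helper names are hypothetical, and the overflow delta is computed modulo 16 from the last raw 4-bit buffer_full_cnt, a simplified wrap-aware variant of the accounting in guc_check_log_buf_overflow().

```c
#include <pthread.h>
#include <stdbool.h>

enum log_region { REGION_DEBUG, REGION_CRASH_DUMP, REGION_CAPTURE, REGION_MAX };

struct region_state {
	pthread_mutex_t lock;          /* guards only this region's fields */
	unsigned int last_full_cnt;    /* last raw 4-bit buffer_full_cnt */
	unsigned int sampled_overflow; /* running total of overflows seen */
};

static struct region_state regions[REGION_MAX] = {
	{ PTHREAD_MUTEX_INITIALIZER, 0, 0 },
	{ PTHREAD_MUTEX_INITIALIZER, 0, 0 },
	{ PTHREAD_MUTEX_INITIALIZER, 0, 0 },
};

/* Record a new 4-bit full count for one region; true if it overflowed. */
static bool region_note_full_cnt(enum log_region r, unsigned int full_cnt)
{
	struct region_state *st = &regions[r];
	unsigned int delta;

	pthread_mutex_lock(&st->lock);
	delta = (full_cnt - st->last_full_cnt) & 0xfu; /* wraps at 16 */
	st->sampled_overflow += delta;
	st->last_full_cnt = full_cnt & 0xfu;
	pthread_mutex_unlock(&st->lock);

	return delta != 0;
}
```

Because the capture region has its own mutex, a coredump reading it does not wait behind a relay flush that holds the debug region's lock.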

Also, map the entire bo once up front and keep it mapped throughout
GuC operation, so that error-capture log access does not require
dynamic mapping and unmapping when relay logging isn't running.

Additionally, while here, make some readability improvements:
1. change previous function names with "capture_logs" to
   "copy_debug_logs" to help make the distinction clearer.
2. Update the guc log region mapping comments to order them
   according to the enum definition as per the GuC interface.
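This patch also adds intel_guc_capture_output_min_size_est(), whose arithmetic is easy to check by hand. Per engine, it counts one group header plus three capture headers, adds 16 bytes for each register across the global, class and instance lists, and multiplies the total by GUC_CAPTURE_OVERBUFFER_MULTIPLIER (3). The sketch below reproduces that arithmetic from the byte sizes of the packed structs. struct engine_counts and the register counts used with it are hypothetical, and roundup_pow2() models the power-of-two rounding that the code comment attributes to intel_guc_log.

```c
#include <stddef.h>

#define GRP_HDR_BYTES 8   /* guc_state_capture_group_header_t: 2 x u32 */
#define CAP_HDR_BYTES 20  /* guc_state_capture_header_t: 5 x u32 */
#define MMIO_REG_BYTES 16 /* guc_mmio_reg: 4 x u32 */
#define OVERBUFFER_MULTIPLIER 3

/* Hypothetical per-engine register counts for the three list types. */
struct engine_counts {
	unsigned int global, klass, instance;
};

static size_t capture_min_size_est(const struct engine_counts *e, size_t n)
{
	size_t size = 0, num_regs = 0;

	for (size_t i = 0; i < n; i++) {
		/* global registers are counted again for every engine */
		size += GRP_HDR_BYTES + 3 * CAP_HDR_BYTES;
		num_regs += e[i].global + e[i].klass + e[i].instance;
	}
	size += num_regs * MMIO_REG_BYTES;
	return size * OVERBUFFER_MULTIPLIER;
}

static size_t roundup_pow2(size_t x)
{
	size_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}
```

With one engine whose global, class and instance lists hold 12, 4 and 32 registers, the estimate is (8 + 3 * 20 + 48 * 16) * 3 = 2508 bytes, which rounds up to 4096.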

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  47 ++++++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 135 +++++++++++-------
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  16 ++-
 5 files changed, 141 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 9542db6fda0d..e4c901a5080f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -34,6 +34,8 @@ struct intel_guc {
 	struct intel_uc_fw fw;
 	/** @log: sub-structure containing GuC log related data and objects */
 	struct intel_guc_log log;
+	/** @log_state: states and locks for each subregion of GuC's log buffer */
+	struct intel_guc_log_stats log_state[GUC_MAX_LOG_BUFFER];
 	/** @ct: the command transport communication channel */
 	struct intel_guc_ct ct;
 	/** @slpc: sub-structure containing SLPC related data and objects */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 3df396c72b4c..b637628905ec 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -620,6 +620,53 @@ int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u3
 	return PAGE_ALIGN(alloc_size);
 }
 
+#define GUC_CAPTURE_OVERBUFFER_MULTIPLIER 3
+int intel_guc_capture_output_min_size_est(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int worst_min_size = 0, num_regs = 0;
+	u16 tmp = 0;
+
+	/*
+	 * If every engine instance suffered an unrelated failure in quick
+	 * succession, a burst of error-capture events would dump registers for
+	 * each engine instance, one at a time. In this case, GuC would even
+	 * dump the global registers repeatedly.
+	 *
+	 * For each engine instance, there would be 1 x guc_state_capture_group_t output
+	 * followed by 3 x guc_state_capture_t lists. The latter is how the register
+	 * dumps are split across different register types (where the '3' are global vs class
+	 * vs instance). Finally, multiply the whole thing by 3x so that we are not
+	 * limited to just one round of data in a worst-case full register dump log.
+	 *
+	 * NOTE: intel_guc_log, which allocates the log buffer, rounds this size up
+	 * to a power of two.
+	 */
+
+	for_each_engine(engine, gt, id) {
+		worst_min_size += sizeof(struct guc_state_capture_group_header_t) +
+				  (3 * sizeof(struct guc_state_capture_header_t));
+
+		if (!guc_capture_list_count(guc, 0, GUC_CAPTURE_LIST_TYPE_GLOBAL, 0, &tmp))
+			num_regs += tmp;
+
+		if (!guc_capture_list_count(guc, 0, GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS,
+					    engine->class, &tmp)) {
+			num_regs += tmp;
+		}
+		if (!guc_capture_list_count(guc, 0, GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE,
+					    engine->class, &tmp)) {
+			num_regs += tmp;
+		}
+	}
+
+	worst_min_size += (num_regs * sizeof(struct guc_mmio_reg));
+
+	return (worst_min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER);
+}
+
 void intel_guc_capture_destroy(struct intel_guc *guc)
 {
 	guc_capture_clear_ext_regs(guc->capture.priv->reglists);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
index 6b5594ca529d..4d3e5221128c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
@@ -14,6 +14,7 @@ struct guc_gt_system_info;
 
 int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u32 blob_ggtt,
 				 u32 capture_offset, struct guc_gt_system_info *sysinfo);
+int intel_guc_capture_output_min_size_est(struct intel_guc *guc);
 void intel_guc_capture_destroy(struct intel_guc *guc);
 int intel_guc_capture_init(struct intel_guc *guc);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index b53f61f3101f..d6b1a3c0fb15 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -6,12 +6,13 @@
 #include <linux/debugfs.h>
 
 #include "gt/intel_gt.h"
+#include "intel_guc_capture.h"
+#include "intel_guc_log.h"
 #include "i915_drv.h"
 #include "i915_irq.h"
 #include "i915_memcpy.h"
-#include "intel_guc_log.h"
 
-static void guc_log_capture_logs(struct intel_guc_log *log);
+static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
 
 /**
  * DOC: GuC firmware log
@@ -136,7 +137,7 @@ static void guc_move_to_next_buf(struct intel_guc_log *log)
 	smp_wmb();
 
 	/* All data has been written, so now move the offset of sub buffer. */
-	relay_reserve(log->relay.channel, log->vma->obj->base.size);
+	relay_reserve(log->relay.channel, log->vma->obj->base.size - CAPTURE_BUFFER_SIZE);
 
 	/* Switch to the next sub buffer */
 	relay_flush(log->relay.channel);
@@ -156,25 +157,25 @@ static void *guc_get_write_buffer(struct intel_guc_log *log)
 	return relay_reserve(log->relay.channel, 0);
 }
 
-static bool guc_check_log_buf_overflow(struct intel_guc_log *log,
-				       enum guc_log_buffer_type type,
+static bool guc_check_log_buf_overflow(struct intel_guc *guc,
+				       struct intel_guc_log_stats *log_state,
 				       unsigned int full_cnt)
 {
-	unsigned int prev_full_cnt = log->stats[type].sampled_overflow;
+	unsigned int prev_full_cnt = log_state->sampled_overflow;
 	bool overflow = false;
 
 	if (full_cnt != prev_full_cnt) {
 		overflow = true;
 
-		log->stats[type].overflow = full_cnt;
-		log->stats[type].sampled_overflow += full_cnt - prev_full_cnt;
+		log_state->overflow = full_cnt;
+		log_state->sampled_overflow += full_cnt - prev_full_cnt;
 
 		if (full_cnt < prev_full_cnt) {
 			/* buffer_full_cnt is a 4 bit counter */
-			log->stats[type].sampled_overflow += 16;
+			log_state->sampled_overflow += 16;
 		}
 
-		dev_notice_ratelimited(guc_to_gt(log_to_guc(log))->i915->drm.dev,
+		dev_notice_ratelimited(guc_to_gt(guc)->i915->drm.dev,
 				       "GuC log buffer overflow\n");
 	}
 
@@ -197,8 +198,10 @@ static unsigned int guc_get_log_buffer_size(enum guc_log_buffer_type type)
 	return 0;
 }
 
-static void guc_read_update_log_buffer(struct intel_guc_log *log)
+static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
 {
+	struct intel_guc *guc = log_to_guc(log);
+	struct intel_guc_log_stats *logstate;
 	unsigned int buffer_size, read_offset, write_offset, bytes_to_copy, full_cnt;
 	struct guc_log_buffer_state *log_buf_state, *log_buf_snapshot_state;
 	struct guc_log_buffer_state log_buf_state_local;
@@ -212,7 +215,7 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 		goto out_unlock;
 
 	/* Get the pointer to shared GuC log buffer */
-	log_buf_state = src_data = log->relay.buf_addr;
+	log_buf_state = src_data = log->buf_addr;
 
 	/* Get the pointer to local buffer to store the logs */
 	log_buf_snapshot_state = dst_data = guc_get_write_buffer(log);
@@ -222,7 +225,7 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 		 * Used rate limited to avoid deluge of messages, logs might be
 		 * getting consumed by User at a slow rate.
 		 */
-		DRM_ERROR_RATELIMITED("no sub-buffer to capture logs\n");
+		DRM_ERROR_RATELIMITED("no sub-buffer to copy general logs\n");
 		log->relay.full_count++;
 
 		goto out_unlock;
@@ -232,12 +235,16 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 	src_data += PAGE_SIZE;
 	dst_data += PAGE_SIZE;
 
-	for (type = GUC_DEBUG_LOG_BUFFER; type < GUC_MAX_LOG_BUFFER; type++) {
+	/* For relay logging, we exclude error state capture */
+	for (type = GUC_DEBUG_LOG_BUFFER; type <= GUC_CRASH_DUMP_LOG_BUFFER; type++) {
 		/*
+		 * Get a lock to the buffer_state we want to read and update.
 		 * Make a copy of the state structure, inside GuC log buffer
 		 * (which is uncached mapped), on the stack to avoid reading
 		 * from it multiple times.
 		 */
+		logstate = &guc->log_state[type];
+		mutex_lock(&logstate->lock);
 		memcpy(&log_buf_state_local, log_buf_state,
 		       sizeof(struct guc_log_buffer_state));
 		buffer_size = guc_get_log_buffer_size(type);
@@ -246,13 +253,14 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 		full_cnt = log_buf_state_local.buffer_full_cnt;
 
 		/* Bookkeeping stuff */
-		log->stats[type].flush += log_buf_state_local.flush_to_file;
-		new_overflow = guc_check_log_buf_overflow(log, type, full_cnt);
+		logstate->flush += log_buf_state_local.flush_to_file;
+		new_overflow = guc_check_log_buf_overflow(guc, logstate, full_cnt);
 
 		/* Update the state of shared log buffer */
 		log_buf_state->read_ptr = write_offset;
 		log_buf_state->flush_to_file = 0;
 		log_buf_state++;
+		mutex_unlock(&logstate->lock);
 
 		/* First copy the state structure in snapshot buffer */
 		memcpy(log_buf_snapshot_state, &log_buf_state_local,
@@ -300,49 +308,49 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 	mutex_unlock(&log->relay.lock);
 }
 
-static void capture_logs_work(struct work_struct *work)
+static void copy_debug_logs_work(struct work_struct *work)
 {
 	struct intel_guc_log *log =
 		container_of(work, struct intel_guc_log, relay.flush_work);
 
-	guc_log_capture_logs(log);
+	guc_log_copy_debuglogs_for_relay(log);
 }
 
-static int guc_log_map(struct intel_guc_log *log)
+static int guc_log_relay_map(struct intel_guc_log *log)
 {
-	void *vaddr;
-
 	lockdep_assert_held(&log->relay.lock);
 
-	if (!log->vma)
+	if (!log->vma || !log->buf_addr)
 		return -ENODEV;
 
 	/*
-	 * Create a WC (Uncached for read) vmalloc mapping of log
-	 * buffer pages, so that we can directly get the data
-	 * (up-to-date) from memory.
+	 * WC vmalloc mapping of log buffer pages was done at
+	 * GuC init time, but let's keep a ref for bookkeeping
 	 */
-	vaddr = i915_gem_object_pin_map_unlocked(log->vma->obj, I915_MAP_WC);
-	if (IS_ERR(vaddr))
-		return PTR_ERR(vaddr);
-
-	log->relay.buf_addr = vaddr;
+	i915_gem_object_get(log->vma->obj);
+	log->relay.buf_in_use = true;
 
 	return 0;
 }
 
-static void guc_log_unmap(struct intel_guc_log *log)
+static void guc_log_relay_unmap(struct intel_guc_log *log)
 {
 	lockdep_assert_held(&log->relay.lock);
 
-	i915_gem_object_unpin_map(log->vma->obj);
-	log->relay.buf_addr = NULL;
+	i915_gem_object_put(log->vma->obj);
+	log->relay.buf_in_use = false;
 }
 
 void intel_guc_log_init_early(struct intel_guc_log *log)
 {
+	struct intel_guc *guc = log_to_guc(log);
+	int n;
+
+	for (n = GUC_DEBUG_LOG_BUFFER; n < GUC_MAX_LOG_BUFFER; n++)
+		mutex_init(&guc->log_state[n].lock);
+
 	mutex_init(&log->relay.lock);
-	INIT_WORK(&log->relay.flush_work, capture_logs_work);
+	INIT_WORK(&log->relay.flush_work, copy_debug_logs_work);
 	log->relay.started = false;
 }
 
@@ -357,8 +365,11 @@ static int guc_log_relay_create(struct intel_guc_log *log)
 	lockdep_assert_held(&log->relay.lock);
 	GEM_BUG_ON(!log->vma);
 
-	 /* Keep the size of sub buffers same as shared log buffer */
-	subbuf_size = log->vma->size;
+	 /*
+	  * Keep the size of sub buffers same as shared log buffer
+	  * but GuC log-events exclude the error-state-capture logs
+	  */
+	subbuf_size = log->vma->size - CAPTURE_BUFFER_SIZE;
 
 	/*
 	 * Store up to 8 snapshots, which is large enough to buffer sufficient
@@ -393,13 +404,13 @@ static void guc_log_relay_destroy(struct intel_guc_log *log)
 	log->relay.channel = NULL;
 }
 
-static void guc_log_capture_logs(struct intel_guc_log *log)
+static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
 {
 	struct intel_guc *guc = log_to_guc(log);
 	struct drm_i915_private *dev_priv = guc_to_gt(guc)->i915;
 	intel_wakeref_t wakeref;
 
-	guc_read_update_log_buffer(log);
+	_guc_log_copy_debuglogs_for_relay(log);
 
 	/*
 	 * Generally device is expected to be active only at this
@@ -439,6 +450,7 @@ int intel_guc_log_create(struct intel_guc_log *log)
 {
 	struct intel_guc *guc = log_to_guc(log);
 	struct i915_vma *vma;
+	void *vaddr;
 	u32 guc_log_size;
 	int ret;
 
@@ -446,25 +458,29 @@ int intel_guc_log_create(struct intel_guc_log *log)
 
 	/*
 	 *  GuC Log buffer Layout
+	 * (this ordering must follow "enum guc_log_buffer_type" definition)
 	 *
 	 *  +===============================+ 00B
-	 *  |    Crash dump state header    |
-	 *  +-------------------------------+ 32B
 	 *  |      Debug state header       |
+	 *  +-------------------------------+ 32B
+	 *  |    Crash dump state header    |
 	 *  +-------------------------------+ 64B
 	 *  |     Capture state header      |
 	 *  +-------------------------------+ 96B
 	 *  |                               |
 	 *  +===============================+ PAGE_SIZE (4KB)
-	 *  |        Crash Dump logs        |
-	 *  +===============================+ + CRASH_SIZE
 	 *  |          Debug logs           |
 	 *  +===============================+ + DEBUG_SIZE
+	 *  |        Crash Dump logs        |
+	 *  +===============================+ + CRASH_SIZE
 	 *  |         Capture logs          |
 	 *  +===============================+ + CAPTURE_SIZE
 	 */
-	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
-		       CAPTURE_BUFFER_SIZE;
+	if (intel_guc_capture_output_min_size_est(guc) > CAPTURE_BUFFER_SIZE)
+		DRM_WARN("GuC log buffer for state_capture may be too small. %d < %d\n",
+			 CAPTURE_BUFFER_SIZE, intel_guc_capture_output_min_size_est(guc));
+
+	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
 
 	vma = intel_guc_allocate_vma(guc, guc_log_size);
 	if (IS_ERR(vma)) {
@@ -473,6 +489,17 @@ int intel_guc_log_create(struct intel_guc_log *log)
 	}
 
 	log->vma = vma;
+	/*
+	 * Create a WC (Uncached for read) vmalloc mapping up front for immediate access to
+	 * data from memory during critical events such as error capture
+	 */
+	vaddr = i915_gem_object_pin_map_unlocked(log->vma->obj, I915_MAP_WC);
+	if (IS_ERR(vaddr)) {
+		ret = PTR_ERR(vaddr);
+		i915_vma_unpin_and_release(&log->vma, 0);
+		goto err;
+	}
+	log->buf_addr = vaddr;
 
 	log->level = __get_default_log_level(log);
 	DRM_DEBUG_DRIVER("guc_log_level=%d (%s, verbose:%s, verbosity:%d)\n",
@@ -483,13 +510,14 @@ int intel_guc_log_create(struct intel_guc_log *log)
 	return 0;
 
 err:
-	DRM_ERROR("Failed to allocate GuC log buffer. %d\n", ret);
+	DRM_ERROR("Failed to allocate or map GuC log buffer. %d\n", ret);
 	return ret;
 }
 
 void intel_guc_log_destroy(struct intel_guc_log *log)
 {
-	i915_vma_unpin_and_release(&log->vma, 0);
+	log->buf_addr = NULL;
+	i915_vma_unpin_and_release(&log->vma, I915_VMA_RELEASE_MAP);
 }
 
 int intel_guc_log_set_level(struct intel_guc_log *log, u32 level)
@@ -534,7 +562,7 @@ int intel_guc_log_set_level(struct intel_guc_log *log, u32 level)
 
 bool intel_guc_log_relay_created(const struct intel_guc_log *log)
 {
-	return log->relay.buf_addr;
+	return log->buf_addr;
 }
 
 int intel_guc_log_relay_open(struct intel_guc_log *log)
@@ -565,7 +593,7 @@ int intel_guc_log_relay_open(struct intel_guc_log *log)
 	if (ret)
 		goto out_unlock;
 
-	ret = guc_log_map(log);
+	ret = guc_log_relay_map(log);
 	if (ret)
 		goto out_relay;
 
@@ -615,8 +643,8 @@ void intel_guc_log_relay_flush(struct intel_guc_log *log)
 	with_intel_runtime_pm(guc_to_gt(guc)->uncore->rpm, wakeref)
 		guc_action_flush_log(guc);
 
-	/* GuC would have updated log buffer by now, so capture it */
-	guc_log_capture_logs(log);
+	/* GuC would have updated log buffer by now, so copy it */
+	guc_log_copy_debuglogs_for_relay(log);
 }
 
 /*
@@ -645,7 +673,7 @@ void intel_guc_log_relay_close(struct intel_guc_log *log)
 
 	mutex_lock(&log->relay.lock);
 	GEM_BUG_ON(!intel_guc_log_relay_created(log));
-	guc_log_unmap(log);
+	guc_log_relay_unmap(log);
 	guc_log_relay_destroy(log);
 	mutex_unlock(&log->relay.lock);
 }
@@ -682,6 +710,7 @@ stringify_guc_log_type(enum guc_log_buffer_type type)
  */
 void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p)
 {
+	struct intel_guc *guc = log_to_guc(log);
 	enum guc_log_buffer_type type;
 
 	if (!intel_guc_log_relay_created(log)) {
@@ -696,8 +725,8 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p)
 	for (type = GUC_DEBUG_LOG_BUFFER; type < GUC_MAX_LOG_BUFFER; type++) {
 		drm_printf(p, "\t%s:\tflush count %10u, overflow count %10u\n",
 			   stringify_guc_log_type(type),
-			   log->stats[type].flush,
-			   log->stats[type].sampled_overflow);
+			   guc->log_state[type].flush,
+			   guc->log_state[type].sampled_overflow);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
index d7e1b6471fed..b6e8e9ee37b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
@@ -46,23 +46,25 @@ struct intel_guc;
 #define GUC_VERBOSITY_TO_LOG_LEVEL(x)	((x) + 2)
 #define GUC_LOG_LEVEL_MAX GUC_VERBOSITY_TO_LOG_LEVEL(GUC_LOG_VERBOSITY_MAX)
 
+struct intel_guc_log_stats {
+	struct mutex lock; /* protects below and guc_log_buffer_state's read-ptr */
+	u32 sampled_overflow;
+	u32 overflow;
+	u32 flush;
+};
+
 struct intel_guc_log {
 	u32 level;
 	struct i915_vma *vma;
+	void *buf_addr;
 	struct {
-		void *buf_addr;
+		bool buf_in_use;
 		bool started;
 		struct work_struct flush_work;
 		struct rchan *channel;
 		struct mutex lock;
 		u32 full_count;
 	} relay;
-	/* logging related stats */
-	struct {
-		u32 sampled_overflow;
-		u32 overflow;
-		u32 flush;
-	} stats[GUC_MAX_LOG_BUFFER];
 };
 
 void intel_guc_log_init_early(struct intel_guc_log *log);
-- 
2.25.1



* [Intel-gfx] [PATCH 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
@ 2022-01-18 10:03 ` Alan Previn
  2022-01-19  1:36   ` Teres Alexis, Alan Previn
  -1 siblings, 1 reply; 14+ messages in thread
From: Alan Previn @ 2022-01-18 10:03 UTC
  To: intel-gfx; +Cc: Alan Previn

Upon the G2H Notify-Err-Capture event, make a snapshot of the error
state capture logs from the GuC-log buffer (error capture region)
into a bigger interim circular buffer store that can be parsed
later during gpu coredump printing.

Also, do the same when resetting GuC submission, where
outstanding logs need to be flushed.
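
To illustrate the snapshot copy described above, here is a minimal
userspace C sketch of inserting data into a power-of-two circular
store. It re-implements the CIRC_SPACE / CIRC_SPACE_TO_END arithmetic
of <linux/circ_buf.h> as plain functions; struct demo_store,
demo_store_insert() and demo_snapshot_ring() are hypothetical names,
not the driver's. It assumes ordinary ring semantics: when the source
ring's read offset is past its write offset, the span from the read
offset to the end of the ring is older than the span from offset 0 to
the write offset, so the sketch inserts it first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace versions of the <linux/circ_buf.h> arithmetic; size must be a power of two. */
static size_t circ_cnt(size_t head, size_t tail, size_t size)
{
	return (head - tail) & (size - 1);
}

/* Free bytes; one slot stays empty so that head == tail always means "empty". */
static size_t circ_space(size_t head, size_t tail, size_t size)
{
	return circ_cnt(tail, head + 1, size);
}

/* Free bytes that can be written contiguously before wrapping to offset 0. */
static size_t circ_space_to_end(size_t head, size_t tail, size_t size)
{
	size_t end = size - 1 - head;
	size_t n = (end + tail) & (size - 1);

	return n <= end ? n : end + 1;
}

struct demo_store {		/* hypothetical stand-in for the interim store */
	unsigned char *addr;
	size_t size;		/* power of two */
	size_t head;		/* producer inserts here */
	size_t tail;		/* consumer extracts here */
};

/* Append bytes, splitting at the wrap point; returns -1 (copying nothing) if they do not fit. */
static int demo_store_insert(struct demo_store *s, const unsigned char *src, size_t bytes)
{
	size_t h = s->head;

	assert(s->size && (s->size & (s->size - 1)) == 0);
	if (circ_space(h, s->tail, s->size) < bytes)
		return -1;
	while (bytes) {
		size_t chunk = circ_space_to_end(h, s->tail, s->size);

		if (chunk > bytes)
			chunk = bytes;
		memcpy(s->addr + h, src, chunk);
		src += chunk;
		bytes -= chunk;
		h = (h + chunk) & (s->size - 1);
	}
	s->head = h;
	return 0;
}

/* Snapshot the unread span [rd, wr) of a source ring, oldest bytes first. */
static int demo_snapshot_ring(struct demo_store *s, const unsigned char *ring,
			      size_t ring_size, size_t rd, size_t wr)
{
	size_t total = rd > wr ? (ring_size - rd) + wr : wr - rd;

	if (circ_space(s->head, s->tail, s->size) < total)
		return -1;
	if (rd > wr) {
		/* The writer wrapped: [rd, ring_size) predates [0, wr). */
		demo_store_insert(s, ring + rd, ring_size - rd);
		return demo_store_insert(s, ring, wr);
	}
	return demo_store_insert(s, ring + rd, wr - rd);
}
```

Because the free space is checked against the whole copy up front, a
snapshot is either stored completely or not at all, so a later parser
never sees a torn record.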

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   7 +
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |  12 ++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 186 ++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    |  26 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 +-
 7 files changed, 240 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 7afdadc7656f..82a69f54cddb 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -173,4 +173,11 @@ enum intel_guc_sleep_state_status {
 #define GUC_LOG_CONTROL_VERBOSITY_MASK	(0xF << GUC_LOG_CONTROL_VERBOSITY_SHIFT)
 #define GUC_LOG_CONTROL_DEFAULT_LOGGING	(1 << 8)
 
+enum intel_guc_state_capture_event_status {
+	INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_SUCCESS = 0x0,
+	INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE = 0x1,
+};
+
+#define INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK      0x1
+
 #endif /* _ABI_GUC_ACTIONS_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
index 495cdb0228c6..d9ea5df64b06 100644
--- a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h
@@ -63,11 +63,23 @@ struct guc_state_capture_group_t {
 	struct guc_state_capture_t capture_entries[0];
 } __packed;
 
+struct guc_capture_out_store {
+	/* An interim storage to copy the GuC error-capture-output before
+	 * parsing and reporting via proper reporting flows with formatting.
+	 */
+	unsigned char *addr;
+	size_t size;
+	unsigned long head; /* inject new output capture data */
+	unsigned long tail; /* remove output capture data when reporting */
+	struct mutex lock; /* lock head or tail when copying capture in or extracting out */
+};
+
 struct __guc_state_capture_priv {
 	struct __guc_mmio_reg_descr_group *reglists;
 	u16 num_instance_regs[GUC_CAPTURE_LIST_INDEX_MAX][GUC_MAX_ENGINE_CLASSES];
 	u16 num_class_regs[GUC_CAPTURE_LIST_INDEX_MAX][GUC_MAX_ENGINE_CLASSES];
 	u16 num_global_regs[GUC_CAPTURE_LIST_INDEX_MAX];
+	struct guc_capture_out_store out_store;
 };
 
 #endif /* _INTEL_GUC_CAPTURE_FWIF_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index b637628905ec..fc80c5f31915 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -3,6 +3,7 @@
  * Copyright © 2021-2021 Intel Corporation
  */
 
+#include <linux/circ_buf.h>
 #include <linux/types.h>
 
 #include <drm/drm_print.h>
@@ -12,6 +13,8 @@
 #include "guc_capture_fwif.h"
 #include "intel_guc_fwif.h"
 #include "i915_drv.h"
+#include "i915_gpu_error.h"
+#include "i915_irq.h"
 #include "i915_memcpy.h"
 
 /*
@@ -629,6 +632,9 @@ int intel_guc_capture_output_min_size_est(struct intel_guc *guc)
 	int worst_min_size = 0, num_regs = 0;
 	u16 tmp = 0;
 
+	if (!guc->capture.priv)
+		return -ENODEV;
+
 	/*
 	 * If every single engine-instance suffered a failure in quick succession but
 	 * were all unrelated, then a burst of multiple error-capture events would dump
@@ -667,8 +673,174 @@ int intel_guc_capture_output_min_size_est(struct intel_guc *guc)
 	return (worst_min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER);
 }
 
+/*
+ * KMD Init time flows:
+ * --------------------
+ *     --> alloc A: GuC input capture regs lists (registered via ADS)
+ *                  List acquired via intel_guc_capture_list_count + intel_guc_capture_list_init
+ *                  Size = global-reg-list + (class-reg-list) + (num-instances x instance-reg-list)
+ *                  (Device tables carry: 1x global, 1x per-class, 1x per-instance)
+ *                  Caller needs to call per-class and per-instance multiple times
+ *
+ *     --> alloc B: GuC output capture buf (registered via guc_init_params(log_param))
+ *                  Size = #define CAPTURE_BUFFER_SIZE (warns if too small)
+ *                  Note2: 'x 3' to hold multiple capture groups
+ *
+ *     --> alloc C: GuC capture interim circular buffer storage in system mem
+ *                  Size = 'power_of_two(sizeof(B))' as per kernel circular buffer helper
+ *
+ * GUC Runtime notify capture:
+ * --------------------------
+ *     --> G2H STATE_CAPTURE_NOTIFICATION
+ *                   L--> intel_guc_capture_store_snapshot
+ *                           L--> Copies from B (head->tail) into C
+ */
+
+static void guc_capture_store_insert(struct intel_guc *guc, struct guc_capture_out_store *store,
+				     unsigned char *new_data, size_t bytes)
+{
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	unsigned char *dst_data = store->addr;
+	unsigned long h, t;
+	size_t tmp;
+
+	h = store->head;
+	t = store->tail;
+	if (CIRC_SPACE(h, t, store->size) >= bytes) {
+		while (bytes) {
+			tmp = CIRC_SPACE_TO_END(h, t, store->size);
+			if (tmp) {
+				tmp = tmp < bytes ? tmp : bytes;
+				i915_unaligned_memcpy_from_wc(&dst_data[h], new_data, tmp);
+				bytes -= tmp;
+				new_data += tmp;
+				h = (h + tmp) & (store->size - 1);
+			} else {
+				drm_err(&i915->drm, "circbuf copy-to ptr-corruption!\n");
+				break;
+			}
+		}
+		store->head = h;
+	} else {
+		drm_err(&i915->drm, "GuC capture interim-store insufficient space!\n");
+	}
+}
+
+static void __guc_capture_store_snapshot_work(struct intel_guc *guc)
+{
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	unsigned int buffer_size, read_offset, write_offset, bytes_to_copy, full_count;
+	struct guc_log_buffer_state *log_buf_state;
+	struct guc_log_buffer_state log_buf_state_local;
+	void *src_data, *dst_data = NULL;
+	bool new_overflow;
+
+	/* Lock to get the pointer to GuC capture-log-buffer-state */
+	mutex_lock(&guc->log_state[GUC_CAPTURE_LOG_BUFFER].lock);
+	log_buf_state = guc->log.buf_addr +
+			(sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
+	src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
+
+	/*
+	 * Make a copy of the state structure, inside GuC log buffer
+	 * (which is uncached mapped), on the stack to avoid reading
+	 * from it multiple times.
+	 */
+	memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
+	buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
+	read_offset = log_buf_state_local.read_ptr;
+	write_offset = log_buf_state_local.sampled_write_ptr;
+	full_count = log_buf_state_local.buffer_full_cnt;
+
+	/* Bookkeeping stuff */
+	guc->log_state[GUC_CAPTURE_LOG_BUFFER].flush += log_buf_state_local.flush_to_file;
+	new_overflow = intel_guc_check_log_buf_overflow(guc,
+							&guc->log_state[GUC_CAPTURE_LOG_BUFFER],
+							full_count);
+
+	/* Update the state of shared log buffer */
+	log_buf_state->read_ptr = write_offset;
+	log_buf_state->flush_to_file = 0;
+
+	mutex_unlock(&guc->log_state[GUC_CAPTURE_LOG_BUFFER].lock);
+
+	dst_data = guc->capture.priv->out_store.addr;
+	if (dst_data) {
+		mutex_lock(&guc->capture.priv->out_store.lock);
+
+		/* Now copy the actual logs. */
+		if (unlikely(new_overflow)) {
+			/* copy the whole buffer in case of overflow */
+			read_offset = 0;
+			write_offset = buffer_size;
+		} else if (unlikely((read_offset > buffer_size) ||
+			   (write_offset > buffer_size))) {
+			drm_err(&i915->drm, "invalid GuC log capture buffer state!\n");
+			/* copy whole buffer as offsets are unreliable */
+			read_offset = 0;
+			write_offset = buffer_size;
+		}
+
+		/* first copy from the tail end of the GuC log capture buffer */
+		if (read_offset > write_offset) {
+			guc_capture_store_insert(guc, &guc->capture.priv->out_store, src_data,
+						 write_offset);
+			bytes_to_copy = buffer_size - read_offset;
+		} else {
+			bytes_to_copy = write_offset - read_offset;
+		}
+		guc_capture_store_insert(guc, &guc->capture.priv->out_store, src_data + read_offset,
+					 bytes_to_copy);
+
+		mutex_unlock(&guc->capture.priv->out_store.lock);
+	}
+}
+
+void intel_guc_capture_store_snapshot(struct intel_guc *guc)
+{
+	if (guc->capture.priv)
+		__guc_capture_store_snapshot_work(guc);
+}
+
+static void guc_capture_store_destroy(struct intel_guc *guc)
+{
+	mutex_destroy(&guc->capture.priv->out_store.lock);
+	guc->capture.priv->out_store.size = 0;
+	kfree(guc->capture.priv->out_store.addr);
+	guc->capture.priv->out_store.addr = NULL;
+}
+
+static int guc_capture_store_create(struct intel_guc *guc)
+{
+	/*
+	 * Make this interim buffer larger than GuC capture output buffer so that we can absorb
+	 * a little delay when processing the raw capture dumps into text friendly logs
+	 * for the i915_gpu_coredump output
+	 */
+	size_t max_dump_size;
+
+	GEM_BUG_ON(guc->capture.priv->out_store.addr);
+
+	max_dump_size = PAGE_ALIGN(intel_guc_capture_output_min_size_est(guc));
+	max_dump_size = roundup_pow_of_two(max_dump_size);
+
+	guc->capture.priv->out_store.addr = kzalloc(max_dump_size, GFP_KERNEL);
+	if (!guc->capture.priv->out_store.addr)
+		return -ENOMEM;
+
+	guc->capture.priv->out_store.size = max_dump_size;
+	mutex_init(&guc->capture.priv->out_store.lock);
+
+	return 0;
+}
+
 void intel_guc_capture_destroy(struct intel_guc *guc)
 {
+	if (!guc->capture.priv)
+		return;
+
+	intel_synchronize_irq(guc_to_gt(guc)->i915);
+	guc_capture_store_destroy(guc);
 	guc_capture_clear_ext_regs(guc->capture.priv->reglists);
 	kfree(guc->capture.priv);
 	guc->capture.priv = NULL;
@@ -676,10 +848,24 @@ void intel_guc_capture_destroy(struct intel_guc *guc)
 
 int intel_guc_capture_init(struct intel_guc *guc)
 {
+	int ret;
+
 	guc->capture.priv = kzalloc(sizeof(*guc->capture.priv), GFP_KERNEL);
 	if (!guc->capture.priv)
 		return -ENOMEM;
+
 	guc->capture.priv->reglists = guc_capture_get_device_reglist(guc);
+	/*
+	 * Allocate the interim store at init time so we don't require memory
+	 * allocation in the midst of the reset + capture
+	 */
+	ret = guc_capture_store_create(guc);
+	if (ret) {
+		guc_capture_clear_ext_regs(guc->capture.priv->reglists);
+		kfree(guc->capture.priv);
+		guc->capture.priv = NULL;
+		return ret;
+	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
index 4d3e5221128c..c240a4cc046b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
@@ -14,6 +14,7 @@ struct guc_gt_system_info;
 
 int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u32 blob_ggtt,
 				 u32 capture_offset, struct guc_gt_system_info *sysinfo);
+void intel_guc_capture_store_snapshot(struct intel_guc *guc);
 int intel_guc_capture_output_min_size_est(struct intel_guc *guc);
 void intel_guc_capture_destroy(struct intel_guc *guc);
 int intel_guc_capture_init(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index d6b1a3c0fb15..194b17e8c2ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -157,9 +157,9 @@ static void *guc_get_write_buffer(struct intel_guc_log *log)
 	return relay_reserve(log->relay.channel, 0);
 }
 
-static bool guc_check_log_buf_overflow(struct intel_guc *guc,
-				       struct intel_guc_log_stats *log_state,
-				       unsigned int full_cnt)
+bool intel_guc_check_log_buf_overflow(struct intel_guc *guc,
+				      struct intel_guc_log_stats *log_state,
+				      unsigned int full_cnt)
 {
 	unsigned int prev_full_cnt = log_state->sampled_overflow;
 	bool overflow = false;
@@ -182,7 +182,7 @@ static bool guc_check_log_buf_overflow(struct intel_guc *guc,
 	return overflow;
 }
 
-static unsigned int guc_get_log_buffer_size(enum guc_log_buffer_type type)
+unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
 {
 	switch (type) {
 	case GUC_DEBUG_LOG_BUFFER:
@@ -198,6 +198,20 @@ static unsigned int guc_get_log_buffer_size(enum guc_log_buffer_type type)
 	return 0;
 }
 
+size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
+{
+	enum guc_log_buffer_type i;
+	size_t offset = PAGE_SIZE;/* for the log_buffer_states */
+
+	for (i = GUC_DEBUG_LOG_BUFFER; i < GUC_MAX_LOG_BUFFER; i++) {
+		if (i == type)
+			break;
+		offset += intel_guc_get_log_buffer_size(i);
+	}
+
+	return offset;
+}
+
 static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
 {
 	struct intel_guc *guc = log_to_guc(log);
@@ -247,14 +261,14 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
 		mutex_lock(&logstate->lock);
 		memcpy(&log_buf_state_local, log_buf_state,
 		       sizeof(struct guc_log_buffer_state));
-		buffer_size = guc_get_log_buffer_size(type);
+		buffer_size = intel_guc_get_log_buffer_size(type);
 		read_offset = log_buf_state_local.read_ptr;
 		write_offset = log_buf_state_local.sampled_write_ptr;
 		full_cnt = log_buf_state_local.buffer_full_cnt;
 
 		/* Bookkeeping stuff */
 		logstate->flush += log_buf_state_local.flush_to_file;
-		new_overflow = guc_check_log_buf_overflow(guc, logstate, full_cnt);
+		new_overflow = intel_guc_check_log_buf_overflow(guc, logstate, full_cnt);
 
 		/* Update the state of shared log buffer */
 		log_buf_state->read_ptr = write_offset;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
index b6e8e9ee37b7..f16de816447d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
@@ -68,6 +68,10 @@ struct intel_guc_log {
 };
 
 void intel_guc_log_init_early(struct intel_guc_log *log);
+bool intel_guc_check_log_buf_overflow(struct intel_guc *guc, struct intel_guc_log_stats *state,
+				      unsigned int full_cnt);
+unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type);
+size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type);
 int intel_guc_log_create(struct intel_guc_log *log);
 void intel_guc_log_destroy(struct intel_guc_log *log);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a40f10d376..baaa33472a50 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -24,6 +24,7 @@
 #include "gt/intel_ring.h"
 
 #include "intel_guc_ads.h"
+#include "intel_guc_capture.h"
 #include "intel_guc_submission.h"
 
 #include "i915_drv.h"
@@ -1431,6 +1432,8 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	}
 
 	scrub_guc_desc_for_outstanding_g2h(guc);
+
+	intel_guc_capture_store_snapshot(guc);
 }
 
 static struct intel_engine_cs *
@@ -4025,17 +4028,20 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 int intel_guc_error_capture_process_msg(struct intel_guc *guc,
 					const u32 *msg, u32 len)
 {
-	int status;
+	u32 status;
 
 	if (unlikely(len != 1)) {
 		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
 		return -EPROTO;
 	}
 
-	status = msg[0];
-	drm_info(&guc_to_gt(guc)->i915->drm, "Got error capture: status = %d", status);
+	status = msg[0] & INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK;
+	if (status == INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_NOSPACE)
+		drm_warn(&guc_to_gt(guc)->i915->drm, "G2H-Error capture no space");
+	else
+		drm_info(&guc_to_gt(guc)->i915->drm, "G2H-Received error capture");
 
-	/* FIXME: Do something with the capture */
+	intel_guc_capture_store_snapshot(guc);
 
 	return 0;
 }
-- 
2.25.1



* [Intel-gfx] [PATCH 7/7] drm/i915/guc: Print the GuC error capture output register list.
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
@ 2022-01-18 10:03 ` Alan Previn
  -1 siblings, 0 replies; 14+ messages in thread
From: Alan Previn @ 2022-01-18 10:03 UTC
  To: intel-gfx; +Cc: Alan Previn

Print the GuC-captured error-state register list (register
names and values) when the gpu_coredump_state printout is
invoked via i915 debugfs to flush the GPU error state that
was captured earlier.
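
A small C sketch of turning one captured (offset, value) pair back
into a printable line via a register descriptor table, falling back to
the raw offset when no name is known. The table entries,
demo_reg_name() and demo_format_reg() are hypothetical, and the
offsets are only illustrative. The lookup checks for a missing list
before touching it, so an unknown register degrades to a raw offset
instead of a crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical descriptor entry: an MMIO offset and the name to print for it. */
struct demo_reg_descr {
	uint32_t offset;
	const char *name;
};

/* Illustrative table only; the driver keeps its own per-platform lists. */
static const struct demo_reg_descr demo_regs[] = {
	{ 0x2030, "DEMO_RING_TAIL" },
	{ 0x2034, "DEMO_RING_HEAD" },
};

/* Map a captured offset back to a name; NULL if the list is absent or has no match. */
static const char *demo_reg_name(const struct demo_reg_descr *list, size_t n,
				 uint32_t offset)
{
	size_t i;

	if (!list)
		return NULL;
	for (i = 0; i < n; i++)
		if (list[i].offset == offset)
			return list[i].name;
	return NULL;
}

/* Format one "name: value" line, falling back to the raw offset for unknown registers. */
static int demo_format_reg(char *out, size_t outlen, uint32_t offset, uint32_t value)
{
	const char *name = demo_reg_name(demo_regs, sizeof(demo_regs) / sizeof(demo_regs[0]),
					 offset);

	assert(out && outlen);
	if (name)
		return snprintf(out, outlen, "%s: 0x%08x", name, (unsigned int)value);
	return snprintf(out, outlen, "REG_0x%05x: 0x%08x", (unsigned int)offset,
			(unsigned int)value);
}
```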

Since GuC could have reported multiple engine register dumps
in a single notification event, parse the captured data
(appearing as a stream of structures) to identify each dump as
a different 'engine-capture-group-output'.
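
A minimal C sketch of walking such a stream of structures, under a
deliberately simplified, hypothetical layout: a group header carrying
a dump count, then for each dump a data header (guc_id, lrca, engine
instance, register count) followed by that many offset/value pairs.
This is not the GuC ABI layout; the demo_* structs and
demo_parse_group() are illustrative names only, and the source is a
flat buffer rather than the wrapping interim store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record layout; NOT the GuC ABI structures. */
struct demo_group_hdr {
	uint32_t num_captures;		/* engine dumps that follow */
};

struct demo_data_hdr {
	uint32_t guc_id;
	uint32_t lrca;
	uint32_t engine_instance;
	uint32_t num_mmios;		/* demo_reg entries that follow */
};

struct demo_reg {
	uint32_t offset;
	uint32_t value;
};

/* Copy one fixed-size structure out of the stream; -1 if the stream is truncated. */
static int demo_take(const unsigned char *buf, size_t len, size_t *pos, void *out, size_t sz)
{
	assert(*pos <= len);
	if (len - *pos < sz)
		return -1;
	memcpy(out, buf + *pos, sz);
	*pos += sz;
	return 0;
}

/*
 * Walk one group starting at *pos, adding the number of registers seen to
 * *total_regs. Returns the number of engine dumps in the group, or -1 if the
 * stream ends mid-record.
 */
static int demo_parse_group(const unsigned char *buf, size_t len, size_t *pos,
			    unsigned int *total_regs)
{
	struct demo_group_hdr ghdr;
	uint32_t i, j;

	if (demo_take(buf, len, pos, &ghdr, sizeof(ghdr)))
		return -1;
	for (i = 0; i < ghdr.num_captures; i++) {
		struct demo_data_hdr hdr;

		if (demo_take(buf, len, pos, &hdr, sizeof(hdr)))
			return -1;
		for (j = 0; j < hdr.num_mmios; j++) {
			struct demo_reg reg;

			if (demo_take(buf, len, pos, &reg, sizeof(reg)))
				return -1;
			/* A real printer would look up reg.offset and print reg.value here. */
			(*total_regs)++;
		}
	}
	return (int)ghdr.num_captures;
}
```

Each dump is self-describing through its register count, so the
parser can find the next dump's header without knowing the register
lists in advance.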

Finally, for each 'engine-capture-group-output' that is found,
verify whether the engine register dump corresponds to the
engine_coredump content that was previously populated by the
i915_gpu_coredump function. That function would have copied
the context's VMAs, including the batch buffer, during the
G2H-context-reset notification that occurred earlier. Perform
this verification check by comparing guc_id, lrca and
engine-instance obtained from the 'engine-capture-group-output'
vs a copy of that same info taken during i915_gpu_coredump. If
they match, then print those VMAs as well (such as the batch
buffers).
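
The match itself reduces to comparing identifying fields. The sketch
below is illustrative: struct demo_ctx_id and demo_capture_matches()
are hypothetical, and it also compares the engine class, following the
v4 note in the cover letter about comparing both engine-instance and
engine-class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identity of one engine dump, as recorded by either side. */
struct demo_ctx_id {
	uint32_t guc_id;
	uint32_t lrca;
	uint32_t engine_class;
	uint32_t engine_instance;
};

/*
 * Treat a GuC register dump as belonging to an i915 engine coredump only if
 * every identifying field agrees; guc_id alone is not enough, since GuC IDs
 * can be recycled by later contexts.
 */
static bool demo_capture_matches(const struct demo_ctx_id *guc_dump,
				 const struct demo_ctx_id *coredump)
{
	assert(guc_dump && coredump);
	return guc_dump->guc_id == coredump->guc_id &&
	       guc_dump->lrca == coredump->lrca &&
	       guc_dump->engine_class == coredump->engine_class &&
	       guc_dump->engine_instance == coredump->engine_instance;
}
```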

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 439 ++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |  10 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  65 ++-
 drivers/gpu/drm/i915/i915_gpu_error.h         |  14 +
 5 files changed, 509 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 4317ae5e525b..47c0c32d9b86 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1628,9 +1628,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 		drm_printf(m, "\tIPEHR: 0x%08x\n", ENGINE_READ(engine, IPEHR));
 	}
 
-	if (intel_engine_uses_guc(engine)) {
-		/* nothing to print yet */
-	} else if (HAS_EXECLISTS(dev_priv)) {
+	if (HAS_EXECLISTS(dev_priv) && !intel_engine_uses_guc(engine)) {
 		struct i915_request * const *port, *rq;
 		const u32 *hws =
 			&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index fc80c5f31915..1c8ad6a1c2d3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -694,8 +694,423 @@ int intel_guc_capture_output_min_size_est(struct intel_guc *guc)
  *     --> G2H STATE_CAPTURE_NOTIFICATION
  *                   L--> intel_guc_capture_store_snapshot
  *                           L--> Copies from B (head->tail) into C
+ *
+ * GUC --> notify context reset:
+ * -----------------------------
+ *     --> G2H CONTEXT RESET
+ *                   L--> guc_handle_context_reset --> i915_capture_error_state
+ *                    --> i915_gpu_coredump --> intel_guc_capture_store_ptr
+ *                        L--> keep a ptr to capture_store in
+ *                             i915_gpu_coredump struct.
+ *
+ * User Sysfs / Debugfs
+ * --------------------
+ *      --> i915_gpu_coredump_copy_to_buffer->
+ *                   L--> err_print_to_sgl --> err_print_gt
+ *                        L--> error_print_guc_captures
+ *                             L--> loop: intel_guc_capture_out_print_next_group
+ *
  */
 
+#if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
+
+static const char *
+guc_capture_register_to_string(const struct intel_guc *guc, u32 owner, u32 type,
+			       u32 class, u32 id, u32 offset, u32 *is_ext)
+{
+	struct __guc_mmio_reg_descr_group *reglists = guc->capture.priv->reglists;
+	struct __guc_mmio_reg_descr_group *match;
+	int num_regs, j;
+
+	*is_ext = 0;
+	if (!reglists)
+		return NULL;
+
+	match = guc_capture_get_one_list(reglists, owner, type, id);
+
+	if (match) {
+		for (num_regs = match->num_regs, j = 0; j < num_regs; ++j) {
+			if (offset == match->list[j].reg.reg)
+				return match->list[j].regname;
+		}
+	}
+	if (match && match->ext) {
+		for (num_regs = match->num_ext, j = 0; j < num_regs; ++j) {
+			if (offset == match->ext[j].reg.reg) {
+				*is_ext = 1;
+				return match->ext[j].regname;
+			}
+		}
+	}
+
+	return NULL;
+}
+
+static int
+guc_capture_store_remove_dw(struct guc_capture_out_store *store, u32 *bytesleft,
+			    u32 *dw)
+{
+	int tries = 2;
+	int avail = 0;
+	u32 *src_data;
+
+	if (!*bytesleft)
+		return 0;
+
+	while (tries--) {
+		avail = CIRC_CNT_TO_END(store->head, store->tail, store->size);
+		if (avail >= sizeof(u32)) {
+			src_data = (u32 *)(store->addr + store->tail);
+			*dw = *src_data;
+			store->tail = (store->tail + 4) & (store->size - 1);
+			*bytesleft -= 4;
+			return 4;
+		}
+		if (store->tail == (store->size - 1) && store->head > 0)
+			store->tail = 0;
+	}
+
+	return 0;
+}
+
+static int
+guc_capture_store_get_group_hdr(const struct intel_guc *guc,
+				struct guc_capture_out_store *store, u32 *bytesleft,
+				struct guc_state_capture_group_header_t *ghdr)
+{
+	int read = 0;
+	int fullsize = sizeof(struct guc_state_capture_group_header_t);
+
+	if (fullsize > *bytesleft)
+		return -1;
+
+	if (CIRC_CNT_TO_END(store->head, store->tail, store->size) >= fullsize) {
+		memcpy(ghdr, (store->addr + store->tail), fullsize);
+		store->tail = (store->tail + fullsize) & (store->size - 1);
+		*bytesleft -= fullsize;
+		return 0;
+	}
+
+	read += guc_capture_store_remove_dw(store, bytesleft, &ghdr->reserved1);
+	read += guc_capture_store_remove_dw(store, bytesleft, &ghdr->info);
+	if (read != sizeof(*ghdr))
+		return -1;
+
+	return 0;
+}
+
+static int
+guc_capture_store_get_data_hdr(const struct intel_guc *guc,
+			       struct guc_capture_out_store *store, u32 *bytesleft,
+			       struct guc_state_capture_header_t *hdr)
+{
+	int read = 0;
+	int fullsize = sizeof(struct guc_state_capture_header_t);
+
+	if (fullsize > *bytesleft)
+		return -1;
+
+	if (CIRC_CNT_TO_END(store->head, store->tail, store->size) >= fullsize) {
+		memcpy(hdr, (store->addr + store->tail), fullsize);
+		store->tail = (store->tail + fullsize) & (store->size - 1);
+		*bytesleft -= fullsize;
+		return 0;
+	}
+
+	read += guc_capture_store_remove_dw(store, bytesleft, &hdr->reserved1);
+	read += guc_capture_store_remove_dw(store, bytesleft, &hdr->info);
+	read += guc_capture_store_remove_dw(store, bytesleft, &hdr->lrca);
+	read += guc_capture_store_remove_dw(store, bytesleft, &hdr->guc_id);
+	read += guc_capture_store_remove_dw(store, bytesleft, &hdr->num_mmios);
+	if (read != sizeof(*hdr))
+		return -1;
+
+	return 0;
+}
+
+static int
+guc_capture_store_get_register(const struct intel_guc *guc,
+			       struct guc_capture_out_store *store, u32 *bytesleft,
+			       struct guc_mmio_reg *reg)
+{
+	int read = 0;
+	int fullsize = sizeof(struct guc_mmio_reg);
+
+	if (fullsize > *bytesleft)
+		return -1;
+
+	if (CIRC_CNT_TO_END(store->head, store->tail, store->size) >= fullsize) {
+		memcpy(reg, (store->addr + store->tail), fullsize);
+		store->tail = (store->tail + fullsize) & (store->size - 1);
+		*bytesleft -= fullsize;
+		return 0;
+	}
+
+	read += guc_capture_store_remove_dw(store, bytesleft, &reg->offset);
+	read += guc_capture_store_remove_dw(store, bytesleft, &reg->value);
+	read += guc_capture_store_remove_dw(store, bytesleft, &reg->flags);
+	read += guc_capture_store_remove_dw(store, bytesleft, &reg->mask);
+	if (read != sizeof(*reg))
+		return -1;
+
+	return 0;
+}
+
+static void guc_capture_store_drop_data(struct guc_capture_out_store *store,
+					unsigned long sampled_head)
+{
+	/*
+	 * Drop everything: a circular buffer is empty when tail == head.
+	 */
+	store->tail = sampled_head;
+}
+
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+#define guc_capt_err_print(a, b, ...) \
+	do { \
+		drm_warn(a, __VA_ARGS__); \
+		if (b) \
+			i915_error_printf(b, __VA_ARGS__); \
+	} while (0)
+#else
+#define guc_capt_err_print(a, b, ...) \
+	do { \
+		if (b) \
+			i915_error_printf(b, __VA_ARGS__); \
+	} while (0)
+#endif
+
+static struct intel_engine_cs *
+guc_capture_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	u8 engine_class = guc_class_to_engine_class(guc_class);
+
+	/* Class index is checked in class converter */
+	GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
+
+	return gt->engine_class[engine_class][instance];
+}
+
+#define PRINT guc_capt_err_print
+#define REGSTR guc_capture_register_to_string
+
+#define GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Name: %s command stream\n", (eng)->name); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-LogicalMask: 0x%08x\n", \
+		      (eng)->logical_mask); \
+	} while (0)
+
+#define GCAP_PRINT_GUC_INST_INFO(i915, ebuf, hdr) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    GuC-Engine-Inst-Id: 0x%08x\n", \
+		      (uint32_t)FIELD_GET(CAP_HDR_ENGINE_INSTANCE, (hdr).info)); \
+		PRINT(&i915->drm, (ebuf), "    GuC-Context-Id: 0x%08x\n", (hdr).guc_id); \
+		PRINT(&i915->drm, (ebuf), "    LRCA: 0x%08x\n", (hdr).lrca); \
+	} while (0)
+
+#define GCAP_PRINT_INTEL_CTX_INFO(i915, ebuf, ce) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-Flags: 0x%016lx\n", (ce)->flags); \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-GuC-Id: 0x%016x\n", (ce)->guc_id.id); \
+	} while (0)
+
+#define GCAP_PRINT_BATCH(i915, ebuf, ee, batch) \
+	do { \
+		batch = intel_gpu_error_find_batch(ee); \
+		if (batch) { \
+			u64 start = batch->gtt_offset; \
+			u64 end = start + batch->gtt_size; \
+			PRINT(&i915->drm, (ebuf), "  batch: [0x%08x_%08x, 0x%08x_%08x]\n", \
+			   upper_32_bits(start), lower_32_bits(start), \
+			   upper_32_bits(end), lower_32_bits(end)); \
+		} \
+	} while (0)
+
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)
+
+int intel_guc_capture_out_print_next_group(struct drm_i915_error_state_buf *ebuf,
+					   struct intel_gt_coredump *gt)
+{
+	/* ee and ctx are const: we must not modify them while printing the error dump */
+	struct intel_guc_state_capture *cap = gt->uc->capture;
+	struct intel_guc *guc = container_of(cap, struct intel_guc, capture);
+	struct drm_i915_private *i915 = (container_of(guc, struct intel_gt,
+						   uc.guc))->i915;
+	struct guc_capture_out_store *store;
+	struct guc_capture_out_store tmpstore;
+	struct guc_state_capture_group_header_t ghdr;
+	struct guc_state_capture_header_t hdr;
+	struct guc_mmio_reg reg;
+	const char *grptypestr[GUC_STATE_CAPTURE_GROUP_TYPE_MAX] = {"full-capture",
+								    "partial-capture"};
+	const char *datatypestr[GUC_CAPTURE_LIST_TYPE_MAX] = {"Global", "Engine-Class",
+							      "Engine-Instance"};
+	enum guc_capture_group_types grptype;
+	enum guc_capture_type datatype;
+	int numgrps, numregs, ret = 0;
+	const char *str;
+	char noname[16];
+	u32 numbytes, guc_engclss, guc_enginst, guc_lrca, guc_gucid, is_ext;
+	struct intel_engine_cs *eng;
+	const struct intel_engine_coredump *ee;
+	const struct i915_gem_context_coredump *ctx;
+	struct i915_vma_coredump *batch;
+
+	if (!cap->priv)
+		return -ENODEV;
+
+	store = &cap->priv->out_store;
+
+	mutex_lock(&store->lock);
+	smp_mb(); /* sync to get the latest head for the moment */
+	/* NOTE1: make a copy of store so we don't have to deal with a changing lower bound of
+	 *        occupied-space in this circular buffer.
+	 * NOTE2: Higher up the stack from here, we keep calling this function in a loop to
+	 *        read more capture groups as they appear (as the lower bound of occupied-space
+	 *        changes) until this circ-buf is empty.
+	 */
+	memcpy(&tmpstore, store, sizeof(tmpstore));
+
+	PRINT(&i915->drm, ebuf, "global --- GuC Error Capture\n");
+
+	numbytes = CIRC_CNT(tmpstore.head, tmpstore.tail, tmpstore.size);
+	if (!numbytes) {
+		PRINT(&i915->drm, ebuf, "GuC err-capture parsing done\n");
+		ret = -ENODATA;
+		goto unlock;
+	}
+	/* everything in GuC output structures is dword-aligned */
+	if (numbytes & 0x3) {
+		PRINT(&i915->drm, ebuf, "GuC capture stream unaligned!\n");
+		ret = -EIO;
+		goto unlock;
+	}
+
+	if (guc_capture_store_get_group_hdr(guc, &tmpstore, &numbytes, &ghdr)) {
+		PRINT(&i915->drm, ebuf, "GuC capture error getting next group-header!\n");
+		ret = -EIO;
+		goto unlock;
+	}
+
+	PRINT(&i915->drm, ebuf, "NumCaptures:  0x%08x\n", (uint32_t)
+	      FIELD_GET(CAP_GRP_HDR_NUM_CAPTURES, ghdr.info));
+	grptype = FIELD_GET(CAP_GRP_HDR_CAPTURE_TYPE, ghdr.info);
+	PRINT(&i915->drm, ebuf, "Coverage:  0x%08x = %s\n", grptype,
+	      grptypestr[grptype % GUC_STATE_CAPTURE_GROUP_TYPE_MAX]);
+
+	numgrps = FIELD_GET(CAP_GRP_HDR_NUM_CAPTURES, ghdr.info);
+	while (numgrps--) {
+		if (guc_capture_store_get_data_hdr(guc, &tmpstore, &numbytes, &hdr)) {
+			PRINT(&i915->drm, ebuf, "GuC capture error on next capture-header!\n");
+			ret = -EIO;
+			goto unlock;
+		}
+		datatype = FIELD_GET(CAP_HDR_CAPTURE_TYPE, hdr.info);
+		PRINT(&i915->drm, ebuf, "  RegListType: %s\n",
+		      datatypestr[datatype % GUC_CAPTURE_LIST_TYPE_MAX]);
+
+		eng = NULL;
+		guc_engclss = 0xffffffff;
+		guc_enginst = 0xffffffff;
+		guc_gucid = guc_lrca = 0;
+		guc_engclss = FIELD_GET(CAP_HDR_ENGINE_CLASS, hdr.info);
+		if (datatype != GUC_CAPTURE_LIST_TYPE_GLOBAL) {
+			PRINT(&i915->drm, ebuf, "    GuC-Engine-Class: %d\n",
+			      guc_engclss);
+			if (datatype == GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS &&
+			    guc_engclss <= GUC_LAST_ENGINE_CLASS)
+				PRINT(&i915->drm, ebuf, "    i915-Eng-Class: %d\n",
+				      guc_class_to_engine_class(guc_engclss));
+
+			if (datatype == GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE) {
+				guc_enginst = FIELD_GET(CAP_HDR_ENGINE_INSTANCE, hdr.info);
+				eng = guc_capture_lookup_engine(guc, guc_engclss, guc_enginst);
+				if (eng)
+					GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng);
+				else
+					PRINT(&i915->drm, ebuf,
+					      "    i915-Eng-Lookup Fail!\n");
+				guc_lrca = hdr.lrca;
+				guc_gucid = hdr.guc_id;
+				GCAP_PRINT_GUC_INST_INFO(i915, ebuf, hdr);
+			}
+		}
+		numregs = FIELD_GET(CAP_HDR_NUM_MMIOS, hdr.num_mmios);
+		PRINT(&i915->drm, ebuf, "    NumRegs: %d\n", numregs);
+
+		while (numregs--) {
+			if (guc_capture_store_get_register(guc, &tmpstore, &numbytes, &reg)) {
+				PRINT(&i915->drm, ebuf, "Error getting next register!\n");
+				ret = -EIO;
+				goto unlock;
+			}
+			str = REGSTR(guc, GUC_CAPTURE_LIST_INDEX_PF, datatype,
+				     guc_engclss, 0, reg.offset, &is_ext);
+			if (!str) {
+				snprintf(noname, sizeof(noname), "REG-0x%08x", reg.offset);
+				PRINT(&i915->drm, ebuf, "      %s", noname);
+			} else {
+				PRINT(&i915->drm, ebuf, "      %s", str);
+			}
+			if (is_ext)
+				PRINT(&i915->drm, ebuf, "[%ld][%ld]",
+				      FIELD_GET(GUC_REGSET_STEERING_GROUP, reg.flags),
+				      FIELD_GET(GUC_REGSET_STEERING_INSTANCE, reg.flags));
+			PRINT(&i915->drm, ebuf, ":  0x%08x\n", reg.value);
+		}
+		for (ee = gt->engine; ee; ee = ee->next) {
+			const struct i915_vma_coredump *vma;
+
+			if (ee->engine == eng &&
+			    guc_enginst == GUC_ID_TO_ENGINE_INSTANCE(ee->gucinfo.eng_id) &&
+			    guc_engclss == GUC_ID_TO_ENGINE_CLASS(ee->gucinfo.eng_id) &&
+			    ee->gucinfo.guc_id == guc_gucid &&
+			    (ee->gucinfo.lrca & CTX_GTT_ADDRESS_MASK) ==
+			    (guc_lrca & CTX_GTT_ADDRESS_MASK)) {
+				PRINT(&i915->drm, ebuf, "i915-Ctx-VMA-Matched:\n");
+				GCAP_PRINT_BATCH(i915, ebuf, ee, batch);
+				PRINT(&i915->drm, ebuf, "  engine reset count: %u\n",
+				      ee->reset_count);
+				ctx = &ee->context;
+				GCAP_PRINT_CONTEXT(i915, ebuf, ctx);
+
+				for (vma = ee->vma; vma; vma = vma->next)
+					intel_gpu_error_print_vma(ebuf, ee->engine, vma);
+			}
+		}
+	}
+
+	store->tail = tmpstore.tail;
+unlock:
+	/* if we have a stream error, just drop everything */
+	if (ret == -EIO) {
+		drm_warn(&i915->drm, "Skip GuC capture header print due to stream error\n");
+		guc_capture_store_drop_data(store, tmpstore.head);
+	}
+
+	mutex_unlock(&store->lock);
+
+	return ret;
+}
+
+#undef REGSTR
+#undef PRINT
+
+#endif /* CONFIG_DRM_I915_CAPTURE_ERROR */
+
 static void guc_capture_store_insert(struct intel_guc *guc, struct guc_capture_out_store *store,
 				     unsigned char *new_data, size_t bytes)
 {
@@ -846,6 +1261,30 @@ void intel_guc_capture_destroy(struct intel_guc *guc)
 	guc->capture.priv = NULL;
 }
 
+void intel_guc_capture_copy_info(struct intel_engine_coredump *ee, struct intel_context *ce)
+{
+	if (!ee || !ce)
+		return;
+	/*
+	 * Store GuC-relatable information pertaining to the faulting
+	 * context into the intel_engine_coredump structure so that we
+	 * can reference it later, during the debugfs-triggered printout,
+	 * to ensure we print the VMA dumps that match the GuC register
+	 * dumps.
+	 */
+	ee->gucinfo.lrca = ce->lrc.lrca;
+	ee->gucinfo.guc_id = ce->guc_id.id;
+	ee->gucinfo.eng_id = ee->engine->guc_id;
+}
+
+struct intel_guc_state_capture *
+intel_guc_capture_store_ptr(struct intel_guc *guc)
+{
+	if (!guc->capture.priv)
+		return NULL;
+	return &guc->capture;
+}
+
 int intel_guc_capture_init(struct intel_guc *guc)
 {
 	int ret;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
index c240a4cc046b..37e29f76cda8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
@@ -8,15 +8,23 @@
 
 #include <linux/types.h>
 
-struct intel_guc;
+struct drm_i915_error_state_buf;
 struct guc_ads;
 struct guc_gt_system_info;
+struct intel_gt_coredump;
+struct intel_guc;
+struct intel_engine_coredump;
+struct intel_context;
 
 int intel_guc_capture_prep_lists(struct intel_guc *guc, struct guc_ads *blob, u32 blob_ggtt,
 				 u32 capture_offset, struct guc_gt_system_info *sysinfo);
+int intel_guc_capture_out_print_next_group(struct drm_i915_error_state_buf *m,
+					   struct intel_gt_coredump *gt);
+void intel_guc_capture_copy_info(struct intel_engine_coredump *ee, struct intel_context *ce);
 void intel_guc_capture_store_snapshot(struct intel_guc *guc);
 int intel_guc_capture_output_min_size_est(struct intel_guc *guc);
 void intel_guc_capture_destroy(struct intel_guc *guc);
+struct intel_guc_state_capture *intel_guc_capture_store_ptr(struct intel_guc *guc);
 int intel_guc_capture_init(struct intel_guc *guc);
 
 #endif /* _INTEL_GUC_CAPTURE_H */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 67f3515f07e7..4eeab55b4314 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -526,8 +526,8 @@ __find_vma(struct i915_vma_coredump *vma, const char *name)
 	return NULL;
 }
 
-static struct i915_vma_coredump *
-find_batch(const struct intel_engine_coredump *ee)
+struct i915_vma_coredump *
+intel_gpu_error_find_batch(const struct intel_engine_coredump *ee)
 {
 	return __find_vma(ee->vma, "batch");
 }
@@ -555,7 +555,7 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 
 	error_print_instdone(m, ee);
 
-	batch = find_batch(ee);
+	batch = intel_gpu_error_find_batch(ee);
 	if (batch) {
 		u64 start = batch->gtt_offset;
 		u64 end = start + batch->gtt_size;
@@ -601,6 +601,16 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	error_print_context(m, "  Active context: ", &ee->context);
 }
 
+static void error_print_guc_captures(struct drm_i915_error_state_buf *m,
+				     struct intel_gt_coredump *gt)
+{
+	int ret;
+
+	do {
+		ret = intel_guc_capture_out_print_next_group(m, gt);
+	} while (!ret);
+}
+
 void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
 {
 	va_list args;
@@ -610,9 +620,9 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
 	va_end(args);
 }
 
-static void print_error_vma(struct drm_i915_error_state_buf *m,
-			    const struct intel_engine_cs *engine,
-			    const struct i915_vma_coredump *vma)
+void intel_gpu_error_print_vma(struct drm_i915_error_state_buf *m,
+			       const struct intel_engine_cs *engine,
+			       const struct i915_vma_coredump *vma)
 {
 	char out[ASCII85_BUFSZ];
 	struct page *page;
@@ -681,7 +691,7 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
 
 	intel_uc_fw_dump(&error_uc->guc_fw, &p);
 	intel_uc_fw_dump(&error_uc->huc_fw, &p);
-	print_error_vma(m, NULL, error_uc->guc_log);
+	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
 }
 
 static void err_free_sgl(struct scatterlist *sgl)
@@ -766,12 +776,17 @@ static void err_print_gt(struct drm_i915_error_state_buf *m,
 		err_printf(m, "  GAM_DONE: 0x%08x\n", gt->gam_done);
 	}
 
-	for (ee = gt->engine; ee; ee = ee->next) {
-		const struct i915_vma_coredump *vma;
+	if (gt->uc && gt->uc->capture) {
+		/* error capture was via GuC */
+		error_print_guc_captures(m, gt);
+	} else {
+		for (ee = gt->engine; ee; ee = ee->next) {
+			const struct i915_vma_coredump *vma;
 
-		error_print_engine(m, ee);
-		for (vma = ee->vma; vma; vma = vma->next)
-			print_error_vma(m, ee->engine, vma);
+			error_print_engine(m, ee);
+			for (vma = ee->vma; vma; vma = vma->next)
+				intel_gpu_error_print_vma(m, ee->engine, vma);
+		}
 	}
 
 	if (gt->uc)
@@ -1146,7 +1161,7 @@ static void gt_record_fences(struct intel_gt_coredump *gt)
 	gt->nfence = i;
 }
 
-static void engine_record_registers(struct intel_engine_coredump *ee)
+static void engine_record_registers_execlist(struct intel_engine_coredump *ee)
 {
 	const struct intel_engine_cs *engine = ee->engine;
 	struct drm_i915_private *i915 = engine->i915;
@@ -1443,8 +1458,10 @@ intel_engine_coredump_alloc(struct intel_engine_cs *engine, gfp_t gfp)
 
 	ee->engine = engine;
 
-	engine_record_registers(ee);
-	engine_record_execlists(ee);
+	if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+		engine_record_registers_execlist(ee);
+		engine_record_execlists(ee);
+	}
 
 	return ee;
 }
@@ -1515,11 +1532,14 @@ capture_engine(struct intel_engine_cs *engine,
 	struct intel_context *ce;
 	struct i915_request *rq = NULL;
 	unsigned long flags;
+	bool guc_submission = false;
 
 	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
 	if (!ee)
 		return NULL;
 
+	guc_submission = intel_uc_uses_guc_submission(&engine->gt->uc);
+
 	ce = intel_engine_get_hung_context(engine);
 	if (ce) {
 		intel_engine_clear_hung_context(engine);
@@ -1531,7 +1551,7 @@ capture_engine(struct intel_engine_cs *engine,
 		 * Getting here with GuC enabled means it is a forced error capture
 		 * with no actual hang. So, no need to attempt the execlist search.
 		 */
-		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+		if (!guc_submission) {
 			spin_lock_irqsave(&engine->sched_engine->lock, flags);
 			rq = intel_engine_execlist_find_hung_request(engine);
 			spin_unlock_irqrestore(&engine->sched_engine->lock,
@@ -1549,6 +1569,8 @@ capture_engine(struct intel_engine_cs *engine,
 		i915_request_put(rq);
 		goto no_request_capture;
 	}
+	if (guc_submission)
+		intel_guc_capture_copy_info(ee, ce);
 
 	intel_engine_coredump_add_vma(ee, capture, compress);
 	i915_request_put(rq);
@@ -1617,8 +1639,8 @@ gt_record_uc(struct intel_gt_coredump *gt,
 	return error_uc;
 }
 
-/* Capture all registers which don't fit into another category. */
-static void gt_record_regs(struct intel_gt_coredump *gt)
+/* Capture all global registers which don't fit into another category. */
+static void gt_record_registers_execlist(struct intel_gt_coredump *gt)
 {
 	struct intel_uncore *uncore = gt->_gt->uncore;
 	struct drm_i915_private *i915 = uncore->i915;
@@ -1862,7 +1884,9 @@ intel_gt_coredump_alloc(struct intel_gt *gt, gfp_t gfp)
 	gc->_gt = gt;
 	gc->awake = intel_gt_pm_is_awake(gt);
 
-	gt_record_regs(gc);
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		gt_record_registers_execlist(gc);
+
 	gt_record_fences(gc);
 
 	return gc;
@@ -1927,6 +1951,9 @@ __i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 		if (INTEL_INFO(i915)->has_gt_uc)
 			error->gt->uc = gt_record_uc(error->gt, compress);
 
+		if (error->gt->uc && intel_uc_uses_guc_submission(&gt->uc))
+			error->gt->uc->capture = intel_guc_capture_store_ptr(&gt->uc.guc);
+
 		i915_vma_capture_finish(error->gt, compress);
 
 		error->simulated |= error->gt->simulated;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 5aedf5129814..576677c2888e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -17,6 +17,7 @@
 #include "gt/intel_engine.h"
 #include "gt/intel_gt_types.h"
 #include "gt/uc/intel_uc_fw.h"
+#include "gt/uc/intel_guc_capture.h"
 
 #include "intel_device_info.h"
 
@@ -84,6 +85,13 @@ struct intel_engine_coredump {
 	u32 rc_psmi; /* sleep state */
 	struct intel_instdone instdone;
 
+	/* GuC correlated info */
+	struct {
+		u32 lrca;
+		u16 guc_id;
+		u32 eng_id;
+	} gucinfo;
+
 	struct i915_gem_context_coredump {
 		char comm[TASK_COMM_LEN];
 
@@ -149,6 +157,7 @@ struct intel_gt_coredump {
 		struct intel_uc_fw guc_fw;
 		struct intel_uc_fw huc_fw;
 		struct i915_vma_coredump *guc_log;
+		struct intel_guc_state_capture *capture;
 	} *uc;
 
 	struct intel_gt_coredump *next;
@@ -214,6 +223,11 @@ struct drm_i915_error_state_buf {
 
 __printf(2, 3)
 void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
+void intel_gpu_error_print_vma(struct drm_i915_error_state_buf *m,
+			       const struct intel_engine_cs *engine,
+			       const struct i915_vma_coredump *vma);
+struct i915_vma_coredump *
+intel_gpu_error_find_batch(const struct intel_engine_coredump *ee);
 
 struct i915_gpu_coredump *i915_gpu_coredump(struct intel_gt *gt,
 					    intel_engine_mask_t engine_mask);
-- 
2.25.1
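
A note on the circular-buffer semantics used by intel_guc_capture_out_print_next_group(): under the <linux/circ_buf.h> conventions the store is empty when tail == head, so discarding a corrupt stream should set tail to the sampled head. Setting tail = head - 1 leaves CIRC_CNT() == 1, which the next call would report as an unaligned stream. The user-space sketch below illustrates the producer/consumer arithmetic under stated assumptions: struct demo_store and the demo_store_*() helpers are hypothetical stand-ins for guc_capture_out_store, the size is assumed to be a power of two, and the locking and memory barriers the driver uses are omitted. It also shows a wrap-aware copy as one possible alternative to the dword-by-dword fallback in guc_capture_store_remove_dw().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guc_capture_out_store (only the fields used here). */
struct demo_store {
	uint8_t *addr;
	unsigned int size; /* assumed power of two */
	unsigned int head; /* producer write offset */
	unsigned int tail; /* consumer read offset */
};

/* Same arithmetic as CIRC_CNT / CIRC_CNT_TO_END / CIRC_SPACE in linux/circ_buf.h. */
static unsigned int circ_cnt(unsigned int head, unsigned int tail, unsigned int size)
{
	return (head - tail) & (size - 1);
}

static unsigned int circ_cnt_to_end(unsigned int head, unsigned int tail, unsigned int size)
{
	unsigned int end = size - tail;
	unsigned int n = (head + end) & (size - 1);

	return n < end ? n : end;
}

static unsigned int circ_space(unsigned int head, unsigned int tail, unsigned int size)
{
	return circ_cnt(tail, head + 1, size);
}

static void demo_store_init(struct demo_store *s, uint8_t *buf, unsigned int size)
{
	assert(size && (size & (size - 1)) == 0); /* masking needs a power of two */
	s->addr = buf;
	s->size = size;
	s->head = 0;
	s->tail = 0;
}

/* Producer side: copy 'len' bytes in, splitting the copy at the end of the buffer. */
static int demo_store_write(struct demo_store *s, const void *src, unsigned int len)
{
	unsigned int first;

	if (circ_space(s->head, s->tail, s->size) < len)
		return -1;
	first = s->size - s->head;
	if (first > len)
		first = len;
	memcpy(s->addr + s->head, src, first);
	memcpy(s->addr, (const uint8_t *)src + first, len - first);
	s->head = (s->head + len) & (s->size - 1);
	return 0;
}

/* Consumer side: copy 'len' bytes out, handling wrap-around in one call. */
static int demo_store_read(struct demo_store *s, void *dst, unsigned int len)
{
	unsigned int first;

	if (circ_cnt(s->head, s->tail, s->size) < len)
		return -1; /* not enough data yet: leave tail untouched */
	first = circ_cnt_to_end(s->head, s->tail, s->size);
	if (first > len)
		first = len;
	memcpy(dst, s->addr + s->tail, first);
	memcpy((uint8_t *)dst + first, s->addr, len - first);
	s->tail = (s->tail + len) & (s->size - 1);
	return 0;
}

/* Discard everything published up to 'sampled_head': empty means tail == head. */
static void demo_store_drop(struct demo_store *s, unsigned int sampled_head)
{
	s->tail = sampled_head;
}
```

In the driver, the equivalent of demo_store_drop() would run under store->lock, using the head value sampled at the start of the parse (tmpstore.head), so data published afterwards is not discarded.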


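On the register-name lookup: guc_capture_register_to_string() must not dereference match->ext when no list matched, and the caller's fallback name "REG-0x%08x" needs 15 bytes including the terminator, which fits the 16-byte noname buffer. A simplified sketch of that lookup-with-fallback follows. It assumes hypothetical demo_reg_descr/demo_reg_group types in place of the driver's __guc_mmio_reg_descr tables, and any offsets or names used with it are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the driver's register descriptor tables. */
struct demo_reg_descr {
	uint32_t offset;
	const char *name;
};

struct demo_reg_group {
	const struct demo_reg_descr *list;
	int num_regs;
	const struct demo_reg_descr *ext; /* optional steered registers, may be NULL */
	int num_ext;
};

/* Return the register name, or NULL; sets *is_ext when found in the ext list. */
static const char *demo_reg_to_string(const struct demo_reg_group *match,
				      uint32_t offset, int *is_ext)
{
	int j;

	*is_ext = 0;
	if (!match) /* no list registered for this owner/type/class */
		return NULL;

	for (j = 0; j < match->num_regs; ++j)
		if (match->list[j].offset == offset)
			return match->list[j].name;

	for (j = 0; match->ext && j < match->num_ext; ++j) {
		if (match->ext[j].offset == offset) {
			*is_ext = 1;
			return match->ext[j].name;
		}
	}
	return NULL;
}

/* Known name if available, else "REG-0x%08x" formatted into 'out'. */
static const char *demo_reg_name(const struct demo_reg_group *match, uint32_t offset,
				 char *out, size_t outlen, int *is_ext)
{
	const char *str = demo_reg_to_string(match, offset, is_ext);

	if (str)
		return str;
	assert(outlen >= sizeof("REG-0x00000000")); /* 14 chars + NUL */
	snprintf(out, outlen, "REG-0x%08x", (unsigned int)offset);
	return out;
}
```

Returning the table's own string for known registers avoids a copy. Only unknown offsets touch the caller's scratch buffer.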

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GuC Error Capture Support (rev4)
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
                   ` (7 preceding siblings ...)
  (?)
@ 2022-01-18 10:16 ` Patchwork
  -1 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2022-01-18 10:16 UTC (permalink / raw)
  To: Alan Previn; +Cc: intel-gfx

== Series Details ==

Series: Add GuC Error Capture Support (rev4)
URL   : https://patchwork.freedesktop.org/series/97187/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
c5b79a260782 drm/i915/guc: Update GuC ADS size for error capture lists
-:32: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#32: 
new file mode 100644

-:307: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'regslist' - possible side-effects?
#307: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:60:
+#define MAKE_REGLIST(regslist, regsowner, regstype, class) \
+	{ \
+		.list = regslist, \
+		.num_regs = ARRAY_SIZE(regslist), \
+		.owner = TO_GCAP_DEF_OWNER(regsowner), \
+		.type = TO_GCAP_DEF_TYPE(regstype), \
+		.engine = class, \
+	}

-:356: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (16, 16)
#356: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:109:
+		if (reglists[i].owner == owner && reglists[i].type == type &&
[...]
+		return &reglists[i];

-:423: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#423: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:176:
+		drm_dbg(&i915->drm, "GuC-capture: %s for %s %s-Registers.\n", msg,
+			 guc_capture_stringify_owner(owner), guc_capture_stringify_type(type));

-:426: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#426: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:179:
+		drm_dbg(&i915->drm, "GuC-capture: %s for %s %s-Registers on %s-Engine\n", msg,
+			 guc_capture_stringify_owner(owner), guc_capture_stringify_type(type),

total: 0 errors, 2 warnings, 3 checks, 681 lines checked
68bc5c4ac7f8 drm/i915/guc: Add XE_LP registers for GuC error state capture.
-:37: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#37: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:22:
+#define COMMON_GEN12BASE_GLOBAL() \
+	{GEN12_FAULT_TLB_DATA0,    0,      0, "GEN12_FAULT_TLB_DATA0"}, \
+	{GEN12_FAULT_TLB_DATA1,    0,      0, "GEN12_FAULT_TLB_DATA1"}, \
+	{FORCEWAKE_MT,             0,      0, "FORCEWAKE_MT"}, \
+	{DERRMR,                   0,      0, "DERRMR"}, \
+	{GEN12_AUX_ERR_DBG,        0,      0, "GEN12_AUX_ERR_DBG"}, \
+	{GEN12_GAM_DONE,           0,      0, "GEN12_GAM_DONE"}, \
+	{GEN11_GUC_SG_INTR_ENABLE, 0,      0, "GEN11_GUC_SG_INTR_ENABLE"}, \
+	{GEN11_CRYPTO_RSVD_INTR_ENABLE, 0, 0, "GEN11_CRYPTO_RSVD_INTR_ENABLE"}, \
+	{GEN11_GUNIT_CSME_INTR_ENABLE, 0,  0, "GEN11_GUNIT_CSME_INTR_ENABLE"}, \
+	{GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0, 0, "GEN11_GPM_WGBOXPERF_INTR_ENABLE"}, \
+	{GEN8_DE_MISC_IER,         0,      0, "GEN8_DE_MISC_IER"}, \
+	{GEN12_RING_FAULT_REG,     0,      0, "GEN12_RING_FAULT_REG"}

-:51: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#51: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:36:
+#define COMMON_GEN12BASE_ENGINE_INSTANCE() \
+	{RING_PSMI_CTL(0),         0,      0, "RING_PSMI_CTL"}, \
+	{RING_ESR(0),              0,      0, "RING_ESR"}, \
+	{RING_DMA_FADD(0),         0,      0, "RING_DMA_FADD_LOW32"}, \
+	{RING_DMA_FADD_UDW(0),     0,      0, "RING_DMA_FADD_UP32"}, \
+	{RING_IPEIR(0),            0,      0, "RING_IPEIR"}, \
+	{RING_IPEHR(0),            0,      0, "RING_IPEHR"}, \
+	{RING_INSTPS(0),           0,      0, "RING_INSTPS"}, \
+	{RING_BBADDR(0),           0,      0, "RING_BBADDR_LOW32"}, \
+	{RING_BBADDR_UDW(0),       0,      0, "RING_BBADDR_UP32"}, \
+	{RING_BBSTATE(0),          0,      0, "RING_BBSTATE"}, \
+	{CCID(0),                  0,      0, "CCID"}, \
+	{RING_ACTHD(0),            0,      0, "RING_ACTHD_LOW32"}, \
+	{RING_ACTHD_UDW(0),        0,      0, "RING_ACTHD_UP32"}, \
+	{RING_INSTPM(0),           0,      0, "RING_INSTPM"}, \
+	{RING_NOPID(0),            0,      0, "RING_NOPID"}, \
+	{RING_START(0),            0,      0, "RING_START"}, \
+	{RING_HEAD(0),             0,      0, "RING_HEAD"}, \
+	{RING_TAIL(0),             0,      0, "RING_TAIL"}, \
+	{RING_CTL(0),              0,      0, "RING_CTL"}, \
+	{RING_MI_MODE(0),          0,      0, "RING_MI_MODE"}, \
+	{RING_CONTEXT_CONTROL(0),  0,      0, "RING_CONTEXT_CONTROL"}, \
+	{RING_INSTDONE(0),         0,      0, "RING_INSTDONE"}, \
+	{RING_HWS_PGA(0),          0,      0, "RING_HWS_PGA"}, \
+	{RING_MODE_GEN7(0),        0,      0, "RING_MODE_GEN7"}, \
+	{GEN8_RING_PDP_LDW(0, 0),  0,      0, "GEN8_RING_PDP0_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 0),  0,      0, "GEN8_RING_PDP0_UDW"}, \
+	{GEN8_RING_PDP_LDW(0, 1),  0,      0, "GEN8_RING_PDP1_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 1),  0,      0, "GEN8_RING_PDP1_UDW"}, \
+	{GEN8_RING_PDP_LDW(0, 2),  0,      0, "GEN8_RING_PDP2_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 2),  0,      0, "GEN8_RING_PDP2_UDW"}, \
+	{GEN8_RING_PDP_LDW(0, 3),  0,      0, "GEN8_RING_PDP3_LDW"}, \
+	{GEN8_RING_PDP_UDW(0, 3),  0,      0, "GEN8_RING_PDP3_UDW"}

-:88: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#88: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:73:
+#define COMMON_GEN12BASE_RENDER() \
+	{GEN7_SC_INSTDONE,         0,      0, "GEN7_SC_INSTDONE"}, \
+	{GEN12_SC_INSTDONE_EXTRA,  0,      0, "GEN12_SC_INSTDONE_EXTRA"}, \
+	{GEN12_SC_INSTDONE_EXTRA2, 0,      0, "GEN12_SC_INSTDONE_EXTRA2"}

-:93: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#93: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:78:
+#define COMMON_GEN12BASE_VEC() \
+	{GEN11_VCS_VECS_INTR_ENABLE, 0,    0, "GEN11_VCS_VECS_INTR_ENABLE"}, \
+	{GEN12_SFC_DONE(0),        0,      0, "GEN12_SFC_DONE0"}, \
+	{GEN12_SFC_DONE(1),        0,      0, "GEN12_SFC_DONE1"}, \
+	{GEN12_SFC_DONE(2),        0,      0, "GEN12_SFC_DONE2"}, \
+	{GEN12_SFC_DONE(3),        0,      0, "GEN12_SFC_DONE3"}

-:180: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (16, 16)
#180: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:166:
+		if (reglists[i].owner == owner && reglists[i].type == type &&
[...]
+		return &reglists[i];

total: 4 errors, 1 warnings, 0 checks, 310 lines checked
7febfa438109 drm/i915/guc: Add DG2 registers for GuC error state capture.
-:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ext' - possible side-effects?
#79: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:242:
+#define POPULATE_NEXT_EXTREG(ext, list, idx, slicenum, subslicenum) \
+	{ \
+		ext->reg = list[idx].reg; \
+		ext->flags = FIELD_PREP(GUC_REGSET_STEERING_GROUP, slicenum); \
+		ext->flags |= FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, subslicenum); \
+		ext->regname = xelpd_extregs[i].name; \
+		++ext; \
+	}

-:79: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'ext' may be better as '(ext)' to avoid precedence issues
#79: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:242:
+#define POPULATE_NEXT_EXTREG(ext, list, idx, slicenum, subslicenum) \
+	{ \
+		ext->reg = list[idx].reg; \
+		ext->flags = FIELD_PREP(GUC_REGSET_STEERING_GROUP, slicenum); \
+		ext->flags |= FIELD_PREP(GUC_REGSET_STEERING_INSTANCE, subslicenum); \
+		ext->regname = xelpd_extregs[i].name; \
+		++ext; \
+	}

total: 0 errors, 0 warnings, 2 checks, 100 lines checked
13acbdb9a7ca drm/i915/guc: Add GuC's error state capture output structures.
b484fd64c7ba drm/i915/guc: Update GuC's log-buffer-state access for error capture.
-:191: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#191: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:218:
+	log_buf_state = src_data = log->buf_addr;

total: 0 errors, 0 warnings, 1 checks, 436 lines checked
dcf143ea1f41 drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
e6a40bf716e6 drm/i915/guc: Print the GuC error capture output register list.
-:226: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'b' - possible side-effects?
#226: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:869:
+#define guc_capt_err_print(a, b, ...) \
+	do { \
+		drm_warn(a, __VA_ARGS__); \
+		if (b) \
+			i915_error_printf(b, __VA_ARGS__); \
+	} while (0)

-:233: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'b' - possible side-effects?
#233: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:876:
+#define guc_capt_err_print(a, b, ...) \
+	do { \
+		if (b) \
+			i915_error_printf(b, __VA_ARGS__); \
+	} while (0)

-:255: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'i915' - possible side-effects?
#255: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:898:
+#define GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Name: %s command stream\n", (eng)->name); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-LogicalMask: 0x%08x\n", \
+		      (eng)->logical_mask); \
+	} while (0)

-:255: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'i915' may be better as '(i915)' to avoid precedence issues
#255: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:898:
+#define GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Name: %s command stream\n", (eng)->name); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-LogicalMask: 0x%08x\n", \
+		      (eng)->logical_mask); \
+	} while (0)

-:255: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ebuf' - possible side-effects?
#255: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:898:
+#define GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Name: %s command stream\n", (eng)->name); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-LogicalMask: 0x%08x\n", \
+		      (eng)->logical_mask); \
+	} while (0)

-:255: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'eng' - possible side-effects?
#255: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:898:
+#define GCAP_PRINT_INTEL_ENG_INFO(i915, ebuf, eng) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Name: %s command stream\n", (eng)->name); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
+		PRINT(&i915->drm, (ebuf), "    i915-Eng-LogicalMask: 0x%08x\n", \
+		      (eng)->logical_mask); \
+	} while (0)

-:264: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'i915' - possible side-effects?
#264: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:907:
+#define GCAP_PRINT_GUC_INST_INFO(i915, ebuf, hdr) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    GuC-Engine-Inst-Id: 0x%08x\n", \
+		      (uint32_t)FIELD_GET(CAP_HDR_ENGINE_INSTANCE, (hdr).info)); \
+		PRINT(&i915->drm, (ebuf), "    GuC-Context-Id: 0x%08x\n", (hdr).guc_id); \
+		PRINT(&i915->drm, (ebuf), "    LRCA: 0x%08x\n", (hdr).lrca); \
+	} while (0)

-:264: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'i915' may be better as '(i915)' to avoid precedence issues
#264: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:907:
+#define GCAP_PRINT_GUC_INST_INFO(i915, ebuf, hdr) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    GuC-Engine-Inst-Id: 0x%08x\n", \
+		      (uint32_t)FIELD_GET(CAP_HDR_ENGINE_INSTANCE, (hdr).info)); \
+		PRINT(&i915->drm, (ebuf), "    GuC-Context-Id: 0x%08x\n", (hdr).guc_id); \
+		PRINT(&i915->drm, (ebuf), "    LRCA: 0x%08x\n", (hdr).lrca); \
+	} while (0)

-:264: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ebuf' - possible side-effects?
#264: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:907:
+#define GCAP_PRINT_GUC_INST_INFO(i915, ebuf, hdr) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    GuC-Engine-Inst-Id: 0x%08x\n", \
+		      (uint32_t)FIELD_GET(CAP_HDR_ENGINE_INSTANCE, (hdr).info)); \
+		PRINT(&i915->drm, (ebuf), "    GuC-Context-Id: 0x%08x\n", (hdr).guc_id); \
+		PRINT(&i915->drm, (ebuf), "    LRCA: 0x%08x\n", (hdr).lrca); \
+	} while (0)

-:264: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'hdr' - possible side-effects?
#264: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:907:
+#define GCAP_PRINT_GUC_INST_INFO(i915, ebuf, hdr) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    GuC-Engine-Inst-Id: 0x%08x\n", \
+		      (uint32_t)FIELD_GET(CAP_HDR_ENGINE_INSTANCE, (hdr).info)); \
+		PRINT(&i915->drm, (ebuf), "    GuC-Context-Id: 0x%08x\n", (hdr).guc_id); \
+		PRINT(&i915->drm, (ebuf), "    LRCA: 0x%08x\n", (hdr).lrca); \
+	} while (0)

-:272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'i915' - possible side-effects?
#272: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:915:
+#define GCAP_PRINT_INTEL_CTX_INFO(i915, ebuf, ce) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-Flags: 0x%016lx\n", (ce)->flags); \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-GuC-Id: 0x%016x\n", (ce)->guc_id.id); \
+	} while (0)

-:272: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'i915' may be better as '(i915)' to avoid precedence issues
#272: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:915:
+#define GCAP_PRINT_INTEL_CTX_INFO(i915, ebuf, ce) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-Flags: 0x%016lx\n", (ce)->flags); \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-GuC-Id: 0x%016x\n", (ce)->guc_id.id); \
+	} while (0)

-:272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ebuf' - possible side-effects?
#272: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:915:
+#define GCAP_PRINT_INTEL_CTX_INFO(i915, ebuf, ce) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-Flags: 0x%016lx\n", (ce)->flags); \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-GuC-Id: 0x%016x\n", (ce)->guc_id.id); \
+	} while (0)

-:272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ce' - possible side-effects?
#272: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:915:
+#define GCAP_PRINT_INTEL_CTX_INFO(i915, ebuf, ce) \
+	do { \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-Flags: 0x%016lx\n", (ce)->flags); \
+		PRINT(&i915->drm, (ebuf), "    i915-Ctx-GuC-Id: 0x%016x\n", (ce)->guc_id.id); \
+	} while (0)

-:278: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'i915' may be better as '(i915)' to avoid precedence issues
#278: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:921:
+#define GCAP_PRINT_BATCH(i915, ebuf, ee, batch) \
+	do { \
+		batch = intel_gpu_error_find_batch(ee); \
+		if (batch) { \
+			u64 start = batch->gtt_offset; \
+			u64 end = start + batch->gtt_size; \
+			PRINT(&i915->drm, (ebuf), "  batch: [0x%08x_%08x, 0x%08x_%08x]\n", \
+			   upper_32_bits(start), lower_32_bits(start), \
+			   upper_32_bits(end), lower_32_bits(end)); \
+		} \
+	} while (0)

-:278: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'batch' - possible side-effects?
#278: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:921:
+#define GCAP_PRINT_BATCH(i915, ebuf, ee, batch) \
+	do { \
+		batch = intel_gpu_error_find_batch(ee); \
+		if (batch) { \
+			u64 start = batch->gtt_offset; \
+			u64 end = start + batch->gtt_size; \
+			PRINT(&i915->drm, (ebuf), "  batch: [0x%08x_%08x, 0x%08x_%08x]\n", \
+			   upper_32_bits(start), lower_32_bits(start), \
+			   upper_32_bits(end), lower_32_bits(end)); \
+		} \
+	} while (0)

-:290: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'i915' - possible side-effects?
#290: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:933:
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)

-:290: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'i915' may be better as '(i915)' to avoid precedence issues
#290: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:933:
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)

-:290: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ebuf' - possible side-effects?
#290: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:933:
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)

-:290: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'ebuf' may be better as '(ebuf)' to avoid precedence issues
#290: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:933:
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)

-:290: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ctx' - possible side-effects?
#290: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:933:
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)

-:290: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'ctx' may be better as '(ctx)' to avoid precedence issues
#290: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:933:
+#define GCAP_PRINT_CONTEXT(i915, ebuf, ctx) \
+	do { \
+		const u32 period = to_gt(ebuf->i915)->clock_period_ns; \
+		PRINT(&i915->drm, (ebuf), "  Active context: %s[%d] prio %d, guilty %d " \
+		      "active %d, runtime total %lluns, avg %lluns\n", \
+		      ctx->comm, ctx->pid, ctx->sched_attr.priority, \
+		      ctx->guilty, ctx->active, \
+		      ctx->total_runtime * period, \
+		      mul_u32_u32(ctx->avg_runtime, period)); \
+	} while (0)

-:385: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#385: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c:1028:
+		guc_gucid = guc_lrca = 0;

total: 0 errors, 0 warnings, 23 checks, 680 lines checked
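
To illustrate what the MACRO_ARG_PRECEDENCE and MACRO_ARG_REUSE checks above guard against, here is a small user-space C sketch. The SQUARE_* macros, square() and next_value() are hypothetical and not part of the patch. In the GCAP_PRINT_* macros, `&i915->drm` happens to work for simple arguments such as `gt->i915`, because `->` binds tighter than `&`. An argument such as `*pp`, however, would expand to `&*pp->drm`, which parses as `&*(pp->drm)` rather than `&(*pp)->drm`. Likewise, an argument with side effects would be evaluated once per PRINT() line. Converting such macros to static functions is one common way to clear both kinds of check.

```c
#include <assert.h>

/* Unsafe: the argument is neither parenthesized nor evaluated only once. */
#define SQUARE_UNSAFE(x) x * x

/* Parenthesized, but 'x' is still evaluated twice (MACRO_ARG_REUSE). */
#define SQUARE_PAREN(x) ((x) * (x))

/* A static inline function evaluates its argument exactly once. */
static inline int square(int x)
{
	return x * x;
}

/* Counts how many times it is called, to expose double evaluation. */
static int calls;

static int next_value(void)
{
	return ++calls;
}
```

With SQUARE_UNSAFE(1 + 2) the expansion is `1 + 2 * 1 + 2`, which yields 5 rather than 9. SQUARE_PAREN(next_value()) calls next_value() twice, whereas square(next_value()) calls it once.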




* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Add GuC Error Capture Support (rev4)
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
                   ` (8 preceding siblings ...)
  (?)
@ 2022-01-18 10:17 ` Patchwork
  -1 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2022-01-18 10:17 UTC (permalink / raw)
  To: Alan Previn; +Cc: intel-gfx

== Series Details ==

Series: Add GuC Error Capture Support (rev4)
URL   : https://patchwork.freedesktop.org/series/97187/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✗ Fi.CI.BAT: failure for Add GuC Error Capture Support (rev4)
  2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
                   ` (9 preceding siblings ...)
  (?)
@ 2022-01-18 10:49 ` Patchwork
  -1 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2022-01-18 10:49 UTC (permalink / raw)
  To: Alan Previn; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 10835 bytes --]

== Series Details ==

Series: Add GuC Error Capture Support (rev4)
URL   : https://patchwork.freedesktop.org/series/97187/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11094 -> Patchwork_22011
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22011 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22011, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/index.html

Participating hosts (46 -> 37)
------------------------------

  Additional (2): fi-kbl-soraka fi-pnv-d510 
  Missing    (11): fi-bdw-samus shard-tglu bat-dg1-6 bat-dg1-5 fi-bsw-cyan bat-adlp-6 bat-rpls-1 shard-rkl shard-dg1 bat-jsl-2 bat-jsl-1 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22011:

### IGT changes ###

#### Possible regressions ####

  * igt@core_hotunplug@unbind-rebind:
    - fi-cfl-guc:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-cfl-guc/igt@core_hotunplug@unbind-rebind.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-cfl-guc/igt@core_hotunplug@unbind-rebind.html
    - fi-kbl-guc:         [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-kbl-guc/igt@core_hotunplug@unbind-rebind.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-guc/igt@core_hotunplug@unbind-rebind.html
    - fi-skl-guc:         [PASS][5] -> [INCOMPLETE][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-skl-guc/igt@core_hotunplug@unbind-rebind.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-skl-guc/igt@core_hotunplug@unbind-rebind.html

  * igt@i915_selftest@live@hangcheck:
    - fi-rkl-guc:         [PASS][7] -> [DMESG-WARN][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-rkl-guc/igt@i915_selftest@live@hangcheck.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-rkl-guc/igt@i915_selftest@live@hangcheck.html

  
Known issues
------------

  Here are the changes found in Patchwork_22011 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@semaphore:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][9] ([fdo#109271]) +31 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-bdw-5557u/igt@amdgpu/amd_basic@semaphore.html

  * igt@amdgpu/amd_cs_nop@sync-fork-compute0:
    - fi-snb-2600:        NOTRUN -> [SKIP][10] ([fdo#109271]) +17 similar issues
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-snb-2600/igt@amdgpu/amd_cs_nop@sync-fork-compute0.html

  * igt@gem_exec_fence@basic-busy@bcs0:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][11] ([fdo#109271]) +8 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-soraka/igt@gem_exec_fence@basic-busy@bcs0.html

  * igt@gem_flink_basic@bad-flink:
    - fi-skl-6600u:       [PASS][12] -> [INCOMPLETE][13] ([i915#4547])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-skl-6600u/igt@gem_flink_basic@bad-flink.html

  * igt@gem_huc_copy@huc-copy:
    - fi-pnv-d510:        NOTRUN -> [SKIP][14] ([fdo#109271]) +57 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-pnv-d510/igt@gem_huc_copy@huc-copy.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][15] ([fdo#109271] / [i915#2190])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@parallel-random-engines:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][16] ([fdo#109271] / [i915#4613]) +3 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-soraka/igt@gem_lmem_swapping@parallel-random-engines.html

  * igt@i915_selftest@live@execlists:
    - fi-bsw-kefka:       [PASS][17] -> [INCOMPLETE][18] ([i915#2940])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-bsw-kefka/igt@i915_selftest@live@execlists.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][19] ([i915#1886] / [i915#2291])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [PASS][20] -> [DMESG-FAIL][21] ([i915#4528])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@kms_chamelium@dp-edid-read:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][22] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-soraka/igt@kms_chamelium@dp-edid-read.html

  * igt@kms_chamelium@vga-edid-read:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][23] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-bdw-5557u/igt@kms_chamelium@vga-edid-read.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][24] ([fdo#109271] / [i915#533])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-soraka/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@runner@aborted:
    - fi-cfl-guc:         NOTRUN -> [FAIL][25] ([i915#2426] / [i915#4312])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-cfl-guc/igt@runner@aborted.html
    - fi-skl-guc:         NOTRUN -> [FAIL][26] ([i915#2426] / [i915#4312])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-skl-guc/igt@runner@aborted.html
    - fi-blb-e6850:       NOTRUN -> [FAIL][27] ([fdo#109271] / [i915#2403] / [i915#4312])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-blb-e6850/igt@runner@aborted.html
    - fi-bsw-kefka:       NOTRUN -> [FAIL][28] ([fdo#109271] / [i915#1436] / [i915#3428] / [i915#4312])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-bsw-kefka/igt@runner@aborted.html
    - fi-kbl-guc:         NOTRUN -> [FAIL][29] ([i915#2426] / [i915#4312])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-kbl-guc/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - fi-bdw-5557u:       [INCOMPLETE][30] ([i915#146]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3@smem.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-bdw-5557u/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@i915_selftest@live@gt_heartbeat:
    - {fi-tgl-dsi}:       [INCOMPLETE][32] -> [PASS][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-tgl-dsi/igt@i915_selftest@live@gt_heartbeat.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-tgl-dsi/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [INCOMPLETE][34] ([i915#3921]) -> [PASS][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  
#### Warnings ####

  * igt@runner@aborted:
    - fi-skl-6600u:       [FAIL][36] ([i915#4312]) -> [FAIL][37] ([i915#2722] / [i915#4312])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11094/fi-skl-6600u/igt@runner@aborted.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/fi-skl-6600u/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#146]: https://gitlab.freedesktop.org/drm/intel/issues/146
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#2426]: https://gitlab.freedesktop.org/drm/intel/issues/2426
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2722]: https://gitlab.freedesktop.org/drm/intel/issues/2722
  [i915#2940]: https://gitlab.freedesktop.org/drm/intel/issues/2940
  [i915#3428]: https://gitlab.freedesktop.org/drm/intel/issues/3428
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * Linux: CI_DRM_11094 -> Patchwork_22011

  CI-20190529: 20190529
  CI_DRM_11094: 6ce31c986ee8beaa0f98fd4e200b7a421fd4adf9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6327: 0d559158c2d3b5723abbfc2cb4b04532e28663b2 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22011: e6a40bf716e6f1ecea017de684ee5bcd06eff8db @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

e6a40bf716e6 drm/i915/guc: Print the GuC error capture output register list.
dcf143ea1f41 drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
b484fd64c7ba drm/i915/guc: Update GuC's log-buffer-state access for error capture.
13acbdb9a7ca drm/i915/guc: Add GuC's error state capture output structures.
7febfa438109 drm/i915/guc: Add DG2 registers for GuC error state capture.
68bc5c4ac7f8 drm/i915/guc: Add XE_LP registers for GuC error state capture.
c5b79a260782 drm/i915/guc: Update GuC ADS size for error capture lists

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22011/index.html

[-- Attachment #2: Type: text/html, Size: 13198 bytes --]


* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification.
  2022-01-18 10:03 ` [Intel-gfx] [PATCH 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification Alan Previn
@ 2022-01-19  1:36   ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 14+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-01-19  1:36 UTC (permalink / raw)
  To: intel-gfx

A fresh round of offline design discussions with the internal team has decided
that:

 - we don't want to use an interim circular buffer to copy all of the GuC
   generated register dumps of one or more 'engine-capture-groups'.
 - instead, we shall dynamically allocate multiple nodes, each node being
   a single "engine-capture-dump". A linked list of one or many
   engine-capture-dumps would result from a single engine-capture-group.
 - this dynamic allocation should happen during the G2H error-capture
   notification event, which happens before the corresponding G2H
   context-reset that triggers the i915_gpu_coredump (where we want to
   avoid memory allocation moving forward).
 - we also realize that during the linked-list allocation we would need
   a first parse of the guc-log-error-state-capture head-to-tail entries
   in order to duplicate the global and engine-class register dumps for
   each engine-instance register dump if we find dependent-engine resets
   in an engine-capture-group.
 - later, when i915_gpu_coredump calls into capture_engine, we finally
   attach the corresponding node from the linked list above (detaching it
   from that linked list) to i915_gpu_coredump's intel_engine_coredump
   structure when we have a matching LRCA/guc-id/engine-instance.
 - we would also have to pass a flag from the i915_gpu_coredump top-level
   trigger through to capture_engine to indicate whether the capture was
   triggered by a GuC context reset or by a forced user reset or GT reset.
   In the latter case (user/GT reset), we should capture the register
   values directly from the HW, i.e. the pre-GuC behavior, without matching
   against GuC.

...alan
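
The node-per-dump design above can be sketched roughly as follows. This is a user-space approximation under stated assumptions: the struct and function names are hypothetical, allocation uses calloc() where the driver would use a kernel allocator, and the list is hand-rolled rather than built on the kernel's struct list_head. It shows the two phases: nodes are allocated while handling the error-capture notification, and the coredump path only searches for a node with a matching LRCA/guc-id/engine-instance and unlinks it, without allocating.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node: one engine-capture-dump. Register payload omitted. */
struct capture_node {
	uint32_t lrca;
	uint32_t guc_id;
	uint32_t eng_inst;
	struct capture_node *next;
};

struct capture_list {
	struct capture_node *head;
};

/* Notification phase: the only place this sketch allocates memory. */
static int capture_list_add(struct capture_list *list, uint32_t lrca,
			    uint32_t guc_id, uint32_t eng_inst)
{
	struct capture_node *n = calloc(1, sizeof(*n));

	if (!n)
		return -1;
	n->lrca = lrca;
	n->guc_id = guc_id;
	n->eng_inst = eng_inst;
	n->next = list->head;
	list->head = n;
	return 0;
}

/* Coredump phase: find the matching node and unlink it; no allocation. */
static struct capture_node *capture_list_detach(struct capture_list *list,
						uint32_t lrca, uint32_t guc_id,
						uint32_t eng_inst)
{
	struct capture_node **pp;

	for (pp = &list->head; *pp; pp = &(*pp)->next) {
		struct capture_node *n = *pp;

		if (n->lrca == lrca && n->guc_id == guc_id &&
		    n->eng_inst == eng_inst) {
			*pp = n->next;	/* splice the node out of the list */
			n->next = NULL;
			return n;	/* caller now owns the node */
		}
	}
	return NULL;
}
```

Detaching transfers ownership of the node to the caller, which matches the idea of moving the node into the coredump structure rather than copying it.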


On Tue, 2022-01-18 at 02:03 -0800, Alan Previn wrote:

> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index b637628905ec..fc80c5f31915 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
..
..
> +static void __guc_capture_store_snapshot_work(struct intel_guc *guc)
> +{
> +	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> +	unsigned int buffer_size, read_offset, write_offset, bytes_to_copy, full_count;
> +	struct guc_log_buffer_state *log_buf_state;
> +	struct guc_log_buffer_state log_buf_state_local;
> +	void *src_data, *dst_data = NULL;
> +	bool new_overflow;
> +
> +	/* Lock to get the pointer to GuC capture-log-buffer-state */
> +	mutex_lock(&guc->log_state[GUC_CAPTURE_LOG_BUFFER].lock);
> +	log_buf_state = guc->log.buf_addr +
> +			(sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
> +	src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
> +
> +	/*
> +	 * Make a copy of the state structure, inside GuC log buffer
> +	 * (which is uncached mapped), on the stack to avoid reading
> +	 * from it multiple times.
> +	 */
> +	memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
> +	buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
> +	read_offset = log_buf_state_local.read_ptr;
> +	write_offset = log_buf_state_local.sampled_write_ptr;
> +	full_count = log_buf_state_local.buffer_full_cnt;
> +
> +	/* Bookkeeping stuff */
> +	guc->log_state[GUC_CAPTURE_LOG_BUFFER].flush += log_buf_state_local.flush_to_file;
> +	new_overflow = intel_guc_check_log_buf_overflow(guc,
> +							&guc->log_state[GUC_CAPTURE_LOG_BUFFER],
> +							full_count);
> +
> +	/* Update the state of shared log buffer */
> +	log_buf_state->read_ptr = write_offset;
> +	log_buf_state->flush_to_file = 0;
> +
> +	mutex_unlock(&guc->log_state[GUC_CAPTURE_LOG_BUFFER].lock);
> +
> +	dst_data = guc->capture.priv->out_store.addr;
> +	if (dst_data) {
> +		mutex_lock(&guc->capture.priv->out_store.lock);
> +
> +		/* Now copy the actual logs. */
> +		if (unlikely(new_overflow)) {
> +			/* copy the whole buffer in case of overflow */
> +			read_offset = 0;
> +			write_offset = buffer_size;
> +		} else if (unlikely((read_offset > buffer_size) ||
> +			   (write_offset > buffer_size))) {
> +			drm_err(&i915->drm, "invalid GuC log capture buffer state!\n");
> +			/* copy whole buffer as offsets are unreliable */
> +			read_offset = 0;
> +			write_offset = buffer_size;
> +		}
> +
> +		/* first copy from the tail end of the GuC log capture buffer */
> +		if (read_offset > write_offset) {
> +			guc_capture_store_insert(guc, &guc->capture.priv->out_store, src_data,
> +						 write_offset);
> +			bytes_to_copy = buffer_size - read_offset;
> +		} else {
> +			bytes_to_copy = write_offset - read_offset;
> +		}
> +		guc_capture_store_insert(guc, &guc->capture.priv->out_store, src_data + read_offset,
> +					 bytes_to_copy);
> +
> +		mutex_unlock(&guc->capture.priv->out_store.lock);
> +	}
> +}
> +
> +void intel_guc_capture_store_snapshot(struct intel_guc *guc)
> +{
> +	if (guc->capture.priv)
> +		__guc_capture_store_snapshot_work(guc);
> +}
> +
> +static void guc_capture_store_destroy(struct intel_guc *guc)
> +{
> +	mutex_destroy(&guc->capture.priv->out_store.lock);
> +	guc->capture.priv->out_store.size = 0;
> +	kfree(guc->capture.priv->out_store.addr);
> +	guc->capture.priv->out_store.addr = NULL;
> +}
> +
> +static int guc_capture_store_create(struct intel_guc *guc)
> +{
> +	/*
> +	 * Make this interim buffer larger than GuC capture output buffer so that we can absorb
> +	 * a little delay when processing the raw capture dumps into text friendly logs
> +	 * for the i915_gpu_coredump output
> +	 */
> +	size_t max_dump_size;
> +
> +	GEM_BUG_ON(guc->capture.priv->out_store.addr);
> +
> +	max_dump_size = PAGE_ALIGN(intel_guc_capture_output_min_size_est(guc));
> +	max_dump_size = roundup_pow_of_two(max_dump_size);
> +
> +	guc->capture.priv->out_store.addr = kzalloc(max_dump_size, GFP_KERNEL);
> +	if (!guc->capture.priv->out_store.addr)
> +		return -ENOMEM;
> +
> +	guc->capture.priv->out_store.size = max_dump_size;
> +	mutex_init(&guc->capture.priv->out_store.lock);
> +
> +	return 0;
> +}
> +
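
In the quoted hunk, when read_offset > write_offset the code inserts [0, write_offset) before [read_offset, buffer_size), even though the comment says the tail end is copied first. Whether that ordering matters depends on how guc_capture_store_insert() lays out the data, which is not shown here. If it appends sequentially, the older bytes at [read_offset, buffer_size) would normally need to go first. The sketch below is a hypothetical helper (ring_copy_unread() is not from the patch) that copies the unread region of a circular buffer in chronological order. It keeps the patch's fallback of copying the whole buffer on overflow or invalid offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the unread bytes of circular buffer 'src' into 'dst' in
 * chronological order and return the number of bytes copied.
 * Assumption: 'dst' has room for at least 'src_size' bytes.
 */
static size_t ring_copy_unread(unsigned char *dst, const unsigned char *src,
			       size_t src_size, size_t read_off,
			       size_t write_off, int overflow)
{
	size_t n;

	/* As in the patch: on overflow or bogus offsets, take everything. */
	if (overflow || read_off > src_size || write_off > src_size) {
		read_off = 0;
		write_off = src_size;
	}

	if (read_off > write_off) {
		/* Wrapped: older data is [read_off, end), then [0, write_off). */
		n = src_size - read_off;
		memcpy(dst, src + read_off, n);
		memcpy(dst + n, src, write_off);
		n += write_off;
	} else {
		/* Contiguous: just [read_off, write_off). */
		n = write_off - read_off;
		memcpy(dst, src + read_off, n);
	}
	return n;
}
```

For a 10-byte buffer holding "EFGHxxABCD" with read_off = 6 and write_off = 4, the helper produces "ABCDEFGH".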


* Re: [Intel-gfx] [PATCH 2/7] drm/i915/guc: Add XE_LP registers for GuC error state capture.
  2022-01-18 10:03 ` [Intel-gfx] [PATCH 2/7] drm/i915/guc: Add XE_LP registers for GuC error state capture Alan Previn
@ 2022-01-24 19:33   ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 14+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-01-24 19:33 UTC (permalink / raw)
  To: intel-gfx

Internal feedback is to exactly match the register dump
output of the execlist path. However, the execlist register
dump function targeting the GT subsystem also includes
non-GT registers, such as display-related ones, that
GuC doesn't manage. So I will have to split the execlist
function into global-non-GT and global-GT parts, and then
call the former in both the GuC and non-GuC cases (skipping
the latter when GuC is doing the dump).

...alan
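
A rough sketch of the proposed split, with hypothetical function names and flags (none of them exist in i915): the non-GT global registers, such as display-related ones, are always read by the driver, while the GT globals are read directly only when GuC did not produce the dump.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags recording which register groups were read. */
#define DUMP_NON_GT 0x1u /* e.g. display-related registers */
#define DUMP_GT     0x2u /* GT registers that GuC can capture */

/* Stand-ins for the two halves of the split execlist dump function. */
static unsigned int capture_global_non_gt(void)
{
	return DUMP_NON_GT;
}

static unsigned int capture_global_gt(void)
{
	return DUMP_GT;
}

static unsigned int capture_globals(bool guc_did_dump)
{
	/* Non-GT registers are never in GuC's dump, so always read them. */
	unsigned int captured = capture_global_non_gt();

	/* GT registers are read directly only in the non-GuC case. */
	if (!guc_did_dump)
		captured |= capture_global_gt();
	return captured;
}
```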
 

On Tue, 2022-01-18 at 02:03 -0800, Alan Previn wrote:
> Add device-specific tables and register lists to cover different engine
> class types for GuC error state capture on XE_LP products.
> 
> Also, add runtime allocation and freeing of extended register lists
> for registers that need steering identifiers that depend on
> the detected HW config.
> 
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |   2 +
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 208 +++++++++++++++---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   4 +-
>  3 files changed, 186 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index 20c537274e60..6adfb5c07bcf 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -19,20 +19,84 @@
>   * NOTE: For engine-registers, GuC only needs the register offsets
>   *       from the engine-mmio-base
>   */
> +#define COMMON_GEN12BASE_GLOBAL() \
> +	{GEN12_FAULT_TLB_DATA0,    0,      0, "GEN12_FAULT_TLB_DATA0"}, \
> +	{GEN12_FAULT_TLB_DATA1,    0,      0, "GEN12_FAULT_TLB_DATA1"}, \
> +	{FORCEWAKE_MT,             0,      0, "FORCEWAKE_MT"}, \
> +	{DERRMR,                   0,      0, "DERRMR"}, \
> +	{GEN12_AUX_ERR_DBG,        0,      0, "GEN12_AUX_ERR_DBG"}, \
> +	{GEN12_GAM_DONE,           0,      0, "GEN12_GAM_DONE"}, \
> +	{GEN11_GUC_SG_INTR_ENABLE, 0,      0, "GEN11_GUC_SG_INTR_ENABLE"}, \
> +	{GEN11_CRYPTO_RSVD_INTR_ENABLE, 0, 0, "GEN11_CRYPTO_RSVD_INTR_ENABLE"}, \
> +	{GEN11_GUNIT_CSME_INTR_ENABLE, 0,  0, "GEN11_GUNIT_CSME_INTR_ENABLE"}, \
> +	{GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0, 0, "GEN11_GPM_WGBOXPERF_INTR_ENABLE"}, \
> +	{GEN8_DE_MISC_IER,         0,      0, "GEN8_DE_MISC_IER"}, \
> +	{GEN12_RING_FAULT_REG,     0,      0, "GEN12_RING_FAULT_REG"}
> +
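
The COMMON_GEN12BASE_GLOBAL() macro above expands to a comma-separated run of brace initializers, so it can be dropped into several per-platform register arrays. A minimal user-space sketch of that pattern follows. The struct layout {offset, flags, mask, name} is inferred from the initializers, and the register names and offsets are placeholders, not real i915 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout mirroring {reg, flags, mask, name}. */
struct reg_entry {
	uint32_t offset;
	uint32_t flags;
	uint32_t mask;
	const char *name;
};

/* Placeholder offsets for illustration only. */
#define REG_A 0x100
#define REG_B 0x104

/* The list macro expands to comma-separated initializers... */
#define COMMON_GLOBAL() \
	{REG_A, 0, 0, "REG_A"}, \
	{REG_B, 0, 0, "REG_B"}

/* ...so a platform table can reuse it and append its own entries. */
static const struct reg_entry platform_globals[] = {
	COMMON_GLOBAL(),
	{0x200, 0, 0, "PLATFORM_EXTRA"},
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Linear lookup by name, as a dump printer might do. */
static const struct reg_entry *find_reg(const char *name)
{
	for (size_t i = 0; i < ARRAY_LEN(platform_globals); i++)
		if (strcmp(platform_globals[i].name, name) == 0)
			return &platform_globals[i];
	return NULL;
}
```

A second platform table could expand the same common macro and append a different set of extra entries, which keeps the shared registers defined in one place.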


end of thread, other threads:[~2022-01-24 19:33 UTC | newest]

Thread overview: 14+ messages
2022-01-18 10:03 [PATCH 0/7] Add GuC Error Capture Support Alan Previn
2022-01-18 10:03 ` [Intel-gfx] " Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 1/7] drm/i915/guc: Update GuC ADS size for error capture lists Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 2/7] drm/i915/guc: Add XE_LP registers for GuC error state capture Alan Previn
2022-01-24 19:33   ` Teres Alexis, Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add DG2 " Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 4/7] drm/i915/guc: Add GuC's error state capture output structures Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 5/7] drm/i915/guc: Update GuC's log-buffer-state access for error capture Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 6/7] drm/i915/guc: Copy new GuC error capture logs upon G2H notification Alan Previn
2022-01-19  1:36   ` Teres Alexis, Alan Previn
2022-01-18 10:03 ` [Intel-gfx] [PATCH 7/7] drm/i915/guc: Print the GuC error capture output register list Alan Previn
2022-01-18 10:16 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Add GuC Error Capture Support (rev4) Patchwork
2022-01-18 10:17 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-18 10:49 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
