dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Fixes and improvements to GuC logging and error capture
@ 2022-07-28  2:20 John.C.Harrison
  2022-07-28  2:20 ` [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size John.C.Harrison
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

Fix bugs and improve the usability/effectiveness of GuC logging and
GuC related error captures.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>


Alan Previn (1):
  drm/i915/guc: Add a helper for log buffer size

Chris Wilson (1):
  drm/i915/guc: Use streaming loads to speed up dumping the guc log

John Harrison (5):
  drm/i915/guc: Fix capture size warning and bump the size
  drm/i915/guc: Add GuC <-> kernel time stamp translation information
  drm/i915/guc: Record CTB info in error logs
  drm/i915/guc: Make GuC log sizes runtime configurable
  drm/i915/guc: Reduce spam from error capture

 drivers/gpu/drm/i915/gt/intel_gt_regs.h       |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  72 +++--
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 113 ++++----
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |   1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 253 +++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +--
 drivers/gpu/drm/i915/i915_gpu_error.c         |  67 ++++-
 drivers/gpu/drm/i915/i915_gpu_error.h         |  21 +-
 drivers/gpu/drm/i915/i915_params.c            |  12 +
 drivers/gpu/drm/i915/i915_params.h            |   3 +
 11 files changed, 427 insertions(+), 161 deletions(-)

-- 
2.37.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-02 17:37   ` Teres Alexis, Alan Previn
  2022-07-28  2:20 ` [PATCH 2/7] drm/i915/guc: Fix capture size warning and bump the size John.C.Harrison
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: Matthew Brost, John Harrison, Alan Previn, DRI-Devel

From: Alan Previn <alan.previn.teres.alexis@intel.com>

Add a helper to get GuC log buffer size.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 49 ++++++++++++----------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 25b2d7ce6640d..492bbf419d4df 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -15,6 +15,32 @@
 
 static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
 
+static u32 intel_guc_log_size(struct intel_guc_log *log)
+{
+	/*
+	 *  GuC Log buffer Layout:
+	 *
+	 *  NB: Ordering must follow "enum guc_log_buffer_type".
+	 *
+	 *  +===============================+ 00B
+	 *  |      Debug state header       |
+	 *  +-------------------------------+ 32B
+	 *  |    Crash dump state header    |
+	 *  +-------------------------------+ 64B
+	 *  |     Capture state header      |
+	 *  +-------------------------------+ 96B
+	 *  |                               |
+	 *  +===============================+ PAGE_SIZE (4KB)
+	 *  |          Debug logs           |
+	 *  +===============================+ + DEBUG_SIZE
+	 *  |        Crash Dump logs        |
+	 *  +===============================+ + CRASH_SIZE
+	 *  |         Capture logs          |
+	 *  +===============================+ + CAPTURE_SIZE
+	 */
+	return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
+}
+
 /**
  * DOC: GuC firmware log
  *
@@ -461,32 +487,11 @@ int intel_guc_log_create(struct intel_guc_log *log)
 
 	GEM_BUG_ON(log->vma);
 
-	/*
-	 *  GuC Log buffer Layout
-	 * (this ordering must follow "enum guc_log_buffer_type" definition)
-	 *
-	 *  +===============================+ 00B
-	 *  |      Debug state header       |
-	 *  +-------------------------------+ 32B
-	 *  |    Crash dump state header    |
-	 *  +-------------------------------+ 64B
-	 *  |     Capture state header      |
-	 *  +-------------------------------+ 96B
-	 *  |                               |
-	 *  +===============================+ PAGE_SIZE (4KB)
-	 *  |          Debug logs           |
-	 *  +===============================+ + DEBUG_SIZE
-	 *  |        Crash Dump logs        |
-	 *  +===============================+ + CRASH_SIZE
-	 *  |         Capture logs          |
-	 *  +===============================+ + CAPTURE_SIZE
-	 */
 	if (intel_guc_capture_output_min_size_est(guc) > CAPTURE_BUFFER_SIZE)
 		DRM_WARN("GuC log buffer for state_capture maybe too small. %d < %d\n",
 			 CAPTURE_BUFFER_SIZE, intel_guc_capture_output_min_size_est(guc));
 
-	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
-		       CAPTURE_BUFFER_SIZE;
+	guc_log_size = intel_guc_log_size(log);
 
 	vma = intel_guc_allocate_vma(guc, guc_log_size);
 	if (IS_ERR(vma)) {
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/7] drm/i915/guc: Fix capture size warning and bump the size
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
  2022-07-28  2:20 ` [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-02 17:46   ` [Intel-gfx] " Teres Alexis, Alan Previn
  2022-07-28  2:20 ` [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information John.C.Harrison
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

There was a size check to warn if the GuC error state capture buffer
allocation would be too small to fit a reasonable amount of capture
data for the current platform. Unfortunately, the test was done too
early in the boot sequence and was actually testing 'if(-ENODEV >
size)'.

Move the check to be later. The check is only used to print a warning
message, so it doesn't really matter how early or late it is done.
Note that it is not possible to dynamically size the buffer because
the allocation needs to be done before the engine information is
available (at least, it would be in the intended two-phase GuC init
process).

Now that the check works, it is reporting size too small for newer
platforms. The check includes a 3x oversample multiplier to allow for
multiple error captures to be bufferd by GuC before i915 has a chance
to read them out. This is less important than simply being big enough
to fit the first capture.

So a) bump the default size to be large enough for one capture minimum
and b) make the warning only if one capture won't fit, instead use a
notice for the 3x size.

Note that the size estimate is a worst case scenario. Actual captures
will likely be smaller.

Lastly, use drm_warn istead of DRM_WARN as the former provides more
infmration and the latter is deprecated.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 40 ++++++++++++++-----
 .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |  1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    |  4 --
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  4 +-
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index 75257bd20ff01..b54b7883320b1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -600,10 +600,8 @@ intel_guc_capture_getnullheader(struct intel_guc *guc,
 	return 0;
 }
 
-#define GUC_CAPTURE_OVERBUFFER_MULTIPLIER 3
-
-int
-intel_guc_capture_output_min_size_est(struct intel_guc *guc)
+static int
+guc_capture_output_min_size_est(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct intel_engine_cs *engine;
@@ -623,13 +621,8 @@ intel_guc_capture_output_min_size_est(struct intel_guc *guc)
 	 * For each engine instance, there would be 1 x guc_state_capture_group_t output
 	 * followed by 3 x guc_state_capture_t lists. The latter is how the register
 	 * dumps are split across different register types (where the '3' are global vs class
-	 * vs instance). Finally, let's multiply the whole thing by 3x (just so we are
-	 * not limited to just 1 round of data in a worst case full register dump log)
-	 *
-	 * NOTE: intel_guc_log that allocates the log buffer would round this size up to
-	 * a power of two.
+	 * vs instance).
 	 */
-
 	for_each_engine(engine, gt, id) {
 		worst_min_size += sizeof(struct guc_state_capture_group_header_t) +
 					 (3 * sizeof(struct guc_state_capture_header_t));
@@ -649,7 +642,30 @@ intel_guc_capture_output_min_size_est(struct intel_guc *guc)
 
 	worst_min_size += (num_regs * sizeof(struct guc_mmio_reg));
 
-	return (worst_min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER);
+	return worst_min_size;
+}
+
+/*
+ * Add on a 3x multiplier to allow for multiple back-to-back captures occurring
+ * before the i915 can read the data out and process it
+ */
+#define GUC_CAPTURE_OVERBUFFER_MULTIPLIER 3
+
+static void check_guc_capture_size(struct intel_guc *guc)
+{
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	int min_size = guc_capture_output_min_size_est(guc);
+	int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
+
+	if (min_size < 0)
+		drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
+			 min_size);
+	else if (min_size > CAPTURE_BUFFER_SIZE)
+		drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
+			 CAPTURE_BUFFER_SIZE, min_size);
+	else if (spare_size > CAPTURE_BUFFER_SIZE)
+		drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
+			   CAPTURE_BUFFER_SIZE, spare_size, min_size);
 }
 
 /*
@@ -1580,5 +1596,7 @@ int intel_guc_capture_init(struct intel_guc *guc)
 	INIT_LIST_HEAD(&guc->capture->outlist);
 	INIT_LIST_HEAD(&guc->capture->cachelist);
 
+	check_guc_capture_size(guc);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
index d3d7bd0b6db64..fbd3713c7832d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
@@ -21,7 +21,6 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *m,
 void intel_guc_capture_get_matching_node(struct intel_gt *gt, struct intel_engine_coredump *ee,
 					 struct intel_context *ce);
 void intel_guc_capture_process(struct intel_guc *guc);
-int intel_guc_capture_output_min_size_est(struct intel_guc *guc);
 int intel_guc_capture_getlist(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
 			      void **outptr);
 int intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 492bbf419d4df..991d4a02248dc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -487,10 +487,6 @@ int intel_guc_log_create(struct intel_guc_log *log)
 
 	GEM_BUG_ON(log->vma);
 
-	if (intel_guc_capture_output_min_size_est(guc) > CAPTURE_BUFFER_SIZE)
-		DRM_WARN("GuC log buffer for state_capture maybe too small. %d < %d\n",
-			 CAPTURE_BUFFER_SIZE, intel_guc_capture_output_min_size_est(guc));
-
 	guc_log_size = intel_guc_log_size(log);
 
 	vma = intel_guc_allocate_vma(guc, guc_log_size);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
index 18007e639be99..dc9715411d626 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
@@ -22,11 +22,11 @@ struct intel_guc;
 #elif defined(CONFIG_DRM_I915_DEBUG_GEM)
 #define CRASH_BUFFER_SIZE	SZ_1M
 #define DEBUG_BUFFER_SIZE	SZ_2M
-#define CAPTURE_BUFFER_SIZE	SZ_1M
+#define CAPTURE_BUFFER_SIZE	SZ_4M
 #else
 #define CRASH_BUFFER_SIZE	SZ_8K
 #define DEBUG_BUFFER_SIZE	SZ_64K
-#define CAPTURE_BUFFER_SIZE	SZ_16K
+#define CAPTURE_BUFFER_SIZE	SZ_2M
 #endif
 
 /*
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
  2022-07-28  2:20 ` [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size John.C.Harrison
  2022-07-28  2:20 ` [PATCH 2/7] drm/i915/guc: Fix capture size warning and bump the size John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-05  0:40   ` [Intel-gfx] " Teres Alexis, Alan Previn
  2022-08-19 10:45   ` Jani Nikula
  2022-07-28  2:20 ` [PATCH 4/7] drm/i915/guc: Record CTB info in error logs John.C.Harrison
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

It is useful to be able to match GuC events to kernel events when
looking at the GuC log. That requires being able to convert GuC
timestamps to kernel time. So, when dumping error captures and/or GuC
logs, include a stamp in both time zones plus the clock frequency.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h    |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c     | 19 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h     |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c |  2 ++
 drivers/gpu/drm/i915/i915_gpu_error.c      | 12 ++++++++++++
 drivers/gpu/drm/i915/i915_gpu_error.h      |  3 +++
 6 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 60d6eb5f245b7..fc7979bd91db5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1007,6 +1007,8 @@
 #define   GEN11_LSN_UNSLCVC_GAFS_HALF_CL2_MAXALLOC	(1 << 9)
 #define   GEN11_LSN_UNSLCVC_GAFS_HALF_SF_MAXALLOC	(1 << 7)
 
+#define GUCPMTIMESTAMP				_MMIO(0xc3e8)
+
 #define __GEN9_RCS0_MOCS0			0xc800
 #define GEN9_GFX_MOCS(i)			_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
 #define __GEN9_VCS0_MOCS0			0xc900
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 2706a8c650900..ab4aacc516aa4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -389,6 +389,25 @@ void intel_guc_write_params(struct intel_guc *guc)
 	intel_uncore_forcewake_put(uncore, FORCEWAKE_GT);
 }
 
+void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	intel_wakeref_t wakeref;
+	u32 stamp = 0;
+	u64 ktime;
+
+	intel_device_info_print_runtime(RUNTIME_INFO(gt->i915), p);
+
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP);
+	ktime = ktime_get_boottime_ns();
+
+	drm_printf(p, "Kernel timestamp: 0x%08llX [%llu]\n", ktime, ktime);
+	drm_printf(p, "GuC timestamp: 0x%08X [%u]\n", stamp, stamp);
+	drm_printf(p, "CS timestamp frequency: %u Hz, %u ns\n",
+		   gt->clock_frequency, gt->clock_period_ns);
+}
+
 int intel_guc_init(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index a7acffbf15d1f..804133df1ac9b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -464,4 +464,6 @@ void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 void intel_guc_write_barrier(struct intel_guc *guc);
 
+void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p);
+
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 991d4a02248dc..07d31ae32f765 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -764,6 +764,8 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
 	if (!obj)
 		return 0;
 
+	intel_guc_dump_time_info(guc, p);
+
 	map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
 	if (IS_ERR(map)) {
 		DRM_DEBUG("Failed to pin object\n");
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 32e92651ef7c2..addba75252343 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -678,6 +678,7 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
 
 	intel_uc_fw_dump(&error_uc->guc_fw, &p);
 	intel_uc_fw_dump(&error_uc->huc_fw, &p);
+	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
 	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
 }
 
@@ -720,6 +721,8 @@ static void err_print_gt_global_nonguc(struct drm_i915_error_state_buf *m,
 	int i;
 
 	err_printf(m, "GT awake: %s\n", str_yes_no(gt->awake));
+	err_printf(m, "CS timestamp frequency: %u Hz, %d ns\n",
+		   gt->clock_frequency, gt->clock_period_ns);
 	err_printf(m, "EIR: 0x%08x\n", gt->eir);
 	err_printf(m, "PGTBL_ER: 0x%08x\n", gt->pgtbl_er);
 
@@ -1675,6 +1678,13 @@ gt_record_uc(struct intel_gt_coredump *gt,
 	 */
 	error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL);
 	error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL);
+
+	/*
+	 * Save the GuC log and include a timestamp reference for converting the
+	 * log times to system times (in conjunction with the error->boottime and
+	 * gt->clock_frequency fields saved elsewhere).
+	 */
+	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
 	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
 						"GuC log buffer", compress);
 
@@ -1833,6 +1843,8 @@ static void gt_record_global_regs(struct intel_gt_coredump *gt)
 static void gt_record_info(struct intel_gt_coredump *gt)
 {
 	memcpy(&gt->info, &gt->_gt->info, sizeof(struct intel_gt_info));
+	gt->clock_frequency = gt->_gt->clock_frequency;
+	gt->clock_period_ns = gt->_gt->clock_period_ns;
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 55a143b92d10e..d8a8b3d529e09 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -150,6 +150,8 @@ struct intel_gt_coredump {
 	u32 gtt_cache;
 	u32 aux_err; /* gen12 */
 	u32 gam_done; /* gen12 */
+	u32 clock_frequency;
+	u32 clock_period_ns;
 
 	/* Display related */
 	u32 derrmr;
@@ -164,6 +166,7 @@ struct intel_gt_coredump {
 		struct intel_uc_fw guc_fw;
 		struct intel_uc_fw huc_fw;
 		struct i915_vma_coredump *guc_log;
+		u32 timestamp;
 		bool is_guc_capture;
 	} *uc;
 
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/7] drm/i915/guc: Record CTB info in error logs
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
                   ` (2 preceding siblings ...)
  2022-07-28  2:20 ` [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-02 18:27   ` [Intel-gfx] " Teres Alexis, Alan Previn
  2022-07-28  2:20 ` [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log John.C.Harrison
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

When debugging GuC communication issues, it is useful to have the CTB
info available. So add the state and buffer contents to the error
capture log.

Also, add a sub-structure for the GuC specific error capture info as
it is now becoming numerous.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 59 +++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gpu_error.h | 20 +++++++--
 2 files changed, 67 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index addba75252343..543ba63f958ea 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -671,6 +671,18 @@ static void err_print_pciid(struct drm_i915_error_state_buf *m,
 		   pdev->subsystem_device);
 }
 
+static void err_print_guc_ctb(struct drm_i915_error_state_buf *m,
+			      const char *name,
+			      const struct intel_ctb_coredump *ctb)
+{
+	if (!ctb->size)
+		return;
+
+	err_printf(m, "GuC %s CTB: raw: 0x%08X, 0x%08X/%08X, cached: 0x%08X/%08X, desc = 0x%08X, buf = 0x%08X x 0x%08X\n",
+		   name, ctb->raw_status, ctb->raw_head, ctb->raw_tail,
+		   ctb->head, ctb->tail, ctb->desc_offset, ctb->cmds_offset, ctb->size);
+}
+
 static void err_print_uc(struct drm_i915_error_state_buf *m,
 			 const struct intel_uc_coredump *error_uc)
 {
@@ -678,8 +690,12 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
 
 	intel_uc_fw_dump(&error_uc->guc_fw, &p);
 	intel_uc_fw_dump(&error_uc->huc_fw, &p);
-	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
-	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
+	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->guc.timestamp);
+	intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_log);
+	err_printf(m, "GuC CTB fence: %d\n", error_uc->guc.last_fence);
+	err_print_guc_ctb(m, "Send", error_uc->guc.ctb + 0);
+	err_print_guc_ctb(m, "Recv", error_uc->guc.ctb + 1);
+	intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_ctb);
 }
 
 static void err_free_sgl(struct scatterlist *sgl)
@@ -854,7 +870,7 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 	if (error->gt) {
 		bool print_guc_capture = false;
 
-		if (error->gt->uc && error->gt->uc->is_guc_capture)
+		if (error->gt->uc && error->gt->uc->guc.is_guc_capture)
 			print_guc_capture = true;
 
 		err_print_gt_display(m, error->gt);
@@ -1009,7 +1025,8 @@ static void cleanup_uc(struct intel_uc_coredump *uc)
 {
 	kfree(uc->guc_fw.path);
 	kfree(uc->huc_fw.path);
-	i915_vma_coredump_free(uc->guc_log);
+	i915_vma_coredump_free(uc->guc.vma_log);
+	i915_vma_coredump_free(uc->guc.vma_ctb);
 
 	kfree(uc);
 }
@@ -1658,6 +1675,23 @@ gt_record_engines(struct intel_gt_coredump *gt,
 	}
 }
 
+static void gt_record_guc_ctb(struct intel_ctb_coredump *saved,
+			      const struct intel_guc_ct_buffer *ctb,
+			      const void *blob_ptr, struct intel_guc *guc)
+{
+	if (!ctb || !ctb->desc)
+		return;
+
+	saved->raw_status = ctb->desc->status;
+	saved->raw_head = ctb->desc->head;
+	saved->raw_tail = ctb->desc->tail;
+	saved->head = ctb->head;
+	saved->tail = ctb->tail;
+	saved->size = ctb->size;
+	saved->desc_offset = ((void *)ctb->desc) - blob_ptr;
+	saved->cmds_offset = ((void *)ctb->cmds) - blob_ptr;
+}
+
 static struct intel_uc_coredump *
 gt_record_uc(struct intel_gt_coredump *gt,
 	     struct i915_vma_compress *compress)
@@ -1684,9 +1718,16 @@ gt_record_uc(struct intel_gt_coredump *gt,
 	 * log times to system times (in conjunction with the error->boottime and
 	 * gt->clock_frequency fields saved elsewhere).
 	 */
-	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
-	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
-						"GuC log buffer", compress);
+	error_uc->guc.timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
+	error_uc->guc.vma_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
+						    "GuC log buffer", compress);
+	error_uc->guc.vma_ctb = create_vma_coredump(gt->_gt, uc->guc.ct.vma,
+						    "GuC CT buffer", compress);
+	error_uc->guc.last_fence = uc->guc.ct.requests.last_fence;
+	gt_record_guc_ctb(error_uc->guc.ctb + 0, &uc->guc.ct.ctbs.send,
+			  uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
+	gt_record_guc_ctb(error_uc->guc.ctb + 1, &uc->guc.ct.ctbs.recv,
+			  uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
 
 	return error_uc;
 }
@@ -2039,9 +2080,9 @@ __i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask, u32 du
 			error->gt->uc = gt_record_uc(error->gt, compress);
 			if (error->gt->uc) {
 				if (dump_flags & CORE_DUMP_FLAG_IS_GUC_CAPTURE)
-					error->gt->uc->is_guc_capture = true;
+					error->gt->uc->guc.is_guc_capture = true;
 				else
-					GEM_BUG_ON(error->gt->uc->is_guc_capture);
+					GEM_BUG_ON(error->gt->uc->guc.is_guc_capture);
 			}
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index d8a8b3d529e09..efc75cc2ffdb9 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -125,6 +125,15 @@ struct intel_engine_coredump {
 	struct intel_engine_coredump *next;
 };
 
+struct intel_ctb_coredump {
+	u32 raw_head, head;
+	u32 raw_tail, tail;
+	u32 raw_status;
+	u32 desc_offset;
+	u32 cmds_offset;
+	u32 size;
+};
+
 struct intel_gt_coredump {
 	const struct intel_gt *_gt;
 	bool awake;
@@ -165,9 +174,14 @@ struct intel_gt_coredump {
 	struct intel_uc_coredump {
 		struct intel_uc_fw guc_fw;
 		struct intel_uc_fw huc_fw;
-		struct i915_vma_coredump *guc_log;
-		u32 timestamp;
-		bool is_guc_capture;
+		struct guc_info {
+			struct intel_ctb_coredump ctb[2];
+			struct i915_vma_coredump *vma_ctb;
+			struct i915_vma_coredump *vma_log;
+			u32 timestamp;
+			u16 last_fence;
+			bool is_guc_capture;
+		} guc;
 	} *uc;
 
 	struct intel_gt_coredump *next;
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
                   ` (3 preceding siblings ...)
  2022-07-28  2:20 ` [PATCH 4/7] drm/i915/guc: Record CTB info in error logs John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-02 18:48   ` [Intel-gfx] " Teres Alexis, Alan Previn
  2022-07-28  2:20 ` [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable John.C.Harrison
  2022-07-28  2:20 ` [PATCH 7/7] drm/i915/guc: Reduce spam from error capture John.C.Harrison
  6 siblings, 1 reply; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, Chris Wilson, DRI-Devel

From: Chris Wilson <chris.p.wilson@intel.com>

Use a temporary page and mempy_from_wc to reduce the time it takes to
dump the guc log to debugfs.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 24 ++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 07d31ae32f765..4722d4b18ed19 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -750,8 +750,9 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
 	struct intel_guc *guc = log_to_guc(log);
 	struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
 	struct drm_i915_gem_object *obj = NULL;
-	u32 *map;
-	int i = 0;
+	void *map;
+	u32 *page;
+	int i, j;
 
 	if (!intel_guc_is_supported(guc))
 		return -ENODEV;
@@ -764,23 +765,34 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
 	if (!obj)
 		return 0;
 
+	page = (u32 *)__get_free_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
 	intel_guc_dump_time_info(guc, p);
 
 	map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
 	if (IS_ERR(map)) {
 		DRM_DEBUG("Failed to pin object\n");
 		drm_puts(p, "(log data unaccessible)\n");
+		free_page((unsigned long)page);
 		return PTR_ERR(map);
 	}
 
-	for (i = 0; i < obj->base.size / sizeof(u32); i += 4)
-		drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
-			   *(map + i), *(map + i + 1),
-			   *(map + i + 2), *(map + i + 3));
+	for (i = 0; i < obj->base.size; i += PAGE_SIZE) {
+		if (!i915_memcpy_from_wc(page, map + i, PAGE_SIZE))
+			memcpy(page, map + i, PAGE_SIZE);
+
+		for (j = 0; j < PAGE_SIZE / sizeof(u32); j += 4)
+			drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
+				   *(page + j + 0), *(page + j + 1),
+				   *(page + j + 2), *(page + j + 3));
+	}
 
 	drm_puts(p, "\n");
 
 	i915_gem_object_unpin_map(obj);
+	free_page((unsigned long)page);
 
 	return 0;
 }
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
                   ` (4 preceding siblings ...)
  2022-07-28  2:20 ` [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-15  5:43   ` [Intel-gfx] " Teres Alexis, Alan Previn
  2022-08-24  9:01   ` Joonas Lahtinen
  2022-07-28  2:20 ` [PATCH 7/7] drm/i915/guc: Reduce spam from error capture John.C.Harrison
  6 siblings, 2 replies; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

The GuC log buffer sizes had to be configured statically at compile
time. This can be quite troublesome when needing to get larger logs
out of a released driver. So re-organise the code to allow a boot time
module parameter override.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  53 ++----
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  14 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 176 +++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +++--
 drivers/gpu/drm/i915/i915_params.c            |  12 ++
 drivers/gpu/drm/i915/i915_params.h            |   3 +
 6 files changed, 226 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index ab4aacc516aa4..01f2705cb94a3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -224,53 +224,22 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc)
 
 static u32 guc_ctl_log_params_flags(struct intel_guc *guc)
 {
-	u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT;
-	u32 flags;
-
-	#if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0)
-	#define LOG_UNIT SZ_1M
-	#define LOG_FLAG GUC_LOG_LOG_ALLOC_UNITS
-	#else
-	#define LOG_UNIT SZ_4K
-	#define LOG_FLAG 0
-	#endif
-
-	#if (((CAPTURE_BUFFER_SIZE) % SZ_1M) == 0)
-	#define CAPTURE_UNIT SZ_1M
-	#define CAPTURE_FLAG GUC_LOG_CAPTURE_ALLOC_UNITS
-	#else
-	#define CAPTURE_UNIT SZ_4K
-	#define CAPTURE_FLAG 0
-	#endif
-
-	BUILD_BUG_ON(!CRASH_BUFFER_SIZE);
-	BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, LOG_UNIT));
-	BUILD_BUG_ON(!DEBUG_BUFFER_SIZE);
-	BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, LOG_UNIT));
-	BUILD_BUG_ON(!CAPTURE_BUFFER_SIZE);
-	BUILD_BUG_ON(!IS_ALIGNED(CAPTURE_BUFFER_SIZE, CAPTURE_UNIT));
-
-	BUILD_BUG_ON((CRASH_BUFFER_SIZE / LOG_UNIT - 1) >
-			(GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT));
-	BUILD_BUG_ON((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) >
-			(GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT));
-	BUILD_BUG_ON((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) >
-			(GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT));
+	struct intel_guc_log *log = &guc->log;
+	u32 offset, flags;
+
+	GEM_BUG_ON(!log->sizes_initialised);
+
+	offset = intel_guc_ggtt_offset(guc, log->vma) >> PAGE_SHIFT;
 
 	flags = GUC_LOG_VALID |
 		GUC_LOG_NOTIFY_ON_HALF_FULL |
-		CAPTURE_FLAG |
-		LOG_FLAG |
-		((CRASH_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_CRASH_SHIFT) |
-		((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_DEBUG_SHIFT) |
-		((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) << GUC_LOG_CAPTURE_SHIFT) |
+		log->sizes[GUC_LOG_SECTIONS_DEBUG].flag |
+		log->sizes[GUC_LOG_SECTIONS_CAPTURE].flag |
+		(log->sizes[GUC_LOG_SECTIONS_CRASH].count << GUC_LOG_CRASH_SHIFT) |
+		(log->sizes[GUC_LOG_SECTIONS_DEBUG].count << GUC_LOG_DEBUG_SHIFT) |
+		(log->sizes[GUC_LOG_SECTIONS_CAPTURE].count << GUC_LOG_CAPTURE_SHIFT) |
 		(offset << GUC_LOG_BUF_ADDR_SHIFT);
 
-	#undef LOG_UNIT
-	#undef LOG_FLAG
-	#undef CAPTURE_UNIT
-	#undef CAPTURE_FLAG
-
 	return flags;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index b54b7883320b1..d2ac53d4f3b6e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -656,16 +656,17 @@ static void check_guc_capture_size(struct intel_guc *guc)
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
 	int min_size = guc_capture_output_min_size_est(guc);
 	int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
+	u32 buffer_size = intel_guc_log_section_size_capture(&guc->log);
 
 	if (min_size < 0)
 		drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
 			 min_size);
-	else if (min_size > CAPTURE_BUFFER_SIZE)
+	else if (min_size > buffer_size)
 		drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
-			 CAPTURE_BUFFER_SIZE, min_size);
-	else if (spare_size > CAPTURE_BUFFER_SIZE)
+			 buffer_size, min_size);
+	else if (spare_size > buffer_size)
 		drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
-			   CAPTURE_BUFFER_SIZE, spare_size, min_size);
+			   buffer_size, spare_size, min_size);
 }
 
 /*
@@ -1294,7 +1295,8 @@ static void __guc_capture_process_output(struct intel_guc *guc)
 
 	log_buf_state = guc->log.buf_addr +
 			(sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
-	src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
+	src_data = guc->log.buf_addr +
+		   intel_guc_get_log_buffer_offset(&guc->log, GUC_CAPTURE_LOG_BUFFER);
 
 	/*
 	 * Make a copy of the state structure, inside GuC log buffer
@@ -1302,7 +1304,7 @@ static void __guc_capture_process_output(struct intel_guc *guc)
 	 * from it multiple times.
 	 */
 	memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
-	buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
+	buffer_size = intel_guc_get_log_buffer_size(&guc->log, GUC_CAPTURE_LOG_BUFFER);
 	read_offset = log_buf_state_local.read_ptr;
 	write_offset = log_buf_state_local.sampled_write_ptr;
 	full_count = log_buf_state_local.buffer_full_cnt;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index 4722d4b18ed19..890b6853bd609 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -13,8 +13,158 @@
 #include "intel_guc_capture.h"
 #include "intel_guc_log.h"
 
+#if defined(CONFIG_DRM_I915_DEBUG_GUC)
+#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE	SZ_2M
+#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE	SZ_16M
+#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE	SZ_4M
+#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
+#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE	SZ_1M
+#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE	SZ_2M
+#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE	SZ_4M
+#else
+#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE	SZ_8K
+#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE	SZ_64K
+#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE	SZ_2M
+#endif
+
 static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
 
+struct guc_log_section {
+	u32 max;
+	u32 flag;
+	u32 default_val;
+	const char *name;
+};
+
+static s32 scale_log_param(struct intel_guc_log *log, const struct guc_log_section *section,
+			   s32 param)
+{
+	/* -1 means default */
+	if (param < 0)
+		return section->default_val;
+
+	/* Check for 32-bit overflow */
+	if (param >= SZ_4K) {
+		drm_err(&guc_to_gt(log_to_guc(log))->i915->drm, "Size too large for GuC %s log: %dMB!",
+			section->name, param);
+		return section->default_val;
+	}
+
+	/* Param units are 1MB */
+	return param * SZ_1M;
+}
+
+static void _guc_log_init_sizes(struct intel_guc_log *log)
+{
+	struct intel_guc *guc = log_to_guc(log);
+	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
+	static const struct guc_log_section sections[GUC_LOG_SECTIONS_LIMIT] = {
+		{
+			GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT,
+			GUC_LOG_LOG_ALLOC_UNITS,
+			GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE,
+			"crash dump"
+		},
+		{
+			GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT,
+			GUC_LOG_LOG_ALLOC_UNITS,
+			GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE,
+			"debug",
+		},
+		{
+			GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT,
+			GUC_LOG_CAPTURE_ALLOC_UNITS,
+			GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE,
+			"capture",
+		}
+	};
+	s32 params[GUC_LOG_SECTIONS_LIMIT] = {
+		i915->params.guc_log_size_crash,
+		i915->params.guc_log_size_debug,
+		i915->params.guc_log_size_capture,
+	};
+	int i;
+
+	for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++)
+		log->sizes[i].bytes = scale_log_param(log, sections + i, params[i]);
+
+	/* If debug size > 1MB then bump default crash size to keep the same units */
+	if (log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes >= SZ_1M &&
+	    (i915->params.guc_log_size_crash == -1) &&
+	    GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE < SZ_1M)
+		log->sizes[GUC_LOG_SECTIONS_CRASH].bytes = SZ_1M;
+
+	/* Prepare the GuC API structure fields: */
+	for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++) {
+		/* Convert to correct units */
+		if ((log->sizes[i].bytes % SZ_1M) == 0) {
+			log->sizes[i].units = SZ_1M;
+			log->sizes[i].flag = sections[i].flag;
+		} else {
+			log->sizes[i].units = SZ_4K;
+			log->sizes[i].flag = 0;
+		}
+
+		if (!IS_ALIGNED(log->sizes[i].bytes, log->sizes[i].units))
+			drm_err(&i915->drm, "Mis-aligned GuC log %s size: 0x%X vs 0x%X!",
+				sections[i].name, log->sizes[i].bytes, log->sizes[i].units);
+		log->sizes[i].count = log->sizes[i].bytes / log->sizes[i].units;
+
+		if (!log->sizes[i].count) {
+			drm_err(&i915->drm, "Zero GuC log %s size!", sections[i].name);
+		} else {
+			/* Size is +1 unit */
+			log->sizes[i].count--;
+		}
+
+		/* Clip to field size */
+		if (log->sizes[i].count > sections[i].max) {
+			drm_err(&i915->drm, "GuC log %s size too large: %d vs %d!",
+				sections[i].name, log->sizes[i].count + 1, sections[i].max + 1);
+			log->sizes[i].count = sections[i].max;
+		}
+	}
+
+	if (log->sizes[GUC_LOG_SECTIONS_CRASH].units != log->sizes[GUC_LOG_SECTIONS_DEBUG].units) {
+		drm_err(&i915->drm, "Unit mis-match for GuC log crash and debug sections: %d vs %d!",
+			log->sizes[GUC_LOG_SECTIONS_CRASH].units,
+			log->sizes[GUC_LOG_SECTIONS_DEBUG].units);
+		log->sizes[GUC_LOG_SECTIONS_CRASH].units = log->sizes[GUC_LOG_SECTIONS_DEBUG].units;
+		log->sizes[GUC_LOG_SECTIONS_CRASH].count = 0;
+	}
+
+	log->sizes_initialised = true;
+}
+
+static void guc_log_init_sizes(struct intel_guc_log *log)
+{
+	if (log->sizes_initialised)
+		return;
+
+	_guc_log_init_sizes(log);
+}
+
+static u32 intel_guc_log_section_size_crash(struct intel_guc_log *log)
+{
+	guc_log_init_sizes(log);
+
+	return log->sizes[GUC_LOG_SECTIONS_CRASH].bytes;
+}
+
+static u32 intel_guc_log_section_size_debug(struct intel_guc_log *log)
+{
+	guc_log_init_sizes(log);
+
+	return log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes;
+}
+
+u32 intel_guc_log_section_size_capture(struct intel_guc_log *log)
+{
+	guc_log_init_sizes(log);
+
+	return log->sizes[GUC_LOG_SECTIONS_CAPTURE].bytes;
+}
+
 static u32 intel_guc_log_size(struct intel_guc_log *log)
 {
 	/*
@@ -38,7 +188,10 @@ static u32 intel_guc_log_size(struct intel_guc_log *log)
 	 *  |         Capture logs          |
 	 *  +===============================+ + CAPTURE_SIZE
 	 */
-	return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
+	return PAGE_SIZE +
+		intel_guc_log_section_size_crash(log) +
+		intel_guc_log_section_size_debug(log) +
+		intel_guc_log_section_size_capture(log);
 }
 
 /**
@@ -165,7 +318,8 @@ static void guc_move_to_next_buf(struct intel_guc_log *log)
 	smp_wmb();
 
 	/* All data has been written, so now move the offset of sub buffer. */
-	relay_reserve(log->relay.channel, log->vma->obj->base.size - CAPTURE_BUFFER_SIZE);
+	relay_reserve(log->relay.channel, log->vma->obj->base.size -
+					  intel_guc_log_section_size_capture(log));
 
 	/* Switch to the next sub buffer */
 	relay_flush(log->relay.channel);
@@ -210,15 +364,16 @@ bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log,
 	return overflow;
 }
 
-unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
+unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
+					   enum guc_log_buffer_type type)
 {
 	switch (type) {
 	case GUC_DEBUG_LOG_BUFFER:
-		return DEBUG_BUFFER_SIZE;
+		return intel_guc_log_section_size_debug(log);
 	case GUC_CRASH_DUMP_LOG_BUFFER:
-		return CRASH_BUFFER_SIZE;
+		return intel_guc_log_section_size_crash(log);
 	case GUC_CAPTURE_LOG_BUFFER:
-		return CAPTURE_BUFFER_SIZE;
+		return intel_guc_log_section_size_capture(log);
 	default:
 		MISSING_CASE(type);
 	}
@@ -226,7 +381,8 @@ unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
 	return 0;
 }
 
-size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
+size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log,
+				       enum guc_log_buffer_type type)
 {
 	enum guc_log_buffer_type i;
 	size_t offset = PAGE_SIZE;/* for the log_buffer_states */
@@ -234,7 +390,7 @@ size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
 	for (i = GUC_DEBUG_LOG_BUFFER; i < GUC_MAX_LOG_BUFFER; ++i) {
 		if (i == type)
 			break;
-		offset += intel_guc_get_log_buffer_size(i);
+		offset += intel_guc_get_log_buffer_size(log, i);
 	}
 
 	return offset;
@@ -285,7 +441,7 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
 		 */
 		memcpy(&log_buf_state_local, log_buf_state,
 		       sizeof(struct guc_log_buffer_state));
-		buffer_size = intel_guc_get_log_buffer_size(type);
+		buffer_size = intel_guc_get_log_buffer_size(log, type);
 		read_offset = log_buf_state_local.read_ptr;
 		write_offset = log_buf_state_local.sampled_write_ptr;
 		full_cnt = log_buf_state_local.buffer_full_cnt;
@@ -400,7 +556,7 @@ static int guc_log_relay_create(struct intel_guc_log *log)
 	  * Keep the size of sub buffers same as shared log buffer
 	  * but GuC log-events excludes the error-state-capture logs
 	  */
-	subbuf_size = log->vma->size - CAPTURE_BUFFER_SIZE;
+	subbuf_size = log->vma->size - intel_guc_log_section_size_capture(log);
 
 	/*
 	 * Store up to 8 snapshots, which is large enough to buffer sufficient
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
index dc9715411d626..02127703be809 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
@@ -15,20 +15,6 @@
 
 struct intel_guc;
 
-#if defined(CONFIG_DRM_I915_DEBUG_GUC)
-#define CRASH_BUFFER_SIZE	SZ_2M
-#define DEBUG_BUFFER_SIZE	SZ_16M
-#define CAPTURE_BUFFER_SIZE	SZ_4M
-#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
-#define CRASH_BUFFER_SIZE	SZ_1M
-#define DEBUG_BUFFER_SIZE	SZ_2M
-#define CAPTURE_BUFFER_SIZE	SZ_4M
-#else
-#define CRASH_BUFFER_SIZE	SZ_8K
-#define DEBUG_BUFFER_SIZE	SZ_64K
-#define CAPTURE_BUFFER_SIZE	SZ_2M
-#endif
-
 /*
  * While we're using plain log level in i915, GuC controls are much more...
  * "elaborate"? We have a couple of bits for verbosity, separate bit for actual
@@ -46,10 +32,30 @@ struct intel_guc;
 #define GUC_VERBOSITY_TO_LOG_LEVEL(x)	((x) + 2)
 #define GUC_LOG_LEVEL_MAX GUC_VERBOSITY_TO_LOG_LEVEL(GUC_LOG_VERBOSITY_MAX)
 
+enum {
+	GUC_LOG_SECTIONS_CRASH,
+	GUC_LOG_SECTIONS_DEBUG,
+	GUC_LOG_SECTIONS_CAPTURE,
+	GUC_LOG_SECTIONS_LIMIT
+};
+
 struct intel_guc_log {
 	u32 level;
+
+	/* Allocation settings */
+	struct {
+		s32 bytes;	/* Size in bytes */
+		s32 units;	/* GuC API units - 1MB or 4KB */
+		s32 count;	/* Number of API units */
+		u32 flag;	/* GuC API units flag */
+	} sizes[GUC_LOG_SECTIONS_LIMIT];
+	bool sizes_initialised;
+
+	/* Combined buffer allocation */
 	struct i915_vma *vma;
 	void *buf_addr;
+
+	/* RelayFS support */
 	struct {
 		bool buf_in_use;
 		bool started;
@@ -58,6 +64,7 @@ struct intel_guc_log {
 		struct mutex lock;
 		u32 full_count;
 	} relay;
+
 	/* logging related stats */
 	struct {
 		u32 sampled_overflow;
@@ -69,8 +76,9 @@ struct intel_guc_log {
 void intel_guc_log_init_early(struct intel_guc_log *log);
 bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log, enum guc_log_buffer_type type,
 				      unsigned int full_cnt);
-unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type);
-size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type);
+unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
+					   enum guc_log_buffer_type type);
+size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log, enum guc_log_buffer_type type);
 int intel_guc_log_create(struct intel_guc_log *log);
 void intel_guc_log_destroy(struct intel_guc_log *log);
 
@@ -92,4 +100,6 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p);
 int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
 		       bool dump_load_err);
 
+u32 intel_guc_log_section_size_capture(struct intel_guc_log *log);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 6fc475a5db615..06ca5b8221118 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -171,6 +171,18 @@ i915_param_named(guc_log_level, int, 0400,
 	"GuC firmware logging level. Requires GuC to be loaded. "
 	"(-1=auto [default], 0=disable, 1..4=enable with verbosity min..max)");
 
+i915_param_named(guc_log_size_crash, int, 0400,
+	"GuC firmware logging buffer size for crash dumps (in MB)"
+	"(-1=auto [default], NB: max = 4, other restrictions apply)");
+
+i915_param_named(guc_log_size_debug, int, 0400,
+	"GuC firmware logging buffer size for debug logs (in MB)"
+	"(-1=auto [default], NB: max = 16, other restrictions apply)");
+
+i915_param_named(guc_log_size_capture, int, 0400,
+	"GuC error capture register dump buffer size (in MB)"
+	"(-1=auto [default], NB: max = 4, other restrictions apply)");
+
 i915_param_named_unsafe(guc_firmware_path, charp, 0400,
 	"GuC firmware path to use instead of the default one");
 
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 2733cb6cfe094..f684d1ab87078 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -61,6 +61,9 @@ struct drm_printer;
 	param(int, invert_brightness, 0, 0600) \
 	param(int, enable_guc, -1, 0400) \
 	param(int, guc_log_level, -1, 0400) \
+	param(int, guc_log_size_crash, -1, 0400) \
+	param(int, guc_log_size_debug, -1, 0400) \
+	param(int, guc_log_size_capture, -1, 0400) \
 	param(char *, guc_firmware_path, NULL, 0400) \
 	param(char *, huc_firmware_path, NULL, 0400) \
 	param(char *, dmc_firmware_path, NULL, 0400) \
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/7] drm/i915/guc: Reduce spam from error capture
  2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
                   ` (5 preceding siblings ...)
  2022-07-28  2:20 ` [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable John.C.Harrison
@ 2022-07-28  2:20 ` John.C.Harrison
  2022-08-02 18:54   ` [Intel-gfx] " Teres Alexis, Alan Previn
  6 siblings, 1 reply; 29+ messages in thread
From: John.C.Harrison @ 2022-07-28  2:20 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

Some debug code got left in when the GuC based register save for error
capture was added. Remove that.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 67 ++++++++-----------
 1 file changed, 28 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
index d2ac53d4f3b6e..8f11651460131 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
@@ -1383,33 +1383,22 @@ guc_capture_reg_to_str(const struct intel_guc *guc, u32 owner, u32 type,
 	return NULL;
 }
 
-#ifdef CONFIG_DRM_I915_DEBUG_GUC
-#define __out(a, ...) \
-	do { \
-		drm_warn((&(a)->i915->drm), __VA_ARGS__); \
-		i915_error_printf((a), __VA_ARGS__); \
-	} while (0)
-#else
-#define __out(a, ...) \
-	i915_error_printf(a, __VA_ARGS__)
-#endif
-
 #define GCAP_PRINT_INTEL_ENG_INFO(ebuf, eng) \
 	do { \
-		__out(ebuf, "    i915-Eng-Name: %s command stream\n", \
-		      (eng)->name); \
-		__out(ebuf, "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
-		__out(ebuf, "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
-		__out(ebuf, "    i915-Eng-LogicalMask: 0x%08x\n", \
-		      (eng)->logical_mask); \
+		i915_error_printf(ebuf, "    i915-Eng-Name: %s command stream\n", \
+				  (eng)->name); \
+		i915_error_printf(ebuf, "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
+		i915_error_printf(ebuf, "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
+		i915_error_printf(ebuf, "    i915-Eng-LogicalMask: 0x%08x\n", \
+				  (eng)->logical_mask); \
 	} while (0)
 
 #define GCAP_PRINT_GUC_INST_INFO(ebuf, node) \
 	do { \
-		__out(ebuf, "    GuC-Engine-Inst-Id: 0x%08x\n", \
-		      (node)->eng_inst); \
-		__out(ebuf, "    GuC-Context-Id: 0x%08x\n", (node)->guc_id); \
-		__out(ebuf, "    LRCA: 0x%08x\n", (node)->lrca); \
+		i915_error_printf(ebuf, "    GuC-Engine-Inst-Id: 0x%08x\n", \
+				  (node)->eng_inst); \
+		i915_error_printf(ebuf, "    GuC-Context-Id: 0x%08x\n", (node)->guc_id); \
+		i915_error_printf(ebuf, "    LRCA: 0x%08x\n", (node)->lrca); \
 	} while (0)
 
 int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
@@ -1441,57 +1430,57 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
 
 	guc = &ee->engine->gt->uc.guc;
 
-	__out(ebuf, "global --- GuC Error Capture on %s command stream:\n",
-	      ee->engine->name);
+	i915_error_printf(ebuf, "global --- GuC Error Capture on %s command stream:\n",
+			  ee->engine->name);
 
 	node = ee->guc_capture_node;
 	if (!node) {
-		__out(ebuf, "  No matching ee-node\n");
+		i915_error_printf(ebuf, "  No matching ee-node\n");
 		return 0;
 	}
 
-	__out(ebuf, "Coverage:  %s\n", grptype[node->is_partial]);
+	i915_error_printf(ebuf, "Coverage:  %s\n", grptype[node->is_partial]);
 
 	for (i = GUC_CAPTURE_LIST_TYPE_GLOBAL; i < GUC_CAPTURE_LIST_TYPE_MAX; ++i) {
-		__out(ebuf, "  RegListType: %s\n",
-		      datatype[i % GUC_CAPTURE_LIST_TYPE_MAX]);
-		__out(ebuf, "    Owner-Id: %d\n", node->reginfo[i].vfid);
+		i915_error_printf(ebuf, "  RegListType: %s\n",
+				  datatype[i % GUC_CAPTURE_LIST_TYPE_MAX]);
+		i915_error_printf(ebuf, "    Owner-Id: %d\n", node->reginfo[i].vfid);
 
 		switch (i) {
 		case GUC_CAPTURE_LIST_TYPE_GLOBAL:
 		default:
 			break;
 		case GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS:
-			__out(ebuf, "    GuC-Eng-Class: %d\n", node->eng_class);
-			__out(ebuf, "    i915-Eng-Class: %d\n",
-			      guc_class_to_engine_class(node->eng_class));
+			i915_error_printf(ebuf, "    GuC-Eng-Class: %d\n", node->eng_class);
+			i915_error_printf(ebuf, "    i915-Eng-Class: %d\n",
+					  guc_class_to_engine_class(node->eng_class));
 			break;
 		case GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE:
 			eng = intel_guc_lookup_engine(guc, node->eng_class, node->eng_inst);
 			if (eng)
 				GCAP_PRINT_INTEL_ENG_INFO(ebuf, eng);
 			else
-				__out(ebuf, "    i915-Eng-Lookup Fail!\n");
+				i915_error_printf(ebuf, "    i915-Eng-Lookup Fail!\n");
 			GCAP_PRINT_GUC_INST_INFO(ebuf, node);
 			break;
 		}
 
 		numregs = node->reginfo[i].num_regs;
-		__out(ebuf, "    NumRegs: %d\n", numregs);
+		i915_error_printf(ebuf, "    NumRegs: %d\n", numregs);
 		j = 0;
 		while (numregs--) {
 			regs = node->reginfo[i].regs;
 			str = guc_capture_reg_to_str(guc, GUC_CAPTURE_LIST_INDEX_PF, i,
 						     node->eng_class, 0, regs[j].offset, &is_ext);
 			if (!str)
-				__out(ebuf, "      REG-0x%08x", regs[j].offset);
+				i915_error_printf(ebuf, "      REG-0x%08x", regs[j].offset);
 			else
-				__out(ebuf, "      %s", str);
+				i915_error_printf(ebuf, "      %s", str);
 			if (is_ext)
-				__out(ebuf, "[%ld][%ld]",
-				      FIELD_GET(GUC_REGSET_STEERING_GROUP, regs[j].flags),
-				      FIELD_GET(GUC_REGSET_STEERING_INSTANCE, regs[j].flags));
-			__out(ebuf, ":  0x%08x\n", regs[j].value);
+				i915_error_printf(ebuf, "[%ld][%ld]",
+					FIELD_GET(GUC_REGSET_STEERING_GROUP, regs[j].flags),
+					FIELD_GET(GUC_REGSET_STEERING_INSTANCE, regs[j].flags));
+			i915_error_printf(ebuf, ":  0x%08x\n", regs[j].value);
 			++j;
 		}
 	}
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size
  2022-07-28  2:20 ` [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size John.C.Harrison
@ 2022-08-02 17:37   ` Teres Alexis, Alan Previn
  2022-08-03  0:29     ` John Harrison
  0 siblings, 1 reply; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-02 17:37 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: Brost, Matthew, DRI-Devel

Something minor in comments, so conditional R-B (please fix on the way in or reply to correct me):

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>

On Wed, 2022-07-27 at 19:20 -0700, Harrison, John C wrote:
> From: Alan Previn <alan.previn.teres.alexis@intel.com>
> 
> Add a helper to get GuC log buffer size.
> 
> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 49 ++++++++++++----------
>  1 file changed, 27 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 25b2d7ce6640d..492bbf419d4df 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -15,6 +15,32 @@
>  
>  static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
>  
> +static u32 intel_guc_log_size(struct intel_guc_log *log)
> +{
> +	/*
> +	 *  GuC Log buffer Layout:
> +	 *
> +	 *  NB: Ordering must follow "enum guc_log_buffer_type".
> +	 *
> +	 *  +===============================+ 00B
> +	 *  |      Debug state header       |
> +	 *  +-------------------------------+ 32B
> 
Something we might have missed in prior updates but i think the bufer state is now 36 bytes long no? (9 dwords).


> +	 *  |    Crash dump state header    |
> +	 *  +-------------------------------+ 64B
> +	 *  |     Capture state header      |
> +	 *  +-------------------------------+ 96B
> +	 *  |                               |
> +	 *  +===============================+ PAGE_SIZE (4KB)
> +	 *  |          Debug logs           |
> +	 *  +===============================+ + DEBUG_SIZE
> +	 *  |        Crash Dump logs        |
> +	 *  +===============================+ + CRASH_SIZE
> +	 *  |         Capture logs          |
> +	 *  +===============================+ + CAPTURE_SIZE
> +	 */
> +	return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
> +}
> +
>  /**
>   * DOC: GuC firmware log
>   *
> @@ -461,32 +487,11 @@ int intel_guc_log_create(struct intel_guc_log *log)
>  
>  	GEM_BUG_ON(log->vma);
>  
> -	/*
> -	 *  GuC Log buffer Layout
> -	 * (this ordering must follow "enum guc_log_buffer_type" definition)
> -	 *
> -	 *  +===============================+ 00B
> -	 *  |      Debug state header       |
> -	 *  +-------------------------------+ 32B
> -	 *  |    Crash dump state header    |
> -	 *  +-------------------------------+ 64B
> -	 *  |     Capture state header      |
> -	 *  +-------------------------------+ 96B
> -	 *  |                               |
> -	 *  +===============================+ PAGE_SIZE (4KB)
> -	 *  |          Debug logs           |
> -	 *  +===============================+ + DEBUG_SIZE
> -	 *  |        Crash Dump logs        |
> -	 *  +===============================+ + CRASH_SIZE
> -	 *  |         Capture logs          |
> -	 *  +===============================+ + CAPTURE_SIZE
> -	 */
>  	if (intel_guc_capture_output_min_size_est(guc) > CAPTURE_BUFFER_SIZE)
>  		DRM_WARN("GuC log buffer for state_capture maybe too small. %d < %d\n",
>  			 CAPTURE_BUFFER_SIZE, intel_guc_capture_output_min_size_est(guc));
>  
> -	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
> -		       CAPTURE_BUFFER_SIZE;
> +	guc_log_size = intel_guc_log_size(log);
>  
>  	vma = intel_guc_allocate_vma(guc, guc_log_size);
>  	if (IS_ERR(vma)) {
> -- 
> 2.37.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 2/7] drm/i915/guc: Fix capture size warning and bump the size
  2022-07-28  2:20 ` [PATCH 2/7] drm/i915/guc: Fix capture size warning and bump the size John.C.Harrison
@ 2022-08-02 17:46   ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-02 17:46 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: DRI-Devel


Straight forward change - LGTM.

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>


On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There was a size check to warn if the GuC error state capture buffer
> allocation would be too small to fit a reasonable amount of capture
> data for the current platform. Unfortunately, the test was done too
> early in the boot sequence and was actually testing 'if(-ENODEV >
> size)'.
> 
> Move the check to be later. The check is only used to print a warning
> message, so it doesn't really matter how early or late it is done.
> Note that it is not possible to dynamically size the buffer because
> the allocation needs to be done before the engine information is
> available (at least, it would be in the intended two-phase GuC init
> process).
> 
> Now that the check works, it is reporting size too small for newer
> platforms. The check includes a 3x oversample multiplier to allow for
> multiple error captures to be bufferd by GuC before i915 has a chance
> to read them out. This is less important than simply being big enough
> to fit the first capture.
> 
> So a) bump the default size to be large enough for one capture minimum
> and b) make the warning only if one capture won't fit, instead use a
> notice for the 3x size.
> 
> Note that the size estimate is a worst case scenario. Actual captures
> will likely be smaller.
> 
> Lastly, use drm_warn istead of DRM_WARN as the former provides more
> infmration and the latter is deprecated.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 40 ++++++++++++++-----
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.h    |  1 -
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    |  4 --
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  4 +-
>  4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index 75257bd20ff01..b54b7883320b1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -600,10 +600,8 @@ intel_guc_capture_getnullheader(struct intel_guc *guc,
>  	return 0;
>  }
>  
> -#define GUC_CAPTURE_OVERBUFFER_MULTIPLIER 3
> -
> -int
> -intel_guc_capture_output_min_size_est(struct intel_guc *guc)
> +static int
> +guc_capture_output_min_size_est(struct intel_guc *guc)
>  {
>  	struct intel_gt *gt = guc_to_gt(guc);
>  	struct intel_engine_cs *engine;
> @@ -623,13 +621,8 @@ intel_guc_capture_output_min_size_est(struct intel_guc *guc)
>  	 * For each engine instance, there would be 1 x guc_state_capture_group_t output
>  	 * followed by 3 x guc_state_capture_t lists. The latter is how the register
>  	 * dumps are split across different register types (where the '3' are global vs class
> -	 * vs instance). Finally, let's multiply the whole thing by 3x (just so we are
> -	 * not limited to just 1 round of data in a worst case full register dump log)
> -	 *
> -	 * NOTE: intel_guc_log that allocates the log buffer would round this size up to
> -	 * a power of two.
> +	 * vs instance).
>  	 */
> -
>  	for_each_engine(engine, gt, id) {
>  		worst_min_size += sizeof(struct guc_state_capture_group_header_t) +
>  					 (3 * sizeof(struct guc_state_capture_header_t));
> @@ -649,7 +642,30 @@ intel_guc_capture_output_min_size_est(struct intel_guc *guc)
>  
>  	worst_min_size += (num_regs * sizeof(struct guc_mmio_reg));
>  
> -	return (worst_min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER);
> +	return worst_min_size;
> +}
> +
> +/*
> + * Add on a 3x multiplier to allow for multiple back-to-back captures occurring
> + * before the i915 can read the data out and process it
> + */
> +#define GUC_CAPTURE_OVERBUFFER_MULTIPLIER 3
> +
> +static void check_guc_capture_size(struct intel_guc *guc)
> +{
> +	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> +	int min_size = guc_capture_output_min_size_est(guc);
> +	int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
> +
> +	if (min_size < 0)
> +		drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
> +			 min_size);
> +	else if (min_size > CAPTURE_BUFFER_SIZE)
> +		drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
> +			 CAPTURE_BUFFER_SIZE, min_size);
> +	else if (spare_size > CAPTURE_BUFFER_SIZE)
> +		drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
> +			   CAPTURE_BUFFER_SIZE, spare_size, min_size);
>  }
>  
>  /*
> @@ -1580,5 +1596,7 @@ int intel_guc_capture_init(struct intel_guc *guc)
>  	INIT_LIST_HEAD(&guc->capture->outlist);
>  	INIT_LIST_HEAD(&guc->capture->cachelist);
>  
> +	check_guc_capture_size(guc);
> +
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
> index d3d7bd0b6db64..fbd3713c7832d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.h
> @@ -21,7 +21,6 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *m,
>  void intel_guc_capture_get_matching_node(struct intel_gt *gt, struct intel_engine_coredump *ee,
>  					 struct intel_context *ce);
>  void intel_guc_capture_process(struct intel_guc *guc);
> -int intel_guc_capture_output_min_size_est(struct intel_guc *guc);
>  int intel_guc_capture_getlist(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
>  			      void **outptr);
>  int intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 492bbf419d4df..991d4a02248dc 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -487,10 +487,6 @@ int intel_guc_log_create(struct intel_guc_log *log)
>  
>  	GEM_BUG_ON(log->vma);
>  
> -	if (intel_guc_capture_output_min_size_est(guc) > CAPTURE_BUFFER_SIZE)
> -		DRM_WARN("GuC log buffer for state_capture maybe too small. %d < %d\n",
> -			 CAPTURE_BUFFER_SIZE, intel_guc_capture_output_min_size_est(guc));
> -
>  	guc_log_size = intel_guc_log_size(log);
>  
>  	vma = intel_guc_allocate_vma(guc, guc_log_size);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> index 18007e639be99..dc9715411d626 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> @@ -22,11 +22,11 @@ struct intel_guc;
>  #elif defined(CONFIG_DRM_I915_DEBUG_GEM)
>  #define CRASH_BUFFER_SIZE	SZ_1M
>  #define DEBUG_BUFFER_SIZE	SZ_2M
> -#define CAPTURE_BUFFER_SIZE	SZ_1M
> +#define CAPTURE_BUFFER_SIZE	SZ_4M
>  #else
>  #define CRASH_BUFFER_SIZE	SZ_8K
>  #define DEBUG_BUFFER_SIZE	SZ_64K
> -#define CAPTURE_BUFFER_SIZE	SZ_16K
> +#define CAPTURE_BUFFER_SIZE	SZ_2M
>  #endif
>  
>  /*
> -- 
> 2.37.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Record CTB info in error logs
  2022-07-28  2:20 ` [PATCH 4/7] drm/i915/guc: Record CTB info in error logs John.C.Harrison
@ 2022-08-02 18:27   ` Teres Alexis, Alan Previn
  2022-08-03  0:20     ` John Harrison
  0 siblings, 1 reply; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-02 18:27 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: DRI-Devel

One minor NIT (though i hope it could be fixed otw in as it adds a bit of ease-of-log-readibility).
That said, everything else looks good.

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
 
On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> When debugging GuC communication issues, it is useful to have the CTB
> info available. So add the state and buffer contents to the error
> capture log.
> 
> Also, add a sub-structure for the GuC specific error capture info as
> it is now becoming numerous.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 59 +++++++++++++++++++++++----
>  drivers/gpu/drm/i915/i915_gpu_error.h | 20 +++++++--
>  2 files changed, 67 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index addba75252343..543ba63f958ea 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -671,6 +671,18 @@ static void err_print_pciid(struct drm_i915_error_state_buf *m,
>  		   pdev->subsystem_device);
>  }
>  
> +static void err_print_guc_ctb(struct drm_i915_error_state_buf *m,
> +			      const char *name,
> +			      const struct intel_ctb_coredump *ctb)
> +{
> +	if (!ctb->size)
> +		return;
> +
> +	err_printf(m, "GuC %s CTB: raw: 0x%08X, 0x%08X/%08X, cached: 0x%08X/%08X, desc = 0x%08X, buf = 0x%08X x 0x%08X\n",
> +		   name, ctb->raw_status, ctb->raw_head, ctb->raw_tail,
> +		   ctb->head, ctb->tail, ctb->desc_offset, ctb->cmds_offset, ctb->size);
> 
NIT: to make it more readible on first glance, would be nice to add more descriptive text like "raw: Sts:0x%08X,
Hd:0x%08X,Tl:0x@08X..." also, the not sure why cmds_offset is presented with a "x size" as opposed to just "desc-off =
foo1, cmd-off = foo2, size = foo3"?
> +}
> +
>  static void err_print_uc(struct drm_i915_error_state_buf *m,
>  			 const struct intel_uc_coredump *error_uc)
>  {
> @@ -678,8 +690,12 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
>  
>  	intel_uc_fw_dump(&error_uc->guc_fw, &p);
>  	intel_uc_fw_dump(&error_uc->huc_fw, &p);
> -	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
> -	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
> +	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->guc.timestamp);
> +	intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_log);
> +	err_printf(m, "GuC CTB fence: %d\n", error_uc->guc.last_fence);
> +	err_print_guc_ctb(m, "Send", error_uc->guc.ctb + 0);
> +	err_print_guc_ctb(m, "Recv", error_uc->guc.ctb + 1);
> +	intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_ctb);
>  }
>  
>  static void err_free_sgl(struct scatterlist *sgl)
> @@ -854,7 +870,7 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>  	if (error->gt) {
>  		bool print_guc_capture = false;
>  
> -		if (error->gt->uc && error->gt->uc->is_guc_capture)
> +		if (error->gt->uc && error->gt->uc->guc.is_guc_capture)
>  			print_guc_capture = true;
>  
>  		err_print_gt_display(m, error->gt);
> @@ -1009,7 +1025,8 @@ static void cleanup_uc(struct intel_uc_coredump *uc)
>  {
>  	kfree(uc->guc_fw.path);
>  	kfree(uc->huc_fw.path);
> -	i915_vma_coredump_free(uc->guc_log);
> +	i915_vma_coredump_free(uc->guc.vma_log);
> +	i915_vma_coredump_free(uc->guc.vma_ctb);
>  
>  	kfree(uc);
>  }
> @@ -1658,6 +1675,23 @@ gt_record_engines(struct intel_gt_coredump *gt,
>  	}
>  }
>  
> +static void gt_record_guc_ctb(struct intel_ctb_coredump *saved,
> +			      const struct intel_guc_ct_buffer *ctb,
> +			      const void *blob_ptr, struct intel_guc *guc)
> +{
> +	if (!ctb || !ctb->desc)
> +		return;
> +
> +	saved->raw_status = ctb->desc->status;
> +	saved->raw_head = ctb->desc->head;
> +	saved->raw_tail = ctb->desc->tail;
> +	saved->head = ctb->head;
> +	saved->tail = ctb->tail;
> +	saved->size = ctb->size;
> +	saved->desc_offset = ((void *)ctb->desc) - blob_ptr;
> +	saved->cmds_offset = ((void *)ctb->cmds) - blob_ptr;
> +}
> +
>  static struct intel_uc_coredump *
>  gt_record_uc(struct intel_gt_coredump *gt,
>  	     struct i915_vma_compress *compress)
> @@ -1684,9 +1718,16 @@ gt_record_uc(struct intel_gt_coredump *gt,
>  	 * log times to system times (in conjunction with the error->boottime and
>  	 * gt->clock_frequency fields saved elsewhere).
>  	 */
> -	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
> -	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
> -						"GuC log buffer", compress);
> +	error_uc->guc.timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
> +	error_uc->guc.vma_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
> +						    "GuC log buffer", compress);
> +	error_uc->guc.vma_ctb = create_vma_coredump(gt->_gt, uc->guc.ct.vma,
> +						    "GuC CT buffer", compress);
> +	error_uc->guc.last_fence = uc->guc.ct.requests.last_fence;
> +	gt_record_guc_ctb(error_uc->guc.ctb + 0, &uc->guc.ct.ctbs.send,
> +			  uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
> +	gt_record_guc_ctb(error_uc->guc.ctb + 1, &uc->guc.ct.ctbs.recv,
> +			  uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
>  
>  	return error_uc;
>  }
> @@ -2039,9 +2080,9 @@ __i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask, u32 du
>  			error->gt->uc = gt_record_uc(error->gt, compress);
>  			if (error->gt->uc) {
>  				if (dump_flags & CORE_DUMP_FLAG_IS_GUC_CAPTURE)
> -					error->gt->uc->is_guc_capture = true;
> +					error->gt->uc->guc.is_guc_capture = true;
>  				else
> -					GEM_BUG_ON(error->gt->uc->is_guc_capture);
> +					GEM_BUG_ON(error->gt->uc->guc.is_guc_capture);
>  			}
>  		}
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index d8a8b3d529e09..efc75cc2ffdb9 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -125,6 +125,15 @@ struct intel_engine_coredump {
>  	struct intel_engine_coredump *next;
>  };
>  
> +struct intel_ctb_coredump {
> +	u32 raw_head, head;
> +	u32 raw_tail, tail;
> +	u32 raw_status;
> +	u32 desc_offset;
> +	u32 cmds_offset;
> +	u32 size;
> +};
> +
>  struct intel_gt_coredump {
>  	const struct intel_gt *_gt;
>  	bool awake;
> @@ -165,9 +174,14 @@ struct intel_gt_coredump {
>  	struct intel_uc_coredump {
>  		struct intel_uc_fw guc_fw;
>  		struct intel_uc_fw huc_fw;
> -		struct i915_vma_coredump *guc_log;
> -		u32 timestamp;
> -		bool is_guc_capture;
> +		struct guc_info {
> +			struct intel_ctb_coredump ctb[2];
> +			struct i915_vma_coredump *vma_ctb;
> +			struct i915_vma_coredump *vma_log;
> +			u32 timestamp;
> +			u16 last_fence;
> +			bool is_guc_capture;
> +		} guc;
>  	} *uc;
>  
>  	struct intel_gt_coredump *next;
> -- 
> 2.37.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log
  2022-07-28  2:20 ` [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log John.C.Harrison
@ 2022-08-02 18:48   ` Teres Alexis, Alan Previn
  2022-08-03  0:14     ` John Harrison
  0 siblings, 1 reply; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-02 18:48 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: Wilson, Chris P, DRI-Devel

One concern below. Else, nice, simple yet good optimization here. :)

In the interest of quicker progression, I will provide a conditional R-B if you can either fix the issue raised below on
the way in or provide a reason why that's not an issue:

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>

On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Use a temporary page and mempy_from_wc to reduce the time it takes to
> dump the guc log to debugfs.
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 24 ++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 07d31ae32f765..4722d4b18ed19 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -750,8 +750,9 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>  	struct intel_guc *guc = log_to_guc(log);
>  	struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
>  	struct drm_i915_gem_object *obj = NULL;
> -	u32 *map;
> -	int i = 0;
> +	void *map;
> +	u32 *page;
> +	int i, j;
>  
>  	if (!intel_guc_is_supported(guc))
>  		return -ENODEV;
> @@ -764,23 +765,34 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>  	if (!obj)
>  		return 0;
>  
> +	page = (u32 *)__get_free_page(GFP_KERNEL);
> +	if (!page)
> +		return -ENOMEM;

Alan: although unlikely, its possible that user could trigger debugfs mid of a gt reset - not sure if we need to use the
"uc->reset_in_progress" before calling this allocation and return a different error in that case like EAGAIN or EBUSY or
ECONNRESET.

> +
>  	intel_guc_dump_time_info(guc, p);
>  
>  	map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
>  	if (IS_ERR(map)) {
>  		DRM_DEBUG("Failed to pin object\n");
>  		drm_puts(p, "(log data unaccessible)\n");
> +		free_page((unsigned long)page);
>  		return PTR_ERR(map);
>  	}
>  
> -	for (i = 0; i < obj->base.size / sizeof(u32); i += 4)
> -		drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> -			   *(map + i), *(map + i + 1),
> -			   *(map + i + 2), *(map + i + 3));
> +	for (i = 0; i < obj->base.size; i += PAGE_SIZE) {
> +		if (!i915_memcpy_from_wc(page, map + i, PAGE_SIZE))
> +			memcpy(page, map + i, PAGE_SIZE);
> +
> +		for (j = 0; j < PAGE_SIZE / sizeof(u32); j += 4)
> +			drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> +				   *(page + j + 0), *(page + j + 1),
> +				   *(page + j + 2), *(page + j + 3));
> +	}
>  
>  	drm_puts(p, "\n");
>  
>  	i915_gem_object_unpin_map(obj);
> +	free_page((unsigned long)page);
>  
>  	return 0;
>  }
> -- 
> 2.37.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 7/7] drm/i915/guc: Reduce spam from error capture
  2022-07-28  2:20 ` [PATCH 7/7] drm/i915/guc: Reduce spam from error capture John.C.Harrison
@ 2022-08-02 18:54   ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-02 18:54 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: DRI-Devel


Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>

On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Some debug code got left in when the GuC based register save for error
> capture was added. Remove that.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 67 ++++++++-----------
>  1 file changed, 28 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index d2ac53d4f3b6e..8f11651460131 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -1383,33 +1383,22 @@ guc_capture_reg_to_str(const struct intel_guc *guc, u32 owner, u32 type,
>  	return NULL;
>  }
>  
> -#ifdef CONFIG_DRM_I915_DEBUG_GUC
> -#define __out(a, ...) \
> -	do { \
> -		drm_warn((&(a)->i915->drm), __VA_ARGS__); \
> -		i915_error_printf((a), __VA_ARGS__); \
> -	} while (0)
> -#else
> -#define __out(a, ...) \
> -	i915_error_printf(a, __VA_ARGS__)
> -#endif
> -
>  #define GCAP_PRINT_INTEL_ENG_INFO(ebuf, eng) \
>  	do { \
> -		__out(ebuf, "    i915-Eng-Name: %s command stream\n", \
> -		      (eng)->name); \
> -		__out(ebuf, "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
> -		__out(ebuf, "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
> -		__out(ebuf, "    i915-Eng-LogicalMask: 0x%08x\n", \
> -		      (eng)->logical_mask); \
> +		i915_error_printf(ebuf, "    i915-Eng-Name: %s command stream\n", \
> +				  (eng)->name); \
> +		i915_error_printf(ebuf, "    i915-Eng-Inst-Class: 0x%02x\n", (eng)->class); \
> +		i915_error_printf(ebuf, "    i915-Eng-Inst-Id: 0x%02x\n", (eng)->instance); \
> +		i915_error_printf(ebuf, "    i915-Eng-LogicalMask: 0x%08x\n", \
> +				  (eng)->logical_mask); \
>  	} while (0)
>  
>  #define GCAP_PRINT_GUC_INST_INFO(ebuf, node) \
>  	do { \
> -		__out(ebuf, "    GuC-Engine-Inst-Id: 0x%08x\n", \
> -		      (node)->eng_inst); \
> -		__out(ebuf, "    GuC-Context-Id: 0x%08x\n", (node)->guc_id); \
> -		__out(ebuf, "    LRCA: 0x%08x\n", (node)->lrca); \
> +		i915_error_printf(ebuf, "    GuC-Engine-Inst-Id: 0x%08x\n", \
> +				  (node)->eng_inst); \
> +		i915_error_printf(ebuf, "    GuC-Context-Id: 0x%08x\n", (node)->guc_id); \
> +		i915_error_printf(ebuf, "    LRCA: 0x%08x\n", (node)->lrca); \
>  	} while (0)
>  
>  int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
> @@ -1441,57 +1430,57 @@ int intel_guc_capture_print_engine_node(struct drm_i915_error_state_buf *ebuf,
>  
>  	guc = &ee->engine->gt->uc.guc;
>  
> -	__out(ebuf, "global --- GuC Error Capture on %s command stream:\n",
> -	      ee->engine->name);
> +	i915_error_printf(ebuf, "global --- GuC Error Capture on %s command stream:\n",
> +			  ee->engine->name);
>  
>  	node = ee->guc_capture_node;
>  	if (!node) {
> -		__out(ebuf, "  No matching ee-node\n");
> +		i915_error_printf(ebuf, "  No matching ee-node\n");
>  		return 0;
>  	}
>  
> -	__out(ebuf, "Coverage:  %s\n", grptype[node->is_partial]);
> +	i915_error_printf(ebuf, "Coverage:  %s\n", grptype[node->is_partial]);
>  
>  	for (i = GUC_CAPTURE_LIST_TYPE_GLOBAL; i < GUC_CAPTURE_LIST_TYPE_MAX; ++i) {
> -		__out(ebuf, "  RegListType: %s\n",
> -		      datatype[i % GUC_CAPTURE_LIST_TYPE_MAX]);
> -		__out(ebuf, "    Owner-Id: %d\n", node->reginfo[i].vfid);
> +		i915_error_printf(ebuf, "  RegListType: %s\n",
> +				  datatype[i % GUC_CAPTURE_LIST_TYPE_MAX]);
> +		i915_error_printf(ebuf, "    Owner-Id: %d\n", node->reginfo[i].vfid);
>  
>  		switch (i) {
>  		case GUC_CAPTURE_LIST_TYPE_GLOBAL:
>  		default:
>  			break;
>  		case GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS:
> -			__out(ebuf, "    GuC-Eng-Class: %d\n", node->eng_class);
> -			__out(ebuf, "    i915-Eng-Class: %d\n",
> -			      guc_class_to_engine_class(node->eng_class));
> +			i915_error_printf(ebuf, "    GuC-Eng-Class: %d\n", node->eng_class);
> +			i915_error_printf(ebuf, "    i915-Eng-Class: %d\n",
> +					  guc_class_to_engine_class(node->eng_class));
>  			break;
>  		case GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE:
>  			eng = intel_guc_lookup_engine(guc, node->eng_class, node->eng_inst);
>  			if (eng)
>  				GCAP_PRINT_INTEL_ENG_INFO(ebuf, eng);
>  			else
> -				__out(ebuf, "    i915-Eng-Lookup Fail!\n");
> +				i915_error_printf(ebuf, "    i915-Eng-Lookup Fail!\n");
>  			GCAP_PRINT_GUC_INST_INFO(ebuf, node);
>  			break;
>  		}
>  
>  		numregs = node->reginfo[i].num_regs;
> -		__out(ebuf, "    NumRegs: %d\n", numregs);
> +		i915_error_printf(ebuf, "    NumRegs: %d\n", numregs);
>  		j = 0;
>  		while (numregs--) {
>  			regs = node->reginfo[i].regs;
>  			str = guc_capture_reg_to_str(guc, GUC_CAPTURE_LIST_INDEX_PF, i,
>  						     node->eng_class, 0, regs[j].offset, &is_ext);
>  			if (!str)
> -				__out(ebuf, "      REG-0x%08x", regs[j].offset);
> +				i915_error_printf(ebuf, "      REG-0x%08x", regs[j].offset);
>  			else
> -				__out(ebuf, "      %s", str);
> +				i915_error_printf(ebuf, "      %s", str);
>  			if (is_ext)
> -				__out(ebuf, "[%ld][%ld]",
> -				      FIELD_GET(GUC_REGSET_STEERING_GROUP, regs[j].flags),
> -				      FIELD_GET(GUC_REGSET_STEERING_INSTANCE, regs[j].flags));
> -			__out(ebuf, ":  0x%08x\n", regs[j].value);
> +				i915_error_printf(ebuf, "[%ld][%ld]",
> +					FIELD_GET(GUC_REGSET_STEERING_GROUP, regs[j].flags),
> +					FIELD_GET(GUC_REGSET_STEERING_INSTANCE, regs[j].flags));
> +			i915_error_printf(ebuf, ":  0x%08x\n", regs[j].value);
>  			++j;
>  		}
>  	}
> -- 
> 2.37.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log
  2022-08-02 18:48   ` [Intel-gfx] " Teres Alexis, Alan Previn
@ 2022-08-03  0:14     ` John Harrison
  0 siblings, 0 replies; 29+ messages in thread
From: John Harrison @ 2022-08-03  0:14 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Intel-GFX; +Cc: Wilson, Chris P, DRI-Devel

On 8/2/2022 11:48, Teres Alexis, Alan Previn wrote:
> One concern below. Else, nice, simple yet good optimization here. :)
>
> In the interest of quicker progression, I will provide a conditional R-B if you can either fix the issue raised below on
> the way in or provide a reason why that's not an issue:
Not an issue, but code changes like that can't be 'fixed on the way in'. 
Tweaking a commit message can potentially happen when merging patches 
but not code changes. For that you have to repost for CI.

John.

>
> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>
> On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
>> From: Chris Wilson <chris.p.wilson@intel.com>
>>
>> Use a temporary page and mempy_from_wc to reduce the time it takes to
>> dump the guc log to debugfs.
>>
>> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 24 ++++++++++++++++------
>>   1 file changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> index 07d31ae32f765..4722d4b18ed19 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> @@ -750,8 +750,9 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>>   	struct intel_guc *guc = log_to_guc(log);
>>   	struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
>>   	struct drm_i915_gem_object *obj = NULL;
>> -	u32 *map;
>> -	int i = 0;
>> +	void *map;
>> +	u32 *page;
>> +	int i, j;
>>   
>>   	if (!intel_guc_is_supported(guc))
>>   		return -ENODEV;
>> @@ -764,23 +765,34 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>>   	if (!obj)
>>   		return 0;
>>   
>> +	page = (u32 *)__get_free_page(GFP_KERNEL);
>> +	if (!page)
>> +		return -ENOMEM;
> Alan: although unlikely, its possible that user could trigger debugfs mid of a gt reset - not sure if we need to use the
> "uc->reset_in_progress" before calling this allocation and return a different error in that case like EAGAIN or EBUSY or
> ECONNRESET.
Doesn't matter.

The issue of thou shalt not allocate memory during a reset is only 
relevant to code that can be called from within the reset path. As in, 
you must not do something that could block the reset from completing. 
This is only debugfs code. It is not called from within the reset path. 
So if it gets blocked waiting for memory to be released, no-one cares.

John.


>
>> +
>>   	intel_guc_dump_time_info(guc, p);
>>   
>>   	map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
>>   	if (IS_ERR(map)) {
>>   		DRM_DEBUG("Failed to pin object\n");
>>   		drm_puts(p, "(log data unaccessible)\n");
>> +		free_page((unsigned long)page);
>>   		return PTR_ERR(map);
>>   	}
>>   
>> -	for (i = 0; i < obj->base.size / sizeof(u32); i += 4)
>> -		drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
>> -			   *(map + i), *(map + i + 1),
>> -			   *(map + i + 2), *(map + i + 3));
>> +	for (i = 0; i < obj->base.size; i += PAGE_SIZE) {
>> +		if (!i915_memcpy_from_wc(page, map + i, PAGE_SIZE))
>> +			memcpy(page, map + i, PAGE_SIZE);
>> +
>> +		for (j = 0; j < PAGE_SIZE / sizeof(u32); j += 4)
>> +			drm_printf(p, "0x%08x 0x%08x 0x%08x 0x%08x\n",
>> +				   *(page + j + 0), *(page + j + 1),
>> +				   *(page + j + 2), *(page + j + 3));
>> +	}
>>   
>>   	drm_puts(p, "\n");
>>   
>>   	i915_gem_object_unpin_map(obj);
>> +	free_page((unsigned long)page);
>>   
>>   	return 0;
>>   }
>> -- 
>> 2.37.1
>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 4/7] drm/i915/guc: Record CTB info in error logs
  2022-08-02 18:27   ` [Intel-gfx] " Teres Alexis, Alan Previn
@ 2022-08-03  0:20     ` John Harrison
  0 siblings, 0 replies; 29+ messages in thread
From: John Harrison @ 2022-08-03  0:20 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Intel-GFX; +Cc: DRI-Devel

On 8/2/2022 11:27, Teres Alexis, Alan Previn wrote:
> One minor NIT (though i hope it could be fixed otw in as it adds a bit of ease-of-log-readibility).
> That said, everything else looks good.
>
> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>   
> On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> When debugging GuC communication issues, it is useful to have the CTB
>> info available. So add the state and buffer contents to the error
>> capture log.
>>
>> Also, add a sub-structure for the GuC specific error capture info as
>> it is now becoming numerous.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gpu_error.c | 59 +++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/i915_gpu_error.h | 20 +++++++--
>>   2 files changed, 67 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index addba75252343..543ba63f958ea 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -671,6 +671,18 @@ static void err_print_pciid(struct drm_i915_error_state_buf *m,
>>   		   pdev->subsystem_device);
>>   }
>>   
>> +static void err_print_guc_ctb(struct drm_i915_error_state_buf *m,
>> +			      const char *name,
>> +			      const struct intel_ctb_coredump *ctb)
>> +{
>> +	if (!ctb->size)
>> +		return;
>> +
>> +	err_printf(m, "GuC %s CTB: raw: 0x%08X, 0x%08X/%08X, cached: 0x%08X/%08X, desc = 0x%08X, buf = 0x%08X x 0x%08X\n",
>> +		   name, ctb->raw_status, ctb->raw_head, ctb->raw_tail,
>> +		   ctb->head, ctb->tail, ctb->desc_offset, ctb->cmds_offset, ctb->size);
>>
> NIT: to make it more readible on first glance, would be nice to add more descriptive text like "raw: Sts:0x%08X,
> Hd:0x%08X,Tl:0x@08X..." also, the not sure why cmds_offset is presented with a "x size" as opposed to just "desc-off =
> foo1, cmd-off = foo2, size = foo3"?
The line is long enough as it is. I'd rather not make it even longer. 
Same for '<name>: <address> x <size>' rather than '<name> _addr = 
<address>, <name>_size = <size>'. It's useful for readability to keep a 
single CTB channel on a single line but not if that line is excessively 
long.

John.

>> +}
>> +
>>   static void err_print_uc(struct drm_i915_error_state_buf *m,
>>   			 const struct intel_uc_coredump *error_uc)
>>   {
>> @@ -678,8 +690,12 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
>>   
>>   	intel_uc_fw_dump(&error_uc->guc_fw, &p);
>>   	intel_uc_fw_dump(&error_uc->huc_fw, &p);
>> -	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
>> -	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
>> +	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->guc.timestamp);
>> +	intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_log);
>> +	err_printf(m, "GuC CTB fence: %d\n", error_uc->guc.last_fence);
>> +	err_print_guc_ctb(m, "Send", error_uc->guc.ctb + 0);
>> +	err_print_guc_ctb(m, "Recv", error_uc->guc.ctb + 1);
>> +	intel_gpu_error_print_vma(m, NULL, error_uc->guc.vma_ctb);
>>   }
>>   
>>   static void err_free_sgl(struct scatterlist *sgl)
>> @@ -854,7 +870,7 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>>   	if (error->gt) {
>>   		bool print_guc_capture = false;
>>   
>> -		if (error->gt->uc && error->gt->uc->is_guc_capture)
>> +		if (error->gt->uc && error->gt->uc->guc.is_guc_capture)
>>   			print_guc_capture = true;
>>   
>>   		err_print_gt_display(m, error->gt);
>> @@ -1009,7 +1025,8 @@ static void cleanup_uc(struct intel_uc_coredump *uc)
>>   {
>>   	kfree(uc->guc_fw.path);
>>   	kfree(uc->huc_fw.path);
>> -	i915_vma_coredump_free(uc->guc_log);
>> +	i915_vma_coredump_free(uc->guc.vma_log);
>> +	i915_vma_coredump_free(uc->guc.vma_ctb);
>>   
>>   	kfree(uc);
>>   }
>> @@ -1658,6 +1675,23 @@ gt_record_engines(struct intel_gt_coredump *gt,
>>   	}
>>   }
>>   
>> +static void gt_record_guc_ctb(struct intel_ctb_coredump *saved,
>> +			      const struct intel_guc_ct_buffer *ctb,
>> +			      const void *blob_ptr, struct intel_guc *guc)
>> +{
>> +	if (!ctb || !ctb->desc)
>> +		return;
>> +
>> +	saved->raw_status = ctb->desc->status;
>> +	saved->raw_head = ctb->desc->head;
>> +	saved->raw_tail = ctb->desc->tail;
>> +	saved->head = ctb->head;
>> +	saved->tail = ctb->tail;
>> +	saved->size = ctb->size;
>> +	saved->desc_offset = ((void *)ctb->desc) - blob_ptr;
>> +	saved->cmds_offset = ((void *)ctb->cmds) - blob_ptr;
>> +}
>> +
>>   static struct intel_uc_coredump *
>>   gt_record_uc(struct intel_gt_coredump *gt,
>>   	     struct i915_vma_compress *compress)
>> @@ -1684,9 +1718,16 @@ gt_record_uc(struct intel_gt_coredump *gt,
>>   	 * log times to system times (in conjunction with the error->boottime and
>>   	 * gt->clock_frequency fields saved elsewhere).
>>   	 */
>> -	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
>> -	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>> -						"GuC log buffer", compress);
>> +	error_uc->guc.timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
>> +	error_uc->guc.vma_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>> +						    "GuC log buffer", compress);
>> +	error_uc->guc.vma_ctb = create_vma_coredump(gt->_gt, uc->guc.ct.vma,
>> +						    "GuC CT buffer", compress);
>> +	error_uc->guc.last_fence = uc->guc.ct.requests.last_fence;
>> +	gt_record_guc_ctb(error_uc->guc.ctb + 0, &uc->guc.ct.ctbs.send,
>> +			  uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
>> +	gt_record_guc_ctb(error_uc->guc.ctb + 1, &uc->guc.ct.ctbs.recv,
>> +			  uc->guc.ct.ctbs.send.desc, (struct intel_guc *)&uc->guc);
>>   
>>   	return error_uc;
>>   }
>> @@ -2039,9 +2080,9 @@ __i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask, u32 du
>>   			error->gt->uc = gt_record_uc(error->gt, compress);
>>   			if (error->gt->uc) {
>>   				if (dump_flags & CORE_DUMP_FLAG_IS_GUC_CAPTURE)
>> -					error->gt->uc->is_guc_capture = true;
>> +					error->gt->uc->guc.is_guc_capture = true;
>>   				else
>> -					GEM_BUG_ON(error->gt->uc->is_guc_capture);
>> +					GEM_BUG_ON(error->gt->uc->guc.is_guc_capture);
>>   			}
>>   		}
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>> index d8a8b3d529e09..efc75cc2ffdb9 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>> @@ -125,6 +125,15 @@ struct intel_engine_coredump {
>>   	struct intel_engine_coredump *next;
>>   };
>>   
>> +struct intel_ctb_coredump {
>> +	u32 raw_head, head;
>> +	u32 raw_tail, tail;
>> +	u32 raw_status;
>> +	u32 desc_offset;
>> +	u32 cmds_offset;
>> +	u32 size;
>> +};
>> +
>>   struct intel_gt_coredump {
>>   	const struct intel_gt *_gt;
>>   	bool awake;
>> @@ -165,9 +174,14 @@ struct intel_gt_coredump {
>>   	struct intel_uc_coredump {
>>   		struct intel_uc_fw guc_fw;
>>   		struct intel_uc_fw huc_fw;
>> -		struct i915_vma_coredump *guc_log;
>> -		u32 timestamp;
>> -		bool is_guc_capture;
>> +		struct guc_info {
>> +			struct intel_ctb_coredump ctb[2];
>> +			struct i915_vma_coredump *vma_ctb;
>> +			struct i915_vma_coredump *vma_log;
>> +			u32 timestamp;
>> +			u16 last_fence;
>> +			bool is_guc_capture;
>> +		} guc;
>>   	} *uc;
>>   
>>   	struct intel_gt_coredump *next;
>> -- 
>> 2.37.1
>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size
  2022-08-02 17:37   ` Teres Alexis, Alan Previn
@ 2022-08-03  0:29     ` John Harrison
  0 siblings, 0 replies; 29+ messages in thread
From: John Harrison @ 2022-08-03  0:29 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Intel-GFX; +Cc: Brost, Matthew, DRI-Devel

On 8/2/2022 10:37, Teres Alexis, Alan Previn wrote:
> Something minor in comments, so conditional R-B (please fix on the way in or reply to correct me):
>
> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>
> On Wed, 2022-07-27 at 19:20 -0700, Harrison, John C wrote:
>> From: Alan Previn <alan.previn.teres.alexis@intel.com>
>>
>> Add a helper to get GuC log buffer size.
>>
>> Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 49 ++++++++++++----------
>>   1 file changed, 27 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> index 25b2d7ce6640d..492bbf419d4df 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> @@ -15,6 +15,32 @@
>>   
>>   static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
>>   
>> +static u32 intel_guc_log_size(struct intel_guc_log *log)
>> +{
>> +	/*
>> +	 *  GuC Log buffer Layout:
>> +	 *
>> +	 *  NB: Ordering must follow "enum guc_log_buffer_type".
>> +	 *
>> +	 *  +===============================+ 00B
>> +	 *  |      Debug state header       |
>> +	 *  +-------------------------------+ 32B
>>
> Something we might have missed in prior updates but i think the bufer state is now 36 bytes long no? (9 dwords).
Good catch. Yes, an extra word was added some while back.

John.

>
>
>> +	 *  |    Crash dump state header    |
>> +	 *  +-------------------------------+ 64B
>> +	 *  |     Capture state header      |
>> +	 *  +-------------------------------+ 96B
>> +	 *  |                               |
>> +	 *  +===============================+ PAGE_SIZE (4KB)
>> +	 *  |          Debug logs           |
>> +	 *  +===============================+ + DEBUG_SIZE
>> +	 *  |        Crash Dump logs        |
>> +	 *  +===============================+ + CRASH_SIZE
>> +	 *  |         Capture logs          |
>> +	 *  +===============================+ + CAPTURE_SIZE
>> +	 */
>> +	return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
>> +}
>> +
>>   /**
>>    * DOC: GuC firmware log
>>    *
>> @@ -461,32 +487,11 @@ int intel_guc_log_create(struct intel_guc_log *log)
>>   
>>   	GEM_BUG_ON(log->vma);
>>   
>> -	/*
>> -	 *  GuC Log buffer Layout
>> -	 * (this ordering must follow "enum guc_log_buffer_type" definition)
>> -	 *
>> -	 *  +===============================+ 00B
>> -	 *  |      Debug state header       |
>> -	 *  +-------------------------------+ 32B
>> -	 *  |    Crash dump state header    |
>> -	 *  +-------------------------------+ 64B
>> -	 *  |     Capture state header      |
>> -	 *  +-------------------------------+ 96B
>> -	 *  |                               |
>> -	 *  +===============================+ PAGE_SIZE (4KB)
>> -	 *  |          Debug logs           |
>> -	 *  +===============================+ + DEBUG_SIZE
>> -	 *  |        Crash Dump logs        |
>> -	 *  +===============================+ + CRASH_SIZE
>> -	 *  |         Capture logs          |
>> -	 *  +===============================+ + CAPTURE_SIZE
>> -	 */
>>   	if (intel_guc_capture_output_min_size_est(guc) > CAPTURE_BUFFER_SIZE)
>>   		DRM_WARN("GuC log buffer for state_capture maybe too small. %d < %d\n",
>>   			 CAPTURE_BUFFER_SIZE, intel_guc_capture_output_min_size_est(guc));
>>   
>> -	guc_log_size = PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE +
>> -		       CAPTURE_BUFFER_SIZE;
>> +	guc_log_size = intel_guc_log_size(log);
>>   
>>   	vma = intel_guc_allocate_vma(guc, guc_log_size);
>>   	if (IS_ERR(vma)) {
>> -- 
>> 2.37.1
>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-07-28  2:20 ` [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information John.C.Harrison
@ 2022-08-05  0:40   ` Teres Alexis, Alan Previn
  2022-08-08 18:43     ` John Harrison
  2022-08-19 10:45   ` Jani Nikula
  1 sibling, 1 reply; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-05  0:40 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: DRI-Devel

I have a question on below code. Everything else looked good.
Will r-b as soon as we can close on below question
...alan


On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> It is useful to be able to match GuC events to kernel events when
> looking at the GuC log. That requires being able to convert GuC
> timestamps to kernel time. So, when dumping error captures and/or GuC
> logs, include a stamp in both time zones plus the clock frequency.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1675,6 +1678,13 @@ gt_record_uc(struct intel_gt_coredump *gt,
>  	 */
>  	error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL);
>  	error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL);
> +
> +	/*
> +	 * Save the GuC log and include a timestamp reference for converting the
> +	 * log times to system times (in conjunction with the error->boottime and
> +	 * gt->clock_frequency fields saved elsewhere).
> +	 */
> +	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);

Alan:this register is in the GUC-SHIM domain and so unless i am mistaken u might need to ensure we hold a wakeref so
that are getting a live value of the real timestamp register that this register is mirror-ing and not a stale snapshot.
Or was this already done farther up the stack? Or are we doing the opposite - in which case we should ensure we drop al
 wakeref prior to this point. (which i am not sure is a reliable method since we wouldnt know what GuC ref was at). 
>  	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>  						"GuC log buffer", compress);
>  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-08-05  0:40   ` [Intel-gfx] " Teres Alexis, Alan Previn
@ 2022-08-08 18:43     ` John Harrison
  2022-08-15  4:55       ` Teres Alexis, Alan Previn
  0 siblings, 1 reply; 29+ messages in thread
From: John Harrison @ 2022-08-08 18:43 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn, Intel-GFX; +Cc: DRI-Devel

On 8/4/2022 17:40, Teres Alexis, Alan Previn wrote:
> I have a question on below code. Everything else looked good.
> Will r-b as soon as we can close on below question
> ...alan
>
>
> On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> It is useful to be able to match GuC events to kernel events when
>> looking at the GuC log. That requires being able to convert GuC
>> timestamps to kernel time. So, when dumping error captures and/or GuC
>> logs, include a stamp in both time zones plus the clock frequency.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1675,6 +1678,13 @@ gt_record_uc(struct intel_gt_coredump *gt,
>>   	 */
>>   	error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL);
>>   	error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL);
>> +
>> +	/*
>> +	 * Save the GuC log and include a timestamp reference for converting the
>> +	 * log times to system times (in conjunction with the error->boottime and
>> +	 * gt->clock_frequency fields saved elsewhere).
>> +	 */
>> +	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
> Alan:this register is in the GUC-SHIM domain and so unless i am mistaken u might need to ensure we hold a wakeref so
> that are getting a live value of the real timestamp register that this register is mirror-ing and not a stale snapshot.
> Or was this already done farther up the stack? Or are we doing the opposite - in which case we should ensure we drop al
>   wakeref prior to this point. (which i am not sure is a reliable method since we wouldnt know what GuC ref was at).
The intel_uncore_read() does a forcewake acquire implicitly.

Not sure what you mean about dropping all wakerefs prior to this point?

John.

>>   	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>>   						"GuC log buffer", compress);
>>   


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-08-08 18:43     ` John Harrison
@ 2022-08-15  4:55       ` Teres Alexis, Alan Previn
  0 siblings, 0 replies; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-15  4:55 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: DRI-Devel

Sounds good - thanks. (ignore the "doing the opposite" comment).

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>

On Mon, 2022-08-08 at 11:43 -0700, Harrison, John C wrote:
> On 8/4/2022 17:40, Teres Alexis, Alan Previn wrote:
> > I have a question on below code. Everything else looked good.
> > Will r-b as soon as we can close on below question
> > ...alan
> > 
> > 
> > On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > It is useful to be able to match GuC events to kernel events when
> > > looking at the GuC log. That requires being able to convert GuC
> > > timestamps to kernel time. So, when dumping error captures and/or GuC
> > > logs, include a stamp in both time zones plus the clock frequency.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > > @@ -1675,6 +1678,13 @@ gt_record_uc(struct intel_gt_coredump *gt,
> > >   	 */
> > >   	error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL);
> > >   	error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL);
> > > +
> > > +	/*
> > > +	 * Save the GuC log and include a timestamp reference for converting the
> > > +	 * log times to system times (in conjunction with the error->boottime and
> > > +	 * gt->clock_frequency fields saved elsewhere).
> > > +	 */
> > > +	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
> > Alan:this register is in the GUC-SHIM domain and so unless i am mistaken u might need to ensure we hold a wakeref so
> > that are getting a live value of the real timestamp register that this register is mirror-ing and not a stale snapshot.
> > Or was this already done farther up the stack? Or are we doing the opposite - in which case we should ensure we drop al
> >   wakeref prior to this point. (which i am not sure is a reliable method since we wouldnt know what GuC ref was at).
> The intel_uncore_read() does a forcewake acquire implicitly.
> 
> Not sure what you mean about dropping all wakerefs prior to this point?
> 
> John.
> 
> > >   	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
> > >   						"GuC log buffer", compress);
> > >   


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-07-28  2:20 ` [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable John.C.Harrison
@ 2022-08-15  5:43   ` Teres Alexis, Alan Previn
  2022-08-24  9:01   ` Joonas Lahtinen
  1 sibling, 0 replies; 29+ messages in thread
From: Teres Alexis, Alan Previn @ 2022-08-15  5:43 UTC (permalink / raw)
  To: Harrison, John C, Intel-GFX; +Cc: DRI-Devel

Looks good to me. 

Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>


On Wed, 2022-07-27 at 19:20 -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The GuC log buffer sizes had to be configured statically at compile
> time. This can be quite troublesome when needing to get larger logs
> out of a released driver. So re-organise the code to allow a boot time
> module parameter override.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  53 ++----
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  14 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 176 +++++++++++++++++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +++--
>  drivers/gpu/drm/i915/i915_params.c            |  12 ++
>  drivers/gpu/drm/i915/i915_params.h            |   3 +
>  6 files changed, 226 insertions(+), 74 deletions(-)
> 
...
> +static s32 scale_log_param(struct intel_guc_log *log, const struct guc_log_section *section,
> +			   s32 param)
> +{
> +	/* -1 means default */
> +	if (param < 0)
> +		return section->default_val;
> +
> +	/* Check for 32-bit overflow */
> +	if (param >= SZ_4K) {
> +		drm_err(&guc_to_gt(log_to_guc(log))->i915->drm, "Size too large for GuC %s log: %dMB!",
> +			section->name, param);
> +		return section->default_val;
> +	}
> +
> +	/* Param units are 1MB */
> +	return param * SZ_1M;
> +}
> +




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-07-28  2:20 ` [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information John.C.Harrison
  2022-08-05  0:40   ` [Intel-gfx] " Teres Alexis, Alan Previn
@ 2022-08-19 10:45   ` Jani Nikula
  2022-08-19 21:02     ` John Harrison
  1 sibling, 1 reply; 29+ messages in thread
From: Jani Nikula @ 2022-08-19 10:45 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX; +Cc: DRI-Devel

On Wed, 27 Jul 2022, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> It is useful to be able to match GuC events to kernel events when
> looking at the GuC log. That requires being able to convert GuC
> timestamps to kernel time. So, when dumping error captures and/or GuC
> logs, include a stamp in both time zones plus the clock frequency.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_regs.h    |  2 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c     | 19 +++++++++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h     |  2 ++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c |  2 ++
>  drivers/gpu/drm/i915/i915_gpu_error.c      | 12 ++++++++++++
>  drivers/gpu/drm/i915/i915_gpu_error.h      |  3 +++
>  6 files changed, 40 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> index 60d6eb5f245b7..fc7979bd91db5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
> @@ -1007,6 +1007,8 @@
>  #define   GEN11_LSN_UNSLCVC_GAFS_HALF_CL2_MAXALLOC	(1 << 9)
>  #define   GEN11_LSN_UNSLCVC_GAFS_HALF_SF_MAXALLOC	(1 << 7)
>  
> +#define GUCPMTIMESTAMP				_MMIO(0xc3e8)
> +
>  #define __GEN9_RCS0_MOCS0			0xc800
>  #define GEN9_GFX_MOCS(i)			_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
>  #define __GEN9_VCS0_MOCS0			0xc900
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 2706a8c650900..ab4aacc516aa4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -389,6 +389,25 @@ void intel_guc_write_params(struct intel_guc *guc)
>  	intel_uncore_forcewake_put(uncore, FORCEWAKE_GT);
>  }
>  
> +void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	intel_wakeref_t wakeref;
> +	u32 stamp = 0;
> +	u64 ktime;
> +
> +	intel_device_info_print_runtime(RUNTIME_INFO(gt->i915), p);

Why does this function print runtime info alone? Seems kind of random,
considering what intel_device_info_print_runtime() actually
prints. Should it print both device info and runtime info, or nothing at
all?

This conflicts with [1] and I don't know what to do about it. The first
instinct is to just remove it.

BR,
Jani.


[1] https://patchwork.freedesktop.org/patch/msgid/4be86d7a0737b2c49eee460d29d42843418536fe.1660137416.git.jani.nikula@intel.com

> +
> +	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
> +		stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP);
> +	ktime = ktime_get_boottime_ns();
> +
> +	drm_printf(p, "Kernel timestamp: 0x%08llX [%llu]\n", ktime, ktime);
> +	drm_printf(p, "GuC timestamp: 0x%08X [%u]\n", stamp, stamp);
> +	drm_printf(p, "CS timestamp frequency: %u Hz, %u ns\n",
> +		   gt->clock_frequency, gt->clock_period_ns);
> +}
> +
>  int intel_guc_init(struct intel_guc *guc)
>  {
>  	struct intel_gt *gt = guc_to_gt(guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index a7acffbf15d1f..804133df1ac9b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -464,4 +464,6 @@ void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>  
>  void intel_guc_write_barrier(struct intel_guc *guc);
>  
> +void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 991d4a02248dc..07d31ae32f765 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -764,6 +764,8 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>  	if (!obj)
>  		return 0;
>  
> +	intel_guc_dump_time_info(guc, p);
> +
>  	map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
>  	if (IS_ERR(map)) {
>  		DRM_DEBUG("Failed to pin object\n");
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 32e92651ef7c2..addba75252343 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -678,6 +678,7 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
>  
>  	intel_uc_fw_dump(&error_uc->guc_fw, &p);
>  	intel_uc_fw_dump(&error_uc->huc_fw, &p);
> +	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
>  	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
>  }
>  
> @@ -720,6 +721,8 @@ static void err_print_gt_global_nonguc(struct drm_i915_error_state_buf *m,
>  	int i;
>  
>  	err_printf(m, "GT awake: %s\n", str_yes_no(gt->awake));
> +	err_printf(m, "CS timestamp frequency: %u Hz, %d ns\n",
> +		   gt->clock_frequency, gt->clock_period_ns);
>  	err_printf(m, "EIR: 0x%08x\n", gt->eir);
>  	err_printf(m, "PGTBL_ER: 0x%08x\n", gt->pgtbl_er);
>  
> @@ -1675,6 +1678,13 @@ gt_record_uc(struct intel_gt_coredump *gt,
>  	 */
>  	error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL);
>  	error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL);
> +
> +	/*
> +	 * Save the GuC log and include a timestamp reference for converting the
> +	 * log times to system times (in conjunction with the error->boottime and
> +	 * gt->clock_frequency fields saved elsewhere).
> +	 */
> +	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
>  	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>  						"GuC log buffer", compress);
>  
> @@ -1833,6 +1843,8 @@ static void gt_record_global_regs(struct intel_gt_coredump *gt)
>  static void gt_record_info(struct intel_gt_coredump *gt)
>  {
>  	memcpy(&gt->info, &gt->_gt->info, sizeof(struct intel_gt_info));
> +	gt->clock_frequency = gt->_gt->clock_frequency;
> +	gt->clock_period_ns = gt->_gt->clock_period_ns;
>  }
>  
>  /*
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 55a143b92d10e..d8a8b3d529e09 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -150,6 +150,8 @@ struct intel_gt_coredump {
>  	u32 gtt_cache;
>  	u32 aux_err; /* gen12 */
>  	u32 gam_done; /* gen12 */
> +	u32 clock_frequency;
> +	u32 clock_period_ns;
>  
>  	/* Display related */
>  	u32 derrmr;
> @@ -164,6 +166,7 @@ struct intel_gt_coredump {
>  		struct intel_uc_fw guc_fw;
>  		struct intel_uc_fw huc_fw;
>  		struct i915_vma_coredump *guc_log;
> +		u32 timestamp;
>  		bool is_guc_capture;
>  	} *uc;

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-08-19 10:45   ` Jani Nikula
@ 2022-08-19 21:02     ` John Harrison
  2022-08-23 10:09       ` Jani Nikula
  0 siblings, 1 reply; 29+ messages in thread
From: John Harrison @ 2022-08-19 21:02 UTC (permalink / raw)
  To: Jani Nikula, Intel-GFX; +Cc: DRI-Devel

On 8/19/2022 03:45, Jani Nikula wrote:
> On Wed, 27 Jul 2022, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> It is useful to be able to match GuC events to kernel events when
>> looking at the GuC log. That requires being able to convert GuC
>> timestamps to kernel time. So, when dumping error captures and/or GuC
>> logs, include a stamp in both time zones plus the clock frequency.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gt_regs.h    |  2 ++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.c     | 19 +++++++++++++++++++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.h     |  2 ++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_log.c |  2 ++
>>   drivers/gpu/drm/i915/i915_gpu_error.c      | 12 ++++++++++++
>>   drivers/gpu/drm/i915/i915_gpu_error.h      |  3 +++
>>   6 files changed, 40 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>> index 60d6eb5f245b7..fc7979bd91db5 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>> @@ -1007,6 +1007,8 @@
>>   #define   GEN11_LSN_UNSLCVC_GAFS_HALF_CL2_MAXALLOC	(1 << 9)
>>   #define   GEN11_LSN_UNSLCVC_GAFS_HALF_SF_MAXALLOC	(1 << 7)
>>   
>> +#define GUCPMTIMESTAMP				_MMIO(0xc3e8)
>> +
>>   #define __GEN9_RCS0_MOCS0			0xc800
>>   #define GEN9_GFX_MOCS(i)			_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
>>   #define __GEN9_VCS0_MOCS0			0xc900
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> index 2706a8c650900..ab4aacc516aa4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> @@ -389,6 +389,25 @@ void intel_guc_write_params(struct intel_guc *guc)
>>   	intel_uncore_forcewake_put(uncore, FORCEWAKE_GT);
>>   }
>>   
>> +void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p)
>> +{
>> +	struct intel_gt *gt = guc_to_gt(guc);
>> +	intel_wakeref_t wakeref;
>> +	u32 stamp = 0;
>> +	u64 ktime;
>> +
>> +	intel_device_info_print_runtime(RUNTIME_INFO(gt->i915), p);
> Why does this function print runtime info alone? Seems kind of random,
> considering what intel_device_info_print_runtime() actually
> prints. Should it print both device info and runtime info, or nothing at
> all?
Hmm. That was added specifically to print the rawclk value (and only for 
the rawclk value) because that was the frequency that the GuC time stamp 
register ticks at. I think we later worked out that the CS frequency was 
more correct. Hence printing gt->clock_frequency lower down. I guess I 
forgot to go back and remove the rawclk print, though.

So yeah, it can be removed.

John.


>
> This conflicts with [1] and I don't know what to do about it. The first
> instinct is to just remove it.
>
> BR,
> Jani.
>
>
> [1] https://patchwork.freedesktop.org/patch/msgid/4be86d7a0737b2c49eee460d29d42843418536fe.1660137416.git.jani.nikula@intel.com
>
>> +
>> +	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
>> +		stamp = intel_uncore_read(gt->uncore, GUCPMTIMESTAMP);
>> +	ktime = ktime_get_boottime_ns();
>> +
>> +	drm_printf(p, "Kernel timestamp: 0x%08llX [%llu]\n", ktime, ktime);
>> +	drm_printf(p, "GuC timestamp: 0x%08X [%u]\n", stamp, stamp);
>> +	drm_printf(p, "CS timestamp frequency: %u Hz, %u ns\n",
>> +		   gt->clock_frequency, gt->clock_period_ns);
>> +}
>> +
>>   int intel_guc_init(struct intel_guc *guc)
>>   {
>>   	struct intel_gt *gt = guc_to_gt(guc);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> index a7acffbf15d1f..804133df1ac9b 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> @@ -464,4 +464,6 @@ void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>>   
>>   void intel_guc_write_barrier(struct intel_guc *guc);
>>   
>> +void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p);
>> +
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> index 991d4a02248dc..07d31ae32f765 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>> @@ -764,6 +764,8 @@ int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>>   	if (!obj)
>>   		return 0;
>>   
>> +	intel_guc_dump_time_info(guc, p);
>> +
>>   	map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC);
>>   	if (IS_ERR(map)) {
>>   		DRM_DEBUG("Failed to pin object\n");
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 32e92651ef7c2..addba75252343 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -678,6 +678,7 @@ static void err_print_uc(struct drm_i915_error_state_buf *m,
>>   
>>   	intel_uc_fw_dump(&error_uc->guc_fw, &p);
>>   	intel_uc_fw_dump(&error_uc->huc_fw, &p);
>> +	err_printf(m, "GuC timestamp: 0x%08x\n", error_uc->timestamp);
>>   	intel_gpu_error_print_vma(m, NULL, error_uc->guc_log);
>>   }
>>   
>> @@ -720,6 +721,8 @@ static void err_print_gt_global_nonguc(struct drm_i915_error_state_buf *m,
>>   	int i;
>>   
>>   	err_printf(m, "GT awake: %s\n", str_yes_no(gt->awake));
>> +	err_printf(m, "CS timestamp frequency: %u Hz, %d ns\n",
>> +		   gt->clock_frequency, gt->clock_period_ns);
>>   	err_printf(m, "EIR: 0x%08x\n", gt->eir);
>>   	err_printf(m, "PGTBL_ER: 0x%08x\n", gt->pgtbl_er);
>>   
>> @@ -1675,6 +1678,13 @@ gt_record_uc(struct intel_gt_coredump *gt,
>>   	 */
>>   	error_uc->guc_fw.path = kstrdup(uc->guc.fw.path, ALLOW_FAIL);
>>   	error_uc->huc_fw.path = kstrdup(uc->huc.fw.path, ALLOW_FAIL);
>> +
>> +	/*
>> +	 * Save the GuC log and include a timestamp reference for converting the
>> +	 * log times to system times (in conjunction with the error->boottime and
>> +	 * gt->clock_frequency fields saved elsewhere).
>> +	 */
>> +	error_uc->timestamp = intel_uncore_read(gt->_gt->uncore, GUCPMTIMESTAMP);
>>   	error_uc->guc_log = create_vma_coredump(gt->_gt, uc->guc.log.vma,
>>   						"GuC log buffer", compress);
>>   
>> @@ -1833,6 +1843,8 @@ static void gt_record_global_regs(struct intel_gt_coredump *gt)
>>   static void gt_record_info(struct intel_gt_coredump *gt)
>>   {
>>   	memcpy(&gt->info, &gt->_gt->info, sizeof(struct intel_gt_info));
>> +	gt->clock_frequency = gt->_gt->clock_frequency;
>> +	gt->clock_period_ns = gt->_gt->clock_period_ns;
>>   }
>>   
>>   /*
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
>> index 55a143b92d10e..d8a8b3d529e09 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
>> @@ -150,6 +150,8 @@ struct intel_gt_coredump {
>>   	u32 gtt_cache;
>>   	u32 aux_err; /* gen12 */
>>   	u32 gam_done; /* gen12 */
>> +	u32 clock_frequency;
>> +	u32 clock_period_ns;
>>   
>>   	/* Display related */
>>   	u32 derrmr;
>> @@ -164,6 +166,7 @@ struct intel_gt_coredump {
>>   		struct intel_uc_fw guc_fw;
>>   		struct intel_uc_fw huc_fw;
>>   		struct i915_vma_coredump *guc_log;
>> +		u32 timestamp;
>>   		bool is_guc_capture;
>>   	} *uc;


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information
  2022-08-19 21:02     ` John Harrison
@ 2022-08-23 10:09       ` Jani Nikula
  0 siblings, 0 replies; 29+ messages in thread
From: Jani Nikula @ 2022-08-23 10:09 UTC (permalink / raw)
  To: John Harrison, Intel-GFX; +Cc: DRI-Devel

On Fri, 19 Aug 2022, John Harrison <john.c.harrison@intel.com> wrote:
> On 8/19/2022 03:45, Jani Nikula wrote:
>> On Wed, 27 Jul 2022, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> It is useful to be able to match GuC events to kernel events when
>>> looking at the GuC log. That requires being able to convert GuC
>>> timestamps to kernel time. So, when dumping error captures and/or GuC
>>> logs, include a stamp in both time zones plus the clock frequency.
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_gt_regs.h    |  2 ++
>>>   drivers/gpu/drm/i915/gt/uc/intel_guc.c     | 19 +++++++++++++++++++
>>>   drivers/gpu/drm/i915/gt/uc/intel_guc.h     |  2 ++
>>>   drivers/gpu/drm/i915/gt/uc/intel_guc_log.c |  2 ++
>>>   drivers/gpu/drm/i915/i915_gpu_error.c      | 12 ++++++++++++
>>>   drivers/gpu/drm/i915/i915_gpu_error.h      |  3 +++
>>>   6 files changed, 40 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>>> index 60d6eb5f245b7..fc7979bd91db5 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
>>> @@ -1007,6 +1007,8 @@
>>>   #define   GEN11_LSN_UNSLCVC_GAFS_HALF_CL2_MAXALLOC	(1 << 9)
>>>   #define   GEN11_LSN_UNSLCVC_GAFS_HALF_SF_MAXALLOC	(1 << 7)
>>>   
>>> +#define GUCPMTIMESTAMP				_MMIO(0xc3e8)
>>> +
>>>   #define __GEN9_RCS0_MOCS0			0xc800
>>>   #define GEN9_GFX_MOCS(i)			_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
>>>   #define __GEN9_VCS0_MOCS0			0xc900
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> index 2706a8c650900..ab4aacc516aa4 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>> @@ -389,6 +389,25 @@ void intel_guc_write_params(struct intel_guc *guc)
>>>   	intel_uncore_forcewake_put(uncore, FORCEWAKE_GT);
>>>   }
>>>   
>>> +void intel_guc_dump_time_info(struct intel_guc *guc, struct drm_printer *p)
>>> +{
>>> +	struct intel_gt *gt = guc_to_gt(guc);
>>> +	intel_wakeref_t wakeref;
>>> +	u32 stamp = 0;
>>> +	u64 ktime;
>>> +
>>> +	intel_device_info_print_runtime(RUNTIME_INFO(gt->i915), p);
>> Why does this function print runtime info alone? Seems kind of random,
>> considering what intel_device_info_print_runtime() actually
>> prints. Should it print both device info and runtime info, or nothing at
>> all?
> Hmm. That was added specifically to print the rawclk value (and only for 
> the rawclk value) because that was the frequency that the GuC time stamp 
> register ticks at. I think we later worked out that the CS frequency was 
> more correct. Hence printing gt->clock_frequency lower down. I guess I 
> forgot to go back and remove the rawclk print, though.
>
> So yeah, it can be removed.

Could you r-b the patch [1] doing just that please?

BR,
Jani.

[1] https://patchwork.freedesktop.org/patch/msgid/b395ac4c909042f5daabf29959d8733993545aa2.1660910433.git.jani.nikula@intel.com


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-07-28  2:20 ` [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable John.C.Harrison
  2022-08-15  5:43   ` [Intel-gfx] " Teres Alexis, Alan Previn
@ 2022-08-24  9:01   ` Joonas Lahtinen
       [not found]     ` <4bd7b51a-caf0-d987-c7df-6cfb24f36597@intel.com>
  1 sibling, 1 reply; 29+ messages in thread
From: Joonas Lahtinen @ 2022-08-24  9:01 UTC (permalink / raw)
  To: Intel-GFX, John.C.Harrison; +Cc: DRI-Devel

NACK on this one. Let's get this reverted or fixed to eliminate
new module parameters.

What prevents us just from using the maximum sizes? Or alternatively
we could check the already existing drm.debug variable or anything else
but addding 3 new module parameters.

For future reference, please do Cc maintainers when adding new uAPI
like module parameters.

Regards, Joonas

Quoting John.C.Harrison@Intel.com (2022-07-28 05:20:27)
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The GuC log buffer sizes had to be configured statically at compile
> time. This can be quite troublesome when needing to get larger logs
> out of a released driver. So re-organise the code to allow a boot time
> module parameter override.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  53 ++----
>  .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  14 +-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 176 +++++++++++++++++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +++--
>  drivers/gpu/drm/i915/i915_params.c            |  12 ++
>  drivers/gpu/drm/i915/i915_params.h            |   3 +
>  6 files changed, 226 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index ab4aacc516aa4..01f2705cb94a3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -224,53 +224,22 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc)
>  
>  static u32 guc_ctl_log_params_flags(struct intel_guc *guc)
>  {
> -       u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT;
> -       u32 flags;
> -
> -       #if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0)
> -       #define LOG_UNIT SZ_1M
> -       #define LOG_FLAG GUC_LOG_LOG_ALLOC_UNITS
> -       #else
> -       #define LOG_UNIT SZ_4K
> -       #define LOG_FLAG 0
> -       #endif
> -
> -       #if (((CAPTURE_BUFFER_SIZE) % SZ_1M) == 0)
> -       #define CAPTURE_UNIT SZ_1M
> -       #define CAPTURE_FLAG GUC_LOG_CAPTURE_ALLOC_UNITS
> -       #else
> -       #define CAPTURE_UNIT SZ_4K
> -       #define CAPTURE_FLAG 0
> -       #endif
> -
> -       BUILD_BUG_ON(!CRASH_BUFFER_SIZE);
> -       BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, LOG_UNIT));
> -       BUILD_BUG_ON(!DEBUG_BUFFER_SIZE);
> -       BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, LOG_UNIT));
> -       BUILD_BUG_ON(!CAPTURE_BUFFER_SIZE);
> -       BUILD_BUG_ON(!IS_ALIGNED(CAPTURE_BUFFER_SIZE, CAPTURE_UNIT));
> -
> -       BUILD_BUG_ON((CRASH_BUFFER_SIZE / LOG_UNIT - 1) >
> -                       (GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT));
> -       BUILD_BUG_ON((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) >
> -                       (GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT));
> -       BUILD_BUG_ON((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) >
> -                       (GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT));
> +       struct intel_guc_log *log = &guc->log;
> +       u32 offset, flags;
> +
> +       GEM_BUG_ON(!log->sizes_initialised);
> +
> +       offset = intel_guc_ggtt_offset(guc, log->vma) >> PAGE_SHIFT;
>  
>         flags = GUC_LOG_VALID |
>                 GUC_LOG_NOTIFY_ON_HALF_FULL |
> -               CAPTURE_FLAG |
> -               LOG_FLAG |
> -               ((CRASH_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_CRASH_SHIFT) |
> -               ((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_DEBUG_SHIFT) |
> -               ((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) << GUC_LOG_CAPTURE_SHIFT) |
> +               log->sizes[GUC_LOG_SECTIONS_DEBUG].flag |
> +               log->sizes[GUC_LOG_SECTIONS_CAPTURE].flag |
> +               (log->sizes[GUC_LOG_SECTIONS_CRASH].count << GUC_LOG_CRASH_SHIFT) |
> +               (log->sizes[GUC_LOG_SECTIONS_DEBUG].count << GUC_LOG_DEBUG_SHIFT) |
> +               (log->sizes[GUC_LOG_SECTIONS_CAPTURE].count << GUC_LOG_CAPTURE_SHIFT) |
>                 (offset << GUC_LOG_BUF_ADDR_SHIFT);
>  
> -       #undef LOG_UNIT
> -       #undef LOG_FLAG
> -       #undef CAPTURE_UNIT
> -       #undef CAPTURE_FLAG
> -
>         return flags;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> index b54b7883320b1..d2ac53d4f3b6e 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> @@ -656,16 +656,17 @@ static void check_guc_capture_size(struct intel_guc *guc)
>         struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>         int min_size = guc_capture_output_min_size_est(guc);
>         int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
> +       u32 buffer_size = intel_guc_log_section_size_capture(&guc->log);
>  
>         if (min_size < 0)
>                 drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
>                          min_size);
> -       else if (min_size > CAPTURE_BUFFER_SIZE)
> +       else if (min_size > buffer_size)
>                 drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
> -                        CAPTURE_BUFFER_SIZE, min_size);
> -       else if (spare_size > CAPTURE_BUFFER_SIZE)
> +                        buffer_size, min_size);
> +       else if (spare_size > buffer_size)
>                 drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
> -                          CAPTURE_BUFFER_SIZE, spare_size, min_size);
> +                          buffer_size, spare_size, min_size);
>  }
>  
>  /*
> @@ -1294,7 +1295,8 @@ static void __guc_capture_process_output(struct intel_guc *guc)
>  
>         log_buf_state = guc->log.buf_addr +
>                         (sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
> -       src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
> +       src_data = guc->log.buf_addr +
> +                  intel_guc_get_log_buffer_offset(&guc->log, GUC_CAPTURE_LOG_BUFFER);
>  
>         /*
>          * Make a copy of the state structure, inside GuC log buffer
> @@ -1302,7 +1304,7 @@ static void __guc_capture_process_output(struct intel_guc *guc)
>          * from it multiple times.
>          */
>         memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
> -       buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
> +       buffer_size = intel_guc_get_log_buffer_size(&guc->log, GUC_CAPTURE_LOG_BUFFER);
>         read_offset = log_buf_state_local.read_ptr;
>         write_offset = log_buf_state_local.sampled_write_ptr;
>         full_count = log_buf_state_local.buffer_full_cnt;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index 4722d4b18ed19..890b6853bd609 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -13,8 +13,158 @@
>  #include "intel_guc_capture.h"
>  #include "intel_guc_log.h"
>  
> +#if defined(CONFIG_DRM_I915_DEBUG_GUC)
> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_2M
> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_16M
> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
> +#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_1M
> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_2M
> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
> +#else
> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_8K
> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_64K
> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_2M
> +#endif
> +
>  static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
>  
> +struct guc_log_section {
> +       u32 max;
> +       u32 flag;
> +       u32 default_val;
> +       const char *name;
> +};
> +
> +static s32 scale_log_param(struct intel_guc_log *log, const struct guc_log_section *section,
> +                          s32 param)
> +{
> +       /* -1 means default */
> +       if (param < 0)
> +               return section->default_val;
> +
> +       /* Check for 32-bit overflow */
> +       if (param >= SZ_4K) {
> +               drm_err(&guc_to_gt(log_to_guc(log))->i915->drm, "Size too large for GuC %s log: %dMB!",
> +                       section->name, param);
> +               return section->default_val;
> +       }
> +
> +       /* Param units are 1MB */
> +       return param * SZ_1M;
> +}
> +
> +static void _guc_log_init_sizes(struct intel_guc_log *log)
> +{
> +       struct intel_guc *guc = log_to_guc(log);
> +       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> +       static const struct guc_log_section sections[GUC_LOG_SECTIONS_LIMIT] = {
> +               {
> +                       GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT,
> +                       GUC_LOG_LOG_ALLOC_UNITS,
> +                       GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE,
> +                       "crash dump"
> +               },
> +               {
> +                       GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT,
> +                       GUC_LOG_LOG_ALLOC_UNITS,
> +                       GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE,
> +                       "debug",
> +               },
> +               {
> +                       GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT,
> +                       GUC_LOG_CAPTURE_ALLOC_UNITS,
> +                       GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE,
> +                       "capture",
> +               }
> +       };
> +       s32 params[GUC_LOG_SECTIONS_LIMIT] = {
> +               i915->params.guc_log_size_crash,
> +               i915->params.guc_log_size_debug,
> +               i915->params.guc_log_size_capture,
> +       };
> +       int i;
> +
> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++)
> +               log->sizes[i].bytes = scale_log_param(log, sections + i, params[i]);
> +
> +       /* If debug size > 1MB then bump default crash size to keep the same units */
> +       if (log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes >= SZ_1M &&
> +           (i915->params.guc_log_size_crash == -1) &&
> +           GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE < SZ_1M)
> +               log->sizes[GUC_LOG_SECTIONS_CRASH].bytes = SZ_1M;
> +
> +       /* Prepare the GuC API structure fields: */
> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++) {
> +               /* Convert to correct units */
> +               if ((log->sizes[i].bytes % SZ_1M) == 0) {
> +                       log->sizes[i].units = SZ_1M;
> +                       log->sizes[i].flag = sections[i].flag;
> +               } else {
> +                       log->sizes[i].units = SZ_4K;
> +                       log->sizes[i].flag = 0;
> +               }
> +
> +               if (!IS_ALIGNED(log->sizes[i].bytes, log->sizes[i].units))
> +                       drm_err(&i915->drm, "Mis-aligned GuC log %s size: 0x%X vs 0x%X!",
> +                               sections[i].name, log->sizes[i].bytes, log->sizes[i].units);
> +               log->sizes[i].count = log->sizes[i].bytes / log->sizes[i].units;
> +
> +               if (!log->sizes[i].count) {
> +                       drm_err(&i915->drm, "Zero GuC log %s size!", sections[i].name);
> +               } else {
> +                       /* Size is +1 unit */
> +                       log->sizes[i].count--;
> +               }
> +
> +               /* Clip to field size */
> +               if (log->sizes[i].count > sections[i].max) {
> +                       drm_err(&i915->drm, "GuC log %s size too large: %d vs %d!",
> +                               sections[i].name, log->sizes[i].count + 1, sections[i].max + 1);
> +                       log->sizes[i].count = sections[i].max;
> +               }
> +       }
> +
> +       if (log->sizes[GUC_LOG_SECTIONS_CRASH].units != log->sizes[GUC_LOG_SECTIONS_DEBUG].units) {
> +               drm_err(&i915->drm, "Unit mis-match for GuC log crash and debug sections: %d vs %d!",
> +                       log->sizes[GUC_LOG_SECTIONS_CRASH].units,
> +                       log->sizes[GUC_LOG_SECTIONS_DEBUG].units);
> +               log->sizes[GUC_LOG_SECTIONS_CRASH].units = log->sizes[GUC_LOG_SECTIONS_DEBUG].units;
> +               log->sizes[GUC_LOG_SECTIONS_CRASH].count = 0;
> +       }
> +
> +       log->sizes_initialised = true;
> +}
> +
> +static void guc_log_init_sizes(struct intel_guc_log *log)
> +{
> +       if (log->sizes_initialised)
> +               return;
> +
> +       _guc_log_init_sizes(log);
> +}
> +
> +static u32 intel_guc_log_section_size_crash(struct intel_guc_log *log)
> +{
> +       guc_log_init_sizes(log);
> +
> +       return log->sizes[GUC_LOG_SECTIONS_CRASH].bytes;
> +}
> +
> +static u32 intel_guc_log_section_size_debug(struct intel_guc_log *log)
> +{
> +       guc_log_init_sizes(log);
> +
> +       return log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes;
> +}
> +
> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log)
> +{
> +       guc_log_init_sizes(log);
> +
> +       return log->sizes[GUC_LOG_SECTIONS_CAPTURE].bytes;
> +}
> +
>  static u32 intel_guc_log_size(struct intel_guc_log *log)
>  {
>         /*
> @@ -38,7 +188,10 @@ static u32 intel_guc_log_size(struct intel_guc_log *log)
>          *  |         Capture logs          |
>          *  +===============================+ + CAPTURE_SIZE
>          */
> -       return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
> +       return PAGE_SIZE +
> +               intel_guc_log_section_size_crash(log) +
> +               intel_guc_log_section_size_debug(log) +
> +               intel_guc_log_section_size_capture(log);
>  }
>  
>  /**
> @@ -165,7 +318,8 @@ static void guc_move_to_next_buf(struct intel_guc_log *log)
>         smp_wmb();
>  
>         /* All data has been written, so now move the offset of sub buffer. */
> -       relay_reserve(log->relay.channel, log->vma->obj->base.size - CAPTURE_BUFFER_SIZE);
> +       relay_reserve(log->relay.channel, log->vma->obj->base.size -
> +                                         intel_guc_log_section_size_capture(log));
>  
>         /* Switch to the next sub buffer */
>         relay_flush(log->relay.channel);
> @@ -210,15 +364,16 @@ bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log,
>         return overflow;
>  }
>  
> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
> +                                          enum guc_log_buffer_type type)
>  {
>         switch (type) {
>         case GUC_DEBUG_LOG_BUFFER:
> -               return DEBUG_BUFFER_SIZE;
> +               return intel_guc_log_section_size_debug(log);
>         case GUC_CRASH_DUMP_LOG_BUFFER:
> -               return CRASH_BUFFER_SIZE;
> +               return intel_guc_log_section_size_crash(log);
>         case GUC_CAPTURE_LOG_BUFFER:
> -               return CAPTURE_BUFFER_SIZE;
> +               return intel_guc_log_section_size_capture(log);
>         default:
>                 MISSING_CASE(type);
>         }
> @@ -226,7 +381,8 @@ unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
>         return 0;
>  }
>  
> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log,
> +                                      enum guc_log_buffer_type type)
>  {
>         enum guc_log_buffer_type i;
>         size_t offset = PAGE_SIZE;/* for the log_buffer_states */
> @@ -234,7 +390,7 @@ size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
>         for (i = GUC_DEBUG_LOG_BUFFER; i < GUC_MAX_LOG_BUFFER; ++i) {
>                 if (i == type)
>                         break;
> -               offset += intel_guc_get_log_buffer_size(i);
> +               offset += intel_guc_get_log_buffer_size(log, i);
>         }
>  
>         return offset;
> @@ -285,7 +441,7 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
>                  */
>                 memcpy(&log_buf_state_local, log_buf_state,
>                        sizeof(struct guc_log_buffer_state));
> -               buffer_size = intel_guc_get_log_buffer_size(type);
> +               buffer_size = intel_guc_get_log_buffer_size(log, type);
>                 read_offset = log_buf_state_local.read_ptr;
>                 write_offset = log_buf_state_local.sampled_write_ptr;
>                 full_cnt = log_buf_state_local.buffer_full_cnt;
> @@ -400,7 +556,7 @@ static int guc_log_relay_create(struct intel_guc_log *log)
>           * Keep the size of sub buffers same as shared log buffer
>           * but GuC log-events excludes the error-state-capture logs
>           */
> -       subbuf_size = log->vma->size - CAPTURE_BUFFER_SIZE;
> +       subbuf_size = log->vma->size - intel_guc_log_section_size_capture(log);
>  
>         /*
>          * Store up to 8 snapshots, which is large enough to buffer sufficient
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> index dc9715411d626..02127703be809 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> @@ -15,20 +15,6 @@
>  
>  struct intel_guc;
>  
> -#if defined(CONFIG_DRM_I915_DEBUG_GUC)
> -#define CRASH_BUFFER_SIZE      SZ_2M
> -#define DEBUG_BUFFER_SIZE      SZ_16M
> -#define CAPTURE_BUFFER_SIZE    SZ_4M
> -#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
> -#define CRASH_BUFFER_SIZE      SZ_1M
> -#define DEBUG_BUFFER_SIZE      SZ_2M
> -#define CAPTURE_BUFFER_SIZE    SZ_4M
> -#else
> -#define CRASH_BUFFER_SIZE      SZ_8K
> -#define DEBUG_BUFFER_SIZE      SZ_64K
> -#define CAPTURE_BUFFER_SIZE    SZ_2M
> -#endif
> -
>  /*
>   * While we're using plain log level in i915, GuC controls are much more...
>   * "elaborate"? We have a couple of bits for verbosity, separate bit for actual
> @@ -46,10 +32,30 @@ struct intel_guc;
>  #define GUC_VERBOSITY_TO_LOG_LEVEL(x)  ((x) + 2)
>  #define GUC_LOG_LEVEL_MAX GUC_VERBOSITY_TO_LOG_LEVEL(GUC_LOG_VERBOSITY_MAX)
>  
> +enum {
> +       GUC_LOG_SECTIONS_CRASH,
> +       GUC_LOG_SECTIONS_DEBUG,
> +       GUC_LOG_SECTIONS_CAPTURE,
> +       GUC_LOG_SECTIONS_LIMIT
> +};
> +
>  struct intel_guc_log {
>         u32 level;
> +
> +       /* Allocation settings */
> +       struct {
> +               s32 bytes;      /* Size in bytes */
> +               s32 units;      /* GuC API units - 1MB or 4KB */
> +               s32 count;      /* Number of API units */
> +               u32 flag;       /* GuC API units flag */
> +       } sizes[GUC_LOG_SECTIONS_LIMIT];
> +       bool sizes_initialised;
> +
> +       /* Combined buffer allocation */
>         struct i915_vma *vma;
>         void *buf_addr;
> +
> +       /* RelayFS support */
>         struct {
>                 bool buf_in_use;
>                 bool started;
> @@ -58,6 +64,7 @@ struct intel_guc_log {
>                 struct mutex lock;
>                 u32 full_count;
>         } relay;
> +
>         /* logging related stats */
>         struct {
>                 u32 sampled_overflow;
> @@ -69,8 +76,9 @@ struct intel_guc_log {
>  void intel_guc_log_init_early(struct intel_guc_log *log);
>  bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log, enum guc_log_buffer_type type,
>                                       unsigned int full_cnt);
> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type);
> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type);
> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
> +                                          enum guc_log_buffer_type type);
> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log, enum guc_log_buffer_type type);
>  int intel_guc_log_create(struct intel_guc_log *log);
>  void intel_guc_log_destroy(struct intel_guc_log *log);
>  
> @@ -92,4 +100,6 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p);
>  int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>                        bool dump_load_err);
>  
> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log);
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 6fc475a5db615..06ca5b8221118 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -171,6 +171,18 @@ i915_param_named(guc_log_level, int, 0400,
>         "GuC firmware logging level. Requires GuC to be loaded. "
>         "(-1=auto [default], 0=disable, 1..4=enable with verbosity min..max)");
>  
> +i915_param_named(guc_log_size_crash, int, 0400,
> +       "GuC firmware logging buffer size for crash dumps (in MB)"
> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
> +
> +i915_param_named(guc_log_size_debug, int, 0400,
> +       "GuC firmware logging buffer size for debug logs (in MB)"
> +       "(-1=auto [default], NB: max = 16, other restrictions apply)");
> +
> +i915_param_named(guc_log_size_capture, int, 0400,
> +       "GuC error capture register dump buffer size (in MB)"
> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
> +
>  i915_param_named_unsafe(guc_firmware_path, charp, 0400,
>         "GuC firmware path to use instead of the default one");
>  
> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> index 2733cb6cfe094..f684d1ab87078 100644
> --- a/drivers/gpu/drm/i915/i915_params.h
> +++ b/drivers/gpu/drm/i915/i915_params.h
> @@ -61,6 +61,9 @@ struct drm_printer;
>         param(int, invert_brightness, 0, 0600) \
>         param(int, enable_guc, -1, 0400) \
>         param(int, guc_log_level, -1, 0400) \
> +       param(int, guc_log_size_crash, -1, 0400) \
> +       param(int, guc_log_size_debug, -1, 0400) \
> +       param(int, guc_log_size_capture, -1, 0400) \
>         param(char *, guc_firmware_path, NULL, 0400) \
>         param(char *, huc_firmware_path, NULL, 0400) \
>         param(char *, dmc_firmware_path, NULL, 0400) \
> -- 
> 2.37.1
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
       [not found]     ` <4bd7b51a-caf0-d987-c7df-6cfb24f36597@intel.com>
@ 2022-08-25  7:15       ` Joonas Lahtinen
  2022-08-25 16:31         ` John Harrison
  0 siblings, 1 reply; 29+ messages in thread
From: Joonas Lahtinen @ 2022-08-25  7:15 UTC (permalink / raw)
  To: Intel-GFX, John Harrison; +Cc: DRI-Devel

Quoting John Harrison (2022-08-24 21:45:09)
> On 8/24/2022 02:01, Joonas Lahtinen wrote:
> > NACK on this one. Let's get this reverted or fixed to eliminate
> > new module parameters.
> >
> > What prevents us just from using the maximum sizes? Or alternatively
> > we could check the already existing drm.debug variable or anything else
> > but addding 3 new module parameters.
> We don't want to waste 24MB of memory for all users when 99.999% of them 
> don't care about GuC logs.

It is not exactly wasting memory if it is what is needed to capture
the error logs to properly debug a system. And it's definitely not much
on any modern system where you will have a GPU. You can always leave
the Kconfig options in place for the cases where it matters.

On the other hand, if 99.999% don't need this, then it could just stay
as a kernel config time option as well?

> We also don't want to tie the GuC logging buffer size to the DRM 
> debugging output. Enabling kernel debug output can change timings and 
> prevent the issue that one is trying to capture in the GuC log. And it 
> seems unlikely we could add an entire new top level DRM debug flag just 
> for an internal i915 only log buffer size. Plus drm.debug is explicitly 
> stated as being dynamically changeable via sysfs at any time. The GuC 
> log buffer size cannot be changed without reloading the i915 driver. Or 
> at least, not without reloading the GuC, but we definitely don't want to 
> create a UAPI for arbitrarily reloading the GuC.
> 
> We can make these parameters 'unsafe' so that they taint the kernel if 
> used. But this is exactly what module parameters are for - configuration 
> that we don't want to hardcode as CONFIG options but which must be set 
> at module load time.

It's better to have sane defaults. And if somebody wants something
strange, they can have a Kconfig behind EXPERT option. But even then
there should really be need for it.

So for now, let's get the module parameters reverted and go with
reasonable default buffer sizes when GuC is enabled. The compile time
options can be left in place.

Thank you in advance.

Regards, Joonas

> 
> John.
> 
> >
> > For future reference, please do Cc maintainers when adding new uAPI
> > like module parameters.
> >
> > Regards, Joonas
> >
> > Quoting John.C.Harrison@Intel.com (2022-07-28 05:20:27)
> >> From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >> The GuC log buffer sizes had to be configured statically at compile
> >> time. This can be quite troublesome when needing to get larger logs
> >> out of a released driver. So re-organise the code to allow a boot time
> >> module parameter override.
> >>
> >> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  53 ++----
> >>   .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  14 +-
> >>   drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 176 +++++++++++++++++-
> >>   drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +++--
> >>   drivers/gpu/drm/i915/i915_params.c            |  12 ++
> >>   drivers/gpu/drm/i915/i915_params.h            |   3 +
> >>   6 files changed, 226 insertions(+), 74 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >> index ab4aacc516aa4..01f2705cb94a3 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >> @@ -224,53 +224,22 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc)
> >>   
> >>   static u32 guc_ctl_log_params_flags(struct intel_guc *guc)
> >>   {
> >> -       u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT;
> >> -       u32 flags;
> >> -
> >> -       #if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0)
> >> -       #define LOG_UNIT SZ_1M
> >> -       #define LOG_FLAG GUC_LOG_LOG_ALLOC_UNITS
> >> -       #else
> >> -       #define LOG_UNIT SZ_4K
> >> -       #define LOG_FLAG 0
> >> -       #endif
> >> -
> >> -       #if (((CAPTURE_BUFFER_SIZE) % SZ_1M) == 0)
> >> -       #define CAPTURE_UNIT SZ_1M
> >> -       #define CAPTURE_FLAG GUC_LOG_CAPTURE_ALLOC_UNITS
> >> -       #else
> >> -       #define CAPTURE_UNIT SZ_4K
> >> -       #define CAPTURE_FLAG 0
> >> -       #endif
> >> -
> >> -       BUILD_BUG_ON(!CRASH_BUFFER_SIZE);
> >> -       BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, LOG_UNIT));
> >> -       BUILD_BUG_ON(!DEBUG_BUFFER_SIZE);
> >> -       BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, LOG_UNIT));
> >> -       BUILD_BUG_ON(!CAPTURE_BUFFER_SIZE);
> >> -       BUILD_BUG_ON(!IS_ALIGNED(CAPTURE_BUFFER_SIZE, CAPTURE_UNIT));
> >> -
> >> -       BUILD_BUG_ON((CRASH_BUFFER_SIZE / LOG_UNIT - 1) >
> >> -                       (GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT));
> >> -       BUILD_BUG_ON((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) >
> >> -                       (GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT));
> >> -       BUILD_BUG_ON((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) >
> >> -                       (GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT));
> >> +       struct intel_guc_log *log = &guc->log;
> >> +       u32 offset, flags;
> >> +
> >> +       GEM_BUG_ON(!log->sizes_initialised);
> >> +
> >> +       offset = intel_guc_ggtt_offset(guc, log->vma) >> PAGE_SHIFT;
> >>   
> >>          flags = GUC_LOG_VALID |
> >>                  GUC_LOG_NOTIFY_ON_HALF_FULL |
> >> -               CAPTURE_FLAG |
> >> -               LOG_FLAG |
> >> -               ((CRASH_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_CRASH_SHIFT) |
> >> -               ((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_DEBUG_SHIFT) |
> >> -               ((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) << GUC_LOG_CAPTURE_SHIFT) |
> >> +               log->sizes[GUC_LOG_SECTIONS_DEBUG].flag |
> >> +               log->sizes[GUC_LOG_SECTIONS_CAPTURE].flag |
> >> +               (log->sizes[GUC_LOG_SECTIONS_CRASH].count << GUC_LOG_CRASH_SHIFT) |
> >> +               (log->sizes[GUC_LOG_SECTIONS_DEBUG].count << GUC_LOG_DEBUG_SHIFT) |
> >> +               (log->sizes[GUC_LOG_SECTIONS_CAPTURE].count << GUC_LOG_CAPTURE_SHIFT) |
> >>                  (offset << GUC_LOG_BUF_ADDR_SHIFT);
> >>   
> >> -       #undef LOG_UNIT
> >> -       #undef LOG_FLAG
> >> -       #undef CAPTURE_UNIT
> >> -       #undef CAPTURE_FLAG
> >> -
> >>          return flags;
> >>   }
> >>   
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> >> index b54b7883320b1..d2ac53d4f3b6e 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> >> @@ -656,16 +656,17 @@ static void check_guc_capture_size(struct intel_guc *guc)
> >>          struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> >>          int min_size = guc_capture_output_min_size_est(guc);
> >>          int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
> >> +       u32 buffer_size = intel_guc_log_section_size_capture(&guc->log);
> >>   
> >>          if (min_size < 0)
> >>                  drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
> >>                           min_size);
> >> -       else if (min_size > CAPTURE_BUFFER_SIZE)
> >> +       else if (min_size > buffer_size)
> >>                  drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
> >> -                        CAPTURE_BUFFER_SIZE, min_size);
> >> -       else if (spare_size > CAPTURE_BUFFER_SIZE)
> >> +                        buffer_size, min_size);
> >> +       else if (spare_size > buffer_size)
> >>                  drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
> >> -                          CAPTURE_BUFFER_SIZE, spare_size, min_size);
> >> +                          buffer_size, spare_size, min_size);
> >>   }
> >>   
> >>   /*
> >> @@ -1294,7 +1295,8 @@ static void __guc_capture_process_output(struct intel_guc *guc)
> >>   
> >>          log_buf_state = guc->log.buf_addr +
> >>                          (sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
> >> -       src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
> >> +       src_data = guc->log.buf_addr +
> >> +                  intel_guc_get_log_buffer_offset(&guc->log, GUC_CAPTURE_LOG_BUFFER);
> >>   
> >>          /*
> >>           * Make a copy of the state structure, inside GuC log buffer
> >> @@ -1302,7 +1304,7 @@ static void __guc_capture_process_output(struct intel_guc *guc)
> >>           * from it multiple times.
> >>           */
> >>          memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
> >> -       buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
> >> +       buffer_size = intel_guc_get_log_buffer_size(&guc->log, GUC_CAPTURE_LOG_BUFFER);
> >>          read_offset = log_buf_state_local.read_ptr;
> >>          write_offset = log_buf_state_local.sampled_write_ptr;
> >>          full_count = log_buf_state_local.buffer_full_cnt;
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> >> index 4722d4b18ed19..890b6853bd609 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> >> @@ -13,8 +13,158 @@
> >>   #include "intel_guc_capture.h"
> >>   #include "intel_guc_log.h"
> >>   
> >> +#if defined(CONFIG_DRM_I915_DEBUG_GUC)
> >> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_2M
> >> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_16M
> >> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
> >> +#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
> >> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_1M
> >> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_2M
> >> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
> >> +#else
> >> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_8K
> >> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_64K
> >> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_2M
> >> +#endif
> >> +
> >>   static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
> >>   
> >> +struct guc_log_section {
> >> +       u32 max;
> >> +       u32 flag;
> >> +       u32 default_val;
> >> +       const char *name;
> >> +};
> >> +
> >> +static s32 scale_log_param(struct intel_guc_log *log, const struct guc_log_section *section,
> >> +                          s32 param)
> >> +{
> >> +       /* -1 means default */
> >> +       if (param < 0)
> >> +               return section->default_val;
> >> +
> >> +       /* Check for 32-bit overflow */
> >> +       if (param >= SZ_4K) {
> >> +               drm_err(&guc_to_gt(log_to_guc(log))->i915->drm, "Size too large for GuC %s log: %dMB!",
> >> +                       section->name, param);
> >> +               return section->default_val;
> >> +       }
> >> +
> >> +       /* Param units are 1MB */
> >> +       return param * SZ_1M;
> >> +}
> >> +
> >> +static void _guc_log_init_sizes(struct intel_guc_log *log)
> >> +{
> >> +       struct intel_guc *guc = log_to_guc(log);
> >> +       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> >> +       static const struct guc_log_section sections[GUC_LOG_SECTIONS_LIMIT] = {
> >> +               {
> >> +                       GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT,
> >> +                       GUC_LOG_LOG_ALLOC_UNITS,
> >> +                       GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE,
> >> +                       "crash dump"
> >> +               },
> >> +               {
> >> +                       GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT,
> >> +                       GUC_LOG_LOG_ALLOC_UNITS,
> >> +                       GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE,
> >> +                       "debug",
> >> +               },
> >> +               {
> >> +                       GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT,
> >> +                       GUC_LOG_CAPTURE_ALLOC_UNITS,
> >> +                       GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE,
> >> +                       "capture",
> >> +               }
> >> +       };
> >> +       s32 params[GUC_LOG_SECTIONS_LIMIT] = {
> >> +               i915->params.guc_log_size_crash,
> >> +               i915->params.guc_log_size_debug,
> >> +               i915->params.guc_log_size_capture,
> >> +       };
> >> +       int i;
> >> +
> >> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++)
> >> +               log->sizes[i].bytes = scale_log_param(log, sections + i, params[i]);
> >> +
> >> +       /* If debug size > 1MB then bump default crash size to keep the same units */
> >> +       if (log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes >= SZ_1M &&
> >> +           (i915->params.guc_log_size_crash == -1) &&
> >> +           GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE < SZ_1M)
> >> +               log->sizes[GUC_LOG_SECTIONS_CRASH].bytes = SZ_1M;
> >> +
> >> +       /* Prepare the GuC API structure fields: */
> >> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++) {
> >> +               /* Convert to correct units */
> >> +               if ((log->sizes[i].bytes % SZ_1M) == 0) {
> >> +                       log->sizes[i].units = SZ_1M;
> >> +                       log->sizes[i].flag = sections[i].flag;
> >> +               } else {
> >> +                       log->sizes[i].units = SZ_4K;
> >> +                       log->sizes[i].flag = 0;
> >> +               }
> >> +
> >> +               if (!IS_ALIGNED(log->sizes[i].bytes, log->sizes[i].units))
> >> +                       drm_err(&i915->drm, "Mis-aligned GuC log %s size: 0x%X vs 0x%X!",
> >> +                               sections[i].name, log->sizes[i].bytes, log->sizes[i].units);
> >> +               log->sizes[i].count = log->sizes[i].bytes / log->sizes[i].units;
> >> +
> >> +               if (!log->sizes[i].count) {
> >> +                       drm_err(&i915->drm, "Zero GuC log %s size!", sections[i].name);
> >> +               } else {
> >> +                       /* Size is +1 unit */
> >> +                       log->sizes[i].count--;
> >> +               }
> >> +
> >> +               /* Clip to field size */
> >> +               if (log->sizes[i].count > sections[i].max) {
> >> +                       drm_err(&i915->drm, "GuC log %s size too large: %d vs %d!",
> >> +                               sections[i].name, log->sizes[i].count + 1, sections[i].max + 1);
> >> +                       log->sizes[i].count = sections[i].max;
> >> +               }
> >> +       }
> >> +
> >> +       if (log->sizes[GUC_LOG_SECTIONS_CRASH].units != log->sizes[GUC_LOG_SECTIONS_DEBUG].units) {
> >> +               drm_err(&i915->drm, "Unit mis-match for GuC log crash and debug sections: %d vs %d!",
> >> +                       log->sizes[GUC_LOG_SECTIONS_CRASH].units,
> >> +                       log->sizes[GUC_LOG_SECTIONS_DEBUG].units);
> >> +               log->sizes[GUC_LOG_SECTIONS_CRASH].units = log->sizes[GUC_LOG_SECTIONS_DEBUG].units;
> >> +               log->sizes[GUC_LOG_SECTIONS_CRASH].count = 0;
> >> +       }
> >> +
> >> +       log->sizes_initialised = true;
> >> +}
> >> +
> >> +static void guc_log_init_sizes(struct intel_guc_log *log)
> >> +{
> >> +       if (log->sizes_initialised)
> >> +               return;
> >> +
> >> +       _guc_log_init_sizes(log);
> >> +}
> >> +
> >> +static u32 intel_guc_log_section_size_crash(struct intel_guc_log *log)
> >> +{
> >> +       guc_log_init_sizes(log);
> >> +
> >> +       return log->sizes[GUC_LOG_SECTIONS_CRASH].bytes;
> >> +}
> >> +
> >> +static u32 intel_guc_log_section_size_debug(struct intel_guc_log *log)
> >> +{
> >> +       guc_log_init_sizes(log);
> >> +
> >> +       return log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes;
> >> +}
> >> +
> >> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log)
> >> +{
> >> +       guc_log_init_sizes(log);
> >> +
> >> +       return log->sizes[GUC_LOG_SECTIONS_CAPTURE].bytes;
> >> +}
> >> +
> >>   static u32 intel_guc_log_size(struct intel_guc_log *log)
> >>   {
> >>          /*
> >> @@ -38,7 +188,10 @@ static u32 intel_guc_log_size(struct intel_guc_log *log)
> >>           *  |         Capture logs          |
> >>           *  +===============================+ + CAPTURE_SIZE
> >>           */
> >> -       return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
> >> +       return PAGE_SIZE +
> >> +               intel_guc_log_section_size_crash(log) +
> >> +               intel_guc_log_section_size_debug(log) +
> >> +               intel_guc_log_section_size_capture(log);
> >>   }
> >>   
> >>   /**
> >> @@ -165,7 +318,8 @@ static void guc_move_to_next_buf(struct intel_guc_log *log)
> >>          smp_wmb();
> >>   
> >>          /* All data has been written, so now move the offset of sub buffer. */
> >> -       relay_reserve(log->relay.channel, log->vma->obj->base.size - CAPTURE_BUFFER_SIZE);
> >> +       relay_reserve(log->relay.channel, log->vma->obj->base.size -
> >> +                                         intel_guc_log_section_size_capture(log));
> >>   
> >>          /* Switch to the next sub buffer */
> >>          relay_flush(log->relay.channel);
> >> @@ -210,15 +364,16 @@ bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log,
> >>          return overflow;
> >>   }
> >>   
> >> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
> >> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
> >> +                                          enum guc_log_buffer_type type)
> >>   {
> >>          switch (type) {
> >>          case GUC_DEBUG_LOG_BUFFER:
> >> -               return DEBUG_BUFFER_SIZE;
> >> +               return intel_guc_log_section_size_debug(log);
> >>          case GUC_CRASH_DUMP_LOG_BUFFER:
> >> -               return CRASH_BUFFER_SIZE;
> >> +               return intel_guc_log_section_size_crash(log);
> >>          case GUC_CAPTURE_LOG_BUFFER:
> >> -               return CAPTURE_BUFFER_SIZE;
> >> +               return intel_guc_log_section_size_capture(log);
> >>          default:
> >>                  MISSING_CASE(type);
> >>          }
> >> @@ -226,7 +381,8 @@ unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
> >>          return 0;
> >>   }
> >>   
> >> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
> >> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log,
> >> +                                      enum guc_log_buffer_type type)
> >>   {
> >>          enum guc_log_buffer_type i;
> >>          size_t offset = PAGE_SIZE;/* for the log_buffer_states */
> >> @@ -234,7 +390,7 @@ size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
> >>          for (i = GUC_DEBUG_LOG_BUFFER; i < GUC_MAX_LOG_BUFFER; ++i) {
> >>                  if (i == type)
> >>                          break;
> >> -               offset += intel_guc_get_log_buffer_size(i);
> >> +               offset += intel_guc_get_log_buffer_size(log, i);
> >>          }
> >>   
> >>          return offset;
> >> @@ -285,7 +441,7 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
> >>                   */
> >>                  memcpy(&log_buf_state_local, log_buf_state,
> >>                         sizeof(struct guc_log_buffer_state));
> >> -               buffer_size = intel_guc_get_log_buffer_size(type);
> >> +               buffer_size = intel_guc_get_log_buffer_size(log, type);
> >>                  read_offset = log_buf_state_local.read_ptr;
> >>                  write_offset = log_buf_state_local.sampled_write_ptr;
> >>                  full_cnt = log_buf_state_local.buffer_full_cnt;
> >> @@ -400,7 +556,7 @@ static int guc_log_relay_create(struct intel_guc_log *log)
> >>            * Keep the size of sub buffers same as shared log buffer
> >>            * but GuC log-events excludes the error-state-capture logs
> >>            */
> >> -       subbuf_size = log->vma->size - CAPTURE_BUFFER_SIZE;
> >> +       subbuf_size = log->vma->size - intel_guc_log_section_size_capture(log);
> >>   
> >>          /*
> >>           * Store up to 8 snapshots, which is large enough to buffer sufficient
> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> >> index dc9715411d626..02127703be809 100644
> >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> >> @@ -15,20 +15,6 @@
> >>   
> >>   struct intel_guc;
> >>   
> >> -#if defined(CONFIG_DRM_I915_DEBUG_GUC)
> >> -#define CRASH_BUFFER_SIZE      SZ_2M
> >> -#define DEBUG_BUFFER_SIZE      SZ_16M
> >> -#define CAPTURE_BUFFER_SIZE    SZ_4M
> >> -#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
> >> -#define CRASH_BUFFER_SIZE      SZ_1M
> >> -#define DEBUG_BUFFER_SIZE      SZ_2M
> >> -#define CAPTURE_BUFFER_SIZE    SZ_4M
> >> -#else
> >> -#define CRASH_BUFFER_SIZE      SZ_8K
> >> -#define DEBUG_BUFFER_SIZE      SZ_64K
> >> -#define CAPTURE_BUFFER_SIZE    SZ_2M
> >> -#endif
> >> -
> >>   /*
> >>    * While we're using plain log level in i915, GuC controls are much more...
> >>    * "elaborate"? We have a couple of bits for verbosity, separate bit for actual
> >> @@ -46,10 +32,30 @@ struct intel_guc;
> >>   #define GUC_VERBOSITY_TO_LOG_LEVEL(x)  ((x) + 2)
> >>   #define GUC_LOG_LEVEL_MAX GUC_VERBOSITY_TO_LOG_LEVEL(GUC_LOG_VERBOSITY_MAX)
> >>   
> >> +enum {
> >> +       GUC_LOG_SECTIONS_CRASH,
> >> +       GUC_LOG_SECTIONS_DEBUG,
> >> +       GUC_LOG_SECTIONS_CAPTURE,
> >> +       GUC_LOG_SECTIONS_LIMIT
> >> +};
> >> +
> >>   struct intel_guc_log {
> >>          u32 level;
> >> +
> >> +       /* Allocation settings */
> >> +       struct {
> >> +               s32 bytes;      /* Size in bytes */
> >> +               s32 units;      /* GuC API units - 1MB or 4KB */
> >> +               s32 count;      /* Number of API units */
> >> +               u32 flag;       /* GuC API units flag */
> >> +       } sizes[GUC_LOG_SECTIONS_LIMIT];
> >> +       bool sizes_initialised;
> >> +
> >> +       /* Combined buffer allocation */
> >>          struct i915_vma *vma;
> >>          void *buf_addr;
> >> +
> >> +       /* RelayFS support */
> >>          struct {
> >>                  bool buf_in_use;
> >>                  bool started;
> >> @@ -58,6 +64,7 @@ struct intel_guc_log {
> >>                  struct mutex lock;
> >>                  u32 full_count;
> >>          } relay;
> >> +
> >>          /* logging related stats */
> >>          struct {
> >>                  u32 sampled_overflow;
> >> @@ -69,8 +76,9 @@ struct intel_guc_log {
> >>   void intel_guc_log_init_early(struct intel_guc_log *log);
> >>   bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log, enum guc_log_buffer_type type,
> >>                                        unsigned int full_cnt);
> >> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type);
> >> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type);
> >> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
> >> +                                          enum guc_log_buffer_type type);
> >> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log, enum guc_log_buffer_type type);
> >>   int intel_guc_log_create(struct intel_guc_log *log);
> >>   void intel_guc_log_destroy(struct intel_guc_log *log);
> >>   
> >> @@ -92,4 +100,6 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p);
> >>   int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
> >>                         bool dump_load_err);
> >>   
> >> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log);
> >> +
> >>   #endif
> >> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> >> index 6fc475a5db615..06ca5b8221118 100644
> >> --- a/drivers/gpu/drm/i915/i915_params.c
> >> +++ b/drivers/gpu/drm/i915/i915_params.c
> >> @@ -171,6 +171,18 @@ i915_param_named(guc_log_level, int, 0400,
> >>          "GuC firmware logging level. Requires GuC to be loaded. "
> >>          "(-1=auto [default], 0=disable, 1..4=enable with verbosity min..max)");
> >>   
> >> +i915_param_named(guc_log_size_crash, int, 0400,
> >> +       "GuC firmware logging buffer size for crash dumps (in MB)"
> >> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
> >> +
> >> +i915_param_named(guc_log_size_debug, int, 0400,
> >> +       "GuC firmware logging buffer size for debug logs (in MB)"
> >> +       "(-1=auto [default], NB: max = 16, other restrictions apply)");
> >> +
> >> +i915_param_named(guc_log_size_capture, int, 0400,
> >> +       "GuC error capture register dump buffer size (in MB)"
> >> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
> >> +
> >>   i915_param_named_unsafe(guc_firmware_path, charp, 0400,
> >>          "GuC firmware path to use instead of the default one");
> >>   
> >> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> >> index 2733cb6cfe094..f684d1ab87078 100644
> >> --- a/drivers/gpu/drm/i915/i915_params.h
> >> +++ b/drivers/gpu/drm/i915/i915_params.h
> >> @@ -61,6 +61,9 @@ struct drm_printer;
> >>          param(int, invert_brightness, 0, 0600) \
> >>          param(int, enable_guc, -1, 0400) \
> >>          param(int, guc_log_level, -1, 0400) \
> >> +       param(int, guc_log_size_crash, -1, 0400) \
> >> +       param(int, guc_log_size_debug, -1, 0400) \
> >> +       param(int, guc_log_size_capture, -1, 0400) \
> >>          param(char *, guc_firmware_path, NULL, 0400) \
> >>          param(char *, huc_firmware_path, NULL, 0400) \
> >>          param(char *, dmc_firmware_path, NULL, 0400) \
> >> -- 
> >> 2.37.1
> >>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-08-25  7:15       ` Joonas Lahtinen
@ 2022-08-25 16:31         ` John Harrison
  2022-08-26  6:23           ` Joonas Lahtinen
  0 siblings, 1 reply; 29+ messages in thread
From: John Harrison @ 2022-08-25 16:31 UTC (permalink / raw)
  To: Joonas Lahtinen, Intel-GFX; +Cc: DRI-Devel

On 8/25/2022 00:15, Joonas Lahtinen wrote:
> Quoting John Harrison (2022-08-24 21:45:09)
>> On 8/24/2022 02:01, Joonas Lahtinen wrote:
>>> NACK on this one. Let's get this reverted or fixed to eliminate
>>> new module parameters.
>>>
>>> What prevents us just from using the maximum sizes? Or alternatively
>>> we could check the already existing drm.debug variable or anything else
>>> but addding 3 new module parameters.
>> We don't want to waste 24MB of memory for all users when 99.999% of them
>> don't care about GuC logs.
> It is not exactly wasting memory if it is what is needed to capture
> the error logs to properly debug a system. And it's definitely not much
> on any modern system where you will have a GPU. You can always leave
> the Kconfig options in place for the cases where it matters.
>
> On the other hand, if 99.999% don't need this, then it could just stay
> as a kernel config time option as well?
No. The point is that we need to able to ask customers to increase the 
log size, repro an issue and send us the results. All on a pre-installed 
system where they do not have the option to build a custom kernel. 
Either we always allocate the maximum and waste the memory for all end 
users or we have a runtime configuration option. Compile time is not 
acceptable for some important customers/situations.

>
>> We also don't want to tie the GuC logging buffer size to the DRM
>> debugging output. Enabling kernel debug output can change timings and
>> prevent the issue that one is trying to capture in the GuC log. And it
>> seems unlikely we could add an entire new top level DRM debug flag just
>> for an internal i915 only log buffer size. Plus drm.debug is explicitly
>> stated as being dynamically changeable via sysfs at any time. The GuC
>> log buffer size cannot be changed without reloading the i915 driver. Or
>> at least, not without reloading the GuC, but we definitely don't want to
>> create a UAPI for arbitrarily reloading the GuC.
>>
>> We can make these parameters 'unsafe' so that they taint the kernel if
>> used. But this is exactly what module parameters are for - configuration
>> that we don't want to hardcode as CONFIG options but which must be set
>> at module load time.
> It's better to have sane defaults. And if somebody wants something
> strange, they can have a Kconfig behind EXPERT option. But even then
> there should really be need for it.
Define sane.

Sane for most users is to not allocate 24MB of memory for an internal 
debug only buffer they will never use. And which completely swamps any 
error capture log with the ASCII encoding of said buffer.

But as above, we need a way to (very occasionally) get larger GuC logs 
from customers without rebuilding the kernel.

John.

>
> So for now, let's get the module parameters reverted and go with
> reasonable default buffer sizes when GuC is enabled. The compile time
> options can be left in place.
>
> Thank you in advance.
>
> Regards, Joonas
>
>> John.
>>
>>> For future reference, please do Cc maintainers when adding new uAPI
>>> like module parameters.
>>>
>>> Regards, Joonas
>>>
>>> Quoting John.C.Harrison@Intel.com (2022-07-28 05:20:27)
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> The GuC log buffer sizes had to be configured statically at compile
>>>> time. This can be quite troublesome when needing to get larger logs
>>>> out of a released driver. So re-organise the code to allow a boot time
>>>> module parameter override.
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  53 ++----
>>>>    .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  14 +-
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 176 +++++++++++++++++-
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +++--
>>>>    drivers/gpu/drm/i915/i915_params.c            |  12 ++
>>>>    drivers/gpu/drm/i915/i915_params.h            |   3 +
>>>>    6 files changed, 226 insertions(+), 74 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>>> index ab4aacc516aa4..01f2705cb94a3 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>>>> @@ -224,53 +224,22 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc)
>>>>    
>>>>    static u32 guc_ctl_log_params_flags(struct intel_guc *guc)
>>>>    {
>>>> -       u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT;
>>>> -       u32 flags;
>>>> -
>>>> -       #if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0)
>>>> -       #define LOG_UNIT SZ_1M
>>>> -       #define LOG_FLAG GUC_LOG_LOG_ALLOC_UNITS
>>>> -       #else
>>>> -       #define LOG_UNIT SZ_4K
>>>> -       #define LOG_FLAG 0
>>>> -       #endif
>>>> -
>>>> -       #if (((CAPTURE_BUFFER_SIZE) % SZ_1M) == 0)
>>>> -       #define CAPTURE_UNIT SZ_1M
>>>> -       #define CAPTURE_FLAG GUC_LOG_CAPTURE_ALLOC_UNITS
>>>> -       #else
>>>> -       #define CAPTURE_UNIT SZ_4K
>>>> -       #define CAPTURE_FLAG 0
>>>> -       #endif
>>>> -
>>>> -       BUILD_BUG_ON(!CRASH_BUFFER_SIZE);
>>>> -       BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, LOG_UNIT));
>>>> -       BUILD_BUG_ON(!DEBUG_BUFFER_SIZE);
>>>> -       BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, LOG_UNIT));
>>>> -       BUILD_BUG_ON(!CAPTURE_BUFFER_SIZE);
>>>> -       BUILD_BUG_ON(!IS_ALIGNED(CAPTURE_BUFFER_SIZE, CAPTURE_UNIT));
>>>> -
>>>> -       BUILD_BUG_ON((CRASH_BUFFER_SIZE / LOG_UNIT - 1) >
>>>> -                       (GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT));
>>>> -       BUILD_BUG_ON((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) >
>>>> -                       (GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT));
>>>> -       BUILD_BUG_ON((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) >
>>>> -                       (GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT));
>>>> +       struct intel_guc_log *log = &guc->log;
>>>> +       u32 offset, flags;
>>>> +
>>>> +       GEM_BUG_ON(!log->sizes_initialised);
>>>> +
>>>> +       offset = intel_guc_ggtt_offset(guc, log->vma) >> PAGE_SHIFT;
>>>>    
>>>>           flags = GUC_LOG_VALID |
>>>>                   GUC_LOG_NOTIFY_ON_HALF_FULL |
>>>> -               CAPTURE_FLAG |
>>>> -               LOG_FLAG |
>>>> -               ((CRASH_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_CRASH_SHIFT) |
>>>> -               ((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_DEBUG_SHIFT) |
>>>> -               ((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) << GUC_LOG_CAPTURE_SHIFT) |
>>>> +               log->sizes[GUC_LOG_SECTIONS_DEBUG].flag |
>>>> +               log->sizes[GUC_LOG_SECTIONS_CAPTURE].flag |
>>>> +               (log->sizes[GUC_LOG_SECTIONS_CRASH].count << GUC_LOG_CRASH_SHIFT) |
>>>> +               (log->sizes[GUC_LOG_SECTIONS_DEBUG].count << GUC_LOG_DEBUG_SHIFT) |
>>>> +               (log->sizes[GUC_LOG_SECTIONS_CAPTURE].count << GUC_LOG_CAPTURE_SHIFT) |
>>>>                   (offset << GUC_LOG_BUF_ADDR_SHIFT);
>>>>    
>>>> -       #undef LOG_UNIT
>>>> -       #undef LOG_FLAG
>>>> -       #undef CAPTURE_UNIT
>>>> -       #undef CAPTURE_FLAG
>>>> -
>>>>           return flags;
>>>>    }
>>>>    
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
>>>> index b54b7883320b1..d2ac53d4f3b6e 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
>>>> @@ -656,16 +656,17 @@ static void check_guc_capture_size(struct intel_guc *guc)
>>>>           struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>>>>           int min_size = guc_capture_output_min_size_est(guc);
>>>>           int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
>>>> +       u32 buffer_size = intel_guc_log_section_size_capture(&guc->log);
>>>>    
>>>>           if (min_size < 0)
>>>>                   drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
>>>>                            min_size);
>>>> -       else if (min_size > CAPTURE_BUFFER_SIZE)
>>>> +       else if (min_size > buffer_size)
>>>>                   drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
>>>> -                        CAPTURE_BUFFER_SIZE, min_size);
>>>> -       else if (spare_size > CAPTURE_BUFFER_SIZE)
>>>> +                        buffer_size, min_size);
>>>> +       else if (spare_size > buffer_size)
>>>>                   drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
>>>> -                          CAPTURE_BUFFER_SIZE, spare_size, min_size);
>>>> +                          buffer_size, spare_size, min_size);
>>>>    }
>>>>    
>>>>    /*
>>>> @@ -1294,7 +1295,8 @@ static void __guc_capture_process_output(struct intel_guc *guc)
>>>>    
>>>>           log_buf_state = guc->log.buf_addr +
>>>>                           (sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
>>>> -       src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
>>>> +       src_data = guc->log.buf_addr +
>>>> +                  intel_guc_get_log_buffer_offset(&guc->log, GUC_CAPTURE_LOG_BUFFER);
>>>>    
>>>>           /*
>>>>            * Make a copy of the state structure, inside GuC log buffer
>>>> @@ -1302,7 +1304,7 @@ static void __guc_capture_process_output(struct intel_guc *guc)
>>>>            * from it multiple times.
>>>>            */
>>>>           memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
>>>> -       buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
>>>> +       buffer_size = intel_guc_get_log_buffer_size(&guc->log, GUC_CAPTURE_LOG_BUFFER);
>>>>           read_offset = log_buf_state_local.read_ptr;
>>>>           write_offset = log_buf_state_local.sampled_write_ptr;
>>>>           full_count = log_buf_state_local.buffer_full_cnt;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>>>> index 4722d4b18ed19..890b6853bd609 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
>>>> @@ -13,8 +13,158 @@
>>>>    #include "intel_guc_capture.h"
>>>>    #include "intel_guc_log.h"
>>>>    
>>>> +#if defined(CONFIG_DRM_I915_DEBUG_GUC)
>>>> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_2M
>>>> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_16M
>>>> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
>>>> +#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
>>>> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_1M
>>>> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_2M
>>>> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
>>>> +#else
>>>> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_8K
>>>> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_64K
>>>> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_2M
>>>> +#endif
>>>> +
>>>>    static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
>>>>    
>>>> +struct guc_log_section {
>>>> +       u32 max;
>>>> +       u32 flag;
>>>> +       u32 default_val;
>>>> +       const char *name;
>>>> +};
>>>> +
>>>> +static s32 scale_log_param(struct intel_guc_log *log, const struct guc_log_section *section,
>>>> +                          s32 param)
>>>> +{
>>>> +       /* -1 means default */
>>>> +       if (param < 0)
>>>> +               return section->default_val;
>>>> +
>>>> +       /* Check for 32-bit overflow */
>>>> +       if (param >= SZ_4K) {
>>>> +               drm_err(&guc_to_gt(log_to_guc(log))->i915->drm, "Size too large for GuC %s log: %dMB!",
>>>> +                       section->name, param);
>>>> +               return section->default_val;
>>>> +       }
>>>> +
>>>> +       /* Param units are 1MB */
>>>> +       return param * SZ_1M;
>>>> +}
>>>> +
>>>> +static void _guc_log_init_sizes(struct intel_guc_log *log)
>>>> +{
>>>> +       struct intel_guc *guc = log_to_guc(log);
>>>> +       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
>>>> +       static const struct guc_log_section sections[GUC_LOG_SECTIONS_LIMIT] = {
>>>> +               {
>>>> +                       GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT,
>>>> +                       GUC_LOG_LOG_ALLOC_UNITS,
>>>> +                       GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE,
>>>> +                       "crash dump"
>>>> +               },
>>>> +               {
>>>> +                       GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT,
>>>> +                       GUC_LOG_LOG_ALLOC_UNITS,
>>>> +                       GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE,
>>>> +                       "debug",
>>>> +               },
>>>> +               {
>>>> +                       GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT,
>>>> +                       GUC_LOG_CAPTURE_ALLOC_UNITS,
>>>> +                       GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE,
>>>> +                       "capture",
>>>> +               }
>>>> +       };
>>>> +       s32 params[GUC_LOG_SECTIONS_LIMIT] = {
>>>> +               i915->params.guc_log_size_crash,
>>>> +               i915->params.guc_log_size_debug,
>>>> +               i915->params.guc_log_size_capture,
>>>> +       };
>>>> +       int i;
>>>> +
>>>> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++)
>>>> +               log->sizes[i].bytes = scale_log_param(log, sections + i, params[i]);
>>>> +
>>>> +       /* If debug size > 1MB then bump default crash size to keep the same units */
>>>> +       if (log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes >= SZ_1M &&
>>>> +           (i915->params.guc_log_size_crash == -1) &&
>>>> +           GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE < SZ_1M)
>>>> +               log->sizes[GUC_LOG_SECTIONS_CRASH].bytes = SZ_1M;
>>>> +
>>>> +       /* Prepare the GuC API structure fields: */
>>>> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++) {
>>>> +               /* Convert to correct units */
>>>> +               if ((log->sizes[i].bytes % SZ_1M) == 0) {
>>>> +                       log->sizes[i].units = SZ_1M;
>>>> +                       log->sizes[i].flag = sections[i].flag;
>>>> +               } else {
>>>> +                       log->sizes[i].units = SZ_4K;
>>>> +                       log->sizes[i].flag = 0;
>>>> +               }
>>>> +
>>>> +               if (!IS_ALIGNED(log->sizes[i].bytes, log->sizes[i].units))
>>>> +                       drm_err(&i915->drm, "Mis-aligned GuC log %s size: 0x%X vs 0x%X!",
>>>> +                               sections[i].name, log->sizes[i].bytes, log->sizes[i].units);
>>>> +               log->sizes[i].count = log->sizes[i].bytes / log->sizes[i].units;
>>>> +
>>>> +               if (!log->sizes[i].count) {
>>>> +                       drm_err(&i915->drm, "Zero GuC log %s size!", sections[i].name);
>>>> +               } else {
>>>> +                       /* Size is +1 unit */
>>>> +                       log->sizes[i].count--;
>>>> +               }
>>>> +
>>>> +               /* Clip to field size */
>>>> +               if (log->sizes[i].count > sections[i].max) {
>>>> +                       drm_err(&i915->drm, "GuC log %s size too large: %d vs %d!",
>>>> +                               sections[i].name, log->sizes[i].count + 1, sections[i].max + 1);
>>>> +                       log->sizes[i].count = sections[i].max;
>>>> +               }
>>>> +       }
>>>> +
>>>> +       if (log->sizes[GUC_LOG_SECTIONS_CRASH].units != log->sizes[GUC_LOG_SECTIONS_DEBUG].units) {
>>>> +               drm_err(&i915->drm, "Unit mis-match for GuC log crash and debug sections: %d vs %d!",
>>>> +                       log->sizes[GUC_LOG_SECTIONS_CRASH].units,
>>>> +                       log->sizes[GUC_LOG_SECTIONS_DEBUG].units);
>>>> +               log->sizes[GUC_LOG_SECTIONS_CRASH].units = log->sizes[GUC_LOG_SECTIONS_DEBUG].units;
>>>> +               log->sizes[GUC_LOG_SECTIONS_CRASH].count = 0;
>>>> +       }
>>>> +
>>>> +       log->sizes_initialised = true;
>>>> +}
>>>> +
>>>> +static void guc_log_init_sizes(struct intel_guc_log *log)
>>>> +{
>>>> +       if (log->sizes_initialised)
>>>> +               return;
>>>> +
>>>> +       _guc_log_init_sizes(log);
>>>> +}
>>>> +
>>>> +static u32 intel_guc_log_section_size_crash(struct intel_guc_log *log)
>>>> +{
>>>> +       guc_log_init_sizes(log);
>>>> +
>>>> +       return log->sizes[GUC_LOG_SECTIONS_CRASH].bytes;
>>>> +}
>>>> +
>>>> +static u32 intel_guc_log_section_size_debug(struct intel_guc_log *log)
>>>> +{
>>>> +       guc_log_init_sizes(log);
>>>> +
>>>> +       return log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes;
>>>> +}
>>>> +
>>>> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log)
>>>> +{
>>>> +       guc_log_init_sizes(log);
>>>> +
>>>> +       return log->sizes[GUC_LOG_SECTIONS_CAPTURE].bytes;
>>>> +}
>>>> +
>>>>    static u32 intel_guc_log_size(struct intel_guc_log *log)
>>>>    {
>>>>           /*
>>>> @@ -38,7 +188,10 @@ static u32 intel_guc_log_size(struct intel_guc_log *log)
>>>>            *  |         Capture logs          |
>>>>            *  +===============================+ + CAPTURE_SIZE
>>>>            */
>>>> -       return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
>>>> +       return PAGE_SIZE +
>>>> +               intel_guc_log_section_size_crash(log) +
>>>> +               intel_guc_log_section_size_debug(log) +
>>>> +               intel_guc_log_section_size_capture(log);
>>>>    }
>>>>    
>>>>    /**
>>>> @@ -165,7 +318,8 @@ static void guc_move_to_next_buf(struct intel_guc_log *log)
>>>>           smp_wmb();
>>>>    
>>>>           /* All data has been written, so now move the offset of sub buffer. */
>>>> -       relay_reserve(log->relay.channel, log->vma->obj->base.size - CAPTURE_BUFFER_SIZE);
>>>> +       relay_reserve(log->relay.channel, log->vma->obj->base.size -
>>>> +                                         intel_guc_log_section_size_capture(log));
>>>>    
>>>>           /* Switch to the next sub buffer */
>>>>           relay_flush(log->relay.channel);
>>>> @@ -210,15 +364,16 @@ bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log,
>>>>           return overflow;
>>>>    }
>>>>    
>>>> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
>>>> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
>>>> +                                          enum guc_log_buffer_type type)
>>>>    {
>>>>           switch (type) {
>>>>           case GUC_DEBUG_LOG_BUFFER:
>>>> -               return DEBUG_BUFFER_SIZE;
>>>> +               return intel_guc_log_section_size_debug(log);
>>>>           case GUC_CRASH_DUMP_LOG_BUFFER:
>>>> -               return CRASH_BUFFER_SIZE;
>>>> +               return intel_guc_log_section_size_crash(log);
>>>>           case GUC_CAPTURE_LOG_BUFFER:
>>>> -               return CAPTURE_BUFFER_SIZE;
>>>> +               return intel_guc_log_section_size_capture(log);
>>>>           default:
>>>>                   MISSING_CASE(type);
>>>>           }
>>>> @@ -226,7 +381,8 @@ unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
>>>>           return 0;
>>>>    }
>>>>    
>>>> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
>>>> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log,
>>>> +                                      enum guc_log_buffer_type type)
>>>>    {
>>>>           enum guc_log_buffer_type i;
>>>>           size_t offset = PAGE_SIZE;/* for the log_buffer_states */
>>>> @@ -234,7 +390,7 @@ size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
>>>>           for (i = GUC_DEBUG_LOG_BUFFER; i < GUC_MAX_LOG_BUFFER; ++i) {
>>>>                   if (i == type)
>>>>                           break;
>>>> -               offset += intel_guc_get_log_buffer_size(i);
>>>> +               offset += intel_guc_get_log_buffer_size(log, i);
>>>>           }
>>>>    
>>>>           return offset;
>>>> @@ -285,7 +441,7 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
>>>>                    */
>>>>                   memcpy(&log_buf_state_local, log_buf_state,
>>>>                          sizeof(struct guc_log_buffer_state));
>>>> -               buffer_size = intel_guc_get_log_buffer_size(type);
>>>> +               buffer_size = intel_guc_get_log_buffer_size(log, type);
>>>>                   read_offset = log_buf_state_local.read_ptr;
>>>>                   write_offset = log_buf_state_local.sampled_write_ptr;
>>>>                   full_cnt = log_buf_state_local.buffer_full_cnt;
>>>> @@ -400,7 +556,7 @@ static int guc_log_relay_create(struct intel_guc_log *log)
>>>>             * Keep the size of sub buffers same as shared log buffer
>>>>             * but GuC log-events excludes the error-state-capture logs
>>>>             */
>>>> -       subbuf_size = log->vma->size - CAPTURE_BUFFER_SIZE;
>>>> +       subbuf_size = log->vma->size - intel_guc_log_section_size_capture(log);
>>>>    
>>>>           /*
>>>>            * Store up to 8 snapshots, which is large enough to buffer sufficient
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
>>>> index dc9715411d626..02127703be809 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
>>>> @@ -15,20 +15,6 @@
>>>>    
>>>>    struct intel_guc;
>>>>    
>>>> -#if defined(CONFIG_DRM_I915_DEBUG_GUC)
>>>> -#define CRASH_BUFFER_SIZE      SZ_2M
>>>> -#define DEBUG_BUFFER_SIZE      SZ_16M
>>>> -#define CAPTURE_BUFFER_SIZE    SZ_4M
>>>> -#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
>>>> -#define CRASH_BUFFER_SIZE      SZ_1M
>>>> -#define DEBUG_BUFFER_SIZE      SZ_2M
>>>> -#define CAPTURE_BUFFER_SIZE    SZ_4M
>>>> -#else
>>>> -#define CRASH_BUFFER_SIZE      SZ_8K
>>>> -#define DEBUG_BUFFER_SIZE      SZ_64K
>>>> -#define CAPTURE_BUFFER_SIZE    SZ_2M
>>>> -#endif
>>>> -
>>>>    /*
>>>>     * While we're using plain log level in i915, GuC controls are much more...
>>>>     * "elaborate"? We have a couple of bits for verbosity, separate bit for actual
>>>> @@ -46,10 +32,30 @@ struct intel_guc;
>>>>    #define GUC_VERBOSITY_TO_LOG_LEVEL(x)  ((x) + 2)
>>>>    #define GUC_LOG_LEVEL_MAX GUC_VERBOSITY_TO_LOG_LEVEL(GUC_LOG_VERBOSITY_MAX)
>>>>    
>>>> +enum {
>>>> +       GUC_LOG_SECTIONS_CRASH,
>>>> +       GUC_LOG_SECTIONS_DEBUG,
>>>> +       GUC_LOG_SECTIONS_CAPTURE,
>>>> +       GUC_LOG_SECTIONS_LIMIT
>>>> +};
>>>> +
>>>>    struct intel_guc_log {
>>>>           u32 level;
>>>> +
>>>> +       /* Allocation settings */
>>>> +       struct {
>>>> +               s32 bytes;      /* Size in bytes */
>>>> +               s32 units;      /* GuC API units - 1MB or 4KB */
>>>> +               s32 count;      /* Number of API units */
>>>> +               u32 flag;       /* GuC API units flag */
>>>> +       } sizes[GUC_LOG_SECTIONS_LIMIT];
>>>> +       bool sizes_initialised;
>>>> +
>>>> +       /* Combined buffer allocation */
>>>>           struct i915_vma *vma;
>>>>           void *buf_addr;
>>>> +
>>>> +       /* RelayFS support */
>>>>           struct {
>>>>                   bool buf_in_use;
>>>>                   bool started;
>>>> @@ -58,6 +64,7 @@ struct intel_guc_log {
>>>>                   struct mutex lock;
>>>>                   u32 full_count;
>>>>           } relay;
>>>> +
>>>>           /* logging related stats */
>>>>           struct {
>>>>                   u32 sampled_overflow;
>>>> @@ -69,8 +76,9 @@ struct intel_guc_log {
>>>>    void intel_guc_log_init_early(struct intel_guc_log *log);
>>>>    bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log, enum guc_log_buffer_type type,
>>>>                                         unsigned int full_cnt);
>>>> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type);
>>>> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type);
>>>> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
>>>> +                                          enum guc_log_buffer_type type);
>>>> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log, enum guc_log_buffer_type type);
>>>>    int intel_guc_log_create(struct intel_guc_log *log);
>>>>    void intel_guc_log_destroy(struct intel_guc_log *log);
>>>>    
>>>> @@ -92,4 +100,6 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p);
>>>>    int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
>>>>                          bool dump_load_err);
>>>>    
>>>> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log);
>>>> +
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
>>>> index 6fc475a5db615..06ca5b8221118 100644
>>>> --- a/drivers/gpu/drm/i915/i915_params.c
>>>> +++ b/drivers/gpu/drm/i915/i915_params.c
>>>> @@ -171,6 +171,18 @@ i915_param_named(guc_log_level, int, 0400,
>>>>           "GuC firmware logging level. Requires GuC to be loaded. "
>>>>           "(-1=auto [default], 0=disable, 1..4=enable with verbosity min..max)");
>>>>    
>>>> +i915_param_named(guc_log_size_crash, int, 0400,
>>>> +       "GuC firmware logging buffer size for crash dumps (in MB)"
>>>> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
>>>> +
>>>> +i915_param_named(guc_log_size_debug, int, 0400,
>>>> +       "GuC firmware logging buffer size for debug logs (in MB)"
>>>> +       "(-1=auto [default], NB: max = 16, other restrictions apply)");
>>>> +
>>>> +i915_param_named(guc_log_size_capture, int, 0400,
>>>> +       "GuC error capture register dump buffer size (in MB)"
>>>> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
>>>> +
>>>>    i915_param_named_unsafe(guc_firmware_path, charp, 0400,
>>>>           "GuC firmware path to use instead of the default one");
>>>>    
>>>> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
>>>> index 2733cb6cfe094..f684d1ab87078 100644
>>>> --- a/drivers/gpu/drm/i915/i915_params.h
>>>> +++ b/drivers/gpu/drm/i915/i915_params.h
>>>> @@ -61,6 +61,9 @@ struct drm_printer;
>>>>           param(int, invert_brightness, 0, 0600) \
>>>>           param(int, enable_guc, -1, 0400) \
>>>>           param(int, guc_log_level, -1, 0400) \
>>>> +       param(int, guc_log_size_crash, -1, 0400) \
>>>> +       param(int, guc_log_size_debug, -1, 0400) \
>>>> +       param(int, guc_log_size_capture, -1, 0400) \
>>>>           param(char *, guc_firmware_path, NULL, 0400) \
>>>>           param(char *, huc_firmware_path, NULL, 0400) \
>>>>           param(char *, dmc_firmware_path, NULL, 0400) \
>>>> -- 
>>>> 2.37.1
>>>>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-08-25 16:31         ` John Harrison
@ 2022-08-26  6:23           ` Joonas Lahtinen
  2022-09-12  7:12             ` Joonas Lahtinen
  0 siblings, 1 reply; 29+ messages in thread
From: Joonas Lahtinen @ 2022-08-26  6:23 UTC (permalink / raw)
  To: Intel-GFX, John Harrison; +Cc: DRI-Devel

Quoting John Harrison (2022-08-25 19:31:39)
> On 8/25/2022 00:15, Joonas Lahtinen wrote:
> > Quoting John Harrison (2022-08-24 21:45:09)
> >> On 8/24/2022 02:01, Joonas Lahtinen wrote:
> >>> NACK on this one. Let's get this reverted or fixed to eliminate
> >>> new module parameters.
> >>>
> >>> What prevents us just from using the maximum sizes? Or alternatively
> >>> we could check the already existing drm.debug variable or anything else
> >>> but addding 3 new module parameters.
> >> We don't want to waste 24MB of memory for all users when 99.999% of them
> >> don't care about GuC logs.
> > It is not exactly wasting memory if it is what is needed to capture
> > the error logs to properly debug a system. And it's definitely not much
> > on any modern system where you will have a GPU. You can always leave
> > the Kconfig options in place for the cases where it matters.
> >
> > On the other hand, if 99.999% don't need this, then it could just stay
> > as a kernel config time option as well?
> No. The point is that we need to able to ask customers to increase the 
> log size, repro an issue and send us the results.

Or we could just ask them to provide the logs for each bug report and
save one round trip time.

> All on a pre-installed 
> system where they do not have the option to build a custom kernel.

If you argue that way, they don't always have an easy way to change the
kernel cmdline options either. Accounting for initrd, update-grub etc.

> Either we always allocate the maximum and waste the memory for all end 
> users or we have a runtime configuration option.

Doesn't have to be maximum (as there seems to be limitations in log
readback speeds also), just something that is useful for most of the
debug scenarios.

> Compile time is not 
> acceptable for some important customers/situations.

Yet spending the time discussing how to make the debug logs useful with
the issue reporter wouldn't be an issue in such urgency?

One can argue what is most convenient way, but there's no way to beat
sane default. If somebody has problem with that extra memory usage, then
we have the config options to reduce/disable.

> >> We also don't want to tie the GuC logging buffer size to the DRM
> >> debugging output. Enabling kernel debug output can change timings and
> >> prevent the issue that one is trying to capture in the GuC log. And it
> >> seems unlikely we could add an entire new top level DRM debug flag just
> >> for an internal i915 only log buffer size. Plus drm.debug is explicitly
> >> stated as being dynamically changeable via sysfs at any time. The GuC
> >> log buffer size cannot be changed without reloading the i915 driver. Or
> >> at least, not without reloading the GuC, but we definitely don't want to
> >> create a UAPI for arbitrarily reloading the GuC.
> >>
> >> We can make these parameters 'unsafe' so that they taint the kernel if
> >> used. But this is exactly what module parameters are for - configuration
> >> that we don't want to hardcode as CONFIG options but which must be set
> >> at module load time.
> > It's better to have sane defaults. And if somebody wants something
> > strange, they can have a Kconfig behind EXPERT option. But even then
> > there should really be need for it.
> Define sane.

I was hoping you would be the expert on that as you've suggested the
patch to begin with.

Try to aim to cover >90% of the debug scenarios (that are only 0.001%) and
we're already only needing to recompile kernel in 1 per million cases.

We can live with that.

I will push a fixup to remove the module parameters, please figure the
rest out in a timely manner.

Regards, Joonas

> 
> Sane for most users is to not allocate 24MB of memory for an internal 
> debug only buffer they will never use. And which completely swamps any 
> error capture log with the ASCII encoding of said buffer.
> 
> But as above, we need a way to (very occasionally) get larger GuC logs 
> from customers without rebuilding the kernel.
> 
> John.
> 
> >
> > So for now, let's get the module parameters reverted and go with
> > reasonable default buffer sizes when GuC is enabled. The compile time
> > options can be left in place.
> >
> > Thank you in advance.
> >
> > Regards, Joonas
> >
> >> John.
> >>
> >>> For future reference, please do Cc maintainers when adding new uAPI
> >>> like module parameters.
> >>>
> >>> Regards, Joonas
> >>>
> >>> Quoting John.C.Harrison@Intel.com (2022-07-28 05:20:27)
> >>>> From: John Harrison <John.C.Harrison@Intel.com>
> >>>>
> >>>> The GuC log buffer sizes had to be configured statically at compile
> >>>> time. This can be quite troublesome when needing to get larger logs
> >>>> out of a released driver. So re-organise the code to allow a boot time
> >>>> module parameter override.
> >>>>
> >>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  53 ++----
> >>>>    .../gpu/drm/i915/gt/uc/intel_guc_capture.c    |  14 +-
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 176 +++++++++++++++++-
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  42 +++--
> >>>>    drivers/gpu/drm/i915/i915_params.c            |  12 ++
> >>>>    drivers/gpu/drm/i915/i915_params.h            |   3 +
> >>>>    6 files changed, 226 insertions(+), 74 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >>>> index ab4aacc516aa4..01f2705cb94a3 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> >>>> @@ -224,53 +224,22 @@ static u32 guc_ctl_feature_flags(struct intel_guc *guc)
> >>>>    
> >>>>    static u32 guc_ctl_log_params_flags(struct intel_guc *guc)
> >>>>    {
> >>>> -       u32 offset = intel_guc_ggtt_offset(guc, guc->log.vma) >> PAGE_SHIFT;
> >>>> -       u32 flags;
> >>>> -
> >>>> -       #if (((CRASH_BUFFER_SIZE) % SZ_1M) == 0)
> >>>> -       #define LOG_UNIT SZ_1M
> >>>> -       #define LOG_FLAG GUC_LOG_LOG_ALLOC_UNITS
> >>>> -       #else
> >>>> -       #define LOG_UNIT SZ_4K
> >>>> -       #define LOG_FLAG 0
> >>>> -       #endif
> >>>> -
> >>>> -       #if (((CAPTURE_BUFFER_SIZE) % SZ_1M) == 0)
> >>>> -       #define CAPTURE_UNIT SZ_1M
> >>>> -       #define CAPTURE_FLAG GUC_LOG_CAPTURE_ALLOC_UNITS
> >>>> -       #else
> >>>> -       #define CAPTURE_UNIT SZ_4K
> >>>> -       #define CAPTURE_FLAG 0
> >>>> -       #endif
> >>>> -
> >>>> -       BUILD_BUG_ON(!CRASH_BUFFER_SIZE);
> >>>> -       BUILD_BUG_ON(!IS_ALIGNED(CRASH_BUFFER_SIZE, LOG_UNIT));
> >>>> -       BUILD_BUG_ON(!DEBUG_BUFFER_SIZE);
> >>>> -       BUILD_BUG_ON(!IS_ALIGNED(DEBUG_BUFFER_SIZE, LOG_UNIT));
> >>>> -       BUILD_BUG_ON(!CAPTURE_BUFFER_SIZE);
> >>>> -       BUILD_BUG_ON(!IS_ALIGNED(CAPTURE_BUFFER_SIZE, CAPTURE_UNIT));
> >>>> -
> >>>> -       BUILD_BUG_ON((CRASH_BUFFER_SIZE / LOG_UNIT - 1) >
> >>>> -                       (GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT));
> >>>> -       BUILD_BUG_ON((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) >
> >>>> -                       (GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT));
> >>>> -       BUILD_BUG_ON((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) >
> >>>> -                       (GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT));
> >>>> +       struct intel_guc_log *log = &guc->log;
> >>>> +       u32 offset, flags;
> >>>> +
> >>>> +       GEM_BUG_ON(!log->sizes_initialised);
> >>>> +
> >>>> +       offset = intel_guc_ggtt_offset(guc, log->vma) >> PAGE_SHIFT;
> >>>>    
> >>>>           flags = GUC_LOG_VALID |
> >>>>                   GUC_LOG_NOTIFY_ON_HALF_FULL |
> >>>> -               CAPTURE_FLAG |
> >>>> -               LOG_FLAG |
> >>>> -               ((CRASH_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_CRASH_SHIFT) |
> >>>> -               ((DEBUG_BUFFER_SIZE / LOG_UNIT - 1) << GUC_LOG_DEBUG_SHIFT) |
> >>>> -               ((CAPTURE_BUFFER_SIZE / CAPTURE_UNIT - 1) << GUC_LOG_CAPTURE_SHIFT) |
> >>>> +               log->sizes[GUC_LOG_SECTIONS_DEBUG].flag |
> >>>> +               log->sizes[GUC_LOG_SECTIONS_CAPTURE].flag |
> >>>> +               (log->sizes[GUC_LOG_SECTIONS_CRASH].count << GUC_LOG_CRASH_SHIFT) |
> >>>> +               (log->sizes[GUC_LOG_SECTIONS_DEBUG].count << GUC_LOG_DEBUG_SHIFT) |
> >>>> +               (log->sizes[GUC_LOG_SECTIONS_CAPTURE].count << GUC_LOG_CAPTURE_SHIFT) |
> >>>>                   (offset << GUC_LOG_BUF_ADDR_SHIFT);
> >>>>    
> >>>> -       #undef LOG_UNIT
> >>>> -       #undef LOG_FLAG
> >>>> -       #undef CAPTURE_UNIT
> >>>> -       #undef CAPTURE_FLAG
> >>>> -
> >>>>           return flags;
> >>>>    }
> >>>>    
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> >>>> index b54b7883320b1..d2ac53d4f3b6e 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
> >>>> @@ -656,16 +656,17 @@ static void check_guc_capture_size(struct intel_guc *guc)
> >>>>           struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> >>>>           int min_size = guc_capture_output_min_size_est(guc);
> >>>>           int spare_size = min_size * GUC_CAPTURE_OVERBUFFER_MULTIPLIER;
> >>>> +       u32 buffer_size = intel_guc_log_section_size_capture(&guc->log);
> >>>>    
> >>>>           if (min_size < 0)
> >>>>                   drm_warn(&i915->drm, "Failed to calculate GuC error state capture buffer minimum size: %d!\n",
> >>>>                            min_size);
> >>>> -       else if (min_size > CAPTURE_BUFFER_SIZE)
> >>>> +       else if (min_size > buffer_size)
> >>>>                   drm_warn(&i915->drm, "GuC error state capture buffer is too small: %d < %d\n",
> >>>> -                        CAPTURE_BUFFER_SIZE, min_size);
> >>>> -       else if (spare_size > CAPTURE_BUFFER_SIZE)
> >>>> +                        buffer_size, min_size);
> >>>> +       else if (spare_size > buffer_size)
> >>>>                   drm_notice(&i915->drm, "GuC error state capture buffer maybe too small: %d < %d (min = %d)\n",
> >>>> -                          CAPTURE_BUFFER_SIZE, spare_size, min_size);
> >>>> +                          buffer_size, spare_size, min_size);
> >>>>    }
> >>>>    
> >>>>    /*
> >>>> @@ -1294,7 +1295,8 @@ static void __guc_capture_process_output(struct intel_guc *guc)
> >>>>    
> >>>>           log_buf_state = guc->log.buf_addr +
> >>>>                           (sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER);
> >>>> -       src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER);
> >>>> +       src_data = guc->log.buf_addr +
> >>>> +                  intel_guc_get_log_buffer_offset(&guc->log, GUC_CAPTURE_LOG_BUFFER);
> >>>>    
> >>>>           /*
> >>>>            * Make a copy of the state structure, inside GuC log buffer
> >>>> @@ -1302,7 +1304,7 @@ static void __guc_capture_process_output(struct intel_guc *guc)
> >>>>            * from it multiple times.
> >>>>            */
> >>>>           memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state));
> >>>> -       buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER);
> >>>> +       buffer_size = intel_guc_get_log_buffer_size(&guc->log, GUC_CAPTURE_LOG_BUFFER);
> >>>>           read_offset = log_buf_state_local.read_ptr;
> >>>>           write_offset = log_buf_state_local.sampled_write_ptr;
> >>>>           full_count = log_buf_state_local.buffer_full_cnt;
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> >>>> index 4722d4b18ed19..890b6853bd609 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> >>>> @@ -13,8 +13,158 @@
> >>>>    #include "intel_guc_capture.h"
> >>>>    #include "intel_guc_log.h"
> >>>>    
> >>>> +#if defined(CONFIG_DRM_I915_DEBUG_GUC)
> >>>> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_2M
> >>>> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_16M
> >>>> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
> >>>> +#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
> >>>> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_1M
> >>>> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_2M
> >>>> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_4M
> >>>> +#else
> >>>> +#define GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE      SZ_8K
> >>>> +#define GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE      SZ_64K
> >>>> +#define GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE    SZ_2M
> >>>> +#endif
> >>>> +
> >>>>    static void guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log);
> >>>>    
> >>>> +struct guc_log_section {
> >>>> +       u32 max;
> >>>> +       u32 flag;
> >>>> +       u32 default_val;
> >>>> +       const char *name;
> >>>> +};
> >>>> +
> >>>> +static s32 scale_log_param(struct intel_guc_log *log, const struct guc_log_section *section,
> >>>> +                          s32 param)
> >>>> +{
> >>>> +       /* -1 means default */
> >>>> +       if (param < 0)
> >>>> +               return section->default_val;
> >>>> +
> >>>> +       /* Check for 32-bit overflow */
> >>>> +       if (param >= SZ_4K) {
> >>>> +               drm_err(&guc_to_gt(log_to_guc(log))->i915->drm, "Size too large for GuC %s log: %dMB!",
> >>>> +                       section->name, param);
> >>>> +               return section->default_val;
> >>>> +       }
> >>>> +
> >>>> +       /* Param units are 1MB */
> >>>> +       return param * SZ_1M;
> >>>> +}
> >>>> +
> >>>> +static void _guc_log_init_sizes(struct intel_guc_log *log)
> >>>> +{
> >>>> +       struct intel_guc *guc = log_to_guc(log);
> >>>> +       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
> >>>> +       static const struct guc_log_section sections[GUC_LOG_SECTIONS_LIMIT] = {
> >>>> +               {
> >>>> +                       GUC_LOG_CRASH_MASK >> GUC_LOG_CRASH_SHIFT,
> >>>> +                       GUC_LOG_LOG_ALLOC_UNITS,
> >>>> +                       GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE,
> >>>> +                       "crash dump"
> >>>> +               },
> >>>> +               {
> >>>> +                       GUC_LOG_DEBUG_MASK >> GUC_LOG_DEBUG_SHIFT,
> >>>> +                       GUC_LOG_LOG_ALLOC_UNITS,
> >>>> +                       GUC_LOG_DEFAULT_DEBUG_BUFFER_SIZE,
> >>>> +                       "debug",
> >>>> +               },
> >>>> +               {
> >>>> +                       GUC_LOG_CAPTURE_MASK >> GUC_LOG_CAPTURE_SHIFT,
> >>>> +                       GUC_LOG_CAPTURE_ALLOC_UNITS,
> >>>> +                       GUC_LOG_DEFAULT_CAPTURE_BUFFER_SIZE,
> >>>> +                       "capture",
> >>>> +               }
> >>>> +       };
> >>>> +       s32 params[GUC_LOG_SECTIONS_LIMIT] = {
> >>>> +               i915->params.guc_log_size_crash,
> >>>> +               i915->params.guc_log_size_debug,
> >>>> +               i915->params.guc_log_size_capture,
> >>>> +       };
> >>>> +       int i;
> >>>> +
> >>>> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++)
> >>>> +               log->sizes[i].bytes = scale_log_param(log, sections + i, params[i]);
> >>>> +
> >>>> +       /* If debug size > 1MB then bump default crash size to keep the same units */
> >>>> +       if (log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes >= SZ_1M &&
> >>>> +           (i915->params.guc_log_size_crash == -1) &&
> >>>> +           GUC_LOG_DEFAULT_CRASH_BUFFER_SIZE < SZ_1M)
> >>>> +               log->sizes[GUC_LOG_SECTIONS_CRASH].bytes = SZ_1M;
> >>>> +
> >>>> +       /* Prepare the GuC API structure fields: */
> >>>> +       for (i = 0; i < GUC_LOG_SECTIONS_LIMIT; i++) {
> >>>> +               /* Convert to correct units */
> >>>> +               if ((log->sizes[i].bytes % SZ_1M) == 0) {
> >>>> +                       log->sizes[i].units = SZ_1M;
> >>>> +                       log->sizes[i].flag = sections[i].flag;
> >>>> +               } else {
> >>>> +                       log->sizes[i].units = SZ_4K;
> >>>> +                       log->sizes[i].flag = 0;
> >>>> +               }
> >>>> +
> >>>> +               if (!IS_ALIGNED(log->sizes[i].bytes, log->sizes[i].units))
> >>>> +                       drm_err(&i915->drm, "Mis-aligned GuC log %s size: 0x%X vs 0x%X!",
> >>>> +                               sections[i].name, log->sizes[i].bytes, log->sizes[i].units);
> >>>> +               log->sizes[i].count = log->sizes[i].bytes / log->sizes[i].units;
> >>>> +
> >>>> +               if (!log->sizes[i].count) {
> >>>> +                       drm_err(&i915->drm, "Zero GuC log %s size!", sections[i].name);
> >>>> +               } else {
> >>>> +                       /* Size is +1 unit */
> >>>> +                       log->sizes[i].count--;
> >>>> +               }
> >>>> +
> >>>> +               /* Clip to field size */
> >>>> +               if (log->sizes[i].count > sections[i].max) {
> >>>> +                       drm_err(&i915->drm, "GuC log %s size too large: %d vs %d!",
> >>>> +                               sections[i].name, log->sizes[i].count + 1, sections[i].max + 1);
> >>>> +                       log->sizes[i].count = sections[i].max;
> >>>> +               }
> >>>> +       }
> >>>> +
> >>>> +       if (log->sizes[GUC_LOG_SECTIONS_CRASH].units != log->sizes[GUC_LOG_SECTIONS_DEBUG].units) {
> >>>> +               drm_err(&i915->drm, "Unit mis-match for GuC log crash and debug sections: %d vs %d!",
> >>>> +                       log->sizes[GUC_LOG_SECTIONS_CRASH].units,
> >>>> +                       log->sizes[GUC_LOG_SECTIONS_DEBUG].units);
> >>>> +               log->sizes[GUC_LOG_SECTIONS_CRASH].units = log->sizes[GUC_LOG_SECTIONS_DEBUG].units;
> >>>> +               log->sizes[GUC_LOG_SECTIONS_CRASH].count = 0;
> >>>> +       }
> >>>> +
> >>>> +       log->sizes_initialised = true;
> >>>> +}
> >>>> +
> >>>> +static void guc_log_init_sizes(struct intel_guc_log *log)
> >>>> +{
> >>>> +       if (log->sizes_initialised)
> >>>> +               return;
> >>>> +
> >>>> +       _guc_log_init_sizes(log);
> >>>> +}
> >>>> +
> >>>> +static u32 intel_guc_log_section_size_crash(struct intel_guc_log *log)
> >>>> +{
> >>>> +       guc_log_init_sizes(log);
> >>>> +
> >>>> +       return log->sizes[GUC_LOG_SECTIONS_CRASH].bytes;
> >>>> +}
> >>>> +
> >>>> +static u32 intel_guc_log_section_size_debug(struct intel_guc_log *log)
> >>>> +{
> >>>> +       guc_log_init_sizes(log);
> >>>> +
> >>>> +       return log->sizes[GUC_LOG_SECTIONS_DEBUG].bytes;
> >>>> +}
> >>>> +
> >>>> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log)
> >>>> +{
> >>>> +       guc_log_init_sizes(log);
> >>>> +
> >>>> +       return log->sizes[GUC_LOG_SECTIONS_CAPTURE].bytes;
> >>>> +}
> >>>> +
> >>>>    static u32 intel_guc_log_size(struct intel_guc_log *log)
> >>>>    {
> >>>>           /*
> >>>> @@ -38,7 +188,10 @@ static u32 intel_guc_log_size(struct intel_guc_log *log)
> >>>>            *  |         Capture logs          |
> >>>>            *  +===============================+ + CAPTURE_SIZE
> >>>>            */
> >>>> -       return PAGE_SIZE + CRASH_BUFFER_SIZE + DEBUG_BUFFER_SIZE + CAPTURE_BUFFER_SIZE;
> >>>> +       return PAGE_SIZE +
> >>>> +               intel_guc_log_section_size_crash(log) +
> >>>> +               intel_guc_log_section_size_debug(log) +
> >>>> +               intel_guc_log_section_size_capture(log);
> >>>>    }
> >>>>    
> >>>>    /**
> >>>> @@ -165,7 +318,8 @@ static void guc_move_to_next_buf(struct intel_guc_log *log)
> >>>>           smp_wmb();
> >>>>    
> >>>>           /* All data has been written, so now move the offset of sub buffer. */
> >>>> -       relay_reserve(log->relay.channel, log->vma->obj->base.size - CAPTURE_BUFFER_SIZE);
> >>>> +       relay_reserve(log->relay.channel, log->vma->obj->base.size -
> >>>> +                                         intel_guc_log_section_size_capture(log));
> >>>>    
> >>>>           /* Switch to the next sub buffer */
> >>>>           relay_flush(log->relay.channel);
> >>>> @@ -210,15 +364,16 @@ bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log,
> >>>>           return overflow;
> >>>>    }
> >>>>    
> >>>> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
> >>>> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
> >>>> +                                          enum guc_log_buffer_type type)
> >>>>    {
> >>>>           switch (type) {
> >>>>           case GUC_DEBUG_LOG_BUFFER:
> >>>> -               return DEBUG_BUFFER_SIZE;
> >>>> +               return intel_guc_log_section_size_debug(log);
> >>>>           case GUC_CRASH_DUMP_LOG_BUFFER:
> >>>> -               return CRASH_BUFFER_SIZE;
> >>>> +               return intel_guc_log_section_size_crash(log);
> >>>>           case GUC_CAPTURE_LOG_BUFFER:
> >>>> -               return CAPTURE_BUFFER_SIZE;
> >>>> +               return intel_guc_log_section_size_capture(log);
> >>>>           default:
> >>>>                   MISSING_CASE(type);
> >>>>           }
> >>>> @@ -226,7 +381,8 @@ unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type)
> >>>>           return 0;
> >>>>    }
> >>>>    
> >>>> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
> >>>> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log,
> >>>> +                                      enum guc_log_buffer_type type)
> >>>>    {
> >>>>           enum guc_log_buffer_type i;
> >>>>           size_t offset = PAGE_SIZE;/* for the log_buffer_states */
> >>>> @@ -234,7 +390,7 @@ size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type)
> >>>>           for (i = GUC_DEBUG_LOG_BUFFER; i < GUC_MAX_LOG_BUFFER; ++i) {
> >>>>                   if (i == type)
> >>>>                           break;
> >>>> -               offset += intel_guc_get_log_buffer_size(i);
> >>>> +               offset += intel_guc_get_log_buffer_size(log, i);
> >>>>           }
> >>>>    
> >>>>           return offset;
> >>>> @@ -285,7 +441,7 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
> >>>>                    */
> >>>>                   memcpy(&log_buf_state_local, log_buf_state,
> >>>>                          sizeof(struct guc_log_buffer_state));
> >>>> -               buffer_size = intel_guc_get_log_buffer_size(type);
> >>>> +               buffer_size = intel_guc_get_log_buffer_size(log, type);
> >>>>                   read_offset = log_buf_state_local.read_ptr;
> >>>>                   write_offset = log_buf_state_local.sampled_write_ptr;
> >>>>                   full_cnt = log_buf_state_local.buffer_full_cnt;
> >>>> @@ -400,7 +556,7 @@ static int guc_log_relay_create(struct intel_guc_log *log)
> >>>>             * Keep the size of sub buffers same as shared log buffer
> >>>>             * but GuC log-events excludes the error-state-capture logs
> >>>>             */
> >>>> -       subbuf_size = log->vma->size - CAPTURE_BUFFER_SIZE;
> >>>> +       subbuf_size = log->vma->size - intel_guc_log_section_size_capture(log);
> >>>>    
> >>>>           /*
> >>>>            * Store up to 8 snapshots, which is large enough to buffer sufficient
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> >>>> index dc9715411d626..02127703be809 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h
> >>>> @@ -15,20 +15,6 @@
> >>>>    
> >>>>    struct intel_guc;
> >>>>    
> >>>> -#if defined(CONFIG_DRM_I915_DEBUG_GUC)
> >>>> -#define CRASH_BUFFER_SIZE      SZ_2M
> >>>> -#define DEBUG_BUFFER_SIZE      SZ_16M
> >>>> -#define CAPTURE_BUFFER_SIZE    SZ_4M
> >>>> -#elif defined(CONFIG_DRM_I915_DEBUG_GEM)
> >>>> -#define CRASH_BUFFER_SIZE      SZ_1M
> >>>> -#define DEBUG_BUFFER_SIZE      SZ_2M
> >>>> -#define CAPTURE_BUFFER_SIZE    SZ_4M
> >>>> -#else
> >>>> -#define CRASH_BUFFER_SIZE      SZ_8K
> >>>> -#define DEBUG_BUFFER_SIZE      SZ_64K
> >>>> -#define CAPTURE_BUFFER_SIZE    SZ_2M
> >>>> -#endif
> >>>> -
> >>>>    /*
> >>>>     * While we're using plain log level in i915, GuC controls are much more...
> >>>>     * "elaborate"? We have a couple of bits for verbosity, separate bit for actual
> >>>> @@ -46,10 +32,30 @@ struct intel_guc;
> >>>>    #define GUC_VERBOSITY_TO_LOG_LEVEL(x)  ((x) + 2)
> >>>>    #define GUC_LOG_LEVEL_MAX GUC_VERBOSITY_TO_LOG_LEVEL(GUC_LOG_VERBOSITY_MAX)
> >>>>    
> >>>> +enum {
> >>>> +       GUC_LOG_SECTIONS_CRASH,
> >>>> +       GUC_LOG_SECTIONS_DEBUG,
> >>>> +       GUC_LOG_SECTIONS_CAPTURE,
> >>>> +       GUC_LOG_SECTIONS_LIMIT
> >>>> +};
> >>>> +
> >>>>    struct intel_guc_log {
> >>>>           u32 level;
> >>>> +
> >>>> +       /* Allocation settings */
> >>>> +       struct {
> >>>> +               s32 bytes;      /* Size in bytes */
> >>>> +               s32 units;      /* GuC API units - 1MB or 4KB */
> >>>> +               s32 count;      /* Number of API units */
> >>>> +               u32 flag;       /* GuC API units flag */
> >>>> +       } sizes[GUC_LOG_SECTIONS_LIMIT];
> >>>> +       bool sizes_initialised;
> >>>> +
> >>>> +       /* Combined buffer allocation */
> >>>>           struct i915_vma *vma;
> >>>>           void *buf_addr;
> >>>> +
> >>>> +       /* RelayFS support */
> >>>>           struct {
> >>>>                   bool buf_in_use;
> >>>>                   bool started;
> >>>> @@ -58,6 +64,7 @@ struct intel_guc_log {
> >>>>                   struct mutex lock;
> >>>>                   u32 full_count;
> >>>>           } relay;
> >>>> +
> >>>>           /* logging related stats */
> >>>>           struct {
> >>>>                   u32 sampled_overflow;
> >>>> @@ -69,8 +76,9 @@ struct intel_guc_log {
> >>>>    void intel_guc_log_init_early(struct intel_guc_log *log);
> >>>>    bool intel_guc_check_log_buf_overflow(struct intel_guc_log *log, enum guc_log_buffer_type type,
> >>>>                                         unsigned int full_cnt);
> >>>> -unsigned int intel_guc_get_log_buffer_size(enum guc_log_buffer_type type);
> >>>> -size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type);
> >>>> +unsigned int intel_guc_get_log_buffer_size(struct intel_guc_log *log,
> >>>> +                                          enum guc_log_buffer_type type);
> >>>> +size_t intel_guc_get_log_buffer_offset(struct intel_guc_log *log, enum guc_log_buffer_type type);
> >>>>    int intel_guc_log_create(struct intel_guc_log *log);
> >>>>    void intel_guc_log_destroy(struct intel_guc_log *log);
> >>>>    
> >>>> @@ -92,4 +100,6 @@ void intel_guc_log_info(struct intel_guc_log *log, struct drm_printer *p);
> >>>>    int intel_guc_log_dump(struct intel_guc_log *log, struct drm_printer *p,
> >>>>                          bool dump_load_err);
> >>>>    
> >>>> +u32 intel_guc_log_section_size_capture(struct intel_guc_log *log);
> >>>> +
> >>>>    #endif
> >>>> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> >>>> index 6fc475a5db615..06ca5b8221118 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_params.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_params.c
> >>>> @@ -171,6 +171,18 @@ i915_param_named(guc_log_level, int, 0400,
> >>>>           "GuC firmware logging level. Requires GuC to be loaded. "
> >>>>           "(-1=auto [default], 0=disable, 1..4=enable with verbosity min..max)");
> >>>>    
> >>>> +i915_param_named(guc_log_size_crash, int, 0400,
> >>>> +       "GuC firmware logging buffer size for crash dumps (in MB)"
> >>>> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
> >>>> +
> >>>> +i915_param_named(guc_log_size_debug, int, 0400,
> >>>> +       "GuC firmware logging buffer size for debug logs (in MB)"
> >>>> +       "(-1=auto [default], NB: max = 16, other restrictions apply)");
> >>>> +
> >>>> +i915_param_named(guc_log_size_capture, int, 0400,
> >>>> +       "GuC error capture register dump buffer size (in MB)"
> >>>> +       "(-1=auto [default], NB: max = 4, other restrictions apply)");
> >>>> +
> >>>>    i915_param_named_unsafe(guc_firmware_path, charp, 0400,
> >>>>           "GuC firmware path to use instead of the default one");
> >>>>    
> >>>> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> >>>> index 2733cb6cfe094..f684d1ab87078 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_params.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_params.h
> >>>> @@ -61,6 +61,9 @@ struct drm_printer;
> >>>>           param(int, invert_brightness, 0, 0600) \
> >>>>           param(int, enable_guc, -1, 0400) \
> >>>>           param(int, guc_log_level, -1, 0400) \
> >>>> +       param(int, guc_log_size_crash, -1, 0400) \
> >>>> +       param(int, guc_log_size_debug, -1, 0400) \
> >>>> +       param(int, guc_log_size_capture, -1, 0400) \
> >>>>           param(char *, guc_firmware_path, NULL, 0400) \
> >>>>           param(char *, huc_firmware_path, NULL, 0400) \
> >>>>           param(char *, dmc_firmware_path, NULL, 0400) \
> >>>> -- 
> >>>> 2.37.1
> >>>>
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-08-26  6:23           ` Joonas Lahtinen
@ 2022-09-12  7:12             ` Joonas Lahtinen
  2022-09-12 23:46               ` John Harrison
  0 siblings, 1 reply; 29+ messages in thread
From: Joonas Lahtinen @ 2022-09-12  7:12 UTC (permalink / raw)
  To: Intel-GFX, John Harrison; +Cc: DRI-Devel

Quoting Joonas Lahtinen (2022-08-26 09:23:08)
> Quoting John Harrison (2022-08-25 19:31:39)
> > On 8/25/2022 00:15, Joonas Lahtinen wrote:
> > > Quoting John Harrison (2022-08-24 21:45:09)
> > >> We also don't want to tie the GuC logging buffer size to the DRM
> > >> debugging output. Enabling kernel debug output can change timings and
> > >> prevent the issue that one is trying to capture in the GuC log. And it
> > >> seems unlikely we could add an entire new top level DRM debug flag just
> > >> for an internal i915 only log buffer size. Plus drm.debug is explicitly
> > >> stated as being dynamically changeable via sysfs at any time. The GuC
> > >> log buffer size cannot be changed without reloading the i915 driver. Or
> > >> at least, not without reloading the GuC, but we definitely don't want to
> > >> create a UAPI for arbitrarily reloading the GuC.
> > >>
> > >> We can make these parameters 'unsafe' so that they taint the kernel if
> > >> used. But this is exactly what module parameters are for - configuration
> > >> that we don't want to hardcode as CONFIG options but which must be set
> > >> at module load time.
> > > It's better to have sane defaults. And if somebody wants something
> > > strange, they can have a Kconfig behind EXPERT option. But even then
> > > there should really be need for it.
> > Define sane.
> 
> I was hoping you would be the expert on that as you've suggested the
> patch to begin with.
> 
> Try to aim to cover >90% of the debug scenarios (that are only 0.001%) and
> we're already only needing to recompile kernel in 1 per million cases.
> 
> We can live with that.
> 
> I will push a fixup to remove the module parameters, please figure the
> rest out in a timely manner.

John, what is the status of the followup patch here to configure those
reasonable defaults?

We shouldn't be ignoring this and proceed to pile other GuC patches
on top.

Regards, Joonas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable
  2022-09-12  7:12             ` Joonas Lahtinen
@ 2022-09-12 23:46               ` John Harrison
  0 siblings, 0 replies; 29+ messages in thread
From: John Harrison @ 2022-09-12 23:46 UTC (permalink / raw)
  To: Joonas Lahtinen, Intel-GFX; +Cc: DRI-Devel

On 9/12/2022 00:12, Joonas Lahtinen wrote:
> Quoting Joonas Lahtinen (2022-08-26 09:23:08)
>> Quoting John Harrison (2022-08-25 19:31:39)
>>> On 8/25/2022 00:15, Joonas Lahtinen wrote:
>>>> Quoting John Harrison (2022-08-24 21:45:09)
>>>>> We also don't want to tie the GuC logging buffer size to the DRM
>>>>> debugging output. Enabling kernel debug output can change timings and
>>>>> prevent the issue that one is trying to capture in the GuC log. And it
>>>>> seems unlikely we could add an entire new top level DRM debug flag just
>>>>> for an internal i915 only log buffer size. Plus drm.debug is explicitly
>>>>> stated as being dynamically changeable via sysfs at any time. The GuC
>>>>> log buffer size cannot be changed without reloading the i915 driver. Or
>>>>> at least, not without reloading the GuC, but we definitely don't want to
>>>>> create a UAPI for arbitrarily reloading the GuC.
>>>>>
>>>>> We can make these parameters 'unsafe' so that they taint the kernel if
>>>>> used. But this is exactly what module parameters are for - configuration
>>>>> that we don't want to hardcode as CONFIG options but which must be set
>>>>> at module load time.
>>>> It's better to have sane defaults. And if somebody wants something
>>>> strange, they can have a Kconfig behind EXPERT option. But even then
>>>> there should really be need for it.
>>> Define sane.
>> I was hoping you would be the expert on that as you've suggested the
>> patch to begin with.
And as the 'expert' I suggested an approach that everyone was happy with 
except for yourself.

>>
>> Try to aim to cover >90% of the debug scenarios (that are only 0.001%) and
>> we're already only needing to recompile kernel in 1 per million cases.
>>
>> We can live with that.
This is not about how many instances of debug scenarios need a rebuild. 
It's about whether or not the person doing the repro has the ability to 
rebuild a custom kernel.

>>
>> I will push a fixup to remove the module parameters, please figure the
>> rest out in a timely manner.
Your fixup was evidently not tested because it broke non-debug builds.

> John, what is the status of the followup patch here to configure those
> reasonable defaults?
>
> We shouldn't be ignoring this and proceed to pile other GuC patches
> on top.
Being out of office is not ignoring.

And I don't see what other options we have. Setting a large default 
means that the vast majority of people who don't care about GuC will 
have their error capture logs filled with apparent gibberish in the form 
of a huge ASCII dump of the GuC log binary data. Which is something that 
people will surely complain about. Whereas setting a 'sensibly small' 
default means that we won't be able to use the GuC logs in many of the 
cases where we actually need to.

So right now, it seems the only choice we have is to fix up the broken 
driver caused by your incomplete removal and then re-add the module 
parameter to our internal tree so that our internal customers can at 
least use it.

John.


>
> Regards, Joonas


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-09-12 23:47 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-28  2:20 [PATCH 0/7] Fixes and improvements to GuC logging and error capture John.C.Harrison
2022-07-28  2:20 ` [PATCH 1/7] drm/i915/guc: Add a helper for log buffer size John.C.Harrison
2022-08-02 17:37   ` Teres Alexis, Alan Previn
2022-08-03  0:29     ` John Harrison
2022-07-28  2:20 ` [PATCH 2/7] drm/i915/guc: Fix capture size warning and bump the size John.C.Harrison
2022-08-02 17:46   ` [Intel-gfx] " Teres Alexis, Alan Previn
2022-07-28  2:20 ` [PATCH 3/7] drm/i915/guc: Add GuC <-> kernel time stamp translation information John.C.Harrison
2022-08-05  0:40   ` [Intel-gfx] " Teres Alexis, Alan Previn
2022-08-08 18:43     ` John Harrison
2022-08-15  4:55       ` Teres Alexis, Alan Previn
2022-08-19 10:45   ` Jani Nikula
2022-08-19 21:02     ` John Harrison
2022-08-23 10:09       ` Jani Nikula
2022-07-28  2:20 ` [PATCH 4/7] drm/i915/guc: Record CTB info in error logs John.C.Harrison
2022-08-02 18:27   ` [Intel-gfx] " Teres Alexis, Alan Previn
2022-08-03  0:20     ` John Harrison
2022-07-28  2:20 ` [PATCH 5/7] drm/i915/guc: Use streaming loads to speed up dumping the guc log John.C.Harrison
2022-08-02 18:48   ` [Intel-gfx] " Teres Alexis, Alan Previn
2022-08-03  0:14     ` John Harrison
2022-07-28  2:20 ` [PATCH 6/7] drm/i915/guc: Make GuC log sizes runtime configurable John.C.Harrison
2022-08-15  5:43   ` [Intel-gfx] " Teres Alexis, Alan Previn
2022-08-24  9:01   ` Joonas Lahtinen
     [not found]     ` <4bd7b51a-caf0-d987-c7df-6cfb24f36597@intel.com>
2022-08-25  7:15       ` Joonas Lahtinen
2022-08-25 16:31         ` John Harrison
2022-08-26  6:23           ` Joonas Lahtinen
2022-09-12  7:12             ` Joonas Lahtinen
2022-09-12 23:46               ` John Harrison
2022-07-28  2:20 ` [PATCH 7/7] drm/i915/guc: Reduce spam from error capture John.C.Harrison
2022-08-02 18:54   ` [Intel-gfx] " Teres Alexis, Alan Previn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).