intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness
@ 2022-01-27  0:15 Umesh Nerlige Ramappa
  2022-01-27  0:15 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Use existing uncore helper to read gpm_timestamp Umesh Nerlige Ramappa
  2022-01-27  0:48 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Patchwork
  0 siblings, 2 replies; 5+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-01-27  0:15 UTC (permalink / raw)
  To: intel-gfx

GuC updates shared memory and KMD reads it. Since this is not
synchronized, we run into a race where the value read is inconsistent.
Sometimes the inconsistency is in reading the upper MSB bytes of the
last_switch_in value. 2 types of cases are seen - upper 8 bits are zero
and upper 24 bits are zero. Since these are non-zero values, it is
not trivial to determine validity of these values. Instead we read the
values multiple times until they are consistent. In test runs, 3
attempts results in consistent values. The upper bound is set to 6
attempts and may need to be tuned as per any new occurences.

Since the duration that gt is parked can vary, the patch also updates
the gt timestamp on unpark before starting the worker.

v2:
- Initialize i
- Use READ_ONCE to access engine record

Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 58 +++++++++++++++++--
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index db9615dcb0ec..4e9154cacc58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1114,6 +1114,19 @@ __extend_last_switch(struct intel_guc *guc, u64 *prev_start, u32 new_start)
 	if (new_start == lower_32_bits(*prev_start))
 		return;
 
+	/*
+	 * When gt is unparked, we update the gt timestamp and start the ping
+	 * worker that updates the gt_stamp every POLL_TIME_CLKS. As long as gt
+	 * is unparked, all switched in contexts will have a start time that is
+	 * within +/- POLL_TIME_CLKS of the most recent gt_stamp.
+	 *
+	 * If neither gt_stamp nor new_start has rolled over, then the
+	 * gt_stamp_hi does not need to be adjusted, however if one of them has
+	 * rolled over, we need to adjust gt_stamp_hi accordingly.
+	 *
+	 * The below conditions address the cases of new_start rollover and
+	 * gt_stamp_last rollover respectively.
+	 */
 	if (new_start < gt_stamp_last &&
 	    (new_start - gt_stamp_last) <= POLL_TIME_CLKS)
 		gt_stamp_hi++;
@@ -1125,17 +1138,45 @@ __extend_last_switch(struct intel_guc *guc, u64 *prev_start, u32 new_start)
 	*prev_start = ((u64)gt_stamp_hi << 32) | new_start;
 }
 
-static void guc_update_engine_gt_clks(struct intel_engine_cs *engine)
+/*
+ * GuC updates shared memory and KMD reads it. Since this is not synchronized,
+ * we run into a race where the value read is inconsistent. Sometimes the
+ * inconsistency is in reading the upper MSB bytes of the last_in value when
+ * this race occurs. 2 types of cases are seen - upper 8 bits are zero and upper
+ * 24 bits are zero. Since these are non-zero values, it is non-trivial to
+ * determine validity of these values. Instead we read the values multiple times
+ * until they are consistent. In test runs, 3 attempts results in consistent
+ * values. The upper bound is set to 6 attempts and may need to be tuned as per
+ * any new occurences.
+ */
+static void __get_engine_usage_record(struct intel_engine_cs *engine,
+				      u32 *last_in, u32 *id, u32 *total)
 {
 	struct guc_engine_usage_record *rec = intel_guc_engine_usage(engine);
+	int i = 0;
+
+	do {
+		*last_in = READ_ONCE(rec->last_switch_in_stamp);
+		*id = READ_ONCE(rec->current_context_index);
+		*total = READ_ONCE(rec->total_runtime);
+
+		if (READ_ONCE(rec->last_switch_in_stamp) == *last_in &&
+		    READ_ONCE(rec->current_context_index) == *id &&
+		    READ_ONCE(rec->total_runtime) == *total)
+			break;
+	} while (++i < 6);
+}
+
+static void guc_update_engine_gt_clks(struct intel_engine_cs *engine)
+{
 	struct intel_engine_guc_stats *stats = &engine->stats.guc;
 	struct intel_guc *guc = &engine->gt->uc.guc;
-	u32 last_switch = rec->last_switch_in_stamp;
-	u32 ctx_id = rec->current_context_index;
-	u32 total = rec->total_runtime;
+	u32 last_switch, ctx_id, total;
 
 	lockdep_assert_held(&guc->timestamp.lock);
 
+	__get_engine_usage_record(engine, &last_switch, &ctx_id, &total);
+
 	stats->running = ctx_id != ~0U && last_switch;
 	if (stats->running)
 		__extend_last_switch(guc, &stats->start_gt_clk, last_switch);
@@ -1237,6 +1278,10 @@ static ktime_t guc_engine_busyness(struct intel_engine_cs *engine, ktime_t *now)
 	if (!in_reset && intel_gt_pm_get_if_awake(gt)) {
 		stats_saved = *stats;
 		gt_stamp_saved = guc->timestamp.gt_stamp;
+		/*
+		 * Update gt_clks, then gt timestamp to simplify the 'gt_stamp -
+		 * start_gt_clk' calculation below for active engines.
+		 */
 		guc_update_engine_gt_clks(engine);
 		guc_update_pm_timestamp(guc, now);
 		intel_gt_pm_put_async(gt);
@@ -1365,10 +1410,15 @@ void intel_guc_busyness_park(struct intel_gt *gt)
 void intel_guc_busyness_unpark(struct intel_gt *gt)
 {
 	struct intel_guc *guc = &gt->uc.guc;
+	unsigned long flags;
+	ktime_t unused;
 
 	if (!guc_submission_initialized(guc))
 		return;
 
+	spin_lock_irqsave(&guc->timestamp.lock, flags);
+	guc_update_pm_timestamp(guc, &unused);
+	spin_unlock_irqrestore(&guc->timestamp.lock, flags);
 	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
 			 guc->timestamp.ping_delay);
 }
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Use existing uncore helper to read gpm_timestamp
  2022-01-27  0:15 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Umesh Nerlige Ramappa
@ 2022-01-27  0:15 ` Umesh Nerlige Ramappa
  2022-01-27  3:23   ` kernel test robot
  2022-01-27  0:48 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Patchwork
  1 sibling, 1 reply; 5+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-01-27  0:15 UTC (permalink / raw)
  To: intel-gfx

Use intel_uncore_read64_2x32 to read upper and lower fields of the GPM
timestamp.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c   | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 4e9154cacc58..0549b3edc4a2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1205,20 +1205,6 @@ static u32 gpm_timestamp_shift(struct intel_gt *gt)
 	return 3 - shift;
 }
 
-static u64 gpm_timestamp(struct intel_gt *gt)
-{
-	u32 lo, hi, old_hi, loop = 0;
-
-	hi = intel_uncore_read(gt->uncore, MISC_STATUS1);
-	do {
-		lo = intel_uncore_read(gt->uncore, MISC_STATUS0);
-		old_hi = hi;
-		hi = intel_uncore_read(gt->uncore, MISC_STATUS1);
-	} while (old_hi != hi && loop++ < 2);
-
-	return ((u64)hi << 32) | lo;
-}
-
 static void guc_update_pm_timestamp(struct intel_guc *guc, ktime_t *now)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
@@ -1228,7 +1214,8 @@ static void guc_update_pm_timestamp(struct intel_guc *guc, ktime_t *now)
 	lockdep_assert_held(&guc->timestamp.lock);
 
 	gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
-	gpm_ts = gpm_timestamp(gt) >> guc->timestamp.shift;
+	gpm_ts = intel_uncore_read64_2x32(uncore, MISC_STATUS0, MISC_STATUS1) >>
+		 guc->timestamp.shift;
 	gt_stamp_lo = lower_32_bits(gpm_ts);
 	*now = ktime_get();
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness
  2022-01-27  0:15 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Umesh Nerlige Ramappa
  2022-01-27  0:15 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Use existing uncore helper to read gpm_timestamp Umesh Nerlige Ramappa
@ 2022-01-27  0:48 ` Patchwork
  2022-01-27 12:01   ` Jani Nikula
  1 sibling, 1 reply; 5+ messages in thread
From: Patchwork @ 2022-01-27  0:48 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness
URL   : https://patchwork.freedesktop.org/series/99386/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  CHK     include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c: In function ‘guc_update_pm_timestamp’:
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1217:36: error: ‘uncore’ undeclared (first use in this function); did you mean ‘node’?
  gpm_ts = intel_uncore_read64_2x32(uncore, MISC_STATUS0, MISC_STATUS1) >>
                                    ^~~~~~
                                    node
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1217:36: note: each undeclared identifier is reported only once for each function it appears in
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1210:19: error: unused variable ‘gt’ [-Werror=unused-variable]
  struct intel_gt *gt = guc_to_gt(guc);
                   ^~
cc1: all warnings being treated as errors
scripts/Makefile.build:288: recipe for target 'drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o' failed
make[4]: *** [drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o] Error 1
scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:550: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1831: recipe for target 'drivers' failed
make: *** [drivers] Error 2



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Use existing uncore helper to read gpm_timestamp
  2022-01-27  0:15 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Use existing uncore helper to read gpm_timestamp Umesh Nerlige Ramappa
@ 2022-01-27  3:23   ` kernel test robot
  0 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2022-01-27  3:23 UTC (permalink / raw)
  To: Umesh Nerlige Ramappa, intel-gfx; +Cc: kbuild-all

Hi Umesh,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-tip/drm-tip]
[cannot apply to linus/master drm-intel/for-linux-next v5.17-rc1 next-20220125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Umesh-Nerlige-Ramappa/drm-i915-pmu-Fix-KMD-and-GuC-race-on-accessing-busyness/20220127-081651
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
config: i386-randconfig-m021-20220124 (https://download.01.org/0day-ci/archive/20220127/202201271109.UuklsWk3-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/2d74f201eae0c9ec06e6dc84b363089c0f190058
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Umesh-Nerlige-Ramappa/drm-i915-pmu-Fix-KMD-and-GuC-race-on-accessing-busyness/20220127-081651
        git checkout 2d74f201eae0c9ec06e6dc84b363089c0f190058
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c: In function 'guc_update_pm_timestamp':
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1217:36: error: 'uncore' undeclared (first use in this function)
    1217 |  gpm_ts = intel_uncore_read64_2x32(uncore, MISC_STATUS0, MISC_STATUS1) >>
         |                                    ^~~~~~
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1217:36: note: each undeclared identifier is reported only once for each function it appears in
   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1210:19: warning: unused variable 'gt' [-Wunused-variable]
    1210 |  struct intel_gt *gt = guc_to_gt(guc);
         |                   ^~


vim +/uncore +1217 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

  1207	
  1208	static void guc_update_pm_timestamp(struct intel_guc *guc, ktime_t *now)
  1209	{
  1210		struct intel_gt *gt = guc_to_gt(guc);
  1211		u32 gt_stamp_lo, gt_stamp_hi;
  1212		u64 gpm_ts;
  1213	
  1214		lockdep_assert_held(&guc->timestamp.lock);
  1215	
  1216		gt_stamp_hi = upper_32_bits(guc->timestamp.gt_stamp);
> 1217		gpm_ts = intel_uncore_read64_2x32(uncore, MISC_STATUS0, MISC_STATUS1) >>
  1218			 guc->timestamp.shift;
  1219		gt_stamp_lo = lower_32_bits(gpm_ts);
  1220		*now = ktime_get();
  1221	
  1222		if (gt_stamp_lo < lower_32_bits(guc->timestamp.gt_stamp))
  1223			gt_stamp_hi++;
  1224	
  1225		guc->timestamp.gt_stamp = ((u64)gt_stamp_hi << 32) | gt_stamp_lo;
  1226	}
  1227	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx]  ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness
  2022-01-27  0:48 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Patchwork
@ 2022-01-27 12:01   ` Jani Nikula
  0 siblings, 0 replies; 5+ messages in thread
From: Jani Nikula @ 2022-01-27 12:01 UTC (permalink / raw)
  To: Patchwork, Umesh Nerlige Ramappa; +Cc: intel-gfx

On Thu, 27 Jan 2022, Patchwork <patchwork@emeril.freedesktop.org> wrote:
> == Series Details ==
>
> Series: series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness
> URL   : https://patchwork.freedesktop.org/series/99386/
> State : failure
>
> == Summary ==
>
> CALL    scripts/checksyscalls.sh
>   CALL    scripts/atomic/check-atomics.sh
>   DESCEND objtool
>   CHK     include/generated/compile.h
>   CC [M]  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c: In function ‘guc_update_pm_timestamp’:
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1217:36: error: ‘uncore’ undeclared (first use in this function); did you mean ‘node’?
>   gpm_ts = intel_uncore_read64_2x32(uncore, MISC_STATUS0, MISC_STATUS1) >>
>                                     ^~~~~~
>                                     node
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1217:36: note: each undeclared identifier is reported only once for each function it appears in
> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:1210:19: error: unused variable ‘gt’ [-Werror=unused-variable]
>   struct intel_gt *gt = guc_to_gt(guc);

Please use CONFIG_DRM_I915_WERROR=y during development.

BR,
Jani.

>                    ^~
> cc1: all warnings being treated as errors
> scripts/Makefile.build:288: recipe for target 'drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o' failed
> make[4]: *** [drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o] Error 1
> scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm/i915' failed
> make[3]: *** [drivers/gpu/drm/i915] Error 2
> scripts/Makefile.build:550: recipe for target 'drivers/gpu/drm' failed
> make[2]: *** [drivers/gpu/drm] Error 2
> scripts/Makefile.build:550: recipe for target 'drivers/gpu' failed
> make[1]: *** [drivers/gpu] Error 2
> Makefile:1831: recipe for target 'drivers' failed
> make: *** [drivers] Error 2
>
>

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-01-27 12:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-27  0:15 [Intel-gfx] [PATCH 1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Umesh Nerlige Ramappa
2022-01-27  0:15 ` [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Use existing uncore helper to read gpm_timestamp Umesh Nerlige Ramappa
2022-01-27  3:23   ` kernel test robot
2022-01-27  0:48 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/2] drm/i915/pmu: Fix KMD and GuC race on accessing busyness Patchwork
2022-01-27 12:01   ` Jani Nikula

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).