All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
@ 2022-06-10  0:09 ` Douglas Anderson
  0 siblings, 0 replies; 8+ messages in thread
From: Douglas Anderson @ 2022-06-10  0:09 UTC (permalink / raw)
  To: Rob Clark, Akhil P Oommen, Jordan Crouse
  Cc: Douglas Anderson, Abhinav Kumar, Bjorn Andersson, Chia-I Wu,
	Dan Carpenter, Daniel Vetter, David Airlie, Dmitry Baryshkov,
	Eric Anholt, Jonathan Marek, Sean Paul, Viresh Kumar,
	Vladimir Lypak, Yangtao Li, dri-devel, freedreno, linux-arm-msm,
	linux-kernel

From testing on sc7180-trogdor devices, reading the GMU registers
needs the GMU clocks to be enabled. Those clocks get turned on in
a6xx_gmu_resume(). Confusingly enough, that function is called as a
result of the runtime_pm of the GPU "struct device", not the GMU
"struct device". Unfortunately the current a6xx_gpu_busy() grabs a
reference to the GMU's "struct device".

The fact that we were grabbing the wrong reference was easily seen to
cause crashes that happen if we change the GPU's pm_runtime usage to
not use autosuspend. It's also believed to cause some long tail GPU
crashes even with autosuspend.

We could look at changing it so that we do pm_runtime_get_if_in_use()
on the GPU's "struct device", but then we run into a different
problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
"struct device" the whole time when we're in the "autosuspend
delay". That is, when we drop the last reference to the GPU but we're
waiting a period before actually suspending then we'll think the GPU
is off. One reason that's bad is that if the GPU didn't actually turn
off then the cycle counter doesn't lose state and that throws off all
of our calculations.

Let's change the code to keep track of the suspend state of
devfreq. msm_devfreq_suspend() is always called before we actually
suspend the GPU and msm_devfreq_resume() after we resume it. This
means we can use the suspended state to know if we're powered or not.

NOTE: one might wonder when exactly our status function is called when
devfreq is supposed to be disabled. The stack crawl I captured was:
  msm_devfreq_get_dev_status
  devfreq_simple_ondemand_func
  devfreq_update_target
  qos_notifier_call
  qos_max_notifier_call
  blocking_notifier_call_chain
  pm_qos_update_target
  freq_qos_apply
  apply_constraint
  __dev_pm_qos_update_request
  dev_pm_qos_update_request
  msm_devfreq_idle_work

Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v3:
- Totally rewrote to not use the pm_runtime functions.
- Moved the code to be common for all adreno GPUs.

Changes in v2:
- Move the set_freq runtime pm grab to the GPU file.
- Use <= for the pm_runtime test, not ==.

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
 drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
 6 files changed, 51 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index c424e9a37669..3dcec7acb384 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
 {
 	u64 busy_cycles;
 
-	/* Only read the gpu busy if the hardware is already active */
-	if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
-		*out_sample_rate = 1;
-		return 0;
-	}
-
 	busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
 			REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
 	*out_sample_rate = clk_get_rate(gpu->core_clk);
 
-	pm_runtime_put(&gpu->pdev->dev);
-
 	return busy_cycles;
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 9f76f5b15759..dc715d88ff21 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
 		A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
 }
 
-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+		       bool suspended)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
 
 	/*
 	 * This can get called from devfreq while the hardware is idle. Don't
-	 * bring up the power if it isn't already active
+	 * bring up the power if it isn't already active. All we're doing here
+	 * is updating the frequency so that when we come back online we're at
+	 * the right rate.
 	 */
-	if (pm_runtime_get_if_in_use(gmu->dev) == 0)
+	if (suspended)
 		return;
 
 	if (!gmu->legacy) {
 		a6xx_hfi_set_freq(gmu, perf_index);
 		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
-		pm_runtime_put(gmu->dev);
 		return;
 	}
 
@@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
 		dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
 
 	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
-	pm_runtime_put(gmu->dev);
 }
 
 unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
@@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
 		return;
 
 	gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
-	a6xx_gmu_set_freq(gpu, gpu_opp);
+	a6xx_gmu_set_freq(gpu, gpu_opp, false);
 	dev_pm_opp_put(gpu_opp);
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 42ed9a3c4905..8c02a67f29f2 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
 	/* 19.2MHz */
 	*out_sample_rate = 19200000;
 
-	/* Only read the gpu busy if the hardware is already active */
-	if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
-		return 0;
-
 	busy_cycles = gmu_read64(&a6xx_gpu->gmu,
 			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
 			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
 
-
-	pm_runtime_put(a6xx_gpu->gmu.dev);
-
 	return busy_cycles;
 }
 
-static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+			      bool suspended)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
 	mutex_lock(&a6xx_gpu->gmu.lock);
-	a6xx_gmu_set_freq(gpu, opp);
+	a6xx_gmu_set_freq(gpu, opp, suspended);
 	mutex_unlock(&a6xx_gpu->gmu.lock);
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 86e0a7c3fe6d..ab853f61db63 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
 int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
 void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
 
-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
+void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+		       bool suspended);
 unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
 
 void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6def00883046..7ced1a30d4e8 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -68,7 +68,8 @@ struct msm_gpu_funcs {
 	struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
 	int (*gpu_state_put)(struct msm_gpu_state *state);
 	unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
-	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
+	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+			     bool suspended);
 	struct msm_gem_address_space *(*create_address_space)
 		(struct msm_gpu *gpu, struct platform_device *pdev);
 	struct msm_gem_address_space *(*create_private_address_space)
@@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
 	/** devfreq: devfreq instance */
 	struct devfreq *devfreq;
 
+	/** lock: lock for "suspended", "busy_cycles", and "time" */
+	struct mutex lock;
+
 	/**
 	 * idle_constraint:
 	 *
@@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
 	 * elapsed
 	 */
 	struct msm_hrtimer_work boost_work;
+
+	/** suspended: tracks if we're suspended */
+	bool suspended;
 };
 
 struct msm_gpu {
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index d2539ca78c29..ea94bc18e72e 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
 		u32 flags)
 {
 	struct msm_gpu *gpu = dev_to_gpu(dev);
+	struct msm_gpu_devfreq *df = &gpu->devfreq;
 	struct dev_pm_opp *opp;
 
 	/*
@@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
 
 	trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
 
-	if (gpu->funcs->gpu_set_freq)
-		gpu->funcs->gpu_set_freq(gpu, opp);
-	else
+	if (gpu->funcs->gpu_set_freq) {
+		mutex_lock(&df->lock);
+		gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
+		mutex_unlock(&df->lock);
+	} else {
 		clk_set_rate(gpu->core_clk, *freq);
+	}
 
 	dev_pm_opp_put(opp);
 
@@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
 	unsigned long sample_rate;
 	ktime_t time;
 
+	mutex_lock(&df->lock);
+
 	status->current_frequency = get_freq(gpu);
-	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
 	time = ktime_get();
-
-	busy_time = busy_cycles - df->busy_cycles;
 	status->total_time = ktime_us_delta(time, df->time);
+	df->time = time;
 
+	if (df->suspended) {
+		mutex_unlock(&df->lock);
+		status->busy_time = 0;
+		return;
+	}
+
+	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
+	busy_time = busy_cycles - df->busy_cycles;
 	df->busy_cycles = busy_cycles;
-	df->time = time;
+
+	mutex_unlock(&df->lock);
 
 	busy_time *= USEC_PER_SEC;
 	do_div(busy_time, sample_rate);
@@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
 	if (!gpu->funcs->gpu_busy)
 		return;
 
+	mutex_init(&df->lock);
+
 	dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
 			       DEV_PM_QOS_MAX_FREQUENCY,
 			       PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
@@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
 void msm_devfreq_resume(struct msm_gpu *gpu)
 {
 	struct msm_gpu_devfreq *df = &gpu->devfreq;
+	unsigned long sample_rate;
 
 	if (!has_devfreq(gpu))
 		return;
 
-	df->busy_cycles = 0;
+	mutex_lock(&df->lock);
+	df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
 	df->time = ktime_get();
+	df->suspended = false;
+	mutex_unlock(&df->lock);
 
 	devfreq_resume_device(df->devfreq);
 }
@@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
 	if (!has_devfreq(gpu))
 		return;
 
+	mutex_lock(&df->lock);
+	df->suspended = true;
+	mutex_unlock(&df->lock);
+
 	devfreq_suspend_device(df->devfreq);
 
 	cancel_idle_work(df);
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
@ 2022-06-10  0:09 ` Douglas Anderson
  0 siblings, 0 replies; 8+ messages in thread
From: Douglas Anderson @ 2022-06-10  0:09 UTC (permalink / raw)
  To: Rob Clark, Akhil P Oommen, Jordan Crouse
  Cc: freedreno, linux-kernel, Jonathan Marek, Abhinav Kumar,
	David Airlie, Viresh Kumar, Yangtao Li, Vladimir Lypak,
	Douglas Anderson, dri-devel, Bjorn Andersson, Eric Anholt,
	linux-arm-msm, Dmitry Baryshkov, Sean Paul, Dan Carpenter

From testing on sc7180-trogdor devices, reading the GMU registers
needs the GMU clocks to be enabled. Those clocks get turned on in
a6xx_gmu_resume(). Confusingly enough, that function is called as a
result of the runtime_pm of the GPU "struct device", not the GMU
"struct device". Unfortunately the current a6xx_gpu_busy() grabs a
reference to the GMU's "struct device".

The fact that we were grabbing the wrong reference was easily seen to
cause crashes that happen if we change the GPU's pm_runtime usage to
not use autosuspend. It's also believed to cause some long tail GPU
crashes even with autosuspend.

We could look at changing it so that we do pm_runtime_get_if_in_use()
on the GPU's "struct device", but then we run into a different
problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
"struct device" the whole time when we're in the "autosuspend
delay". That is, when we drop the last reference to the GPU but we're
waiting a period before actually suspending then we'll think the GPU
is off. One reason that's bad is that if the GPU didn't actually turn
off then the cycle counter doesn't lose state and that throws off all
of our calculations.

Let's change the code to keep track of the suspend state of
devfreq. msm_devfreq_suspend() is always called before we actually
suspend the GPU and msm_devfreq_resume() after we resume it. This
means we can use the suspended state to know if we're powered or not.

NOTE: one might wonder when exactly our status function is called when
devfreq is supposed to be disabled. The stack crawl I captured was:
  msm_devfreq_get_dev_status
  devfreq_simple_ondemand_func
  devfreq_update_target
  qos_notifier_call
  qos_max_notifier_call
  blocking_notifier_call_chain
  pm_qos_update_target
  freq_qos_apply
  apply_constraint
  __dev_pm_qos_update_request
  dev_pm_qos_update_request
  msm_devfreq_idle_work

Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v3:
- Totally rewrote to not use the pm_runtime functions.
- Moved the code to be common for all adreno GPUs.

Changes in v2:
- Move the set_freq runtime pm grab to the GPU file.
- Use <= for the pm_runtime test, not ==.

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
 drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
 6 files changed, 51 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index c424e9a37669..3dcec7acb384 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
 {
 	u64 busy_cycles;
 
-	/* Only read the gpu busy if the hardware is already active */
-	if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
-		*out_sample_rate = 1;
-		return 0;
-	}
-
 	busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
 			REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
 	*out_sample_rate = clk_get_rate(gpu->core_clk);
 
-	pm_runtime_put(&gpu->pdev->dev);
-
 	return busy_cycles;
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 9f76f5b15759..dc715d88ff21 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
 		A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
 }
 
-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+		       bool suspended)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
@@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
 
 	/*
 	 * This can get called from devfreq while the hardware is idle. Don't
-	 * bring up the power if it isn't already active
+	 * bring up the power if it isn't already active. All we're doing here
+	 * is updating the frequency so that when we come back online we're at
+	 * the right rate.
 	 */
-	if (pm_runtime_get_if_in_use(gmu->dev) == 0)
+	if (suspended)
 		return;
 
 	if (!gmu->legacy) {
 		a6xx_hfi_set_freq(gmu, perf_index);
 		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
-		pm_runtime_put(gmu->dev);
 		return;
 	}
 
@@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
 		dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
 
 	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
-	pm_runtime_put(gmu->dev);
 }
 
 unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
@@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
 		return;
 
 	gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
-	a6xx_gmu_set_freq(gpu, gpu_opp);
+	a6xx_gmu_set_freq(gpu, gpu_opp, false);
 	dev_pm_opp_put(gpu_opp);
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 42ed9a3c4905..8c02a67f29f2 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
 	/* 19.2MHz */
 	*out_sample_rate = 19200000;
 
-	/* Only read the gpu busy if the hardware is already active */
-	if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
-		return 0;
-
 	busy_cycles = gmu_read64(&a6xx_gpu->gmu,
 			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
 			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
 
-
-	pm_runtime_put(a6xx_gpu->gmu.dev);
-
 	return busy_cycles;
 }
 
-static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
+static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+			      bool suspended)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
 
 	mutex_lock(&a6xx_gpu->gmu.lock);
-	a6xx_gmu_set_freq(gpu, opp);
+	a6xx_gmu_set_freq(gpu, opp, suspended);
 	mutex_unlock(&a6xx_gpu->gmu.lock);
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 86e0a7c3fe6d..ab853f61db63 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
 int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
 void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
 
-void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
+void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+		       bool suspended);
 unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
 
 void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6def00883046..7ced1a30d4e8 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -68,7 +68,8 @@ struct msm_gpu_funcs {
 	struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
 	int (*gpu_state_put)(struct msm_gpu_state *state);
 	unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
-	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
+	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
+			     bool suspended);
 	struct msm_gem_address_space *(*create_address_space)
 		(struct msm_gpu *gpu, struct platform_device *pdev);
 	struct msm_gem_address_space *(*create_private_address_space)
@@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
 	/** devfreq: devfreq instance */
 	struct devfreq *devfreq;
 
+	/** lock: lock for "suspended", "busy_cycles", and "time" */
+	struct mutex lock;
+
 	/**
 	 * idle_constraint:
 	 *
@@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
 	 * elapsed
 	 */
 	struct msm_hrtimer_work boost_work;
+
+	/** suspended: tracks if we're suspended */
+	bool suspended;
 };
 
 struct msm_gpu {
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index d2539ca78c29..ea94bc18e72e 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
 		u32 flags)
 {
 	struct msm_gpu *gpu = dev_to_gpu(dev);
+	struct msm_gpu_devfreq *df = &gpu->devfreq;
 	struct dev_pm_opp *opp;
 
 	/*
@@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
 
 	trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
 
-	if (gpu->funcs->gpu_set_freq)
-		gpu->funcs->gpu_set_freq(gpu, opp);
-	else
+	if (gpu->funcs->gpu_set_freq) {
+		mutex_lock(&df->lock);
+		gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
+		mutex_unlock(&df->lock);
+	} else {
 		clk_set_rate(gpu->core_clk, *freq);
+	}
 
 	dev_pm_opp_put(opp);
 
@@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
 	unsigned long sample_rate;
 	ktime_t time;
 
+	mutex_lock(&df->lock);
+
 	status->current_frequency = get_freq(gpu);
-	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
 	time = ktime_get();
-
-	busy_time = busy_cycles - df->busy_cycles;
 	status->total_time = ktime_us_delta(time, df->time);
+	df->time = time;
 
+	if (df->suspended) {
+		mutex_unlock(&df->lock);
+		status->busy_time = 0;
+		return;
+	}
+
+	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
+	busy_time = busy_cycles - df->busy_cycles;
 	df->busy_cycles = busy_cycles;
-	df->time = time;
+
+	mutex_unlock(&df->lock);
 
 	busy_time *= USEC_PER_SEC;
 	do_div(busy_time, sample_rate);
@@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
 	if (!gpu->funcs->gpu_busy)
 		return;
 
+	mutex_init(&df->lock);
+
 	dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
 			       DEV_PM_QOS_MAX_FREQUENCY,
 			       PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
@@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
 void msm_devfreq_resume(struct msm_gpu *gpu)
 {
 	struct msm_gpu_devfreq *df = &gpu->devfreq;
+	unsigned long sample_rate;
 
 	if (!has_devfreq(gpu))
 		return;
 
-	df->busy_cycles = 0;
+	mutex_lock(&df->lock);
+	df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
 	df->time = ktime_get();
+	df->suspended = false;
+	mutex_unlock(&df->lock);
 
 	devfreq_resume_device(df->devfreq);
 }
@@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
 	if (!has_devfreq(gpu))
 		return;
 
+	mutex_lock(&df->lock);
+	df->suspended = true;
+	mutex_unlock(&df->lock);
+
 	devfreq_suspend_device(df->devfreq);
 
 	cancel_idle_work(df);
-- 
2.36.1.476.g0c4daa206d-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
  2022-06-10  0:09 ` Douglas Anderson
@ 2022-06-10 15:06   ` Rob Clark
  -1 siblings, 0 replies; 8+ messages in thread
From: Rob Clark @ 2022-06-10 15:06 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Akhil P Oommen, Jordan Crouse, Abhinav Kumar, Bjorn Andersson,
	Chia-I Wu, Dan Carpenter, Daniel Vetter, David Airlie,
	Dmitry Baryshkov, Eric Anholt, Jonathan Marek, Sean Paul,
	Viresh Kumar, Vladimir Lypak, Yangtao Li, dri-devel, freedreno,
	linux-arm-msm, Linux Kernel Mailing List

On Thu, Jun 9, 2022 at 5:10 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> From testing on sc7180-trogdor devices, reading the GMU registers
> needs the GMU clocks to be enabled. Those clocks get turned on in
> a6xx_gmu_resume(). Confusingly enough, that function is called as a
> result of the runtime_pm of the GPU "struct device", not the GMU
> "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
> reference to the GMU's "struct device".
>
> The fact that we were grabbing the wrong reference was easily seen to
> cause crashes that happen if we change the GPU's pm_runtime usage to
> not use autosuspend. It's also believed to cause some long tail GPU
> crashes even with autosuspend.
>
> We could look at changing it so that we do pm_runtime_get_if_in_use()
> on the GPU's "struct device", but then we run into a different
> problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> "struct device" the whole time when we're in the "autosuspend
> delay". That is, when we drop the last reference to the GPU but we're
> waiting a period before actually suspending then we'll think the GPU
> is off. One reason that's bad is that if the GPU didn't actually turn
> off then the cycle counter doesn't lose state and that throws off all
> of our calculations.
>
> Let's change the code to keep track of the suspend state of
> devfreq. msm_devfreq_suspend() is always called before we actually
> suspend the GPU and msm_devfreq_resume() after we resume it. This
> means we can use the suspended state to know if we're powered or not.
>
> NOTE: one might wonder when exactly our status function is called when
> devfreq is supposed to be disabled. The stack crawl I captured was:
>   msm_devfreq_get_dev_status
>   devfreq_simple_ondemand_func
>   devfreq_update_target
>   qos_notifier_call
>   qos_max_notifier_call
>   blocking_notifier_call_chain
>   pm_qos_update_target
>   freq_qos_apply
>   apply_constraint
>   __dev_pm_qos_update_request
>   dev_pm_qos_update_request
>   msm_devfreq_idle_work
>
> Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v3:
> - Totally rewrote to not use the pm_runtime functions.
> - Moved the code to be common for all adreno GPUs.
>
> Changes in v2:
> - Move the set_freq runtime pm grab to the GPU file.
> - Use <= for the pm_runtime test, not ==.
>
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
>  drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
>  6 files changed, 51 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c424e9a37669..3dcec7acb384 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>  {
>         u64 busy_cycles;
>
> -       /* Only read the gpu busy if the hardware is already active */
> -       if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> -               *out_sample_rate = 1;
> -               return 0;
> -       }
> -
>         busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
>                         REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
>         *out_sample_rate = clk_get_rate(gpu->core_clk);
>
> -       pm_runtime_put(&gpu->pdev->dev);
> -
>         return busy_cycles;
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 9f76f5b15759..dc715d88ff21 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
>                 A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
>  }
>
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                      bool suspended)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>
>         /*
>          * This can get called from devfreq while the hardware is idle. Don't
> -        * bring up the power if it isn't already active
> +        * bring up the power if it isn't already active. All we're doing here
> +        * is updating the frequency so that when we come back online we're at
> +        * the right rate.
>          */
> -       if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> +       if (suspended)
>                 return;
>
>         if (!gmu->legacy) {
>                 a6xx_hfi_set_freq(gmu, perf_index);
>                 dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -               pm_runtime_put(gmu->dev);
>                 return;
>         }
>
> @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>                 dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
>
>         dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -       pm_runtime_put(gmu->dev);
>  }
>
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>                 return;
>
>         gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> -       a6xx_gmu_set_freq(gpu, gpu_opp);
> +       a6xx_gmu_set_freq(gpu, gpu_opp, false);
>         dev_pm_opp_put(gpu_opp);
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 42ed9a3c4905..8c02a67f29f2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>         /* 19.2MHz */
>         *out_sample_rate = 19200000;
>
> -       /* Only read the gpu busy if the hardware is already active */
> -       if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> -               return 0;
> -
>         busy_cycles = gmu_read64(&a6xx_gpu->gmu,
>                         REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
>                         REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
>
> -
> -       pm_runtime_put(a6xx_gpu->gmu.dev);
> -
>         return busy_cycles;
>  }
>
> -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                             bool suspended)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>
>         mutex_lock(&a6xx_gpu->gmu.lock);
> -       a6xx_gmu_set_freq(gpu, opp);
> +       a6xx_gmu_set_freq(gpu, opp, suspended);
>         mutex_unlock(&a6xx_gpu->gmu.lock);
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 86e0a7c3fe6d..ab853f61db63 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                      bool suspended);
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
>
>  void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6def00883046..7ced1a30d4e8 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
>         struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
>         int (*gpu_state_put)(struct msm_gpu_state *state);
>         unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> -       void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +       void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                            bool suspended);

nit, I suppose we should add a comment that gpu_set_freq()/gpu_busy()
should *not* call runpm get/put as they are called with lock held

Otherwise looks good to me.. kinda sad that we have to track the
suspended state behind runpm's back, but I don't really see any better
way.

Reviewed-by: Rob Clark <robdclark@gmail.com>

>         struct msm_gem_address_space *(*create_address_space)
>                 (struct msm_gpu *gpu, struct platform_device *pdev);
>         struct msm_gem_address_space *(*create_private_address_space)
> @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
>         /** devfreq: devfreq instance */
>         struct devfreq *devfreq;
>
> +       /** lock: lock for "suspended", "busy_cycles", and "time" */
> +       struct mutex lock;
> +
>         /**
>          * idle_constraint:
>          *
> @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
>          * elapsed
>          */
>         struct msm_hrtimer_work boost_work;
> +
> +       /** suspended: tracks if we're suspended */
> +       bool suspended;
>  };
>
>  struct msm_gpu {
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index d2539ca78c29..ea94bc18e72e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>                 u32 flags)
>  {
>         struct msm_gpu *gpu = dev_to_gpu(dev);
> +       struct msm_gpu_devfreq *df = &gpu->devfreq;
>         struct dev_pm_opp *opp;
>
>         /*
> @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>
>         trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
>
> -       if (gpu->funcs->gpu_set_freq)
> -               gpu->funcs->gpu_set_freq(gpu, opp);
> -       else
> +       if (gpu->funcs->gpu_set_freq) {
> +               mutex_lock(&df->lock);
> +               gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> +               mutex_unlock(&df->lock);
> +       } else {
>                 clk_set_rate(gpu->core_clk, *freq);
> +       }
>
>         dev_pm_opp_put(opp);
>
> @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
>         unsigned long sample_rate;
>         ktime_t time;
>
> +       mutex_lock(&df->lock);
> +
>         status->current_frequency = get_freq(gpu);
> -       busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>         time = ktime_get();
> -
> -       busy_time = busy_cycles - df->busy_cycles;
>         status->total_time = ktime_us_delta(time, df->time);
> +       df->time = time;
>
> +       if (df->suspended) {
> +               mutex_unlock(&df->lock);
> +               status->busy_time = 0;
> +               return;
> +       }
> +
> +       busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> +       busy_time = busy_cycles - df->busy_cycles;
>         df->busy_cycles = busy_cycles;
> -       df->time = time;
> +
> +       mutex_unlock(&df->lock);
>
>         busy_time *= USEC_PER_SEC;
>         do_div(busy_time, sample_rate);
> @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
>         if (!gpu->funcs->gpu_busy)
>                 return;
>
> +       mutex_init(&df->lock);
> +
>         dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
>                                DEV_PM_QOS_MAX_FREQUENCY,
>                                PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
>  void msm_devfreq_resume(struct msm_gpu *gpu)
>  {
>         struct msm_gpu_devfreq *df = &gpu->devfreq;
> +       unsigned long sample_rate;
>
>         if (!has_devfreq(gpu))
>                 return;
>
> -       df->busy_cycles = 0;
> +       mutex_lock(&df->lock);
> +       df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>         df->time = ktime_get();
> +       df->suspended = false;
> +       mutex_unlock(&df->lock);
>
>         devfreq_resume_device(df->devfreq);
>  }
> @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
>         if (!has_devfreq(gpu))
>                 return;
>
> +       mutex_lock(&df->lock);
> +       df->suspended = true;
> +       mutex_unlock(&df->lock);
> +
>         devfreq_suspend_device(df->devfreq);
>
>         cancel_idle_work(df);
> --
> 2.36.1.476.g0c4daa206d-goog
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
@ 2022-06-10 15:06   ` Rob Clark
  0 siblings, 0 replies; 8+ messages in thread
From: Rob Clark @ 2022-06-10 15:06 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: freedreno, Linux Kernel Mailing List, Jonathan Marek,
	Eric Anholt, Akhil P Oommen, Viresh Kumar, Yangtao Li,
	Vladimir Lypak, Abhinav Kumar, dri-devel, Bjorn Andersson,
	David Airlie, linux-arm-msm, Dmitry Baryshkov, Jordan Crouse,
	Sean Paul, Dan Carpenter

On Thu, Jun 9, 2022 at 5:10 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> From testing on sc7180-trogdor devices, reading the GMU registers
> needs the GMU clocks to be enabled. Those clocks get turned on in
> a6xx_gmu_resume(). Confusingly enough, that function is called as a
> result of the runtime_pm of the GPU "struct device", not the GMU
> "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
> reference to the GMU's "struct device".
>
> The fact that we were grabbing the wrong reference was easily seen to
> cause crashes that happen if we change the GPU's pm_runtime usage to
> not use autosuspend. It's also believed to cause some long tail GPU
> crashes even with autosuspend.
>
> We could look at changing it so that we do pm_runtime_get_if_in_use()
> on the GPU's "struct device", but then we run into a different
> problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> "struct device" the whole time when we're in the "autosuspend
> delay". That is, when we drop the last reference to the GPU but we're
> waiting a period before actually suspending then we'll think the GPU
> is off. One reason that's bad is that if the GPU didn't actually turn
> off then the cycle counter doesn't lose state and that throws off all
> of our calculations.
>
> Let's change the code to keep track of the suspend state of
> devfreq. msm_devfreq_suspend() is always called before we actually
> suspend the GPU and msm_devfreq_resume() after we resume it. This
> means we can use the suspended state to know if we're powered or not.
>
> NOTE: one might wonder when exactly our status function is called when
> devfreq is supposed to be disabled. The stack crawl I captured was:
>   msm_devfreq_get_dev_status
>   devfreq_simple_ondemand_func
>   devfreq_update_target
>   qos_notifier_call
>   qos_max_notifier_call
>   blocking_notifier_call_chain
>   pm_qos_update_target
>   freq_qos_apply
>   apply_constraint
>   __dev_pm_qos_update_request
>   dev_pm_qos_update_request
>   msm_devfreq_idle_work
>
> Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v3:
> - Totally rewrote to not use the pm_runtime functions.
> - Moved the code to be common for all adreno GPUs.
>
> Changes in v2:
> - Move the set_freq runtime pm grab to the GPU file.
> - Use <= for the pm_runtime test, not ==.
>
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
>  drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
>  drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
>  6 files changed, 51 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c424e9a37669..3dcec7acb384 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>  {
>         u64 busy_cycles;
>
> -       /* Only read the gpu busy if the hardware is already active */
> -       if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> -               *out_sample_rate = 1;
> -               return 0;
> -       }
> -
>         busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
>                         REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
>         *out_sample_rate = clk_get_rate(gpu->core_clk);
>
> -       pm_runtime_put(&gpu->pdev->dev);
> -
>         return busy_cycles;
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 9f76f5b15759..dc715d88ff21 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
>                 A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
>  }
>
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                      bool suspended)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>
>         /*
>          * This can get called from devfreq while the hardware is idle. Don't
> -        * bring up the power if it isn't already active
> +        * bring up the power if it isn't already active. All we're doing here
> +        * is updating the frequency so that when we come back online we're at
> +        * the right rate.
>          */
> -       if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> +       if (suspended)
>                 return;
>
>         if (!gmu->legacy) {
>                 a6xx_hfi_set_freq(gmu, perf_index);
>                 dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -               pm_runtime_put(gmu->dev);
>                 return;
>         }
>
> @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>                 dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
>
>         dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -       pm_runtime_put(gmu->dev);
>  }
>
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>                 return;
>
>         gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> -       a6xx_gmu_set_freq(gpu, gpu_opp);
> +       a6xx_gmu_set_freq(gpu, gpu_opp, false);
>         dev_pm_opp_put(gpu_opp);
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 42ed9a3c4905..8c02a67f29f2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>         /* 19.2MHz */
>         *out_sample_rate = 19200000;
>
> -       /* Only read the gpu busy if the hardware is already active */
> -       if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> -               return 0;
> -
>         busy_cycles = gmu_read64(&a6xx_gpu->gmu,
>                         REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
>                         REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
>
> -
> -       pm_runtime_put(a6xx_gpu->gmu.dev);
> -
>         return busy_cycles;
>  }
>
> -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                             bool suspended)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>
>         mutex_lock(&a6xx_gpu->gmu.lock);
> -       a6xx_gmu_set_freq(gpu, opp);
> +       a6xx_gmu_set_freq(gpu, opp, suspended);
>         mutex_unlock(&a6xx_gpu->gmu.lock);
>  }
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 86e0a7c3fe6d..ab853f61db63 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>  int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>  void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                      bool suspended);
>  unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
>
>  void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6def00883046..7ced1a30d4e8 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
>         struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
>         int (*gpu_state_put)(struct msm_gpu_state *state);
>         unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> -       void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +       void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +                            bool suspended);

nit, I suppose we should add a comment that gpu_set_freq()/gpu_busy()
should *not* call runpm get/put as they are called with lock held

Otherwise looks good to me.. kinda sad that we have to track the
suspended state behind runpm's back, but I don't really see any better
way.

Reviewed-by: Rob Clark <robdclark@gmail.com>

>         struct msm_gem_address_space *(*create_address_space)
>                 (struct msm_gpu *gpu, struct platform_device *pdev);
>         struct msm_gem_address_space *(*create_private_address_space)
> @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
>         /** devfreq: devfreq instance */
>         struct devfreq *devfreq;
>
> +       /** lock: lock for "suspended", "busy_cycles", and "time" */
> +       struct mutex lock;
> +
>         /**
>          * idle_constraint:
>          *
> @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
>          * elapsed
>          */
>         struct msm_hrtimer_work boost_work;
> +
> +       /** suspended: tracks if we're suspended */
> +       bool suspended;
>  };
>
>  struct msm_gpu {
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index d2539ca78c29..ea94bc18e72e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>                 u32 flags)
>  {
>         struct msm_gpu *gpu = dev_to_gpu(dev);
> +       struct msm_gpu_devfreq *df = &gpu->devfreq;
>         struct dev_pm_opp *opp;
>
>         /*
> @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>
>         trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
>
> -       if (gpu->funcs->gpu_set_freq)
> -               gpu->funcs->gpu_set_freq(gpu, opp);
> -       else
> +       if (gpu->funcs->gpu_set_freq) {
> +               mutex_lock(&df->lock);
> +               gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> +               mutex_unlock(&df->lock);
> +       } else {
>                 clk_set_rate(gpu->core_clk, *freq);
> +       }
>
>         dev_pm_opp_put(opp);
>
> @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
>         unsigned long sample_rate;
>         ktime_t time;
>
> +       mutex_lock(&df->lock);
> +
>         status->current_frequency = get_freq(gpu);
> -       busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>         time = ktime_get();
> -
> -       busy_time = busy_cycles - df->busy_cycles;
>         status->total_time = ktime_us_delta(time, df->time);
> +       df->time = time;
>
> +       if (df->suspended) {
> +               mutex_unlock(&df->lock);
> +               status->busy_time = 0;
> +               return;
> +       }
> +
> +       busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> +       busy_time = busy_cycles - df->busy_cycles;
>         df->busy_cycles = busy_cycles;
> -       df->time = time;
> +
> +       mutex_unlock(&df->lock);
>
>         busy_time *= USEC_PER_SEC;
>         do_div(busy_time, sample_rate);
> @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
>         if (!gpu->funcs->gpu_busy)
>                 return;
>
> +       mutex_init(&df->lock);
> +
>         dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
>                                DEV_PM_QOS_MAX_FREQUENCY,
>                                PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
>  void msm_devfreq_resume(struct msm_gpu *gpu)
>  {
>         struct msm_gpu_devfreq *df = &gpu->devfreq;
> +       unsigned long sample_rate;
>
>         if (!has_devfreq(gpu))
>                 return;
>
> -       df->busy_cycles = 0;
> +       mutex_lock(&df->lock);
> +       df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>         df->time = ktime_get();
> +       df->suspended = false;
> +       mutex_unlock(&df->lock);
>
>         devfreq_resume_device(df->devfreq);
>  }
> @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
>         if (!has_devfreq(gpu))
>                 return;
>
> +       mutex_lock(&df->lock);
> +       df->suspended = true;
> +       mutex_unlock(&df->lock);
> +
>         devfreq_suspend_device(df->devfreq);
>
>         cancel_idle_work(df);
> --
> 2.36.1.476.g0c4daa206d-goog
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
  2022-06-10  0:09 ` Douglas Anderson
@ 2022-06-14 14:42   ` Akhil P Oommen
  -1 siblings, 0 replies; 8+ messages in thread
From: Akhil P Oommen @ 2022-06-14 14:42 UTC (permalink / raw)
  To: Douglas Anderson, Rob Clark, Jordan Crouse
  Cc: Abhinav Kumar, Bjorn Andersson, Chia-I Wu, Dan Carpenter,
	Daniel Vetter, David Airlie, Dmitry Baryshkov, Eric Anholt,
	Jonathan Marek, Sean Paul, Viresh Kumar, Vladimir Lypak,
	Yangtao Li, dri-devel, freedreno, linux-arm-msm, linux-kernel

On 6/10/2022 5:39 AM, Douglas Anderson wrote:
> >From testing on sc7180-trogdor devices, reading the GMU registers
> needs the GMU clocks to be enabled. Those clocks get turned on in
> a6xx_gmu_resume(). Confusingly enough, that function is called as a
> result of the runtime_pm of the GPU "struct device", not the GMU
> "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
> reference to the GMU's "struct device".
>
> The fact that we were grabbing the wrong reference was easily seen to
> cause crashes that happen if we change the GPU's pm_runtime usage to
> not use autosuspend. It's also believed to cause some long tail GPU
> crashes even with autosuspend.
>
> We could look at changing it so that we do pm_runtime_get_if_in_use()
> on the GPU's "struct device", but then we run into a different
> problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> "struct device" the whole time when we're in the "autosuspend
> delay". That is, when we drop the last reference to the GPU but we're
> waiting a period before actually suspending then we'll think the GPU
> is off. One reason that's bad is that if the GPU didn't actually turn
> off then the cycle counter doesn't lose state and that throws off all
> of our calculations.
>
> Let's change the code to keep track of the suspend state of
> devfreq. msm_devfreq_suspend() is always called before we actually
> suspend the GPU and msm_devfreq_resume() after we resume it. This
> means we can use the suspended state to know if we're powered or not.
>
> NOTE: one might wonder when exactly our status function is called when
> devfreq is supposed to be disabled. The stack crawl I captured was:
>    msm_devfreq_get_dev_status
>    devfreq_simple_ondemand_func
>    devfreq_update_target
>    qos_notifier_call
>    qos_max_notifier_call
>    blocking_notifier_call_chain
>    pm_qos_update_target
>    freq_qos_apply
>    apply_constraint
>    __dev_pm_qos_update_request
>    dev_pm_qos_update_request
>    msm_devfreq_idle_work
>
> Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v3:
> - Totally rewrote to not use the pm_runtime functions.
> - Moved the code to be common for all adreno GPUs.
>
> Changes in v2:
> - Move the set_freq runtime pm grab to the GPU file.
> - Use <= for the pm_runtime test, not ==.
>
>   drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
>   drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
>   drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
>   6 files changed, 51 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c424e9a37669..3dcec7acb384 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>   {
>   	u64 busy_cycles;
>   
> -	/* Only read the gpu busy if the hardware is already active */
> -	if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> -		*out_sample_rate = 1;
> -		return 0;
> -	}
> -
>   	busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
>   			REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
>   	*out_sample_rate = clk_get_rate(gpu->core_clk);
>   
> -	pm_runtime_put(&gpu->pdev->dev);
> -
>   	return busy_cycles;
>   }
>   
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 9f76f5b15759..dc715d88ff21 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
>   		A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
>   }
>   
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +		       bool suspended)
>   {
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>   
>   	/*
>   	 * This can get called from devfreq while the hardware is idle. Don't
> -	 * bring up the power if it isn't already active
> +	 * bring up the power if it isn't already active. All we're doing here
> +	 * is updating the frequency so that when we come back online we're at
> +	 * the right rate.
>   	 */
> -	if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> +	if (suspended)
>   		return;
>   
>   	if (!gmu->legacy) {
>   		a6xx_hfi_set_freq(gmu, perf_index);
>   		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -		pm_runtime_put(gmu->dev);
>   		return;
>   	}
>   
> @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>   		dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
>   
>   	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -	pm_runtime_put(gmu->dev);
>   }
>   
>   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>   		return;
>   
>   	gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> -	a6xx_gmu_set_freq(gpu, gpu_opp);
> +	a6xx_gmu_set_freq(gpu, gpu_opp, false);
>   	dev_pm_opp_put(gpu_opp);
>   }
>   
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 42ed9a3c4905..8c02a67f29f2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>   	/* 19.2MHz */
>   	*out_sample_rate = 19200000;
>   
> -	/* Only read the gpu busy if the hardware is already active */
> -	if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> -		return 0;
> -
>   	busy_cycles = gmu_read64(&a6xx_gpu->gmu,
>   			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
>   			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
>   
> -
> -	pm_runtime_put(a6xx_gpu->gmu.dev);
> -
>   	return busy_cycles;
>   }
>   
> -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +			      bool suspended)
>   {
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   
>   	mutex_lock(&a6xx_gpu->gmu.lock);
> -	a6xx_gmu_set_freq(gpu, opp);
> +	a6xx_gmu_set_freq(gpu, opp, suspended);
>   	mutex_unlock(&a6xx_gpu->gmu.lock);
>   }
>   
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 86e0a7c3fe6d..ab853f61db63 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>   int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>   void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>   
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +		       bool suspended);
>   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
>   
>   void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6def00883046..7ced1a30d4e8 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
>   	struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
>   	int (*gpu_state_put)(struct msm_gpu_state *state);
>   	unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> -	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +			     bool suspended);
>   	struct msm_gem_address_space *(*create_address_space)
>   		(struct msm_gpu *gpu, struct platform_device *pdev);
>   	struct msm_gem_address_space *(*create_private_address_space)
> @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
>   	/** devfreq: devfreq instance */
>   	struct devfreq *devfreq;
>   
> +	/** lock: lock for "suspended", "busy_cycles", and "time" */
> +	struct mutex lock;
> +
>   	/**
>   	 * idle_constraint:
>   	 *
> @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
>   	 * elapsed
>   	 */
>   	struct msm_hrtimer_work boost_work;
> +
> +	/** suspended: tracks if we're suspended */
> +	bool suspended;
>   };
>   
>   struct msm_gpu {
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index d2539ca78c29..ea94bc18e72e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>   		u32 flags)
>   {
>   	struct msm_gpu *gpu = dev_to_gpu(dev);
> +	struct msm_gpu_devfreq *df = &gpu->devfreq;
>   	struct dev_pm_opp *opp;
>   
>   	/*
> @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>   
>   	trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
>   
> -	if (gpu->funcs->gpu_set_freq)
> -		gpu->funcs->gpu_set_freq(gpu, opp);
> -	else
> +	if (gpu->funcs->gpu_set_freq) {
> +		mutex_lock(&df->lock);
> +		gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> +		mutex_unlock(&df->lock);
> +	} else {
>   		clk_set_rate(gpu->core_clk, *freq);
> +	}
>   
>   	dev_pm_opp_put(opp);
>   
> @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
>   	unsigned long sample_rate;
>   	ktime_t time;
>   
> +	mutex_lock(&df->lock);
> +
>   	status->current_frequency = get_freq(gpu);
> -	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>   	time = ktime_get();
> -
> -	busy_time = busy_cycles - df->busy_cycles;
>   	status->total_time = ktime_us_delta(time, df->time);
> +	df->time = time;
>   
> +	if (df->suspended) {
> +		mutex_unlock(&df->lock);
> +		status->busy_time = 0;
> +		return;
> +	}
> +
> +	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> +	busy_time = busy_cycles - df->busy_cycles;
>   	df->busy_cycles = busy_cycles;
> -	df->time = time;
> +
> +	mutex_unlock(&df->lock);
>   
>   	busy_time *= USEC_PER_SEC;
>   	do_div(busy_time, sample_rate);
> @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
>   	if (!gpu->funcs->gpu_busy)
>   		return;
>   
> +	mutex_init(&df->lock);
> +
>   	dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
>   			       DEV_PM_QOS_MAX_FREQUENCY,
>   			       PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
>   void msm_devfreq_resume(struct msm_gpu *gpu)
>   {
>   	struct msm_gpu_devfreq *df = &gpu->devfreq;
> +	unsigned long sample_rate;
>   
>   	if (!has_devfreq(gpu))
>   		return;
>   
> -	df->busy_cycles = 0;
> +	mutex_lock(&df->lock);
> +	df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>   	df->time = ktime_get();
> +	df->suspended = false;
> +	mutex_unlock(&df->lock);
>   
>   	devfreq_resume_device(df->devfreq);
>   }
> @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
>   	if (!has_devfreq(gpu))
>   		return;
>   
> +	mutex_lock(&df->lock);
> +	df->suspended = true;
> +	mutex_unlock(&df->lock);
> +
>   	devfreq_suspend_device(df->devfreq);
>   
>   	cancel_idle_work(df);

nit: in the commit subject: 6xx -> a6xx

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>


-Akhil.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
@ 2022-06-14 14:42   ` Akhil P Oommen
  0 siblings, 0 replies; 8+ messages in thread
From: Akhil P Oommen @ 2022-06-14 14:42 UTC (permalink / raw)
  To: Douglas Anderson, Rob Clark, Jordan Crouse
  Cc: freedreno, linux-kernel, Jonathan Marek, David Airlie,
	Viresh Kumar, Yangtao Li, Vladimir Lypak, Abhinav Kumar,
	dri-devel, Bjorn Andersson, Eric Anholt, linux-arm-msm,
	Dmitry Baryshkov, Sean Paul, Dan Carpenter

On 6/10/2022 5:39 AM, Douglas Anderson wrote:
> >From testing on sc7180-trogdor devices, reading the GMU registers
> needs the GMU clocks to be enabled. Those clocks get turned on in
> a6xx_gmu_resume(). Confusingly enough, that function is called as a
> result of the runtime_pm of the GPU "struct device", not the GMU
> "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
> reference to the GMU's "struct device".
>
> The fact that we were grabbing the wrong reference was easily seen to
> cause crashes that happen if we change the GPU's pm_runtime usage to
> not use autosuspend. It's also believed to cause some long tail GPU
> crashes even with autosuspend.
>
> We could look at changing it so that we do pm_runtime_get_if_in_use()
> on the GPU's "struct device", but then we run into a different
> problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> "struct device" the whole time when we're in the "autosuspend
> delay". That is, when we drop the last reference to the GPU but we're
> waiting a period before actually suspending then we'll think the GPU
> is off. One reason that's bad is that if the GPU didn't actually turn
> off then the cycle counter doesn't lose state and that throws off all
> of our calculations.
>
> Let's change the code to keep track of the suspend state of
> devfreq. msm_devfreq_suspend() is always called before we actually
> suspend the GPU and msm_devfreq_resume() after we resume it. This
> means we can use the suspended state to know if we're powered or not.
>
> NOTE: one might wonder when exactly our status function is called when
> devfreq is supposed to be disabled. The stack crawl I captured was:
>    msm_devfreq_get_dev_status
>    devfreq_simple_ondemand_func
>    devfreq_update_target
>    qos_notifier_call
>    qos_max_notifier_call
>    blocking_notifier_call_chain
>    pm_qos_update_target
>    freq_qos_apply
>    apply_constraint
>    __dev_pm_qos_update_request
>    dev_pm_qos_update_request
>    msm_devfreq_idle_work
>
> Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v3:
> - Totally rewrote to not use the pm_runtime functions.
> - Moved the code to be common for all adreno GPUs.
>
> Changes in v2:
> - Move the set_freq runtime pm grab to the GPU file.
> - Use <= for the pm_runtime test, not ==.
>
>   drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
>   drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
>   drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
>   6 files changed, 51 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index c424e9a37669..3dcec7acb384 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>   {
>   	u64 busy_cycles;
>   
> -	/* Only read the gpu busy if the hardware is already active */
> -	if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> -		*out_sample_rate = 1;
> -		return 0;
> -	}
> -
>   	busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
>   			REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
>   	*out_sample_rate = clk_get_rate(gpu->core_clk);
>   
> -	pm_runtime_put(&gpu->pdev->dev);
> -
>   	return busy_cycles;
>   }
>   
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> index 9f76f5b15759..dc715d88ff21 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
>   		A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
>   }
>   
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +		       bool suspended)
>   {
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>   
>   	/*
>   	 * This can get called from devfreq while the hardware is idle. Don't
> -	 * bring up the power if it isn't already active
> +	 * bring up the power if it isn't already active. All we're doing here
> +	 * is updating the frequency so that when we come back online we're at
> +	 * the right rate.
>   	 */
> -	if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> +	if (suspended)
>   		return;
>   
>   	if (!gmu->legacy) {
>   		a6xx_hfi_set_freq(gmu, perf_index);
>   		dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -		pm_runtime_put(gmu->dev);
>   		return;
>   	}
>   
> @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>   		dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
>   
>   	dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> -	pm_runtime_put(gmu->dev);
>   }
>   
>   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
>   		return;
>   
>   	gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> -	a6xx_gmu_set_freq(gpu, gpu_opp);
> +	a6xx_gmu_set_freq(gpu, gpu_opp, false);
>   	dev_pm_opp_put(gpu_opp);
>   }
>   
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 42ed9a3c4905..8c02a67f29f2 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
>   	/* 19.2MHz */
>   	*out_sample_rate = 19200000;
>   
> -	/* Only read the gpu busy if the hardware is already active */
> -	if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> -		return 0;
> -
>   	busy_cycles = gmu_read64(&a6xx_gpu->gmu,
>   			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
>   			REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
>   
> -
> -	pm_runtime_put(a6xx_gpu->gmu.dev);
> -
>   	return busy_cycles;
>   }
>   
> -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +			      bool suspended)
>   {
>   	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>   	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
>   
>   	mutex_lock(&a6xx_gpu->gmu.lock);
> -	a6xx_gmu_set_freq(gpu, opp);
> +	a6xx_gmu_set_freq(gpu, opp, suspended);
>   	mutex_unlock(&a6xx_gpu->gmu.lock);
>   }
>   
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> index 86e0a7c3fe6d..ab853f61db63 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
>   int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
>   void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
>   
> -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +		       bool suspended);
>   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
>   
>   void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 6def00883046..7ced1a30d4e8 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
>   	struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
>   	int (*gpu_state_put)(struct msm_gpu_state *state);
>   	unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> -	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> +	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> +			     bool suspended);
>   	struct msm_gem_address_space *(*create_address_space)
>   		(struct msm_gpu *gpu, struct platform_device *pdev);
>   	struct msm_gem_address_space *(*create_private_address_space)
> @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
>   	/** devfreq: devfreq instance */
>   	struct devfreq *devfreq;
>   
> +	/** lock: lock for "suspended", "busy_cycles", and "time" */
> +	struct mutex lock;
> +
>   	/**
>   	 * idle_constraint:
>   	 *
> @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
>   	 * elapsed
>   	 */
>   	struct msm_hrtimer_work boost_work;
> +
> +	/** suspended: tracks if we're suspended */
> +	bool suspended;
>   };
>   
>   struct msm_gpu {
> diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> index d2539ca78c29..ea94bc18e72e 100644
> --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>   		u32 flags)
>   {
>   	struct msm_gpu *gpu = dev_to_gpu(dev);
> +	struct msm_gpu_devfreq *df = &gpu->devfreq;
>   	struct dev_pm_opp *opp;
>   
>   	/*
> @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
>   
>   	trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
>   
> -	if (gpu->funcs->gpu_set_freq)
> -		gpu->funcs->gpu_set_freq(gpu, opp);
> -	else
> +	if (gpu->funcs->gpu_set_freq) {
> +		mutex_lock(&df->lock);
> +		gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> +		mutex_unlock(&df->lock);
> +	} else {
>   		clk_set_rate(gpu->core_clk, *freq);
> +	}
>   
>   	dev_pm_opp_put(opp);
>   
> @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
>   	unsigned long sample_rate;
>   	ktime_t time;
>   
> +	mutex_lock(&df->lock);
> +
>   	status->current_frequency = get_freq(gpu);
> -	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>   	time = ktime_get();
> -
> -	busy_time = busy_cycles - df->busy_cycles;
>   	status->total_time = ktime_us_delta(time, df->time);
> +	df->time = time;
>   
> +	if (df->suspended) {
> +		mutex_unlock(&df->lock);
> +		status->busy_time = 0;
> +		return;
> +	}
> +
> +	busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> +	busy_time = busy_cycles - df->busy_cycles;
>   	df->busy_cycles = busy_cycles;
> -	df->time = time;
> +
> +	mutex_unlock(&df->lock);
>   
>   	busy_time *= USEC_PER_SEC;
>   	do_div(busy_time, sample_rate);
> @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
>   	if (!gpu->funcs->gpu_busy)
>   		return;
>   
> +	mutex_init(&df->lock);
> +
>   	dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
>   			       DEV_PM_QOS_MAX_FREQUENCY,
>   			       PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
>   void msm_devfreq_resume(struct msm_gpu *gpu)
>   {
>   	struct msm_gpu_devfreq *df = &gpu->devfreq;
> +	unsigned long sample_rate;
>   
>   	if (!has_devfreq(gpu))
>   		return;
>   
> -	df->busy_cycles = 0;
> +	mutex_lock(&df->lock);
> +	df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
>   	df->time = ktime_get();
> +	df->suspended = false;
> +	mutex_unlock(&df->lock);
>   
>   	devfreq_resume_device(df->devfreq);
>   }
> @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
>   	if (!has_devfreq(gpu))
>   		return;
>   
> +	mutex_lock(&df->lock);
> +	df->suspended = true;
> +	mutex_unlock(&df->lock);
> +
>   	devfreq_suspend_device(df->devfreq);
>   
>   	cancel_idle_work(df);

nit: in the commit subject: 6xx -> a6xx

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>


-Akhil.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
  2022-06-14 14:42   ` Akhil P Oommen
@ 2022-06-14 14:45     ` Doug Anderson
  -1 siblings, 0 replies; 8+ messages in thread
From: Doug Anderson @ 2022-06-14 14:45 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: Rob Clark, Jordan Crouse, Abhinav Kumar, Bjorn Andersson,
	Chia-I Wu, Dan Carpenter, Daniel Vetter, David Airlie,
	Dmitry Baryshkov, Eric Anholt, Jonathan Marek, Sean Paul,
	Viresh Kumar, Vladimir Lypak, Yangtao Li, dri-devel, freedreno,
	linux-arm-msm, LKML

Hi,

On Tue, Jun 14, 2022 at 7:42 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On 6/10/2022 5:39 AM, Douglas Anderson wrote:
> > >From testing on sc7180-trogdor devices, reading the GMU registers
> > needs the GMU clocks to be enabled. Those clocks get turned on in
> > a6xx_gmu_resume(). Confusingly enough, that function is called as a
> > result of the runtime_pm of the GPU "struct device", not the GMU
> > "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
> > reference to the GMU's "struct device".
> >
> > The fact that we were grabbing the wrong reference was easily seen to
> > cause crashes that happen if we change the GPU's pm_runtime usage to
> > not use autosuspend. It's also believed to cause some long tail GPU
> > crashes even with autosuspend.
> >
> > We could look at changing it so that we do pm_runtime_get_if_in_use()
> > on the GPU's "struct device", but then we run into a different
> > problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> > "struct device" the whole time when we're in the "autosuspend
> > delay". That is, when we drop the last reference to the GPU but we're
> > waiting a period before actually suspending then we'll think the GPU
> > is off. One reason that's bad is that if the GPU didn't actually turn
> > off then the cycle counter doesn't lose state and that throws off all
> > of our calculations.
> >
> > Let's change the code to keep track of the suspend state of
> > devfreq. msm_devfreq_suspend() is always called before we actually
> > suspend the GPU and msm_devfreq_resume() after we resume it. This
> > means we can use the suspended state to know if we're powered or not.
> >
> > NOTE: one might wonder when exactly our status function is called when
> > devfreq is supposed to be disabled. The stack crawl I captured was:
> >    msm_devfreq_get_dev_status
> >    devfreq_simple_ondemand_func
> >    devfreq_update_target
> >    qos_notifier_call
> >    qos_max_notifier_call
> >    blocking_notifier_call_chain
> >    pm_qos_update_target
> >    freq_qos_apply
> >    apply_constraint
> >    __dev_pm_qos_update_request
> >    dev_pm_qos_update_request
> >    msm_devfreq_idle_work
> >
> > Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> > Changes in v3:
> > - Totally rewrote to not use the pm_runtime functions.
> > - Moved the code to be common for all adreno GPUs.
> >
> > Changes in v2:
> > - Move the set_freq runtime pm grab to the GPU file.
> > - Use <= for the pm_runtime test, not ==.
> >
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
> >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
> >   drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
> >   drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
> >   6 files changed, 51 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > index c424e9a37669..3dcec7acb384 100644
> > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> >   {
> >       u64 busy_cycles;
> >
> > -     /* Only read the gpu busy if the hardware is already active */
> > -     if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> > -             *out_sample_rate = 1;
> > -             return 0;
> > -     }
> > -
> >       busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
> >                       REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
> >       *out_sample_rate = clk_get_rate(gpu->core_clk);
> >
> > -     pm_runtime_put(&gpu->pdev->dev);
> > -
> >       return busy_cycles;
> >   }
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 9f76f5b15759..dc715d88ff21 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
> >               A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
> >   }
> >
> > -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                    bool suspended)
> >   {
> >       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >       struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> >
> >       /*
> >        * This can get called from devfreq while the hardware is idle. Don't
> > -      * bring up the power if it isn't already active
> > +      * bring up the power if it isn't already active. All we're doing here
> > +      * is updating the frequency so that when we come back online we're at
> > +      * the right rate.
> >        */
> > -     if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> > +     if (suspended)
> >               return;
> >
> >       if (!gmu->legacy) {
> >               a6xx_hfi_set_freq(gmu, perf_index);
> >               dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > -             pm_runtime_put(gmu->dev);
> >               return;
> >       }
> >
> > @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> >               dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
> >
> >       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > -     pm_runtime_put(gmu->dev);
> >   }
> >
> >   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> > @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
> >               return;
> >
> >       gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> > -     a6xx_gmu_set_freq(gpu, gpu_opp);
> > +     a6xx_gmu_set_freq(gpu, gpu_opp, false);
> >       dev_pm_opp_put(gpu_opp);
> >   }
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 42ed9a3c4905..8c02a67f29f2 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> >       /* 19.2MHz */
> >       *out_sample_rate = 19200000;
> >
> > -     /* Only read the gpu busy if the hardware is already active */
> > -     if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> > -             return 0;
> > -
> >       busy_cycles = gmu_read64(&a6xx_gpu->gmu,
> >                       REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
> >                       REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
> >
> > -
> > -     pm_runtime_put(a6xx_gpu->gmu.dev);
> > -
> >       return busy_cycles;
> >   }
> >
> > -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                           bool suspended)
> >   {
> >       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >       struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >
> >       mutex_lock(&a6xx_gpu->gmu.lock);
> > -     a6xx_gmu_set_freq(gpu, opp);
> > +     a6xx_gmu_set_freq(gpu, opp, suspended);
> >       mutex_unlock(&a6xx_gpu->gmu.lock);
> >   }
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > index 86e0a7c3fe6d..ab853f61db63 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >   int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >   void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >
> > -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> > +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                    bool suspended);
> >   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
> >
> >   void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> > index 6def00883046..7ced1a30d4e8 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.h
> > +++ b/drivers/gpu/drm/msm/msm_gpu.h
> > @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
> >       struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
> >       int (*gpu_state_put)(struct msm_gpu_state *state);
> >       unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> > -     void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> > +     void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                          bool suspended);
> >       struct msm_gem_address_space *(*create_address_space)
> >               (struct msm_gpu *gpu, struct platform_device *pdev);
> >       struct msm_gem_address_space *(*create_private_address_space)
> > @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
> >       /** devfreq: devfreq instance */
> >       struct devfreq *devfreq;
> >
> > +     /** lock: lock for "suspended", "busy_cycles", and "time" */
> > +     struct mutex lock;
> > +
> >       /**
> >        * idle_constraint:
> >        *
> > @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
> >        * elapsed
> >        */
> >       struct msm_hrtimer_work boost_work;
> > +
> > +     /** suspended: tracks if we're suspended */
> > +     bool suspended;
> >   };
> >
> >   struct msm_gpu {
> > diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > index d2539ca78c29..ea94bc18e72e 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
> >               u32 flags)
> >   {
> >       struct msm_gpu *gpu = dev_to_gpu(dev);
> > +     struct msm_gpu_devfreq *df = &gpu->devfreq;
> >       struct dev_pm_opp *opp;
> >
> >       /*
> > @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
> >
> >       trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
> >
> > -     if (gpu->funcs->gpu_set_freq)
> > -             gpu->funcs->gpu_set_freq(gpu, opp);
> > -     else
> > +     if (gpu->funcs->gpu_set_freq) {
> > +             mutex_lock(&df->lock);
> > +             gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> > +             mutex_unlock(&df->lock);
> > +     } else {
> >               clk_set_rate(gpu->core_clk, *freq);
> > +     }
> >
> >       dev_pm_opp_put(opp);
> >
> > @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
> >       unsigned long sample_rate;
> >       ktime_t time;
> >
> > +     mutex_lock(&df->lock);
> > +
> >       status->current_frequency = get_freq(gpu);
> > -     busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> >       time = ktime_get();
> > -
> > -     busy_time = busy_cycles - df->busy_cycles;
> >       status->total_time = ktime_us_delta(time, df->time);
> > +     df->time = time;
> >
> > +     if (df->suspended) {
> > +             mutex_unlock(&df->lock);
> > +             status->busy_time = 0;
> > +             return;
> > +     }
> > +
> > +     busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> > +     busy_time = busy_cycles - df->busy_cycles;
> >       df->busy_cycles = busy_cycles;
> > -     df->time = time;
> > +
> > +     mutex_unlock(&df->lock);
> >
> >       busy_time *= USEC_PER_SEC;
> >       do_div(busy_time, sample_rate);
> > @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
> >       if (!gpu->funcs->gpu_busy)
> >               return;
> >
> > +     mutex_init(&df->lock);
> > +
> >       dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
> >                              DEV_PM_QOS_MAX_FREQUENCY,
> >                              PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> > @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
> >   void msm_devfreq_resume(struct msm_gpu *gpu)
> >   {
> >       struct msm_gpu_devfreq *df = &gpu->devfreq;
> > +     unsigned long sample_rate;
> >
> >       if (!has_devfreq(gpu))
> >               return;
> >
> > -     df->busy_cycles = 0;
> > +     mutex_lock(&df->lock);
> > +     df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> >       df->time = ktime_get();
> > +     df->suspended = false;
> > +     mutex_unlock(&df->lock);
> >
> >       devfreq_resume_device(df->devfreq);
> >   }
> > @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
> >       if (!has_devfreq(gpu))
> >               return;
> >
> > +     mutex_lock(&df->lock);
> > +     df->suspended = true;
> > +     mutex_unlock(&df->lock);
> > +
> >       devfreq_suspend_device(df->devfreq);
> >
> >       cancel_idle_work(df);
>
> nit: in the commit subject: 6xx -> a6xx
>
> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

Thanks!

There's actually a v4 of this:

https://lore.kernel.org/r/20220610124639.v4.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid

I'll send a v5 that adds your Reviewed-by tag and fixes the subject.

-Doug

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy
@ 2022-06-14 14:45     ` Doug Anderson
  0 siblings, 0 replies; 8+ messages in thread
From: Doug Anderson @ 2022-06-14 14:45 UTC (permalink / raw)
  To: Akhil P Oommen
  Cc: freedreno, LKML, Jonathan Marek, David Airlie, Viresh Kumar,
	Yangtao Li, Vladimir Lypak, Abhinav Kumar, dri-devel,
	Bjorn Andersson, Eric Anholt, linux-arm-msm, Dmitry Baryshkov,
	Jordan Crouse, Sean Paul, Dan Carpenter

Hi,

On Tue, Jun 14, 2022 at 7:42 AM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>
> On 6/10/2022 5:39 AM, Douglas Anderson wrote:
> > >From testing on sc7180-trogdor devices, reading the GMU registers
> > needs the GMU clocks to be enabled. Those clocks get turned on in
> > a6xx_gmu_resume(). Confusingly enough, that function is called as a
> > result of the runtime_pm of the GPU "struct device", not the GMU
> > "struct device". Unfortunately the current a6xx_gpu_busy() grabs a
> > reference to the GMU's "struct device".
> >
> > The fact that we were grabbing the wrong reference was easily seen to
> > cause crashes that happen if we change the GPU's pm_runtime usage to
> > not use autosuspend. It's also believed to cause some long tail GPU
> > crashes even with autosuspend.
> >
> > We could look at changing it so that we do pm_runtime_get_if_in_use()
> > on the GPU's "struct device", but then we run into a different
> > problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
> > "struct device" the whole time when we're in the "autosuspend
> > delay". That is, when we drop the last reference to the GPU but we're
> > waiting a period before actually suspending then we'll think the GPU
> > is off. One reason that's bad is that if the GPU didn't actually turn
> > off then the cycle counter doesn't lose state and that throws off all
> > of our calculations.
> >
> > Let's change the code to keep track of the suspend state of
> > devfreq. msm_devfreq_suspend() is always called before we actually
> > suspend the GPU and msm_devfreq_resume() after we resume it. This
> > means we can use the suspended state to know if we're powered or not.
> >
> > NOTE: one might wonder when exactly our status function is called when
> > devfreq is supposed to be disabled. The stack crawl I captured was:
> >    msm_devfreq_get_dev_status
> >    devfreq_simple_ondemand_func
> >    devfreq_update_target
> >    qos_notifier_call
> >    qos_max_notifier_call
> >    blocking_notifier_call_chain
> >    pm_qos_update_target
> >    freq_qos_apply
> >    apply_constraint
> >    __dev_pm_qos_update_request
> >    dev_pm_qos_update_request
> >    msm_devfreq_idle_work
> >
> > Fixes: eadf79286a4b ("drm/msm: Check for powered down HW in the devfreq callbacks")
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> > Changes in v3:
> > - Totally rewrote to not use the pm_runtime functions.
> > - Moved the code to be common for all adreno GPUs.
> >
> > Changes in v2:
> > - Move the set_freq runtime pm grab to the GPU file.
> > - Use <= for the pm_runtime test, not ==.
> >
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c |  8 ------
> >   drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 13 ++++-----
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 12 +++------
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  3 ++-
> >   drivers/gpu/drm/msm/msm_gpu.h         |  9 ++++++-
> >   drivers/gpu/drm/msm/msm_gpu_devfreq.c | 39 +++++++++++++++++++++------
> >   6 files changed, 51 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > index c424e9a37669..3dcec7acb384 100644
> > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > @@ -1666,18 +1666,10 @@ static u64 a5xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> >   {
> >       u64 busy_cycles;
> >
> > -     /* Only read the gpu busy if the hardware is already active */
> > -     if (pm_runtime_get_if_in_use(&gpu->pdev->dev) == 0) {
> > -             *out_sample_rate = 1;
> > -             return 0;
> > -     }
> > -
> >       busy_cycles = gpu_read64(gpu, REG_A5XX_RBBM_PERFCTR_RBBM_0_LO,
> >                       REG_A5XX_RBBM_PERFCTR_RBBM_0_HI);
> >       *out_sample_rate = clk_get_rate(gpu->core_clk);
> >
> > -     pm_runtime_put(&gpu->pdev->dev);
> > -
> >       return busy_cycles;
> >   }
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > index 9f76f5b15759..dc715d88ff21 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
> > @@ -102,7 +102,8 @@ bool a6xx_gmu_gx_is_on(struct a6xx_gmu *gmu)
> >               A6XX_GMU_SPTPRAC_PWR_CLK_STATUS_GX_HM_CLK_OFF));
> >   }
> >
> > -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                    bool suspended)
> >   {
> >       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >       struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> > @@ -127,15 +128,16 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> >
> >       /*
> >        * This can get called from devfreq while the hardware is idle. Don't
> > -      * bring up the power if it isn't already active
> > +      * bring up the power if it isn't already active. All we're doing here
> > +      * is updating the frequency so that when we come back online we're at
> > +      * the right rate.
> >        */
> > -     if (pm_runtime_get_if_in_use(gmu->dev) == 0)
> > +     if (suspended)
> >               return;
> >
> >       if (!gmu->legacy) {
> >               a6xx_hfi_set_freq(gmu, perf_index);
> >               dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > -             pm_runtime_put(gmu->dev);
> >               return;
> >       }
> >
> > @@ -159,7 +161,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> >               dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
> >
> >       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
> > -     pm_runtime_put(gmu->dev);
> >   }
> >
> >   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu)
> > @@ -895,7 +896,7 @@ static void a6xx_gmu_set_initial_freq(struct msm_gpu *gpu, struct a6xx_gmu *gmu)
> >               return;
> >
> >       gmu->freq = 0; /* so a6xx_gmu_set_freq() doesn't exit early */
> > -     a6xx_gmu_set_freq(gpu, gpu_opp);
> > +     a6xx_gmu_set_freq(gpu, gpu_opp, false);
> >       dev_pm_opp_put(gpu_opp);
> >   }
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 42ed9a3c4905..8c02a67f29f2 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1658,27 +1658,21 @@ static u64 a6xx_gpu_busy(struct msm_gpu *gpu, unsigned long *out_sample_rate)
> >       /* 19.2MHz */
> >       *out_sample_rate = 19200000;
> >
> > -     /* Only read the gpu busy if the hardware is already active */
> > -     if (pm_runtime_get_if_in_use(a6xx_gpu->gmu.dev) == 0)
> > -             return 0;
> > -
> >       busy_cycles = gmu_read64(&a6xx_gpu->gmu,
> >                       REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_L,
> >                       REG_A6XX_GMU_CX_GMU_POWER_COUNTER_XOCLK_0_H);
> >
> > -
> > -     pm_runtime_put(a6xx_gpu->gmu.dev);
> > -
> >       return busy_cycles;
> >   }
> >
> > -static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
> > +static void a6xx_gpu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                           bool suspended)
> >   {
> >       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >       struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
> >
> >       mutex_lock(&a6xx_gpu->gmu.lock);
> > -     a6xx_gmu_set_freq(gpu, opp);
> > +     a6xx_gmu_set_freq(gpu, opp, suspended);
> >       mutex_unlock(&a6xx_gpu->gmu.lock);
> >   }
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > index 86e0a7c3fe6d..ab853f61db63 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
> > @@ -77,7 +77,8 @@ void a6xx_gmu_clear_oob(struct a6xx_gmu *gmu, enum a6xx_gmu_oob_state state);
> >   int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node);
> >   void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu);
> >
> > -void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> > +void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                    bool suspended);
> >   unsigned long a6xx_gmu_get_freq(struct msm_gpu *gpu);
> >
> >   void a6xx_show(struct msm_gpu *gpu, struct msm_gpu_state *state,
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> > index 6def00883046..7ced1a30d4e8 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.h
> > +++ b/drivers/gpu/drm/msm/msm_gpu.h
> > @@ -68,7 +68,8 @@ struct msm_gpu_funcs {
> >       struct msm_gpu_state *(*gpu_state_get)(struct msm_gpu *gpu);
> >       int (*gpu_state_put)(struct msm_gpu_state *state);
> >       unsigned long (*gpu_get_freq)(struct msm_gpu *gpu);
> > -     void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
> > +     void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp,
> > +                          bool suspended);
> >       struct msm_gem_address_space *(*create_address_space)
> >               (struct msm_gpu *gpu, struct platform_device *pdev);
> >       struct msm_gem_address_space *(*create_private_address_space)
> > @@ -92,6 +93,9 @@ struct msm_gpu_devfreq {
> >       /** devfreq: devfreq instance */
> >       struct devfreq *devfreq;
> >
> > +     /** lock: lock for "suspended", "busy_cycles", and "time" */
> > +     struct mutex lock;
> > +
> >       /**
> >        * idle_constraint:
> >        *
> > @@ -135,6 +139,9 @@ struct msm_gpu_devfreq {
> >        * elapsed
> >        */
> >       struct msm_hrtimer_work boost_work;
> > +
> > +     /** suspended: tracks if we're suspended */
> > +     bool suspended;
> >   };
> >
> >   struct msm_gpu {
> > diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > index d2539ca78c29..ea94bc18e72e 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
> > @@ -20,6 +20,7 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
> >               u32 flags)
> >   {
> >       struct msm_gpu *gpu = dev_to_gpu(dev);
> > +     struct msm_gpu_devfreq *df = &gpu->devfreq;
> >       struct dev_pm_opp *opp;
> >
> >       /*
> > @@ -32,10 +33,13 @@ static int msm_devfreq_target(struct device *dev, unsigned long *freq,
> >
> >       trace_msm_gpu_freq_change(dev_pm_opp_get_freq(opp));
> >
> > -     if (gpu->funcs->gpu_set_freq)
> > -             gpu->funcs->gpu_set_freq(gpu, opp);
> > -     else
> > +     if (gpu->funcs->gpu_set_freq) {
> > +             mutex_lock(&df->lock);
> > +             gpu->funcs->gpu_set_freq(gpu, opp, df->suspended);
> > +             mutex_unlock(&df->lock);
> > +     } else {
> >               clk_set_rate(gpu->core_clk, *freq);
> > +     }
> >
> >       dev_pm_opp_put(opp);
> >
> > @@ -58,15 +62,24 @@ static void get_raw_dev_status(struct msm_gpu *gpu,
> >       unsigned long sample_rate;
> >       ktime_t time;
> >
> > +     mutex_lock(&df->lock);
> > +
> >       status->current_frequency = get_freq(gpu);
> > -     busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> >       time = ktime_get();
> > -
> > -     busy_time = busy_cycles - df->busy_cycles;
> >       status->total_time = ktime_us_delta(time, df->time);
> > +     df->time = time;
> >
> > +     if (df->suspended) {
> > +             mutex_unlock(&df->lock);
> > +             status->busy_time = 0;
> > +             return;
> > +     }
> > +
> > +     busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> > +     busy_time = busy_cycles - df->busy_cycles;
> >       df->busy_cycles = busy_cycles;
> > -     df->time = time;
> > +
> > +     mutex_unlock(&df->lock);
> >
> >       busy_time *= USEC_PER_SEC;
> >       do_div(busy_time, sample_rate);
> > @@ -175,6 +188,8 @@ void msm_devfreq_init(struct msm_gpu *gpu)
> >       if (!gpu->funcs->gpu_busy)
> >               return;
> >
> > +     mutex_init(&df->lock);
> > +
> >       dev_pm_qos_add_request(&gpu->pdev->dev, &df->idle_freq,
> >                              DEV_PM_QOS_MAX_FREQUENCY,
> >                              PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE);
> > @@ -244,12 +259,16 @@ void msm_devfreq_cleanup(struct msm_gpu *gpu)
> >   void msm_devfreq_resume(struct msm_gpu *gpu)
> >   {
> >       struct msm_gpu_devfreq *df = &gpu->devfreq;
> > +     unsigned long sample_rate;
> >
> >       if (!has_devfreq(gpu))
> >               return;
> >
> > -     df->busy_cycles = 0;
> > +     mutex_lock(&df->lock);
> > +     df->busy_cycles = gpu->funcs->gpu_busy(gpu, &sample_rate);
> >       df->time = ktime_get();
> > +     df->suspended = false;
> > +     mutex_unlock(&df->lock);
> >
> >       devfreq_resume_device(df->devfreq);
> >   }
> > @@ -261,6 +280,10 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
> >       if (!has_devfreq(gpu))
> >               return;
> >
> > +     mutex_lock(&df->lock);
> > +     df->suspended = true;
> > +     mutex_unlock(&df->lock);
> > +
> >       devfreq_suspend_device(df->devfreq);
> >
> >       cancel_idle_work(df);
>
> nit: in the commit subject: 6xx -> a6xx
>
> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

Thanks!

There's actually a v4 of this:

https://lore.kernel.org/r/20220610124639.v4.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid

I'll send a v5 that adds your Reviewed-by tag and fixes the subject.

-Doug

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-06-14 14:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10  0:09 [PATCH v3] drm/msm: Avoid unclocked GMU register access in 6xx gpu_busy Douglas Anderson
2022-06-10  0:09 ` Douglas Anderson
2022-06-10 15:06 ` Rob Clark
2022-06-10 15:06   ` Rob Clark
2022-06-14 14:42 ` Akhil P Oommen
2022-06-14 14:42   ` Akhil P Oommen
2022-06-14 14:45   ` Doug Anderson
2022-06-14 14:45     ` Doug Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.