* [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
@ 2022-03-07 15:30 Dmitry Baryshkov
  2022-03-07 15:30 ` [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp Dmitry Baryshkov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-07 15:30 UTC (permalink / raw)
  To: Andy Gross, Bjorn Andersson, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-arm-msm, linux-pm

Drop the affinity hint before freeing the throttling IRQ to fix the
following trace:

[  185.114773] ------------[ cut here ]------------
[  185.119517] WARNING: CPU: 7 PID: 43 at kernel/irq/manage.c:1887 free_irq+0x3a4/0x3dc
[  185.127474] Modules linked in:
[  185.130618] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S      W         5.17.0-rc6-00386-g67382a5b705d-dirty #690
[  185.147125] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  185.154269] pc : free_irq+0x3a4/0x3dc
[  185.158031] lr : free_irq+0x33c/0x3dc
[  185.161792] sp : ffff80000841bc90
[  185.165195] x29: ffff80000841bc90 x28: ffffa6edc5c3d000 x27: ffff6d93729e5908
[  185.172515] x26: 0000000000000000 x25: ffff6d910109fc00 x24: ffff6d91011490e0
[  185.179838] x23: ffff6d9101149218 x22: 0000000000000080 x21: 0000000000000000
[  185.187163] x20: ffff6d9101149000 x19: ffff6d910ab61500 x18: ffffffffffffffff
[  185.194487] x17: 2e35202020202020 x16: 2020202020202020 x15: ffff80008841b9a7
[  185.201805] x14: 00000000000003c9 x13: 0000000000000001 x12: 0000000000000040
[  185.209135] x11: ffff6d91005aab58 x10: ffff6d91005aab5a x9 : ffffc6a5ad1c5408
[  185.216455] x8 : ffff6d91005adb88 x7 : 0000000000000000 x6 : ffffc6a5ab5a91f4
[  185.223776] x5 : 0000000000000000 x4 : ffff6d91011490a8 x3 : ffffc6a5ad266108
[  185.231098] x2 : 0000000013033204 x1 : ffff6d9101149000 x0 : ffff6d910a9cc000
[  185.238421] Call trace:
[  185.240932]  free_irq+0x3a4/0x3dc
[  185.244334]  qcom_cpufreq_hw_cpu_exit+0x78/0xcc
[  185.248985]  cpufreq_offline.isra.0+0x228/0x270
[  185.253639]  cpuhp_cpufreq_offline+0x10/0x20
[  185.258027]  cpuhp_invoke_callback+0x16c/0x2b0
[  185.262592]  cpuhp_thread_fun+0x190/0x250
[  185.266710]  smpboot_thread_fn+0x12c/0x230
[  185.270914]  kthread+0xfc/0x100
[  185.274145]  ret_from_fork+0x10/0x20
[  185.277820] irq event stamp: 212
[  185.281136] hardirqs last  enabled at (211): [<ffffc6a5ac57973c>] _raw_spin_unlock_irqrestore+0x8c/0xa0
[  185.290775] hardirqs last disabled at (212): [<ffffc6a5ac572100>] __schedule+0x710/0xa10
[  185.299081] softirqs last  enabled at (0): [<ffffc6a5ab50f7b0>] copy_process+0x7d0/0x1a14
[  185.307475] softirqs last disabled at (0): [<0000000000000000>] 0x0

Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index effbb680b453..740518d8ae16 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -412,6 +412,7 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
 	mutex_unlock(&data->throttle_lock);
 
 	cancel_delayed_work_sync(&data->throttle_work);
+	irq_set_affinity_hint(data->throttle_irq, NULL);
 	free_irq(data->throttle_irq, data);
 }
 
-- 
2.34.1



* [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
  2022-03-07 15:30 [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Dmitry Baryshkov
@ 2022-03-07 15:30 ` Dmitry Baryshkov
  2022-03-07 21:49   ` Bjorn Andersson
  2022-03-07 15:30 ` [PATCH 3/4] cpufreq: qcom-hw: fix the opp entries refcounting Dmitry Baryshkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-07 15:30 UTC (permalink / raw)
  To: Andy Gross, Bjorn Andersson, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-arm-msm, linux-pm

qcom_lmh_dcvs_poll() can be running while the CPU is being put offline,
which results in the following warning. The driver does disable the
worker, but only towards the end of cpufreq_offline(). Change the
locking in qcom_lmh_dcvs_poll() so that the worker cannot run in
parallel with the cpufreq_offline() call.

[   37.122433] ------------[ cut here ]------------
[   37.127225] WARNING: CPU: 0 PID: 187 at drivers/base/arch_topology.c:180 topology_update_thermal_pressure+0xec/0x100
[   37.138098] Modules linked in:
[   37.141279] CPU: 0 PID: 187 Comm: kworker/0:3 Tainted: G S                5.17.0-rc6-00389-g37c83d0b8710-dirty #713
[   37.158306] Workqueue: events qcom_lmh_dcvs_poll
[   37.163095] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   37.170278] pc : topology_update_thermal_pressure+0xec/0x100
[   37.176131] lr : topology_update_thermal_pressure+0x20/0x100
[   37.181977] sp : ffff800009b6bce0
[   37.185402] x29: ffff800009b6bce0 x28: ffffd87abe92b000 x27: ffff04bd7292e205
[   37.192792] x26: ffffd87abe930af8 x25: ffffd87abe94e4c8 x24: 0000000000000000
[   37.200180] x23: ffff04bb01177018 x22: ffff04bb011770c0 x21: ffff04bb01177000
[   37.207567] x20: ffff04bb0a419000 x19: 00000000000c4e00 x18: 0000000000000000
[   37.214954] x17: 000000040044ffff x16: 004000b2b5503510 x15: 0000006aaa1326d2
[   37.222333] x14: 0000000000000232 x13: 0000000000000001 x12: 0000000000000040
[   37.229718] x11: ffff04bb00400000 x10: 968f57bd39f701c8 x9 : ffff04bb0acc8674
[   37.237095] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffffd87abd90092c
[   37.244478] x5 : 0000000000000016 x4 : 0000000000000000 x3 : 0000000000000100
[   37.251852] x2 : ffff04bb0a419020 x1 : 0000000000000100 x0 : 0000000000000100
[   37.259235] Call trace:
[   37.261771]  topology_update_thermal_pressure+0xec/0x100
[   37.267266]  qcom_lmh_dcvs_poll+0xbc/0x154
[   37.271505]  process_one_work+0x288/0x69c
[   37.275654]  worker_thread+0x74/0x470
[   37.279450]  kthread+0xfc/0x100
[   37.282712]  ret_from_fork+0x10/0x20
[   37.286417] irq event stamp: 74
[   37.289664] hardirqs last  enabled at (73): [<ffffd87abdd78af4>] _raw_spin_unlock_irq+0x44/0x80
[   37.298632] hardirqs last disabled at (74): [<ffffd87abdd71fc0>] __schedule+0x710/0xa10
[   37.306885] softirqs last  enabled at (58): [<ffffd87abcc90410>] _stext+0x410/0x588
[   37.314778] softirqs last disabled at (51): [<ffffd87abcd1bf68>] __irq_exit_rcu+0x158/0x174
[   37.323386] ---[ end trace 0000000000000000 ]---

Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 740518d8ae16..920c80d91c21 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -283,6 +283,23 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 	struct dev_pm_opp *opp;
 	unsigned int freq;
 
+	/*
+	 * Synchronize against CPU going offline.
+	 * cpufreq_offline() will get the write lock on policy->rwsem.
+	 */
+retry:
+	if (unlikely(!down_read_trylock(&policy->rwsem))) {
+		mutex_lock(&data->throttle_lock);
+		if (data->cancel_throttle) {
+			mutex_unlock(&data->throttle_lock);
+			return;
+		}
+
+		mutex_unlock(&data->throttle_lock);
+
+		schedule();
+		goto retry;
+	}
 	/*
 	 * Get the h/w throttled frequency, normalize it using the
 	 * registered opp table and use it to calculate thermal pressure.
@@ -301,9 +318,10 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable
-	 * polling or h/w interrupt
+	 * polling or h/w interrupt.
+	 * If we are here, we have the policy->rwsem read lock,
+	 * cancel_throttle can be toggled only with the write lock.
 	 */
-	mutex_lock(&data->throttle_lock);
 	if (data->cancel_throttle)
 		goto out;
 
@@ -318,7 +336,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 				 msecs_to_jiffies(10));
 
 out:
-	mutex_unlock(&data->throttle_lock);
+	up_read(&policy->rwsem);
 }
 
 static void qcom_lmh_dcvs_poll(struct work_struct *work)
-- 
2.34.1



* [PATCH 3/4] cpufreq: qcom-hw: fix the opp entries refcounting
  2022-03-07 15:30 [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Dmitry Baryshkov
  2022-03-07 15:30 ` [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp Dmitry Baryshkov
@ 2022-03-07 15:30 ` Dmitry Baryshkov
  2022-03-07 22:16   ` Bjorn Andersson
  2022-03-07 15:30 ` [PATCH 4/4] cpufreq: qcom-hw: provide online/offline operations Dmitry Baryshkov
  2022-03-07 21:51 ` [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Bjorn Andersson
  3 siblings, 1 reply; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-07 15:30 UTC (permalink / raw)
  To: Andy Gross, Bjorn Andersson, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-arm-msm, linux-pm

qcom_lmh_dcvs_notify() gets the dev_pm_opp instance for the throttled
frequency but does not put it, leaking a reference and producing the
following backtrace when the CPU is put offline.

Correctly put the reference count of the returned opp instance.

[   84.418025] ------------[ cut here ]------------
[   84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
[   84.431966] Modules linked in:
[   84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S                5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
[   84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[   84.458781] pc : _opp_table_kref_release+0x188/0x190
[   84.463878] lr : _opp_table_kref_release+0x78/0x190
[   84.468885] sp : ffff80000841bc70
[   84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
[   84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
[   84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
[   84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
[   84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
[   84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
[   84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
[   84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
[   84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
[   84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
[   84.545579] Call trace:
[   84.548101]  _opp_table_kref_release+0x188/0x190
[   84.552842]  dev_pm_opp_remove_all_dynamic+0x8c/0xc0
[   84.557949]  qcom_cpufreq_hw_cpu_exit+0x30/0xdc
[   84.562608]  cpufreq_offline.isra.0+0x1b4/0x1d8
[   84.567270]  cpuhp_cpufreq_offline+0x10/0x6c
[   84.571663]  cpuhp_invoke_callback+0x16c/0x2b0
[   84.576231]  cpuhp_thread_fun+0x190/0x250
[   84.580353]  smpboot_thread_fn+0x12c/0x230
[   84.584568]  kthread+0xfc/0x100
[   84.587810]  ret_from_fork+0x10/0x20
[   84.591490] irq event stamp: 3482
[   84.594901] hardirqs last  enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
[   84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
[   84.611074] softirqs last  enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
[   84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
[   84.627691] ---[ end trace 0000000000000000 ]---

Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 920c80d91c21..580520215ee7 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 
 	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
 	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
-		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
+		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
 
-	throttled_freq = freq_hz / HZ_PER_KHZ;
+	if (IS_ERR(opp)) {
+		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);
+	} else {
+		/* Update thermal pressure (the boost frequencies are accepted) */
+		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
 
-	/* Update thermal pressure (the boost frequencies are accepted) */
-	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
+		dev_pm_opp_put(opp);
+	}
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable
-- 
2.34.1



* [PATCH 4/4] cpufreq: qcom-hw: provide online/offline operations
  2022-03-07 15:30 [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Dmitry Baryshkov
  2022-03-07 15:30 ` [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp Dmitry Baryshkov
  2022-03-07 15:30 ` [PATCH 3/4] cpufreq: qcom-hw: fix the opp entries refcounting Dmitry Baryshkov
@ 2022-03-07 15:30 ` Dmitry Baryshkov
  2022-03-07 22:40   ` Bjorn Andersson
  2022-03-07 21:51 ` [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Bjorn Andersson
  3 siblings, 1 reply; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-07 15:30 UTC (permalink / raw)
  To: Andy Gross, Bjorn Andersson, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-arm-msm, linux-pm

Provide lightweight online and offline operations. This saves us from
parsing and tearing down the OPP tables each time the CPU is put online
or offline.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 580520215ee7..12b67f16b78f 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -424,10 +424,26 @@ static int qcom_cpufreq_hw_lmh_init(struct cpufreq_policy *policy, int index)
 	return 0;
 }
 
-static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
+static int qcom_cpufreq_hw_cpu_online(struct cpufreq_policy *policy)
 {
+	struct qcom_cpufreq_data *data = policy->driver_data;
+	struct platform_device *pdev = cpufreq_get_driver_data();
+	int ret;
+
+	ret = irq_set_affinity_hint(data->throttle_irq, policy->cpus);
+	if (ret)
+		dev_err(&pdev->dev, "Failed to set CPU affinity of %s[%d]\n",
+			data->irq_name, data->throttle_irq);
+
+	return ret;
+}
+
+static int qcom_cpufreq_hw_cpu_offline(struct cpufreq_policy *policy)
+{
+	struct qcom_cpufreq_data *data = policy->driver_data;
+
 	if (data->throttle_irq <= 0)
-		return;
+		return 0;
 
 	mutex_lock(&data->throttle_lock);
 	data->cancel_throttle = true;
@@ -435,6 +451,12 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
 
 	cancel_delayed_work_sync(&data->throttle_work);
 	irq_set_affinity_hint(data->throttle_irq, NULL);
+
+	return 0;
+}
+
+static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
+{
 	free_irq(data->throttle_irq, data);
 }
 
@@ -588,6 +610,8 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
 	.get		= qcom_cpufreq_hw_get,
 	.init		= qcom_cpufreq_hw_cpu_init,
 	.exit		= qcom_cpufreq_hw_cpu_exit,
+	.online		= qcom_cpufreq_hw_cpu_online,
+	.offline	= qcom_cpufreq_hw_cpu_offline,
 	.register_em	= cpufreq_register_em_with_opp,
 	.fast_switch    = qcom_cpufreq_hw_fast_switch,
 	.name		= "qcom-cpufreq-hw",
-- 
2.34.1



* Re: [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
  2022-03-07 15:30 ` [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp Dmitry Baryshkov
@ 2022-03-07 21:49   ` Bjorn Andersson
  2022-03-09 18:45     ` Dmitry Baryshkov
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Andersson @ 2022-03-07 21:49 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:

> qcom_lmh_dcvs_poll() can be running when the cpu is being put offline.
> This results in the following warning. The driver would disable the
> worker, but it happens closer to the end of cpufreq_offline(). Change
> the locking in the qcom_lmh_dcvs_poll(), so that the worker can not run
> in parallel with cpufreq_offline() call.
> 
> [   37.122433] ------------[ cut here ]------------
> [   37.127225] WARNING: CPU: 0 PID: 187 at drivers/base/arch_topology.c:180 topology_update_thermal_pressure+0xec/0x100

I don't have a warning on line 180 in arch_topology.c

I do however believe that my proposed patches for handling the race during
initialization would end up with a warning there.

As Viresh pointed out when I tried to land those upstream, they would
cause problems as policy->cpus changes during hotplug, i.e. something
very similar to what you're showing here.

Could it be that you're testing this in a tree that has those patches?


PS. The two patches that did land upstream in the end are:
4f774c4a65bf ("cpufreq: Reintroduce ready() callback")
ef8ee1cb8fc8 ("cpufreq: qcom-hw: Delay enabling throttle_irq")

> [   37.138098] Modules linked in:
> [   37.141279] CPU: 0 PID: 187 Comm: kworker/0:3 Tainted: G S                5.17.0-rc6-00389-g37c83d0b8710-dirty #713

389 patches off mainline is quite far from upstream; please try to
validate your changes on something closer to mainline.

Regards,
Bjorn

> [   37.158306] Workqueue: events qcom_lmh_dcvs_poll
> [   37.163095] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   37.170278] pc : topology_update_thermal_pressure+0xec/0x100
> [   37.176131] lr : topology_update_thermal_pressure+0x20/0x100
> [   37.181977] sp : ffff800009b6bce0
> [   37.185402] x29: ffff800009b6bce0 x28: ffffd87abe92b000 x27: ffff04bd7292e205
> [   37.192792] x26: ffffd87abe930af8 x25: ffffd87abe94e4c8 x24: 0000000000000000
> [   37.200180] x23: ffff04bb01177018 x22: ffff04bb011770c0 x21: ffff04bb01177000
> [   37.207567] x20: ffff04bb0a419000 x19: 00000000000c4e00 x18: 0000000000000000
> [   37.214954] x17: 000000040044ffff x16: 004000b2b5503510 x15: 0000006aaa1326d2
> [   37.222333] x14: 0000000000000232 x13: 0000000000000001 x12: 0000000000000040
> [   37.229718] x11: ffff04bb00400000 x10: 968f57bd39f701c8 x9 : ffff04bb0acc8674
> [   37.237095] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffffd87abd90092c
> [   37.244478] x5 : 0000000000000016 x4 : 0000000000000000 x3 : 0000000000000100
> [   37.251852] x2 : ffff04bb0a419020 x1 : 0000000000000100 x0 : 0000000000000100
> [   37.259235] Call trace:
> [   37.261771]  topology_update_thermal_pressure+0xec/0x100
> [   37.267266]  qcom_lmh_dcvs_poll+0xbc/0x154
> [   37.271505]  process_one_work+0x288/0x69c
> [   37.275654]  worker_thread+0x74/0x470
> [   37.279450]  kthread+0xfc/0x100
> [   37.282712]  ret_from_fork+0x10/0x20
> [   37.286417] irq event stamp: 74
> [   37.289664] hardirqs last  enabled at (73): [<ffffd87abdd78af4>] _raw_spin_unlock_irq+0x44/0x80
> [   37.298632] hardirqs last disabled at (74): [<ffffd87abdd71fc0>] __schedule+0x710/0xa10
> [   37.306885] softirqs last  enabled at (58): [<ffffd87abcc90410>] _stext+0x410/0x588
> [   37.314778] softirqs last disabled at (51): [<ffffd87abcd1bf68>] __irq_exit_rcu+0x158/0x174
> [   37.323386] ---[ end trace 0000000000000000 ]---
> 
> Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 740518d8ae16..920c80d91c21 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -283,6 +283,23 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  	struct dev_pm_opp *opp;
>  	unsigned int freq;
>  
> +	/*
> +	 * Synchronize against CPU going offline.
> +	 * cpufreq_offline() will get the write lock on policy->rwsem.
> +	 */
> +retry:
> +	if (unlikely(!down_read_trylock(&policy->rwsem))) {
> +		mutex_lock(&data->throttle_lock);
> +		if (data->cancel_throttle) {
> +			mutex_unlock(&data->throttle_lock);
> +			return;
> +		}
> +
> +		mutex_unlock(&data->throttle_lock);
> +
> +		schedule();
> +		goto retry;
> +	}
>  	/*
>  	 * Get the h/w throttled frequency, normalize it using the
>  	 * registered opp table and use it to calculate thermal pressure.
> @@ -301,9 +318,10 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  
>  	/*
>  	 * In the unlikely case policy is unregistered do not enable
> -	 * polling or h/w interrupt
> +	 * polling or h/w interrupt.
> +	 * If we are here, we have the policy->rwsem read lock,
> +	 * cancel_throttle can be toggled only with the write lock.
>  	 */
> -	mutex_lock(&data->throttle_lock);
>  	if (data->cancel_throttle)
>  		goto out;
>  
> @@ -318,7 +336,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  				 msecs_to_jiffies(10));
>  
>  out:
> -	mutex_unlock(&data->throttle_lock);
> +	up_read(&policy->rwsem);
>  }
>  
>  static void qcom_lmh_dcvs_poll(struct work_struct *work)
> -- 
> 2.34.1
> 


* Re: [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
  2022-03-07 15:30 [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Dmitry Baryshkov
                   ` (2 preceding siblings ...)
  2022-03-07 15:30 ` [PATCH 4/4] cpufreq: qcom-hw: provide online/offline operations Dmitry Baryshkov
@ 2022-03-07 21:51 ` Bjorn Andersson
  2022-03-09 18:44   ` Dmitry Baryshkov
  3 siblings, 1 reply; 12+ messages in thread
From: Bjorn Andersson @ 2022-03-07 21:51 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:

> Drop affinity hint before freeing the throttling IRQ to fix the
> following trace:

Rather than relying on the reader of the git log to dig through the
free_irq() implementation to figure out what the problem is, this could
simply state that one isn't allowed to call free_irq() on an interrupt
that still has its affinity hint set.
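
For reference, the check that fires here lives in __free_irq(); a
paraphrased excerpt (simplified, exact shape and line numbers vary
between kernel versions):

        /* Simplified from __free_irq() in kernel/irq/manage.c: freeing an
         * IRQ that still has an affinity hint set trips this warning,
         * which is why the hint has to be cleared before free_irq().
         */
        #ifdef CONFIG_SMP
                /* make sure affinity_hint is cleaned up */
                if (WARN_ON_ONCE(desc->affinity_hint))
                        desc->affinity_hint = NULL;
        #endif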

Regards,
Bjorn

> 
> [  185.114773] ------------[ cut here ]------------
> [  185.119517] WARNING: CPU: 7 PID: 43 at kernel/irq/manage.c:1887 free_irq+0x3a4/0x3dc
> [  185.127474] Modules linked in:
> [  185.130618] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S      W         5.17.0-rc6-00386-g67382a5b705d-dirty #690
> [  185.147125] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  185.154269] pc : free_irq+0x3a4/0x3dc
> [  185.158031] lr : free_irq+0x33c/0x3dc
> [  185.161792] sp : ffff80000841bc90
> [  185.165195] x29: ffff80000841bc90 x28: ffffa6edc5c3d000 x27: ffff6d93729e5908
> [  185.172515] x26: 0000000000000000 x25: ffff6d910109fc00 x24: ffff6d91011490e0
> [  185.179838] x23: ffff6d9101149218 x22: 0000000000000080 x21: 0000000000000000
> [  185.187163] x20: ffff6d9101149000 x19: ffff6d910ab61500 x18: ffffffffffffffff
> [  185.194487] x17: 2e35202020202020 x16: 2020202020202020 x15: ffff80008841b9a7
> [  185.201805] x14: 00000000000003c9 x13: 0000000000000001 x12: 0000000000000040
> [  185.209135] x11: ffff6d91005aab58 x10: ffff6d91005aab5a x9 : ffffc6a5ad1c5408
> [  185.216455] x8 : ffff6d91005adb88 x7 : 0000000000000000 x6 : ffffc6a5ab5a91f4
> [  185.223776] x5 : 0000000000000000 x4 : ffff6d91011490a8 x3 : ffffc6a5ad266108
> [  185.231098] x2 : 0000000013033204 x1 : ffff6d9101149000 x0 : ffff6d910a9cc000
> [  185.238421] Call trace:
> [  185.240932]  free_irq+0x3a4/0x3dc
> [  185.244334]  qcom_cpufreq_hw_cpu_exit+0x78/0xcc
> [  185.248985]  cpufreq_offline.isra.0+0x228/0x270
> [  185.253639]  cpuhp_cpufreq_offline+0x10/0x20
> [  185.258027]  cpuhp_invoke_callback+0x16c/0x2b0
> [  185.262592]  cpuhp_thread_fun+0x190/0x250
> [  185.266710]  smpboot_thread_fn+0x12c/0x230
> [  185.270914]  kthread+0xfc/0x100
> [  185.274145]  ret_from_fork+0x10/0x20
> [  185.277820] irq event stamp: 212
> [  185.281136] hardirqs last  enabled at (211): [<ffffc6a5ac57973c>] _raw_spin_unlock_irqrestore+0x8c/0xa0
> [  185.290775] hardirqs last disabled at (212): [<ffffc6a5ac572100>] __schedule+0x710/0xa10
> [  185.299081] softirqs last  enabled at (0): [<ffffc6a5ab50f7b0>] copy_process+0x7d0/0x1a14
> [  185.307475] softirqs last disabled at (0): [<0000000000000000>] 0x0
> 
> Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index effbb680b453..740518d8ae16 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -412,6 +412,7 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
>  	mutex_unlock(&data->throttle_lock);
>  
>  	cancel_delayed_work_sync(&data->throttle_work);
> +	irq_set_affinity_hint(data->throttle_irq, NULL);
>  	free_irq(data->throttle_irq, data);
>  }
>  
> -- 
> 2.34.1
> 


* Re: [PATCH 3/4] cpufreq: qcom-hw: fix the opp entries refcounting
  2022-03-07 15:30 ` [PATCH 3/4] cpufreq: qcom-hw: fix the opp entries refcounting Dmitry Baryshkov
@ 2022-03-07 22:16   ` Bjorn Andersson
  2022-03-09 18:49     ` Dmitry Baryshkov
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Andersson @ 2022-03-07 22:16 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:

> The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
> throttling, but will not put it, ending up with leaking a reference
> count and the following backtrace when putting the CPU offline.
> 

Good catch, and nice to see this kind of testing of the driver!

> Correctly put the reference count of the returned opp instance.
> 
> [   84.418025] ------------[ cut here ]------------
> [   84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
> [   84.431966] Modules linked in:
> [   84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S                5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
> [   84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> [   84.458781] pc : _opp_table_kref_release+0x188/0x190
> [   84.463878] lr : _opp_table_kref_release+0x78/0x190
> [   84.468885] sp : ffff80000841bc70
> [   84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
> [   84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
> [   84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
> [   84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
> [   84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
> [   84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
> [   84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
> [   84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
> [   84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
> [   84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
> [   84.545579] Call trace:
> [   84.548101]  _opp_table_kref_release+0x188/0x190
> [   84.552842]  dev_pm_opp_remove_all_dynamic+0x8c/0xc0
> [   84.557949]  qcom_cpufreq_hw_cpu_exit+0x30/0xdc
> [   84.562608]  cpufreq_offline.isra.0+0x1b4/0x1d8
> [   84.567270]  cpuhp_cpufreq_offline+0x10/0x6c
> [   84.571663]  cpuhp_invoke_callback+0x16c/0x2b0
> [   84.576231]  cpuhp_thread_fun+0x190/0x250
> [   84.580353]  smpboot_thread_fn+0x12c/0x230
> [   84.584568]  kthread+0xfc/0x100
> [   84.587810]  ret_from_fork+0x10/0x20
> [   84.591490] irq event stamp: 3482
> [   84.594901] hardirqs last  enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
> [   84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
> [   84.611074] softirqs last  enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
> [   84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
> [   84.627691] ---[ end trace 0000000000000000 ]---
> 
> Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 920c80d91c21..580520215ee7 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  
>  	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
>  	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
> -		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> +		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
>  
> -	throttled_freq = freq_hz / HZ_PER_KHZ;

Maybe I'm missing something, but where did this division go after your
change?

> +	if (IS_ERR(opp)) {
> +		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);

qcom_lmh_dcvs_notify() will be invoked repeatedly to poll the hardware
for changing conditions while under thermal pressure. If for some reason
dev_pm_opp_find_freq_ceil() is unable to find an OPP, it will probably
continue to fail every 10 ms.

As such I think you should either omit the warning print, or possibly
use dev_warn_once().

Regards,
Bjorn

> +	} else {
> +		/* Update thermal pressure (the boost frequencies are accepted) */
> +		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
>  
> -	/* Update thermal pressure (the boost frequencies are accepted) */
> -	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> +		dev_pm_opp_put(opp);
> +	}
>  
>  	/*
>  	 * In the unlikely case policy is unregistered do not enable
> -- 
> 2.34.1
> 


* Re: [PATCH 4/4] cpufreq: qcom-hw: provide online/offline operations
  2022-03-07 15:30 ` [PATCH 4/4] cpufreq: qcom-hw: provide online/offline operations Dmitry Baryshkov
@ 2022-03-07 22:40   ` Bjorn Andersson
  2022-03-09 19:28     ` Dmitry Baryshkov
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Andersson @ 2022-03-07 22:40 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:

> Provide lightweight online and offline operations. This saves us from
> parsing and tearing down the OPP tables each time the CPU is put online
> or offline.

Isn't that a slight understatement? Doesn't it also save us from e.g.
ioremapping the memory, traversing DT to discover the policy's
related_cpus and requesting the dcvs interrupt?

I like the idea of getting these things out of the init/exit path. I do
however think that we could move most of this to probe time and thereby
rely on devm operations for many of them.

That said, I still like your idea of having a fast path for this...
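
As an aside, a minimal sketch of the probe-time/devm direction mentioned
above (hypothetical, not what this series does; 'pdev', 'index' and
'base' stand in for whatever the probe path would actually use):

        /* Hypothetical probe-time mapping: the devm_* helper releases the
         * region automatically on driver removal, so the per-policy
         * init/exit callbacks no longer have to manage it.
         */
        base = devm_platform_ioremap_resource(pdev, index);
        if (IS_ERR(base))
                return PTR_ERR(base);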

> 
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 580520215ee7..12b67f16b78f 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -424,10 +424,26 @@ static int qcom_cpufreq_hw_lmh_init(struct cpufreq_policy *policy, int index)
>  	return 0;
>  }
>  
> -static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
> +static int qcom_cpufreq_hw_cpu_online(struct cpufreq_policy *policy)
>  {
> +	struct qcom_cpufreq_data *data = policy->driver_data;
> +	struct platform_device *pdev = cpufreq_get_driver_data();
> +	int ret;
> +

For backwards compatibility reasons it's valid to not have
data->throttle_irq. This will however cause irq_set_affinity_hint() to
return -EINVAL and we'll get a print.

So you should handle that gracefully.

> +	ret = irq_set_affinity_hint(data->throttle_irq, policy->cpus);
> +	if (ret)
> +		dev_err(&pdev->dev, "Failed to set CPU affinity of %s[%d]\n",
> +			data->irq_name, data->throttle_irq);
> +
> +	return ret;
> +}
> +
> +static int qcom_cpufreq_hw_cpu_offline(struct cpufreq_policy *policy)
> +{
> +	struct qcom_cpufreq_data *data = policy->driver_data;
> +
>  	if (data->throttle_irq <= 0)
> -		return;
> +		return 0;
>  
>  	mutex_lock(&data->throttle_lock);
>  	data->cancel_throttle = true;

This will mark the throttle as cancelled; you need to clear it when
bringing the policy online again.

> @@ -435,6 +451,12 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
>  
>  	cancel_delayed_work_sync(&data->throttle_work);
>  	irq_set_affinity_hint(data->throttle_irq, NULL);

You don't disable_irq(data->throttle_irq) here. I think
qcom_lmh_dcvs_notify() will be unhappy if we get thermal pressure from a
policy with no cpus?

Note though that you can't enable it in online(), as it will be enabled
in ready()...

> +
> +	return 0;
> +}
> +
> +static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
> +{
>  	free_irq(data->throttle_irq, data);

As above, you should treat throttle_irq <= 0 gracefully.

Regards,
Bjorn

>  }
>  
> @@ -588,6 +610,8 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
>  	.get		= qcom_cpufreq_hw_get,
>  	.init		= qcom_cpufreq_hw_cpu_init,
>  	.exit		= qcom_cpufreq_hw_cpu_exit,
> +	.online		= qcom_cpufreq_hw_cpu_online,
> +	.offline	= qcom_cpufreq_hw_cpu_offline,
>  	.register_em	= cpufreq_register_em_with_opp,
>  	.fast_switch    = qcom_cpufreq_hw_fast_switch,
>  	.name		= "qcom-cpufreq-hw",
> -- 
> 2.34.1
> 


* Re: [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
  2022-03-07 21:51 ` [PATCH 1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ Bjorn Andersson
@ 2022-03-09 18:44   ` Dmitry Baryshkov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-09 18:44 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Tue, 8 Mar 2022 at 00:49, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>
> On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:
>
> > Drop affinity hint before freeing the throttling IRQ to fix the
> > following trace:
>
> Rather than relying on the reader of the git log having to read the
> free_irq() implementation to figure out what the problem is, this could
> simply state that one isn't allowed to free_irq() something with
> affinity set.

Ack, will update the commit message in v2.


> Regards,
> Bjorn
>
> >
> > [  185.114773] ------------[ cut here ]------------
> > [  185.119517] WARNING: CPU: 7 PID: 43 at kernel/irq/manage.c:1887 free_irq+0x3a4/0x3dc
> > [  185.127474] Modules linked in:
> > [  185.130618] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S      W         5.17.0-rc6-00386-g67382a5b705d-dirty #690
> > [  185.147125] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [  185.154269] pc : free_irq+0x3a4/0x3dc
> > [  185.158031] lr : free_irq+0x33c/0x3dc
> > [  185.161792] sp : ffff80000841bc90
> > [  185.165195] x29: ffff80000841bc90 x28: ffffa6edc5c3d000 x27: ffff6d93729e5908
> > [  185.172515] x26: 0000000000000000 x25: ffff6d910109fc00 x24: ffff6d91011490e0
> > [  185.179838] x23: ffff6d9101149218 x22: 0000000000000080 x21: 0000000000000000
> > [  185.187163] x20: ffff6d9101149000 x19: ffff6d910ab61500 x18: ffffffffffffffff
> > [  185.194487] x17: 2e35202020202020 x16: 2020202020202020 x15: ffff80008841b9a7
> > [  185.201805] x14: 00000000000003c9 x13: 0000000000000001 x12: 0000000000000040
> > [  185.209135] x11: ffff6d91005aab58 x10: ffff6d91005aab5a x9 : ffffc6a5ad1c5408
> > [  185.216455] x8 : ffff6d91005adb88 x7 : 0000000000000000 x6 : ffffc6a5ab5a91f4
> > [  185.223776] x5 : 0000000000000000 x4 : ffff6d91011490a8 x3 : ffffc6a5ad266108
> > [  185.231098] x2 : 0000000013033204 x1 : ffff6d9101149000 x0 : ffff6d910a9cc000
> > [  185.238421] Call trace:
> > [  185.240932]  free_irq+0x3a4/0x3dc
> > [  185.244334]  qcom_cpufreq_hw_cpu_exit+0x78/0xcc
> > [  185.248985]  cpufreq_offline.isra.0+0x228/0x270
> > [  185.253639]  cpuhp_cpufreq_offline+0x10/0x20
> > [  185.258027]  cpuhp_invoke_callback+0x16c/0x2b0
> > [  185.262592]  cpuhp_thread_fun+0x190/0x250
> > [  185.266710]  smpboot_thread_fn+0x12c/0x230
> > [  185.270914]  kthread+0xfc/0x100
> > [  185.274145]  ret_from_fork+0x10/0x20
> > [  185.277820] irq event stamp: 212
> > [  185.281136] hardirqs last  enabled at (211): [<ffffc6a5ac57973c>] _raw_spin_unlock_irqrestore+0x8c/0xa0
> > [  185.290775] hardirqs last disabled at (212): [<ffffc6a5ac572100>] __schedule+0x710/0xa10
> > [  185.299081] softirqs last  enabled at (0): [<ffffc6a5ab50f7b0>] copy_process+0x7d0/0x1a14
> > [  185.307475] softirqs last disabled at (0): [<0000000000000000>] 0x0
> >
> > Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  drivers/cpufreq/qcom-cpufreq-hw.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index effbb680b453..740518d8ae16 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -412,6 +412,7 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
> >       mutex_unlock(&data->throttle_lock);
> >
> >       cancel_delayed_work_sync(&data->throttle_work);
> > +     irq_set_affinity_hint(data->throttle_irq, NULL);
> >       free_irq(data->throttle_irq, data);
> >  }
> >
> > --
> > 2.34.1
> >



-- 
With best wishes
Dmitry


* Re: [PATCH 2/4] cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
  2022-03-07 21:49   ` Bjorn Andersson
@ 2022-03-09 18:45     ` Dmitry Baryshkov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-09 18:45 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Tue, 8 Mar 2022 at 00:47, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>
> On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:
>
> > qcom_lmh_dcvs_poll() can be running when the cpu is being put offline.
> > This results in the following warning. The driver would disable the
> > worker, but it happens closer to the end of cpufreq_offline(). Change
> > the locking in the qcom_lmh_dcvs_poll(), so that the worker can not run
> > in parallel with cpufreq_offline() call.
> >
> > [   37.122433] ------------[ cut here ]------------
> > [   37.127225] WARNING: CPU: 0 PID: 187 at drivers/base/arch_topology.c:180 topology_update_thermal_pressure+0xec/0x100
>
> I don't have a warning on line 180 in arch_topology.c
>
> I do however believe that my proposed patch for handling the race during
> initialization would end up with a warning there.
>
> As Viresh pointed out as I tried to land those upstream, they would
> cause problems as policy->cpus changes during hotplug, i.e. something
> very similar to what you're showing here.
>
> Could it be that you're testing this in a tree that has those patches?

Yes, I have been testing my patches on top of an integration tree that
had those patches. However, the issue is still present even without the
patch in question, so I'll update the backtrace in v2.

>
>
> PS. The two patches that did land upstream in the end are:
> 4f774c4a65bf ("cpufreq: Reintroduce ready() callback")
> ef8ee1cb8fc8 ("cpufreq: qcom-hw: Delay enabling throttle_irq")
>
> > [   37.138098] Modules linked in:
> > [   37.141279] CPU: 0 PID: 187 Comm: kworker/0:3 Tainted: G S                5.17.0-rc6-00389-g37c83d0b8710-dirty #713
>
> 389 patches off mainline is quite far off from the upstream, please try
> to validate your changes on something closer to mainline.
>
> Regards,
> Bjorn
>
> > [   37.158306] Workqueue: events qcom_lmh_dcvs_poll
> > [   37.163095] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [   37.170278] pc : topology_update_thermal_pressure+0xec/0x100
> > [   37.176131] lr : topology_update_thermal_pressure+0x20/0x100
> > [   37.181977] sp : ffff800009b6bce0
> > [   37.185402] x29: ffff800009b6bce0 x28: ffffd87abe92b000 x27: ffff04bd7292e205
> > [   37.192792] x26: ffffd87abe930af8 x25: ffffd87abe94e4c8 x24: 0000000000000000
> > [   37.200180] x23: ffff04bb01177018 x22: ffff04bb011770c0 x21: ffff04bb01177000
> > [   37.207567] x20: ffff04bb0a419000 x19: 00000000000c4e00 x18: 0000000000000000
> > [   37.214954] x17: 000000040044ffff x16: 004000b2b5503510 x15: 0000006aaa1326d2
> > [   37.222333] x14: 0000000000000232 x13: 0000000000000001 x12: 0000000000000040
> > [   37.229718] x11: ffff04bb00400000 x10: 968f57bd39f701c8 x9 : ffff04bb0acc8674
> > [   37.237095] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffffd87abd90092c
> > [   37.244478] x5 : 0000000000000016 x4 : 0000000000000000 x3 : 0000000000000100
> > [   37.251852] x2 : ffff04bb0a419020 x1 : 0000000000000100 x0 : 0000000000000100
> > [   37.259235] Call trace:
> > [   37.261771]  topology_update_thermal_pressure+0xec/0x100
> > [   37.267266]  qcom_lmh_dcvs_poll+0xbc/0x154
> > [   37.271505]  process_one_work+0x288/0x69c
> > [   37.275654]  worker_thread+0x74/0x470
> > [   37.279450]  kthread+0xfc/0x100
> > [   37.282712]  ret_from_fork+0x10/0x20
> > [   37.286417] irq event stamp: 74
> > [   37.289664] hardirqs last  enabled at (73): [<ffffd87abdd78af4>] _raw_spin_unlock_irq+0x44/0x80
> > [   37.298632] hardirqs last disabled at (74): [<ffffd87abdd71fc0>] __schedule+0x710/0xa10
> > [   37.306885] softirqs last  enabled at (58): [<ffffd87abcc90410>] _stext+0x410/0x588
> > [   37.314778] softirqs last disabled at (51): [<ffffd87abcd1bf68>] __irq_exit_rcu+0x158/0x174
> > [   37.323386] ---[ end trace 0000000000000000 ]---
> >
> > Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  drivers/cpufreq/qcom-cpufreq-hw.c | 24 +++++++++++++++++++++---
> >  1 file changed, 21 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 740518d8ae16..920c80d91c21 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -283,6 +283,23 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> >       struct dev_pm_opp *opp;
> >       unsigned int freq;
> >
> > +     /*
> > +      * Synchronize against CPU going offline.
> > +      * cpufreq_offline() will get the write lock on policy->rwsem.
> > +      */
> > +retry:
> > +     if (unlikely(!down_read_trylock(&policy->rwsem))) {
> > +             mutex_lock(&data->throttle_lock);
> > +             if (data->cancel_throttle) {
> > +                     mutex_unlock(&data->throttle_lock);
> > +                     return;
> > +             }
> > +
> > +             mutex_unlock(&data->throttle_lock);
> > +
> > +             schedule();
> > +             goto retry;
> > +     }
> >       /*
> >        * Get the h/w throttled frequency, normalize it using the
> >        * registered opp table and use it to calculate thermal pressure.
> > @@ -301,9 +318,10 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> >
> >       /*
> >        * In the unlikely case policy is unregistered do not enable
> > -      * polling or h/w interrupt
> > +      * polling or h/w interrupt.
> > +      * If we are here, we have the policy->rwsem read lock,
> > +      * cancel_throttle can be toggled only with the write lock.
> >        */
> > -     mutex_lock(&data->throttle_lock);
> >       if (data->cancel_throttle)
> >               goto out;
> >
> > @@ -318,7 +336,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> >                                msecs_to_jiffies(10));
> >
> >  out:
> > -     mutex_unlock(&data->throttle_lock);
> > +     up_read(&policy->rwsem);
> >  }
> >
> >  static void qcom_lmh_dcvs_poll(struct work_struct *work)
> > --
> > 2.34.1
> >



-- 
With best wishes
Dmitry


* Re: [PATCH 3/4] cpufreq: qcom-hw: fix the opp entries refcounting
  2022-03-07 22:16   ` Bjorn Andersson
@ 2022-03-09 18:49     ` Dmitry Baryshkov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-09 18:49 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Tue, 8 Mar 2022 at 01:14, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>
> On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:
>
> > The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
> > throttling, but will not put it, ending up with leaking a reference
> > count and the following backtrace when putting the CPU offline.
> >
>
> Good catch, and nice to see this kind of testing of the driver!
>
> > Correctly put the reference count of the returned opp instance.
> >
> > [   84.418025] ------------[ cut here ]------------
> > [   84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
> > [   84.431966] Modules linked in:
> > [   84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S                5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
> > [   84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> > [   84.458781] pc : _opp_table_kref_release+0x188/0x190
> > [   84.463878] lr : _opp_table_kref_release+0x78/0x190
> > [   84.468885] sp : ffff80000841bc70
> > [   84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
> > [   84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
> > [   84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
> > [   84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
> > [   84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
> > [   84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
> > [   84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
> > [   84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
> > [   84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
> > [   84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
> > [   84.545579] Call trace:
> > [   84.548101]  _opp_table_kref_release+0x188/0x190
> > [   84.552842]  dev_pm_opp_remove_all_dynamic+0x8c/0xc0
> > [   84.557949]  qcom_cpufreq_hw_cpu_exit+0x30/0xdc
> > [   84.562608]  cpufreq_offline.isra.0+0x1b4/0x1d8
> > [   84.567270]  cpuhp_cpufreq_offline+0x10/0x6c
> > [   84.571663]  cpuhp_invoke_callback+0x16c/0x2b0
> > [   84.576231]  cpuhp_thread_fun+0x190/0x250
> > [   84.580353]  smpboot_thread_fn+0x12c/0x230
> > [   84.584568]  kthread+0xfc/0x100
> > [   84.587810]  ret_from_fork+0x10/0x20
> > [   84.591490] irq event stamp: 3482
> > [   84.594901] hardirqs last  enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
> > [   84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
> > [   84.611074] softirqs last  enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
> > [   84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
> > [   84.627691] ---[ end trace 0000000000000000 ]---
> >
> > Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 920c80d91c21..580520215ee7 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> >
> >       opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
> >       if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
> > -             dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> > +             opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> >
> > -     throttled_freq = freq_hz / HZ_PER_KHZ;
>
> Maybe I'm missing something, but where did this division go after your
> change?

Oops. It got dropped when fixing the conflict. Let's get it back.

>
> > +     if (IS_ERR(opp)) {
> > +             dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);
>
> qcom_lmh_dcvs_notify() will be invoked repeatedly to poll the hardware
> for changing circumstances during thermal pressure. If for some reason
> dev_pm_opp_find_freq_ceil() is unable to find an opp it will probably
> continue to fail every 10ms.

I'll change it to dev_warn_ratelimited() instead. dev_warn_once() can
easily get lost.
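
Putting the two points together (restore the HZ_PER_KHZ division and
rate-limit the warning), the reworked hunk could look roughly like the
sketch below; this is only an illustration of the direction discussed
here, the actual v2 is not part of this thread:

        opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
        if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
                opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);

        if (IS_ERR(opp)) {
                dev_warn_ratelimited(dev, "Can't find the OPP for throttling: %pe!\n", opp);
        } else {
                throttled_freq = freq_hz / HZ_PER_KHZ;

                /* Update thermal pressure (the boost frequencies are accepted) */
                arch_update_thermal_pressure(policy->related_cpus, throttled_freq);

                dev_pm_opp_put(opp);
        }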

>
> As such I think you should either omit the warning print, or possibly
> use dev_warn_once().
>
> Regards,
> Bjorn
>
> > +     } else {
> > +             /* Update thermal pressure (the boost frequencies are accepted) */
> > +             arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> >
> > -     /* Update thermal pressure (the boost frequencies are accepted) */
> > -     arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> > +             dev_pm_opp_put(opp);
> > +     }
> >
> >       /*
> >        * In the unlikely case policy is unregistered do not enable
> > --
> > 2.34.1
> >



-- 
With best wishes
Dmitry


* Re: [PATCH 4/4] cpufreq: qcom-hw: provide online/offline operations
  2022-03-07 22:40   ` Bjorn Andersson
@ 2022-03-09 19:28     ` Dmitry Baryshkov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Baryshkov @ 2022-03-09 19:28 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Andy Gross, Rafael J. Wysocki, Viresh Kumar, linux-arm-msm, linux-pm

On Tue, 8 Mar 2022 at 01:38, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>
> On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:
>
> > Provide lightweight online and offline operations. This saves us from
> > parsing and tearing down the OPP tables each time the CPU is put online
> > or offline.
>
> Isn't that a slight understatement? Doesn't it also save us from e.g.
> ioremapping the memory, traversing DT to discover the policy's
> related_cpus and requesting the dcvs interrupt?
>
> I like the idea of getting these things out of the init/exit path. I do
> however think that we could move most of this to probe time, and thereby
> be able to rely on devm operations for many of these things.
>
> That said, I still like your idea of having a fast path for this...
>
> >
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  drivers/cpufreq/qcom-cpufreq-hw.c | 28 ++++++++++++++++++++++++++--
> >  1 file changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 580520215ee7..12b67f16b78f 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -424,10 +424,26 @@ static int qcom_cpufreq_hw_lmh_init(struct cpufreq_policy *policy, int index)
> >       return 0;
> >  }
> >
> > -static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
> > +static int qcom_cpufreq_hw_cpu_online(struct cpufreq_policy *policy)
> >  {
> > +     struct qcom_cpufreq_data *data = policy->driver_data;
> > +     struct platform_device *pdev = cpufreq_get_driver_data();
> > +     int ret;
> > +
>
> For backwards compatibility reasons it's valid to not have
> data->throttle_irq. This will however cause irq_set_affinity_hint() to
> return -EINVAL and we'll get a print.
>
> So you should handle that gracefully.

Ack.

>
> > +     ret = irq_set_affinity_hint(data->throttle_irq, policy->cpus);
> > +     if (ret)
> > +             dev_err(&pdev->dev, "Failed to set CPU affinity of %s[%d]\n",
> > +                     data->irq_name, data->throttle_irq);
> > +
> > +     return ret;
> > +}
> > +
> > +static int qcom_cpufreq_hw_cpu_offline(struct cpufreq_policy *policy)
> > +{
> > +     struct qcom_cpufreq_data *data = policy->driver_data;
> > +
> >       if (data->throttle_irq <= 0)
> > -             return;
> > +             return 0;
> >
> >       mutex_lock(&data->throttle_lock);
> >       data->cancel_throttle = true;
>
> This will mark the throttle as cancelled, you need to clear this as
> you're bringing the policy online again.

ack.

>
> > @@ -435,6 +451,12 @@ static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
> >
> >       cancel_delayed_work_sync(&data->throttle_work);
> >       irq_set_affinity_hint(data->throttle_irq, NULL);
>
> You don't disable_irq(data->throttle_irq) here. I think
> qcom_lmh_dcvs_notify() will be unhappy if we get thermal pressure from a
> policy with no cpus?
>
> Note though that you can't enable it in online(), as it will be enabled
> in ready()...

And we can't just disable it here, as it might be either enabled or
disabled at this point. I think the simplest solution would be to call
free_irq() here and request_irq() in online().
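
For what it's worth, the two concrete points above (bail out when
throttle_irq <= 0 and re-arm cancel_throttle when the policy comes back
online) could end up looking roughly like the sketch below. This is only
an illustration of the feedback, not the posted code, and it leaves the
free_irq()/request_irq() (or enable/disable) question open:

        static int qcom_cpufreq_hw_cpu_online(struct cpufreq_policy *policy)
        {
                struct qcom_cpufreq_data *data = policy->driver_data;
                struct platform_device *pdev = cpufreq_get_driver_data();
                int ret;

                if (data->throttle_irq <= 0)
                        return 0;

                mutex_lock(&data->throttle_lock);
                data->cancel_throttle = false;  /* re-arm the LMH poller */
                mutex_unlock(&data->throttle_lock);

                ret = irq_set_affinity_hint(data->throttle_irq, policy->cpus);
                if (ret)
                        dev_err(&pdev->dev, "Failed to set CPU affinity of %s[%d]\n",
                                data->irq_name, data->throttle_irq);

                return ret;
        }

        static int qcom_cpufreq_hw_cpu_offline(struct cpufreq_policy *policy)
        {
                struct qcom_cpufreq_data *data = policy->driver_data;

                if (data->throttle_irq <= 0)
                        return 0;

                mutex_lock(&data->throttle_lock);
                data->cancel_throttle = true;
                mutex_unlock(&data->throttle_lock);

                cancel_delayed_work_sync(&data->throttle_work);
                irq_set_affinity_hint(data->throttle_irq, NULL);

                return 0;
        }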

>
> > +
> > +     return 0;
> > +}
> > +
> > +static void qcom_cpufreq_hw_lmh_exit(struct qcom_cpufreq_data *data)
> > +{
> >       free_irq(data->throttle_irq, data);
>
> As above, you should treat throttle_irq <= 0 gracefully.
>
> Regards,
> Bjorn
>
> >  }
> >
> > @@ -588,6 +610,8 @@ static struct cpufreq_driver cpufreq_qcom_hw_driver = {
> >       .get            = qcom_cpufreq_hw_get,
> >       .init           = qcom_cpufreq_hw_cpu_init,
> >       .exit           = qcom_cpufreq_hw_cpu_exit,
> > +     .online         = qcom_cpufreq_hw_cpu_online,
> > +     .offline        = qcom_cpufreq_hw_cpu_offline,
> >       .register_em    = cpufreq_register_em_with_opp,
> >       .fast_switch    = qcom_cpufreq_hw_fast_switch,
> >       .name           = "qcom-cpufreq-hw",
> > --
> > 2.34.1
> >



-- 
With best wishes
Dmitry

