* [PATCH V3] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
@ 2018-04-25 10:59 ` Shilpasri G Bhat
  0 siblings, 0 replies; 8+ messages in thread
From: Shilpasri G Bhat @ 2018-04-25 10:59 UTC (permalink / raw)
  To: rjw, viresh.kumar
  Cc: npiggin, benh, mpe, linux-pm, linuxppc-dev, linux-kernel,
	ppaidipe, svaidy, Shilpasri G Bhat, stable

gpstate_timer_handler() uses synchronous smp_call to set the pstate
on the requested core. This causes the below hard lockup:

[c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
[c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
[c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
[c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
[c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
[c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
[c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
-- interrupt: 901 at doorbell_global_ipi+0x34/0x50
LR = arch_send_call_function_ipi_mask+0x120/0x130
[c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130
[c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
[c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
[c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
[c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
[c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c

One way to avoid this is removing the smp-call. We can ensure that the timer
always runs on one of the policy-cpus. If the timer gets migrated to a
cpu outside the policy then re-queue it back on the policy->cpus. This way
we can get rid of the smp-call which was being used to set the pstate
on the policy->cpus.

Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned)
Cc: <stable@vger.kernel.org>        [4.8+]
Reported-by: Nicholas Piggin <npiggin@gmail.com>
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
Changes from V2:
- Remove the check for active policy while requeuing the migrated timer
Changes from V1:
- Remove smp_call in the pstate handler.

 drivers/cpufreq/powernv-cpufreq.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 71f8682..e368e1f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -679,6 +679,16 @@ void gpstate_timer_handler(struct timer_list *t)
 
 	if (!spin_trylock(&gpstates->gpstate_lock))
 		return;
+	/*
+	 * If the timer has migrated to the different cpu then bring
+	 * it back to one of the policy->cpus
+	 */
+	if (!cpumask_test_cpu(raw_smp_processor_id(), policy->cpus)) {
+		gpstates->timer.expires = jiffies + msecs_to_jiffies(1);
+		add_timer_on(&gpstates->timer, cpumask_first(policy->cpus));
+		spin_unlock(&gpstates->gpstate_lock);
+		return;
+	}
 
 	/*
 	 * If PMCR was last updated was using fast_swtich then
@@ -718,10 +728,8 @@ void gpstate_timer_handler(struct timer_list *t)
 	if (gpstate_idx != gpstates->last_lpstate_idx)
 		queue_gpstate_timer(gpstates);
 
+	set_pstate(&freq_data);
 	spin_unlock(&gpstates->gpstate_lock);
-
-	/* Timer may get migrated to a different cpu on cpu hot unplug */
-	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
 }
 
 /*
-- 
1.8.3.1
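
As an illustration only, the migration check added by the patch above can be modelled in userspace. This is a minimal sketch, not driver code: the `model_*` names, the 64-bit bitmask standing in for a cpumask, and the returned action enum are all assumptions made for the example. Only the decision itself (re-queue on the first policy CPU when running outside policy->cpus, otherwise set the pstate locally) follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t model_cpumask;

enum model_action { MODEL_REQUEUED, MODEL_PSTATE_SET };

struct model_timer {
	int queued_on;		/* CPU the timer is queued on, -1 if not queued */
	int pstate_set_on;	/* CPU that last wrote the pstate, -1 if none */
};

/* Models cpumask_test_cpu(): is 'cpu' a member of 'mask'? */
static bool model_test_cpu(int cpu, model_cpumask mask)
{
	return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1u);
}

/* Models cpumask_first(): lowest-numbered CPU in a non-empty mask. */
static int model_first_cpu(model_cpumask mask)
{
	int cpu = 0;

	assert(mask != 0);
	while (!((mask >> cpu) & 1u))
		cpu++;
	return cpu;
}

/*
 * Models the handler's decision. If the handler runs on a CPU outside
 * the policy, the timer is queued again on a policy CPU and nothing
 * else happens. Otherwise the pstate is set on the current CPU, so no
 * cross-CPU call is needed.
 */
static enum model_action model_timer_handler(struct model_timer *t,
					     int this_cpu,
					     model_cpumask policy_cpus)
{
	if (!model_test_cpu(this_cpu, policy_cpus)) {
		t->queued_on = model_first_cpu(policy_cpus);
		return MODEL_REQUEUED;
	}

	t->queued_on = -1;
	t->pstate_set_on = this_cpu;
	return MODEL_PSTATE_SET;
}
```

With a policy covering CPUs 8 and 9, a handler run on CPU 5 only re-queues the timer on CPU 8; the following run on CPU 8 sets the pstate there directly.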

* Re: [PATCH V3] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
  2018-04-25 10:59 ` Shilpasri G Bhat
@ 2018-04-25 11:26   ` Nicholas Piggin
  -1 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2018-04-25 11:26 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: rjw, viresh.kumar, benh, mpe, linux-pm, linuxppc-dev,
	linux-kernel, ppaidipe, svaidy, stable

On Wed, 25 Apr 2018 16:29:31 +0530
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:

> gpstate_timer_handler() uses synchronous smp_call to set the pstate
> on the requested core. This causes the below hard lockup:
> 
> [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
> [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
> [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
> [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
> [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
> [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
> [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
> [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
> [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
> [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
> -- interrupt: 901 at doorbell_global_ipi+0x34/0x50
> LR = arch_send_call_function_ipi_mask+0x120/0x130
> [c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130
> [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
> [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
> [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
> [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
> [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
> [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
> [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c
> 
> One way to avoid this is removing the smp-call. We can ensure that the timer
> always runs on one of the policy-cpus. If the timer gets migrated to a
> cpu outside the policy then re-queue it back on the policy->cpus. This way
> we can get rid of the smp-call which was being used to set the pstate
> on the policy->cpus.
> 
> Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned)
> Cc: <stable@vger.kernel.org>        [4.8+]
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>

Thanks, this looks good to me. I don't know the code though, so

Acked-by: Nicholas Piggin <npiggin@gmail.com>

> ---
> Changes from V2:
> - Remove the check for active policy while requeuing the migrated timer
> Changes from V1:
> - Remove smp_call in the pstate handler.
> 
>  drivers/cpufreq/powernv-cpufreq.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 71f8682..e368e1f 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -679,6 +679,16 @@ void gpstate_timer_handler(struct timer_list *t)
>  
>  	if (!spin_trylock(&gpstates->gpstate_lock))
>  		return;

I still think it would be good to do something about the trylock failure.
It may be rare, but if it happens it could stop the timer and lead to
some rare unpredictable behaviour? Not for this patch, but while you're
looking at the code it would be good to consider it. Just queueing up
another timer seems like it should be enough.
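
As an illustration only, the "queue another timer" idea can be modelled in userspace. This is a minimal sketch under stated assumptions: every `model_*` name is hypothetical, the lock is a plain flag rather than a spinlock, and the one-tick retry delay is an arbitrary choice. It is not part of the patch and does not show how the driver would re-arm the real timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_gpstates {
	bool lock_held;			/* stands in for gpstate_lock */
	bool timer_pending;		/* true while the timer is queued */
	unsigned long timer_expires;	/* tick at which the timer fires */
	unsigned int updates;		/* number of completed handler runs */
};

static bool model_trylock(struct model_gpstates *g)
{
	if (g->lock_held)
		return false;
	g->lock_held = true;
	return true;
}

static void model_unlock(struct model_gpstates *g)
{
	assert(g->lock_held);
	g->lock_held = false;
}

/*
 * Runs when the timer fires at tick 'now'. Returns true if the handler
 * did its work. On trylock failure the timer is re-armed for the next
 * tick instead of being silently dropped.
 */
static bool model_timer_fired(struct model_gpstates *g, unsigned long now)
{
	g->timer_pending = false;	/* a fired timer is no longer queued */

	if (!model_trylock(g)) {
		g->timer_expires = now + 1;
		g->timer_pending = true;
		return false;
	}

	g->updates++;
	model_unlock(g);
	return true;
}
```

In this model a contended run leaves the timer pending for the next tick, so the update is retried rather than lost.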

> +	/*
> +	 * If the timer has migrated to the different cpu then bring
> +	 * it back to one of the policy->cpus
> +	 */
> +	if (!cpumask_test_cpu(raw_smp_processor_id(), policy->cpus)) {
> +		gpstates->timer.expires = jiffies + msecs_to_jiffies(1);
> +		add_timer_on(&gpstates->timer, cpumask_first(policy->cpus));
> +		spin_unlock(&gpstates->gpstate_lock);
> +		return;
> +	}

Really small nitpick, but you could use cpumask_any there.

Thanks,
Nick


>  
>  	/*
>  	 * If PMCR was last updated was using fast_swtich then
> @@ -718,10 +728,8 @@ void gpstate_timer_handler(struct timer_list *t)
>  	if (gpstate_idx != gpstates->last_lpstate_idx)
>  		queue_gpstate_timer(gpstates);
>  
> +	set_pstate(&freq_data);
>  	spin_unlock(&gpstates->gpstate_lock);
> -
> -	/* Timer may get migrated to a different cpu on cpu hot unplug */
> -	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
>  }

* Re: [PATCH V3] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
  2018-04-25 10:59 ` Shilpasri G Bhat
  (?)
  (?)
@ 2018-04-26  5:14 ` Viresh Kumar
  -1 siblings, 0 replies; 8+ messages in thread
From: Viresh Kumar @ 2018-04-26  5:14 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: rjw, npiggin, benh, mpe, linux-pm, linuxppc-dev, linux-kernel,
	ppaidipe, svaidy, stable

On 25-04-18, 16:29, Shilpasri G Bhat wrote:
> gpstate_timer_handler() uses synchronous smp_call to set the pstate
> on the requested core. This causes the below hard lockup:
> 
> [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
> [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
> [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
> [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
> [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
> [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
> [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
> [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
> [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
> [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
> -- interrupt: 901 at doorbell_global_ipi+0x34/0x50
> LR = arch_send_call_function_ipi_mask+0x120/0x130
> [c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130
> [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
> [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
> [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
> [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
> [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
> [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
> [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c
> 
> One way to avoid this is removing the smp-call. We can ensure that the timer
> always runs on one of the policy-cpus. If the timer gets migrated to a
> cpu outside the policy then re-queue it back on the policy->cpus. This way
> we can get rid of the smp-call which was being used to set the pstate
> on the policy->cpus.
> 
> Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned)
> Cc: <stable@vger.kernel.org>        [4.8+]
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> ---
> Changes from V2:
> - Remove the check for active policy while requeuing the migrated timer
> Changes from V1:
> - Remove smp_call in the pstate handler.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

* Re: [PATCH V3] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
  2018-04-25 10:59 ` Shilpasri G Bhat
                   ` (2 preceding siblings ...)
  (?)
@ 2018-04-26  5:32 ` Vaidyanathan Srinivasan
  -1 siblings, 0 replies; 8+ messages in thread
From: Vaidyanathan Srinivasan @ 2018-04-26  5:32 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: rjw, viresh.kumar, npiggin, benh, mpe, linux-pm, linuxppc-dev,
	linux-kernel, ppaidipe, stable

* Shilpa Bhat <shilpa.bhat@linux.vnet.ibm.com> [2018-04-25 16:29:31]:

> gpstate_timer_handler() uses synchronous smp_call to set the pstate
> on the requested core. This causes the below hard lockup:
> 
> [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
> [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
> [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
> [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
> [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
> [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
> [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
> [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
> [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
> [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
> -- interrupt: 901 at doorbell_global_ipi+0x34/0x50
> LR = arch_send_call_function_ipi_mask+0x120/0x130
> [c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130
> [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
> [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
> [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
> [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
> [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
> [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
> [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c
> 
> One way to avoid this is removing the smp-call. We can ensure that the timer
> always runs on one of the policy-cpus. If the timer gets migrated to a
> cpu outside the policy then re-queue it back on the policy->cpus. This way
> we can get rid of the smp-call which was being used to set the pstate
> on the policy->cpus.
> 
> Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned)
> Cc: <stable@vger.kernel.org>        [4.8+]
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> ---
> Changes from V2:
> - Remove the check for active policy while requeuing the migrated timer
> Changes from V1:
> - Remove smp_call in the pstate handler.
> 
>  drivers/cpufreq/powernv-cpufreq.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 71f8682..e368e1f 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -679,6 +679,16 @@ void gpstate_timer_handler(struct timer_list *t)
> 
>  	if (!spin_trylock(&gpstates->gpstate_lock))
>  		return;
> +	/*
> +	 * If the timer has migrated to the different cpu then bring
> +	 * it back to one of the policy->cpus
> +	 */
> +	if (!cpumask_test_cpu(raw_smp_processor_id(), policy->cpus)) {
> +		gpstates->timer.expires = jiffies + msecs_to_jiffies(1);
> +		add_timer_on(&gpstates->timer, cpumask_first(policy->cpus));
> +		spin_unlock(&gpstates->gpstate_lock);
> +		return;
> +	}
> 
>  	/*
>  	 * If PMCR was last updated was using fast_swtich then
> @@ -718,10 +728,8 @@ void gpstate_timer_handler(struct timer_list *t)
>  	if (gpstate_idx != gpstates->last_lpstate_idx)
>  		queue_gpstate_timer(gpstates);
> 
> +	set_pstate(&freq_data);
>  	spin_unlock(&gpstates->gpstate_lock);
> -
> -	/* Timer may get migrated to a different cpu on cpu hot unplug */
> -	smp_call_function_any(policy->cpus, set_pstate, &freq_data, 1);
>  }

Fix looks good. 

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

* Re: [V3] cpufreq: powernv: Fix the hardlockup by synchronus smp_call in timer interrupt
  2018-04-25 10:59 ` Shilpasri G Bhat
@ 2018-04-28 11:12   ` Michael Ellerman
  -1 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-04-28 11:12 UTC (permalink / raw)
  To: Shilpasri G Bhat, rjw, viresh.kumar
  Cc: linux-pm, ppaidipe, linux-kernel, npiggin, stable,
	Shilpasri G Bhat, linuxppc-dev

On Wed, 2018-04-25 at 10:59:31 UTC, Shilpasri G Bhat wrote:
> gpstate_timer_handler() uses synchronous smp_call to set the pstate
> on the requested core. This causes the below hard lockup:
> 
> [c000003fe566b320] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
> [c000003fe566b390] [c0000000001d55e0] smp_call_function_any+0x180/0x250
> [c000003fe566b3f0] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
> [c000003fe566b4a0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
> [c000003fe566b520] [c0000000001b4958] expire_timers+0x138/0x1f0
> [c000003fe566b590] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
> [c000003fe566b630] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
> [c000003fe566b710] [c000000000114be8] irq_exit+0xe8/0x120
> [c000003fe566b730] [c000000000024d0c] timer_interrupt+0x9c/0xe0
> [c000003fe566b760] [c000000000009014] decrementer_common+0x114/0x120
> -- interrupt: 901 at doorbell_global_ipi+0x34/0x50
> LR = arch_send_call_function_ipi_mask+0x120/0x130
> [c000003fe566ba50] [c00000000004876c] arch_send_call_function_ipi_mask+0x4c/0x130
> [c000003fe566ba90] [c0000000001d59f0] smp_call_function_many+0x340/0x450
> [c000003fe566bb00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
> [c000003fe566bb30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
> [c000003fe566bba0] [c000000000349278] change_protection_range+0xb88/0xe40
> [c000003fe566bcf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
> [c000003fe566bdb0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
> [c000003fe566be30] [c00000000000b184] system_call+0x58/0x6c
> 
> One way to avoid this is removing the smp-call. We can ensure that the timer
> always runs on one of the policy-cpus. If the timer gets migrated to a
> cpu outside the policy then re-queue it back on the policy->cpus. This way
> we can get rid of the smp-call which was being used to set the pstate
> on the policy->cpus.
> 
> Fixes: 7bc54b652f13 (timers, cpufreq/powernv: Initialize the gpstate timer as pinned)
> Cc: <stable@vger.kernel.org>        [4.8+]
> Reported-by: Nicholas Piggin <npiggin@gmail.com>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
> Acked-by: Nicholas Piggin <npiggin@gmail.com>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/c0f7f5b6c69107ca92909512533e70

cheers
