linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Shutdown problem in SMP system happened on Tegra20
@ 2012-08-24  8:23 Bill Huang
  2012-08-24 18:21 ` Russell King - ARM Linux
  0 siblings, 1 reply; 8+ messages in thread
From: Bill Huang @ 2012-08-24  8:23 UTC (permalink / raw)
  To: 'linux-tegra@vger.kernel.org'
  Cc: 'linux-arm-kernel@lists.infradead.org',
	'linux-kernel@vger.kernel.org'

Hi,

When doing shutdown on Tegra20/Tegra30, we need to read/write PMIC registers through I2C
to perform the power off sequence. Unfortunately, sometimes we'll fail to shutdown
due to I2C timeout on Tegra20. And the cause of the timeout is due to the CPU which I2C
controller IRQ affined to will have chance to be offlined without migrating all irqs affined 
to it, so the following I2C transactions will fail (no any CPU will handle that interrupt
since then).

Some snippet of the shutdown codes:

void kernel_power_off(void)
{
	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
	:
	disable_nonboot_cpus();
	:
	machine_power_off();
}

void machine_power_off(void)
{
	machine_shutdown();
	if (pm_power_off)
		pm_power_off(); /* this is where we send I2C write to shutdown */
}

void machine_shutdown(void)
{
#ifdef CONFIG_SMP
	smp_send_stop();
#endif
}

In "smp_send_stop()", it will send "IPI_CPU_STOPS" to offline other cpus except
current cpu (smp_processor_id()), however, current cpu will not always be cpu0 at
least at Tegra20, that said for example cpu1 might be the current cpu and cpu0 will
be offlined and this is the case why the I2C transaction will timeout. 

For normal case, "disable_nonboot_cpus()" call will disable all other Cpus except
cpu0, that means we won't hit the problem mentioned here since cpu0 will always be
the current cpu in the call "smp_send_stop", but the call to "disable_nonboot_cpus" 
will happen only when "CONFIG_PM_SLEEP_SMP" is enabled which is not the case for
Tegra20/Tegra30, we don't support suspend yet so this can't be enabled.

There are two known fix for this, the first one is enable suspend (ARCH_SUSPEND_POSSIBLE)
so the cpu0 will be the only online cpu while doing "machine_shutdown". The second
fix is adding call to "migrate_irqs()" in "ipi_cpu_stop" so all irqs can be migrated to
the active cpu.

Could someone familiar with the ARM SMP design help to answer my two questions?

1. Does it make sense that "smp_processor_id()" could be non-cpu0 in the call
   "smp_send_stop()"? In Tegra30 it will always be cpu0 but Tegra20 will be 50-50,
   I just can't find the magic.

2. If current cpu is not necessarily be cpu0 in the call "smp_send_stop()", then
   does it make sense to add "migrate_irqs()" in "ipi_cpu_stop()"? Or is there any
   other fix which makes more sense?

Thanks,
Bill
nvpublic

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Shutdown problem in SMP system happened on Tegra20
  2012-08-24  8:23 Shutdown problem in SMP system happened on Tegra20 Bill Huang
@ 2012-08-24 18:21 ` Russell King - ARM Linux
  2012-08-25  0:00   ` Bill Huang
  2012-08-25  7:36   ` Shawn Guo
  0 siblings, 2 replies; 8+ messages in thread
From: Russell King - ARM Linux @ 2012-08-24 18:21 UTC (permalink / raw)
  To: Bill Huang
  Cc: 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

On Fri, Aug 24, 2012 at 04:23:39PM +0800, Bill Huang wrote:
> When doing shutdown on Tegra20/Tegra30, we need to read/write PMIC
> registers through I2C to perform the power off sequence. Unfortunately,
> sometimes we'll fail to shutdown due to I2C timeout on Tegra20. And the
> cause of the timeout is due to the CPU which I2C controller IRQ affined
> to will have chance to be offlined without migrating all irqs affined 
> to it, so the following I2C transactions will fail (no any CPU will
> handle that interrupt since then).

> Some snippet of the shutdown codes:
> 
> void kernel_power_off(void)
> {
> 	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
> 	:
> 	disable_nonboot_cpus();
> 	:
> 	machine_power_off();
> }
> 
> void machine_power_off(void)
> {
> 	machine_shutdown();
> 	if (pm_power_off)
> 		pm_power_off(); /* this is where we send I2C write to shutdown */
> }
> 
> void machine_shutdown(void)
> {
> #ifdef CONFIG_SMP
> 	smp_send_stop();
> #endif
> }
> 
> In "smp_send_stop()", it will send "IPI_CPU_STOPS" to offline other cpus except
> current cpu (smp_processor_id()), however, current cpu will not always be cpu0 at
> least at Tegra20, that said for example cpu1 might be the current cpu and cpu0 will
> be offlined and this is the case why the I2C transaction will timeout. 
> 
> For normal case, "disable_nonboot_cpus()" call will disable all other Cpus except
> cpu0, that means we won't hit the problem mentioned here since cpu0 will always be
> the current cpu in the call "smp_send_stop", but the call to "disable_nonboot_cpus" 
> will happen only when "CONFIG_PM_SLEEP_SMP" is enabled which is not the case for
> Tegra20/Tegra30, we don't support suspend yet so this can't be enabled.

So what you're asking for is a feature to do what CONFIG_PM_SLEEP_SMP
does, but without CONFIG_PM_SLEEP_SMP enabled?

Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your platform
requires that the lowest CPU number be the CPU dealing with reboot?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: Shutdown problem in SMP system happened on Tegra20
  2012-08-24 18:21 ` Russell King - ARM Linux
@ 2012-08-25  0:00   ` Bill Huang
  2012-08-25  7:17     ` Russell King - ARM Linux
  2012-08-25  7:36   ` Shawn Guo
  1 sibling, 1 reply; 8+ messages in thread
From: Bill Huang @ 2012-08-25  0:00 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

nvpublic
> On Fri, Aug 24, 2012 at 04:23:39PM +0800, Bill Huang wrote:
> > When doing shutdown on Tegra20/Tegra30, we need to read/write PMIC
> > registers through I2C to perform the power off sequence.
> > Unfortunately, sometimes we'll fail to shutdown due to I2C timeout on
> > Tegra20. And the cause of the timeout is due to the CPU which I2C
> > controller IRQ affined to will have chance to be offlined without
> > migrating all irqs affined to it, so the following I2C transactions
> > will fail (no any CPU will handle that interrupt since then).
> 
> > Some snippet of the shutdown codes:
> >
> > void kernel_power_off(void)
> > {
> > 	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
> > 	:
> > 	disable_nonboot_cpus();
> > 	:
> > 	machine_power_off();
> > }
> >
> > void machine_power_off(void)
> > {
> > 	machine_shutdown();
> > 	if (pm_power_off)
> > 		pm_power_off(); /* this is where we send I2C write to shutdown */ }
> >
> > void machine_shutdown(void)
> > {
> > #ifdef CONFIG_SMP
> > 	smp_send_stop();
> > #endif
> > }
> >
> > In "smp_send_stop()", it will send "IPI_CPU_STOPS" to offline other
> > cpus except current cpu (smp_processor_id()), however, current cpu
> > will not always be cpu0 at least at Tegra20, that said for example
> > cpu1 might be the current cpu and cpu0 will be offlined and this is the case why the I2C transaction
> will timeout.
> >
> > For normal case, "disable_nonboot_cpus()" call will disable all other
> > Cpus except cpu0, that means we won't hit the problem mentioned here
> > since cpu0 will always be the current cpu in the call "smp_send_stop", but the call to
> "disable_nonboot_cpus"
> > will happen only when "CONFIG_PM_SLEEP_SMP" is enabled which is not
> > the case for Tegra20/Tegra30, we don't support suspend yet so this can't be enabled.
> 
> So what you're asking for is a feature to do what CONFIG_PM_SLEEP_SMP does, but without
> CONFIG_PM_SLEEP_SMP enabled?

Yeah pretty much, I'm actually asking should we take care of this since maybe not all platforms 
will have this config enabled?
> 
> Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your platform requires that the lowest CPU
> number be the CPU dealing with reboot?

Someday we will have it enabled, but before that we'll hit the issue, so you don't think
this should be taken care of? Thanks.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Shutdown problem in SMP system happened on Tegra20
  2012-08-25  0:00   ` Bill Huang
@ 2012-08-25  7:17     ` Russell King - ARM Linux
  2012-08-27  5:43       ` Bill Huang
  0 siblings, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2012-08-25  7:17 UTC (permalink / raw)
  To: Bill Huang
  Cc: 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

On Sat, Aug 25, 2012 at 08:00:56AM +0800, Bill Huang wrote:
> nvpublic
> > So what you're asking for is a feature to do what CONFIG_PM_SLEEP_SMP does, but without
> > CONFIG_PM_SLEEP_SMP enabled?
> 
> Yeah pretty much, I'm actually asking should we take care of this since maybe not all platforms 
> will have this config enabled?
> > 
> > Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your platform requires that the lowest CPU
> > number be the CPU dealing with reboot?
> 
> Someday we will have it enabled, but before that we'll hit the issue, so you don't think
> this should be taken care of? Thanks.

Well, just enable CONFIG_PM_SLEEP_SMP then - it'll be far better than
trying to implement something in the ARM code, especially as sysrq can
be used for poweroff/reboot etc (which will run from atomic contexts
where you can't switch to another CPU.)

What you're asking for will make these callbacks fragile for other
platforms.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Shutdown problem in SMP system happened on Tegra20
  2012-08-24 18:21 ` Russell King - ARM Linux
  2012-08-25  0:00   ` Bill Huang
@ 2012-08-25  7:36   ` Shawn Guo
  2012-08-25  7:52     ` Russell King - ARM Linux
  1 sibling, 1 reply; 8+ messages in thread
From: Shawn Guo @ 2012-08-25  7:36 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Bill Huang, 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

On Fri, Aug 24, 2012 at 07:21:33PM +0100, Russell King - ARM Linux wrote:
> Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your platform
> requires that the lowest CPU number be the CPU dealing with reboot?

I have CONFIG_PM_SLEEP_SMP enabled for imx6q, but still see the imx6q
restart hook running on cpu1 than cpu0.  It seems that
disable_nonboot_cpus is only called in kernel_power_off but never
kernel_restart.  We should probably patch kernel_restart or
machine_restart to have disable_nonboot_cpus called? 

-- 
Regards,
Shawn

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Shutdown problem in SMP system happened on Tegra20
  2012-08-25  7:36   ` Shawn Guo
@ 2012-08-25  7:52     ` Russell King - ARM Linux
  2012-08-25 10:33       ` Shawn Guo
  0 siblings, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2012-08-25  7:52 UTC (permalink / raw)
  To: Shawn Guo
  Cc: Bill Huang, 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

On Sat, Aug 25, 2012 at 03:36:27PM +0800, Shawn Guo wrote:
> On Fri, Aug 24, 2012 at 07:21:33PM +0100, Russell King - ARM Linux wrote:
> > Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your platform
> > requires that the lowest CPU number be the CPU dealing with reboot?
> 
> I have CONFIG_PM_SLEEP_SMP enabled for imx6q, but still see the imx6q
> restart hook running on cpu1 than cpu0.  It seems that
> disable_nonboot_cpus is only called in kernel_power_off but never
> kernel_restart.  We should probably patch kernel_restart or
> machine_restart to have disable_nonboot_cpus called? 

Remember that this path gets called from IRQ context which makes calling
functions which sleep very dodgy.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Shutdown problem in SMP system happened on Tegra20
  2012-08-25  7:52     ` Russell King - ARM Linux
@ 2012-08-25 10:33       ` Shawn Guo
  0 siblings, 0 replies; 8+ messages in thread
From: Shawn Guo @ 2012-08-25 10:33 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Bill Huang, 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

On Sat, Aug 25, 2012 at 08:52:21AM +0100, Russell King - ARM Linux wrote:
> On Sat, Aug 25, 2012 at 03:36:27PM +0800, Shawn Guo wrote:
> > On Fri, Aug 24, 2012 at 07:21:33PM +0100, Russell King - ARM Linux wrote:
> > > Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your platform
> > > requires that the lowest CPU number be the CPU dealing with reboot?
> > 
> > I have CONFIG_PM_SLEEP_SMP enabled for imx6q, but still see the imx6q
> > restart hook running on cpu1 than cpu0.  It seems that
> > disable_nonboot_cpus is only called in kernel_power_off but never
> > kernel_restart.  We should probably patch kernel_restart or
> > machine_restart to have disable_nonboot_cpus called? 
> 
> Remember that this path gets called from IRQ context which makes calling
> functions which sleep very dodgy.

I'm so familiar with the code/context.  Are you saying disable_nonboot_cpus
sleeps and kernel_power_off never gets called from IRQ context?

I tested the code change below on kernel_restart with
CONFIG_DEBUG_ATOMIC_SLEEP enabled.  It fixes imx6q restart issue (with
no change on cpu_relax) while no sleep-inside-atomic-section warning
is seen. 

Regards,
Shawn

diff --git a/kernel/sys.c b/kernel/sys.c
index 241507f..0359328 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -373,6 +373,7 @@ void kernel_restart(char *cmd)
        else
                printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
        kmsg_dump(KMSG_DUMP_RESTART);
+       disable_nonboot_cpus();
        machine_restart(cmd);
 }
 EXPORT_SYMBOL_GPL(kernel_restart);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: Shutdown problem in SMP system happened on Tegra20
  2012-08-25  7:17     ` Russell King - ARM Linux
@ 2012-08-27  5:43       ` Bill Huang
  0 siblings, 0 replies; 8+ messages in thread
From: Bill Huang @ 2012-08-27  5:43 UTC (permalink / raw)
  To: 'Russell King - ARM Linux'
  Cc: 'linux-tegra@vger.kernel.org',
	'linux-kernel@vger.kernel.org',
	'linux-arm-kernel@lists.infradead.org'

> On Sat, Aug 25, 2012 at 08:00:56AM +0800, Bill Huang wrote:
> > nvpublic
> > > So what you're asking for is a feature to do what
> > > CONFIG_PM_SLEEP_SMP does, but without CONFIG_PM_SLEEP_SMP enabled?
> >
> > Yeah pretty much, I'm actually asking should we take care of this
> > since maybe not all platforms will have this config enabled?
> > >
> > > Why not just ensure that CONFIG_PM_SLEEP_SMP is enabled if your
> > > platform requires that the lowest CPU number be the CPU dealing with reboot?
> >
> > Someday we will have it enabled, but before that we'll hit the issue,
> > so you don't think this should be taken care of? Thanks.
> 
> Well, just enable CONFIG_PM_SLEEP_SMP then - it'll be far better than trying to implement something in
> the ARM code, especially as sysrq can be used for poweroff/reboot etc (which will run from atomic
> contexts where you can't switch to another CPU.)
> 
> What you're asking for will make these callbacks fragile for other platforms.

OK, thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-08-27  5:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-24  8:23 Shutdown problem in SMP system happened on Tegra20 Bill Huang
2012-08-24 18:21 ` Russell King - ARM Linux
2012-08-25  0:00   ` Bill Huang
2012-08-25  7:17     ` Russell King - ARM Linux
2012-08-27  5:43       ` Bill Huang
2012-08-25  7:36   ` Shawn Guo
2012-08-25  7:52     ` Russell King - ARM Linux
2012-08-25 10:33       ` Shawn Guo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).