* CPU1 does not come back online after failed suspend request
@ 2016-06-22 15:30 Mason
  2016-06-24 19:21 ` Mason
  2016-06-27 21:12 ` Mason
  0 siblings, 2 replies; 3+ messages in thread
From: Mason @ 2016-06-22 15:30 UTC (permalink / raw)
  To: linux-pm
  Cc: Rafael J. Wysocki, Russell King, Kevin Hilman, Sebastian Frias,
	Thibaud Cornic, Thomas Petazzoni

Hello,

(I'm using v4.7-rc4)

My dual-core platform defines the usual hooks:

static const struct smp_operations tango_smp_ops __initconst = {
	.smp_boot_secondary	= tango_boot_secondary,
	.cpu_kill		= tango_cpu_kill,
	.cpu_die		= tango_cpu_die,
};

static const struct platform_suspend_ops tango_pm_ops = {
	.enter = tango_pm_enter,
	.valid = tango_pm_valid,
};

static int tango_pm_powerdown(unsigned long data)
{
	// tango_suspend(virt_to_phys(cpu_resume)); // SHOULD NOT RETURN
	printk("DEBUG: %s\n", __func__);
	// INSERT ONE SECOND DELAY
	return 42;
}

static int tango_pm_enter(suspend_state_t state)
{
	int ret;

	printk("DEBUG: %s\n", __func__);
	ret = cpu_suspend(0, tango_pm_powerdown);
	printk("DEBUG: cpu_suspend returned %d\n", ret);
	return 0;
}
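
For context, suspend ops like these are normally registered with
suspend_set_ops(); a minimal sketch (the initcall below is made up
for illustration, not my actual init code):

#include <linux/init.h>
#include <linux/suspend.h>

static int __init tango_pm_init(void)
{
	/* hook tango_pm_enter/tango_pm_valid into "echo mem > /sys/power/state" */
	suspend_set_ops(&tango_pm_ops);
	return 0;
}
late_initcall(tango_pm_init);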

I'm trying to test the error path, i.e. when tango_pm_powerdown()
does in fact return.

Secondary core off-lining via /sys/devices/system/cpu/cpu1/online
seems to work as expected:

# cat /sys/devices/system/cpu/online     
0-1
# echo 0 > /sys/devices/system/cpu/cpu1/online
[   64.022349] CPU1: shutdown
[   64.022354] DEBUG: tango_cpu_die
[   64.028370] DEBUG: tango_cpu_kill
# cat /sys/devices/system/cpu/online 
0
# echo 1 > /sys/devices/system/cpu/cpu1/online
[   73.955994] DEBUG: tango_boot_secondary
# cat /sys/devices/system/cpu/online 
0-1


But the secondary core does not come back online after a failed
suspend attempt (see below). I tried adding a 1 second delay in
tango_pm_powerdown() to rule out timing issues.
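
To make that concrete, the delay is just a busy wait in the
callback; a sketch of what I mean, assuming a plain mdelay()
(sleeping is not an option this deep in the suspend path):

#include <linux/delay.h>

static int tango_pm_powerdown(unsigned long data)
{
	printk("DEBUG: %s\n", __func__);
	mdelay(1000);	/* stand-in for the "INSERT ONE SECOND DELAY" above */
	return 42;
}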

# echo mem > /sys/power/state
[   16.328980] PM: Syncing filesystems ... done.
[   16.336844] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   16.345421] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   16.354034] Suspending console(s) (use no_console_suspend to debug)
[   16.362965] PM: suspend of devices complete after 1.764 msecs
[   16.363870] PM: late suspend of devices complete after 0.896 msecs
[   16.364519] PM: noirq suspend of devices complete after 0.642 msecs
[   16.364522] Disabling non-boot CPUs ...
[   16.382340] CPU1: shutdown
[   16.382344] DEBUG: tango_cpu_die
[   16.382346] DEBUG: tango_cpu_kill
[   16.392635] DEBUG: tango_pm_enter
[   16.392635] DEBUG: tango_pm_powerdown
[   16.392635] DEBUG: cpu_suspend returned 42
[   16.392664] Enabling non-boot CPUs ...
[   16.412544] DEBUG: tango_boot_secondary
[   17.411927] CPU1: failed to come online
[   17.432448] Error taking CPU1 up: -5
[   17.433034] PM: noirq resume of devices complete after 0.576 msecs
[   17.433750] PM: early resume of devices complete after 0.688 msecs
[   17.435121] nb8800 26000.ethernet eth0: Link is Down
[   17.435301] PM: resume of devices complete after 1.541 msecs
[   17.516826] Restarting tasks ... done.

[root@toto5 ~]# cat /sys/devices/system/cpu/online
0

As you can see, cpu1 did not come back online.
[ 17.411927] CPU1: failed to come online
[ 17.432448] Error taking CPU1 up: -5

The other weirdness is that my 1 second delay happens between
"DEBUG: tango_pm_powerdown" and "DEBUG: cpu_suspend returned 42",
yet the timestamps for these two lines are identical. Is that
because the timestamp variable is not updated deep within
the suspend framework? (My timer ticks at 27 MHz.)

Any idea what might be going wrong?

Regards.


* Re: CPU1 does not come back online after failed suspend request
  2016-06-22 15:30 CPU1 does not come back online after failed suspend request Mason
@ 2016-06-24 19:21 ` Mason
  2016-06-27 21:12 ` Mason
  1 sibling, 0 replies; 3+ messages in thread
From: Mason @ 2016-06-24 19:21 UTC (permalink / raw)
  To: linux-pm
  Cc: Rafael J. Wysocki, Russell King, Kevin Hilman, Sebastian Frias,
	Thibaud Cornic, Thomas Petazzoni

On 22/06/2016 17:30, Mason wrote:

> (I'm using v4.7-rc4)
> 
> My dual-core platform defines the usual hooks:
> 
> static const struct smp_operations tango_smp_ops __initconst = {
> 	.smp_boot_secondary	= tango_boot_secondary,
> 	.cpu_kill		= tango_cpu_kill,
> 	.cpu_die		= tango_cpu_die,
> };
> 
> static const struct platform_suspend_ops tango_pm_ops = {
> 	.enter = tango_pm_enter,
> 	.valid = tango_pm_valid,
> };
> 
> static int tango_pm_powerdown(unsigned long data)
> {
> 	// tango_suspend(virt_to_phys(cpu_resume)); // SHOULD NOT RETURN
> 	printk("DEBUG: %s\n", __func__);
> 	// INSERT ONE SECOND DELAY
> 	return 42;
> }
> 
> static int tango_pm_enter(suspend_state_t state)
> {
> 	int ret;
> 
> 	printk("DEBUG: %s\n", __func__);
> 	ret = cpu_suspend(0, tango_pm_powerdown);
> 	printk("DEBUG: cpu_suspend returned %d\n", ret);
> 	return 0;
> }
> 
> I'm trying to test the error path, i.e. when tango_pm_powerdown()
> does in fact return.
> 
> Secondary core off-lining via /sys/devices/system/cpu/cpu1/online
> seems to work as expected:
> 
> # cat /sys/devices/system/cpu/online     
> 0-1
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> [   64.022349] CPU1: shutdown
> [   64.022354] DEBUG: tango_cpu_die
> [   64.028370] DEBUG: tango_cpu_kill
> # cat /sys/devices/system/cpu/online 
> 0
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> [   73.955994] DEBUG: tango_boot_secondary
> # cat /sys/devices/system/cpu/online 
> 0-1
> 
> 
> But the secondary core does not come back online after a failed
> suspend attempt (see below). I tried adding a 1 second delay in
> tango_pm_powerdown() to rule out timing issues.
> 
> # echo mem > /sys/power/state
> [   16.328980] PM: Syncing filesystems ... done.
> [   16.336844] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   16.345421] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   16.354034] Suspending console(s) (use no_console_suspend to debug)
> [   16.362965] PM: suspend of devices complete after 1.764 msecs
> [   16.363870] PM: late suspend of devices complete after 0.896 msecs
> [   16.364519] PM: noirq suspend of devices complete after 0.642 msecs
> [   16.364522] Disabling non-boot CPUs ...
> [   16.382340] CPU1: shutdown
> [   16.382344] DEBUG: tango_cpu_die
> [   16.382346] DEBUG: tango_cpu_kill
> [   16.392635] DEBUG: tango_pm_enter
> [   16.392635] DEBUG: tango_pm_powerdown
> [   16.392635] DEBUG: cpu_suspend returned 42
> [   16.392664] Enabling non-boot CPUs ...
> [   16.412544] DEBUG: tango_boot_secondary
> [   17.411927] CPU1: failed to come online
> [   17.432448] Error taking CPU1 up: -5
> [   17.433034] PM: noirq resume of devices complete after 0.576 msecs
> [   17.433750] PM: early resume of devices complete after 0.688 msecs
> [   17.435121] nb8800 26000.ethernet eth0: Link is Down
> [   17.435301] PM: resume of devices complete after 1.541 msecs
> [   17.516826] Restarting tasks ... done.
> 
> [root@toto5 ~]# cat /sys/devices/system/cpu/online
> 0
> 
> As you can see, cpu1 did not come back online.
> [ 17.411927] CPU1: failed to come online
> [ 17.432448] Error taking CPU1 up: -5
> 
> The other weirdness is that my 1 second delay happens between
> "DEBUG: tango_pm_powerdown" and "DEBUG: cpu_suspend returned 42",
> yet the timestamps for these two lines are identical. Is that
> because the timestamp variable is not updated deep within
> the suspend framework? (My timer ticks at 27 MHz.)

Any idea if the code flow is different in the two cases?
(Manual offline/online via sysfs vs offline/online by the
suspend framework)

"CPU1: failed to come online" apparently comes from __cpu_up()
http://lxr.free-electrons.com/source/arch/arm/kernel/smp.c#L137

"Error taking CPU1 up: -5" apparently comes from enable_nonboot_cpus()
http://lxr.free-electrons.com/source/kernel/cpu.c#L1110

And -5 is simply -EIO returned from __cpu_up()
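
For reference, my reading of the __cpu_up() error path (paraphrased
from arch/arm/kernel/smp.c, not the verbatim source) is roughly:

	ret = boot_secondary(cpu, idle);	/* ends up in tango_boot_secondary() */
	if (ret == 0) {
		/* wait (with a timeout) for the secondary to mark itself online */
		wait_for_completion_timeout(&cpu_running, msecs_to_jiffies(1000));
		if (!cpu_online(cpu)) {
			pr_crit("CPU%u: failed to come online\n", cpu);
			ret = -EIO;	/* the -5 that enable_nonboot_cpus() reports */
		}
	}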

Is there any point in testing with v4.6?

Regards.



* Re: CPU1 does not come back online after failed suspend request
  2016-06-22 15:30 CPU1 does not come back online after failed suspend request Mason
  2016-06-24 19:21 ` Mason
@ 2016-06-27 21:12 ` Mason
  1 sibling, 0 replies; 3+ messages in thread
From: Mason @ 2016-06-27 21:12 UTC (permalink / raw)
  To: linux-pm
  Cc: Rafael J. Wysocki, Russell King, Kevin Hilman, Sebastian Frias,
	Thibaud Cornic, Thomas Petazzoni, Mark Rutland, Will Deacon

On 22/06/2016 17:30, Mason wrote:

> My dual-core platform defines the usual hooks:
> 
> static const struct smp_operations tango_smp_ops __initconst = {
> 	.smp_boot_secondary	= tango_boot_secondary,
> 	.cpu_kill		= tango_cpu_kill,
> 	.cpu_die		= tango_cpu_die,
> };
> 
> static const struct platform_suspend_ops tango_pm_ops = {
> 	.enter = tango_pm_enter,
> 	.valid = tango_pm_valid,
> };
> 
> static int tango_pm_powerdown(unsigned long data)
> {
> 	// tango_suspend(virt_to_phys(cpu_resume)); // SHOULD NOT RETURN
> 	printk("DEBUG: %s\n", __func__);
> 	// INSERT ONE SECOND DELAY
> 	return 42;
> }
> 
> static int tango_pm_enter(suspend_state_t state)
> {
> 	int ret;
> 
> 	printk("DEBUG: %s\n", __func__);
> 	ret = cpu_suspend(0, tango_pm_powerdown);
> 	printk("DEBUG: cpu_suspend returned %d\n", ret);
> 	return 0;
> }
> 
> I'm trying to test the error path, i.e. when tango_pm_powerdown()
> does in fact return.
> 
> Secondary core off-lining via /sys/devices/system/cpu/cpu1/online
> seems to work as expected:
> 
> # cat /sys/devices/system/cpu/online     
> 0-1
> # echo 0 > /sys/devices/system/cpu/cpu1/online
> [   64.022349] CPU1: shutdown
> [   64.022354] DEBUG: tango_cpu_die
> [   64.028370] DEBUG: tango_cpu_kill
> # cat /sys/devices/system/cpu/online 
> 0
> # echo 1 > /sys/devices/system/cpu/cpu1/online
> [   73.955994] DEBUG: tango_boot_secondary
> # cat /sys/devices/system/cpu/online 
> 0-1
> 
> 
> But the secondary core does not come back online after a failed
> suspend attempt (see below). I tried adding a 1 second delay in
> tango_pm_powerdown() to rule out timing issues.
> 
> # echo mem > /sys/power/state
> [   16.328980] PM: Syncing filesystems ... done.
> [   16.336844] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   16.345421] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   16.354034] Suspending console(s) (use no_console_suspend to debug)
> [   16.362965] PM: suspend of devices complete after 1.764 msecs
> [   16.363870] PM: late suspend of devices complete after 0.896 msecs
> [   16.364519] PM: noirq suspend of devices complete after 0.642 msecs
> [   16.364522] Disabling non-boot CPUs ...
> [   16.382340] CPU1: shutdown
> [   16.382344] DEBUG: tango_cpu_die
> [   16.382346] DEBUG: tango_cpu_kill
> [   16.392635] DEBUG: tango_pm_enter
> [   16.392635] DEBUG: tango_pm_powerdown
> [   16.392635] DEBUG: cpu_suspend returned 42
> [   16.392664] Enabling non-boot CPUs ...
> [   16.412544] DEBUG: tango_boot_secondary
> [   17.411927] CPU1: failed to come online
> [   17.432448] Error taking CPU1 up: -5
> [   17.433034] PM: noirq resume of devices complete after 0.576 msecs
> [   17.433750] PM: early resume of devices complete after 0.688 msecs
> [   17.435121] nb8800 26000.ethernet eth0: Link is Down
> [   17.435301] PM: resume of devices complete after 1.541 msecs
> [   17.516826] Restarting tasks ... done.
> 
> [root@toto5 ~]# cat /sys/devices/system/cpu/online
> 0
> 
> As you can see, cpu1 did not come back online.
> [ 17.411927] CPU1: failed to come online
> [ 17.432448] Error taking CPU1 up: -5
> 
> The other weirdness is that my 1 second delay happens between
> "DEBUG: tango_pm_powerdown" and "DEBUG: cpu_suspend returned 42",
> yet the timestamps for these two lines are identical. Is that
> because the timestamp variable is not updated deep within
> the suspend framework? (My timer ticks at 27 MHz.)
> 
> Any idea what might be going wrong?

I've made some progress on this issue, thanks to Mark Rutland.

The FW allows only one thread at a time - similar to the BKL
of old. This means kill() on cpu0 fails if die() is still
executing on cpu1. The issue would be even worse with 4 cores,
where the die() calls themselves may also collide and fail.

Basically, I was assuming that Linux guaranteed some kind of
synchronization - by first calling die(), waiting for die()
to "complete", and then calling kill() - but there can be no
such synchronization.
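
This matches my reading of arch/arm/kernel/smp.c (paraphrased, not
the verbatim source): the dying CPU signals the cpu_died completion
*before* it enters the platform cpu_die hook, so the cpu_kill hook
on the other CPU can run while cpu_die is still executing:

	/* on the dying CPU (cpu1), in cpu_die() */
	complete(&cpu_died);		/* lets cpu0 proceed... */
	smp_ops.cpu_die(cpu);		/* ...while tango_cpu_die() runs here */

	/* on the requesting CPU (cpu0), in __cpu_die() */
	wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000));
	pr_notice("CPU%u: shutdown\n", cpu);	/* the "CPU1: shutdown" log line */
	platform_cpu_kill(cpu);			/* ends up in tango_cpu_kill() */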

So the new plan is as follows:

In die()
Call the FW in a loop, until the given core actually "dies".

static void tango_cpu_die(unsigned int cpu)
{
	while (1)
		tango_aux_core_die(cpu); /* retry until this core actually dies */
}

In kill()
Since kill() and die() are not synchronized, kill() may be
called for a core that has not actually died yet.
Call the FW in a loop, but sleep/spin a while between tries,
to give die() a chance to run.

static int tango_cpu_kill(unsigned int cpu)
{
	int err;
	do {
		//msleep(cpu); // or mdelay?
		err = tango_aux_core_kill(cpu);
	} while (err);
	return 1;
}
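
A possible refinement (just a sketch, untested): bound the retries
so a core that never dies cannot wedge the hotplug path; mdelay()
is used below only because it is a busy wait:

static int tango_cpu_kill(unsigned int cpu)
{
	int tries;

	for (tries = 0; tries < 1000; tries++) {
		if (tango_aux_core_kill(cpu) == 0)
			return 1;	/* FW accepted the request */
		mdelay(1);	/* give die() on the target core time to run */
	}
	return 0;	/* give up; report failure to the hotplug code */
}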

Question: are we allowed to sleep in cpu_kill(), or may we only spin?

Does the above plan look acceptable, at least for my platform?

Regards.


