* 4.10-rc1: thinkpad x60: who ate my cpu?
@ 2017-01-08 22:17 Pavel Machek
  2017-01-09  9:30 ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-01-08 22:17 UTC (permalink / raw)
  To: kernel list, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

Hi!

I used to have two cpus, and Thinkpad X60 should have two cores, but I
only see one on 4.10-rc1. This machine went through many
suspend/resume cycles. When backups finish, I'll try -rc2.

pavel@duo:~$ uname -a
Linux duo 4.10.0-rc1+ #304 SMP Mon Dec 26 10:33:24 CET 2016 i686 GNU/Linux
pavel@duo:~$ zcat /proc/config.gz | grep SMP
CONFIG_X86_32_SMP=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_SMP=y
# CONFIG_X86_BIGSMP is not set
CONFIG_PM_SLEEP_SMP=y
pavel@duo:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2400  @ 1.83GHz
stepping        : 8
cpu MHz         : 1833.000
cache size      : 2048 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bugs            :
bogomips        : 3657.62
clflush size    : 64
cache_alignment : 64
address sizes   : 32 bits physical, 32 bits virtual
power management:

pavel@duo:~$
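
A quicker way to notice a lost core than reading all of /proc/cpuinfo is to compare the "present" and "online" CPU lists under /sys/devices/system/cpu. The sketch below assumes both files use the usual kernel list format (for example 0-1). The function names count_cpu_list and check_cpus are made up for illustration, and the directory argument exists only so the check can be pointed at a copy of the files.

```shell
#!/bin/sh
# Count the CPUs named by a kernel CPU list such as "0-1" or "0,2-3".
count_cpu_list() {
    list=$1
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        case $range in
            *-*) lo=${range%-*}; hi=${range#*-}
                 total=$((total + hi - lo + 1)) ;;
            ?*)  total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# Compare present and online CPU counts.
# $1 (optional, an assumption for dry runs): directory holding the two files.
check_cpus() {
    dir=${1:-/sys/devices/system/cpu}
    present=$(count_cpu_list "$(cat "$dir/present")")
    online=$(count_cpu_list "$(cat "$dir/online")")
    echo "present=$present online=$online"
    # Succeed only when no present CPU is offline.
    [ "$present" -eq "$online" ]
}
```

Running check_cpus with no argument after each resume would flag the moment a core goes missing, since it returns non-zero whenever fewer CPUs are online than present.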


									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-01-08 22:17 4.10-rc1: thinkpad x60: who ate my cpu? Pavel Machek
@ 2017-01-09  9:30 ` Pavel Machek
  2017-01-13  1:19   ` Woody Suwalski
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-01-09  9:30 UTC (permalink / raw)
  To: kernel list, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

Hi!

> I used to have two cpus, and Thinkpad X60 should have two cores, but I
> only see one on 4.10-rc1. This machine went through many
> suspend/resume cycles. When backups finish, I'll try -rc2.

Whoever did it, he seems to have returned the cpu in -rc3. All seems
to be good now.
								Pavel
								

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-01-09  9:30 ` Pavel Machek
@ 2017-01-13  1:19   ` Woody Suwalski
  2017-01-14 11:30     ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: Woody Suwalski @ 2017-01-13  1:19 UTC (permalink / raw)
  To: Pavel Machek, kernel list, tglx, mingo, hpa

Pavel Machek wrote:
> Hi!
>
>> I used to have two cpus, and Thinkpad X60 should have two cores, but I
>> only see one on 4.10-rc1. This machine went through many
>> suspend/resume cycles. When backups finish, I'll try -rc2.
> Whoever did it, he seems to have returned the cpu in -rc3. All seems
> to be good now.
> 								Pavel
> 								
Since you mentioned it, I checked my x60: same problem, only one CPU.
However, I was running 4.8.13 with an uptime of 33 days and multiple
sleep/wake-ups.
I installed the current EOL 4.8.17 and rebooted, and now I see 2 CPUs.
So the issue is older than the 4.10 kernel, and I suspect it is related
to CPU hotplug / wakeup...

Woody


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-01-13  1:19   ` Woody Suwalski
@ 2017-01-14 11:30     ` Pavel Machek
  2017-01-15  9:56       ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-01-14 11:30 UTC (permalink / raw)
  To: Woody Suwalski, Rafael J. Wysocki; +Cc: kernel list, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 1131 bytes --]

Hi!

On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
> Pavel Machek wrote:
> >Hi!
> >
> >>I used to have two cpus, and Thinkpad X60 should have two cores, but I
> >>only see one on 4.10-rc1. This machine went through many
> >>suspend/resume cycles. When backups finish, I'll try -rc2.
> >Whoever did it, he seems to have returned the cpu in -rc3. All seems
> >to be good now.

> Actually since you have mentioned - I have checked my x60 - same problem -
> only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
> sleep/wake-ups.
> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
> older then 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
> related...

Hmm. So I have seen two cores in -rc3 after boot. But it is quite
possible that -rc1 was OK just after boot, too, and the problem happened
sometime later (probably during suspend/resume cycles). Let me go back
to -rc1 to check.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-01-14 11:30     ` Pavel Machek
@ 2017-01-15  9:56       ` Pavel Machek
  2017-02-11 23:48         ` Woody Suwalski
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-01-15  9:56 UTC (permalink / raw)
  To: Woody Suwalski, Rafael J. Wysocki; +Cc: kernel list, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
> Hi!
> 
> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
> > Pavel Machek wrote:
> > >Hi!
> > >
> > >>I used to have two cpus, and Thinkpad X60 should have two cores, but I
> > >>only see one on 4.10-rc1. This machine went through many
> > >>suspend/resume cycles. When backups finish, I'll try -rc2.
> > >Whoever did it, he seems to have returned the cpu in -rc3. All seems
> > >to be good now.
> 
> > Actually since you have mentioned - I have checked my x60 - same problem -
> > only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
> > sleep/wake-ups.
> > Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
> > older then 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
> > related...
> 
> Hmm. So I seen two cores in -rc3 after boot. But it is quite well
> possible that -rc1 was ok just after boot, too, and problem happened
> sometime later (probably during suspend/resume cycles). Let me go back
> to -rc1 to check.

Indeed, in -rc1 I see both CPUs after boot. So we have a
hard-to-reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-01-15  9:56       ` Pavel Machek
@ 2017-02-11 23:48         ` Woody Suwalski
  2017-02-12 15:43           ` Woody Suwalski
  0 siblings, 1 reply; 17+ messages in thread
From: Woody Suwalski @ 2017-02-11 23:48 UTC (permalink / raw)
  To: Pavel Machek, Rafael J. Wysocki; +Cc: kernel list, tglx, mingo, hpa

Pavel Machek wrote:
> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>> Hi!
>>
>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>> Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>> I used to have two cpus, and Thinkpad X60 should have two cores, but I
>>>>> only see one on 4.10-rc1. This machine went through many
>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>> Whoever did it, he seems to have returned the cpu in -rc3. All seems
>>>> to be good now.
>>> Actually since you have mentioned - I have checked my x60 - same problem -
>>> only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
>>> sleep/wake-ups.
>>> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
>>> older then 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
>>> related...
>> Hmm. So I seen two cores in -rc3 after boot. But it is quite well
>> possible that -rc1 was ok just after boot, too, and problem happened
>> sometime later (probably during suspend/resume cycles). Let me go back
>> to -rc1 to check.
> Indeed in -rc1 I see both CPUs after boot. So we have hard to
> reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...
>
>
>
Managed to reproduce it, but again it took a long time: I have an uptime
of 29 days.
It must have happened in the last day, as I kept checking as often as I
remembered.

The kernel is 4.8.17 EOL, installed almost a month ago.
Platform ThinkPad x60,  Intel(R) Core(TM) Duo CPU      T2400  @ 1.83GHz

In dmesg I see how a resume used to look when both CPUs were OK:
[690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
[690409.476547] ACPI: Preparing to enter system sleep state S3
[690409.780081] ACPI : EC: EC stopped
[690409.780083] PM: Saving platform NVS memory
[690409.780284] Disabling non-boot CPUs ...
[690409.805284] smpboot: CPU 1 is now offline
[690409.816464] ACPI: Low-level resume complete
[690409.816464] ACPI : EC: EC started
[690409.816464] PM: Restoring platform NVS memory
[690409.816464] Enabling non-boot CPUs ...
[690409.840574] x86: Booting SMP configuration:
[690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
[690409.805271] Initializing CPU#1
[690409.805271] Disabled fast string operations
[690409.888252]  cache: parent cpu1 should not be sleeping
[690409.920185] CPU1 is up
[690409.922288] ACPI: Waking up from system sleep state S3

Then CPU1 failed to start:

[691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
[691329.776550] ACPI: Preparing to enter system sleep state S3
[691330.080081] ACPI : EC: EC stopped
[691330.080083] PM: Saving platform NVS memory
[691330.080284] Disabling non-boot CPUs ...
[691330.105303] smpboot: CPU 1 is now offline
[691330.116477] ACPI: Low-level resume complete
[691330.116477] ACPI : EC: EC started
[691330.116477] PM: Restoring platform NVS memory
[691330.116477] Enabling non-boot CPUs ...
[691330.140570] x86: Booting SMP configuration:
[691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
[691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[691340.164445] Error taking CPU1 up: -5
[691340.166309] ACPI: Waking up from system sleep state S3

And now each resume looks like this:
[692517.868523] ACPI: Preparing to enter system sleep state S3
[692518.172074] ACPI : EC: EC stopped
[692518.172076] PM: Saving platform NVS memory
[692518.172269] Disabling non-boot CPUs ...
[692518.172269] ACPI: Low-level resume complete
[692518.172269] ACPI : EC: EC started
[692518.172269] PM: Restoring platform NVS memory
[692518.172269] ACPI: Waking up from system sleep state S3
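
The failed resume above leaves two distinctive lines in the log: "do_boot_cpu failed(-1)" arrives 10 seconds after the boot attempt (691330.14 to 691340.14), and the -5 in "Error taking CPU1 up" is -EIO. A minimal sketch for finding such resumes in a saved log follows. find_cpu_up_failures is a hypothetical helper name, and the patterns are taken only from the messages quoted in this thread, so other kernel versions may word them differently.

```shell
#!/bin/sh
# Print CPU bring-up failure lines from a kernel log.
# Reads the files given as arguments, or stdin when none are given.
find_cpu_up_failures() {
    grep -E 'do_boot_cpu failed|Error taking CPU[0-9]+ up' "$@"
}
```

Typical use would be "dmesg | find_cpu_up_failures". The function returns non-zero when no failure line is found, which makes it usable in a periodic check.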

Is there any test I could do on the CPU wakeup while in that state?

Woody


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-11 23:48         ` Woody Suwalski
@ 2017-02-12 15:43           ` Woody Suwalski
  2017-02-12 19:57             ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: Woody Suwalski @ 2017-02-12 15:43 UTC (permalink / raw)
  To: Pavel Machek, Rafael J. Wysocki; +Cc: kernel list, tglx, mingo, hpa

Woody Suwalski wrote:
> Pavel Machek wrote:
>> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>>> Hi!
>>>
>>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>>> Pavel Machek wrote:
>>>>> Hi!
>>>>>
>>>>>> I used to have two cpus, and Thinkpad X60 should have two cores, 
>>>>>> but I
>>>>>> only see one on 4.10-rc1. This machine went through many
>>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>>> Whoever did it, he seems to have returned the cpu in -rc3. All seems
>>>>> to be good now.
>>>> Actually since you have mentioned - I have checked my x60 - same 
>>>> problem -
>>>> only one CPU. However I was running 4.8.13 with uptime 33 days, 
>>>> multiple
>>>> sleep/wake-ups.
>>>> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the 
>>>> issue is
>>>> older then 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
>>>> related...
>>> Hmm. So I seen two cores in -rc3 after boot. But it is quite well
>>> possible that -rc1 was ok just after boot, too, and problem happened
>>> sometime later (probably during suspend/resume cycles). Let me go back
>>> to -rc1 to check.
>> Indeed in -rc1 I see both CPUs after boot. So we have hard to
>> reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...
>>
>>
>>
> Managed to duplicate - but it took again a long time - I have an 
> uptime of 29 days.
> It must have happened in the last day, as I kept checking as often as 
> I remembered.
>
> The kernel is 4.8.17 EOL, installed almost a month ago.
> Platform ThinkPad x60,  Intel(R) Core(TM) Duo CPU      T2400  @ 1.83GHz
>
> In dmesg I see that it used to be when 2 CPUs were OK:
> [690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
> [690409.476547] ACPI: Preparing to enter system sleep state S3
> [690409.780081] ACPI : EC: EC stopped
> [690409.780083] PM: Saving platform NVS memory
> [690409.780284] Disabling non-boot CPUs ...
> [690409.805284] smpboot: CPU 1 is now offline
> [690409.816464] ACPI: Low-level resume complete
> [690409.816464] ACPI : EC: EC started
> [690409.816464] PM: Restoring platform NVS memory
> [690409.816464] Enabling non-boot CPUs ...
> [690409.840574] x86: Booting SMP configuration:
> [690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [690409.805271] Initializing CPU#1
> [690409.805271] Disabled fast string operations
> [690409.888252]  cache: parent cpu1 should not be sleeping
> [690409.920185] CPU1 is up
> [690409.922288] ACPI: Waking up from system sleep state S3
>
> Then the CPU1 failed to start:
>
> [691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
> [691329.776550] ACPI: Preparing to enter system sleep state S3
> [691330.080081] ACPI : EC: EC stopped
> [691330.080083] PM: Saving platform NVS memory
> [691330.080284] Disabling non-boot CPUs ...
> [691330.105303] smpboot: CPU 1 is now offline
> [691330.116477] ACPI: Low-level resume complete
> [691330.116477] ACPI : EC: EC started
> [691330.116477] PM: Restoring platform NVS memory
> [691330.116477] Enabling non-boot CPUs ...
> [691330.140570] x86: Booting SMP configuration:
> [691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
> [691340.164445] Error taking CPU1 up: -5
> [691340.166309] ACPI: Waking up from system sleep state S3
>
> And now it is:
> [692517.868523] ACPI: Preparing to enter system sleep state S3
> [692518.172074] ACPI : EC: EC stopped
> [692518.172076] PM: Saving platform NVS memory
> [692518.172269] Disabling non-boot CPUs ...
> [692518.172269] ACPI: Low-level resume complete
> [692518.172269] ACPI : EC: EC started
> [692518.172269] PM: Restoring platform NVS memory
> [692518.172269] ACPI: Waking up from system sleep state S3
>
> Is there any test I could do on the CPU wakeup while in that state?
>
> Woody
>
Is there a way to kick the offline-CPU into operation from /sys level?


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-12 15:43           ` Woody Suwalski
@ 2017-02-12 19:57             ` Pavel Machek
  2017-02-13  1:35               ` Woody Suwalski
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-02-12 19:57 UTC (permalink / raw)
  To: Woody Suwalski; +Cc: Rafael J. Wysocki, kernel list, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 2816 bytes --]

Hi!

> >The kernel is 4.8.17 EOL, installed almost a month ago.
> >Platform ThinkPad x60,  Intel(R) Core(TM) Duo CPU      T2400  @ 1.83GHz
> >
> >In dmesg I see that it used to be when 2 CPUs were OK:
> >[690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
> >[690409.476547] ACPI: Preparing to enter system sleep state S3
> >[690409.780081] ACPI : EC: EC stopped
> >[690409.780083] PM: Saving platform NVS memory
> >[690409.780284] Disabling non-boot CPUs ...
> >[690409.805284] smpboot: CPU 1 is now offline
> >[690409.816464] ACPI: Low-level resume complete
> >[690409.816464] ACPI : EC: EC started
> >[690409.816464] PM: Restoring platform NVS memory
> >[690409.816464] Enabling non-boot CPUs ...
> >[690409.840574] x86: Booting SMP configuration:
> >[690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
> >[690409.805271] Initializing CPU#1
> >[690409.805271] Disabled fast string operations
> >[690409.888252]  cache: parent cpu1 should not be sleeping
> >[690409.920185] CPU1 is up
> >[690409.922288] ACPI: Waking up from system sleep state S3
> >
> >Then the CPU1 failed to start:
> >
> >[691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
> >[691329.776550] ACPI: Preparing to enter system sleep state S3
> >[691330.080081] ACPI : EC: EC stopped
> >[691330.080083] PM: Saving platform NVS memory
> >[691330.080284] Disabling non-boot CPUs ...
> >[691330.105303] smpboot: CPU 1 is now offline
> >[691330.116477] ACPI: Low-level resume complete
> >[691330.116477] ACPI : EC: EC started
> >[691330.116477] PM: Restoring platform NVS memory
> >[691330.116477] Enabling non-boot CPUs ...
> >[691330.140570] x86: Booting SMP configuration:
> >[691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
> >[691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
> >[691340.164445] Error taking CPU1 up: -5
> >[691340.166309] ACPI: Waking up from system sleep state S3
> >
> >And now it is:
> >[692517.868523] ACPI: Preparing to enter system sleep state S3
> >[692518.172074] ACPI : EC: EC stopped
> >[692518.172076] PM: Saving platform NVS memory
> >[692518.172269] Disabling non-boot CPUs ...
> >[692518.172269] ACPI: Low-level resume complete
> >[692518.172269] ACPI : EC: EC started
> >[692518.172269] PM: Restoring platform NVS memory
> >[692518.172269] ACPI: Waking up from system sleep state S3
> >
> >Is there any test I could do on the CPU wakeup while in that state?
> >
> Is there a way to kick the offline-CPU into operation from /sys level?

echo 0 > /sys/devices/system/cpu/cpu1/online

should work. And... good thinking :-).

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-12 19:57             ` Pavel Machek
@ 2017-02-13  1:35               ` Woody Suwalski
  2017-02-13  8:02                 ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: Woody Suwalski @ 2017-02-13  1:35 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Rafael J. Wysocki, kernel list, tglx, mingo, hpa

Pavel Machek wrote:
> Hi!
>
>>> The kernel is 4.8.17 EOL, installed almost a month ago.
>>> Platform ThinkPad x60,  Intel(R) Core(TM) Duo CPU      T2400  @ 1.83GHz
>>>
>>> In dmesg I see that it used to be when 2 CPUs were OK:
>>> [690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
>>> [690409.476547] ACPI: Preparing to enter system sleep state S3
>>> [690409.780081] ACPI : EC: EC stopped
>>> [690409.780083] PM: Saving platform NVS memory
>>> [690409.780284] Disabling non-boot CPUs ...
>>> [690409.805284] smpboot: CPU 1 is now offline
>>> [690409.816464] ACPI: Low-level resume complete
>>> [690409.816464] ACPI : EC: EC started
>>> [690409.816464] PM: Restoring platform NVS memory
>>> [690409.816464] Enabling non-boot CPUs ...
>>> [690409.840574] x86: Booting SMP configuration:
>>> [690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
>>> [690409.805271] Initializing CPU#1
>>> [690409.805271] Disabled fast string operations
>>> [690409.888252]  cache: parent cpu1 should not be sleeping
>>> [690409.920185] CPU1 is up
>>> [690409.922288] ACPI: Waking up from system sleep state S3
>>>
>>> Then the CPU1 failed to start:
>>>
>>> [691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
>>> [691329.776550] ACPI: Preparing to enter system sleep state S3
>>> [691330.080081] ACPI : EC: EC stopped
>>> [691330.080083] PM: Saving platform NVS memory
>>> [691330.080284] Disabling non-boot CPUs ...
>>> [691330.105303] smpboot: CPU 1 is now offline
>>> [691330.116477] ACPI: Low-level resume complete
>>> [691330.116477] ACPI : EC: EC started
>>> [691330.116477] PM: Restoring platform NVS memory
>>> [691330.116477] Enabling non-boot CPUs ...
>>> [691330.140570] x86: Booting SMP configuration:
>>> [691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
>>> [691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
>>> [691340.164445] Error taking CPU1 up: -5
>>> [691340.166309] ACPI: Waking up from system sleep state S3
>>>
>>> And now it is:
>>> [692517.868523] ACPI: Preparing to enter system sleep state S3
>>> [692518.172074] ACPI : EC: EC stopped
>>> [692518.172076] PM: Saving platform NVS memory
>>> [692518.172269] Disabling non-boot CPUs ...
>>> [692518.172269] ACPI: Low-level resume complete
>>> [692518.172269] ACPI : EC: EC started
>>> [692518.172269] PM: Restoring platform NVS memory
>>> [692518.172269] ACPI: Waking up from system sleep state S3
>>>
>>> Is there any test I could do on the CPU wakeup while in that state?
>>>
>> Is there a way to kick the offline-CPU into operation from /sys level?
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> should work. And... good thinking :-).
>
> 									Pavel
Did not work,
     echo 0 > /sys/devices/system/cpu/cpu1/online
     -su: echo: write error: Device or resource busy

However
     echo 1 > /sys/devices/system/cpu/cpu1/online
did not return an error (but still only CPU0 is seen)
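
Since a write to "online" can succeed without the CPU coming back, as reported above, it helps to confirm the result against the global online list rather than trusting the exit status of echo. The following is only a sketch: cpu_in_list and kick_cpu are hypothetical names, and the second argument to kick_cpu is an assumption added so the function can be tried against a scratch directory. In this sysfs interface, writing 1 requests bring-up and writing 0 requests offline.

```shell
#!/bin/sh
# Succeed if CPU number $1 is covered by a kernel CPU list $2 ("0-1", "0,2-3").
cpu_in_list() {
    want=$1
    list=$2
    found=1
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        lo=${range%-*}      # "2-3" -> 2, "0" -> 0
        hi=${range#*-}      # "2-3" -> 3, "0" -> 0
        if [ "$want" -ge "$lo" ] && [ "$want" -le "$hi" ]; then
            found=0
        fi
    done
    IFS=$old_ifs
    return $found
}

# Ask for CPU $1 to be brought online, then verify it really appeared.
# $2 (optional, assumption for dry runs): sysfs CPU base directory.
kick_cpu() {
    cpu=$1
    base=${2:-/sys/devices/system/cpu}
    echo 1 > "$base/cpu$cpu/online" || return 1
    if cpu_in_list "$cpu" "$(cat "$base/online")"; then
        echo "cpu$cpu is online"
    else
        echo "write accepted, but cpu$cpu is missing from $base/online"
        return 3
    fi
}
```

On a machine in the broken state described here, "kick_cpu 1" (as root) would be expected to take the second branch, which separates "the write was rejected" from "the write was accepted but nothing happened".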

Interesting experiment: I hibernated and then woke up, and there is
still only CPU0. I was expecting that after the power cycle the hotplug
would bring CPU1 up...

Woody


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13  1:35               ` Woody Suwalski
@ 2017-02-13  8:02                 ` Pavel Machek
  2017-02-13  8:48                   ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-02-13  8:02 UTC (permalink / raw)
  To: Woody Suwalski; +Cc: Rafael J. Wysocki, kernel list, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 1813 bytes --]

Hi!

> >>>And now it is:
> >>>[692517.868523] ACPI: Preparing to enter system sleep state S3
> >>>[692518.172074] ACPI : EC: EC stopped
> >>>[692518.172076] PM: Saving platform NVS memory
> >>>[692518.172269] Disabling non-boot CPUs ...
> >>>[692518.172269] ACPI: Low-level resume complete
> >>>[692518.172269] ACPI : EC: EC started
> >>>[692518.172269] PM: Restoring platform NVS memory
> >>>[692518.172269] ACPI: Waking up from system sleep state S3
> >>>
> >>>Is there any test I could do on the CPU wakeup while in that state?
> >>>
> >>Is there a way to kick the offline-CPU into operation from /sys level?
> >echo 0 > /sys/devices/system/cpu/cpu1/online
> >
> >should work. And... good thinking :-).
> >									Pavel
> Did not work,
>     echo 0 > /sys/devices/system/cpu/cpu1/online
>     -su: echo: write error: Device or resource busy
> 
> However
>     echo 1 > /sys/devices/system/cpu/cpu1/online
> did not return an error ( but still only CPU0 seen)
> 
> Interesting experiment: I have hibernated and then woke up - and still only
> CPU0. I was expecting that after the power cycle the hotplug will bring CPU1
> up...

Which is interesting indeed. Hardware was powercycled, so what is
broken is probably some kernel state...

[Evil: /sys/devices/cpu/ -- some perf stuff.
       /sys/devices/system/cpu -- that's where real CPU stuff is
       hiding.]

cd /sys/devices/system/cpu/cpu1
while true; do echo 0 > online; echo 1 > online; done

...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
not die, but also good -- this is easier to reproduce than running 100
suspend cycles.]
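
For a reproducer that reports how far it got, the endless loop can be bounded and made to print each completed cycle. This is a sketch under assumptions: hotplug_stress is a made-up name and the base-directory argument is only there for dry runs. On affected machines the real thing may hang or crash the system as described above, so it is best run as root from a text console with nothing important open.

```shell
#!/bin/sh
# Toggle a CPU offline and online a fixed number of times.
# $1 = CPU number, $2 = number of cycles,
# $3 (optional, assumption for dry runs) = sysfs CPU base directory.
hotplug_stress() {
    cpu=$1
    cycles=$2
    base=${3:-/sys/devices/system/cpu}
    f="$base/cpu$cpu/online"
    i=1
    while [ "$i" -le "$cycles" ]; do
        # Stop at the first rejected write so the failing cycle is visible.
        echo 0 > "$f" || { echo "cycle $i: offline failed"; return 1; }
        echo 1 > "$f" || { echo "cycle $i: online failed"; return 1; }
        echo "cycle $i ok"
        i=$((i + 1))
    done
}
```

For example, "hotplug_stress 1 1000" would stop after 1000 cycles or at the first write error, and the last "cycle N ok" line left on the console gives a rough count of how many cycles the machine survived.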

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13  8:02                 ` Pavel Machek
@ 2017-02-13  8:48                   ` Thomas Gleixner
  2017-02-13  9:42                     ` Pavel Machek
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-13  8:48 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, 13 Feb 2017, Pavel Machek wrote:
> cd /sys/devices/system/cpu/cpu1
> while true; do echo 0 > online; echo 1 > online; done
> 
> ...crashes x60 with 4.10-rc in few minutes. [Which is bad -- it should
> not die, but also good -- this is easier to reproduce then running 100
> suspend cycles.]

Can you tell where it crashes?

Thanks,

	tglx


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13  8:48                   ` Thomas Gleixner
@ 2017-02-13  9:42                     ` Pavel Machek
  2017-02-13 10:18                       ` lkml
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Machek @ 2017-02-13  9:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

Hi!

On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, Pavel Machek wrote:
> > cd /sys/devices/system/cpu/cpu1
> > while true; do echo 0 > online; echo 1 > online; done
> > 
> > ...crashes x60 with 4.10-rc in few minutes. [Which is bad -- it should
> > not die, but also good -- this is easier to reproduce then running 100
> > suspend cycles.]
> 
> Can you tell where it crashes?

I did not expect a crash, so I was in X... I have a feeling that this
will be reproducible on a lot of hardware, but let me try.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13  9:42                     ` Pavel Machek
@ 2017-02-13 10:18                       ` lkml
  2017-02-13 11:25                         ` Thomas Gleixner
  2017-02-13 11:53                         ` Pavel Machek
  0 siblings, 2 replies; 17+ messages in thread
From: lkml @ 2017-02-13 10:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Thomas Gleixner, Woody Suwalski, Rafael J. Wysocki, kernel list,
	mingo, hpa

On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> Hi!
> 
> On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > cd /sys/devices/system/cpu/cpu1
> > > while true; do echo 0 > online; echo 1 > online; done
> > > 
> > > ...crashes x60 with 4.10-rc in few minutes. [Which is bad -- it should
> > > not die, but also good -- this is easier to reproduce then running 100
> > > suspend cycles.]
> > 
> > Can you tell where it crashes?
> 
> I did not expect a crash, so I was in X... I have a feeling that this
> will be reproducible on a lot of hardware, but let me try.

FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.

Caught a glimpse of something about an RCU stall timeout before the system shut
off.  Prior to that, during the loop execution, a bunch of systemd processes
were experiencing watchdog timeouts, and procps `top` would start but
never refresh, leaving the CPU column all "nan".

Regards,
Vito Caputo


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13 10:18                       ` lkml
@ 2017-02-13 11:25                         ` Thomas Gleixner
  2017-02-13 11:39                           ` lkml
  2017-02-13 11:40                           ` lkml
  2017-02-13 11:53                         ` Pavel Machek
  1 sibling, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2017-02-13 11:25 UTC (permalink / raw)
  To: lkml
  Cc: Pavel Machek, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, 13 Feb 2017, lkml@pengaru.com wrote:

> On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > cd /sys/devices/system/cpu/cpu1
> > > > while true; do echo 0 > online; echo 1 > online; done
> > > > 
> > > > ...crashes x60 with 4.10-rc in few minutes. [Which is bad -- it should
> > > > not die, but also good -- this is easier to reproduce then running 100
> > > > suspend cycles.]
> > > 
> > > Can you tell where it crashes?
> > 
> > I did not expect a crash, so I was in X... I have a feeling that this
> > will be reproducible on a lot of hardware, but let me try.
> 
> FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
> 
> Caught a glimpse of something about an RCU stall timeout before the system shut
> off.  Prior to that, during the loop execution, a bunch of systemd processes
> were experiencing watchdog timeouts, and procps `top` would start but
> never refresh, leaving the CPU column all "nan".

Does the machine use intel_idle by chance?

Thanks,

	tglx


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13 11:25                         ` Thomas Gleixner
@ 2017-02-13 11:39                           ` lkml
  2017-02-13 11:40                           ` lkml
  1 sibling, 0 replies; 17+ messages in thread
From: lkml @ 2017-02-13 11:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Pavel Machek, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, Feb 13, 2017 at 12:25:28PM +0100, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, lkml@pengaru.com wrote:
> 
> > On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > > Hi!
> > > 
> > > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > > cd /sys/devices/system/cpu/cpu1
> > > > > while true; do echo 0 > online; echo 1 > online; done
> > > > > 
> > > > > ...crashes x60 with 4.10-rc in few minutes. [Which is bad -- it should
> > > > > not die, but also good -- this is easier to reproduce then running 100
> > > > > suspend cycles.]
> > > > 
> > > > Can you tell where it crashes?
> > > 
> > > I did not expect a crash, so I was in X... I have a feeling that this
> > > will be reproducible on a lot of hardware, but let me try.
> > 
> > FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
> > 
> > Caught a glimpse of something about an RCU stall timeout before the system shut
> > off.  Prior to that, during the loop execution, a bunch of systemd processes
> > were experiencing watchdog timeouts, and procps `top` would start but
> > never refresh, leaving the CPU column all "nan".
> 
> Does the machine use intel_idle by chance?
> 

It's not enabled in my kernel; /proc/config.gz attached, FWIW.
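
For anyone who wants to answer the intel_idle question on their own
machine, here is a minimal sketch. It assumes the kernel exposes
/sys/devices/system/cpu/cpuidle/current_driver and /proc/config.gz. The
function names and the CPUIDLE_DIR override are hypothetical and exist
only for illustration.

```shell
# Print the active cpuidle driver, or "none" if the sysfs file is absent.
# CPUIDLE_DIR is a hypothetical override so this can be tried on a fake tree.
cpuidle_driver() {
    dir="${CPUIDLE_DIR:-/sys/devices/system/cpu/cpuidle}"
    if [ -r "$dir/current_driver" ]; then
        cat "$dir/current_driver"
    else
        echo "none"
    fi
}

# Read a kernel config on stdin and print "yes" if intel_idle is built in.
intel_idle_configured() {
    if grep -q '^CONFIG_INTEL_IDLE=y'; then
        echo yes
    else
        echo no
    fi
}
```

Typical use would be `cpuidle_driver` followed by
`zcat /proc/config.gz | intel_idle_configured`. The driver can be built
in yet not be the one in use, which is why both checks are shown.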

Regards,
Vito Caputo


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13 11:25                         ` Thomas Gleixner
  2017-02-13 11:39                           ` lkml
@ 2017-02-13 11:40                           ` lkml
  1 sibling, 0 replies; 17+ messages in thread
From: lkml @ 2017-02-13 11:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Pavel Machek, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, Feb 13, 2017 at 12:25:28PM +0100, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, lkml@pengaru.com wrote:
> 
> > On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > > Hi!
> > > 
> > > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > > cd /sys/devices/system/cpu/cpu1
> > > > > while true; do echo 0 > online; echo 1 > online; done
> > > > > 
> > > > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > > > not die, but also good -- this is easier to reproduce than running 100
> > > > > suspend cycles.]
> > > > 
> > > > Can you tell where it crashes?
> > > 
> > > I did not expect a crash, so I was in X... I have a feeling that this
> > > will be reproducible on a lot of hardware, but let me try.
> > 
> > FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
> > 
> > Caught a glimpse of something about an RCU stall timeout before the system shut
> > off.  Prior to that, during the loop execution, a bunch of systemd processes
> > were experiencing watchdog timeouts, and procps `top` would start but
> > never refresh, leaving the CPU column all "nan".
> 
> Does the machine use intel_idle by chance?
> 

Config actually attached this time!

Regards,
Vito Caputo

[-- Attachment #2: config.gz --]
[-- Type: application/octet-stream, Size: 24085 bytes --]


* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
  2017-02-13 10:18                       ` lkml
  2017-02-13 11:25                         ` Thomas Gleixner
@ 2017-02-13 11:53                         ` Pavel Machek
  1 sibling, 0 replies; 17+ messages in thread
From: Pavel Machek @ 2017-02-13 11:53 UTC (permalink / raw)
  To: lkml
  Cc: Thomas Gleixner, Woody Suwalski, Rafael J. Wysocki, kernel list,
	mingo, hpa

On Mon 2017-02-13 04:18:51, lkml@pengaru.com wrote:
> On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > cd /sys/devices/system/cpu/cpu1
> > > > while true; do echo 0 > online; echo 1 > online; done
> > > > 
> > > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > > not die, but also good -- this is easier to reproduce than running 100
> > > > suspend cycles.]
> > > 
> > > Can you tell where it crashes?
> > 
> > I did not expect a crash, so I was in X... I have a feeling that this
> > will be reproducible on a lot of hardware, but let me try.
> 
> FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
> 
> Caught a glimpse of something about an RCU stall timeout before the system shut
> off.  Prior to that, during the loop execution, a bunch of systemd processes
> were experiencing watchdog timeouts, and procps `top` would start but
> never refresh, leaving the CPU column all "nan".

Hmm. I was not able to reproduce the lockup when I ran the "while
true" loop over ssh. Some weird stuff happened when I suspended the
machine with that loop running, but... I guess that was expected.
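
A bounded variant of the hotplug loop quoted above may make runs easier
to compare, since it reports how many cycles completed before any
lockup. This is only a sketch: the function name hotplug_cycles and its
arguments are hypothetical, and it assumes the caller has permission to
write the given "online" file.

```shell
# Toggle a CPU "online" file a fixed number of times.
# $1: path to the online file, $2: number of offline/online cycles.
hotplug_cycles() {
    online_file="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo 0 > "$online_file" || return 1   # take the CPU offline
        echo 1 > "$online_file" || return 1   # bring it back online
        i=$((i + 1))
        echo "cycle $i done"
    done
}
```

As root, something like
`hotplug_cycles /sys/devices/system/cpu/cpu1/online 1000` would
approximate the original test. Over ssh, the last "cycle N done" line
received gives a rough idea of when the machine stopped responding.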

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Thread overview: 17+ messages
2017-01-08 22:17 4.10-rc1: thinkpad x60: who ate my cpu? Pavel Machek
2017-01-09  9:30 ` Pavel Machek
2017-01-13  1:19   ` Woody Suwalski
2017-01-14 11:30     ` Pavel Machek
2017-01-15  9:56       ` Pavel Machek
2017-02-11 23:48         ` Woody Suwalski
2017-02-12 15:43           ` Woody Suwalski
2017-02-12 19:57             ` Pavel Machek
2017-02-13  1:35               ` Woody Suwalski
2017-02-13  8:02                 ` Pavel Machek
2017-02-13  8:48                   ` Thomas Gleixner
2017-02-13  9:42                     ` Pavel Machek
2017-02-13 10:18                       ` lkml
2017-02-13 11:25                         ` Thomas Gleixner
2017-02-13 11:39                           ` lkml
2017-02-13 11:40                           ` lkml
2017-02-13 11:53                         ` Pavel Machek
