* 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-01-08 22:17 UTC
To: kernel list, tglx, mingo, hpa

Hi!

I used to have two cpus, and Thinkpad X60 should have two cores, but I
only see one on 4.10-rc1. This machine went through many
suspend/resume cycles. When backups finish, I'll try -rc2.

pavel@duo:~$ uname -a
Linux duo 4.10.0-rc1+ #304 SMP Mon Dec 26 10:33:24 CET 2016 i686 GNU/Linux
pavel@duo:~$ zcat /proc/config.gz | grep SMP
CONFIG_X86_32_SMP=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_SMP=y
# CONFIG_X86_BIGSMP is not set
CONFIG_PM_SLEEP_SMP=y
pavel@duo:~$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 14
model name	: Genuine Intel(R) CPU T2400 @ 1.83GHz
stepping	: 8
cpu MHz		: 1833.000
cache size	: 2048 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm dtherm
bugs		:
bogomips	: 3657.62
clflush size	: 64
cache_alignment	: 64
address sizes	: 32 bits physical, 32 bits virtual
power management:

pavel@duo:~$

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
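The /proc/cpuinfo dump above lists a single processor entry. A quicker cross-check is to compare the kernel's "present" and "online" CPU lists in sysfs, which use a range format such as "0-1". The following is only a sketch: the helper name count_cpus is made up for illustration, and it assumes the usual comma-separated range format.

```shell
#!/bin/sh
# Hypothetical helper: count the CPUs in a kernel CPU-list string such as
# "0", "0-1" or "0-3,8-11". Runs in a subshell so the IFS change stays local.
count_cpus() (
    IFS=,
    total=0
    for part in $1; do
        case $part in
            *-*)
                lo=${part%-*}          # text before the dash
                hi=${part#*-}          # text after the dash
                total=$((total + hi - lo + 1))
                ;;
            ?*)
                total=$((total + 1))   # a single CPU number
                ;;
        esac
    done
    echo "$total"
)

# Usage on a live system:
#   count_cpus "$(cat /sys/devices/system/cpu/present)"   # CPUs the kernel knows about
#   count_cpus "$(cat /sys/devices/system/cpu/online)"    # CPUs currently running
```

On a machine in the state described in this message, the two counts would presumably differ.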
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-01-09 9:30 UTC
To: kernel list, tglx, mingo, hpa

Hi!

> I used to have two cpus, and Thinkpad X60 should have two cores, but I
> only see one on 4.10-rc1. This machine went through many
> suspend/resume cycles. When backups finish, I'll try -rc2.

Whoever did it, he seems to have returned the cpu in -rc3. All seems
to be good now.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Woody Suwalski @ 2017-01-13 1:19 UTC
To: Pavel Machek, kernel list, tglx, mingo, hpa

Pavel Machek wrote:
> Hi!
>
>> I used to have two cpus, and Thinkpad X60 should have two cores, but I
>> only see one on 4.10-rc1. This machine went through many
>> suspend/resume cycles. When backups finish, I'll try -rc2.
> Whoever did it, he seems to have returned the cpu in -rc3. All seems
> to be good now.
> Pavel
>

Actually since you have mentioned - I have checked my x60 - same problem -
only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
sleep/wake-ups.
Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
older than 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
related...

Woody
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-01-14 11:30 UTC
To: Woody Suwalski, Rafael J. Wysocki
Cc: kernel list, tglx, mingo, hpa

Hi!

On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
> Pavel Machek wrote:
> >Hi!
> >
> >>I used to have two cpus, and Thinkpad X60 should have two cores, but I
> >>only see one on 4.10-rc1. This machine went through many
> >>suspend/resume cycles. When backups finish, I'll try -rc2.
> >Whoever did it, he seems to have returned the cpu in -rc3. All seems
> >to be good now.

> Actually since you have mentioned - I have checked my x60 - same problem -
> only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
> sleep/wake-ups.
> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
> older than 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
> related...

Hmm. So I saw two cores in -rc3 after boot. But it is quite well
possible that -rc1 was ok just after boot, too, and problem happened
sometime later (probably during suspend/resume cycles). Let me go back
to -rc1 to check.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-01-15 9:56 UTC
To: Woody Suwalski, Rafael J. Wysocki
Cc: kernel list, tglx, mingo, hpa

On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
> Hi!
>
> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
> > Pavel Machek wrote:
> > >Hi!
> > >
> > >>I used to have two cpus, and Thinkpad X60 should have two cores, but I
> > >>only see one on 4.10-rc1. This machine went through many
> > >>suspend/resume cycles. When backups finish, I'll try -rc2.
> > >Whoever did it, he seems to have returned the cpu in -rc3. All seems
> > >to be good now.
>
> > Actually since you have mentioned - I have checked my x60 - same problem -
> > only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
> > sleep/wake-ups.
> > Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
> > older than 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
> > related...
>
> Hmm. So I saw two cores in -rc3 after boot. But it is quite well
> possible that -rc1 was ok just after boot, too, and problem happened
> sometime later (probably during suspend/resume cycles). Let me go back
> to -rc1 to check.

Indeed in -rc1 I see both CPUs after boot. So we have a hard to
reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Woody Suwalski @ 2017-02-11 23:48 UTC
To: Pavel Machek, Rafael J. Wysocki
Cc: kernel list, tglx, mingo, hpa

Pavel Machek wrote:
> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>> Hi!
>>
>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>> Pavel Machek wrote:
>>>> Hi!
>>>>
>>>>> I used to have two cpus, and Thinkpad X60 should have two cores, but I
>>>>> only see one on 4.10-rc1. This machine went through many
>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>> Whoever did it, he seems to have returned the cpu in -rc3. All seems
>>>> to be good now.
>>> Actually since you have mentioned - I have checked my x60 - same problem -
>>> only one CPU. However I was running 4.8.13 with uptime 33 days, multiple
>>> sleep/wake-ups.
>>> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the issue is
>>> older than 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
>>> related...
>> Hmm. So I saw two cores in -rc3 after boot. But it is quite well
>> possible that -rc1 was ok just after boot, too, and problem happened
>> sometime later (probably during suspend/resume cycles). Let me go back
>> to -rc1 to check.
> Indeed in -rc1 I see both CPUs after boot. So we have a hard to
> reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...
>
>
>

Managed to duplicate - but it took again a long time - I have an uptime of 29 days.
It must have happened in the last day, as I kept checking as often as I remembered.

The kernel is 4.8.17 EOL, installed almost a month ago.
Platform ThinkPad x60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz

In dmesg I see that it used to be when 2 CPUs were OK:
[690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
[690409.476547] ACPI: Preparing to enter system sleep state S3
[690409.780081] ACPI : EC: EC stopped
[690409.780083] PM: Saving platform NVS memory
[690409.780284] Disabling non-boot CPUs ...
[690409.805284] smpboot: CPU 1 is now offline
[690409.816464] ACPI: Low-level resume complete
[690409.816464] ACPI : EC: EC started
[690409.816464] PM: Restoring platform NVS memory
[690409.816464] Enabling non-boot CPUs ...
[690409.840574] x86: Booting SMP configuration:
[690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
[690409.805271] Initializing CPU#1
[690409.805271] Disabled fast string operations
[690409.888252] cache: parent cpu1 should not be sleeping
[690409.920185] CPU1 is up
[690409.922288] ACPI: Waking up from system sleep state S3

Then the CPU1 failed to start:

[691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
[691329.776550] ACPI: Preparing to enter system sleep state S3
[691330.080081] ACPI : EC: EC stopped
[691330.080083] PM: Saving platform NVS memory
[691330.080284] Disabling non-boot CPUs ...
[691330.105303] smpboot: CPU 1 is now offline
[691330.116477] ACPI: Low-level resume complete
[691330.116477] ACPI : EC: EC started
[691330.116477] PM: Restoring platform NVS memory
[691330.116477] Enabling non-boot CPUs ...
[691330.140570] x86: Booting SMP configuration:
[691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
[691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
[691340.164445] Error taking CPU1 up: -5
[691340.166309] ACPI: Waking up from system sleep state S3

And now it is:
[692517.868523] ACPI: Preparing to enter system sleep state S3
[692518.172074] ACPI : EC: EC stopped
[692518.172076] PM: Saving platform NVS memory
[692518.172269] Disabling non-boot CPUs ...
[692518.172269] ACPI: Low-level resume complete
[692518.172269] ACPI : EC: EC started
[692518.172269] PM: Restoring platform NVS memory
[692518.172269] ACPI: Waking up from system sleep state S3

Is there any test I could do on the CPU wakeup while in that state?

Woody
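For anyone trying to catch the moment this happens on a long-running machine, the two failure lines in the log above are easy to grep for. This is a minimal sketch, not a tested tool: the function name find_cpu_bringup_failures is made up here, and the patterns come only from the messages quoted in this thread, so other kernel versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print only the
# lines matching the CPU bring-up failures reported in this thread.
find_cpu_bringup_failures() {
    grep -E 'do_boot_cpu failed|Error taking CPU[0-9]+ up'
}

# Usage (may need root where dmesg access is restricted):
#   dmesg | find_cpu_bringup_failures
```

An empty result does not prove the CPU is online; the third log excerpt above shows that later resumes no longer even try to bring CPU1 up, so no new error is printed.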
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Woody Suwalski @ 2017-02-12 15:43 UTC
To: Pavel Machek, Rafael J. Wysocki
Cc: kernel list, tglx, mingo, hpa

Woody Suwalski wrote:
> Pavel Machek wrote:
>> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>>> Hi!
>>>
>>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>>> Pavel Machek wrote:
>>>>> Hi!
>>>>>
>>>>>> I used to have two cpus, and Thinkpad X60 should have two cores,
>>>>>> but I
>>>>>> only see one on 4.10-rc1. This machine went through many
>>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>>> Whoever did it, he seems to have returned the cpu in -rc3. All seems
>>>>> to be good now.
>>>> Actually since you have mentioned - I have checked my x60 - same
>>>> problem -
>>>> only one CPU. However I was running 4.8.13 with uptime 33 days,
>>>> multiple
>>>> sleep/wake-ups.
>>>> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the
>>>> issue is
>>>> older than 4.10 kernel, and I suspect it is the CPU hotplug / wakeup
>>>> related...
>>> Hmm. So I saw two cores in -rc3 after boot. But it is quite well
>>> possible that -rc1 was ok just after boot, too, and problem happened
>>> sometime later (probably during suspend/resume cycles). Let me go back
>>> to -rc1 to check.
>> Indeed in -rc1 I see both CPUs after boot. So we have a hard to
>> reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...
>>
>>
>>
> Managed to duplicate - but it took again a long time - I have an
> uptime of 29 days.
> It must have happened in the last day, as I kept checking as often as
> I remembered.
>
> The kernel is 4.8.17 EOL, installed almost a month ago.
> Platform ThinkPad x60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz
>
> In dmesg I see that it used to be when 2 CPUs were OK:
> [690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
> [690409.476547] ACPI: Preparing to enter system sleep state S3
> [690409.780081] ACPI : EC: EC stopped
> [690409.780083] PM: Saving platform NVS memory
> [690409.780284] Disabling non-boot CPUs ...
> [690409.805284] smpboot: CPU 1 is now offline
> [690409.816464] ACPI: Low-level resume complete
> [690409.816464] ACPI : EC: EC started
> [690409.816464] PM: Restoring platform NVS memory
> [690409.816464] Enabling non-boot CPUs ...
> [690409.840574] x86: Booting SMP configuration:
> [690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [690409.805271] Initializing CPU#1
> [690409.805271] Disabled fast string operations
> [690409.888252] cache: parent cpu1 should not be sleeping
> [690409.920185] CPU1 is up
> [690409.922288] ACPI: Waking up from system sleep state S3
>
> Then the CPU1 failed to start:
>
> [691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
> [691329.776550] ACPI: Preparing to enter system sleep state S3
> [691330.080081] ACPI : EC: EC stopped
> [691330.080083] PM: Saving platform NVS memory
> [691330.080284] Disabling non-boot CPUs ...
> [691330.105303] smpboot: CPU 1 is now offline
> [691330.116477] ACPI: Low-level resume complete
> [691330.116477] ACPI : EC: EC started
> [691330.116477] PM: Restoring platform NVS memory
> [691330.116477] Enabling non-boot CPUs ...
> [691330.140570] x86: Booting SMP configuration:
> [691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
> [691340.164445] Error taking CPU1 up: -5
> [691340.166309] ACPI: Waking up from system sleep state S3
>
> And now it is:
> [692517.868523] ACPI: Preparing to enter system sleep state S3
> [692518.172074] ACPI : EC: EC stopped
> [692518.172076] PM: Saving platform NVS memory
> [692518.172269] Disabling non-boot CPUs ...
> [692518.172269] ACPI: Low-level resume complete
> [692518.172269] ACPI : EC: EC started
> [692518.172269] PM: Restoring platform NVS memory
> [692518.172269] ACPI: Waking up from system sleep state S3
>
> Is there any test I could do on the CPU wakeup while in that state?
>
> Woody
>

Is there a way to kick the offline-CPU into operation from /sys level?
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-02-12 19:57 UTC
To: Woody Suwalski
Cc: Rafael J. Wysocki, kernel list, tglx, mingo, hpa

Hi!

> >The kernel is 4.8.17 EOL, installed almost a month ago.
> >Platform ThinkPad x60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz
> >
> >In dmesg I see that it used to be when 2 CPUs were OK:
> >[690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
> >[690409.476547] ACPI: Preparing to enter system sleep state S3
> >[690409.780081] ACPI : EC: EC stopped
> >[690409.780083] PM: Saving platform NVS memory
> >[690409.780284] Disabling non-boot CPUs ...
> >[690409.805284] smpboot: CPU 1 is now offline
> >[690409.816464] ACPI: Low-level resume complete
> >[690409.816464] ACPI : EC: EC started
> >[690409.816464] PM: Restoring platform NVS memory
> >[690409.816464] Enabling non-boot CPUs ...
> >[690409.840574] x86: Booting SMP configuration:
> >[690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
> >[690409.805271] Initializing CPU#1
> >[690409.805271] Disabled fast string operations
> >[690409.888252] cache: parent cpu1 should not be sleeping
> >[690409.920185] CPU1 is up
> >[690409.922288] ACPI: Waking up from system sleep state S3
> >
> >Then the CPU1 failed to start:
> >
> >[691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
> >[691329.776550] ACPI: Preparing to enter system sleep state S3
> >[691330.080081] ACPI : EC: EC stopped
> >[691330.080083] PM: Saving platform NVS memory
> >[691330.080284] Disabling non-boot CPUs ...
> >[691330.105303] smpboot: CPU 1 is now offline
> >[691330.116477] ACPI: Low-level resume complete
> >[691330.116477] ACPI : EC: EC started
> >[691330.116477] PM: Restoring platform NVS memory
> >[691330.116477] Enabling non-boot CPUs ...
> >[691330.140570] x86: Booting SMP configuration:
> >[691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
> >[691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
> >[691340.164445] Error taking CPU1 up: -5
> >[691340.166309] ACPI: Waking up from system sleep state S3
> >
> >And now it is:
> >[692517.868523] ACPI: Preparing to enter system sleep state S3
> >[692518.172074] ACPI : EC: EC stopped
> >[692518.172076] PM: Saving platform NVS memory
> >[692518.172269] Disabling non-boot CPUs ...
> >[692518.172269] ACPI: Low-level resume complete
> >[692518.172269] ACPI : EC: EC started
> >[692518.172269] PM: Restoring platform NVS memory
> >[692518.172269] ACPI: Waking up from system sleep state S3
> >
> >Is there any test I could do on the CPU wakeup while in that state?
> >
> Is there a way to kick the offline-CPU into operation from /sys level?

echo 0 > /sys/devices/system/cpu/cpu1/online

should work. And... good thinking :-).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Woody Suwalski @ 2017-02-13 1:35 UTC
To: Pavel Machek
Cc: Rafael J. Wysocki, kernel list, tglx, mingo, hpa

Pavel Machek wrote:
> Hi!
>
>>> The kernel is 4.8.17 EOL, installed almost a month ago.
>>> Platform ThinkPad x60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz
>>>
>>> In dmesg I see that it used to be when 2 CPUs were OK:
>>> [690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
>>> [690409.476547] ACPI: Preparing to enter system sleep state S3
>>> [690409.780081] ACPI : EC: EC stopped
>>> [690409.780083] PM: Saving platform NVS memory
>>> [690409.780284] Disabling non-boot CPUs ...
>>> [690409.805284] smpboot: CPU 1 is now offline
>>> [690409.816464] ACPI: Low-level resume complete
>>> [690409.816464] ACPI : EC: EC started
>>> [690409.816464] PM: Restoring platform NVS memory
>>> [690409.816464] Enabling non-boot CPUs ...
>>> [690409.840574] x86: Booting SMP configuration:
>>> [690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
>>> [690409.805271] Initializing CPU#1
>>> [690409.805271] Disabled fast string operations
>>> [690409.888252] cache: parent cpu1 should not be sleeping
>>> [690409.920185] CPU1 is up
>>> [690409.922288] ACPI: Waking up from system sleep state S3
>>>
>>> Then the CPU1 failed to start:
>>>
>>> [691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
>>> [691329.776550] ACPI: Preparing to enter system sleep state S3
>>> [691330.080081] ACPI : EC: EC stopped
>>> [691330.080083] PM: Saving platform NVS memory
>>> [691330.080284] Disabling non-boot CPUs ...
>>> [691330.105303] smpboot: CPU 1 is now offline
>>> [691330.116477] ACPI: Low-level resume complete
>>> [691330.116477] ACPI : EC: EC started
>>> [691330.116477] PM: Restoring platform NVS memory
>>> [691330.116477] Enabling non-boot CPUs ...
>>> [691330.140570] x86: Booting SMP configuration:
>>> [691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
>>> [691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
>>> [691340.164445] Error taking CPU1 up: -5
>>> [691340.166309] ACPI: Waking up from system sleep state S3
>>>
>>> And now it is:
>>> [692517.868523] ACPI: Preparing to enter system sleep state S3
>>> [692518.172074] ACPI : EC: EC stopped
>>> [692518.172076] PM: Saving platform NVS memory
>>> [692518.172269] Disabling non-boot CPUs ...
>>> [692518.172269] ACPI: Low-level resume complete
>>> [692518.172269] ACPI : EC: EC started
>>> [692518.172269] PM: Restoring platform NVS memory
>>> [692518.172269] ACPI: Waking up from system sleep state S3
>>>
>>> Is there any test I could do on the CPU wakeup while in that state?
>>>
>> Is there a way to kick the offline-CPU into operation from /sys level?
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> should work. And... good thinking :-).
>
> Pavel

Did not work,
echo 0 > /sys/devices/system/cpu/cpu1/online
-su: echo: write error: Device or resource busy

However
echo 1 > /sys/devices/system/cpu/cpu1/online
did not return an error (but still only CPU0 seen)

Interesting experiment: I have hibernated and then woke up - and still only
CPU0. I was expecting that after the power cycle the hotplug will bring CPU1
up...

Woody
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-02-13 8:02 UTC
To: Woody Suwalski
Cc: Rafael J. Wysocki, kernel list, tglx, mingo, hpa

Hi!

> >>>And now it is:
> >>>[692517.868523] ACPI: Preparing to enter system sleep state S3
> >>>[692518.172074] ACPI : EC: EC stopped
> >>>[692518.172076] PM: Saving platform NVS memory
> >>>[692518.172269] Disabling non-boot CPUs ...
> >>>[692518.172269] ACPI: Low-level resume complete
> >>>[692518.172269] ACPI : EC: EC started
> >>>[692518.172269] PM: Restoring platform NVS memory
> >>>[692518.172269] ACPI: Waking up from system sleep state S3
> >>>
> >>>Is there any test I could do on the CPU wakeup while in that state?
> >>>
> >>Is there a way to kick the offline-CPU into operation from /sys level?
> >echo 0 > /sys/devices/system/cpu/cpu1/online
> >
> >should work. And... good thinking :-).
> > Pavel
> Did not work,
> echo 0 > /sys/devices/system/cpu/cpu1/online
> -su: echo: write error: Device or resource busy
>
> However
> echo 1 > /sys/devices/system/cpu/cpu1/online
> did not return an error (but still only CPU0 seen)
>
> Interesting experiment: I have hibernated and then woke up - and still only
> CPU0. I was expecting that after the power cycle the hotplug will bring CPU1
> up...

Which is interesting indeed. Hardware was powercycled, so what is
broken is probably some kernel state...

[Evil: /sys/devices/cpu/ -- some perf stuff. /sys/devices/system/cpu
-- that's where real CPU stuff is hiding.]

cd /sys/devices/system/cpu/cpu1
while true; do echo 0 > online; echo 1 > online; done

...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
not die, but also good -- this is easier to reproduce than running 100
suspend cycles.]

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
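The endless loop in this message gives no hint of how many cycles ran before the machine died. A bounded variant that stops at the first failure might make a crash easier to narrow down. This is only a sketch under assumptions: the function name hotplug_cycles and its arguments are invented here. Writing 0 to the "online" file requests an offline and writing 1 requests an online. Earlier in the thread a write of 1 returned no error while the CPU stayed down, so the sketch also reads the file back instead of trusting the write alone.

```shell
#!/bin/sh
# Hypothetical helper: toggle a CPU's "online" file a fixed number of times.
#   $1 = CPU sysfs directory, e.g. /sys/devices/system/cpu/cpu1
#   $2 = number of offline/online cycles
# Stops at the first failed write or failed read-back and reports the cycle.
hotplug_cycles() {
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        if ! echo 0 > "$dir/online"; then
            echo "cycle $i: offline request failed" >&2
            return 1
        fi
        if ! echo 1 > "$dir/online"; then
            echo "cycle $i: online request failed" >&2
            return 1
        fi
        # The write can succeed while the CPU stays down, so check the state.
        if [ "$(cat "$dir/online")" != "1" ]; then
            echo "cycle $i: CPU did not come back online" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count cycles"
}

# Usage (as root, on a machine you can afford to crash):
#   hotplug_cycles /sys/devices/system/cpu/cpu1 1000
```

Running it from a text console or over ssh, with the output logged to a file or another host, would keep the last completed cycle visible if the machine locks up.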
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Thomas Gleixner @ 2017-02-13 8:48 UTC
To: Pavel Machek
Cc: Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, 13 Feb 2017, Pavel Machek wrote:
> cd /sys/devices/system/cpu/cpu1
> while true; do echo 0 > online; echo 1 > online; done
>
> ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> not die, but also good -- this is easier to reproduce than running 100
> suspend cycles.]

Can you tell where it crashes?

Thanks,

	tglx
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-02-13 9:42 UTC
To: Thomas Gleixner
Cc: Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

Hi!

On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, Pavel Machek wrote:
> > cd /sys/devices/system/cpu/cpu1
> > while true; do echo 0 > online; echo 1 > online; done
> >
> > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > not die, but also good -- this is easier to reproduce than running 100
> > suspend cycles.]
>
> Can you tell where it crashes?

I did not expect a crash, so I was in X... I have a feeling that this
will be reproducible on a lot of hardware, but let me try.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: lkml @ 2017-02-13 10:18 UTC
To: Pavel Machek
Cc: Thomas Gleixner, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> Hi!
>
> On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > cd /sys/devices/system/cpu/cpu1
> > > while true; do echo 0 > online; echo 1 > online; done
> > >
> > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > not die, but also good -- this is easier to reproduce than running 100
> > > suspend cycles.]
> >
> > Can you tell where it crashes?
>
> I did not expect a crash, so I was in X... I have a feeling that this
> will be reproducible on a lot of hardware, but let me try.

FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.

Caught a glimpse of something about an RCU stall timeout before the system shut
off. Prior to that, during the loop execution, a bunch of systemd processes
were experiencing watchdog timeouts, and procps `top` would start but
never refresh, leaving the CPU column all "nan".

Regards,
Vito Caputo
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Thomas Gleixner @ 2017-02-13 11:25 UTC
To: lkml
Cc: Pavel Machek, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, 13 Feb 2017, lkml@pengaru.com wrote:

> On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > cd /sys/devices/system/cpu/cpu1
> > > > while true; do echo 0 > online; echo 1 > online; done
> > > >
> > > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > > not die, but also good -- this is easier to reproduce than running 100
> > > > suspend cycles.]
> > >
> > > Can you tell where it crashes?
> >
> > I did not expect a crash, so I was in X... I have a feeling that this
> > will be reproducible on a lot of hardware, but let me try.
>
> FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
>
> Caught a glimpse of something about an RCU stall timeout before the system shut
> off. Prior to that, during the loop execution, a bunch of systemd processes
> were experiencing watchdog timeouts, and procps `top` would start but
> never refresh, leaving the CPU column all "nan".

Does the machine use intel_idle by chance?

Thanks,

	tglx
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: lkml @ 2017-02-13 11:39 UTC
To: Thomas Gleixner
Cc: Pavel Machek, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, Feb 13, 2017 at 12:25:28PM +0100, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, lkml@pengaru.com wrote:
>
> > On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > > Hi!
> > >
> > > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > > cd /sys/devices/system/cpu/cpu1
> > > > > while true; do echo 0 > online; echo 1 > online; done
> > > > >
> > > > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > > > not die, but also good -- this is easier to reproduce than running 100
> > > > > suspend cycles.]
> > > >
> > > > Can you tell where it crashes?
> > >
> > > I did not expect a crash, so I was in X... I have a feeling that this
> > > will be reproducible on a lot of hardware, but let me try.
> >
> > FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
> >
> > Caught a glimpse of something about an RCU stall timeout before the system shut
> > off. Prior to that, during the loop execution, a bunch of systemd processes
> > were experiencing watchdog timeouts, and procps `top` would start but
> > never refresh, leaving the CPU column all "nan".
>
> Does the machine use intel_idle by chance?
>

It's not enabled in my kernel, /proc/config.gz attached FWIW.

Regards,
Vito Caputo
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: lkml @ 2017-02-13 11:40 UTC
To: Thomas Gleixner
Cc: Pavel Machek, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon, Feb 13, 2017 at 12:25:28PM +0100, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, lkml@pengaru.com wrote:
>
> > On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > > Hi!
> > >
> > > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > > cd /sys/devices/system/cpu/cpu1
> > > > > while true; do echo 0 > online; echo 1 > online; done
> > > > >
> > > > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > > > not die, but also good -- this is easier to reproduce than running 100
> > > > > suspend cycles.]
> > > >
> > > > Can you tell where it crashes?
> > >
> > > I did not expect a crash, so I was in X... I have a feeling that this
> > > will be reproducible on a lot of hardware, but let me try.
> >
> > FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
> >
> > Caught a glimpse of something about an RCU stall timeout before the system shut
> > off. Prior to that, during the loop execution, a bunch of systemd processes
> > were experiencing watchdog timeouts, and procps `top` would start but
> > never refresh, leaving the CPU column all "nan".
>
> Does the machine use intel_idle by chance?
>

Config actually attached this time!

Regards,
Vito Caputo

[-- Attachment #2: config.gz --]
[-- Type: application/octet-stream, Size: 24085 bytes --]
* Re: 4.10-rc1: thinkpad x60: who ate my cpu?
From: Pavel Machek @ 2017-02-13 11:53 UTC
To: lkml
Cc: Thomas Gleixner, Woody Suwalski, Rafael J. Wysocki, kernel list, mingo, hpa

On Mon 2017-02-13 04:18:51, lkml@pengaru.com wrote:
> On Mon, Feb 13, 2017 at 10:42:36AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > On Mon 2017-02-13 09:48:41, Thomas Gleixner wrote:
> > > On Mon, 13 Feb 2017, Pavel Machek wrote:
> > > > cd /sys/devices/system/cpu/cpu1
> > > > while true; do echo 0 > online; echo 1 > online; done
> > > >
> > > > ...crashes x60 with 4.10-rc in a few minutes. [Which is bad -- it should
> > > > not die, but also good -- this is easier to reproduce than running 100
> > > > suspend cycles.]
> > >
> > > Can you tell where it crashes?
> >
> > I did not expect a crash, so I was in X... I have a feeling that this
> > will be reproducible on a lot of hardware, but let me try.
>
> FYI: Lockup reproduced with 4.10.0-rc7 with an X61s.
>
> Caught a glimpse of something about an RCU stall timeout before the system shut
> off. Prior to that, during the loop execution, a bunch of systemd processes
> were experiencing watchdog timeouts, and procps `top` would start but
> never refresh, leaving the CPU column all "nan".

Hmm. I was not able to reproduce the lockup when I was running "while
true" over ssh. Some weird stuff happened when I suspended the machine with
that loop running, but... I guess that was expected.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html