From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751290AbdBLPn5 (ORCPT ); Sun, 12 Feb 2017 10:43:57 -0500
Received: from mail-io0-f195.google.com ([209.85.223.195]:35243 "EHLO
	mail-io0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751200AbdBLPn4 (ORCPT ); Sun, 12 Feb 2017 10:43:56 -0500
Subject: Re: 4.10-rc1: thinkpad x60: who ate my cpu?
To: Pavel Machek , "Rafael J. Wysocki"
Cc: kernel list , tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com
References: <20170108221721.GB4878@amd> <20170109093001.GA30709@amd>
	<41553b16-c527-d99b-b56b-31d6a08a7e8a@gmail.com>
	<20170114113054.GA22012@amd> <20170115095656.GA16524@amd>
	<1614c21c-3626-074e-e3c3-26e9cd200454@gmail.com>
From: Woody Suwalski
Message-ID: <3c3d35ac-e4e4-6a6c-a78e-b0478ff39726@gmail.com>
Date: Sun, 12 Feb 2017 10:43:56 -0500
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:49.0) Gecko/20100101 Firefox/49.0 SeaMonkey/2.46
MIME-Version: 1.0
In-Reply-To: <1614c21c-3626-074e-e3c3-26e9cd200454@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Woody Suwalski wrote:
> Pavel Machek wrote:
>> On Sat 2017-01-14 12:30:54, Pavel Machek wrote:
>>> Hi!
>>>
>>> On Thu 2017-01-12 20:19:31, Woody Suwalski wrote:
>>>> Pavel Machek wrote:
>>>>> Hi!
>>>>>
>>>>>> I used to have two cpus, and Thinkpad X60 should have two cores,
>>>>>> but I only see one on 4.10-rc1. This machine went through many
>>>>>> suspend/resume cycles. When backups finish, I'll try -rc2.
>>>>> Whoever did it, he seems to have returned the cpu in -rc3. All seems
>>>>> to be good now.
>>>> Actually since you have mentioned - I have checked my x60 - same
>>>> problem - only one CPU. However I was running 4.8.13 with uptime
>>>> 33 days, multiple sleep/wake-ups.
>>>> Installed a current EOL 4.8.17 and rebooted - I see 2 CPUs. So the
>>>> issue is older than the 4.10 kernel, and I suspect it is CPU
>>>> hotplug / wakeup related...
>>> Hmm. So I saw two cores in -rc3 after boot. But it is quite well
>>> possible that -rc1 was ok just after boot, too, and problem happened
>>> sometime later (probably during suspend/resume cycles). Let me go back
>>> to -rc1 to check.
>> Indeed in -rc1 I see both CPUs after boot. So we have a hard to
>> reproduce case where 4.8 to 4.10 kernels lose one of the cpu cores...
>>
> Managed to duplicate - but it again took a long time - I have an
> uptime of 29 days.
> It must have happened in the last day, as I kept checking as often as
> I remembered.
>
> The kernel is 4.8.17 EOL, installed almost a month ago.
> Platform ThinkPad x60, Intel(R) Core(TM) Duo CPU T2400 @ 1.83GHz
>
> In dmesg I see what a suspend/resume looked like while both CPUs were OK:
> [690409.476107] PM: noirq suspend of devices complete after 79.914 msecs
> [690409.476547] ACPI: Preparing to enter system sleep state S3
> [690409.780081] ACPI : EC: EC stopped
> [690409.780083] PM: Saving platform NVS memory
> [690409.780284] Disabling non-boot CPUs ...
> [690409.805284] smpboot: CPU 1 is now offline
> [690409.816464] ACPI: Low-level resume complete
> [690409.816464] ACPI : EC: EC started
> [690409.816464] PM: Restoring platform NVS memory
> [690409.816464] Enabling non-boot CPUs ...
> [690409.840574] x86: Booting SMP configuration:
> [690409.840576] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [690409.805271] Initializing CPU#1
> [690409.805271] Disabled fast string operations
> [690409.888252] cache: parent cpu1 should not be sleeping
> [690409.920185] CPU1 is up
> [690409.922288] ACPI: Waking up from system sleep state S3
>
> Then CPU1 failed to start:
>
> [691329.776108] PM: noirq suspend of devices complete after 79.941 msecs
> [691329.776550] ACPI: Preparing to enter system sleep state S3
> [691330.080081] ACPI : EC: EC stopped
> [691330.080083] PM: Saving platform NVS memory
> [691330.080284] Disabling non-boot CPUs ...
> [691330.105303] smpboot: CPU 1 is now offline
> [691330.116477] ACPI: Low-level resume complete
> [691330.116477] ACPI : EC: EC started
> [691330.116477] PM: Restoring platform NVS memory
> [691330.116477] Enabling non-boot CPUs ...
> [691330.140570] x86: Booting SMP configuration:
> [691330.140572] smpboot: Booting Node 0 Processor 1 APIC 0x1
> [691340.140015] smpboot: do_boot_cpu failed(-1) to wakeup CPU#1
> [691340.164445] Error taking CPU1 up: -5
> [691340.166309] ACPI: Waking up from system sleep state S3
>
> And now a suspend/resume cycle looks like this:
> [692517.868523] ACPI: Preparing to enter system sleep state S3
> [692518.172074] ACPI : EC: EC stopped
> [692518.172076] PM: Saving platform NVS memory
> [692518.172269] Disabling non-boot CPUs ...
> [692518.172269] ACPI: Low-level resume complete
> [692518.172269] ACPI : EC: EC started
> [692518.172269] PM: Restoring platform NVS memory
> [692518.172269] ACPI: Waking up from system sleep state S3
>
> Is there any test I could do on the CPU wakeup while in that state?
>
> Woody
>
Is there a way to kick the offline CPU back into operation from the /sys level?
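
For reference, the generic CPU hotplug interface in sysfs is the usual place to try this. Below is a minimal sketch, assuming a kernel built with CONFIG_HOTPLUG_CPU, sysfs mounted at /sys, and root privileges. The helper names (parse_cpu_list, offline_cpus, try_online) are made up for illustration, and whether the write succeeds on a machine in the failed state described above is untested.

```python
#!/usr/bin/env python3
"""Sketch: find CPUs that are present but offline and try to online them.

Assumptions (not from the original mail): CONFIG_HOTPLUG_CPU is enabled,
sysfs is mounted at /sys, and the script runs as root. All helper names
here are hypothetical.
"""
import sys
from pathlib import Path

CPU_DIR = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-1,3' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means "no CPUs in this set"
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def offline_cpus(present_text, online_text):
    """Return CPUs listed as present but missing from the online list."""
    online = set(parse_cpu_list(online_text))
    return [cpu for cpu in parse_cpu_list(present_text) if cpu not in online]


def try_online(cpu):
    """Write '1' to cpuN/online; return True if the CPU reports online after."""
    path = CPU_DIR / f"cpu{cpu}" / "online"
    try:
        path.write_text("1")
    except OSError as err:
        # A failed bring-up is reported back as an error on the write.
        print(f"cpu{cpu}: bring-up failed: {err}", file=sys.stderr)
        return False
    return path.read_text().strip() == "1"


def main():
    present = (CPU_DIR / "present").read_text()
    online = (CPU_DIR / "online").read_text()
    missing = offline_cpus(present, online)
    if not missing:
        print("all present CPUs are online")
        return 0
    results = [try_online(cpu) for cpu in missing]
    return 0 if all(results) else 1


if __name__ == "__main__":
    sys.exit(main())
```

The shell equivalent for this machine would be writing 1 to /sys/devices/system/cpu/cpu1/online as root. If the kernel fails to wake the core again, the write should return an error rather than succeed silently (the -5 in the dmesg output above is -EIO), which at least gives a way to retry the bring-up without another suspend/resume cycle.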