From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752111AbdLFOFW (ORCPT <rfc822;w@1wt.eu>);
        Wed, 6 Dec 2017 09:05:22 -0500
Received: from cloudserver094114.home.net.pl ([79.96.170.134]:51000 "EHLO
        cloudserver094114.home.net.pl" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751598AbdLFOFV (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 6 Dec 2017 09:05:21 -0500
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Andy Lutomirski <luto@kernel.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        the arch/x86 maintainers <x86@kernel.org>
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3
Date: Wed, 06 Dec 2017 15:04:43 +0100
Message-ID: <6368800.kWPUrNViPU@aspire.rjw.lan>
In-Reply-To: <alpine.DEB.2.20.1712061320090.1724@nanos>
References: <CA+55aFxPBszFBt91KRNBrsQdJ10b+6fh9ySNzSKSX7JOq4WRPw@mail.gmail.com> <20171206121452.GA6320@dhcp22.suse.cz> <alpine.DEB.2.20.1712061320090.1724@nanos>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wednesday, December 6, 2017 1:23:34 PM CET Thomas Gleixner wrote:
> On Wed, 6 Dec 2017, Michal Hocko wrote:
> > merging tip/x86/urgent on top of your tree fixed this problem for me,
> > but I am seeing something else
> > [  131.711412] ACPI: Preparing to enter system sleep state S3
> > [  131.755328] ACPI: EC: event blocked
> > [  131.755328] ACPI: EC: EC stopped
> > [  131.755328] PM: Saving platform NVS memory
> > [  131.755344] Disabling non-boot CPUs ...
> > [  131.779330] IRQ 124: no longer affine to CPU1
> > [  131.780334] smpboot: CPU 1 is now offline
> > [  131.804465] smpboot: CPU 2 is now offline
> > [  131.827291] IRQ 122: no longer affine to CPU3
> > [  131.827292] IRQ 123: no longer affine to CPU3
> > [  131.828293] smpboot: CPU 3 is now offline
> > [  131.830991] ACPI: Low-level resume complete
> > [  131.831092] ACPI: EC: EC started
> > [  131.831093] PM: Restoring platform NVS memory
> > [  131.831864] do_IRQ: 0.55 No irq handler for vector
> 
> Hmm, that's really odd.
> 
> > [  131.831884] Enabling non-boot CPUs ...
> > [  131.831909] x86: Booting SMP configuration:
> > [  131.831910] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > [  131.832913]  cache: parent cpu1 should not be sleeping
> 
> This is an old one. 
> 
> > [  131.833058] CPU1 is up
> > [  131.833067] smpboot: Booting Node 0 Processor 2 APIC 0x1
> > [  131.833864]  cache: parent cpu2 should not be sleeping
> > [  131.833983] CPU2 is up
> > [  131.833995] smpboot: Booting Node 0 Processor 3 APIC 0x3
> > [  131.834776]  cache: parent cpu3 should not be sleeping
> > [  131.834923] CPU3 is up
> > 
> > "No irq handler" part looks a bit scary (maybe related to lost affinity
> > messages?) but the following messages look quite as well. Is this
> > something known? The system seems to be up and running without any
> > visible issues.
> 
> I assume it's due to the affinity break, just that we don't know right now
> on which CPU that do_IRQ() message triggered. I assume it's CPU0 because
> the others are offline already, but ....

This is resume from S3, so the firmware might do something odd to the other
CPUs, but if it didn't (which is quite likely, or we would have seen more of
these messages), they are offline and in mwait_play_dead(), so IMO it is safe
to assume that this was CPU0.

And this appears to have happened at arch_suspend_enable_irqs() time, which
on x86 is just local_irq_enable(), running on CPU0.

> I'll think about it how we can figure out what's going on.

It looks like an interrupt that triggered right after we enabled interrupts
on the boot CPU.

Thanks,
Rafael