From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753790AbdLMST1 (ORCPT ); Wed, 13 Dec 2017 13:19:27 -0500
Received: from Galois.linutronix.de ([146.0.238.70]:36103 "EHLO Galois.linutronix.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753144AbdLMSTZ
	(ORCPT ); Wed, 13 Dec 2017 13:19:25 -0500
Date: Wed, 13 Dec 2017 19:19:17 +0100 (CET)
From: Thomas Gleixner
To: Linus Torvalds
cc: Bjorn Helgaas, Maarten Lankhorst, Michal Hocko, "Rafael J. Wysocki",
	Andy Lutomirski, Linux Kernel Mailing List, the arch/x86 maintainers,
	Daniel Vetter, Bjorn Helgaas, "Rafael J. Wysocki",
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3
References: <168050887.sZlTFXWCmO@aspire.rjw.lan>
	<20171206121452.GA6320@dhcp22.suse.cz>
	<0f1d3d63-fa10-5cef-8014-81753dc60243@mblankhorst.nl>
	<57c8679e-1b88-c9ad-2299-2bea7560b28f@mblankhorst.nl>
	<20171213162336.GG53955@bhelgaas-glaptop.roam.corp.google.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 13 Dec 2017, Linus Torvalds wrote:
> On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner wrote:
> >
> > Definitely. That was fragile forever but puzzles me is that I can't figure
> > out what now causes that spurious interrupt to surface out of the blue.
>
> Perhaps just timing?

That's what I'm trying to figure out right now, because it is the only
sensible explanation left. The whole suspend machinery is exactly the same
with and without the vector management changes. I instrumented all the
functions involved and the picture is identical. I don't even see any
fundamental timing difference where one could say: that's it.
What puzzles me even more is that in the range of commits I'm fiddling with
there are no changes other than the vector management stuff, and the point
where it breaks makes no sense at all. The commit Maarten bisected it to
works nicely here, so this might just point to a very subtle timing issue.

> How hard would it be to change the ordering to just redirect irqs first?

The whole interrupt redirection happens when the non-boot CPUs are brought
down, which is the very last step before the actual suspend. We could
probably do it earlier, but ultimately that's a question Rafael needs to
answer.

Thanks,

	tglx
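For readers following the ordering question, here is a toy Python model of it. It is an illustration under stated assumptions, not kernel code: the phase names (suspend_devices, offline_nonboot_cpus, redirect_irqs_to_boot_cpu, enter_sleep_state) are hypothetical labels for a simplified S3 sequence. The only property modelled is the one described in the reply: interrupts currently move to the boot CPU as a side effect of bringing the non-boot CPUs down, right before the actual suspend. The variant adds an explicit redirection step up front, as Linus suggested.

```python
from typing import List

# Hypothetical phase labels for a simplified S3 suspend sequence.
# These are NOT kernel function names.
SUSPEND_DEVICES = "suspend_devices"
OFFLINE_NONBOOT_CPUS = "offline_nonboot_cpus"
REDIRECT_IRQS = "redirect_irqs_to_boot_cpu"
ENTER_SLEEP = "enter_sleep_state"

# Phases that move interrupts off the non-boot CPUs. Taking a CPU offline
# does this implicitly; REDIRECT_IRQS would do it as an explicit step.
REDIRECTING_PHASES = {OFFLINE_NONBOOT_CPUS, REDIRECT_IRQS}

# Ordering as described in the mail: redirection only happens while the
# non-boot CPUs are brought down, the last step before the actual suspend.
CURRENT_ORDER: List[str] = [SUSPEND_DEVICES, OFFLINE_NONBOOT_CPUS, ENTER_SLEEP]


def redirect_first(order: List[str]) -> List[str]:
    """Return a new ordering that redirects all interrupts up front."""
    return [REDIRECT_IRQS] + order


def first_redirection(order: List[str]) -> int:
    """Index of the first phase that moves interrupts to the boot CPU."""
    for i, phase in enumerate(order):
        if phase in REDIRECTING_PHASES:
            return i
    raise ValueError("no phase redirects interrupts")


def phases_between_redirect_and_sleep(order: List[str]) -> int:
    """Number of phases that run after redirection and before sleep entry."""
    return order.index(ENTER_SLEEP) - first_redirection(order) - 1


if __name__ == "__main__":
    for name, order in (("current", CURRENT_ORDER),
                        ("redirect first", redirect_first(CURRENT_ORDER))):
        gap = phases_between_redirect_and_sleep(order)
        print(f"{name}: {' -> '.join(order)} "
              f"(phases between redirect and sleep: {gap})")
```

In the current ordering no phase separates the redirection from entering the sleep state, whereas redirecting first leaves device suspend and CPU offlining running after it. Whether such a reordering is safe is the part the reply leaves to Rafael.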