linux-kernel.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>,
	Michal Hocko <mhocko@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Andy Lutomirski <luto@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Daniel Vetter <daniel.vetter@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	linux-pci@vger.kernel.org, linux-pm@vger.kernel.org
Subject: Re: Linux 4.15-rc2: Regression in resume from ACPI S3
Date: Wed, 13 Dec 2017 10:23:37 -0600
Message-ID: <20171213162336.GG53955@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <alpine.DEB.2.20.1712131507160.1885@nanos>

[+cc linux-pci, linux-pm]

On Wed, Dec 13, 2017 at 04:57:56PM +0100, Thomas Gleixner wrote:
> So I was finally able to figure out what the hell is going on:
> 
> Suspend:
> 
>  - The device suspend code puts the graphics card into a power
>    state != PCI_D0.
> 
>  - Offline non boot CPUs
> 
>  - Break interrupt affinity. Allocate new vector on CPU 0, compose and
>    write MSI message which ends up in:
> 
>    __pci_write_msi_msg(entry, msg)
>    {
> 	if (dev->current_state != PCI_D0 || pci_dev_is_disconnected(dev)) {
> 	   /* Don't touch the hardware now */
> 	} else {
> 	   ....
> 	}
> 	entry->msg = *msg;
>    }
>  
>   So because the device is not in PCI_D0 the message is not written. It's
>   written in the device resume path.

I'm not a PM guru, but this ordering seems fragile.  If we offline
CPUs before re-targeting interrupts directed at those CPUs, aren't we
always going to be at risk of sending interrupts to an offline CPU?

Even if the device is now asleep and therefore should not generate an
interrupt, it seems like there's a window when the device returns to
PCI_D0 where it could generate an interrupt before we have a chance to
update the MSI message.
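
To make that window concrete, here is a toy model in C of the deferred
write described above. It is only a sketch under stated assumptions:
the types and helpers (toy_dev, toy_write_msi_msg, toy_power_up,
toy_restore_msi) are hypothetical stand-ins rather than kernel code,
and the CPU and vector numbers are arbitrary. It copies just the one
behaviour quoted above: the hardware write is skipped unless the device
is in D0, while the software copy is always updated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical stand-ins, not the kernel's structures. */
enum toy_power_state { TOY_D0, TOY_D3 };

struct toy_msi_msg {
	unsigned int cpu;     /* destination CPU encoded in the message */
	unsigned int vector;  /* interrupt vector encoded in the message */
};

struct toy_dev {
	enum toy_power_state state;
	struct toy_msi_msg hw_msg;      /* what the device's MSI registers hold */
	struct toy_msi_msg cached_msg;  /* software copy, like entry->msg */
};

/* Skip the hardware write unless the device is in D0; always cache. */
static void toy_write_msi_msg(struct toy_dev *dev, const struct toy_msi_msg *msg)
{
	if (dev->state == TOY_D0)
		dev->hw_msg = *msg;
	dev->cached_msg = *msg;
}

/* The device returns to D0; its MSI registers are not yet rewritten. */
static void toy_power_up(struct toy_dev *dev)
{
	dev->state = TOY_D0;
}

/* The resume path writes the cached message back to the hardware. */
static void toy_restore_msi(struct toy_dev *dev)
{
	dev->hw_msg = dev->cached_msg;
}

/* True while the hardware still holds a message the software has replaced. */
static bool toy_hw_is_stale(const struct toy_dev *dev)
{
	return dev->hw_msg.cpu != dev->cached_msg.cpu ||
	       dev->hw_msg.vector != dev->cached_msg.vector;
}

/* Walk through suspend and resume, checking the state at each step. */
void toy_demo(void)
{
	/* Arbitrary starting point: interrupt targeted at CPU 2, vector 40. */
	struct toy_dev dev = { TOY_D0, { 2, 40 }, { 2, 40 } };
	struct toy_msi_msg retarget = { 0, 41 };

	dev.state = TOY_D3;                  /* device suspend */
	toy_write_msi_msg(&dev, &retarget);  /* affinity broken during CPU offline */
	assert(toy_hw_is_stale(&dev));       /* hardware still points at CPU 2 */

	toy_power_up(&dev);                  /* device is back in D0 ... */
	assert(toy_hw_is_stale(&dev));       /* ... but can still signal the old target */

	toy_restore_msi(&dev);               /* resume path writes the new message */
	assert(!toy_hw_is_stale(&dev));
}
```

Calling toy_demo() from a main() runs all three checks. The second
assertion is the window I mean: in this model the device is in D0 again
but its message registers still name the old CPU and vector.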

> Resume:
> [  139.670446] ACPI: Low-level resume complete
> [  139.670541] PM: Restoring platform NVS memory
> [  139.672462] do_IRQ: 0.55 No irq handler for vector
> [  139.672475] Enabling non-boot CPUs ...
> 
> So the spurious interrupt happens early and way before the device resume
> code writes the new MSI message.
> 
> I checked the behaviour on 4.14. The MSI write is delayed there in the same
> way, but there is no spurious interrupt. There is no interrupt coming in at
> all _BEFORE_ the device is put out of PCI_D0.
> 
> And this has certainly nothing to do with the vector management changes,
> but I can't figure yet what makes that spurious interrupt to be sent.
> 
> Any ideas welcome.
> 
> Thanks,
> 
> 	tglx
> 
