linux-kernel.vger.kernel.org archive mirror
From: Evan Green <evgreen@chromium.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Rajat Jain <rajatja@google.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Marc Zyngier <maz@kernel.org>
Subject: Re: [PATCH V2] x86/apic/msi: Plug non-maskable MSI affinity race
Date: Fri, 31 Jan 2020 12:32:37 -0800
Message-ID: <CAE=gft4cGYL7jHLqcGCU9J_efHs5dd+QyP8NfW5iSZCoi-SVOg@mail.gmail.com>
In-Reply-To: <87imkr4s7n.fsf@nanos.tec.linutronix.de>

On Fri, Jan 31, 2020 at 6:27 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Thomas Gleixner <tglx@linutronix.de> writes:
>
> Evan tracked down a subtle race between the update of the MSI message and
> the device raising an interrupt internally on PCI devices which do not
> support MSI masking. The update of the MSI message is non-atomic and
> consists of either 2 or 3 sequential 32bit wide writes to the PCI config
> space.
>
>    - Write address low 32bits
>    - Write address high 32bits (If supported by device)
>    - Write data
>
> When an interrupt is migrated then both address and data might change, so
> the kernel attempts to mask the MSI interrupt first. But for MSI masking is
> optional, so there exist devices which do not provide it. That means that
> if the device raises an interrupt internally between the writes then an MSI
> message is sent which is built from half updated state.
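
To make the torn state concrete, here is a small userspace sketch (not
kernel code). The struct and function names are made up for illustration.
The only thing it assumes is the non-remapped x86 MSI layout, where address
bits 19:12 carry the destination APIC ID and data bits 7:0 carry the vector.
The optional high address word and the other data bits are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device's MSI message registers. */
struct msi_regs {
	uint32_t addr_lo;	/* bits 19:12: destination APIC ID */
	uint32_t data;		/* bits 7:0: vector */
};

struct msi_target {
	unsigned int cpu;
	unsigned int vector;
};

static uint32_t msi_addr(unsigned int apic_id)
{
	assert(apic_id < 256);
	return 0xfee00000u | ((uint32_t)apic_id << 12);
}

/* What the device would send if it raised an interrupt right now. */
static struct msi_target msi_sample(const struct msi_regs *r)
{
	struct msi_target t = {
		.cpu = (r->addr_lo >> 12) & 0xff,
		.vector = r->data & 0xff,
	};
	return t;
}

/*
 * Naive migration where both words change. Returns what the device
 * would send if it fired between the two config space writes.
 */
static struct msi_target torn_update(struct msi_regs *r,
				     unsigned int new_cpu,
				     unsigned int new_vector)
{
	struct msi_target seen;

	r->addr_lo = msi_addr(new_cpu);	/* write 1: address */
	seen = msi_sample(r);		/* device fires here */
	r->data = new_vector;		/* write 2: data */
	return seen;
}
```

Moving from (CPU 0, vector 0x30) to (CPU 2, vector 0x41), the sampled
message targets CPU 2 with the old vector 0x30, which is the wrong-vector
case described in the next paragraph.
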
>
> On x86 this can lead to spurious interrupts on the wrong interrupt
> vector when the affinity setting changes both address and data. As a
> consequence the device interrupt can be lost causing the device to
> become stuck or malfunctioning.
>
> Evan tried to handle that by disabling MSI across an MSI message
> update. That's not feasible because disabling MSI has issues on its own:
>
>  If MSI is disabled the PCI device is routing an interrupt to the legacy
>  INTx mechanism. The INTx delivery can be disabled, but the disablement is
>  not working on all devices.
>
>  Some devices lose interrupts when both MSI and INTx delivery are disabled.
>
> Another way to solve this would be to enforce the allocation of the same
> vector on all CPUs in the system for this kind of screwed devices. That
> could be done, but it would bring back the vector space exhaustion problems
> which got solved a few years ago.
>
> Fortunately the high address (if supported by the device) is only relevant
> when X2APIC is enabled which implies interrupt remapping. In the interrupt
> remapping case the affinity setting is happening at the interrupt remapping
> unit and the PCI MSI message is programmed only once when the PCI device is
> initialized.
>
> That makes it possible to solve it with a two step update:
>
>   1) Target the MSI msg to the new vector on the current target CPU
>
>   2) Target the MSI msg to the new vector on the new target CPU
>
> In both cases writing the MSI message is only changing a single 32bit word
> which prevents the issue of inconsistency.
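
A userspace sketch of why the two steps are safe, under the same
assumption about the non-remapped x86 MSI layout (destination APIC ID in
address bits 19:12, vector in data bits 7:0). All names here are
hypothetical, and this models only the register writes, not the actual
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a device's MSI message registers. */
struct msi_regs {
	uint32_t addr_lo;	/* bits 19:12: destination APIC ID */
	uint32_t data;		/* bits 7:0: vector */
};

struct msi_target {
	unsigned int cpu;
	unsigned int vector;
};

static uint32_t msi_addr(unsigned int apic_id)
{
	assert(apic_id < 256);
	return 0xfee00000u | ((uint32_t)apic_id << 12);
}

/* What the device would send if it raised an interrupt right now. */
static struct msi_target msi_sample(const struct msi_regs *r)
{
	struct msi_target t = {
		.cpu = (r->addr_lo >> 12) & 0xff,
		.vector = r->data & 0xff,
	};
	return t;
}

/*
 * Two step migration. seen[0] and seen[1] record what the device
 * would send after step 1 and after step 2 respectively.
 */
static void two_step_update(struct msi_regs *r, unsigned int new_cpu,
			    unsigned int new_vector,
			    struct msi_target seen[2])
{
	/* Step 1: new vector on the current CPU, only data changes. */
	r->data = new_vector;
	seen[0] = msi_sample(r);

	/* Step 2: new vector on the new CPU, only the address changes. */
	r->addr_lo = msi_addr(new_cpu);
	seen[1] = msi_sample(r);
}
```

Each write leaves the registers in a state that names one CPU and the new
vector, so the device can never send a message mixing the new address with
the old vector.
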
>
> After writing the final destination it is necessary to check whether the
> device issued an interrupt while the intermediate state #1 (new vector,
> current CPU) was in effect.
>
> This is possible because the affinity change is always happening on the
> current target CPU. The code runs with interrupts disabled, so the
> interrupt can be detected by checking the IRR of the local APIC. If the
> vector is pending in the IRR then the interrupt is retriggered on the new
> target CPU by sending an IPI for the associated vector on the target CPU.
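
The check and retrigger can be sketched in userspace like this. The
fake_lapic type and the helpers are hypothetical stand-ins. The real code
reads the local APIC IRR and sends a real IPI; here an "IPI" just latches
the vector bit in the target's simulated IRR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS	256

/* Hypothetical stand-in for a local APIC: the IRR is 256 bits wide. */
struct fake_lapic {
	uint32_t irr[NR_VECTORS / 32];
};

static bool irr_pending(const struct fake_lapic *apic, unsigned int vector)
{
	assert(vector < NR_VECTORS);
	return (apic->irr[vector / 32] & (1u << (vector % 32))) != 0;
}

/* Models an IPI by latching the vector in the target's IRR. */
static void send_ipi(struct fake_lapic *target, unsigned int vector)
{
	assert(vector < NR_VECTORS);
	target->irr[vector / 32] |= 1u << (vector % 32);
}

/*
 * Meant to model the current target CPU, with interrupts disabled,
 * after the final MSI write. Returns true if a retrigger was sent.
 */
static bool retrigger_if_pending(const struct fake_lapic *local,
				 struct fake_lapic *new_target,
				 unsigned int new_vector)
{
	if (!irr_pending(local, new_vector))
		return false;

	send_ipi(new_target, new_vector);
	return true;
}
```

Note that the check only sees that the vector is pending locally, not
which device raised it, which is where the spurious interrupt cases below
come from.
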
>
> This can cause spurious interrupts on both the local and the new target
> CPU.
>
>  1) If the new vector is not in use on the local CPU and the device
>     affected by the affinity change raised an interrupt during the
>     transitional state (step #1 above) then interrupt entry code will
>     ignore that spurious interrupt. The vector is marked so that the
>     'No irq handler for vector' warning is suppressed once.
>
>  2) If the new vector is in use already on the local CPU then the IRR check
>     might see a pending interrupt from the device which is using this
>     vector. The IPI to the new target CPU will then invoke the handler of
>     the device, which got the affinity change, even if that device did not
>     issue an interrupt.
>
>  3) If the new vector is in use already on the local CPU and the device
>     affected by the affinity change raised an interrupt during the
>     transitional state (step #1 above) then the handler of the device which
>     uses that vector on the local CPU will be invoked.
>
> #1 is uninteresting and has no unintended side effects. #2 and #3 might
> expose issues in device driver interrupt handlers which are not prepared to
> handle a spurious interrupt correctly. This is not a regression, it's just
> exposing something which was already broken as spurious interrupts can
> happen for a lot of reasons and all driver handlers need to be able to deal
> with them.
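
For reference, a handler that copes with a spurious invocation typically
looks at the device's own state before doing any work. A minimal userspace
sketch, with a made-up device and a local enum standing in for the
kernel's IRQ_NONE / IRQ_HANDLED return values:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's IRQ_NONE / IRQ_HANDLED. */
enum irq_result {
	IRQ_RESULT_NONE,
	IRQ_RESULT_HANDLED,
};

/* Hypothetical device state, for illustration only. */
struct fake_dev {
	uint32_t status;		/* pending interrupt cause bits */
	unsigned int completions;	/* work items processed so far */
};

static enum irq_result fake_dev_irq(struct fake_dev *dev)
{
	uint32_t cause;

	assert(dev);
	cause = dev->status;

	/* Nothing pending in the device: spurious, do not touch state. */
	if (!cause)
		return IRQ_RESULT_NONE;

	dev->status = 0;	/* acknowledge all causes */
	dev->completions++;
	return IRQ_RESULT_HANDLED;
}
```

A handler written this way is unaffected by cases #2 and #3: an extra
invocation finds no cause bits set and returns without side effects.
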
>
> Reported-by: Evan Green <evgreen@chromium.org>
> Debugged-by: Evan Green <evgreen@chromium.org>                                                                                        Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Heh, thanks for the credit. Something weird happened on this line with
your signoff, though.
I've been running this on my system for a few hours with no issues
(normal repro in <1 minute). So,

Tested-by: Evan Green <evgreen@chromium.org>
