linux-pci.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Evan Green <evgreen@chromium.org>
Cc: Rajat Jain <rajatja@google.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] PCI/MSI: Avoid torn updates to MSI pairs
Date: Fri, 24 Jan 2020 01:50:36 +0100
Message-ID: <87pnf91xur.fsf@nanos.tec.linutronix.de>
In-Reply-To: <CAE=gft5xta4XCJtctWe=R3w=kVr598JCbk9VSRue04nzKAk3CQ@mail.gmail.com>

Evan Green <evgreen@chromium.org> writes:
> On Thu, Jan 23, 2020 at 12:59 PM Evan Green <evgreen@chromium.org> wrote:
>>
>> On Thu, Jan 23, 2020 at 10:17 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>> >
>> > Evan,
>> >
>> > Thomas Gleixner <tglx@linutronix.de> writes:
>> > > This is not yet debugged fully and as this is happening on MSI-X I'm not
>> > > really convinced yet that your 'torn write' theory holds.

Since you pointed out that this is not on MSI-X, I now consider the torn
write theory to be more likely. :)
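
For illustration only, a minimal user-space sketch of what a 'torn write'
to an MSI address/data pair could look like. Everything here is an
assumption made for the example: struct msi_regs, struct msi_msg,
device_sample() and update_with_interrupt_between_writes() are
hypothetical names and not kernel code. The sketch assumes the address
(which selects the destination CPU) is written before the data (which
carries the vector), and that the device cannot be masked in between.

```c
#include <stdint.h>

/* Hypothetical model of a non-maskable MSI capability: the message is
 * held in two registers that can only be written one at a time. */
struct msi_regs {
	uint32_t address;	/* selects the destination CPU */
	uint16_t data;		/* carries the vector */
};

/* The message a device would emit when it raises an interrupt. */
struct msi_msg {
	uint32_t address;
	uint16_t data;
};

/* Snapshot of what the device would send if it fired right now. */
static struct msi_msg device_sample(const struct msi_regs *regs)
{
	struct msi_msg msg = { regs->address, regs->data };

	return msg;
}

/*
 * Update the pair with two separate writes and return the message the
 * device would have sent had it raised an interrupt between them.
 */
static struct msi_msg
update_with_interrupt_between_writes(struct msi_regs *regs,
				     struct msi_msg new_msg)
{
	struct msi_msg seen;

	regs->address = new_msg.address;	/* first write: new CPU */
	seen = device_sample(regs);		/* device fires here */
	regs->data = new_msg.data;		/* second write: new vector */

	return seen;
}
```

Under these assumptions the sampled message pairs the new address with
the old data, i.e. the old vector is delivered to the new CPU, where
nothing may be set up to handle it.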

>> > can you please apply the debug patch below and run your test. When the
>> > failure happens, stop the tracer and collect the trace.
>> >
>> > Another question. Did you ever try to change the affinity of that
>> > interrupt without hotplug rapidly while the device makes traffic? If
>> > not, it would be interesting whether this leads to a failure as well.
>>
>> Thanks for the patch. Looks pretty familiar :)
>> I ran into issues where trace_printks on offlined cores seem to
>> disappear. I even made sure the cores were back online when I
>> collected the trace. So your logs might not be useful. Known issue
>> with the tracer?

No. I tried the patch myself to verify that it does what I want.

The only information I'm missing right now is the interrupt number to
look for. But I'll stare at it with brain awake tomorrow morning again.

>> I also tried changing the affinity rapidly without CPU hotplug, but
>> didn't see the issue, at least not in the few minutes I waited
>> (normally repros easily within 1 minute). An interesting datapoint.

That's what I expected. The main difference is that in this case the
vector modification happens at a point where the device is not supposed
to send an interrupt: it happens while the device's interrupt is being
serviced, before the driver handler is invoked, and at that point the
device should not send another one.

> One additional datapoint. The Intel guys suggested enabling
> CONFIG_IRQ_REMAP, which does seem to eliminate the issue for me. I'm
> still hoping there's a smaller fix so I don't have to add all that in.

Right, I wanted to ask you that as well and forgot. With interrupt
remapping the migration happens at the remapping unit, which does not
have the horrible 'move it while servicing' requirement and which
supports proper masking.

Thanks,

        tglx

