From: Thomas Gleixner <tglx@linutronix.de>
To: Mathias Nyman <mathias.nyman@linux.intel.com>, x86@kernel.org
Cc: linux-pci <linux-pci@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Evan Green <evgreen@chromium.org>,
	"Ghorai, Sukumar" <sukumar.ghorai@intel.com>,
	"Amara, Madhusudanarao" <madhusudanarao.amara@intel.com>,
	"Nandamuri, Srikanth" <srikanth.nandamuri@intel.com>,
	x86@kernel.org
Subject: Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug
Date: Tue, 24 Mar 2020 01:24:05 +0100
Message-ID: <878sjqfvmi.fsf@nanos.tec.linutronix.de>
In-Reply-To: <f8057cbc-4814-5083-cddd-d4eb1459529f@linux.intel.com>

Mathias Nyman <mathias.nyman@linux.intel.com> writes:
> On 23.3.2020 16.10, Thomas Gleixner wrote:
>> 
>> thanks for providing the data. I think I decoded the issue. Can you
>> please test the patch below?
>
> Unfortunately it didn't help.

I did not expect that to help, simply because the same issue is caught
by the loop in fixup_irqs(). What I wanted to make sure is that there is
not something in between which causes the latter to fail.

So I stared at the trace data earlier today and looked at the xhci irq
events. They follow a more or less periodic schedule, and the forced
migration on CPU hotplug definitely hits the time frame in which the
next interrupt should be raised by the device.

1) First of all, I do not have to understand why new systems released
   in 2020 still use non-maskable MSI, which is the root cause of all of
   this trouble, especially in Intel systems, which are known to have
   these disastrous interrupt migration troubles.

   Please tell your hardware people to stop this.

2) I have no idea why the two-step mechanism fails on exactly this
   system. I tried the same test case on a Skylake client and I can
   clearly see from the traces that the interrupt raised in the device
   falls exactly into the two-step update and causes the IRR to be set,
   which resolves the situation by IPI'ing the new target CPU.

   I have not found a single instance of IPI recovery in your
   traces. Instead, your system stops working in exactly this
   situation.

   The two-step mechanism tries to work around the fact that PCI does
   not support a 64-bit atomic config space update. So we carefully
   avoid changing more than one 32-bit value at a time, i.e. we first
   change the vector and then the destination ID (part of address_lo).
   This ensures that the message is consistent all the time.

   But obviously on your system this does not work as expected. Why? I
   really can't tell.

   Please talk to your hardware folks.
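
The two-step update described in 2) can be illustrated with a toy
model. This is a minimal sketch in Python, not kernel code: MsiRegs and
two_step_update are hypothetical names, the CPU numbers and vectors are
arbitrary, and the only layout assumed is the usual x86 MSI format
(destination ID in address_lo bits 19:12, vector in data bits 7:0).

```python
# Toy model of the two-step MSI update; not kernel code.
MSI_ADDR_BASE = 0xFEE00000   # x86 MSI address window
DEST_SHIFT = 12              # destination ID sits in address_lo bits 19:12
DEST_MASK = 0xFF << DEST_SHIFT
VECTOR_MASK = 0xFF           # vector sits in data bits 7:0


class MsiRegs:
    """Hypothetical stand-in for two 32-bit MSI registers in config space."""

    def __init__(self, dest, vector):
        self.address_lo = MSI_ADDR_BASE | (dest << DEST_SHIFT)
        self.data = vector

    def target(self):
        """Decode (destination ID, vector) as the device would use it now."""
        dest = (self.address_lo & DEST_MASK) >> DEST_SHIFT
        return (dest, self.data & VECTOR_MASK)


def two_step_update(regs, new_dest, new_vector):
    """Yield the message target after each single 32-bit write."""
    # Step 1: change only the vector; the destination is still the old CPU.
    regs.data = (regs.data & ~VECTOR_MASK) | new_vector
    yield regs.target()
    # Step 2: change only the destination ID inside address_lo.
    regs.address_lo = (regs.address_lo & ~DEST_MASK) | (new_dest << DEST_SHIFT)
    yield regs.target()


regs = MsiRegs(dest=1, vector=0x31)
states = list(two_step_update(regs, new_dest=3, new_vector=0x42))
print(states)
```

The first entry in states is the interim message (old destination, new
vector). An interrupt raised by the device in that window lands on the
old CPU, which is the case the IRR check and the IPI to the new target
are meant to recover.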

And of course all of this is so well documented that all of us can
clearly figure out what's going on...

Thanks,

        tglx