linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
       [not found] <20211210161025.3287927-1-sr@denx.de>
@ 2021-12-14 12:28 ` tip-bot2 for Stefan Roese
  2022-03-14 16:36   ` Jeremi Piotrowski
  0 siblings, 1 reply; 12+ messages in thread
From: tip-bot2 for Stefan Roese @ 2021-12-14 12:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Stefan Roese, Thomas Gleixner, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, stable, x86, linux-kernel, maz

The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
Author:        Stefan Roese <sr@denx.de>
AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00

PCI/MSI: Mask MSI-X vectors only on success

Masking all unused MSI-X entries is done to ensure that a crash kernel
starts from a clean slate, which correponds to the reset state of the
device as defined in the PCI-E specificion 3.0 and later:

 Vector Control for MSI-X Table Entries
 --------------------------------------

 "00: Mask bit:  When this bit is set, the function is prohibited from
                 sending a message using this MSI-X Table entry.
                 ...
                 This bit’s state after reset is 1 (entry is masked)."

A Marvell NVME device fails to deliver MSI interrupts after trying to
enable MSI-X interrupts due to that masking. It seems to take the MSI-X
mask bits into account even when MSI-X is disabled.

While not specification compliant, this can be cured by moving the masking
into the success path, so that the MSI-X table entries stay in device reset
state when the MSI-X setup fails.

[ tglx: Move it into the success path, add comment and amend changelog ]

Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")                                                                                                                                                                                                                 
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
---
 drivers/pci/msi.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e..6748cf9 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	/* Set MSI-X enabled bits and unmask the function */
 	pci_intx_for_msi(dev, 0);
 	dev->msix_enabled = 1;
+
+	/*
+	 * Ensure that all table entries are masked to prevent
+	 * stale entries from firing in a crash kernel.
+	 *
+	 * Done late to deal with a broken Marvell NVME device
+	 * which takes the MSI-X mask bits into account even
+	 * when MSI-X is disabled, which prevents MSI delivery.
+	 */
+	msix_mask_all(base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2021-12-14 12:28 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success tip-bot2 for Stefan Roese
@ 2022-03-14 16:36   ` Jeremi Piotrowski
  2022-03-14 16:49     ` Stefan Roese
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremi Piotrowski @ 2022-03-14 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefan Roese, Thomas Gleixner, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, stable, x86, maz

Hi Thomas, Hi Stefan,

On Tue, Dec 14, 2021 at 12:28:06PM -0000, tip-bot2 for Stefan Roese wrote:
> The following commit has been merged into the irq/urgent branch of tip:
> 
> Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
> Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
> Author:        Stefan Roese <sr@denx.de>
> AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
> Committer:     Thomas Gleixner <tglx@linutronix.de>
> CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00
> 
> PCI/MSI: Mask MSI-X vectors only on success
> 
> Masking all unused MSI-X entries is done to ensure that a crash kernel
> starts from a clean slate, which correponds to the reset state of the
> device as defined in the PCI-E specificion 3.0 and later:
> 
>  Vector Control for MSI-X Table Entries
>  --------------------------------------
> 
>  "00: Mask bit:  When this bit is set, the function is prohibited from
>                  sending a message using this MSI-X Table entry.
>                  ...
>                  This bit’s state after reset is 1 (entry is masked)."
> 
> A Marvell NVME device fails to deliver MSI interrupts after trying to
> enable MSI-X interrupts due to that masking. It seems to take the MSI-X
> mask bits into account even when MSI-X is disabled.
> 
> While not specification compliant, this can be cured by moving the masking
> into the success path, so that the MSI-X table entries stay in device reset
> state when the MSI-X setup fails.
> 
> [ tglx: Move it into the success path, add comment and amend changelog ]
> 
> Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")                                                                                                                                                                                                                 
> Signed-off-by: Stefan Roese <sr@denx.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-pci@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Michal Simek <michal.simek@xilinx.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
> ---
>  drivers/pci/msi.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 48e3f4e..6748cf9 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  		goto out_disable;
>  	}
>  
> -	/* Ensure that all table entries are masked. */
> -	msix_mask_all(base, tsize);
> -
>  	ret = msix_setup_entries(dev, base, entries, nvec, affd);
>  	if (ret)
>  		goto out_disable;
> @@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  	/* Set MSI-X enabled bits and unmask the function */
>  	pci_intx_for_msi(dev, 0);
>  	dev->msix_enabled = 1;
> +
> +	/*
> +	 * Ensure that all table entries are masked to prevent
> +	 * stale entries from firing in a crash kernel.
> +	 *
> +	 * Done late to deal with a broken Marvell NVME device
> +	 * which takes the MSI-X mask bits into account even
> +	 * when MSI-X is disabled, which prevents MSI delivery.
> +	 */
> +	msix_mask_all(base, tsize);
>  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  
>  	pcibios_free_irq(dev);

We've had reports of issues with AWS m4 instances, which use Intel 82559 VFs
for networking (ixgbevf) with MSI-X interrupts, which I've bisected down to
this commit. Since this commit these VMs no longer have any network connectivity
and so fail to boot. This occurs with both 5.15 and 5.10 kernels, reverting the
backport of this commit restores networking.

Do you have any suggestions of how this can be resolved other than a revert?

Here's the full bisect log:

$ git bisect log
git bisect start
# good: [4e8c680af6d51ba9315e31bd4f7599e080561a2d] Linux 5.15.7
git bisect good 4e8c680af6d51ba9315e31bd4f7599e080561a2d
# bad: [efe3167e52a5833ec20ee6214be9b99b378564a8] Linux 5.15.27
git bisect bad efe3167e52a5833ec20ee6214be9b99b378564a8
# bad: [63dcc388662c3562de94d69bfa771ae4cd29b79f] Linux 5.15.16
git bisect bad 63dcc388662c3562de94d69bfa771ae4cd29b79f
# good: [57dcae4a8b93271c4e370920ea0dbb94a0215d30] Linux 5.15.10
git bisect good 57dcae4a8b93271c4e370920ea0dbb94a0215d30
# bad: [25960cafa06e6fcd830e6c792e6a7de68c1e25ed] Linux 5.15.12
git bisect bad 25960cafa06e6fcd830e6c792e6a7de68c1e25ed
# bad: [fb6ad5cb3b6745e7bffc5fe19b130f3594375634] Linux 5.15.11
git bisect bad fb6ad5cb3b6745e7bffc5fe19b130f3594375634
# good: [257b3bb16634fd936129fe2f57a91594a75b8751] drm/amd/pm: fix a potential gpu_metrics_table memory leak
git bisect good 257b3bb16634fd936129fe2f57a91594a75b8751
# bad: [bbdaa7a48f465a2ee76d65839caeda08af1ef3b2] btrfs: fix double free of anon_dev after failure to create subvolume
git bisect bad bbdaa7a48f465a2ee76d65839caeda08af1ef3b2
# good: [c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6] bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
git bisect good c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6
# bad: [5cb5c3e1b184da9f49e46119a0e506519fc58185] usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
git bisect bad 5cb5c3e1b184da9f49e46119a0e506519fc58185
# good: [e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
git bisect good e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4
# good: [4df1af29930b03d61fb774bfaa5100dbdb964628] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
git bisect good 4df1af29930b03d61fb774bfaa5100dbdb964628
# bad: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
git bisect bad d8888cdabedf353ab9b5a6af75f70bf341a3e7df
# first bad commit: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success

Bests,
Jeremi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 16:36   ` Jeremi Piotrowski
@ 2022-03-14 16:49     ` Stefan Roese
  2022-03-14 17:04       ` Dusty Mabe
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Roese @ 2022-03-14 16:49 UTC (permalink / raw)
  To: Jeremi Piotrowski, linux-kernel
  Cc: Thomas Gleixner, linux-pci, Bjorn Helgaas, Michal Simek,
	Marek Vasut, stable, x86, maz, Dusty Mabe

Hi Jeremi,

(added Dusty to Cc)

On 3/14/22 17:36, Jeremi Piotrowski wrote:
> Hi Thomas, Hi Stefan,
> 
> On Tue, Dec 14, 2021 at 12:28:06PM -0000, tip-bot2 for Stefan Roese wrote:
>> The following commit has been merged into the irq/urgent branch of tip:
>>
>> Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
>> Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
>> Author:        Stefan Roese <sr@denx.de>
>> AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
>> Committer:     Thomas Gleixner <tglx@linutronix.de>
>> CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00
>>
>> PCI/MSI: Mask MSI-X vectors only on success
>>
>> Masking all unused MSI-X entries is done to ensure that a crash kernel
>> starts from a clean slate, which correponds to the reset state of the
>> device as defined in the PCI-E specificion 3.0 and later:
>>
>>   Vector Control for MSI-X Table Entries
>>   --------------------------------------
>>
>>   "00: Mask bit:  When this bit is set, the function is prohibited from
>>                   sending a message using this MSI-X Table entry.
>>                   ...
>>                   This bit’s state after reset is 1 (entry is masked)."
>>
>> A Marvell NVME device fails to deliver MSI interrupts after trying to
>> enable MSI-X interrupts due to that masking. It seems to take the MSI-X
>> mask bits into account even when MSI-X is disabled.
>>
>> While not specification compliant, this can be cured by moving the masking
>> into the success path, so that the MSI-X table entries stay in device reset
>> state when the MSI-X setup fails.
>>
>> [ tglx: Move it into the success path, add comment and amend changelog ]
>>
>> Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")
>> Signed-off-by: Stefan Roese <sr@denx.de>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: linux-pci@vger.kernel.org
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Michal Simek <michal.simek@xilinx.com>
>> Cc: Marek Vasut <marex@denx.de>
>> Cc: stable@vger.kernel.org
>> Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
>> ---
>>   drivers/pci/msi.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 48e3f4e..6748cf9 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>   		goto out_disable;
>>   	}
>>   
>> -	/* Ensure that all table entries are masked. */
>> -	msix_mask_all(base, tsize);
>> -
>>   	ret = msix_setup_entries(dev, base, entries, nvec, affd);
>>   	if (ret)
>>   		goto out_disable;
>> @@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>   	/* Set MSI-X enabled bits and unmask the function */
>>   	pci_intx_for_msi(dev, 0);
>>   	dev->msix_enabled = 1;
>> +
>> +	/*
>> +	 * Ensure that all table entries are masked to prevent
>> +	 * stale entries from firing in a crash kernel.
>> +	 *
>> +	 * Done late to deal with a broken Marvell NVME device
>> +	 * which takes the MSI-X mask bits into account even
>> +	 * when MSI-X is disabled, which prevents MSI delivery.
>> +	 */
>> +	msix_mask_all(base, tsize);
>>   	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>>   
>>   	pcibios_free_irq(dev);
> 
> We've had reports of issues with AWS m4 instances, which use Intel 82559 VFs
> for networking (ixgbevf) with MSI-X interrupts, which I've bisected down to
> this commit. Since this commit these VMs no longer have any network connectivity
> and so fail to boot. This occurs with both 5.15 and 5.10 kernels, reverting the
> backport of this commit restores networking.
> 
> Do you have any suggestions of how this can be resolved other than a revert?
> 
> Here's the full bisect log:
> 
> $ git bisect log
> git bisect start
> # good: [4e8c680af6d51ba9315e31bd4f7599e080561a2d] Linux 5.15.7
> git bisect good 4e8c680af6d51ba9315e31bd4f7599e080561a2d
> # bad: [efe3167e52a5833ec20ee6214be9b99b378564a8] Linux 5.15.27
> git bisect bad efe3167e52a5833ec20ee6214be9b99b378564a8
> # bad: [63dcc388662c3562de94d69bfa771ae4cd29b79f] Linux 5.15.16
> git bisect bad 63dcc388662c3562de94d69bfa771ae4cd29b79f
> # good: [57dcae4a8b93271c4e370920ea0dbb94a0215d30] Linux 5.15.10
> git bisect good 57dcae4a8b93271c4e370920ea0dbb94a0215d30
> # bad: [25960cafa06e6fcd830e6c792e6a7de68c1e25ed] Linux 5.15.12
> git bisect bad 25960cafa06e6fcd830e6c792e6a7de68c1e25ed
> # bad: [fb6ad5cb3b6745e7bffc5fe19b130f3594375634] Linux 5.15.11
> git bisect bad fb6ad5cb3b6745e7bffc5fe19b130f3594375634
> # good: [257b3bb16634fd936129fe2f57a91594a75b8751] drm/amd/pm: fix a potential gpu_metrics_table memory leak
> git bisect good 257b3bb16634fd936129fe2f57a91594a75b8751
> # bad: [bbdaa7a48f465a2ee76d65839caeda08af1ef3b2] btrfs: fix double free of anon_dev after failure to create subvolume
> git bisect bad bbdaa7a48f465a2ee76d65839caeda08af1ef3b2
> # good: [c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6] bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
> git bisect good c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6
> # bad: [5cb5c3e1b184da9f49e46119a0e506519fc58185] usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
> git bisect bad 5cb5c3e1b184da9f49e46119a0e506519fc58185
> # good: [e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
> git bisect good e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4
> # good: [4df1af29930b03d61fb774bfaa5100dbdb964628] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
> git bisect good 4df1af29930b03d61fb774bfaa5100dbdb964628
> # bad: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
> git bisect bad d8888cdabedf353ab9b5a6af75f70bf341a3e7df
> # first bad commit: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success

I've added Dusty to Cc, as he (and others) already have been dealing
with this issue AFAICT.

Dusty, could you perhaps chime in with the latest status? AFAIU, it's
related to potential issues with the Xen version used on these systems?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 16:49     ` Stefan Roese
@ 2022-03-14 17:04       ` Dusty Mabe
  2022-03-14 20:29         ` Jeremi Piotrowski
  0 siblings, 1 reply; 12+ messages in thread
From: Dusty Mabe @ 2022-03-14 17:04 UTC (permalink / raw)
  To: Stefan Roese, Jeremi Piotrowski, linux-kernel
  Cc: Thomas Gleixner, linux-pci, Bjorn Helgaas, Michal Simek,
	Marek Vasut, stable, x86, maz



On 3/14/22 12:49, Stefan Roese wrote:

> I've added Dusty to Cc, as he (and others) already have been dealing
> with this issue AFAICT.
> 
> Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> related to potential issues with the Xen version used on these systems?

Thanks Stefan,

Yes. My understanding is that the issue is because AWS is using older versions
of Xen. They are in the process of updating their fleet to a newer version of
Xen so the change introduced with Stefan's commit isn't an issue any longer.

I think the changes are scheduled to be completed in the next 10-12 weeks. For
now we are carrying a revert in the Fedora Kernel.

You can follow this Fedora CoreOS issue if you'd like to know more about when
the change lands in their backend. We work closely with one of their partner
engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066

Dusty


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 17:04       ` Dusty Mabe
@ 2022-03-14 20:29         ` Jeremi Piotrowski
  2022-04-27  7:59           ` Salvatore Bonaccorso
  0 siblings, 1 reply; 12+ messages in thread
From: Jeremi Piotrowski @ 2022-03-14 20:29 UTC (permalink / raw)
  To: Dusty Mabe
  Cc: Stefan Roese, linux-kernel, Thomas Gleixner, linux-pci,
	Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz

On Mon, Mar 14, 2022 at 01:04:55PM -0400, Dusty Mabe wrote:
> 
> 
> On 3/14/22 12:49, Stefan Roese wrote:
> 
> > I've added Dusty to Cc, as he (and others) already have been dealing
> > with this issue AFAICT.
> > 
> > Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> > related to potential issues with the Xen version used on these systems?
> 
> Thanks Stefan,
> 
> Yes. My understanding is that the issue is because AWS is using older versions
> of Xen. They are in the process of updating their fleet to a newer version of
> Xen so the change introduced with Stefan's commit isn't an issue any longer.
> 
> I think the changes are scheduled to be completed in the next 10-12 weeks. For
> now we are carrying a revert in the Fedora Kernel.
> 
> You can follow this Fedora CoreOS issue if you'd like to know more about when
> the change lands in their backend. We work closely with one of their partner
> engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066
> 
> Dusty

Thanks for the link and explanation. What a fun coincidence that we hit this in
Flatcar Container Linux as well. We've reverted the commit in our kernels for
the time being, and will track that issue.

Thanks,
Jeremi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 20:29         ` Jeremi Piotrowski
@ 2022-04-27  7:59           ` Salvatore Bonaccorso
  2022-04-27 17:35             ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Salvatore Bonaccorso @ 2022-04-27  7:59 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, Thomas Gleixner,
	linux-pci, Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz

Hi,

On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
> On Mon, Mar 14, 2022 at 01:04:55PM -0400, Dusty Mabe wrote:
> > 
> > 
> > On 3/14/22 12:49, Stefan Roese wrote:
> > 
> > > I've added Dusty to Cc, as he (and others) already have been dealing
> > > with this issue AFAICT.
> > > 
> > > Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> > > related to potential issues with the Xen version used on these systems?
> > 
> > Thanks Stefan,
> > 
> > Yes. My understanding is that the issue is because AWS is using older versions
> > of Xen. They are in the process of updating their fleet to a newer version of
> > Xen so the change introduced with Stefan's commit isn't an issue any longer.
> > 
> > I think the changes are scheduled to be completed in the next 10-12 weeks. For
> > now we are carrying a revert in the Fedora Kernel.
> > 
> > You can follow this Fedora CoreOS issue if you'd like to know more about when
> > the change lands in their backend. We work closely with one of their partner
> > engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066
> > 
> > Dusty
> 
> Thanks for the link and explanation. What a fun coincidence that we hit this in
> Flatcar Container Linux as well. We've reverted the commit in our kernels for
> the time being, and will track that issue.

Does someone knows here on current state of the AWS instances using
the older Xen version which causes the issues?

AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
MSI-X vectors only on success") as it fixed a bug. Now several
downstream distros do carry a revert of this commit, which I believe
is an unfortunate situation and wonder if this can be addressed
upstream to deal with the AWS m4.large instance issues.

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-27  7:59           ` Salvatore Bonaccorso
@ 2022-04-27 17:35             ` Thomas Gleixner
  2022-04-28 13:48               ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-27 17:35 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, x86, maz, Andrew Cooper,
	Juergen Gross

On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
>
> Does someone knows here on current state of the AWS instances using
> the older Xen version which causes the issues?
>
> AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
> MSI-X vectors only on success") as it fixed a bug. Now several
> downstream distros do carry a revert of this commit, which I believe
> is an unfortunate situation and wonder if this can be addressed
> upstream to deal with the AWS m4.large instance issues.

The problem is that except for a bisect result we've not seen much
information which might help to debug and analyze the problem.

The guest uses MSI-X on that NIC:

 Capabilities: [70] MSI-X: Enable+ Count=3 Masked-

So looking at the commit in question:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e47b29..6748cf9d7d90 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	/* Set MSI-X enabled bits and unmask the function */
 	pci_intx_for_msi(dev, 0);
 	dev->msix_enabled = 1;
+
+	/*
+	 * Ensure that all table entries are masked to prevent
+	 * stale entries from firing in a crash kernel.
+	 *
+	 * Done late to deal with a broken Marvell NVME device
+	 * which takes the MSI-X mask bits into account even
+	 * when MSI-X is disabled, which prevents MSI delivery.
+	 */
+	msix_mask_all(base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);

IOW, it moves the invocation of msix_mask_all() into the success
path.

As the device uses MSI-X this change does not make any difference from a
hardware perspective simply because _all_ MSI-X interrupts are masked
via the CTRL register already. And it does not matter whether the kernel
masks them individually _before_ or _after_ the allocation. At least not
on real hardware and on a sane emulation.

Now this is XEN and neither real hardware nor sane emulation.

It must be a XEN_HVM guest because XEN_PV guests disable PCI/MSI[-X]
completely which makes the invocation of msix_mask_all() a NOP.

If it's not a XEN_HVM guest, then you can stop reading further as I'm
unable to decode why moving a NOP makes a difference. That belongs in to
the realm of voodoo, but then XEN is voodoo at least for me. :)

XEN guests do not use the common PCI mask/unmask machinery which would
unmask the interrupt on request_irq().

So I assume that the following happens:

Guest                     Hypervisor

msix_capabilities_init()
        ....
        alloc_irq()
           xen_magic()  -> alloc_msix_interrupt()
                           request_irq()

        msix_mask_all() -> trap
                             do_magic()
request_irq()
   unmask()
     xen_magic()
       unmask_evtchn()  -> do_more_magic()

So I assume further that msix_mask_all() actually is able to mask the
interrupts in the hardware (ctrl word of the vector table) despite the
hypervisor having allocated and requested the interrupt already.

Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
XEN_PV does. I can only assume the answer is voodoo...

Maybe the XEN people have some more enlightened answers to that.

Thanks,

        tglx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-27 17:35             ` Thomas Gleixner
@ 2022-04-28 13:48               ` Thomas Gleixner
  2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
  2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
  0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-28 13:48 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, x86, maz, Andrew Cooper,
	Juergen Gross

On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> XEN guests do not use the common PCI mask/unmask machinery which would
> unmask the interrupt on request_irq().
>
> So I assume that the following happens:
>
> Guest                     Hypervisor
>
> msix_capabilities_init()
>         ....
>         alloc_irq()
>            xen_magic()  -> alloc_msix_interrupt()
>                            request_irq()
>
>         msix_mask_all() -> trap
>                              do_magic()
> request_irq()
>    unmask()
>      xen_magic()
>        unmask_evtchn()  -> do_more_magic()
>
> So I assume further that msix_mask_all() actually is able to mask the
> interrupts in the hardware (ctrl word of the vector table) despite the
> hypervisor having allocated and requested the interrupt already.
>
> Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> XEN_PV does. I can only assume the answer is voodoo...
>
> Maybe the XEN people have some more enlightened answers to that.

So I was talking to Juergen about this and he agrees, that for the case
where a XEN HVM guest uses the PIRQ/Eventchannel mechanism PCI/MSI[-X]
masking should be disabled like it is done for XEN PV.

Why the hypervisor grants the mask write is still mysterious, but I
leave that to the folks who can master the XEN voodoo.

I'll send out a patch in minute.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  2022-04-28 13:48               ` Thomas Gleixner
@ 2022-04-28 13:50                 ` Thomas Gleixner
  2022-05-01  8:12                   ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
  2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2022-04-28 13:50 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, x86, maz, Andrew Cooper,
	Juergen Gross

When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Fixes: 809f9267bbab ("xen: map MSIs into pirqs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 arch/x86/pci/xen.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -467,7 +467,6 @@ static __init void xen_setup_pci_msi(voi
 		else
 			xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
 		xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
-		pci_msi_ignore_mask = 1;
 	} else if (xen_hvm_domain()) {
 		xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
 		xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
@@ -481,6 +480,11 @@ static __init void xen_setup_pci_msi(voi
 	 * in allocating the native domain and never use it.
 	 */
 	x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
+	/*
+	 * With XEN PIRQ/Eventchannels in use PCI/MSI[-X] masking is solely
+	 * controlled by the hypervisor.
+	 */
+	pci_msi_ignore_mask = 1;
 }
 
 #else /* CONFIG_PCI_MSI */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-28 13:48               ` Thomas Gleixner
  2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
@ 2022-04-28 18:43                 ` Salvatore Bonaccorso
  2022-04-29  6:37                   ` Salvatore Bonaccorso
  1 sibling, 1 reply; 12+ messages in thread
From: Salvatore Bonaccorso @ 2022-04-28 18:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeremi Piotrowski, Dusty Mabe, Stefan Roese, linux-kernel,
	linux-pci, Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz,
	Andrew Cooper, Juergen Gross, Noah Meyerhans

Hi Thomas,

On Thu, Apr 28, 2022 at 03:48:03PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> > On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> > XEN guests do not use the common PCI mask/unmask machinery which would
> > unmask the interrupt on request_irq().
> >
> > So I assume that the following happens:
> >
> > Guest                     Hypervisor
> >
> > msix_capabilities_init()
> >         ....
> >         alloc_irq()
> >            xen_magic()  -> alloc_msix_interrupt()
> >                            request_irq()
> >
> >         msix_mask_all() -> trap
> >                              do_magic()
> > request_irq()
> >    unmask()
> >      xen_magic()
> >        unmask_evtchn()  -> do_more_magic()
> >
> > So I assume further that msix_mask_all() actually is able to mask the
> > interrupts in the hardware (ctrl word of the vector table) despite the
> > hypervisor having allocated and requested the interrupt already.
> >
> > Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> > really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> > XEN_PV does. I can only assume the answer is voodoo...
> >
> > Maybe the XEN people have some more enlightened answers to that.
> 
> So I was talking to Juergen about this and he agrees, that for the case
> where a XEN HVM guest uses the PIRQ/Eventchannel mechanism PCI/MSI[-X]
> masking should be disabled like it is done for XEN PV.
> 
> Why the hypervisor grants the mask write is still mysterious, but I
> leave that to the folks who can master the XEN voodoo.
> 
> I'll send out a patch in minute.

Thank you. We are having Noah Meyerhans now testing the patch and he
will report back if it works (Cc'ed here now).

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
@ 2022-04-29  6:37                   ` Salvatore Bonaccorso
  0 siblings, 0 replies; 12+ messages in thread
From: Salvatore Bonaccorso @ 2022-04-29  6:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeremi Piotrowski, Dusty Mabe, Stefan Roese, linux-kernel,
	linux-pci, Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz,
	Andrew Cooper, Juergen Gross, Noah Meyerhans

Hi Thomas,

On Thu, Apr 28, 2022 at 08:43:10PM +0200, Salvatore Bonaccorso wrote:
> Hi Thomas,
> 
> On Thu, Apr 28, 2022 at 03:48:03PM +0200, Thomas Gleixner wrote:
> > On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> > > On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> > > XEN guests do not use the common PCI mask/unmask machinery which would
> > > unmask the interrupt on request_irq().
> > >
> > > So I assume that the following happens:
> > >
> > > Guest                     Hypervisor
> > >
> > > msix_capabilities_init()
> > >         ....
> > >         alloc_irq()
> > >            xen_magic()  -> alloc_msix_interrupt()
> > >                            request_irq()
> > >
> > >         msix_mask_all() -> trap
> > >                              do_magic()
> > > request_irq()
> > >    unmask()
> > >      xen_magic()
> > >        unmask_evtchn()  -> do_more_magic()
> > >
> > > So I assume further that msix_mask_all() actually is able to mask the
> > > interrupts in the hardware (ctrl word of the vector table) despite the
> > > hypervisor having allocated and requested the interrupt already.
> > >
> > > Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> > > really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> > > XEN_PV does. I can only assume the answer is voodoo...
> > >
> > > Maybe the XEN people have some more enlightened answers to that.
> > 
> > So I was talking to Juergen about this and he agrees, that for the case
> > where a XEN HVM guest uses the PIRQ/Eventchannel mechanism PCI/MSI[-X]
> > masking should be disabled like it is done for XEN PV.
> > 
> > Why the hypervisor grants the mask write is still mysterious, but I
> > leave that to the folks who can master the XEN voodoo.
> > 
> > I'll send out a patch in minute.
> 
> Thank you. We are having Noah Meyerhans now testing the patch and he
> will report back if it works (Cc'ed here now).

To confirm: Noah tested the patch on various different Xen (and other
VM configurations) and works well.

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [tip: x86/urgent] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
@ 2022-05-01  8:12                   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2022-05-01  8:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Jeremi Piotrowski, Dusty Mabe, Salvatore Bonaccorso,
	Thomas Gleixner, Noah Meyerhans, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     7e0815b3e09986d2fe651199363e135b9358132a
Gitweb:        https://git.kernel.org/tip/7e0815b3e09986d2fe651199363e135b9358132a
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Thu, 28 Apr 2022 15:50:54 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 29 Apr 2022 14:37:39 +02:00

x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests

When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Fixes: 809f9267bbab ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx

---
 arch/x86/pci/xen.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 9bb1e29..b94f727 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -467,7 +467,6 @@ static __init void xen_setup_pci_msi(void)
 		else
 			xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
 		xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
-		pci_msi_ignore_mask = 1;
 	} else if (xen_hvm_domain()) {
 		xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
 		xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
@@ -481,6 +480,11 @@ static __init void xen_setup_pci_msi(void)
 	 * in allocating the native domain and never use it.
 	 */
 	x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
+	/*
+	 * With XEN PIRQ/Eventchannels in use PCI/MSI[-X] masking is solely
+	 * controlled by the hypervisor.
+	 */
+	pci_msi_ignore_mask = 1;
 }
 
 #else /* CONFIG_PCI_MSI */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-01  8:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20211210161025.3287927-1-sr@denx.de>
2021-12-14 12:28 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success tip-bot2 for Stefan Roese
2022-03-14 16:36   ` Jeremi Piotrowski
2022-03-14 16:49     ` Stefan Roese
2022-03-14 17:04       ` Dusty Mabe
2022-03-14 20:29         ` Jeremi Piotrowski
2022-04-27  7:59           ` Salvatore Bonaccorso
2022-04-27 17:35             ` Thomas Gleixner
2022-04-28 13:48               ` Thomas Gleixner
2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
2022-05-01  8:12                   ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
2022-04-29  6:37                   ` Salvatore Bonaccorso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).