linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used
@ 2021-12-10 16:10 Stefan Roese
  2021-12-11 10:17 ` Thomas Gleixner
  2021-12-14 12:28 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success tip-bot2 for Stefan Roese
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Roese @ 2021-12-10 16:10 UTC (permalink / raw)
  To: linux-pci; +Cc: Thomas Gleixner, Bjorn Helgaas, Michal Simek, Marek Vasut

This patch moves the masking of the MSI-X entries to a later stage in
msix_capability_init(), which is not reached on platforms not
supporting MSI-X. Without this, MSI interrupts from a NVMe drive are not
received at all on this ZynqMP based platform, only supporting legacy
and MSI interrupts.

Background:
This patch fixes a problem on our ZynqMP based system working with newer
NVMe drives which support MSI & MSI-X. Running v5.4 all is fine and
these drives correctly configure an MSI interrupt and this IRQ is
received just fine in the ZynqMP rootport. But when updating to v5.10
or later (I also tested with v5.15 and v5.16-rc4) the MSI interrupt
gets assigned but no interrupts are received by the NVMe driver at all.

Note: The ZynqMP PCIe rootport driver only supports legacy and MSI
interrupts, not MSI-X (yet).

I've debugged the MSI integration of the ZynqMP PCIe rootport driver
(pcie-xilinx-nwl.c) and found no issues there. Also the MSI framework
in the Kernel did not reveal any problems - at least for me. Looking
a bit deeper into the lspci output, I found an interesting difference
between v5.4 and v5.10 (or later).

v5.4:
04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
        ...
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
		Address: 00000000fd480000  Data: 0004
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
	...
	Capabilities: [b0] MSI-X: Enable- Count=67 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	...

v5.10:
04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
        ...
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
                Address: 00000000fd480000  Data: 0004
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
        ...
        Capabilities: [b0] MSI-X: Enable- Count=67 Masked+
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        ...

So the only difference here being the "Masked+" compared to the
"Masked-" in the working v5.4 Kernel. Testing in this area has shown,
that the root cause for the masked bit being set was the call to
msix_mask_all() in msix_capability_init(). Without this, all works just
fine and the MSI interrupts are received again by the NVMe driver.

BTW: I've also tested this problem with the latest version of Thomas's
PCI/MSI Spring cleaning on top of v5.16-rc4. No change - the masked bit
is still set and the MSI interrupt are note received by the NVMe driver.

I'm open to other ideas to fix this issue. So please review and comment.

Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Marek Vasut <marex@denx.de>
---
 drivers/pci/msi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a7a1c7411348..25b659dd5e2b 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -825,9 +825,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -836,6 +833,9 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	if (ret)
 		goto out_avail;
 
+	/* Ensure that all table entries are masked. */
+	msix_mask_all(base, tsize);
+
 	/* Check if all MSI entries honor device restrictions */
 	ret = msi_verify_entries(dev);
 	if (ret)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used
  2021-12-10 16:10 [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used Stefan Roese
@ 2021-12-11 10:17 ` Thomas Gleixner
  2021-12-11 13:58   ` Stefan Roese
  2021-12-14 12:28 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success tip-bot2 for Stefan Roese
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2021-12-11 10:17 UTC (permalink / raw)
  To: Stefan Roese, linux-pci; +Cc: Bjorn Helgaas, Michal Simek, Marek Vasut

Stefan,

On Fri, Dec 10 2021 at 17:10, Stefan Roese wrote:
> I've debugged the MSI integration of the ZynqMP PCIe rootport driver
> (pcie-xilinx-nwl.c) and found no issues there. Also the MSI framework
> in the Kernel did not reveal any problems - at least for me. Looking
> a bit deeper into the lspci output, I found an interesting difference
> between v5.4 and v5.10 (or later).
>
> v5.4:
> 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
>         ...
> 	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
> 		Address: 00000000fd480000  Data: 0004
> 		Masking: 00000000  Pending: 00000000
> 	Capabilities: [70] Express (v2) Endpoint, MSI 00
> 	...
> 	Capabilities: [b0] MSI-X: Enable- Count=67 Masked-
> 		Vector table: BAR=0 offset=00002000
> 		PBA: BAR=0 offset=00003000
> 	...
>
> v5.10:
> 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
>         ...
>         Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
>                 Address: 00000000fd480000  Data: 0004
>                 Masking: 00000000  Pending: 00000000
>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>         ...
>         Capabilities: [b0] MSI-X: Enable- Count=67 Masked+
>                 Vector table: BAR=0 offset=00002000
>                 PBA: BAR=0 offset=00003000
>         ...
>
> So the only difference here being the "Masked+" compared to the
> "Masked-" in the working v5.4 Kernel. Testing in this area has shown,
> that the root cause for the masked bit being set was the call to
> msix_mask_all() in msix_capability_init(). Without this, all works just
> fine and the MSI interrupts are received again by the NVMe driver.

Not really. The Masked+ in the capabilities entry has nothing to do with
the entries in the table being masked. The Masked+ reflects the
PCI_MSIX_FLAGS_MASKALL bit in the MSI-X control register.

That is set early on and not cleared in the error handling path. The
error handling just clears the MSIX_FLAGS_ENABLE bit.

Can you try the patch below?

It might still be that this Marvell part really combines the per entry
mask bits from MSI-X with MSI, then we need both.

Thanks,

        tglx
---
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -777,7 +777,7 @@ static int msix_capability_init(struct p
 	free_msi_irqs(dev);
 
 out_disable:
-	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
+	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
 
 	return ret;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used
  2021-12-11 10:17 ` Thomas Gleixner
@ 2021-12-11 13:58   ` Stefan Roese
  2021-12-11 21:02     ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Roese @ 2021-12-11 13:58 UTC (permalink / raw)
  To: Thomas Gleixner, linux-pci; +Cc: Bjorn Helgaas, Michal Simek, Marek Vasut

Hi Thomas,

On 12/11/21 11:17, Thomas Gleixner wrote:
> Stefan,
> 
> On Fri, Dec 10 2021 at 17:10, Stefan Roese wrote:
>> I've debugged the MSI integration of the ZynqMP PCIe rootport driver
>> (pcie-xilinx-nwl.c) and found no issues there. Also the MSI framework
>> in the Kernel did not reveal any problems - at least for me. Looking
>> a bit deeper into the lspci output, I found an interesting difference
>> between v5.4 and v5.10 (or later).
>>
>> v5.4:
>> 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
>>          ...
>> 	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
>> 		Address: 00000000fd480000  Data: 0004
>> 		Masking: 00000000  Pending: 00000000
>> 	Capabilities: [70] Express (v2) Endpoint, MSI 00
>> 	...
>> 	Capabilities: [b0] MSI-X: Enable- Count=67 Masked-
>> 		Vector table: BAR=0 offset=00002000
>> 		PBA: BAR=0 offset=00003000
>> 	...
>>
>> v5.10:
>> 04:00.0 Non-Volatile memory controller: Marvell Technology Group Ltd. Device 1321 (rev 02) (prog-if 02 [NVM Express])
>>          ...
>>          Capabilities: [50] MSI: Enable+ Count=1/1 Maskable+ 64bit+
>>                  Address: 00000000fd480000  Data: 0004
>>                  Masking: 00000000  Pending: 00000000
>>          Capabilities: [70] Express (v2) Endpoint, MSI 00
>>          ...
>>          Capabilities: [b0] MSI-X: Enable- Count=67 Masked+
>>                  Vector table: BAR=0 offset=00002000
>>                  PBA: BAR=0 offset=00003000
>>          ...
>>
>> So the only difference here being the "Masked+" compared to the
>> "Masked-" in the working v5.4 Kernel. Testing in this area has shown,
>> that the root cause for the masked bit being set was the call to
>> msix_mask_all() in msix_capability_init(). Without this, all works just
>> fine and the MSI interrupts are received again by the NVMe driver.
> 
> Not really. The Masked+ in the capabilities entry has nothing to do with
> the entries in the table being masked. The Masked+ reflects the
> PCI_MSIX_FLAGS_MASKALL bit in the MSI-X control register.
> 
> That is set early on and not cleared in the error handling path. The
> error handling just clears the MSIX_FLAGS_ENABLE bit.
> 
> Can you try the patch below?

Sure, please see below.

> It might still be that this Marvell part really combines the per entry
> mask bits from MSI-X with MSI, then we need both.

With your patch applied only (mine not), the Masked+ is gone but still
the MSI interrupts are not received in the system. So you seem to have
guessed correctly, that we need both changes.

How to continue? Should I integrate your patch into mine and send a new
version? Or will you send it separately to the list for integration?

Thanks,
Stefan

> Thanks,
> 
>          tglx
> ---
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -777,7 +777,7 @@ static int msix_capability_init(struct p
>   	free_msi_irqs(dev);
>   
>   out_disable:
> -	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> +	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
>   
>   	return ret;
>   }
> 

Viele Grüße,
Stefan Roese

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr@denx.de

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used
  2021-12-11 13:58   ` Stefan Roese
@ 2021-12-11 21:02     ` Thomas Gleixner
  2021-12-14 11:10       ` Stefan Roese
  2021-12-14 12:28       ` [tip: irq/urgent] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error tip-bot2 for Thomas Gleixner
  0 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2021-12-11 21:02 UTC (permalink / raw)
  To: Stefan Roese, linux-pci
  Cc: Bjorn Helgaas, Michal Simek, Marek Vasut, Greg Kroah-Hartman

Stefan,

On Sat, Dec 11 2021 at 14:58, Stefan Roese wrote:
> On 12/11/21 11:17, Thomas Gleixner wrote:
>> Can you try the patch below?
>
> Sure, please see below.
>
>> It might still be that this Marvell part really combines the per entry
>> mask bits from MSI-X with MSI, then we need both.
>
> With your patch applied only (mine not), the Masked+ is gone but still
> the MSI interrupts are not received in the system. So you seem to have
> guessed correctly, that we need both changes.

Groan. How is that device specification compliant?

Vector Control for MSI-X Table Entries
--------------------------------------

"00: Mask bit:  When this bit is set, the function is prohibited from
                sending a message using this MSI-X Table entry.
                ....
                This bit’s state after reset is 1 (entry is masked)."

So how can that work in the first place if that device is PCI
specification compliant? Seems that PCI/SIG compliance program is just
another rubberstamping nonsense.

Can someone who has access to that group please ask them what their
specification compliance stuff is actualy testing?

Sure, that went unnoticed so far on that marvelous device because the
kernel was missing a defense line, but sigh...

> How to continue? Should I integrate your patch into mine and send a new
> version? Or will you send it separately to the list for integration?

Your patch is incomplete. The function can fail later on, which results
in the same problem, no?

So we need something like the below.

Just to satisfy my curiosity:

  The device supports obviously MSI-X, which is preferred over MSI.

  So why is the MSI-X initialization failing in the first place on this
  platform?

Thanks,

        tglx
---
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct p
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -751,6 +748,9 @@ static int msix_capability_init(struct p
 	/* Set MSI-X enabled bits and unmask the function */
 	pci_intx_for_msi(dev, 0);
 	dev->msix_enabled = 1;
+
+	/* Ensure that all table entries are masked. */
+	msix_mask_all(base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);
@@ -777,7 +777,7 @@ static int msix_capability_init(struct p
 	free_msi_irqs(dev);
 
 out_disable:
-	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
+	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
 
 	return ret;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used
  2021-12-11 21:02     ` Thomas Gleixner
@ 2021-12-14 11:10       ` Stefan Roese
  2021-12-14 12:28       ` [tip: irq/urgent] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Roese @ 2021-12-14 11:10 UTC (permalink / raw)
  To: Thomas Gleixner, linux-pci
  Cc: Bjorn Helgaas, Michal Simek, Marek Vasut, Greg Kroah-Hartman

Hi Thomas,

On 12/11/21 22:02, Thomas Gleixner wrote:
> Stefan,
> 
> On Sat, Dec 11 2021 at 14:58, Stefan Roese wrote:
>> On 12/11/21 11:17, Thomas Gleixner wrote:
>>> Can you try the patch below?
>>
>> Sure, please see below.
>>
>>> It might still be that this Marvell part really combines the per entry
>>> mask bits from MSI-X with MSI, then we need both.
>>
>> With your patch applied only (mine not), the Masked+ is gone but still
>> the MSI interrupts are not received in the system. So you seem to have
>> guessed correctly, that we need both changes.
> 
> Groan. How is that device specification compliant?
> 
> Vector Control for MSI-X Table Entries
> --------------------------------------
> 
> "00: Mask bit:  When this bit is set, the function is prohibited from
>                  sending a message using this MSI-X Table entry.
>                  ....
>                  This bit’s state after reset is 1 (entry is masked)."
> 
> So how can that work in the first place if that device is PCI
> specification compliant? Seems that PCI/SIG compliance program is just
> another rubberstamping nonsense.
> 
> Can someone who has access to that group please ask them what their
> specification compliance stuff is actualy testing?
> 
> Sure, that went unnoticed so far on that marvelous device because the
> kernel was missing a defense line, but sigh...
> 
>> How to continue? Should I integrate your patch into mine and send a new
>> version? Or will you send it separately to the list for integration?
> 
> Your patch is incomplete. The function can fail later on, which results
> in the same problem, no?

Yes, agreed.

> So we need something like the below.

The patch below works fine on my ZynqMP platform. MSI interrupts are now
received okay.

> Just to satisfy my curiosity:
> 
>    The device supports obviously MSI-X, which is preferred over MSI.

I would gladly use MSI-X interrupts, if easily possible. But...

>    So why is the MSI-X initialization failing in the first place on this
>    platform?

... the ZyqnMP PCIe rootport driver only support legacy and MSI
interrupts but not MSI-X (yet) [1].

Thanks,
Stefan

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/pci/controller/pcie-xilinx-nwl.c

> Thanks,
> 
>          tglx
> ---
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -722,9 +722,6 @@ static int msix_capability_init(struct p
>   		goto out_disable;
>   	}
>   
> -	/* Ensure that all table entries are masked. */
> -	msix_mask_all(base, tsize);
> -
>   	ret = msix_setup_entries(dev, base, entries, nvec, affd);
>   	if (ret)
>   		goto out_disable;
> @@ -751,6 +748,9 @@ static int msix_capability_init(struct p
>   	/* Set MSI-X enabled bits and unmask the function */
>   	pci_intx_for_msi(dev, 0);
>   	dev->msix_enabled = 1;
> +
> +	/* Ensure that all table entries are masked. */
> +	msix_mask_all(base, tsize);
>   	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>   
>   	pcibios_free_irq(dev);
> @@ -777,7 +777,7 @@ static int msix_capability_init(struct p
>   	free_msi_irqs(dev);
>   
>   out_disable:
> -	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
> +	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
>   
>   	return ret;
>   }
> 

Viele Grüße,
Stefan Roese

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr@denx.de

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip: irq/urgent] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
  2021-12-11 21:02     ` Thomas Gleixner
  2021-12-14 11:10       ` Stefan Roese
@ 2021-12-14 12:28       ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2021-12-14 12:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Stefan Roese, Thomas Gleixner, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, stable, x86, linux-kernel, maz

The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     94185adbfad56815c2c8401e16d81bdb74a79201
Gitweb:        https://git.kernel.org/tip/94185adbfad56815c2c8401e16d81bdb74a79201
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 14 Dec 2021 12:42:14 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00

PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error

PCI_MSIX_FLAGS_MASKALL is set in the MSI-X control register at MSI-X
interrupt setup time. It's cleared on success, but the error handling path
only clears the PCI_MSIX_FLAGS_ENABLE bit.

That's incorrect as the reset state of the PCI_MSIX_FLAGS_MASKALL bit is
zero. That can be observed via lspci:

        Capabilities: [b0] MSI-X: Enable- Count=67 Masked+

Clear the bit in the error path to restore the reset state.

Fixes: 438553958ba1 ("PCI/MSI: Enable and mask MSI-X early")
Reported-by: Stefan Roese <sr@denx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Stefan Roese <sr@denx.de>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tufevoqx.ffs@tglx
---
 drivers/pci/msi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 6748cf9..d84cf30 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -784,7 +784,7 @@ out_free:
 	free_msi_irqs(dev);
 
 out_disable:
-	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
+	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE, 0);
 
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2021-12-10 16:10 [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used Stefan Roese
  2021-12-11 10:17 ` Thomas Gleixner
@ 2021-12-14 12:28 ` tip-bot2 for Stefan Roese
  2022-03-14 16:36   ` Jeremi Piotrowski
  1 sibling, 1 reply; 17+ messages in thread
From: tip-bot2 for Stefan Roese @ 2021-12-14 12:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Stefan Roese, Thomas Gleixner, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, stable, x86, linux-kernel, maz

The following commit has been merged into the irq/urgent branch of tip:

Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
Author:        Stefan Roese <sr@denx.de>
AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00

PCI/MSI: Mask MSI-X vectors only on success

Masking all unused MSI-X entries is done to ensure that a crash kernel
starts from a clean slate, which correponds to the reset state of the
device as defined in the PCI-E specificion 3.0 and later:

 Vector Control for MSI-X Table Entries
 --------------------------------------

 "00: Mask bit:  When this bit is set, the function is prohibited from
                 sending a message using this MSI-X Table entry.
                 ...
                 This bit’s state after reset is 1 (entry is masked)."

A Marvell NVME device fails to deliver MSI interrupts after trying to
enable MSI-X interrupts due to that masking. It seems to take the MSI-X
mask bits into account even when MSI-X is disabled.

While not specification compliant, this can be cured by moving the masking
into the success path, so that the MSI-X table entries stay in device reset
state when the MSI-X setup fails.

[ tglx: Move it into the success path, add comment and amend changelog ]

Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")                                                                                                                                                                                                                 
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Marek Vasut <marex@denx.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
---
 drivers/pci/msi.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e..6748cf9 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	/* Set MSI-X enabled bits and unmask the function */
 	pci_intx_for_msi(dev, 0);
 	dev->msix_enabled = 1;
+
+	/*
+	 * Ensure that all table entries are masked to prevent
+	 * stale entries from firing in a crash kernel.
+	 *
+	 * Done late to deal with a broken Marvell NVME device
+	 * which takes the MSI-X mask bits into account even
+	 * when MSI-X is disabled, which prevents MSI delivery.
+	 */
+	msix_mask_all(base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 
 	pcibios_free_irq(dev);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2021-12-14 12:28 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success tip-bot2 for Stefan Roese
@ 2022-03-14 16:36   ` Jeremi Piotrowski
  2022-03-14 16:49     ` Stefan Roese
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremi Piotrowski @ 2022-03-14 16:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Stefan Roese, Thomas Gleixner, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, stable, x86, maz

Hi Thomas, Hi Stefan,

On Tue, Dec 14, 2021 at 12:28:06PM -0000, tip-bot2 for Stefan Roese wrote:
> The following commit has been merged into the irq/urgent branch of tip:
> 
> Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
> Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
> Author:        Stefan Roese <sr@denx.de>
> AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
> Committer:     Thomas Gleixner <tglx@linutronix.de>
> CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00
> 
> PCI/MSI: Mask MSI-X vectors only on success
> 
> Masking all unused MSI-X entries is done to ensure that a crash kernel
> starts from a clean slate, which correponds to the reset state of the
> device as defined in the PCI-E specificion 3.0 and later:
> 
>  Vector Control for MSI-X Table Entries
>  --------------------------------------
> 
>  "00: Mask bit:  When this bit is set, the function is prohibited from
>                  sending a message using this MSI-X Table entry.
>                  ...
>                  This bit’s state after reset is 1 (entry is masked)."
> 
> A Marvell NVME device fails to deliver MSI interrupts after trying to
> enable MSI-X interrupts due to that masking. It seems to take the MSI-X
> mask bits into account even when MSI-X is disabled.
> 
> While not specification compliant, this can be cured by moving the masking
> into the success path, so that the MSI-X table entries stay in device reset
> state when the MSI-X setup fails.
> 
> [ tglx: Move it into the success path, add comment and amend changelog ]
> 
> Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")                                                                                                                                                                                                                 
> Signed-off-by: Stefan Roese <sr@denx.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: linux-pci@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Michal Simek <michal.simek@xilinx.com>
> Cc: Marek Vasut <marex@denx.de>
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
> ---
>  drivers/pci/msi.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 48e3f4e..6748cf9 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  		goto out_disable;
>  	}
>  
> -	/* Ensure that all table entries are masked. */
> -	msix_mask_all(base, tsize);
> -
>  	ret = msix_setup_entries(dev, base, entries, nvec, affd);
>  	if (ret)
>  		goto out_disable;
> @@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>  	/* Set MSI-X enabled bits and unmask the function */
>  	pci_intx_for_msi(dev, 0);
>  	dev->msix_enabled = 1;
> +
> +	/*
> +	 * Ensure that all table entries are masked to prevent
> +	 * stale entries from firing in a crash kernel.
> +	 *
> +	 * Done late to deal with a broken Marvell NVME device
> +	 * which takes the MSI-X mask bits into account even
> +	 * when MSI-X is disabled, which prevents MSI delivery.
> +	 */
> +	msix_mask_all(base, tsize);
>  	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>  
>  	pcibios_free_irq(dev);

We've had reports of issues with AWS m4 instances, which use Intel 82559 VFs
for networking (ixgbevf) with MSI-X interrupts, which I've bisected down to
this commit. Since this commit these VMs no longer have any network connectivity
and so fail to boot. This occurs with both 5.15 and 5.10 kernels, reverting the
backport of this commit restores networking.

Do you have any suggestions of how this can be resolved other than a revert?

Here's the full bisect log:

$ git bisect log
git bisect start
# good: [4e8c680af6d51ba9315e31bd4f7599e080561a2d] Linux 5.15.7
git bisect good 4e8c680af6d51ba9315e31bd4f7599e080561a2d
# bad: [efe3167e52a5833ec20ee6214be9b99b378564a8] Linux 5.15.27
git bisect bad efe3167e52a5833ec20ee6214be9b99b378564a8
# bad: [63dcc388662c3562de94d69bfa771ae4cd29b79f] Linux 5.15.16
git bisect bad 63dcc388662c3562de94d69bfa771ae4cd29b79f
# good: [57dcae4a8b93271c4e370920ea0dbb94a0215d30] Linux 5.15.10
git bisect good 57dcae4a8b93271c4e370920ea0dbb94a0215d30
# bad: [25960cafa06e6fcd830e6c792e6a7de68c1e25ed] Linux 5.15.12
git bisect bad 25960cafa06e6fcd830e6c792e6a7de68c1e25ed
# bad: [fb6ad5cb3b6745e7bffc5fe19b130f3594375634] Linux 5.15.11
git bisect bad fb6ad5cb3b6745e7bffc5fe19b130f3594375634
# good: [257b3bb16634fd936129fe2f57a91594a75b8751] drm/amd/pm: fix a potential gpu_metrics_table memory leak
git bisect good 257b3bb16634fd936129fe2f57a91594a75b8751
# bad: [bbdaa7a48f465a2ee76d65839caeda08af1ef3b2] btrfs: fix double free of anon_dev after failure to create subvolume
git bisect bad bbdaa7a48f465a2ee76d65839caeda08af1ef3b2
# good: [c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6] bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
git bisect good c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6
# bad: [5cb5c3e1b184da9f49e46119a0e506519fc58185] usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
git bisect bad 5cb5c3e1b184da9f49e46119a0e506519fc58185
# good: [e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
git bisect good e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4
# good: [4df1af29930b03d61fb774bfaa5100dbdb964628] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
git bisect good 4df1af29930b03d61fb774bfaa5100dbdb964628
# bad: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
git bisect bad d8888cdabedf353ab9b5a6af75f70bf341a3e7df
# first bad commit: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success

Bests,
Jeremi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 16:36   ` Jeremi Piotrowski
@ 2022-03-14 16:49     ` Stefan Roese
  2022-03-14 17:04       ` Dusty Mabe
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Roese @ 2022-03-14 16:49 UTC (permalink / raw)
  To: Jeremi Piotrowski, linux-kernel
  Cc: Thomas Gleixner, linux-pci, Bjorn Helgaas, Michal Simek,
	Marek Vasut, stable, x86, maz, Dusty Mabe

Hi Jeremi,

(added Dusty to Cc)

On 3/14/22 17:36, Jeremi Piotrowski wrote:
> Hi Thomas, Hi Stefan,
> 
> On Tue, Dec 14, 2021 at 12:28:06PM -0000, tip-bot2 for Stefan Roese wrote:
>> The following commit has been merged into the irq/urgent branch of tip:
>>
>> Commit-ID:     83dbf898a2d45289be875deb580e93050ba67529
>> Gitweb:        https://git.kernel.org/tip/83dbf898a2d45289be875deb580e93050ba67529
>> Author:        Stefan Roese <sr@denx.de>
>> AuthorDate:    Tue, 14 Dec 2021 12:49:32 +01:00
>> Committer:     Thomas Gleixner <tglx@linutronix.de>
>> CommitterDate: Tue, 14 Dec 2021 13:23:32 +01:00
>>
>> PCI/MSI: Mask MSI-X vectors only on success
>>
>> Masking all unused MSI-X entries is done to ensure that a crash kernel
>> starts from a clean slate, which correponds to the reset state of the
>> device as defined in the PCI-E specificion 3.0 and later:
>>
>>   Vector Control for MSI-X Table Entries
>>   --------------------------------------
>>
>>   "00: Mask bit:  When this bit is set, the function is prohibited from
>>                   sending a message using this MSI-X Table entry.
>>                   ...
>>                   This bit’s state after reset is 1 (entry is masked)."
>>
>> A Marvell NVME device fails to deliver MSI interrupts after trying to
>> enable MSI-X interrupts due to that masking. It seems to take the MSI-X
>> mask bits into account even when MSI-X is disabled.
>>
>> While not specification compliant, this can be cured by moving the masking
>> into the success path, so that the MSI-X table entries stay in device reset
>> state when the MSI-X setup fails.
>>
>> [ tglx: Move it into the success path, add comment and amend changelog ]
>>
>> Fixes: aa8092c1d1f1 ("PCI/MSI: Mask all unused MSI-X entries")
>> Signed-off-by: Stefan Roese <sr@denx.de>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: linux-pci@vger.kernel.org
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Michal Simek <michal.simek@xilinx.com>
>> Cc: Marek Vasut <marex@denx.de>
>> Cc: stable@vger.kernel.org
>> Link: https://lore.kernel.org/r/20211210161025.3287927-1-sr@denx.de
>> ---
>>   drivers/pci/msi.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
>> index 48e3f4e..6748cf9 100644
>> --- a/drivers/pci/msi.c
>> +++ b/drivers/pci/msi.c
>> @@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>   		goto out_disable;
>>   	}
>>   
>> -	/* Ensure that all table entries are masked. */
>> -	msix_mask_all(base, tsize);
>> -
>>   	ret = msix_setup_entries(dev, base, entries, nvec, affd);
>>   	if (ret)
>>   		goto out_disable;
>> @@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
>>   	/* Set MSI-X enabled bits and unmask the function */
>>   	pci_intx_for_msi(dev, 0);
>>   	dev->msix_enabled = 1;
>> +
>> +	/*
>> +	 * Ensure that all table entries are masked to prevent
>> +	 * stale entries from firing in a crash kernel.
>> +	 *
>> +	 * Done late to deal with a broken Marvell NVME device
>> +	 * which takes the MSI-X mask bits into account even
>> +	 * when MSI-X is disabled, which prevents MSI delivery.
>> +	 */
>> +	msix_mask_all(base, tsize);
>>   	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
>>   
>>   	pcibios_free_irq(dev);
> 
> We've had reports of issues with AWS m4 instances, which use Intel 82559 VFs
> for networking (ixgbevf) with MSI-X interrupts, which I've bisected down to
> this commit. Since this commit these VMs no longer have any network connectivity
> and so fail to boot. This occurs with both 5.15 and 5.10 kernels, reverting the
> backport of this commit restores networking.
> 
> Do you have any suggestions of how this can be resolved other than a revert?
> 
> Here's the full bisect log:
> 
> $ git bisect log
> git bisect start
> # good: [4e8c680af6d51ba9315e31bd4f7599e080561a2d] Linux 5.15.7
> git bisect good 4e8c680af6d51ba9315e31bd4f7599e080561a2d
> # bad: [efe3167e52a5833ec20ee6214be9b99b378564a8] Linux 5.15.27
> git bisect bad efe3167e52a5833ec20ee6214be9b99b378564a8
> # bad: [63dcc388662c3562de94d69bfa771ae4cd29b79f] Linux 5.15.16
> git bisect bad 63dcc388662c3562de94d69bfa771ae4cd29b79f
> # good: [57dcae4a8b93271c4e370920ea0dbb94a0215d30] Linux 5.15.10
> git bisect good 57dcae4a8b93271c4e370920ea0dbb94a0215d30
> # bad: [25960cafa06e6fcd830e6c792e6a7de68c1e25ed] Linux 5.15.12
> git bisect bad 25960cafa06e6fcd830e6c792e6a7de68c1e25ed
> # bad: [fb6ad5cb3b6745e7bffc5fe19b130f3594375634] Linux 5.15.11
> git bisect bad fb6ad5cb3b6745e7bffc5fe19b130f3594375634
> # good: [257b3bb16634fd936129fe2f57a91594a75b8751] drm/amd/pm: fix a potential gpu_metrics_table memory leak
> git bisect good 257b3bb16634fd936129fe2f57a91594a75b8751
> # bad: [bbdaa7a48f465a2ee76d65839caeda08af1ef3b2] btrfs: fix double free of anon_dev after failure to create subvolume
> git bisect bad bbdaa7a48f465a2ee76d65839caeda08af1ef3b2
> # good: [c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6] bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
> git bisect good c8e8e6f4108e4c133b09f31f6cc7557ee6df3bb6
> # bad: [5cb5c3e1b184da9f49e46119a0e506519fc58185] usb: xhci: Extend support for runtime power management for AMD's Yellow carp.
> git bisect bad 5cb5c3e1b184da9f49e46119a0e506519fc58185
> # good: [e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4] tty: n_hdlc: make n_hdlc_tty_wakeup() asynchronous
> git bisect good e7a8a261bab07ec1ed5f5bb990aacc4de9c08eb4
> # good: [4df1af29930b03d61fb774bfaa5100dbdb964628] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error
> git bisect good 4df1af29930b03d61fb774bfaa5100dbdb964628
> # bad: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success
> git bisect bad d8888cdabedf353ab9b5a6af75f70bf341a3e7df
> # first bad commit: [d8888cdabedf353ab9b5a6af75f70bf341a3e7df] PCI/MSI: Mask MSI-X vectors only on success

I've added Dusty to Cc, as he (and others) already have been dealing
with this issue AFAICT.

Dusty, could you perhaps chime in with the latest status? AFAIU, it's
related to potential issues with the Xen version used on these systems?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 16:49     ` Stefan Roese
@ 2022-03-14 17:04       ` Dusty Mabe
  2022-03-14 20:29         ` Jeremi Piotrowski
  0 siblings, 1 reply; 17+ messages in thread
From: Dusty Mabe @ 2022-03-14 17:04 UTC (permalink / raw)
  To: Stefan Roese, Jeremi Piotrowski, linux-kernel
  Cc: Thomas Gleixner, linux-pci, Bjorn Helgaas, Michal Simek,
	Marek Vasut, stable, x86, maz



On 3/14/22 12:49, Stefan Roese wrote:

> I've added Dusty to Cc, as he (and others) already have been dealing
> with this issue AFAICT.
> 
> Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> related to potential issues with the Xen version used on these systems?

Thanks Stefan,

Yes. My understanding is that the issue is because AWS is using older versions
of Xen. They are in the process of updating their fleet to a newer version of
Xen so the change introduced with Stefan's commit isn't an issue any longer.

I think the changes are scheduled to be completed in the next 10-12 weeks. For
now we are carrying a revert in the Fedora Kernel.

You can follow this Fedora CoreOS issue if you'd like to know more about when
the change lands in their backend. We work closely with one of their partner
engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066

Dusty


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 17:04       ` Dusty Mabe
@ 2022-03-14 20:29         ` Jeremi Piotrowski
  2022-04-27  7:59           ` Salvatore Bonaccorso
  0 siblings, 1 reply; 17+ messages in thread
From: Jeremi Piotrowski @ 2022-03-14 20:29 UTC (permalink / raw)
  To: Dusty Mabe
  Cc: Stefan Roese, linux-kernel, Thomas Gleixner, linux-pci,
	Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz

On Mon, Mar 14, 2022 at 01:04:55PM -0400, Dusty Mabe wrote:
> 
> 
> On 3/14/22 12:49, Stefan Roese wrote:
> 
> > I've added Dusty to Cc, as he (and others) already have been dealing
> > with this issue AFAICT.
> > 
> > Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> > related to potential issues with the Xen version used on these systems?
> 
> Thanks Stefan,
> 
> Yes. My understanding is that the issue is because AWS is using older versions
> of Xen. They are in the process of updating their fleet to a newer version of
> Xen so the change introduced with Stefan's commit isn't an issue any longer.
> 
> I think the changes are scheduled to be completed in the next 10-12 weeks. For
> now we are carrying a revert in the Fedora Kernel.
> 
> You can follow this Fedora CoreOS issue if you'd like to know more about when
> the change lands in their backend. We work closely with one of their partner
> engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066
> 
> Dusty

Thanks for the link and explanation. What a fun coincidence that we hit this in
Flatcar Container Linux as well. We've reverted the commit in our kernels for
the time being, and will track that issue.

Thanks,
Jeremi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-03-14 20:29         ` Jeremi Piotrowski
@ 2022-04-27  7:59           ` Salvatore Bonaccorso
  2022-04-27 17:35             ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Salvatore Bonaccorso @ 2022-04-27  7:59 UTC (permalink / raw)
  To: Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, Thomas Gleixner,
	linux-pci, Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz

Hi,

On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
> On Mon, Mar 14, 2022 at 01:04:55PM -0400, Dusty Mabe wrote:
> > 
> > 
> > On 3/14/22 12:49, Stefan Roese wrote:
> > 
> > > I've added Dusty to Cc, as he (and others) already have been dealing
> > > with this issue AFAICT.
> > > 
> > > Dusty, could you perhaps chime in with the latest status? AFAIU, it's
> > > related to potential issues with the Xen version used on these systems?
> > 
> > Thanks Stefan,
> > 
> > Yes. My understanding is that the issue is because AWS is using older versions
> > of Xen. They are in the process of updating their fleet to a newer version of
> > Xen so the change introduced with Stefan's commit isn't an issue any longer.
> > 
> > I think the changes are scheduled to be completed in the next 10-12 weeks. For
> > now we are carrying a revert in the Fedora Kernel.
> > 
> > You can follow this Fedora CoreOS issue if you'd like to know more about when
> > the change lands in their backend. We work closely with one of their partner
> > engineers and he keeps us updated. https://github.com/coreos/fedora-coreos-tracker/issues/1066
> > 
> > Dusty
> 
> Thanks for the link and explanation. What a fun coincidence that we hit this in
> Flatcar Container Linux as well. We've reverted the commit in our kernels for
> the time being, and will track that issue.

Does someone knows here on current state of the AWS instances using
the older Xen version which causes the issues?

AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
MSI-X vectors only on success") as it fixed a bug. Now several
downstream distros do carry a revert of this commit, which I believe
is an unfortunate situation and wonder if this can be addressed
upstream to deal with the AWS m4.large instance issues.

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-27  7:59           ` Salvatore Bonaccorso
@ 2022-04-27 17:35             ` Thomas Gleixner
  2022-04-28 13:48               ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2022-04-27 17:35 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, x86, maz, Andrew Cooper,
	Juergen Gross

On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
>
> Does someone knows here on current state of the AWS instances using
> the older Xen version which causes the issues?
>
> AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
> MSI-X vectors only on success") as it fixed a bug. Now several
> downstream distros do carry a revert of this commit, which I believe
> is an unfortunate situation and wonder if this can be addressed
> upstream to deal with the AWS m4.large instance issues.

The problem is that except for a bisect result we've not seen much
information which might help to debug and analyze the problem.

The guest uses MSI-X on that NIC:

 Capabilities: [70] MSI-X: Enable+ Count=3 Masked-

So looking at the commit in question:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e47b29..6748cf9d7d90 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 		goto out_disable;
 	}
 
-	/* Ensure that all table entries are masked. */
-	msix_mask_all(base, tsize);
-
 	ret = msix_setup_entries(dev, base, entries, nvec, affd);
 	if (ret)
 		goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
 	/* Set MSI-X enabled bits and unmask the function */
 	pci_intx_for_msi(dev, 0);
 	dev->msix_enabled = 1;
+
+	/*
+	 * Ensure that all table entries are masked to prevent
+	 * stale entries from firing in a crash kernel.
+	 *
+	 * Done late to deal with a broken Marvell NVME device
+	 * which takes the MSI-X mask bits into account even
+	 * when MSI-X is disabled, which prevents MSI delivery.
+	 */
+	msix_mask_all(base, tsize);
 	pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);

IOW, it moves the invocation of msix_mask_all() into the success
path.

As the device uses MSI-X this change does not make any difference from a
hardware perspective simply because _all_ MSI-X interrupts are masked
via the CTRL register already. And it does not matter whether the kernel
masks them individually _before_ or _after_ the allocation. At least not
on real hardware and on a sane emulation.

Now this is XEN and neither real hardware nor sane emulation.

It must be a XEN_HVM guest because XEN_PV guests disable PCI/MSI[-X]
completely which makes the invocation of msix_mask_all() a NOP.

If it's not a XEN_HVM guest, then you can stop reading further as I'm
unable to decode why moving a NOP makes a difference. That belongs in to
the realm of voodoo, but then XEN is voodoo at least for me. :)

XEN guests do not use the common PCI mask/unmask machinery which would
unmask the interrupt on request_irq().

So I assume that the following happens:

Guest                     Hypervisor

msix_capabilities_init()
        ....
        alloc_irq()
           xen_magic()  -> alloc_msix_interrupt()
                           request_irq()

        msix_mask_all() -> trap
                             do_magic()
request_irq()
   unmask()
     xen_magic()
       unmask_evtchn()  -> do_more_magic()

So I assume further that msix_mask_all() actually is able to mask the
interrupts in the hardware (ctrl word of the vector table) despite the
hypervisor having allocated and requested the interrupt already.

Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
XEN_PV does. I can only assume the answer is voodoo...

Maybe the XEN people have some more enlightened answers to that.

Thanks,

        tglx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-27 17:35             ` Thomas Gleixner
@ 2022-04-28 13:48               ` Thomas Gleixner
  2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
  2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
  0 siblings, 2 replies; 17+ messages in thread
From: Thomas Gleixner @ 2022-04-28 13:48 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, x86, maz, Andrew Cooper,
	Juergen Gross

On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> XEN guests do not use the common PCI mask/unmask machinery which would
> unmask the interrupt on request_irq().
>
> So I assume that the following happens:
>
> Guest                     Hypervisor
>
> msix_capabilities_init()
>         ....
>         alloc_irq()
>            xen_magic()  -> alloc_msix_interrupt()
>                            request_irq()
>
>         msix_mask_all() -> trap
>                              do_magic()
> request_irq()
>    unmask()
>      xen_magic()
>        unmask_evtchn()  -> do_more_magic()
>
> So I assume further that msix_mask_all() actually is able to mask the
> interrupts in the hardware (ctrl word of the vector table) despite the
> hypervisor having allocated and requested the interrupt already.
>
> Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> XEN_PV does. I can only assume the answer is voodoo...
>
> Maybe the XEN people have some more enlightened answers to that.

So I was talking to Juergen about this and he agrees, that for the case
where a XEN HVM guest uses the PIRQ/Eventchannel mechanism PCI/MSI[-X]
masking should be disabled like it is done for XEN PV.

Why the hypervisor grants the mask write is still mysterious, but I
leave that to the folks who can master the XEN voodoo.

I'll send out a patch in minute.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
  2022-04-28 13:48               ` Thomas Gleixner
@ 2022-04-28 13:50                 ` Thomas Gleixner
  2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2022-04-28 13:50 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Jeremi Piotrowski
  Cc: Dusty Mabe, Stefan Roese, linux-kernel, linux-pci, Bjorn Helgaas,
	Michal Simek, Marek Vasut, x86, maz, Andrew Cooper,
	Juergen Gross

When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Fixes: 809f9267bbab ("xen: map MSIs into pirqs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 arch/x86/pci/xen.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -467,7 +467,6 @@ static __init void xen_setup_pci_msi(voi
 		else
 			xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
 		xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
-		pci_msi_ignore_mask = 1;
 	} else if (xen_hvm_domain()) {
 		xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
 		xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
@@ -481,6 +480,11 @@ static __init void xen_setup_pci_msi(voi
 	 * in allocating the native domain and never use it.
 	 */
 	x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
+	/*
+	 * With XEN PIRQ/Eventchannels in use PCI/MSI[-X] masking is solely
+	 * controlled by the hypervisor.
+	 */
+	pci_msi_ignore_mask = 1;
 }
 
 #else /* CONFIG_PCI_MSI */

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-28 13:48               ` Thomas Gleixner
  2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
@ 2022-04-28 18:43                 ` Salvatore Bonaccorso
  2022-04-29  6:37                   ` Salvatore Bonaccorso
  1 sibling, 1 reply; 17+ messages in thread
From: Salvatore Bonaccorso @ 2022-04-28 18:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeremi Piotrowski, Dusty Mabe, Stefan Roese, linux-kernel,
	linux-pci, Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz,
	Andrew Cooper, Juergen Gross, Noah Meyerhans

Hi Thomas,

On Thu, Apr 28, 2022 at 03:48:03PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> > On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> > XEN guests do not use the common PCI mask/unmask machinery which would
> > unmask the interrupt on request_irq().
> >
> > So I assume that the following happens:
> >
> > Guest                     Hypervisor
> >
> > msix_capabilities_init()
> >         ....
> >         alloc_irq()
> >            xen_magic()  -> alloc_msix_interrupt()
> >                            request_irq()
> >
> >         msix_mask_all() -> trap
> >                              do_magic()
> > request_irq()
> >    unmask()
> >      xen_magic()
> >        unmask_evtchn()  -> do_more_magic()
> >
> > So I assume further that msix_mask_all() actually is able to mask the
> > interrupts in the hardware (ctrl word of the vector table) despite the
> > hypervisor having allocated and requested the interrupt already.
> >
> > Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> > really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> > XEN_PV does. I can only assume the answer is voodoo...
> >
> > Maybe the XEN people have some more enlightened answers to that.
> 
> So I was talking to Juergen about this and he agrees, that for the case
> where a XEN HVM guest uses the PIRQ/Eventchannel mechanism PCI/MSI[-X]
> masking should be disabled like it is done for XEN PV.
> 
> Why the hypervisor grants the mask write is still mysterious, but I
> leave that to the folks who can master the XEN voodoo.
> 
> I'll send out a patch in minute.

Thank you. We are having Noah Meyerhans now testing the patch and he
will report back if it works (Cc'ed here now).

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success
  2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
@ 2022-04-29  6:37                   ` Salvatore Bonaccorso
  0 siblings, 0 replies; 17+ messages in thread
From: Salvatore Bonaccorso @ 2022-04-29  6:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeremi Piotrowski, Dusty Mabe, Stefan Roese, linux-kernel,
	linux-pci, Bjorn Helgaas, Michal Simek, Marek Vasut, x86, maz,
	Andrew Cooper, Juergen Gross, Noah Meyerhans

Hi Thomas,

On Thu, Apr 28, 2022 at 08:43:10PM +0200, Salvatore Bonaccorso wrote:
> Hi Thomas,
> 
> On Thu, Apr 28, 2022 at 03:48:03PM +0200, Thomas Gleixner wrote:
> > On Wed, Apr 27 2022 at 19:35, Thomas Gleixner wrote:
> > > On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> > > XEN guests do not use the common PCI mask/unmask machinery which would
> > > unmask the interrupt on request_irq().
> > >
> > > So I assume that the following happens:
> > >
> > > Guest                     Hypervisor
> > >
> > > msix_capabilities_init()
> > >         ....
> > >         alloc_irq()
> > >            xen_magic()  -> alloc_msix_interrupt()
> > >                            request_irq()
> > >
> > >         msix_mask_all() -> trap
> > >                              do_magic()
> > > request_irq()
> > >    unmask()
> > >      xen_magic()
> > >        unmask_evtchn()  -> do_more_magic()
> > >
> > > So I assume further that msix_mask_all() actually is able to mask the
> > > interrupts in the hardware (ctrl word of the vector table) despite the
> > > hypervisor having allocated and requested the interrupt already.
> > >
> > > Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
> > > really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
> > > XEN_PV does. I can only assume the answer is voodoo...
> > >
> > > Maybe the XEN people have some more enlightened answers to that.
> > 
> > So I was talking to Juergen about this and he agrees, that for the case
> > where a XEN HVM guest uses the PIRQ/Eventchannel mechanism PCI/MSI[-X]
> > masking should be disabled like it is done for XEN PV.
> > 
> > Why the hypervisor grants the mask write is still mysterious, but I
> > leave that to the folks who can master the XEN voodoo.
> > 
> > I'll send out a patch in minute.
> 
> Thank you. We are having Noah Meyerhans now testing the patch and he
> will report back if it works (Cc'ed here now).

To confirm: Noah tested the patch on various different Xen (and other
VM configurations) and works well.

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-04-29  6:37 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-10 16:10 [RFC PATCH] PCI/MSI: Only mask all MSI-X entries when MSI-X is used Stefan Roese
2021-12-11 10:17 ` Thomas Gleixner
2021-12-11 13:58   ` Stefan Roese
2021-12-11 21:02     ` Thomas Gleixner
2021-12-14 11:10       ` Stefan Roese
2021-12-14 12:28       ` [tip: irq/urgent] PCI/MSI: Clear PCI_MSIX_FLAGS_MASKALL on error tip-bot2 for Thomas Gleixner
2021-12-14 12:28 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success tip-bot2 for Stefan Roese
2022-03-14 16:36   ` Jeremi Piotrowski
2022-03-14 16:49     ` Stefan Roese
2022-03-14 17:04       ` Dusty Mabe
2022-03-14 20:29         ` Jeremi Piotrowski
2022-04-27  7:59           ` Salvatore Bonaccorso
2022-04-27 17:35             ` Thomas Gleixner
2022-04-28 13:48               ` Thomas Gleixner
2022-04-28 13:50                 ` [PATCH] x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests Thomas Gleixner
2022-04-28 18:43                 ` [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success Salvatore Bonaccorso
2022-04-29  6:37                   ` Salvatore Bonaccorso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).