Linux-PCI Archive on lore.kernel.org
 help / Atom feed
* [PATCH] PCI: dwc: fix MSI IRQ handler ordering
@ 2019-01-04 21:45 Stephen Warren
  2019-01-04 22:45 ` Trent Piepho
  2019-01-07  9:36 ` Gustavo Pimentel
  0 siblings, 2 replies; 8+ messages in thread
From: Stephen Warren @ 2019-01-04 21:45 UTC (permalink / raw)
  To: Jingoo Han, Gustavo Pimentel
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, linux-pci, Stephen Warren,
	Arnd Bergmann, Faiz Abbas, Harro Haan, Jingoo Han, Joao Pinto,
	Juergen Beisert, Marek Vasut, Matthias Mann, Mohit Kumar,
	Pratyush Anand, Richard Zhu, Sean Cross, Shawn Guo,
	Siva Reddy Kallam, Srikanth T Shivanand, Tim Harvey

From: Stephen Warren <swarren@nvidia.com>

The current code does this when handling MSI IRQs:

a) Process the irq.
b) Clear the latched IRQ status.

If a new IRQ occurs any time after (a) has read the IRQ status for the
last time and before (b), it will be lost. For example, this occurs in
practice when using a Marvell 9171 AHCI controller with NCQ enabled;
many command timeouts occur with certain disk access patterns.

Fix the code to do the following instead, so that if any new IRQs are
raised during the processing of the IRQ, the IRQ status is not cleared,
so that the IRQ is not lost.

a) Clear the latched IRQ status.
b) Process the IRQ.

This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
status after it is handled, not before")

This change re-applies commit ca1658921b63 ("PCI: designware: Fix
missing MSI IRQs")

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Faiz Abbas <faiz_abbas@ti.com>
Cc: Harro Haan <hrhaan@gmail.com>
Cc: Jingoo Han <jg1.han@samsung.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Juergen Beisert <jbe@pengutronix.de>
Cc: Marek Vasut <marex@denx.de>
Cc: Matthias Mann <m.mann@arkona-technologies.de>
Cc: Mohit Kumar <mohit.kumar@st.com>
Cc: Pratyush Anand <pratyush.anand@st.com>
Cc: Richard Zhu <hong-xing.zhu@freescale.com>
Cc: Sean Cross <xobs@kosagi.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
Cc: Tim Harvey <tharvey@gateworks.com>
---
Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
However, the exact same code structure is present in mainline and I have
no reason to believe the problem would not reproduce there. I have
compile tested but not runtime tested it in mainline, since my board is
not yet supported in mainline.
---
 drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 692dd1b264fb..7fd6c56a6f35 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
 			irq = irq_find_mapping(pp->irq_domain,
 					       (i * MAX_MSI_IRQS_PER_CTRL) +
 					       pos);
-			generic_handle_irq(irq);
 			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
 						(i * MSI_REG_CTRL_BLOCK_SIZE),
 					    4, 1 << pos);
+			generic_handle_irq(irq);
 			pos++;
 		}
 	}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-04 21:45 [PATCH] PCI: dwc: fix MSI IRQ handler ordering Stephen Warren
@ 2019-01-04 22:45 ` Trent Piepho
  2019-01-04 23:00   ` Stephen Warren
  2019-01-07  9:36 ` Gustavo Pimentel
  1 sibling, 1 reply; 8+ messages in thread
From: Trent Piepho @ 2019-01-04 22:45 UTC (permalink / raw)
  To: jingoohan1, swarren, gustavo.pimentel
  Cc: jpinto, marex, shawn.guo, pratyush.anand, jg1.han, tharvey,
	lorenzo.pieralisi, swarren, m.mann, faiz_abbas, mohit.kumar,
	xobs, siva.kallam, hrhaan, jbe, hong-xing.zhu, ts.srikanth,
	linux-pci, bhelgaas, arnd

On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> The current code does this when handling MSI IRQs:
> 
> a) Process the irq.
> b) Clear the latched IRQ status.
> 
> If a new IRQ occurs any time after (a) has read the IRQ status for the
> last time and before (b), it will be lost. For example, this occurs in

There was a patch series to fix this bug, sent in October, stretching
to December.

See "PCI: dwc: Fix interrupt race in when handling MSI"

And "PCI: designware: Move interrupt acking into the proper callback"

The result was more code changes, which in the end produce the same
order of operations as is done here to fix the race.  But it uses the
irq framework correctly.

I do not think this problem was introduced until 4.14, so the nvidia
4.9 must have additional patches if this is observed there.  I'm sure
it was not present in 4.12, as we used that, and did not see the issue
until updating to 4.16.

The current fix (for 4.21?) depends on significant restructuring that
was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
allocation to IRQ domains hierarchical API".

I think my original patch ("Fix interrupt race ..") or your patch here,
which is basically the same, should be ported to 4.14 stable.  But
there are some other opinions on that.  It's not clear to me if the
hierarchical domain stuff would be back ported to stable series.  It
does not seem to me to be a good idea to do so.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-04 22:45 ` Trent Piepho
@ 2019-01-04 23:00   ` Stephen Warren
  2019-01-04 23:23     ` Stephen Warren
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Warren @ 2019-01-04 23:00 UTC (permalink / raw)
  To: Trent Piepho, jingoohan1, gustavo.pimentel
  Cc: jpinto, marex, shawn.guo, tharvey, lorenzo.pieralisi, swarren,
	m.mann, faiz_abbas, xobs, hrhaan, jbe, linux-pci, bhelgaas, arnd

On 1/4/19 3:45 PM, Trent Piepho wrote:
> On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> The current code does this when handling MSI IRQs:
>>
>> a) Process the irq.
>> b) Clear the latched IRQ status.
>>
>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>> last time and before (b), it will be lost. For example, this occurs in
> 
> There was a patch series to fix this bug, sent in October, stretching
> to December.
> 
> See "PCI: dwc: Fix interrupt race in when handling MSI"
> 
> And "PCI: designware: Move interrupt acking into the proper callback"
> 
> The result was more code changes, which in the end produce the same
> order of operations as is done here to fix the race.  But it uses the
> irq framework correctly.
> 
> I do not think this problem was introduced until 4.14, so the nvidia
> 4.9 must have additional patches if this is observed there.  I'm sure
> it was not present in 4.12, as we used that, and did not see the issue
> until updating to 4.16.

Yes, I took the DWC driver from our 4.14 kernel and dropped it into the 
4.9 kernel, so that's why the issue shows up in 4.9 for us.

> The current fix (for 4.21?) depends on significant restructuring that
> was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
> allocation to IRQ domains hierarchical API".
> 
> I think my original patch ("Fix interrupt race ..") or your patch here,
> which is basically the same, should be ported to 4.14 stable.  But
> there are some other opinions on that.  It's not clear to me if the
> hierarchical domain stuff would be back ported to stable series.  It
> does not seem to me to be a good idea to do so.

Hmm. Well I guess I'll go for the patch I posted in our downstream 
kernels, since back-porting a bunch of not-yet-available restructuring 
to our ancient kernels doesn't sound pleasant:-) But I'll go and take a 
quick look at the other patches you mentioned just in case. Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-04 23:00   ` Stephen Warren
@ 2019-01-04 23:23     ` Stephen Warren
  2019-01-05  0:15       ` Trent Piepho
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Warren @ 2019-01-04 23:23 UTC (permalink / raw)
  To: Trent Piepho, gustavo.pimentel, lorenzo.pieralisi
  Cc: jingoohan1, jpinto, marex, shawn.guo, tharvey, swarren, m.mann,
	faiz_abbas, xobs, hrhaan, jbe, linux-pci, bhelgaas, arnd

On 1/4/19 4:00 PM, Stephen Warren wrote:
> On 1/4/19 3:45 PM, Trent Piepho wrote:
>> On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> The current code does this when handling MSI IRQs:
>>>
>>> a) Process the irq.
>>> b) Clear the latched IRQ status.
>>>
>>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>>> last time and before (b), it will be lost. For example, this occurs in
>>
>> There was a patch series to fix this bug, sent in October, stretching
>> to December.
>>
>> See "PCI: dwc: Fix interrupt race in when handling MSI"
>>
>> And "PCI: designware: Move interrupt acking into the proper callback"
>>
>> The result was more code changes, which in the end produce the same
>> order of operations as is done here to fix the race.  But it uses the
>> irq framework correctly.
>>
>> I do not think this problem was introduced until 4.14, so the nvidia
>> 4.9 must have additional patches if this is observed there.  I'm sure
>> it was not present in 4.12, as we used that, and did not see the issue
>> until updating to 4.16.
> 
> Yes, I took the DWC driver from our 4.14 kernel and dropped it into the 
> 4.9 kernel, so that's why the issue shows up in 4.9 for us.
> 
>> The current fix (for 4.21?) depends on significant restructuring that
>> was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
>> allocation to IRQ domains hierarchical API".
>>
>> I think my original patch ("Fix interrupt race ..") or your patch here,
>> which is basically the same, should be ported to 4.14 stable.  But
>> there are some other opinions on that.  It's not clear to me if the
>> hierarchical domain stuff would be back ported to stable series.  It
>> does not seem to me to be a good idea to do so.
> 
> Hmm. Well I guess I'll go for the patch I posted in our downstream 
> kernels, since back-porting a bunch of not-yet-available restructuring 
> to our ancient kernels doesn't sound pleasant:-) But I'll go and take a 
> quick look at the other patches you mentioned just in case. Thanks!

So I went and read the thread for "PCI: dwc: Fix interrupt race in when 
handling MSI" and see:

Vignesh R wrote:
 > Lorenzo Pieralisi wrote:
>> AFAICS:
>> 
>> 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled,
>> not before")
>> 
>> was fixing a bug, causing "timeouts on some wireless lan cards", we want
>> to understand what the problem is, fix it once for all on all DWC
>> based systems.
>> 
> 
> That issue was root caused to be due to a HW errata in dra7xx DWC
> wrapper which requires a special way of handling MSI interrupts at
> wrapper level. More info in this thread:
> https://www.spinics.net/lists/linux-pci/msg70462.html
> 
> Unfortunately, commit 8c934095fa2f did not fix WLAN issue in longer
> tests and also broke PCIe USB cards. Therefore, it makes sense to revert
> 8c934095fa2f
> 
> I am working on patches fix dra7xx wrapper for WLAN card issue.

So it seems quite clear that the correct course of action is to 
immediately revert commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt 
status after it is handled, not before") (or apply my or Trent's patches 
which are effectively reverts) since it (a) attempts to fix a bug (in 
the core DWC driver) that doesn't actually exist (the bug is in the DRA 
HW and is being fixed in the DRA wrapper driver), and (b) has many 
reports that the change causes regressions; I'm at least the 3rd or 4th 
person to confirm this now (Faiz originally, then later Trent, Vignesh, 
myself).

Now, whether there's some cleanup or additional fixes needed beyond the 
simple revert is another question I currently have no insight into, but 
let's at least get back to a driver without that worked well for people 
for years, even if there's a theoretical issue to be fixed that nobody 
hit in practice.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-04 23:23     ` Stephen Warren
@ 2019-01-05  0:15       ` Trent Piepho
  0 siblings, 0 replies; 8+ messages in thread
From: Trent Piepho @ 2019-01-05  0:15 UTC (permalink / raw)
  To: lorenzo.pieralisi, swarren, gustavo.pimentel
  Cc: jpinto, marex, shawn.guo, jingoohan1, tharvey, m.mann, swarren,
	faiz_abbas, xobs, hrhaan, jbe, linux-pci, bhelgaas, arnd

On Fri, 2019-01-04 at 16:23 -0700, Stephen Warren wrote:
> On 1/4/19 4:00 PM, Stephen Warren wrote:
> > 
> > 
> > Hmm. Well I guess I'll go for the patch I posted in our downstream 
> > kernels, since back-porting a bunch of not-yet-available restructuring 
> > to our ancient kernels doesn't sound pleasant:-) But I'll go and take a 
> > quick look at the other patches you mentioned just in case. Thanks!
> 
> So I went and read the thread for "PCI: dwc: Fix interrupt race in when 
> handling MSI" and see:
> 
> Vignesh R wrote:
>  > Lorenzo Pieralisi wrote:
> > > AFAICS:
> > > 
> > > 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled,
> > > not before")
> > > 
> > > was fixing a bug, causing "timeouts on some wireless lan cards", we want
> > > to understand what the problem is, fix it once for all on all DWC
> > > based systems.
> > > 
> So it seems quite clear that the correct course of action is to 
> immediately revert commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt 
> status after it is handled, not before") (or apply my or Trent's patches 
> which are effectively reverts) since it (a) attempts to fix a bug (in 
> the core DWC driver) that doesn't actually exist (the bug is in the DRA 
> HW and is being fixed in the DRA wrapper driver), and (b) has many 
> reports that the change causes regressions; I'm at least the 3rd or 4th 
> person to confirm this now (Faiz originally, then later Trent, Vignesh, 
> myself).

It also seemed clear to me!  But if you check the thread fully, this
was very forcefully rejected.

> Now, whether there's some cleanup or additional fixes needed beyond the 
> simple revert is another question I currently have no insight into, but 
> let's at least get back to a driver without that worked well for people 
> for years, even if there's a theoretical issue to be fixed that nobody 
> hit in practice.

I do not think a real or theoretical mechanism for incorrect behavior
was identified beyond the revert we know of.  More of a case of pieces
not being where the framework intended them to be.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-04 21:45 [PATCH] PCI: dwc: fix MSI IRQ handler ordering Stephen Warren
  2019-01-04 22:45 ` Trent Piepho
@ 2019-01-07  9:36 ` Gustavo Pimentel
  2019-01-07 17:16   ` Stephen Warren
  1 sibling, 1 reply; 8+ messages in thread
From: Gustavo Pimentel @ 2019-01-07  9:36 UTC (permalink / raw)
  To: Stephen Warren, Jingoo Han, Gustavo Pimentel
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, linux-pci, Stephen Warren,
	Arnd Bergmann, Faiz Abbas, Harro Haan, Jingoo Han, Joao Pinto,
	Juergen Beisert, Marek Vasut, Matthias Mann, Mohit Kumar,
	Pratyush Anand, Richard Zhu, Sean Cross, Shawn Guo,
	Siva Reddy Kallam, Srikanth T Shivanand, Tim Harvey

Hi Stephen,

On 04/01/2019 21:45, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
>
> The current code does this when handling MSI IRQs:
>
> a) Process the irq.
> b) Clear the latched IRQ status.
>
> If a new IRQ occurs any time after (a) has read the IRQ status for the
> last time and before (b), it will be lost. For example, this occurs in
> practice when using a Marvell 9171 AHCI controller with NCQ enabled;
> many command timeouts occur with certain disk access patterns.
>
> Fix the code to do the following instead, so that if any new IRQs are
> raised during the processing of the IRQ, the IRQ status is not cleared,
> so that the IRQ is not lost.
>
> a) Clear the latched IRQ status.
> b) Process the IRQ.
>
> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
> status after it is handled, not before")
>
> This change re-applies commit ca1658921b63 ("PCI: designware: Fix
> missing MSI IRQs")
>
> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Faiz Abbas <faiz_abbas@ti.com>
> Cc: Harro Haan <hrhaan@gmail.com>
> Cc: Jingoo Han <jg1.han@samsung.com>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Juergen Beisert <jbe@pengutronix.de>
> Cc: Marek Vasut <marex@denx.de>
> Cc: Matthias Mann <m.mann@arkona-technologies.de>
> Cc: Mohit Kumar <mohit.kumar@st.com>
> Cc: Pratyush Anand <pratyush.anand@st.com>
> Cc: Richard Zhu <hong-xing.zhu@freescale.com>
> Cc: Sean Cross <xobs@kosagi.com>
> Cc: Shawn Guo <shawn.guo@linaro.org>
> Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
> Cc: Tim Harvey <tharvey@gateworks.com>
> ---
> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
> However, the exact same code structure is present in mainline and I have
> no reason to believe the problem would not reproduce there. I have
> compile tested but not runtime tested it in mainline, since my board is
> not yet supported in mainline.
> ---
>  drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index 692dd1b264fb..7fd6c56a6f35 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>  			irq = irq_find_mapping(pp->irq_domain,
>  					       (i * MAX_MSI_IRQS_PER_CTRL) +
>  					       pos);
> -			generic_handle_irq(irq);
>  			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>  						(i * MSI_REG_CTRL_BLOCK_SIZE),
>  					    4, 1 << pos);
> +			generic_handle_irq(irq);
>  			pos++;
>  		}
>  	}

This fix was already suggested by Trent Piepho, however, after review by Marc
Zyngier and some information obtained from Synopsys IP development team, the
following set of patches was generated to correct the problem.

830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling")
fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt")
3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback")

Regards,

Gustavo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-07  9:36 ` Gustavo Pimentel
@ 2019-01-07 17:16   ` Stephen Warren
  2019-01-07 18:09     ` Gustavo Pimentel
  0 siblings, 1 reply; 8+ messages in thread
From: Stephen Warren @ 2019-01-07 17:16 UTC (permalink / raw)
  To: Gustavo Pimentel
  Cc: Jingoo Han, Lorenzo Pieralisi, Bjorn Helgaas, linux-pci,
	Stephen Warren, Arnd Bergmann, Faiz Abbas, Harro Haan,
	Jingoo Han, Joao Pinto, Juergen Beisert, Marek Vasut,
	Matthias Mann, Mohit Kumar, Pratyush Anand, Richard Zhu,
	Sean Cross, Shawn Guo, Siva Reddy Kallam, Srikanth T Shivanand,
	Tim Harvey

On 1/7/19 2:36 AM, Gustavo Pimentel wrote:
> Hi Stephen,
> 
> On 04/01/2019 21:45, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> The current code does this when handling MSI IRQs:
>>
>> a) Process the irq.
>> b) Clear the latched IRQ status.
>>
>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>> last time and before (b), it will be lost. For example, this occurs in
>> practice when using a Marvell 9171 AHCI controller with NCQ enabled;
>> many command timeouts occur with certain disk access patterns.
>>
>> Fix the code to do the following instead, so that if any new IRQs are
>> raised during the processing of the IRQ, the IRQ status is not cleared,
>> so that the IRQ is not lost.
>>
>> a) Clear the latched IRQ status.
>> b) Process the IRQ.
>>
>> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
>> status after it is handled, not before")
>>
>> This change re-applies commit ca1658921b63 ("PCI: designware: Fix
>> missing MSI IRQs")
>>
>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Faiz Abbas <faiz_abbas@ti.com>
>> Cc: Harro Haan <hrhaan@gmail.com>
>> Cc: Jingoo Han <jg1.han@samsung.com>
>> Cc: Joao Pinto <jpinto@synopsys.com>
>> Cc: Juergen Beisert <jbe@pengutronix.de>
>> Cc: Marek Vasut <marex@denx.de>
>> Cc: Matthias Mann <m.mann@arkona-technologies.de>
>> Cc: Mohit Kumar <mohit.kumar@st.com>
>> Cc: Pratyush Anand <pratyush.anand@st.com>
>> Cc: Richard Zhu <hong-xing.zhu@freescale.com>
>> Cc: Sean Cross <xobs@kosagi.com>
>> Cc: Shawn Guo <shawn.guo@linaro.org>
>> Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
>> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
>> Cc: Tim Harvey <tharvey@gateworks.com>
>> ---
>> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
>> However, the exact same code structure is present in mainline and I have
>> no reason to believe the problem would not reproduce there. I have
>> compile tested but not runtime tested it in mainline, since my board is
>> not yet supported in mainline.
>> ---
>>   drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>> index 692dd1b264fb..7fd6c56a6f35 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>>   			irq = irq_find_mapping(pp->irq_domain,
>>   					       (i * MAX_MSI_IRQS_PER_CTRL) +
>>   					       pos);
>> -			generic_handle_irq(irq);
>>   			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>>   						(i * MSI_REG_CTRL_BLOCK_SIZE),
>>   					    4, 1 << pos);
>> +			generic_handle_irq(irq);
>>   			pos++;
>>   		}
>>   	}
> 
> This fix was already suggested by Trent Piepho, however, after review by Marc
> Zyngier and some information obtained from Synopsys IP development team, the
> following set of patches was generated to correct the problem.
> 
> 830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling")
> fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt")
> 3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback")

OK. I assume that dw_pci_bottom_ack() gets called before 
dw_handle_msi_irq() for each IRQ? In which case, the last of those 3 
changes is essentially identical to the patch I proposed, it's just that 
the acking is handled in a separate function rather than earlier in the 
same function.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: dwc: fix MSI IRQ handler ordering
  2019-01-07 17:16   ` Stephen Warren
@ 2019-01-07 18:09     ` Gustavo Pimentel
  0 siblings, 0 replies; 8+ messages in thread
From: Gustavo Pimentel @ 2019-01-07 18:09 UTC (permalink / raw)
  To: Stephen Warren, Gustavo Pimentel
  Cc: Jingoo Han, Lorenzo Pieralisi, Bjorn Helgaas, linux-pci,
	Stephen Warren, Arnd Bergmann, Faiz Abbas, Harro Haan,
	Jingoo Han, Joao Pinto, Juergen Beisert, Marek Vasut,
	Matthias Mann, Mohit Kumar, Pratyush Anand, Richard Zhu,
	Sean Cross, Shawn Guo, Siva Reddy Kallam, Srikanth T Shivanand,
	Tim Harvey

On 07/01/2019 17:16, Stephen Warren wrote:
> On 1/7/19 2:36 AM, Gustavo Pimentel wrote:
>> Hi Stephen,
>>
>> On 04/01/2019 21:45, Stephen Warren wrote:
>>> From: Stephen Warren <swarren@nvidia.com>
>>>
>>> The current code does this when handling MSI IRQs:
>>>
>>> a) Process the irq.
>>> b) Clear the latched IRQ status.
>>>
>>> If a new IRQ occurs any time after (a) has read the IRQ status for the
>>> last time and before (b), it will be lost. For example, this occurs in
>>> practice when using a Marvell 9171 AHCI controller with NCQ enabled;
>>> many command timeouts occur with certain disk access patterns.
>>>
>>> Fix the code to do the following instead, so that if any new IRQs are
>>> raised during the processing of the IRQ, the IRQ status is not cleared,
>>> so that the IRQ is not lost.
>>>
>>> a) Clear the latched IRQ status.
>>> b) Process the IRQ.
>>>
>>> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt
>>> status after it is handled, not before")
>>>
>>> This change re-applies commit ca1658921b63 ("PCI: designware: Fix
>>> missing MSI IRQs")
>>>
>>> Signed-off-by: Stephen Warren <swarren@nvidia.com>
>>> Cc: Arnd Bergmann <arnd@arndb.de>
>>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>>> Cc: Faiz Abbas <faiz_abbas@ti.com>
>>> Cc: Harro Haan <hrhaan@gmail.com>
>>> Cc: Jingoo Han <jg1.han@samsung.com>
>>> Cc: Joao Pinto <jpinto@synopsys.com>
>>> Cc: Juergen Beisert <jbe@pengutronix.de>
>>> Cc: Marek Vasut <marex@denx.de>
>>> Cc: Matthias Mann <m.mann@arkona-technologies.de>
>>> Cc: Mohit Kumar <mohit.kumar@st.com>
>>> Cc: Pratyush Anand <pratyush.anand@st.com>
>>> Cc: Richard Zhu <hong-xing.zhu@freescale.com>
>>> Cc: Sean Cross <xobs@kosagi.com>
>>> Cc: Shawn Guo <shawn.guo@linaro.org>
>>> Cc: Siva Reddy Kallam <siva.kallam@samsung.com>
>>> Cc: Srikanth T Shivanand <ts.srikanth@samsung.com>
>>> Cc: Tim Harvey <tharvey@gateworks.com>
>>> ---
>>> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels.
>>> However, the exact same code structure is present in mainline and I have
>>> no reason to believe the problem would not reproduce there. I have
>>> compile tested but not runtime tested it in mainline, since my board is
>>> not yet supported in mainline.
>>> ---
>>>   drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>>> index 692dd1b264fb..7fd6c56a6f35 100644
>>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>>> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>>>   			irq = irq_find_mapping(pp->irq_domain,
>>>   					       (i * MAX_MSI_IRQS_PER_CTRL) +
>>>   					       pos);
>>> -			generic_handle_irq(irq);
>>>   			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>>>   						(i * MSI_REG_CTRL_BLOCK_SIZE),
>>>   					    4, 1 << pos);
>>> +			generic_handle_irq(irq);
>>>   			pos++;
>>>   		}
>>>   	}
>>
>> This fix was already suggested by Trent Piepho, however, after review by Marc
>> Zyngier and some information obtained from Synopsys IP development team, the
>> following set of patches was generated to correct the problem.
>>
>> 830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling")
>> fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt")
>> 3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback")
> 
> OK. I assume that dw_pci_bottom_ack() gets called before 
> dw_handle_msi_irq() for each IRQ? In which case, the last of those 3 

Yes, the dw_pci_bottom_ack() gets called before dw_handle_msi_irq().

> changes is essentially identical to the patch I proposed, it's just that 
> the acking is handled in a separate function rather than earlier in the 
> same function.
> 

No, there is also two other fixes. A missing lock/unlock on the ack and the
correct use of the MASK/UNMASK interrupt register instead of ENABLE interrupt
register.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-04 21:45 [PATCH] PCI: dwc: fix MSI IRQ handler ordering Stephen Warren
2019-01-04 22:45 ` Trent Piepho
2019-01-04 23:00   ` Stephen Warren
2019-01-04 23:23     ` Stephen Warren
2019-01-05  0:15       ` Trent Piepho
2019-01-07  9:36 ` Gustavo Pimentel
2019-01-07 17:16   ` Stephen Warren
2019-01-07 18:09     ` Gustavo Pimentel

Linux-PCI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pci/0 linux-pci/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pci linux-pci/ https://lore.kernel.org/linux-pci \
		linux-pci@vger.kernel.org linux-pci@archiver.kernel.org
	public-inbox-index linux-pci


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pci


AGPL code for this site: git clone https://public-inbox.org/ public-inbox