linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] dra7xx: PCIe IRQ handling rework
@ 2018-02-09 12:04 Vignesh R
  2018-02-09 12:04 ` [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before" Vignesh R
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Vignesh R @ 2018-02-09 12:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jingoo Han, Joao Pinto
  Cc: Kishon Vijay Abraham I, Bjorn Helgaas, Niklas Cassel, linux-omap,
	linux-pci, linux-kernel, Vignesh R

This series re-works dra7xx MSI IRQ handler to avoid issues of missing
of MSI IRQs and Legacy IRQs.

Tested on AM572 EVM with TUSB PCIe USB card and Intel PCIe WiFi cards.

Vignesh R (3):
  Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not
    before"
  PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  PCI: dwc: pci-dra7xx: Handle legacy and MSI IRQs together

 drivers/pci/dwc/pci-dra7xx.c           | 32 +++++++++++++++++++++-----------
 drivers/pci/dwc/pcie-designware-host.c |  2 +-
 2 files changed, 22 insertions(+), 12 deletions(-)

-- 
2.16.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before"
  2018-02-09 12:04 [PATCH 0/3] dra7xx: PCIe IRQ handling rework Vignesh R
@ 2018-02-09 12:04 ` Vignesh R
  2018-02-09 13:46   ` Niklas Cassel
  2018-02-09 12:04 ` [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling Vignesh R
  2018-02-09 12:04 ` [PATCH 3/3] PCI: dwc: pci-dra7xx: Handle legacy and MSI IRQs together Vignesh R
  2 siblings, 1 reply; 12+ messages in thread
From: Vignesh R @ 2018-02-09 12:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jingoo Han, Joao Pinto
  Cc: Kishon Vijay Abraham I, Bjorn Helgaas, Niklas Cassel, linux-omap,
	linux-pci, linux-kernel, Vignesh R

Since commit 06e15e6883bed ("PCI: dwc: Clear MSI interrupt status after
it is handled, not before"), MSI IRQ status in PCIE_MSI_INTR0_STATUS
register is cleared after calling EP's IRQ handler.
But, MSI IRQs in case of PCIe are bit like edge interrupts. If a
another MSI IRQ is raised and the end of current EP's IRQ handler call
but before clearing the MSI IRQ status in PCIE_MSI_INTR0_STATUS register
then the new MSI IRQ is lost.

This issue has been observed in case of USB-Ethernet adapter connected
to PCIe USB card running iperf3 test. iperf3 client stalls after
sometime as reported here[1]. This is because XHCI raises MSI IRQ at the
end of its IRQ handler but before PCIe driver has cleared its status
in PCIE_MSI_INTR0_STATUS register. Hence, the new IRQ is never
registered by PCIe designware core. Since, XHCI does not raise any more
IRQs until the previous one is handled, it leads to a stall in
communication.
Therefore driver should always clear the MSI IRQ status in
PCIE_MSI_INTR0_STATUS before calling EP's IRQ handler. This make sure
that DW PCIe core catches new IRQ raised during the call to EP's IRQ
handler

[1] https://lkml.kernel.org/r/BN6PR18MB124994D85EAC4B5B1AD5EC56866E0@BN6PR18MB1249.namprd18.prod.outlook.com

Signed-off-by: Vignesh R <vigneshr@ti.com>
---
 drivers/pci/dwc/pcie-designware-host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/dwc/pcie-designware-host.c b/drivers/pci/dwc/pcie-designware-host.c
index 8de2d5c69b1d..c29cbcd430f4 100644
--- a/drivers/pci/dwc/pcie-designware-host.c
+++ b/drivers/pci/dwc/pcie-designware-host.c
@@ -68,9 +68,9 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
 		while ((pos = find_next_bit((unsigned long *) &val, 32,
 					    pos)) != 32) {
 			irq = irq_find_mapping(pp->irq_domain, i * 32 + pos);
-			generic_handle_irq(irq);
 			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS + i * 12,
 					    4, 1 << pos);
+			generic_handle_irq(irq);
 			pos++;
 		}
 	}
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-02-09 12:04 [PATCH 0/3] dra7xx: PCIe IRQ handling rework Vignesh R
  2018-02-09 12:04 ` [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before" Vignesh R
@ 2018-02-09 12:04 ` Vignesh R
  2018-02-12 17:58   ` Lorenzo Pieralisi
  2018-02-09 12:04 ` [PATCH 3/3] PCI: dwc: pci-dra7xx: Handle legacy and MSI IRQs together Vignesh R
  2 siblings, 1 reply; 12+ messages in thread
From: Vignesh R @ 2018-02-09 12:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jingoo Han, Joao Pinto
  Cc: Kishon Vijay Abraham I, Bjorn Helgaas, Niklas Cassel, linux-omap,
	linux-pci, linux-kernel, Vignesh R

We need to ensure that there are no pending MSI IRQ vector set (i.e
PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.

This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
8260 used to throw following error and stall during ping/iperf3 tests.

[   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.

Signed-off-by: Vignesh R <vigneshr@ti.com>
---
 drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index ed8558d638e5..3420cbf7b60a 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
 	struct dra7xx_pcie *dra7xx = arg;
 	struct dw_pcie *pci = dra7xx->pci;
 	struct pcie_port *pp = &pci->pp;
+	int count = 0;
 	unsigned long reg;
 	u32 virq, bit;
 
 	reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
+	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
 
 	switch (reg) {
 	case MSI:
-		dw_handle_msi_irq(pp);
+		/*
+		 * Need to make sure no MSI IRQs are pending before
+		 * exiting handler, else the wrapper will not catch new
+		 * IRQs. So loop around till dw_handle_msi_irq() returns
+		 * IRQ_NONE
+		 */
+		while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
+			count++;
+
+		if (count == 1000) {
+			dev_err(pci->dev, "too much work in msi irq\n");
+			dra7xx_pcie_writel(dra7xx,
+					   PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
+					   reg);
+			return IRQ_HANDLED;
+		}
 		break;
 	case INTA:
 	case INTB:
@@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
 		break;
 	}
 
-	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
-
 	return IRQ_HANDLED;
 }
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] PCI: dwc: pci-dra7xx: Handle legacy and MSI IRQs together
  2018-02-09 12:04 [PATCH 0/3] dra7xx: PCIe IRQ handling rework Vignesh R
  2018-02-09 12:04 ` [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before" Vignesh R
  2018-02-09 12:04 ` [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling Vignesh R
@ 2018-02-09 12:04 ` Vignesh R
  2 siblings, 0 replies; 12+ messages in thread
From: Vignesh R @ 2018-02-09 12:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Jingoo Han, Joao Pinto
  Cc: Kishon Vijay Abraham I, Bjorn Helgaas, Niklas Cassel, linux-omap,
	linux-pci, linux-kernel, Vignesh R

Currently, pci-dra7xx driver handles MSI and Legacy IRQs exclusive of
each other. This is not true, as there maybe both legacy and MSI IRQs
raised at the same time. Fix this by making sure that driver handles
both MSI and legacy IRQs in the IRQ handler.

Signed-off-by: Vignesh R <vigneshr@ti.com>
---
 drivers/pci/dwc/pci-dra7xx.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
index 3420cbf7b60a..1ddb62620acc 100644
--- a/drivers/pci/dwc/pci-dra7xx.c
+++ b/drivers/pci/dwc/pci-dra7xx.c
@@ -261,8 +261,7 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
 	reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
 	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
 
-	switch (reg) {
-	case MSI:
+	if (reg & MSI) {
 		/*
 		 * Need to make sure no MSI IRQs are pending before
 		 * exiting handler, else the wrapper will not catch new
@@ -279,17 +278,13 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
 					   reg);
 			return IRQ_HANDLED;
 		}
-		break;
-	case INTA:
-	case INTB:
-	case INTC:
-	case INTD:
+	}
+	if (reg & LEG_EP_INTERRUPTS) {
 		for_each_set_bit(bit, &reg, PCI_NUM_INTX) {
 			virq = irq_find_mapping(dra7xx->irq_domain, bit);
 			if (virq)
 				generic_handle_irq(virq);
 		}
-		break;
 	}
 
 	return IRQ_HANDLED;
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before"
  2018-02-09 12:04 ` [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before" Vignesh R
@ 2018-02-09 13:46   ` Niklas Cassel
  0 siblings, 0 replies; 12+ messages in thread
From: Niklas Cassel @ 2018-02-09 13:46 UTC (permalink / raw)
  To: Vignesh R
  Cc: Lorenzo Pieralisi, Jingoo Han, Joao Pinto,
	Kishon Vijay Abraham I, Bjorn Helgaas, linux-omap, linux-pci,
	linux-kernel

On Fri, Feb 09, 2018 at 05:34:13PM +0530, Vignesh R wrote:
> Since commit 06e15e6883bed ("PCI: dwc: Clear MSI interrupt status after
> it is handled, not before"), MSI IRQ status in PCIE_MSI_INTR0_STATUS
> register is cleared after calling EP's IRQ handler.

Small nit, the SHA1 is actually:
8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled, not before")

Acked-by: Niklas Cassel <niklas.cassel@axis.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-02-09 12:04 ` [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling Vignesh R
@ 2018-02-12 17:58   ` Lorenzo Pieralisi
  2018-02-15  4:29     ` Vignesh R
  0 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Pieralisi @ 2018-02-12 17:58 UTC (permalink / raw)
  To: Vignesh R
  Cc: Jingoo Han, Joao Pinto, Kishon Vijay Abraham I, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel

On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
> We need to ensure that there are no pending MSI IRQ vector set (i.e
> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
> 
> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
> 8260 used to throw following error and stall during ping/iperf3 tests.
> 
> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
> 
> Signed-off-by: Vignesh R <vigneshr@ti.com>
> ---
>  drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> index ed8558d638e5..3420cbf7b60a 100644
> --- a/drivers/pci/dwc/pci-dra7xx.c
> +++ b/drivers/pci/dwc/pci-dra7xx.c
> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>  	struct dra7xx_pcie *dra7xx = arg;
>  	struct dw_pcie *pci = dra7xx->pci;
>  	struct pcie_port *pp = &pci->pp;
> +	int count = 0;
>  	unsigned long reg;
>  	u32 virq, bit;
>  
>  	reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
> +	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>  
>  	switch (reg) {
>  	case MSI:
> -		dw_handle_msi_irq(pp);
> +		/*
> +		 * Need to make sure no MSI IRQs are pending before
> +		 * exiting handler, else the wrapper will not catch new
> +		 * IRQs. So loop around till dw_handle_msi_irq() returns
> +		 * IRQ_NONE
> +		 */
> +		while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
> +			count++;
> +
> +		if (count == 1000) {
> +			dev_err(pci->dev, "too much work in msi irq\n");
> +			dra7xx_pcie_writel(dra7xx,
> +					   PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
> +					   reg);
> +			return IRQ_HANDLED;

I am not merging any code patching this IRQ handling routine anymore
unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
works (and how it is related to DW registers) and why this specific host
controller needs handling that is not required by any other host
controller relying on dw_handle_msi_irq().

I suspect there is a code design flaw with the way this host handles
IRQs and we are going to find it and fix it the way it should, not with
any plaster like this patch.

Lorenzo

> +		}
>  		break;
>  	case INTA:
>  	case INTB:
> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>  		break;
>  	}
>  
> -	dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> -
>  	return IRQ_HANDLED;
>  }
>  
> -- 
> 2.16.1
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-02-12 17:58   ` Lorenzo Pieralisi
@ 2018-02-15  4:29     ` Vignesh R
  2018-03-01 15:31       ` Vignesh R
  2018-03-06 15:12       ` Lorenzo Pieralisi
  0 siblings, 2 replies; 12+ messages in thread
From: Vignesh R @ 2018-02-15  4:29 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel

Hi,

On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
>> We need to ensure that there are no pending MSI IRQ vector set (i.e
>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
>> 
>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
>> 8260 used to throw following error and stall during ping/iperf3 tests.
>> 
>> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
>> 
>> Signed-off-by: Vignesh R <vigneshr@ti.com>
>> ---
>>  drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
>>  1 file changed, 18 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>> index ed8558d638e5..3420cbf7b60a 100644
>> --- a/drivers/pci/dwc/pci-dra7xx.c
>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>        struct dra7xx_pcie *dra7xx = arg;
>>        struct dw_pcie *pci = dra7xx->pci;
>>        struct pcie_port *pp = &pci->pp;
>> +     int count = 0;
>>        unsigned long reg;
>>        u32 virq, bit;
>>  
>>        reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
>> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>  
>>        switch (reg) {
>>        case MSI:
>> -             dw_handle_msi_irq(pp);
>> +             /*
>> +              * Need to make sure no MSI IRQs are pending before
>> +              * exiting handler, else the wrapper will not catch new
>> +              * IRQs. So loop around till dw_handle_msi_irq() returns
>> +              * IRQ_NONE
>> +              */
>> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
>> +                     count++;
>> +
>> +             if (count == 1000) {
>> +                     dev_err(pci->dev, "too much work in msi irq\n");
>> +                     dra7xx_pcie_writel(dra7xx,
>> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
>> +                                        reg);
>> +                     return IRQ_HANDLED;
> 
> I am not merging any code patching this IRQ handling routine anymore
> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
> works (and how it is related to DW registers) and why this specific host
> controller needs handling that is not required by any other host
> controller relying on dw_handle_msi_irq().

Unlike other DW PCIe controllers, TI implementation has a wrapper on top
of DW core. This wrapper latches the DW core level MSI and legacy
interrupts and then propagates it to GIC.
PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
level. They are mapped on the MSI interrupt line of PCIe controller,
using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.

So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
then call dw_handle_msi_irq() to handle individual MSI vectors.
Driver has to make sure there are no pending vectors in DW core MSI
status register before exiting handler. Otherwise next MSI IRQ will not
be latched by the wrapper.


> 
> I suspect there is a code design flaw with the way this host handles
> IRQs and we are going to find it and fix it the way it should, not with
> any plaster like this patch.
> 

I agree there has been some churn wrt this wrapper level IRQ handler.
But, that was because hardware documentation/TRM did not match
actual behavior and so it took some time to understand how the
hardware is working. I have extensively tested this series on multiple
problematic PCIe USB cards and PCIe WiFi cards over week long stress
tests. And also had some agreement with internal hardware designers.
Hardware documentations will also be updated.


> Lorenzo
> 
>> +             }
>>                break;
>>        case INTA:
>>        case INTB:
>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>                break;
>>        }
>>  
>> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>> -
>>        return IRQ_HANDLED;
>>  }
>>  
>> -- 
>> 2.16.1
>> 

-- 
Regards
Vignesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-02-15  4:29     ` Vignesh R
@ 2018-03-01 15:31       ` Vignesh R
  2018-03-01 18:18         ` Lorenzo Pieralisi
  2018-03-06 15:12       ` Lorenzo Pieralisi
  1 sibling, 1 reply; 12+ messages in thread
From: Vignesh R @ 2018-03-01 15:31 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel

Hi Lorenzo,

On 15-Feb-18 9:59 AM, Vignesh R wrote:
> Hi,
> 
> On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
>> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
>>> We need to ensure that there are no pending MSI IRQ vector set (i.e
>>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
>>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
>>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
>>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
>>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
>>>
>>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
>>> 8260 used to throw following error and stall during ping/iperf3 tests.
>>>
>>> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
>>>
>>> Signed-off-by: Vignesh R <vigneshr@ti.com>
>>> ---
>>>   drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
>>>   1 file changed, 18 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>>> index ed8558d638e5..3420cbf7b60a 100644
>>> --- a/drivers/pci/dwc/pci-dra7xx.c
>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>>         struct dra7xx_pcie *dra7xx = arg;
>>>         struct dw_pcie *pci = dra7xx->pci;
>>>         struct pcie_port *pp = &pci->pp;
>>> +     int count = 0;
>>>         unsigned long reg;
>>>         u32 virq, bit;
>>>   
>>>         reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
>>> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>>   
>>>         switch (reg) {
>>>         case MSI:
>>> -             dw_handle_msi_irq(pp);
>>> +             /*
>>> +              * Need to make sure no MSI IRQs are pending before
>>> +              * exiting handler, else the wrapper will not catch new
>>> +              * IRQs. So loop around till dw_handle_msi_irq() returns
>>> +              * IRQ_NONE
>>> +              */
>>> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
>>> +                     count++;
>>> +
>>> +             if (count == 1000) {
>>> +                     dev_err(pci->dev, "too much work in msi irq\n");
>>> +                     dra7xx_pcie_writel(dra7xx,
>>> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
>>> +                                        reg);
>>> +                     return IRQ_HANDLED;
>>
>> I am not merging any code patching this IRQ handling routine anymore
>> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
>> works (and how it is related to DW registers) and why this specific host
>> controller needs handling that is not required by any other host
>> controller relying on dw_handle_msi_irq().
> 
> Unlike other DW PCIe controllers, TI implementation has a wrapper on top
> of DW core. This wrapper latches the DW core level MSI and legacy
> interrupts and then propagates it to GIC.
> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
> wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
> level. They are mapped on the MSI interrupt line of PCIe controller,
> using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.
> 
> So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
> at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
> then call dw_handle_msi_irq() to handle individual MSI vectors.
> Driver has to make sure there are no pending vectors in DW core MSI
> status register before exiting handler. Otherwise next MSI IRQ will not
> be latched by the wrapper.
> 
> 

Did above explanation clarify CONF_IRQSTATUS_MSI register usage and the
need for IRQ handler? Are you okay with this fix?

Regards
Vignesh

>>
>> I suspect there is a code design flaw with the way this host handles
>> IRQs and we are going to find it and fix it the way it should, not with
>> any plaster like this patch.
>>
> 
> I agree there has been some churn wrt this wrapper level IRQ handler.
> But, that was because hardware documentation/TRM did not match
> actual behavior and so it took some time to understand how the
> hardware is working. I have extensively tested this series on multiple
> problematic PCIe USB cards and PCIe WiFi cards over week long stress
> tests. And also had some agreement with internal hardware designers.
> Hardware documentations will also be updated.
> 
> 
>> Lorenzo
>>
>>> +             }
>>>                 break;
>>>         case INTA:
>>>         case INTB:
>>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>>                 break;
>>>         }
>>>   
>>> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>> -
>>>         return IRQ_HANDLED;
>>>   }
>>>   
>>> -- 
>>> 2.16.1
>>>
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-03-01 15:31       ` Vignesh R
@ 2018-03-01 18:18         ` Lorenzo Pieralisi
  0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Pieralisi @ 2018-03-01 18:18 UTC (permalink / raw)
  To: Vignesh R
  Cc: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel

On Thu, Mar 01, 2018 at 09:01:53PM +0530, Vignesh R wrote:
> Hi Lorenzo,
> 
> On 15-Feb-18 9:59 AM, Vignesh R wrote:
> > Hi,
> > 
> > On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
> >> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
> >>> We need to ensure that there are no pending MSI IRQ vector set (i.e
> >>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
> >>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
> >>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
> >>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
> >>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
> >>>
> >>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
> >>> 8260 used to throw following error and stall during ping/iperf3 tests.
> >>>
> >>> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
> >>>
> >>> Signed-off-by: Vignesh R <vigneshr@ti.com>
> >>> ---
> >>>   drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
> >>>   1 file changed, 18 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> >>> index ed8558d638e5..3420cbf7b60a 100644
> >>> --- a/drivers/pci/dwc/pci-dra7xx.c
> >>> +++ b/drivers/pci/dwc/pci-dra7xx.c
> >>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>>         struct dra7xx_pcie *dra7xx = arg;
> >>>         struct dw_pcie *pci = dra7xx->pci;
> >>>         struct pcie_port *pp = &pci->pp;
> >>> +     int count = 0;
> >>>         unsigned long reg;
> >>>         u32 virq, bit;
> >>>   
> >>>         reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
> >>> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>>   
> >>>         switch (reg) {
> >>>         case MSI:
> >>> -             dw_handle_msi_irq(pp);
> >>> +             /*
> >>> +              * Need to make sure no MSI IRQs are pending before
> >>> +              * exiting handler, else the wrapper will not catch new
> >>> +              * IRQs. So loop around till dw_handle_msi_irq() returns
> >>> +              * IRQ_NONE
> >>> +              */
> >>> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
> >>> +                     count++;
> >>> +
> >>> +             if (count == 1000) {
> >>> +                     dev_err(pci->dev, "too much work in msi irq\n");
> >>> +                     dra7xx_pcie_writel(dra7xx,
> >>> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
> >>> +                                        reg);
> >>> +                     return IRQ_HANDLED;
> >>
> >> I am not merging any code patching this IRQ handling routine anymore
> >> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
> >> works (and how it is related to DW registers) and why this specific host
> >> controller needs handling that is not required by any other host
> >> controller relying on dw_handle_msi_irq().
> > 
> > Unlike other DW PCIe controllers, TI implementation has a wrapper on top
> > of DW core. This wrapper latches the DW core level MSI and legacy
> > interrupts and then propagates it to GIC.
> > PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
> > wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
> > level. They are mapped on the MSI interrupt line of PCIe controller,
> > using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.
> > 
> > So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
> > at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
> > then call dw_handle_msi_irq() to handle individual MSI vectors.
> > Driver has to make sure there are no pending vectors in DW core MSI
> > status register before exiting handler. Otherwise next MSI IRQ will not
> > be latched by the wrapper.
> > 
> > 
> 
> Did above explanation clarify CONF_IRQSTATUS_MSI register usage and the
> need for IRQ handler? Are you okay with this fix?

Hi Vignesh,

I will get back to it shortly, thanks for your patience.

Lorenzo

> Regards
> Vignesh
> 
> >>
> >> I suspect there is a code design flaw with the way this host handles
> >> IRQs and we are going to find it and fix it the way it should, not with
> >> any plaster like this patch.
> >>
> > 
> > I agree there has been some churn wrt this wrapper level IRQ handler.
> > But, that was because hardware documentation/TRM did not match
> > actual behavior and so it took some time to understand how the
> > hardware is working. I have extensively tested this series on multiple
> > problematic PCIe USB cards and PCIe WiFi cards over week long stress
> > tests. And also had some agreement with internal hardware designers.
> > Hardware documentations will also be updated.
> > 
> > 
> >> Lorenzo
> >>
> >>> +             }
> >>>                 break;
> >>>         case INTA:
> >>>         case INTB:
> >>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>>                 break;
> >>>         }
> >>>   
> >>> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>> -
> >>>         return IRQ_HANDLED;
> >>>   }
> >>>   
> >>> -- 
> >>> 2.16.1
> >>>
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-02-15  4:29     ` Vignesh R
  2018-03-01 15:31       ` Vignesh R
@ 2018-03-06 15:12       ` Lorenzo Pieralisi
  2018-03-09  4:23         ` Vignesh R
  1 sibling, 1 reply; 12+ messages in thread
From: Lorenzo Pieralisi @ 2018-03-06 15:12 UTC (permalink / raw)
  To: Vignesh R
  Cc: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel

On Thu, Feb 15, 2018 at 09:59:21AM +0530, Vignesh R wrote:
> Hi,
> 
> On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
> > On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
> >> We need to ensure that there are no pending MSI IRQ vector set (i.e
> >> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
> >> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
> >> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
> >> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
> >> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
> >> 
> >> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
> >> 8260 used to throw following error and stall during ping/iperf3 tests.
> >> 
> >> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
> >> 
> >> Signed-off-by: Vignesh R <vigneshr@ti.com>
> >> ---
> >>  drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
> >>  1 file changed, 18 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> >> index ed8558d638e5..3420cbf7b60a 100644
> >> --- a/drivers/pci/dwc/pci-dra7xx.c
> >> +++ b/drivers/pci/dwc/pci-dra7xx.c
> >> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>        struct dra7xx_pcie *dra7xx = arg;
> >>        struct dw_pcie *pci = dra7xx->pci;
> >>        struct pcie_port *pp = &pci->pp;
> >> +     int count = 0;
> >>        unsigned long reg;
> >>        u32 virq, bit;
> >>  
> >>        reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
> >> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>  
> >>        switch (reg) {
> >>        case MSI:
> >> -             dw_handle_msi_irq(pp);
> >> +             /*
> >> +              * Need to make sure no MSI IRQs are pending before
> >> +              * exiting handler, else the wrapper will not catch new
> >> +              * IRQs. So loop around till dw_handle_msi_irq() returns
> >> +              * IRQ_NONE
> >> +              */
> >> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
> >> +                     count++;
> >> +
> >> +             if (count == 1000) {
> >> +                     dev_err(pci->dev, "too much work in msi irq\n");
> >> +                     dra7xx_pcie_writel(dra7xx,
> >> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
> >> +                                        reg);
> >> +                     return IRQ_HANDLED;
> > 
> > I am not merging any code patching this IRQ handling routine anymore
> > unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
> > works (and how it is related to DW registers) and why this specific host
> > controller needs handling that is not required by any other host
> > controller relying on dw_handle_msi_irq().
> 
> Unlike other DW PCIe controllers, TI implementation has a wrapper on top
> of DW core. This wrapper latches the DW core level MSI and legacy
> interrupts and then propagates it to GIC.
> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
> wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
> level. They are mapped on the MSI interrupt line of PCIe controller,
> using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.
> 
> So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
> at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
> then call dw_handle_msi_irq() to handle individual MSI vectors.
> Driver has to make sure there are no pending vectors in DW core MSI

How can it make *sure* ? And what makes the wrapper latch MSI IRQs
again ?

> status register before exiting handler. Otherwise next MSI IRQ will not
> be latched by the wrapper.

I am sorry but I do not understand how this works - what is the
condition that makes wrapper latch IRQs again ? This is at least
racy, if not outright broken.

That count == 1000 is a symptom there is something broken on how this
driver handles IRQs and I have the impression that we are applying
plasters on top of plasters to make it less broken than it actually is.

> > I suspect there is a code design flaw with the way this host handles
> > IRQs and we are going to find it and fix it the way it should, not with
> > any plaster like this patch.
> > 
> 
> I agree there has been some churn wrt this wrapper level IRQ handler.
> But, that was because hardware documentation/TRM did not match
> actual behavior and so it took some time to understand how the
> hardware is working.

How does HW work :) ? Please explain in detail how this works in HW
then we will get to the code.

Thanks,
Lorenzo

> I have extensively tested this series on multiple problematic PCIe USB
> cards and PCIe WiFi cards over week long stress tests. And also had
> some agreement with internal hardware designers.  Hardware
> documentations will also be updated.
> 
> 
> > Lorenzo
> > 
> >> +             }
> >>                break;
> >>        case INTA:
> >>        case INTB:
> >> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>                break;
> >>        }
> >>  
> >> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >> -
> >>        return IRQ_HANDLED;
> >>  }
> >>  
> >> -- 
> >> 2.16.1
> >> 
> 
> -- 
> Regards
> Vignesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-03-06 15:12       ` Lorenzo Pieralisi
@ 2018-03-09  4:23         ` Vignesh R
  2018-03-15 18:10           ` Lorenzo Pieralisi
  0 siblings, 1 reply; 12+ messages in thread
From: Vignesh R @ 2018-03-09  4:23 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel



On Tuesday 06 March 2018 08:42 PM, Lorenzo Pieralisi wrote:
> On Thu, Feb 15, 2018 at 09:59:21AM +0530, Vignesh R wrote:
>> Hi,
>>
>> On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
>>> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
>>>> We need to ensure that there are no pending MSI IRQ vector set (i.e
>>>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
>>>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
>>>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
>>>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
>>>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
>>>>
>>>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
>>>> 8260 used to throw following error and stall during ping/iperf3 tests.
>>>>
>>>> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
>>>>
>>>> Signed-off-by: Vignesh R <vigneshr@ti.com>
>>>> ---
>>>>   drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
>>>>   1 file changed, 18 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
>>>> index ed8558d638e5..3420cbf7b60a 100644
>>>> --- a/drivers/pci/dwc/pci-dra7xx.c
>>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
>>>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>>>         struct dra7xx_pcie *dra7xx = arg;
>>>>         struct dw_pcie *pci = dra7xx->pci;
>>>>         struct pcie_port *pp = &pci->pp;
>>>> +     int count = 0;
>>>>         unsigned long reg;
>>>>         u32 virq, bit;
>>>>   
>>>>         reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
>>>> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>>>   
>>>>         switch (reg) {
>>>>         case MSI:
>>>> -             dw_handle_msi_irq(pp);
>>>> +             /*
>>>> +              * Need to make sure no MSI IRQs are pending before
>>>> +              * exiting handler, else the wrapper will not catch new
>>>> +              * IRQs. So loop around till dw_handle_msi_irq() returns
>>>> +              * IRQ_NONE
>>>> +              */
>>>> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
>>>> +                     count++;
>>>> +
>>>> +             if (count == 1000) {
>>>> +                     dev_err(pci->dev, "too much work in msi irq\n");
>>>> +                     dra7xx_pcie_writel(dra7xx,
>>>> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
>>>> +                                        reg);
>>>> +                     return IRQ_HANDLED;
>>>
>>> I am not merging any code patching this IRQ handling routine anymore
>>> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
>>> works (and how it is related to DW registers) and why this specific host
>>> controller needs handling that is not required by any other host
>>> controller relying on dw_handle_msi_irq().
>>
>> Unlike other DW PCIe controllers, TI implementation has a wrapper on top
>> of DW core. This wrapper latches the DW core level MSI and legacy
>> interrupts and then propagates it to GIC.
>> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
>> wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
>> level. They are mapped on the MSI interrupt line of PCIe controller,
>> using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.
>>
>> So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
>> at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
>> then call dw_handle_msi_irq() to handle individual MSI vectors.
>> Driver has to make sure there are no pending vectors in DW core MSI
> 
> How can it make *sure* ? And what makes the wrapper latch MSI IRQs
> again ?
> 

This is the sequence that I got from discussion with internal HW team:
 1. read CONF_IRQSTATUS_MSI in wrapper and check if MSI bit
 2. clear CONF_IRQSTATUS_MSI
 3. read, clear and handle PCIE_MSI_INTR0_STATUS vectors
 4. repeat step 3 until PCIE_MSI_INTR0_STATUS reads 0

If read of PCIE_MSI_INTR0_STATUS returns 0 at least once, then its
guaranteed that the next time any vector is set in PCIE_MSI_INTR0_STATUS
register(due to MSI IRQ), wrapper will latch it and raise an IRQ to CPU.


>> status register before exiting handler. Otherwise next MSI IRQ will not
>> be latched by the wrapper.
> 
> I am sorry but I do not understand how this works - what is the
> condition that makes wrapper latch IRQs again ? This is at least
> racy, if not outright broken.
> 
> That count == 1000 is a symptom there is something broken on how this
> driver handles IRQs and I have the impression that we are applying
> plasters on top of plasters to make it less broken than it actually is.
> 

It is an upper bound on how many times driver looks at
PCIE_MSI_INTR0_STATUS register, so that there is no infinite looping
when there is an IRQ flood due to misbehaving EP. count == 1000
condition should not happen and it means something is wrong in the
system.  I haven't hit this situation in testing
I can either remove this or put a WARN_ON to say this situation should
not have happened, if that makes you more comfortable with the patch.


>>> I suspect there is a code design flaw with the way this host handles
>>> IRQs and we are going to find it and fix it the way it should, not with
>>> any plaster like this patch.
>>>
>>
>> I agree there has been some churn wrt this wrapper level IRQ handler.
>> But, that was because hardware documentation/TRM did not match
>> actual behavior and so it took some time to understand how the
>> hardware is working.
> 
> How does HW work :) ? Please explain in detail how this works in HW
> then we will get to the code.
> 

Software needs to ensure that PCIE_MSI_INTR0_STATUS needs be 0 by
reading it. Then, when the next MSI IRQ is raised  CONF_IRQSTATUS_MSI
register in the wrapper will latch the next IRQ.

This is my current knowledge, let me know if you need to know anything
specifically, I will try to ask HW team.

Regards
Vignesh

> Thanks,
> Lorenzo
> 
>> I have extensively tested this series on multiple problematic PCIe USB
>> cards and PCIe WiFi cards over week long stress tests. And also had
>> some agreement with internal hardware designers.  Hardware
>> documentations will also be updated.
>>
>>
>>> Lorenzo
>>>
>>>> +             }
>>>>                 break;
>>>>         case INTA:
>>>>         case INTB:
>>>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
>>>>                 break;
>>>>         }
>>>>   
>>>> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
>>>> -
>>>>         return IRQ_HANDLED;
>>>>   }
>>>>   
>>>> -- 
>>>> 2.16.1
>>>>
>>
>> -- 
>> Regards
>> Vignesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling
  2018-03-09  4:23         ` Vignesh R
@ 2018-03-15 18:10           ` Lorenzo Pieralisi
  0 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Pieralisi @ 2018-03-15 18:10 UTC (permalink / raw)
  To: Vignesh R
  Cc: Jingoo Han, Joao Pinto, KISHON VIJAY ABRAHAM, Bjorn Helgaas,
	Niklas Cassel, linux-omap, linux-pci, linux-kernel,
	gustavo.pimentel

[CC'ed Gustavo]

On Fri, Mar 09, 2018 at 09:53:24AM +0530, Vignesh R wrote:
> 
> 
> On Tuesday 06 March 2018 08:42 PM, Lorenzo Pieralisi wrote:
> > On Thu, Feb 15, 2018 at 09:59:21AM +0530, Vignesh R wrote:
> >> Hi,
> >>
> >> On Monday 12 February 2018 11:28 PM, Lorenzo Pieralisi wrote:
> >>> On Fri, Feb 09, 2018 at 05:34:14PM +0530, Vignesh R wrote:
> >>>> We need to ensure that there are no pending MSI IRQ vector set (i.e
> >>>> PCIE_MSI_INTR0_STATUS reads 0 at least once) before exiting
> >>>> dra7xx_pcie_msi_irq_handler(). Else, the dra7xx PCIe wrapper will not
> >>>> register new MSI IRQs even though PCIE_MSI_INTR0_STATUS shows IRQs are
> >>>> pending. Therefore, keep calling dra7xx_pcie_msi_irq_handler() until it
> >>>> returns IRQ_NONE, which suggests that PCIE_MSI_INTR0_STATUS is 0.
> >>>>
> >>>> This fixes a bug, where PCIe wifi cards with 4 DMA queues like Intel
> >>>> 8260 used to throw following error and stall during ping/iperf3 tests.
> >>>>
> >>>> [   97.776310] iwlwifi 0000:01:00.0: Queue 9 stuck for 2500 ms.
> >>>>
> >>>> Signed-off-by: Vignesh R <vigneshr@ti.com>
> >>>> ---
> >>>>   drivers/pci/dwc/pci-dra7xx.c | 21 ++++++++++++++++++---
> >>>>   1 file changed, 18 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c
> >>>> index ed8558d638e5..3420cbf7b60a 100644
> >>>> --- a/drivers/pci/dwc/pci-dra7xx.c
> >>>> +++ b/drivers/pci/dwc/pci-dra7xx.c
> >>>> @@ -254,14 +254,31 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>>>         struct dra7xx_pcie *dra7xx = arg;
> >>>>         struct dw_pcie *pci = dra7xx->pci;
> >>>>         struct pcie_port *pp = &pci->pp;
> >>>> +     int count = 0;
> >>>>         unsigned long reg;
> >>>>         u32 virq, bit;
> >>>>   
> >>>>         reg = dra7xx_pcie_readl(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI);
> >>>> +     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>>>   
> >>>>         switch (reg) {
> >>>>         case MSI:
> >>>> -             dw_handle_msi_irq(pp);
> >>>> +             /*
> >>>> +              * Need to make sure no MSI IRQs are pending before
> >>>> +              * exiting handler, else the wrapper will not catch new
> >>>> +              * IRQs. So loop around till dw_handle_msi_irq() returns
> >>>> +              * IRQ_NONE
> >>>> +              */
> >>>> +             while (dw_handle_msi_irq(pp) != IRQ_NONE && count < 1000)
> >>>> +                     count++;
> >>>> +
> >>>> +             if (count == 1000) {
> >>>> +                     dev_err(pci->dev, "too much work in msi irq\n");
> >>>> +                     dra7xx_pcie_writel(dra7xx,
> >>>> +                                        PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI,
> >>>> +                                        reg);
> >>>> +                     return IRQ_HANDLED;
> >>>
> >>> I am not merging any code patching this IRQ handling routine anymore
> >>> unless you thoroughly explain to me how this CONF_IRQSTATUS_MSI register
> >>> works (and how it is related to DW registers) and why this specific host
> >>> controller needs handling that is not required by any other host
> >>> controller relying on dw_handle_msi_irq().
> >>
> >> Unlike other DW PCIe controllers, TI implementation has a wrapper on top
> >> of DW core. This wrapper latches the DW core level MSI and legacy
> >> interrupts and then propagates it to GIC.
> >> PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI register is present in this TI
> >> wrapper which aggregates all the MSI IRQs(PCIE_MSI_INTR0_STATUS) of DW
> >> level. They are mapped on the MSI interrupt line of PCIe controller,
> >> using a single status bit in the PCIECTRL_TI_CONF_IRQSTATUS_MSI register.
> >>
> >> So, the irq handler, dra7xx_pcie_msi_irq_handler(), first needs to look
> >> at PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI[4] to know that its MSI IRQ and
> >> then call dw_handle_msi_irq() to handle individual MSI vectors.
> >> Driver has to make sure there are no pending vectors in DW core MSI
> > 
> > How can it make *sure* ? And what makes the wrapper latch MSI IRQs
> > again ?
> > 
> 
> This is the sequence that I got from discussion with internal HW team:
>  1. read CONF_IRQSTATUS_MSI in wrapper and check if MSI bit
>  2. clear CONF_IRQSTATUS_MSI
>  3. read, clear and handle PCIE_MSI_INTR0_STATUS vectors
>  4. repeat step 3 until PCIE_MSI_INTR0_STATUS reads 0
> 
> If read of PCIE_MSI_INTR0_STATUS returns 0 at least once, then its
> guaranteed that the next time any vector is set in PCIE_MSI_INTR0_STATUS
> register(due to MSI IRQ), wrapper will latch it and raise an IRQ to CPU.
> 
> 
> >> status register before exiting handler. Otherwise next MSI IRQ will not
> >> be latched by the wrapper.

Hi Vignesh,

thank you for explaining. To me - this looks like it is better to write
a separate IRQ chip implementation for DRA7XX, let's face it this patch
is a bit hackish (and will prevent us from removing dw_handle_msi_irq()
altogether - which is something we want to achieve) and it is clutching
at straws to re-use DWC MSI code but honestly it is a bit of stretch.

This would duplicate some code, yes but on the other hand it would
give you full control of the (DWC) HW.

You can have a look at pcie-tango.c or the mobiveil patches I have
just reviewed:

https://patchwork.ozlabs.org/patch/878494/

Please note Gustavo's reworked DWC MSI handling that is now queued
for v4.17 in my pci/dwc-msi branch - please do have a look at it
too (and if Gustavo has comments they are welcome).

I do not like the current implementation - sorry, we have to come
up with something cleaner, I do not think that's so complicated.

I do not mind reviewing intermediate patches as long as we get to
a cleaner solution.

Thanks,
Lorenzo

> > 
> > I am sorry but I do not understand how this works - what is the
> > condition that makes wrapper latch IRQs again ? This is at least
> > racy, if not outright broken.
> > 
> > That count == 1000 is a symptom there is something broken on how this
> > driver handles IRQs and I have the impression that we are applying
> > plasters on top of plasters to make it less broken than it actually is.
> > 
> 
> It is an upper bound on how many times driver looks at
> PCIE_MSI_INTR0_STATUS register, so that there is no infinite looping
> when there is an IRQ flood due to misbehaving EP. count == 1000
> condition should not happen and it means something is wrong in the
> system.  I haven't hit this situation in testing
> I can either remove this or put a WARN_ON to say this situation should
> not have happened, if that makes you more comfortable with the patch.
> 
> 
> >>> I suspect there is a code design flaw with the way this host handles
> >>> IRQs and we are going to find it and fix it the way it should, not with
> >>> any plaster like this patch.
> >>>
> >>
> >> I agree there has been some churn wrt this wrapper level IRQ handler.
> >> But, that was because hardware documentation/TRM did not match
> >> actual behavior and so it took some time to understand how the
> >> hardware is working.
> > 
> > How does HW work :) ? Please explain in detail how this works in HW
> > then we will get to the code.
> > 
> 
> Software needs to ensure that PCIE_MSI_INTR0_STATUS needs be 0 by
> reading it. Then, when the next MSI IRQ is raised  CONF_IRQSTATUS_MSI
> register in the wrapper will latch the next IRQ.
> 
> This is my current knowledge, let me know if you need to know anything
> specifically, I will try to ask HW team.
> 
> Regards
> Vignesh
> 
> > Thanks,
> > Lorenzo
> > 
> >> I have extensively tested this series on multiple problematic PCIe USB
> >> cards and PCIe WiFi cards over week long stress tests. And also had
> >> some agreement with internal hardware designers.  Hardware
> >> documentations will also be updated.
> >>
> >>
> >>> Lorenzo
> >>>
> >>>> +             }
> >>>>                 break;
> >>>>         case INTA:
> >>>>         case INTB:
> >>>> @@ -275,8 +292,6 @@ static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
> >>>>                 break;
> >>>>         }
> >>>>   
> >>>> -     dra7xx_pcie_writel(dra7xx, PCIECTRL_DRA7XX_CONF_IRQSTATUS_MSI, reg);
> >>>> -
> >>>>         return IRQ_HANDLED;
> >>>>   }
> >>>>   
> >>>> -- 
> >>>> 2.16.1
> >>>>
> >>
> >> -- 
> >> Regards
> >> Vignesh

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-03-15 18:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-09 12:04 [PATCH 0/3] dra7xx: PCIe IRQ handling rework Vignesh R
2018-02-09 12:04 ` [PATCH 1/3] Revert "PCI: dwc: Clear MSI interrupt status after it is handled, not before" Vignesh R
2018-02-09 13:46   ` Niklas Cassel
2018-02-09 12:04 ` [PATCH 2/3] PCI: dwc: pci-dra7xx: Improve MSI IRQ handling Vignesh R
2018-02-12 17:58   ` Lorenzo Pieralisi
2018-02-15  4:29     ` Vignesh R
2018-03-01 15:31       ` Vignesh R
2018-03-01 18:18         ` Lorenzo Pieralisi
2018-03-06 15:12       ` Lorenzo Pieralisi
2018-03-09  4:23         ` Vignesh R
2018-03-15 18:10           ` Lorenzo Pieralisi
2018-02-09 12:04 ` [PATCH 3/3] PCI: dwc: pci-dra7xx: Handle legacy and MSI IRQs together Vignesh R

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).