dmaengine.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
@ 2019-06-22 16:53 Russell King - ARM Linux admin
  2019-06-22 18:02 ` [PATCH] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-22 16:53 UTC (permalink / raw)
  To: Michael Olbrich, Lucas Stach, Vinod Koul; +Cc: linux-arm-kernel, dmaengine

Old code:

-       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
-               if (timeout-- <= 0)
-                       break;
-               udelay(1);
-       }

So, while bit 0 is _clear_ the loop continues to poll.


New code:

+       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
+                                               reg, !(reg & 1), 1, 500);

Doesn't really tell us what the termination condition is (because of
the obfuscation taking away the details), but if we dig into the
macro maze:

#define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
        readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)

#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
({ \
        u64 __timeout_us = (timeout_us); \
        unsigned long __delay_us = (delay_us); \
        ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
        for (;;) { \
                (val) = op(addr); \
                if (cond) \
                        break; \

"cond" is passed in to here unmodified, so this becomes:

	for (;;) {
		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
		if (!(reg & 1))
			break;

So, if bit 0 of this register is clear, we terminate the loop.

Seems to me like this is a great illustration why using a helper
_introduces_ bugs, because it hides the detail about what the exit
condition for the embedded loop actually is, and leads to this kind
of error.

In any case, the conversion is obviously incorrect.

I occasionally see the "Timeout waiting for CH0 ready" error during
boot on a cbi4, which, given the above, means that we did end up
seeing bit 1 set (so according to the old code, we waited
successfully.)

Looking at the date of the commit, this is almost a three year old
bug.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic()
  2019-06-22 16:53 [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Russell King - ARM Linux admin
@ 2019-06-22 18:02 ` Russell King
  2019-06-22 18:10 ` [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Michael Olbrich
  2019-06-22 18:55 ` [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
  2 siblings, 0 replies; 13+ messages in thread
From: Russell King @ 2019-06-22 18:02 UTC (permalink / raw)
  To: Michael Olbrich, Lucas Stach, Vinod Koul
  Cc: dmaengine, linux-arm-kernel, Dan Williams, Shawn Guo,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team

When imx-sdma was converted to use readl_relaxed_poll_timeout_atomic(),
the termination condition was inverted.  Fix this.

Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/dma/imx-sdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 5f3c1378b90e..ba84c0444f35 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -660,7 +660,7 @@ static int sdma_run_channel0(struct sdma_engine *sdma)
 	sdma_enable_channel(sdma, 0);
 
 	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
-						reg, !(reg & 1), 1, 500);
+						reg, reg & 1, 1, 500);
 	if (ret)
 		dev_err(sdma->dev, "Timeout waiting for CH0 ready\n");
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
  2019-06-22 16:53 [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Russell King - ARM Linux admin
  2019-06-22 18:02 ` [PATCH] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
@ 2019-06-22 18:10 ` Michael Olbrich
  2019-06-22 18:42   ` Russell King - ARM Linux admin
  2019-06-22 18:55 ` [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
  2 siblings, 1 reply; 13+ messages in thread
From: Michael Olbrich @ 2019-06-22 18:10 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Lucas Stach, Vinod Koul, linux-arm-kernel, dmaengine

On Sat, Jun 22, 2019 at 05:53:18PM +0100, Russell King - ARM Linux admin wrote:
> Old code:
> 
> -       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
> -               if (timeout-- <= 0)
> -                       break;
> -               udelay(1);
> -       }
> 
> So, while bit 0 is _clear_ the loop continues to poll.
> 
> 
> New code:
> 
> +       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> +                                               reg, !(reg & 1), 1, 500);
> 
> Doesn't really tell us what the termination condition is (because of
> the obfuscation taking away the details), but if we dig into the
> macro maze:
> 
> #define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>         readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> 
> #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> ({ \
>         u64 __timeout_us = (timeout_us); \
>         unsigned long __delay_us = (delay_us); \
>         ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
>         for (;;) { \
>                 (val) = op(addr); \
>                 if (cond) \
>                         break; \
> 
> "cond" is passed in to here unmodified, so this becomes:
> 
> 	for (;;) {
> 		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
> 		if (!(reg & 1))
> 			break;
> 
> So, if bit 0 of this register is clear, we terminate the loop.
> 
> Seems to me like this is a great illustration why using a helper
> _introduces_ bugs, because it hides the detail about what the exit
> condition for the embedded loop actually is, and leads to this kind
> of error.
> 
> In any case, the conversion is obviously incorrect.
> 
> I occasionally see the "Timeout waiting for CH0 ready" error during
> boot on a cbi4, which, given the above, means that we did end up
> seeing bit 1 set (so according to the old code, we waited
> successfully.)

The old code was polling SDMA_H_INTR so it waited for the bit to be set.
The new code (as documented in the commit message) polls SDMA_H_STATSTOP
instead.
I believe this register is called SDMAARM_STOP_STAT in the reference
manual. And the documentation states: "Reading this register yields the
current state of the HE[i] bits".
And from the documentation of the SDMA "DONE" instruction:
"Clear HE bit for the current channel, send an interrupt to the Arm
platform for the current channel and reschedule."

My interpretation of this is, that waiting for the bit in SDMA_H_STATSTOP
to become zero has the same effect as waiting for the bit in SDMA_H_INTR to
be set. Or am I missing something?

Michael

> Looking at the date of the commit, this is almost a three year old
> bug.
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
> 

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
  2019-06-22 18:10 ` [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Michael Olbrich
@ 2019-06-22 18:42   ` Russell King - ARM Linux admin
  2019-06-22 18:51     ` Russell King - ARM Linux admin
  2019-06-24 12:14     ` Lucas Stach
  0 siblings, 2 replies; 13+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-22 18:42 UTC (permalink / raw)
  To: Michael Olbrich; +Cc: Lucas Stach, Vinod Koul, linux-arm-kernel, dmaengine

On Sat, Jun 22, 2019 at 08:10:29PM +0200, Michael Olbrich wrote:
> On Sat, Jun 22, 2019 at 05:53:18PM +0100, Russell King - ARM Linux admin wrote:
> > Old code:
> > 
> > -       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
> > -               if (timeout-- <= 0)
> > -                       break;
> > -               udelay(1);
> > -       }
> > 
> > So, while bit 0 is _clear_ the loop continues to poll.
> > 
> > 
> > New code:
> > 
> > +       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> > +                                               reg, !(reg & 1), 1, 500);
> > 
> > Doesn't really tell us what the termination condition is (because of
> > the obfuscation taking away the details), but if we dig into the
> > macro maze:
> > 
> > #define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> >         readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> > 
> > #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> > ({ \
> >         u64 __timeout_us = (timeout_us); \
> >         unsigned long __delay_us = (delay_us); \
> >         ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> >         for (;;) { \
> >                 (val) = op(addr); \
> >                 if (cond) \
> >                         break; \
> > 
> > "cond" is passed in to here unmodified, so this becomes:
> > 
> > 	for (;;) {
> > 		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
> > 		if (!(reg & 1))
> > 			break;
> > 
> > So, if bit 0 of this register is clear, we terminate the loop.
> > 
> > Seems to me like this is a great illustration why using a helper
> > _introduces_ bugs, because it hides the detail about what the exit
> > condition for the embedded loop actually is, and leads to this kind
> > of error.
> > 
> > In any case, the conversion is obviously incorrect.
> > 
> > I occasionally see the "Timeout waiting for CH0 ready" error during
> > boot on a cbi4, which, given the above, means that we did end up
> > seeing bit 1 set (so according to the old code, we waited
> > successfully.)
> 
> The old code was polling SDMA_H_INTR so it waited for the bit to be set.
> The new code (as documented in the commit message) polls SDMA_H_STATSTOP
> instead.
> I believe this register is called SDMAARM_STOP_STAT in the reference
> manual. And the documentation states: "Reading this register yields the
> current state of the HE[i] bits".
> And from the documentation of the SDMA "DONE" instruction:
> "Clear HE bit for the current channel, send an interrupt to the Arm
> platform for the current channel and reschedule."
> 
> My interpretation of this is, that waiting for the bit in SDMA_H_STATSTOP
> to become zero has the same effect as waiting for the bit in SDMA_H_INTR to
> be set. Or am I missing something?

So, why do all my iMX6 platforms now randomly spit out:

"imx-sdma 20ec000.sdma: Timeout waiting for CH0 ready"

at boot, whereas they didn't used to with older kernels?  Maybe channel
0 does not clear the HE[0] bit?

The documentation explicitly states that for initialisation, the
following is required:

• Set bit 0 of the SDMA_HSTART register to set HE[0] and allow Channel 0
  to run (assumes EO[0] and DO[0] were both set in previous step). This
  will cause SDMA toload the program RAM and channel contexts configured
  previously.
• Wait for Channel 0 to finish running. This is indicated by HI[0]=1 in
  the SDMA_SDMA_INTR register, or by optional interrupt to the ARM platform.

So, is there a way for a HI bit to be set without clearing the HE bit?
Yes, via the NOTIFY command:

55.5.2.35 NOTIFY (Notify to ARM platform)
Operation:
if (jjj & 4 == 0)
{
  if (jjj&2 == 2)
    HE[CCR] ← 0
  if (jjj&1== 1)
    HI[CCR] ← 1
}
else if (jjj == 4)
  EP[CCR] ← 0
else

So, if jjj is 001 binary, the HE bit can remain set while the HI bit
is cleared.  Maybe the firmware uses this rather than a DONE instruction
when performing the initialisation functions, which means your idea of
going against what is specified in the manual, and using HE[0] instead
of HI[0] is on _very_ shakey ground.

Given that I'm seeing the same issue on _four_ iMX6 platforms here,
I think it's pretty much obvious that your assumptions here are
false.
 
> Michael
> 
> > Looking at the date of the commit, this is almost a three year old
> > bug.
> > 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> > According to speedtest.net: 11.9Mbps down 500kbps up
> > 
> 
> -- 
> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
  2019-06-22 18:42   ` Russell King - ARM Linux admin
@ 2019-06-22 18:51     ` Russell King - ARM Linux admin
  2019-06-24 12:14     ` Lucas Stach
  1 sibling, 0 replies; 13+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-22 18:51 UTC (permalink / raw)
  To: Michael Olbrich; +Cc: Vinod Koul, dmaengine, linux-arm-kernel, Lucas Stach

On Sat, Jun 22, 2019 at 07:42:37PM +0100, Russell King - ARM Linux admin wrote:
> On Sat, Jun 22, 2019 at 08:10:29PM +0200, Michael Olbrich wrote:
> > On Sat, Jun 22, 2019 at 05:53:18PM +0100, Russell King - ARM Linux admin wrote:
> > > Old code:
> > > 
> > > -       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
> > > -               if (timeout-- <= 0)
> > > -                       break;
> > > -               udelay(1);
> > > -       }
> > > 
> > > So, while bit 0 is _clear_ the loop continues to poll.
> > > 
> > > 
> > > New code:
> > > 
> > > +       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> > > +                                               reg, !(reg & 1), 1, 500);
> > > 
> > > Doesn't really tell us what the termination condition is (because of
> > > the obfuscation taking away the details), but if we dig into the
> > > macro maze:
> > > 
> > > #define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > >         readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> > > 
> > > #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> > > ({ \
> > >         u64 __timeout_us = (timeout_us); \
> > >         unsigned long __delay_us = (delay_us); \
> > >         ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> > >         for (;;) { \
> > >                 (val) = op(addr); \
> > >                 if (cond) \
> > >                         break; \
> > > 
> > > "cond" is passed in to here unmodified, so this becomes:
> > > 
> > > 	for (;;) {
> > > 		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
> > > 		if (!(reg & 1))
> > > 			break;
> > > 
> > > So, if bit 0 of this register is clear, we terminate the loop.
> > > 
> > > Seems to me like this is a great illustration why using a helper
> > > _introduces_ bugs, because it hides the detail about what the exit
> > > condition for the embedded loop actually is, and leads to this kind
> > > of error.
> > > 
> > > In any case, the conversion is obviously incorrect.
> > > 
> > > I occasionally see the "Timeout waiting for CH0 ready" error during
> > > boot on a cbi4, which, given the above, means that we did end up
> > > seeing bit 1 set (so according to the old code, we waited
> > > successfully.)
> > 
> > The old code was polling SDMA_H_INTR so it waited for the bit to be set.
> > The new code (as documented in the commit message) polls SDMA_H_STATSTOP
> > instead.
> > I believe this register is called SDMAARM_STOP_STAT in the reference
> > manual. And the documentation states: "Reading this register yields the
> > current state of the HE[i] bits".
> > And from the documentation of the SDMA "DONE" instruction:
> > "Clear HE bit for the current channel, send an interrupt to the Arm
> > platform for the current channel and reschedule."
> > 
> > My interpretation of this is, that waiting for the bit in SDMA_H_STATSTOP
> > to become zero has the same effect as waiting for the bit in SDMA_H_INTR to
> > be set. Or am I missing something?
> 
> So, why do all my iMX6 platforms now randomly spit out:
> 
> "imx-sdma 20ec000.sdma: Timeout waiting for CH0 ready"
> 
> at boot, whereas they didn't used to with older kernels?  Maybe channel
> 0 does not clear the HE[0] bit?
> 
> The documentation explicitly states that for initialisation, the
> following is required:
> 
> • Set bit 0 of the SDMA_HSTART register to set HE[0] and allow Channel 0
>   to run (assumes EO[0] and DO[0] were both set in previous step). This
>   will cause SDMA toload the program RAM and channel contexts configured
>   previously.
> • Wait for Channel 0 to finish running. This is indicated by HI[0]=1 in
>   the SDMA_SDMA_INTR register, or by optional interrupt to the ARM platform.
> 
> So, is there a way for a HI bit to be set without clearing the HE bit?
> Yes, via the NOTIFY command:
> 
> 55.5.2.35 NOTIFY (Notify to ARM platform)
> Operation:
> if (jjj & 4 == 0)
> {
>   if (jjj&2 == 2)
>     HE[CCR] ← 0
>   if (jjj&1== 1)
>     HI[CCR] ← 1
> }
> else if (jjj == 4)
>   EP[CCR] ← 0
> else
> 
> So, if jjj is 001 binary, the HE bit can remain set while the HI bit
> is cleared.  Maybe the firmware uses this rather than a DONE instruction

Sorry, *set*.

> when performing the initialisation functions, which means your idea of
> going against what is specified in the manual, and using HE[0] instead
> of HI[0] is on _very_ shakey ground.
> 
> Given that I'm seeing the same issue on _four_ iMX6 platforms here,
> I think it's pretty much obvious that your assumptions here are
> false.
>  
> > Michael
> > 
> > > Looking at the date of the commit, this is almost a three year old
> > > bug.
> > > 
> > > -- 
> > > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> > > According to speedtest.net: 11.9Mbps down 500kbps up
> > > 
> > 
> > -- 
> > Pengutronix e.K.                           |                             |
> > Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> > Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic()
  2019-06-22 16:53 [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Russell King - ARM Linux admin
  2019-06-22 18:02 ` [PATCH] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
  2019-06-22 18:10 ` [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Michael Olbrich
@ 2019-06-22 18:55 ` Russell King
  2019-06-22 19:26   ` Russell King - ARM Linux admin
  2 siblings, 1 reply; 13+ messages in thread
From: Russell King @ 2019-06-22 18:55 UTC (permalink / raw)
  To: Michael Olbrich, Lucas Stach, Vinod Koul
  Cc: dmaengine, linux-arm-kernel, Dan Williams, Shawn Guo,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team

When imx-sdma was converted to use readl_relaxed_poll_timeout_atomic(),
the termination condition was inverted.  Fix this.

Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/dma/imx-sdma.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 5f3c1378b90e..c45cbdb09714 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -655,15 +655,21 @@ static void sdma_enable_channel(struct sdma_engine *sdma, int channel)
 static int sdma_run_channel0(struct sdma_engine *sdma)
 {
 	int ret;
-	u32 reg;
+	u32 reg, mask;
+
+	// Disable channel 0 interrupt
+	mask = readl_relaxed(sdma->regs + SDMA_H_INTRMSK);
+	writel_relaxed(mask & ~1, sdma->regs + SDMA_H_INTRMSK);
 
 	sdma_enable_channel(sdma, 0);
 
-	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
-						reg, !(reg & 1), 1, 500);
+	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_INTR,
+						reg, reg & 1, 1, 500);
 	if (ret)
 		dev_err(sdma->dev, "Timeout waiting for CH0 ready\n");
 
+	writel_relaxed(mask, sdma->regs + SDMA_H_INTRMSK);
+
 	/* Set bits of CONFIG register with dynamic context switching */
 	reg = readl(sdma->regs + SDMA_H_CONFIG);
 	if ((reg & SDMA_H_CONFIG_CSM) == 0) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic()
  2019-06-22 18:55 ` [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
@ 2019-06-22 19:26   ` Russell King - ARM Linux admin
  2019-06-22 20:26     ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-22 19:26 UTC (permalink / raw)
  To: Michael Olbrich, Lucas Stach, Vinod Koul
  Cc: dmaengine, linux-arm-kernel, Dan Williams, Shawn Guo,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team

Well, this doesn't appear to completely solve the problem either -
one out of four of my platforms still spat out the error (because
the SDMA initialisation can run on a different CPU to that which
receives the interrupt.)

I've thought about using a completion, but that doesn't work either,
because in the case of a single CPU, the interrupts will be masked,
so we can't wait for completion.  I think we need to eliminate that
spinlock around this code.

On Sat, Jun 22, 2019 at 07:55:53PM +0100, Russell King wrote:
> When imx-sdma was converted to use readl_relaxed_poll_timeout_atomic(),
> the termination condition was inverted.  Fix this.
> 
> Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler")
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
>  drivers/dma/imx-sdma.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index 5f3c1378b90e..c45cbdb09714 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -655,15 +655,21 @@ static void sdma_enable_channel(struct sdma_engine *sdma, int channel)
>  static int sdma_run_channel0(struct sdma_engine *sdma)
>  {
>  	int ret;
> -	u32 reg;
> +	u32 reg, mask;
> +
> +	// Disable channel 0 interrupt
> +	mask = readl_relaxed(sdma->regs + SDMA_H_INTRMSK);
> +	writel_relaxed(mask & ~1, sdma->regs + SDMA_H_INTRMSK);
>  
>  	sdma_enable_channel(sdma, 0);
>  
> -	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> -						reg, !(reg & 1), 1, 500);
> +	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_INTR,
> +						reg, reg & 1, 1, 500);
>  	if (ret)
>  		dev_err(sdma->dev, "Timeout waiting for CH0 ready\n");
>  
> +	writel_relaxed(mask, sdma->regs + SDMA_H_INTRMSK);
> +
>  	/* Set bits of CONFIG register with dynamic context switching */
>  	reg = readl(sdma->regs + SDMA_H_CONFIG);
>  	if ((reg & SDMA_H_CONFIG_CSM) == 0) {
> -- 
> 2.7.4
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic()
  2019-06-22 19:26   ` Russell King - ARM Linux admin
@ 2019-06-22 20:26     ` Russell King - ARM Linux admin
  2019-06-23 13:29       ` Fabio Estevam
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-22 20:26 UTC (permalink / raw)
  To: Michael Olbrich, Lucas Stach, Vinod Koul
  Cc: Fabio Estevam, Sascha Hauer, NXP Linux Team,
	Pengutronix Kernel Team, dmaengine, Dan Williams, Shawn Guo,
	linux-arm-kernel

On Sat, Jun 22, 2019 at 08:26:53PM +0100, Russell King - ARM Linux admin wrote:
> Well, this doesn't appear to completely solve the problem either -
> one out of four of my platforms still spat out the error (because
> the SDMA initialisation can run on a different CPU to that which
> receives the interrupt.)
> 
> I've thought about using a completion, but that doesn't work either,
> because in the case of a single CPU, the interrupts will be masked,
> so we can't wait for completion.  I think we need to eliminate that
> spinlock around this code.

It looks like iMX6 Dual does not initialise DMA properly using the 1.1
firmware - md5sum is:

5d4584134cc4cba62e1be2f382cd6f3a  /lib/firmware/imx/sdma/sdma-imx6q.bin

I've tried extending the timeout to 5ms, checking HI[0] (both from the
interrupt handler and from sdma_run_channel0() to cover the case of a
single-core setup).

After boot:

 60:          0          0       GPC   2 Level     sdma

So no interrupt was received.  Looking at the registers:

# /shared/bin32/devmem2 0x20ec02c
Value at address 0x020ec02c: 0x00000000  <= H_INTRMASK
# /shared/bin32/devmem2 0x20ec004
Value at address 0x020ec004: 0x00000000  <= H_INTR
# /shared/bin32/devmem2 0x20ec00c
Value at address 0x020ec00c: 0x00000000  <= H_START
# /shared/bin32/devmem2 0x20ec008
Value at address 0x020ec008: 0x00000001  <= H_STATSTOP

Any ideas?

> 
> On Sat, Jun 22, 2019 at 07:55:53PM +0100, Russell King wrote:
> > When imx-sdma was converted to use readl_relaxed_poll_timeout_atomic(),
> > the termination condition was inverted.  Fix this.
> > 
> > Fixes: 1d069bfa3c78 ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler")
> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> > ---
> >  drivers/dma/imx-sdma.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> > index 5f3c1378b90e..c45cbdb09714 100644
> > --- a/drivers/dma/imx-sdma.c
> > +++ b/drivers/dma/imx-sdma.c
> > @@ -655,15 +655,21 @@ static void sdma_enable_channel(struct sdma_engine *sdma, int channel)
> >  static int sdma_run_channel0(struct sdma_engine *sdma)
> >  {
> >  	int ret;
> > -	u32 reg;
> > +	u32 reg, mask;
> > +
> > +	// Disable channel 0 interrupt
> > +	mask = readl_relaxed(sdma->regs + SDMA_H_INTRMSK);
> > +	writel_relaxed(mask & ~1, sdma->regs + SDMA_H_INTRMSK);
> >  
> >  	sdma_enable_channel(sdma, 0);
> >  
> > -	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> > -						reg, !(reg & 1), 1, 500);
> > +	ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_INTR,
> > +						reg, reg & 1, 1, 500);
> >  	if (ret)
> >  		dev_err(sdma->dev, "Timeout waiting for CH0 ready\n");
> >  
> > +	writel_relaxed(mask, sdma->regs + SDMA_H_INTRMSK);
> > +
> >  	/* Set bits of CONFIG register with dynamic context switching */
> >  	reg = readl(sdma->regs + SDMA_H_CONFIG);
> >  	if ((reg & SDMA_H_CONFIG_CSM) == 0) {
> > -- 
> > 2.7.4
> > 
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
> According to speedtest.net: 11.9Mbps down 500kbps up
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic()
  2019-06-22 20:26     ` Russell King - ARM Linux admin
@ 2019-06-23 13:29       ` Fabio Estevam
  2019-06-25  9:00         ` Robin Gong
  0 siblings, 1 reply; 13+ messages in thread
From: Fabio Estevam @ 2019-06-23 13:29 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Robin Gong
  Cc: Michael Olbrich, Lucas Stach, Vinod Koul, Sascha Hauer,
	NXP Linux Team, Pengutronix Kernel Team, dmaengine, Dan Williams,
	Shawn Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

Hi Russell,

On Sat, Jun 22, 2019 at 5:27 PM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Sat, Jun 22, 2019 at 08:26:53PM +0100, Russell King - ARM Linux admin wrote:
> > Well, this doesn't appear to completely solve the problem either -
> > one out of four of my platforms still spat out the error (because
> > the SDMA initialisation can run on a different CPU to that which
> > receives the interrupt.)
> >
> > I've thought about using a completion, but that doesn't work either,
> > because in the case of a single CPU, the interrupts will be masked,
> > so we can't wait for completion.  I think we need to eliminate that
> > spinlock around this code.
>
> It looks like iMX6 Dual does not initialise DMA properly using the 1.1
> firmware - md5sum is:
>
> 5d4584134cc4cba62e1be2f382cd6f3a  /lib/firmware/imx/sdma/sdma-imx6q.bin
>
> I've tried extending the timeout to 5ms, checking HI[0] (both from the
> interrupt handler and from sdma_run_channel0() to cover the case of a
> single-core setup).
>
> After boot:
>
>  60:          0          0       GPC   2 Level     sdma
>
> So no interrupt was received.  Looking at the registers:
>
> # /shared/bin32/devmem2 0x20ec02c
> Value at address 0x020ec02c: 0x00000000  <= H_INTRMASK
> # /shared/bin32/devmem2 0x20ec004
> Value at address 0x020ec004: 0x00000000  <= H_INTR
> # /shared/bin32/devmem2 0x20ec00c
> Value at address 0x020ec00c: 0x00000000  <= H_START
> # /shared/bin32/devmem2 0x20ec008
> Value at address 0x020ec008: 0x00000001  <= H_STATSTOP
>
> Any ideas?

Could you please try this patch from Robin?
http://lists.infradead.org/pipermail/linux-arm-kernel/2019-June/661914.html

Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
  2019-06-22 18:42   ` Russell King - ARM Linux admin
  2019-06-22 18:51     ` Russell King - ARM Linux admin
@ 2019-06-24 12:14     ` Lucas Stach
  2019-06-24 12:15       ` Russell King - ARM Linux admin
  1 sibling, 1 reply; 13+ messages in thread
From: Lucas Stach @ 2019-06-24 12:14 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Michael Olbrich
  Cc: Vinod Koul, linux-arm-kernel, dmaengine

Hi Russell,

Am Samstag, den 22.06.2019, 19:42 +0100 schrieb Russell King - ARM Linux admin:
> On Sat, Jun 22, 2019 at 08:10:29PM +0200, Michael Olbrich wrote:
> > On Sat, Jun 22, 2019 at 05:53:18PM +0100, Russell King - ARM Linux admin wrote:
> > > Old code:
> > > 
> > > -       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
> > > -               if (timeout-- <= 0)
> > > -                       break;
> > > -               udelay(1);
> > > -       }
> > > 
> > > So, while bit 0 is _clear_ the loop continues to poll.
> > > 
> > > 
> > > New code:
> > > 
> > > +       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> > > +                                               reg, !(reg & 1), 1, 500);
> > > 
> > > Doesn't really tell us what the termination condition is (because of
> > > the obfuscation taking away the details), but if we dig into the
> > > macro maze:
> > > 
> > > #define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > >         readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> > > 
> > > #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> > > ({ \
> > >         u64 __timeout_us = (timeout_us); \
> > >         unsigned long __delay_us = (delay_us); \
> > >         ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> > >         for (;;) { \
> > >                 (val) = op(addr); \
> > >                 if (cond) \
> > >                         break; \
> > > 
> > > "cond" is passed in to here unmodified, so this becomes:
> > > 
> > > 	for (;;) {
> > > > > > 		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
> > > > > > 		if (!(reg & 1))
> > > > > > 			break;
> > > 
> > > So, if bit 0 of this register is clear, we terminate the loop.
> > > 
> > > Seems to me like this is a great illustration why using a helper
> > > _introduces_ bugs, because it hides the detail about what the exit
> > > condition for the embedded loop actually is, and leads to this kind
> > > of error.
> > > 
> > > In any case, the conversion is obviously incorrect.
> > > 
> > > I occasionally see the "Timeout waiting for CH0 ready" error during
> > > boot on a cbi4, which, given the above, means that we did end up
> > > seeing bit 1 set (so according to the old code, we waited
> > > successfully.)
> > 
> > The old code was polling SDMA_H_INTR so it waited for the bit to be set.
> > The new code (as documented in the commit message) polls SDMA_H_STATSTOP
> > instead.
> > I believe this register is called SDMAARM_STOP_STAT in the reference
> > manual. And the documentation states: "Reading this register yields the
> > current state of the HE[i] bits".
> > And from the documentation of the SDMA "DONE" instruction:
> > "Clear HE bit for the current channel, send an interrupt to the Arm
> > platform for the current channel and reschedule."
> > 
> > My interpretation of this is, that waiting for the bit in SDMA_H_STATSTOP
> > to become zero has the same effect as waiting for the bit in SDMA_H_INTR to
> > be set. Or am I missing something?
> 
> So, why do all my iMX6 platforms now randomly spit out:
> 
> "imx-sdma 20ec000.sdma: Timeout waiting for CH0 ready"

This is due to a DT misconfiguration which was uncovered with a recent
driver change (25aaa75df1e6 dmaengine: imx-sdma: add clock ratio 1:1
check) and fixed with (941acd566b18 dmaengine: imx-sdma: Only check
ratio on parts that support 1:1). Please switch to a recent stable
kernel, 5.1.5 has the fix included.

Regards,
Lucas


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
  2019-06-24 12:14     ` Lucas Stach
@ 2019-06-24 12:15       ` Russell King - ARM Linux admin
  2019-06-24 12:52         ` Lucas Stach
  0 siblings, 1 reply; 13+ messages in thread
From: Russell King - ARM Linux admin @ 2019-06-24 12:15 UTC (permalink / raw)
  To: Lucas Stach; +Cc: Michael Olbrich, Vinod Koul, linux-arm-kernel, dmaengine

On Mon, Jun 24, 2019 at 02:14:10PM +0200, Lucas Stach wrote:
> Hi Russell,
> 
> Am Samstag, den 22.06.2019, 19:42 +0100 schrieb Russell King - ARM Linux admin:
> > On Sat, Jun 22, 2019 at 08:10:29PM +0200, Michael Olbrich wrote:
> > > On Sat, Jun 22, 2019 at 05:53:18PM +0100, Russell King - ARM Linux admin wrote:
> > > > Old code:
> > > > 
> > > > -       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
> > > > -               if (timeout-- <= 0)
> > > > -                       break;
> > > > -               udelay(1);
> > > > -       }
> > > > 
> > > > So, while bit 0 is _clear_ the loop continues to poll.
> > > > 
> > > > 
> > > > New code:
> > > > 
> > > > +       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> > > > +                                               reg, !(reg & 1), 1, 500);
> > > > 
> > > > Doesn't really tell us what the termination condition is (because of
> > > > the obfuscation taking away the details), but if we dig into the
> > > > macro maze:
> > > > 
> > > > #define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > > >         readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> > > > 
> > > > #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> > > > ({ \
> > > >         u64 __timeout_us = (timeout_us); \
> > > >         unsigned long __delay_us = (delay_us); \
> > > >         ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> > > >         for (;;) { \
> > > >                 (val) = op(addr); \
> > > >                 if (cond) \
> > > >                         break; \
> > > > 
> > > > "cond" is passed in to here unmodified, so this becomes:
> > > > 
> > > > 	for (;;) {
> > > > > > > 		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
> > > > > > > 		if (!(reg & 1))
> > > > > > > 			break;
> > > > 
> > > > So, if bit 0 of this register is clear, we terminate the loop.
> > > > 
> > > > Seems to me like this is a great illustration why using a helper
> > > > _introduces_ bugs, because it hides the detail about what the exit
> > > > condition for the embedded loop actually is, and leads to this kind
> > > > of error.
> > > > 
> > > > In any case, the conversion is obviously incorrect.
> > > > 
> > > > I occasionally see the "Timeout waiting for CH0 ready" error during
> > > > boot on a cbi4, which, given the above, means that we did end up
> > > > seeing bit 1 set (so according to the old code, we waited
> > > > successfully.)
> > > 
> > > The old code was polling SDMA_H_INTR so it waited for the bit to be set.
> > > The new code (as documented in the commit message) polls SDMA_H_STATSTOP
> > > instead.
> > > I believe this register is called SDMAARM_STOP_STAT in the reference
> > > manual. And the documentation states: "Reading this register yields the
> > > current state of the HE[i] bits".
> > > And from the documentation of the SDMA "DONE" instruction:
> > > "Clear HE bit for the current channel, send an interrupt to the Arm
> > > platform for the current channel and reschedule."
> > > 
> > > My interpretation of this is, that waiting for the bit in SDMA_H_STATSTOP
> > > to become zero has the same effect as waiting for the bit in SDMA_H_INTR to
> > > be set. Or am I missing something?
> > 
> > So, why do all my iMX6 platforms now randomly spit out:
> > 
> > "imx-sdma 20ec000.sdma: Timeout waiting for CH0 ready"
> 
> This is due to a DT misconfiguration which was uncovered with a recent
> driver change (25aaa75df1e6 dmaengine: imx-sdma: add clock ratio 1:1
> check) and fixed with (941acd566b18 dmaengine: imx-sdma: Only check
> ratio on parts that support 1:1). Please switch to a recent stable
> kernel, 5.1.5 has the fix included.

Please point to the fix, thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion
  2019-06-24 12:15       ` Russell King - ARM Linux admin
@ 2019-06-24 12:52         ` Lucas Stach
  0 siblings, 0 replies; 13+ messages in thread
From: Lucas Stach @ 2019-06-24 12:52 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Michael Olbrich, Vinod Koul, linux-arm-kernel, dmaengine

Am Montag, den 24.06.2019, 13:15 +0100 schrieb Russell King - ARM Linux admin:
> On Mon, Jun 24, 2019 at 02:14:10PM +0200, Lucas Stach wrote:
> > Hi Russell,
> > 
> > Am Samstag, den 22.06.2019, 19:42 +0100 schrieb Russell King - ARM Linux admin:
> > > On Sat, Jun 22, 2019 at 08:10:29PM +0200, Michael Olbrich wrote:
> > > > On Sat, Jun 22, 2019 at 05:53:18PM +0100, Russell King - ARM Linux admin wrote:
> > > > > Old code:
> > > > > 
> > > > > -       while (!(ret = readl_relaxed(sdma->regs + SDMA_H_INTR) & 1)) {
> > > > > -               if (timeout-- <= 0)
> > > > > -                       break;
> > > > > -               udelay(1);
> > > > > -       }
> > > > > 
> > > > > So, while bit 0 is _clear_ the loop continues to poll.
> > > > > 
> > > > > 
> > > > > New code:
> > > > > 
> > > > > +       ret = readl_relaxed_poll_timeout_atomic(sdma->regs + SDMA_H_STATSTOP,
> > > > > +                                               reg, !(reg & 1), 1, 500);
> > > > > 
> > > > > Doesn't really tell us what the termination condition is (because of
> > > > > the obfuscation taking away the details), but if we dig into the
> > > > > macro maze:
> > > > > 
> > > > > #define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > > > >         readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> > > > > 
> > > > > #define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> > > > > ({ \
> > > > >         u64 __timeout_us = (timeout_us); \
> > > > >         unsigned long __delay_us = (delay_us); \
> > > > >         ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> > > > >         for (;;) { \
> > > > >                 (val) = op(addr); \
> > > > >                 if (cond) \
> > > > >                         break; \
> > > > > 
> > > > > "cond" is passed in to here unmodified, so this becomes:
> > > > > 
> > > > > 	for (;;) {
> > > > > > > > > > > > > > > > 		reg = readl_relaxed(sdma->regs + SDMA_H_STATSTOP);
> > > > > > > > > > > > > > > > 		if (!(reg & 1))
> > > > > > > > 			break;
> > > > > 
> > > > > So, if bit 0 of this register is clear, we terminate the loop.
> > > > > 
> > > > > Seems to me like this is a great illustration why using a helper
> > > > > _introduces_ bugs, because it hides the detail about what the exit
> > > > > condition for the embedded loop actually is, and leads to this kind
> > > > > of error.
> > > > > 
> > > > > In any case, the conversion is obviously incorrect.
> > > > > 
> > > > > I occasionally see the "Timeout waiting for CH0 ready" error during
> > > > > boot on a cbi4, which, given the above, means that we did end up
> > > > > seeing bit 1 set (so according to the old code, we waited
> > > > > successfully.)
> > > > 
> > > > The old code was polling SDMA_H_INTR so it waited for the bit to be set.
> > > > The new code (as documented in the commit message) polls SDMA_H_STATSTOP
> > > > instead.
> > > > I believe this register is called SDMAARM_STOP_STAT in the reference
> > > > manual. And the documentation states: "Reading this register yields the
> > > > current state of the HE[i] bits".
> > > > And from the documentation of the SDMA "DONE" instruction:
> > > > "Clear HE bit for the current channel, send an interrupt to the Arm
> > > > platform for the current channel and reschedule."
> > > > 
> > > > My interpretation of this is, that waiting for the bit in SDMA_H_STATSTOP
> > > > to become zero has the same effect as waiting for the bit in SDMA_H_INTR to
> > > > be set. Or am I missing something?
> > > 
> > > So, why do all my iMX6 platforms now randomly spit out:
> > > 
> > > "imx-sdma 20ec000.sdma: Timeout waiting for CH0 ready"
> > 
> > This is due to a DT misconfiguration which was uncovered with a recent
> > driver change (25aaa75df1e6 dmaengine: imx-sdma: add clock ratio 1:1
> > check) and fixed with (941acd566b18 dmaengine: imx-sdma: Only check
> > ratio on parts that support 1:1). Please switch to a recent stable
> > kernel, 5.1.5 has the fix included.
> 
> Please point to the fix, thanks.

As I stated above the fix is 941acd566b18 (dmaengine: imx-sdma: Only
check ratio on parts that support 1:1), which is 40aab1990f71 in the
5.1.y stable branch.

Regards,
Lucas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic()
  2019-06-23 13:29       ` Fabio Estevam
@ 2019-06-25  9:00         ` Robin Gong
  0 siblings, 0 replies; 13+ messages in thread
From: Robin Gong @ 2019-06-25  9:00 UTC (permalink / raw)
  To: Fabio Estevam, Russell King - ARM Linux admin
  Cc: Michael Olbrich, Lucas Stach, Vinod Koul, Sascha Hauer,
	dl-linux-imx, Pengutronix Kernel Team, dmaengine, Dan Williams,
	Shawn Guo,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Sun, Jun 23, 2019 at 21:30 Fabio Estevam <festevam@gmail.com> wrote:
> Hi Russell,
> 
> On Sat, Jun 22, 2019 at 5:27 PM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Sat, Jun 22, 2019 at 08:26:53PM +0100, Russell King - ARM Linux admin
> wrote:
> > > Well, this doesn't appear to completely solve the problem either -
> > > one out of four of my platforms still spat out the error (because
> > > the SDMA initialisation can run on a different CPU to that which
> > > receives the interrupt.)
> > >
> > > I've thought about using a completion, but that doesn't work either,
> > > because in the case of a single CPU, the interrupts will be masked,
> > > so we can't wait for completion.  I think we need to eliminate that
> > > spinlock around this code.
> >
> > It looks like iMX6 Dual does not initialise DMA properly using the 1.1
> > firmware - md5sum is:
> >
> > 5d4584134cc4cba62e1be2f382cd6f3a
> > /lib/firmware/imx/sdma/sdma-imx6q.bin
> >
> > I've tried extending the timeout to 5ms, checking HI[0] (both from the
> > interrupt handler and from sdma_run_channel0() to cover the case of a
> > single-core setup).
> >
> > After boot:
> >
> >  60:          0          0       GPC   2 Level     sdma
> >
> > So no interrupt was received.  Looking at the registers:
> >
> > # /shared/bin32/devmem2 0x20ec02c
> > Value at address 0x020ec02c: 0x00000000  <= H_INTRMASK #
> > /shared/bin32/devmem2 0x20ec004 Value at address 0x020ec004:
> > 0x00000000  <= H_INTR # /shared/bin32/devmem2 0x20ec00c Value at
> > address 0x020ec00c: 0x00000000  <= H_START # /shared/bin32/devmem2
> > 0x20ec008 Value at address 0x020ec008: 0x00000001  <= H_STATSTOP
> >
> > Any ideas?
Seems sdma script not run as expected, thus no DONE instruction involved to clear
'HE' of H_STATSTOP and notify ARM by interrupt. So this timeout happened during the
first ' sdma_load_script()' phase ?


> Could you please try this patch from Robin?
> https://eur01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.infra
> dead.org%2Fpipermail%2Flinux-arm-kernel%2F2019-June%2F661914.html&a
> mp;data=02%7C01%7Cyibin.gong%40nxp.com%7C7faa18517626429780d908
> d6f7ded3b6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636968
> 933747699843&amp;sdata=BIipoIgBc5sMahJkz33L5ucqeuHwyYnqg09ornpeLE
> 4%3D&amp;reserved=0
This should be the different case, since in Russell King's case, no any interrupt while my patch
fix the potential interrupt storm.
> 
> Thanks

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-06-25  9:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-22 16:53 [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Russell King - ARM Linux admin
2019-06-22 18:02 ` [PATCH] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
2019-06-22 18:10 ` [BUG] imx-sdma: readl_relaxed_poll_timeout_atomic() conversion Michael Olbrich
2019-06-22 18:42   ` Russell King - ARM Linux admin
2019-06-22 18:51     ` Russell King - ARM Linux admin
2019-06-24 12:14     ` Lucas Stach
2019-06-24 12:15       ` Russell King - ARM Linux admin
2019-06-24 12:52         ` Lucas Stach
2019-06-22 18:55 ` [PATCH v2] dmaengine: imx-sdma: fix incorrect conversion to readl_relaxed_poll_timeout_atomic() Russell King
2019-06-22 19:26   ` Russell King - ARM Linux admin
2019-06-22 20:26     ` Russell King - ARM Linux admin
2019-06-23 13:29       ` Fabio Estevam
2019-06-25  9:00         ` Robin Gong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).