linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs
@ 2019-09-19 14:29 Philipp Puschmann
  2019-09-19 14:29 ` [PATCH v4 1/3] dmaengine: imx-sdma: fix buffer ownership Philipp Puschmann
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Philipp Puschmann @ 2019-09-19 14:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: yibin.gong, fugang.duan, l.stach, dan.j.williams, vkoul,
	shawnguo, s.hauer, kernel, festevam, linux-imx, dmaengine,
	linux-arm-kernel, Philipp Puschmann

For some years, and across many kernel versions, there have been reports
that the RX UART DMA channel stops working at some point. So far the
usual workaround has been to disable RX DMA. These patches fix the
underlying problem.

When a running sdma script does not find any usable destination buffer
to put its data into, the channel is no longer scheduled. As a solution
we manually retrigger the sdma script for this channel and thereby
resolve the freeze.

While this seems to work fine so far, buffer overruns may occur while
the channel is stopped, even temporarily. Device drivers have to
address this case by increasing the number of DMA periods.
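
As a rough illustration of the sizing question (not part of this
series), the sketch below estimates a lower bound on the number of
periods a cyclic RX transfer needs to absorb a given service latency.
It assumes 8N1 framing (10 bits on the wire per byte) and fully filled
periods; min_rx_periods() and its parameters are hypothetical names.
With many small packets each period may be closed after only a few
bytes, so the real requirement can be considerably higher.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from imx-sdma.c or the serial driver.
 * Returns a lower bound on the number of DMA periods needed so that the
 * ring does not fill up while the CPU is busy for latency_us microseconds.
 */
static uint32_t min_rx_periods(uint32_t baud, uint32_t period_len,
			       uint32_t latency_us)
{
	uint64_t bytes;

	assert(period_len > 0);

	/* Assumption: 8N1, i.e. start bit + 8 data bits + stop bit. */
	bytes = ((uint64_t)baud * latency_us) / (10u * 1000000ull);

	/* Round up to whole periods, plus the one currently being filled. */
	return (uint32_t)((bytes + period_len - 1) / period_len) + 1;
}
```

For example, 115200 baud with 16-byte periods and 10 ms of latency
gives 9 periods under these assumptions.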

This patch series was tested with the current kernel and backported to
kernel 4.15 with a special use case using a WL1837MOD via UART and
provoking a hang of UART RX DMA within seconds of starting a test
application. The hang resulted in the well-known
  "Bluetooth: hci0: command 0x0408 tx timeout"
errors and a complete stop of UART data reception. Our Bluetooth traffic
consists of many independent small packets, mostly only a few bytes,
causing high usage of periods.

Changelog v4:
 - fixed the fixes tags
 
Changelog v3:
 - fixes typo in dma_wmb
 - add fixes tags
 
Changelog v2:
 - adapt title (these patches are not only for i.MX6)
 - improve some comments and patch descriptions
 - add a dma_wb() around BD_DONE flag
 - add Reviewed-by tags
 - split off  "serial: imx: adapt rx buffer and dma periods"

Philipp Puschmann (3):
  dmaengine: imx-sdma: fix buffer ownership
  dmaengine: imx-sdma: fix dma freezes
  dmaengine: imx-sdma: drop redundant variable

 drivers/dma/imx-sdma.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

-- 
2.23.0



* [PATCH v4 1/3] dmaengine: imx-sdma: fix buffer ownership
  2019-09-19 14:29 [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Philipp Puschmann
@ 2019-09-19 14:29 ` Philipp Puschmann
  2019-09-19 14:29 ` [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes Philipp Puschmann
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Philipp Puschmann @ 2019-09-19 14:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: yibin.gong, fugang.duan, l.stach, dan.j.williams, vkoul,
	shawnguo, s.hauer, kernel, festevam, linux-imx, dmaengine,
	linux-arm-kernel, Philipp Puschmann

The BD_DONE flag marks ownership of the buffer: when it is 1, SDMA owns
the buffer; when it is 0, ARM owns it. When processing the buffers in
sdma_update_channel_loop, the ownership of the currently processed
buffer was set to SDMA again before the callback function of the buffer
ran, while the sdma script may be running in parallel. So the buffer
could be overwritten by SDMA before the kernel had processed it, leading
to seemingly random errors in the upper layers, e.g. Bluetooth.

Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>

---

Changelog v4:
 - fixed the fixes tag
 
Changelog v3:
 - use correct dma_wmb() instead of dma_wb()
 - add fixes tag

Changelog v2:
 - add dma_wb()

 drivers/dma/imx-sdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 9ba74ab7e912..e029a2443cfc 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -802,7 +802,6 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 		*/
 
 		desc->chn_real_count = bd->mode.count;
-		bd->mode.status |= BD_DONE;
 		bd->mode.count = desc->period_len;
 		desc->buf_ptail = desc->buf_tail;
 		desc->buf_tail = (desc->buf_tail + 1) % desc->num_bd;
@@ -817,6 +816,9 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 		dmaengine_desc_get_callback_invoke(&desc->vd.tx, NULL);
 		spin_lock(&sdmac->vc.lock);
 
+		dma_wmb();
+		bd->mode.status |= BD_DONE;
+
 		if (error)
 			sdmac->status = old_status;
 	}
-- 
2.23.0



* [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
  2019-09-19 14:29 [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Philipp Puschmann
  2019-09-19 14:29 ` [PATCH v4 1/3] dmaengine: imx-sdma: fix buffer ownership Philipp Puschmann
@ 2019-09-19 14:29 ` Philipp Puschmann
  2019-09-19 15:19   ` Jan Lübbe
  2019-09-24  1:38   ` Robin Gong
  2019-09-19 14:29 ` [PATCH v4 3/3] dmaengine: imx-sdma: drop redundant variable Philipp Puschmann
  2019-09-20  2:44 ` [EXT] [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Andy Duan
  3 siblings, 2 replies; 9+ messages in thread
From: Philipp Puschmann @ 2019-09-19 14:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: yibin.gong, fugang.duan, l.stach, dan.j.williams, vkoul,
	shawnguo, s.hauer, kernel, festevam, linux-imx, dmaengine,
	linux-arm-kernel, Philipp Puschmann

For some years, and across many kernel versions, there have been reports
that the RX UART SDMA channel stops working at some point. The
workaround was to disable DMA for RX. This commit tries to fix the
problem itself.

Due to its license I wasn't able to debug the sdma script itself, but it
somehow blocks the scheduling of the channel script when a running sdma
script does not find any free descriptor in the ring to put its data
into.

If we detect such a potential case we manually restart the channel.

As sdmac->desc is constant we can move desc out of the loop.
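
The restart condition can be illustrated with a small user-space model
of the descriptor ring. It is a sketch, not the driver code: struct
model_ring and model_update_loop() are hypothetical names, the BD_DONE
value is illustrative, and callbacks and error handling are left out.
The function walks the ring from the tail, hands every completed
descriptor back to the engine, and reports whether all descriptors had
been used up, which is the case in which the channel is restarted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BD_DONE 0x01u	/* 1: owned by the DMA engine, 0: owned by the CPU */

/* Hypothetical, simplified view of a cyclic descriptor ring. */
struct model_ring {
	uint32_t *status;	/* one status word per buffer descriptor */
	size_t num_bd;
	size_t tail;		/* next descriptor expected to complete */
};

/*
 * Returns true when every descriptor in the ring had been consumed by
 * the engine, i.e. when the engine may have stalled for lack of buffers.
 */
static bool model_update_loop(struct model_ring *ring)
{
	size_t cnt = 0;

	assert(ring->num_bd > 0);

	while (!(ring->status[ring->tail] & BD_DONE)) {
		cnt++;
		ring->status[ring->tail] |= BD_DONE;	/* give it back */
		ring->tail = (ring->tail + 1) % ring->num_bd;
	}

	return cnt >= ring->num_bd;
}
```

The loop terminates because every descriptor it processes is marked
BD_DONE, so after at most num_bd iterations the tail points at a
descriptor owned by the engine.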

Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
---

Changelog v4:
 - fixed the fixes tag
 
Changelog v3:
 - use correct dma_wmb() instead of dma_wb()
 - add fixes tag
 
Changelog v2:
 - clarify comment and commit description

 drivers/dma/imx-sdma.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index e029a2443cfc..a32b5962630e 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
 static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 {
 	struct sdma_buffer_descriptor *bd;
-	int error = 0;
-	enum dma_status	old_status = sdmac->status;
+	struct sdma_desc *desc = sdmac->desc;
+	int error = 0, cnt = 0;
+	enum dma_status old_status = sdmac->status;
 
 	/*
 	 * loop mode. Iterate over descriptors, re-setup them and
 	 * call callback function.
 	 */
-	while (sdmac->desc) {
-		struct sdma_desc *desc = sdmac->desc;
+	while (desc) {
 
 		bd = &desc->bd[desc->buf_tail];
 
 		if (bd->mode.status & BD_DONE)
 			break;
 
+		cnt++;
+
 		if (bd->mode.status & BD_RROR) {
 			bd->mode.status &= ~BD_RROR;
 			sdmac->status = DMA_ERROR;
@@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 		if (error)
 			sdmac->status = old_status;
 	}
+
+	/* In some situations it may happen that the sdma does not found any
+	 * usable descriptor in the ring to put data into. The channel is
+	 * stopped then. While there is no specific error condition we can
+	 * check for, a necessary condition is that all available buffers for
+	 * the current channel have been written to by the sdma script. In
+	 * this case and after we have made the buffers available again,
+	 * we restart the channel.
+	 */
+	if (cnt >= desc->num_bd)
+		sdma_enable_channel(sdmac->sdma, sdmac->channel);
 }
 
 static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)
-- 
2.23.0



* [PATCH v4 3/3] dmaengine: imx-sdma: drop redundant variable
  2019-09-19 14:29 [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Philipp Puschmann
  2019-09-19 14:29 ` [PATCH v4 1/3] dmaengine: imx-sdma: fix buffer ownership Philipp Puschmann
  2019-09-19 14:29 ` [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes Philipp Puschmann
@ 2019-09-19 14:29 ` Philipp Puschmann
  2019-09-20  2:44 ` [EXT] [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Andy Duan
  3 siblings, 0 replies; 9+ messages in thread
From: Philipp Puschmann @ 2019-09-19 14:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: yibin.gong, fugang.duan, l.stach, dan.j.williams, vkoul,
	shawnguo, s.hauer, kernel, festevam, linux-imx, dmaengine,
	linux-arm-kernel, Philipp Puschmann

In sdma_prep_dma_cyclic buf is redundant. Drop it.
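
The two loop forms can be compared with the sketch below, which only
counts iterations; old_loop_iterations() and new_loop_iterations() are
hypothetical names and do no descriptor setup. The counts are identical
when buf_len is a whole multiple of period_len. If it is not, the old
form runs one extra iteration, one more than num_periods.

```c
#include <assert.h>
#include <stddef.h>

/* Iteration count of: while (buf < buf_len) { ...; buf += period_len; i++; } */
static size_t old_loop_iterations(size_t buf_len, size_t period_len)
{
	size_t buf = 0, i = 0;

	assert(period_len > 0);

	while (buf < buf_len) {
		buf += period_len;
		i++;
	}

	return i;
}

/* Iteration count of: for (i = 0; i < num_periods; i++) { ... } */
static size_t new_loop_iterations(size_t buf_len, size_t period_len)
{
	size_t num_periods, i;

	assert(period_len > 0);

	num_periods = buf_len / period_len;
	for (i = 0; i < num_periods; i++) {
		/* one buffer descriptor would be set up here */
	}

	return i;
}
```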

Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
---

Changelog v3,v4:
 - no changes

Changelog v2:
 - add Reviewed-by tag

 drivers/dma/imx-sdma.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index a32b5962630e..17961451941a 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -1544,7 +1544,7 @@ static struct dma_async_tx_descriptor *sdma_prep_dma_cyclic(
 	struct sdma_engine *sdma = sdmac->sdma;
 	int num_periods = buf_len / period_len;
 	int channel = sdmac->channel;
-	int i = 0, buf = 0;
+	int i;
 	struct sdma_desc *desc;
 
 	dev_dbg(sdma->dev, "%s channel: %d\n", __func__, channel);
@@ -1565,7 +1565,7 @@ static struct dma_async_tx_descriptor *sdma_prep_dma_cyclic(
 		goto err_bd_out;
 	}
 
-	while (buf < buf_len) {
+	for (i = 0; i < num_periods; i++) {
 		struct sdma_buffer_descriptor *bd = &desc->bd[i];
 		int param;
 
@@ -1592,9 +1592,6 @@ static struct dma_async_tx_descriptor *sdma_prep_dma_cyclic(
 		bd->mode.status = param;
 
 		dma_addr += period_len;
-		buf += period_len;
-
-		i++;
 	}
 
 	return vchan_tx_prep(&sdmac->vc, &desc->vd, flags);
-- 
2.23.0



* Re: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
  2019-09-19 14:29 ` [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes Philipp Puschmann
@ 2019-09-19 15:19   ` Jan Lübbe
  2019-09-20  8:53     ` Philipp Puschmann
  2019-09-24  1:38   ` Robin Gong
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Lübbe @ 2019-09-19 15:19 UTC (permalink / raw)
  To: Philipp Puschmann, linux-kernel
  Cc: fugang.duan, festevam, s.hauer, vkoul, linux-imx, kernel,
	dan.j.williams, yibin.gong, shawnguo, dmaengine,
	linux-arm-kernel, l.stach

[-- Attachment #1: Type: text/plain, Size: 3663 bytes --]

Hi Philipp,

see below...

On Thu, 2019-09-19 at 16:29 +0200, Philipp Puschmann wrote:
> For some years, and across many kernel versions, there have been reports
> that the RX UART SDMA channel stops working at some point. The
> workaround was to disable DMA for RX. This commit tries to fix the
> problem itself.
> 
> Due to its license I wasn't able to debug the sdma script itself, but it
> somehow blocks the scheduling of the channel script when a running sdma
> script does not find any free descriptor in the ring to put its data
> into.
> 
> If we detect such a potential case we manually restart the channel.
> 
> As sdmac->desc is constant we can move desc out of the loop.
> 
> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
> ---
> 
> Changelog v4:
>  - fixed the fixes tag
>  
> Changelog v3:
>  - use correct dma_wmb() instead of dma_wb()
>  - add fixes tag
>  
> Changelog v2:
>  - clarify comment and commit description
> 
>  drivers/dma/imx-sdma.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index e029a2443cfc..a32b5962630e 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
>  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  {
>  	struct sdma_buffer_descriptor *bd;
> -	int error = 0;
> -	enum dma_status	old_status = sdmac->status;
> +	struct sdma_desc *desc = sdmac->desc;
> +	int error = 0, cnt = 0;
> +	enum dma_status old_status = sdmac->status;
>  
>  	/*
>  	 * loop mode. Iterate over descriptors, re-setup them and
>  	 * call callback function.
>  	 */
> -	while (sdmac->desc) {
> -		struct sdma_desc *desc = sdmac->desc;
> +	while (desc) {
>  
>  		bd = &desc->bd[desc->buf_tail];
>  
>  		if (bd->mode.status & BD_DONE)
>  			break;
>  
> +		cnt++;
> +
>  		if (bd->mode.status & BD_RROR) {
>  			bd->mode.status &= ~BD_RROR;
>  			sdmac->status = DMA_ERROR;
> @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  		if (error)
>  			sdmac->status = old_status;
>  	}
> +
> +	/* In some situations it may happen that the sdma does not found any
                                                          ^ hasn't
> +	 * usable descriptor in the ring to put data into. The channel is
> +	 * stopped then. While there is no specific error condition we can
> +	 * check for, a necessary condition is that all available buffers for
> +	 * the current channel have been written to by the sdma script. In
> +	 * this case and after we have made the buffers available again,
> +	 * we restart the channel.
> +	 */

Are you sure we can't miss cases where we only had to make some buffers
available again, but the SDMA already ran out of buffers before?

A while ago, I was debugging a similar issue triggered by receiving
data with a wrong baud rate, which leads to all descriptors being
marked with the error flag very quickly (and the SDMA stalling).
I noticed that you can check if the channel is still running by
checking the SDMA_H_STATSTOP register & BIT(sdmac->channel).

I also added a flag for the sdmac->flags field to allow stopping the
channel from the callback (otherwise it would enable the channel
again).

Attached is my current version of that patch for reference.

> +	if (cnt >= desc->num_bd)
> +		sdma_enable_channel(sdmac->sdma, sdmac->channel);
>  }
>  
>  static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)

[-- Attachment #2: 0001-dmaengine-imx-sdma-restart-stopped-cyclic-transfers.patch --]
[-- Type: text/x-patch, Size: 2949 bytes --]

From 73d7dcf84dac5512c50448ff6adf084f1a9bd6f9 Mon Sep 17 00:00:00 2001
From: Jan Luebbe <jlu@pengutronix.de>
Date: Tue, 16 Apr 2019 18:35:04 +0200
Subject: [PATCH] dmaengine: imx-sdma: restart stopped cyclic transfers

For cyclic DMA transfers, we have at least two cases where we can run
out of descriptors available to the engine:
- Interrupts are disabled for too long and all buffers are filled with
  data.
- DMA errors (such as generated by baud rate mismatch with imx-uart) use
  up all descriptors before we can react.

In this case, SDMA stops the channel and no further transfers are done
until the respective channel is disabled and re-enabled.

The best we can do in this case is to check if the transfer should still
be enabled (it could have been disabled during
sdma_update_channel_loop), but the SDMA channel is stopped. In this
case, we re-start the channel.

To avoid racing with changes to the sdmac->status field (which is
written and restored in sdma_update_channel_loop), we add a new flag
(IMX_DMA_ACTIVE) to indicate that the channel is currently active.
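
The decision described above can be sketched as a pure function. This
is a user-space model only: struct model_channel and
model_needs_restart() are hypothetical names, and the statstop argument
stands for a value already read from SDMA_H_STATSTOP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IMX_DMA_ACTIVE (1u << 1)	/* software flag: transfer should be running */

/* Hypothetical, reduced channel state for illustration. */
struct model_channel {
	unsigned int flags;
	unsigned int channel;
};

/*
 * Restart only if software still wants the channel active but the
 * channel's bit in the STATSTOP value is clear (hardware has stopped it).
 */
static bool model_needs_restart(const struct model_channel *c,
				uint32_t statstop)
{
	assert(c->channel < 32);

	return (c->flags & IMX_DMA_ACTIVE) &&
	       !(statstop & (UINT32_C(1) << c->channel));
}
```

A channel that was stopped deliberately has the flag cleared, so it is
not started again behind the caller's back.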

Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
---
 drivers/dma/imx-sdma.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 58fa8520892b..8774259af24c 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -383,6 +383,7 @@ struct sdma_channel {
 };
 
 #define IMX_DMA_SG_LOOP		BIT(0)
+#define IMX_DMA_ACTIVE		BIT(1)
 
 #define MAX_DMA_CHANNELS 32
 #define MXC_SDMA_DEFAULT_PRIORITY 1
@@ -658,6 +659,9 @@ static int sdma_config_ownership(struct sdma_channel *sdmac,
 
 static void sdma_enable_channel(struct sdma_engine *sdma, int channel)
 {
+	struct sdma_channel *sdmac = &sdma->channel[channel];
+
+	sdmac->flags |= IMX_DMA_ACTIVE;
 	writel(BIT(channel), sdma->regs + SDMA_H_START);
 }
 
@@ -774,6 +778,7 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
 
 static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 {
+	struct sdma_engine *sdma = sdmac->sdma;
 	struct sdma_buffer_descriptor *bd;
 	int error = 0;
 	enum dma_status	old_status = sdmac->status;
@@ -820,6 +825,13 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
 		if (error)
 			sdmac->status = old_status;
 	}
+
+	if ((sdmac->flags & IMX_DMA_ACTIVE) &&
+	    !(readl_relaxed(sdma->regs + SDMA_H_STATSTOP) & BIT(sdmac->channel))) {
+		dev_err_ratelimited(sdma->dev, "SDMA channel %d: cyclic transfer disabled by HW, reenabling\n",
+				sdmac->channel);
+		writel(BIT(sdmac->channel), sdma->regs + SDMA_H_START);
+	}
 }
 
 static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)
@@ -1049,6 +1061,7 @@ static int sdma_disable_channel(struct dma_chan *chan)
 	struct sdma_engine *sdma = sdmac->sdma;
 	int channel = sdmac->channel;
 
+	sdmac->flags &= ~IMX_DMA_ACTIVE;
 	writel_relaxed(BIT(channel), sdma->regs + SDMA_H_STATSTOP);
 	sdmac->status = DMA_ERROR;
 
-- 
2.23.0



* RE: [EXT] [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs
  2019-09-19 14:29 [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Philipp Puschmann
                   ` (2 preceding siblings ...)
  2019-09-19 14:29 ` [PATCH v4 3/3] dmaengine: imx-sdma: drop redundant variable Philipp Puschmann
@ 2019-09-20  2:44 ` Andy Duan
  3 siblings, 0 replies; 9+ messages in thread
From: Andy Duan @ 2019-09-20  2:44 UTC (permalink / raw)
  To: Philipp Puschmann, linux-kernel
  Cc: Robin Gong, l.stach, dan.j.williams, vkoul, shawnguo, s.hauer,
	kernel, festevam, dl-linux-imx, dmaengine, linux-arm-kernel

From: Philipp Puschmann <philipp.puschmann@emlix.com> Sent: Thursday, September 19, 2019 10:30 PM
> For some years, and across many kernel versions, there have been reports
> that the RX UART DMA channel stops working at some point. So far the usual
> workaround has been to disable RX DMA. These patches fix the underlying
> problem.
> 
> When a running sdma script does not find any usable destination buffer to
> put its data into, the channel is no longer scheduled. As a solution we
> manually retrigger the sdma script for this channel and thereby resolve the
> freeze.
> 
> While this seems to work fine so far, buffer overruns may occur while the
> channel is stopped, even temporarily. Device drivers have to address this
> case by increasing the number of DMA periods.
> 
> This patch series was tested with the current kernel and backported to
> kernel 4.15 with a special use case using a WL1837MOD via UART and
> provoking a hang of UART RX DMA within seconds of starting a test
> application. The hang resulted in the well-known
>   "Bluetooth: hci0: command 0x0408 tx timeout"
> errors and a complete stop of UART data reception. Our Bluetooth traffic
> consists of many independent small packets, mostly only a few bytes, causing
> high usage of periods.
> 
> Changelog v4:
>  - fixed the fixes tags
> 
> Changelog v3:
>  - fixes typo in dma_wmb
>  - add fixes tags
> 
> Changelog v2:
>  - adapt title (these patches are not only for i.MX6)
>  - improve some comments and patch descriptions
>  - add a dma_wb() around BD_DONE flag
>  - add Reviewed-by tags
>  - split off  "serial: imx: adapt rx buffer and dma periods"
> 
> Philipp Puschmann (3):
>   dmaengine: imx-sdma: fix buffer ownership
>   dmaengine: imx-sdma: fix dma freezes
>   dmaengine: imx-sdma: drop redundant variable
> 
>  drivers/dma/imx-sdma.c | 32 ++++++++++++++++++++++----------
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> --
> 2.23.0

The patch set looks fine; from a logical point of view it really does fix a corner-case issue.

Reviewed-by: Fugang Duan <fugang.duan@nxp.com>


* Re: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
  2019-09-19 15:19   ` Jan Lübbe
@ 2019-09-20  8:53     ` Philipp Puschmann
  2019-09-20  9:26       ` Lucas Stach
  0 siblings, 1 reply; 9+ messages in thread
From: Philipp Puschmann @ 2019-09-20  8:53 UTC (permalink / raw)
  To: Jan Lübbe, linux-kernel
  Cc: fugang.duan, festevam, s.hauer, vkoul, linux-imx, kernel,
	dan.j.williams, yibin.gong, shawnguo, dmaengine,
	linux-arm-kernel, l.stach

Hi Jan,

On 19.09.19 at 17:19, Jan Lübbe wrote:
> Hi Philipp,
> 
> see below...
> 
> On Thu, 2019-09-19 at 16:29 +0200, Philipp Puschmann wrote:
>> For some years, and across many kernel versions, there have been reports
>> that the RX UART SDMA channel stops working at some point. The
>> workaround was to disable DMA for RX. This commit tries to fix the
>> problem itself.
>>
>> Due to its license I wasn't able to debug the sdma script itself, but it
>> somehow blocks the scheduling of the channel script when a running sdma
>> script does not find any free descriptor in the ring to put its data
>> into.
>>
>> If we detect such a potential case we manually restart the channel.
>>
>> As sdmac->desc is constant we can move desc out of the loop.
>>
>> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
>> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
>> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
>> ---
>>
>> Changelog v4:
>>  - fixed the fixes tag
>>  
>> Changelog v3:
>>  - use correct dma_wmb() instead of dma_wb()
>>  - add fixes tag
>>  
>> Changelog v2:
>>  - clarify comment and commit description
>>
>>  drivers/dma/imx-sdma.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
>> index e029a2443cfc..a32b5962630e 100644
>> --- a/drivers/dma/imx-sdma.c
>> +++ b/drivers/dma/imx-sdma.c
>> @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
>>  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>>  {
>>  	struct sdma_buffer_descriptor *bd;
>> -	int error = 0;
>> -	enum dma_status	old_status = sdmac->status;
>> +	struct sdma_desc *desc = sdmac->desc;
>> +	int error = 0, cnt = 0;
>> +	enum dma_status old_status = sdmac->status;
>>  
>>  	/*
>>  	 * loop mode. Iterate over descriptors, re-setup them and
>>  	 * call callback function.
>>  	 */
>> -	while (sdmac->desc) {
>> -		struct sdma_desc *desc = sdmac->desc;
>> +	while (desc) {
>>  
>>  		bd = &desc->bd[desc->buf_tail];
>>  
>>  		if (bd->mode.status & BD_DONE)
>>  			break;
>>  
>> +		cnt++;
>> +
>>  		if (bd->mode.status & BD_RROR) {
>>  			bd->mode.status &= ~BD_RROR;
>>  			sdmac->status = DMA_ERROR;
>> @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>>  		if (error)
>>  			sdmac->status = old_status;
>>  	}
>> +
>> +	/* In some situations it may happen that the sdma does not found any
>                                                           ^ hasn't
>> +	 * usable descriptor in the ring to put data into. The channel is
>> +	 * stopped then. While there is no specific error condition we can
>> +	 * check for, a necessary condition is that all available buffers for
>> +	 * the current channel have been written to by the sdma script. In
>> +	 * this case and after we have made the buffers available again,
>> +	 * we restart the channel.
>> +	 */
> 
> Are you sure we can't miss cases where we only had to make some buffers
> available again, but the SDMA already ran out of buffers before?
Think so, yes.
> 
> A while ago, I was debugging a similar issue triggered by receiving
> data with a wrong baud rate, which leads to all descriptors being
> marked with the error flag very quickly (and the SDMA stalling).
> I noticed that you can check if the channel is still running by
> checking the SDMA_H_STATSTOP register & BIT(sdmac->channel).

I think checking this register is the better approach. Then I could drop the
cnt variable. And with cnt dropped, I would propose moving the check and the
re-enabling to the end of the while loop, so that the channel is re-enabled
after the first buffer has been freed.

> 
> I also added a flag for the sdmac->flags field to allow stopping the
> channel from the callback (otherwise it would enable the channel
> again).

Could memory and compiler ordering be a problem here?
I'm not that familiar with this kind of problem, but is this
	sdmac->flags &= ~IMX_DMA_ACTIVE;
  	writel_relaxed(BIT(channel), sdma->regs + SDMA_H_STATSTOP);
guaranteed to be free of race conditions?

Regards,
Philipp

> 
> Attached is my current version of that patch for reference.
> 
>> +	if (cnt >= desc->num_bd)
>> +		sdma_enable_channel(sdmac->sdma, sdmac->channel);
>>  }
>>  
>>  static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)


* Re: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
  2019-09-20  8:53     ` Philipp Puschmann
@ 2019-09-20  9:26       ` Lucas Stach
  0 siblings, 0 replies; 9+ messages in thread
From: Lucas Stach @ 2019-09-20  9:26 UTC (permalink / raw)
  To: Philipp Puschmann, Jan Lübbe, linux-kernel
  Cc: fugang.duan, festevam, s.hauer, vkoul, linux-imx, kernel,
	dan.j.williams, yibin.gong, shawnguo, dmaengine,
	linux-arm-kernel

On Fri, 2019-09-20 at 10:53 +0200, Philipp Puschmann wrote:
> Hi Jan,
> 
> On 19.09.19 at 17:19, Jan Lübbe wrote:
> > Hi Philipp,
> > 
> > see below...
> > 
> > On Thu, 2019-09-19 at 16:29 +0200, Philipp Puschmann wrote:
> > > For some years, and across many kernel versions, there have been reports
> > > that the RX UART SDMA channel stops working at some point. The
> > > workaround was to disable DMA for RX. This commit tries to fix the
> > > problem itself.
> > > 
> > > Due to its license I wasn't able to debug the sdma script itself, but it
> > > somehow blocks the scheduling of the channel script when a running sdma
> > > script does not find any free descriptor in the ring to put its data
> > > into.
> > > 
> > > If we detect such a potential case we manually restart the channel.
> > > 
> > > As sdmac->desc is constant we can move desc out of the loop.
> > > 
> > > Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
> > > Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
> > > Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
> > > ---
> > > 
> > > Changelog v4:
> > >  - fixed the fixes tag
> > >  
> > > Changelog v3:
> > >  - use correct dma_wmb() instead of dma_wb()
> > >  - add fixes tag
> > >  
> > > Changelog v2:
> > >  - clarify comment and commit description
> > > 
> > >  drivers/dma/imx-sdma.c | 21 +++++++++++++++++----
> > >  1 file changed, 17 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> > > index e029a2443cfc..a32b5962630e 100644
> > > --- a/drivers/dma/imx-sdma.c
> > > +++ b/drivers/dma/imx-sdma.c
> > > @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
> > >  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
> > >  {
> > >  	struct sdma_buffer_descriptor *bd;
> > > -	int error = 0;
> > > -	enum dma_status	old_status = sdmac->status;
> > > +	struct sdma_desc *desc = sdmac->desc;
> > > +	int error = 0, cnt = 0;
> > > +	enum dma_status old_status = sdmac->status;
> > >  
> > >  	/*
> > >  	 * loop mode. Iterate over descriptors, re-setup them and
> > >  	 * call callback function.
> > >  	 */
> > > -	while (sdmac->desc) {
> > > -		struct sdma_desc *desc = sdmac->desc;
> > > +	while (desc) {
> > >  
> > >  		bd = &desc->bd[desc->buf_tail];
> > >  
> > >  		if (bd->mode.status & BD_DONE)
> > >  			break;
> > >  
> > > +		cnt++;
> > > +
> > >  		if (bd->mode.status & BD_RROR) {
> > >  			bd->mode.status &= ~BD_RROR;
> > >  			sdmac->status = DMA_ERROR;
> > > @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
> > >  		if (error)
> > >  			sdmac->status = old_status;
> > >  	}
> > > +
> > > +	/* In some situations it may happen that the sdma does not found any
> >                                                           ^ hasn't
> > > +	 * usable descriptor in the ring to put data into. The channel is
> > > +	 * stopped then. While there is no specific error condition we can
> > > +	 * check for, a necessary condition is that all available buffers for
> > > +	 * the current channel have been written to by the sdma script. In
> > > +	 * this case and after we have made the buffers available again,
> > > +	 * we restart the channel.
> > > +	 */
> > 
> > Are you sure we can't miss cases where we only had to make some buffers
> > available again, but the SDMA already ran out of buffers before?
> Think so, yes.
> > A while ago, I was debugging a similar issue triggered by receiving
> > data with a wrong baud rate, which leads to all descriptors being
> > marked with the error flag very quickly (and the SDMA stalling).
> > I noticed that you can check if the channel is still running by
> > checking the SDMA_H_STATSTOP register & BIT(sdmac->channel).
> 
> I think checking this register is the better approach. Then I could drop the
> cnt variable. And with cnt dropped, I would propose moving the check and the
> re-enabling to the end of the while loop, so that the channel is re-enabled
> after the first buffer has been freed.

You certainly don't want to have an MMIO read at each iteration of the
loop, as that would be quite a bit of overhead. I'm not sure it's worth
it to try to minimize the channel re-enable latency. You are only
getting into this situation because of bad system latencies before this
part of the code runs, so the little bit of latency added by cleaning
the descriptors before trying to re-enable the channel will probably
not add much further harm, and you don't risk running into the
out-of-descriptors error immediately again. Remember, in a preemptible
kernel the task cleaning the descriptors could be put to sleep
immediately after you cleaned a single descriptor and kicked the channel
back to life.
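
To make the overhead argument concrete, here is a small user-space
model that counts simulated register reads for both placements of the
check. Everything in it is hypothetical: struct model_regs and the
model_*() helpers are illustration only, and setting the bit in
statstop merely models the effect of writing SDMA_H_START. It does not
model the risk of stalling again right after an early restart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock register block; reads counts the simulated MMIO reads. */
struct model_regs {
	uint32_t statstop;
	size_t reads;
};

static uint32_t model_readl(struct model_regs *regs)
{
	regs->reads++;
	return regs->statstop;
}

/* Restart the channel if its STATSTOP bit is clear; returns 1 on restart. */
static size_t model_restart_if_stopped(struct model_regs *regs,
				       unsigned int ch)
{
	assert(ch < 32);

	if (model_readl(regs) & (UINT32_C(1) << ch))
		return 0;

	regs->statstop |= UINT32_C(1) << ch;	/* models the SDMA_H_START write */
	return 1;
}

/* Variant A: check after every cleaned descriptor. */
static size_t model_clean_check_each(struct model_regs *regs, size_t pending,
				     unsigned int ch)
{
	size_t i, restarts = 0;

	for (i = 0; i < pending; i++)
		restarts += model_restart_if_stopped(regs, ch);

	return restarts;
}

/* Variant B: clean all descriptors first, then check once. */
static size_t model_clean_check_once(struct model_regs *regs, size_t pending,
				     unsigned int ch)
{
	(void)pending;	/* cleaning itself needs no register access here */
	return model_restart_if_stopped(regs, ch);
}
```

With eight pending descriptors, variant A performs eight reads and
variant B one; both end with the channel running again.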

> > I also added a flag for the sdmac->flags field to allow stopping the
> > channel from the callback (otherwise it would enable the channel
> > again).
> 
> Could memory and compiler ordering be a problem here?
> I'm not that familiar with this kind of problem, but is this
> 	sdmac->flags &= ~IMX_DMA_ACTIVE;
>   	writel_relaxed(BIT(channel), sdma->regs + SDMA_H_STATSTOP);
> guaranteed to be free of race conditions?

In fact the writel_relaxed needs to be replaced by the non-relaxed
version to imply a proper memory barrier before the register write.

Regards,
Lucas



* RE: [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes
  2019-09-19 14:29 ` [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes Philipp Puschmann
  2019-09-19 15:19   ` Jan Lübbe
@ 2019-09-24  1:38   ` Robin Gong
  1 sibling, 0 replies; 9+ messages in thread
From: Robin Gong @ 2019-09-24  1:38 UTC (permalink / raw)
  To: Philipp Puschmann, linux-kernel
  Cc: Andy Duan, l.stach, dan.j.williams, vkoul, shawnguo, s.hauer,
	kernel, festevam, dl-linux-imx, dmaengine, linux-arm-kernel


On 2019-9-19 22:30 Philipp Puschmann <philipp.puschmann@emlix.com> wrote 
> For some years, and across many kernel versions, there have been reports
> that the RX UART SDMA channel stops working at some point. The workaround
> was to disable DMA for RX. This commit tries to fix the problem itself.
> 
> Due to its license I wasn't able to debug the sdma script itself, but it
> somehow blocks the scheduling of the channel script when a running sdma
> script does not find any free descriptor in the ring to put its data into.
> 
> If we detect such a potential case we manually restart the channel.
> 
> As sdmac->desc is constant we can move desc out of the loop.
> 
> Fixes: 1ec1e82f2510 ("dmaengine: Add Freescale i.MX SDMA support")
In fact, this is a refinement rather than a bug fix: it just restores the cyclic
transfer in a corner case. There are two causes for such a corner case:
1. An improper number of BDs or BD length for the cyclic transfer, so that BDs
are consumed very quickly. The worst case is the UART aging timer, where a single
byte may consume one BD. In that case, adding more BDs is the right way, as your
UART patch does.
2. High CPU load, so that the SDMA interrupt handler can't run in time to set the
BD_DONE flag again, and eventually all BDs are consumed. In that case, this patch
may hide other coding issues, such as long windows with interrupts disabled
(spin_lock_irq). So I think this patch is more of a refine/restore patch, and it
would be better to add a clear message telling the user that the channel is being
restored and that the CPU load is unexpectedly high...
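
The suggested hint could look roughly like the following userspace
sketch. report_restart() is a hypothetical helper that exists only for
this illustration; an in-kernel version would presumably use a
dev_warn()-style message instead of fprintf(), and the wording of the
message is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide whether the channel has to be restarted
 * and tell the user why. cnt is the number of descriptors cleaned in
 * this pass, num_bd the ring size.
 */
static bool report_restart(FILE *log, int channel, unsigned int cnt,
			   unsigned int num_bd)
{
	assert(log != NULL);

	/* Not the whole ring was consumed: the channel is still running. */
	if (num_bd == 0 || cnt < num_bd)
		return false;

	fprintf(log,
		"sdma channel %d: all %u BDs were consumed, restarting channel; "
		"CPU load too high or too few BDs?\n",
		channel, num_bd);
	return true;
}
```

A caller would restart the channel only when the helper returns true,
so the message appears exactly when the recovery path is taken.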

> Signed-off-by: Philipp Puschmann <philipp.puschmann@emlix.com>
> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
> ---
> 
> Changelog v4:
>  - fixed the fixes tag
> 
> Changelog v3:
>  - use correct dma_wmb() instead of dma_wb()
>  - add fixes tag
> 
> Changelog v2:
>  - clarify comment and commit description
> 
>  drivers/dma/imx-sdma.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index e029a2443cfc..a32b5962630e 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -775,21 +775,23 @@ static void sdma_start_desc(struct sdma_channel *sdmac)
>  static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  {
>  	struct sdma_buffer_descriptor *bd;
> -	int error = 0;
> -	enum dma_status	old_status = sdmac->status;
> +	struct sdma_desc *desc = sdmac->desc;
> +	int error = 0, cnt = 0;
> +	enum dma_status old_status = sdmac->status;
> 
>  	/*
>  	 * loop mode. Iterate over descriptors, re-setup them and
>  	 * call callback function.
>  	 */
> -	while (sdmac->desc) {
> -		struct sdma_desc *desc = sdmac->desc;
> +	while (desc) {
> 
>  		bd = &desc->bd[desc->buf_tail];
> 
>  		if (bd->mode.status & BD_DONE)
>  			break;
> 
> +		cnt++;
> +
>  		if (bd->mode.status & BD_RROR) {
>  			bd->mode.status &= ~BD_RROR;
>  			sdmac->status = DMA_ERROR;
> @@ -822,6 +824,17 @@ static void sdma_update_channel_loop(struct sdma_channel *sdmac)
>  		if (error)
>  			sdmac->status = old_status;
>  	}
> +
> +	/* In some situations it may happen that the sdma does not found any
> +	 * usable descriptor in the ring to put data into. The channel is
> +	 * stopped then. While there is no specific error condition we can
> +	 * check for, a necessary condition is that all available buffers for
> +	 * the current channel have been written to by the sdma script. In
> +	 * this case and after we have made the buffers available again,
> +	 * we restart the channel.
> +	 */
> +	if (cnt >= desc->num_bd)
> +		sdma_enable_channel(sdmac->sdma, sdmac->channel);
>  }
> 
>  static void mxc_sdma_handle_channel_normal(struct sdma_channel *data)
> --
> 2.23.0



end of thread, other threads:[~2019-09-24  1:38 UTC | newest]

Thread overview: 9+ messages
2019-09-19 14:29 [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Philipp Puschmann
2019-09-19 14:29 ` [PATCH v4 1/3] dmaengine: imx-sdma: fix buffer ownership Philipp Puschmann
2019-09-19 14:29 ` [PATCH v4 2/3] dmaengine: imx-sdma: fix dma freezes Philipp Puschmann
2019-09-19 15:19   ` Jan Lübbe
2019-09-20  8:53     ` Philipp Puschmann
2019-09-20  9:26       ` Lucas Stach
2019-09-24  1:38   ` Robin Gong
2019-09-19 14:29 ` [PATCH v4 3/3] dmaengine: imx-sdma: drop redundant variable Philipp Puschmann
2019-09-20  2:44 ` [EXT] [PATCH v4 0/3] Fix UART DMA freezes for i.MX SOCs Andy Duan
