dmaengine.vger.kernel.org archive mirror
* dma: tegra: add accurate reporting of dma state
@ 2019-04-24 16:23 Ben Dooks
  2019-04-24 16:23 ` [PATCH] " Ben Dooks
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ben Dooks @ 2019-04-24 16:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ben Dooks, Dmitry Osipenko, Laxman Dewangan, Jon Hunter,
	Vinod Koul, Dan Williams, Thierry Reding, dmaengine, linux-tegra,
	linux-kernel

The tx_status callback does not report the state of the transfer
beyond complete segments. This causes problems with users such as
ALSA when applications want to know accurately how much data has
been moved.

This patch adds a function tegra_dma_update_residual() to query
the hardware and modify the residual information accordingly. It
takes into account any hardware issues when trying to read the
state, such as delays between finishing a buffer and signalling
the interrupt.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
Cc: Dmitry Osipenko <digetx@gmail.com>
Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
Cc: linux-kernel@vger.kernel.org (open list)
---
 drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
 1 file changed, 86 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index cf462b1abc0b..544e7273e741 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
 	return 0;
 }
 
+static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
+					      struct tegra_dma_sg_req *sg_req,
+					      struct tegra_dma_desc *dma_desc,
+					      unsigned int residual)
+{
+	unsigned long status = 0x0;
+	unsigned long wcount;
+	unsigned long ahbptr;
+	unsigned long tmp = 0x0;
+	unsigned int result;
+	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
+	int done;
+
+	/* if we're not the current request, then don't alter the residual */
+	if (sg_req != list_first_entry(&tdc->pending_sg_req,
+				       struct tegra_dma_sg_req, node)) {
+		result = residual;
+		ahbptr = 0xffffffff;
+		goto done;
+	}
+
+	/* loop until we have a reliable result for residual */
+	do {
+		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
+		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
+		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
+
+		/* check status, if channel isn't busy then skip */
+		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
+			result = residual;
+			break;
+		}
+
+		/* if we've got an interrupt pending on the channel, don't
+		 * try and deal with the residue as the hardware has likely
+		 * moved on to the next buffer. return all data moved.
+		 */
+		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
+			result = residual - sg_req->req_len;
+			break;
+		}
+
+		if (tdc->tdma->chip_data->support_separate_wcount_reg)
+			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
+		else
+			wcount = status;
+
+		/* If the request is at the full point, then there is a
+		 * chance that we have read the status register in the
+		 * middle of the hardware reloading the next buffer.
+		 *
+		 * The sequence seems to be at the end of the buffer, to
+		 * load the new word count before raising the EOC flag (or
+		 * changing the ping-pong flag which could have also been
+		 * used to determine a new buffer). This  means there is a
+		 * small window where we cannot determine zero-done for the
+		 * current buffer, or moved to next buffer.
+		 *
+		 * If done shows 0, then retry the load, as it may hit the
+		 * above hardware race. We will either get a new value which
+		 * is from the first buffer, or we get an EOC (new buffer)
+		 * or both a new value and an EOC...
+		 */
+		done = get_current_xferred_count(tdc, sg_req, wcount);
+		if (done != 0) {
+			result = residual - done;
+			break;
+		}
+
+		ndelay(100);
+	} while (--retries > 0);
+
+	if (retries <= 0) {
+		dev_err(tdc2dev(tdc), "timeout waiting for dma load\n");
+		result = residual;
+	}
+
+done:	
+	dev_dbg(tdc2dev(tdc), "residual: req %08lx, ahb@%08lx, wcount %08lx, done %d\n",
+		 sg_req->ch_regs.ahb_ptr, ahbptr, wcount, done);
+
+	return result;
+}
+
 static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
 	dma_cookie_t cookie, struct dma_tx_state *txstate)
 {
@@ -849,6 +933,7 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
 		residual = dma_desc->bytes_requested -
 			   (dma_desc->bytes_transferred %
 			    dma_desc->bytes_requested);
+		residual = tegra_dma_update_residual(tdc, sg_req, dma_desc, residual);
 		dma_set_residue(txstate, residual);
 	}
 
@@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
 		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
 		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
 	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
-	/*
-	 * XXX The hardware appears to support
-	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
-	 * only used by this driver during tegra_dma_terminate_all()
-	 */
-	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
+	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
 	tdma->dma_dev.device_config = tegra_dma_slave_config;
 	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
 	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* dma: tegra: add accurate reporting of dma state
@ 2019-04-24 18:17 ` Dmitry Osipenko
  2019-04-24 18:17   ` [PATCH] " Dmitry Osipenko
  2019-05-01  8:58   ` Ben Dooks
  0 siblings, 2 replies; 13+ messages in thread
From: Dmitry Osipenko @ 2019-04-24 18:17 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

24.04.2019 19:23, Ben Dooks wrote:
> The tx_status callback does not report the state of the transfer
> beyond complete segments. This causes problems with users such as
> ALSA when applications want to know accurately how much data has
> been moved.
> 
> This patch adds a function tegra_dma_update_residual() to query
> the hardware and modify the residual information accordingly. It
> takes into account any hardware issues when trying to read the
> state, such as delays between finishing a buffer and signalling
> the interrupt.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>

Hello Ben,

Thank you very much for keeping it up. I have a couple of comments; please see them below.

> Cc: Dmitry Osipenko <digetx@gmail.com>
> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
> Cc: linux-kernel@vger.kernel.org (open list)
> ---
>  drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>  1 file changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> index cf462b1abc0b..544e7273e741 100644
> --- a/drivers/dma/tegra20-apb-dma.c
> +++ b/drivers/dma/tegra20-apb-dma.c
> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>  	return 0;
>  }
>  
> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> +					      struct tegra_dma_sg_req *sg_req,
> +					      struct tegra_dma_desc *dma_desc,
> +					      unsigned int residual)
> +{
> +	unsigned long status = 0x0;
> +	unsigned long wcount;
> +	unsigned long ahbptr;
> +	unsigned long tmp = 0x0;
> +	unsigned int result;

You could pre-assign ahbptr=0xffffffff and result=residual here and then remove all the duplicated assignments below.

> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
> +	int done;
> +
> +	/* if we're not the current request, then don't alter the residual */
> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
> +				       struct tegra_dma_sg_req, node)) {
> +		result = residual;
> +		ahbptr = 0xffffffff;
> +		goto done;
> +	}
> +
> +	/* loop until we have a reliable result for residual */
> +	do {
> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */

The "tmp" variable isn't used anywhere in the code, please remove it.

> +
> +		/* check status, if channel isn't busy then skip */
> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
> +			result = residual;
> +			break;
> +		}

This doesn't look correct because the TRM says "Busy bit gets set as soon as a channel is enabled and gets cleared after transfer completes". Hence a cleared BUSY bit means that all transfers are completed, and result=residual is incorrect here. Given that there is a check for the EOC bit being set below, this hunk should be removed.

> +
> +		/* if we've got an interrupt pending on the channel, don't
> +		 * try and deal with the residue as the hardware has likely
> +		 * moved on to the next buffer. return all data moved.
> +		 */
> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> +			result = residual - sg_req->req_len;
> +			break;
> +		}
> +
> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> +		else
> +			wcount = status;
> +
> +		/* If the request is at the full point, then there is a
> +		 * chance that we have read the status register in the
> +		 * middle of the hardware reloading the next buffer.
> +		 *
> +		 * The sequence seems to be at the end of the buffer, to
> +		 * load the new word count before raising the EOC flag (or
> +		 * changing the ping-pong flag which could have also been
> +		 * used to determine a new buffer). This  means there is a
> +		 * small window where we cannot determine zero-done for the
> +		 * current buffer, or moved to next buffer.
> +		 *
> +		 * If done shows 0, then retry the load, as it may hit the
> +		 * above hardware race. We will either get a new value which
> +		 * is from the first buffer, or we get an EOC (new buffer)
> +		 * or both a new value and an EOC...
> +		 */
> +		done = get_current_xferred_count(tdc, sg_req, wcount);
> +		if (done != 0) {
> +			result = residual - done;
> +			break;
> +		}
> +
> +		ndelay(100);

Please use udelay(1) because there is no ndelay on arm32 and ndelay(100) gets rounded up to 1 usec. AFAIK, arm64 doesn't have a reliable ndelay on Tegra either, because the timer rate changes with CPU frequency scaling.

Secondly, done=0 isn't an error case; technically this could happen when tegra_dma_update_residual() is invoked just after starting the transfer. Hence I think this do-while loop and the timeout check aren't needed at all, since done=0 is a perfectly valid case.


Altogether, it seems tegra_dma_update_residual() could be reduced to:

static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
					      struct tegra_dma_sg_req *sg_req,
					      struct tegra_dma_desc *dma_desc,
					      unsigned int residual)
{
	unsigned long status, wcount;

	/* only the request at the head of the queue is in flight */
	if (!list_is_first(&sg_req->node, &tdc->pending_sg_req))
		return residual;

	if (tdc->tdma->chip_data->support_separate_wcount_reg)
		wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);

	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);

	if (!tdc->tdma->chip_data->support_separate_wcount_reg)
		wcount = status;

	if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
		return residual - sg_req->req_len;

	return residual - get_current_xferred_count(tdc, sg_req, wcount);
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* dma: tegra: add accurate reporting of dma state
@ 2019-05-01  8:33 ` Jon Hunter
  2019-05-01  8:33   ` [PATCH] " Jon Hunter
  2019-05-01 13:13   ` Vinod Koul
  0 siblings, 2 replies; 13+ messages in thread
From: Jon Hunter @ 2019-05-01  8:33 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Dmitry Osipenko, Laxman Dewangan, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

On 24/04/2019 17:23, Ben Dooks wrote:
> The tx_status callback does not report the state of the transfer
> beyond complete segments. This causes problems with users such as
> ALSA when applications want to know accurately how much data has
> been moved.
> 
> This patch adds a function tegra_dma_update_residual() to query
> the hardware and modify the residual information accordingly. It
> takes into account any hardware issues when trying to read the
> state, such as delays between finishing a buffer and signalling
> the interrupt.
> 
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> ---
> Cc: Dmitry Osipenko <digetx@gmail.com>
> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
> Cc: linux-kernel@vger.kernel.org (open list)
> ---
>  drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>  1 file changed, 86 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
> index cf462b1abc0b..544e7273e741 100644
> --- a/drivers/dma/tegra20-apb-dma.c
> +++ b/drivers/dma/tegra20-apb-dma.c
> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>  	return 0;
>  }
>  
> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> +					      struct tegra_dma_sg_req *sg_req,
> +					      struct tegra_dma_desc *dma_desc,
> +					      unsigned int residual)
> +{
> +	unsigned long status = 0x0;
> +	unsigned long wcount;
> +	unsigned long ahbptr;
> +	unsigned long tmp = 0x0;
> +	unsigned int result;
> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
> +	int done;
> +
> +	/* if we're not the current request, then don't alter the residual */
> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
> +				       struct tegra_dma_sg_req, node)) {
> +		result = residual;
> +		ahbptr = 0xffffffff;
> +		goto done;
> +	}
> +
> +	/* loop until we have a reliable result for residual */
> +	do {
> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> +
> +		/* check status, if channel isn't busy then skip */
> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
> +			result = residual;
> +			break;
> +		}
> +
> +		/* if we've got an interrupt pending on the channel, don't
> +		 * try and deal with the residue as the hardware has likely
> +		 * moved on to the next buffer. return all data moved.
> +		 */
> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
> +			result = residual - sg_req->req_len;
> +			break;
> +		}
> +
> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> +		else
> +			wcount = status;
> +
> +		/* If the request is at the full point, then there is a
> +		 * chance that we have read the status register in the
> +		 * middle of the hardware reloading the next buffer.
> +		 *
> +		 * The sequence seems to be at the end of the buffer, to
> +		 * load the new word count before raising the EOC flag (or
> +		 * changing the ping-pong flag which could have also been
> +		 * used to determine a new buffer). This  means there is a
> +		 * small window where we cannot determine zero-done for the
> +		 * current buffer, or moved to next buffer.
> +		 *
> +		 * If done shows 0, then retry the load, as it may hit the
> +		 * above hardware race. We will either get a new value which
> +		 * is from the first buffer, or we get an EOC (new buffer)
> +		 * or both a new value and an EOC...
> +		 */
> +		done = get_current_xferred_count(tdc, sg_req, wcount);
> +		if (done != 0) {
> +			result = residual - done;
> +			break;
> +		}
> +
> +		ndelay(100);
> +	} while (--retries > 0);
> +
> +	if (retries <= 0) {
> +		dev_err(tdc2dev(tdc), "timeout waiting for dma load\n");
> +		result = residual;
> +	}
> +
> +done:	
> +	dev_dbg(tdc2dev(tdc), "residual: req %08lx, ahb@%08lx, wcount %08lx, done %d\n",
> +		 sg_req->ch_regs.ahb_ptr, ahbptr, wcount, done);
> +
> +	return result;
> +}
> +
>  static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  	dma_cookie_t cookie, struct dma_tx_state *txstate)
>  {
> @@ -849,6 +933,7 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
>  		residual = dma_desc->bytes_requested -
>  			   (dma_desc->bytes_transferred %
>  			    dma_desc->bytes_requested);
> +		residual = tegra_dma_update_residual(tdc, sg_req, dma_desc, residual);
>  		dma_set_residue(txstate, residual);
>  	}
>  
> @@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
>  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
>  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
>  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> -	/*
> -	 * XXX The hardware appears to support
> -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> -	 * only used by this driver during tegra_dma_terminate_all()
> -	 */
> -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>  	tdma->dma_dev.device_config = tegra_dma_slave_config;
>  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
>  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;

In addition to Dmitry's comments, can you please make sure you run this
through checkpatch.pl?

Thanks
Jon

^ permalink raw reply	[flat|nested] 13+ messages in thread

* dma: tegra: add accurate reporting of dma state
@ 2019-05-01  8:58   ` Ben Dooks
  2019-05-01  8:58     ` [PATCH] " Ben Dooks
  2019-05-04 16:06     ` Dmitry Osipenko
  0 siblings, 2 replies; 13+ messages in thread
From: Ben Dooks @ 2019-05-01  8:58 UTC (permalink / raw)
  To: Dmitry Osipenko, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

On 24/04/2019 19:17, Dmitry Osipenko wrote:
> 24.04.2019 19:23, Ben Dooks wrote:
>> The tx_status callback does not report the state of the transfer
>> beyond complete segments. This causes problems with users such as
>> ALSA when applications want to know accurately how much data has
>> been moved.
>>
>> This patch adds a function tegra_dma_update_residual() to query
>> the hardware and modify the residual information accordingly. It
>> takes into account any hardware issues when trying to read the
>> state, such as delays between finishing a buffer and signalling
>> the interrupt.
>>
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> 
> Hello Ben,
> 
> Thank you very much for keeping it up. I have a couple of comments, please see them below.
> 
>> Cc: Dmitry Osipenko <digetx@gmail.com>
>> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
>> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
>> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
>> Cc: linux-kernel@vger.kernel.org (open list)
>> ---
>>   drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>>   1 file changed, 86 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
>> index cf462b1abc0b..544e7273e741 100644
>> --- a/drivers/dma/tegra20-apb-dma.c
>> +++ b/drivers/dma/tegra20-apb-dma.c
>> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>>   	return 0;
>>   }
>>   
>> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
>> +					      struct tegra_dma_sg_req *sg_req,
>> +					      struct tegra_dma_desc *dma_desc,
>> +					      unsigned int residual)
>> +{
>> +	unsigned long status = 0x0;
>> +	unsigned long wcount;
>> +	unsigned long ahbptr;
>> +	unsigned long tmp = 0x0;
>> +	unsigned int result;
> 
> You could pre-assign ahbptr=0xffffffff and result=residual here, then you could remove all the duplicated assigns below.

ok, ta.

>> +	int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
>> +	int done;
>> +
>> +	/* if we're not the current request, then don't alter the residual */
>> +	if (sg_req != list_first_entry(&tdc->pending_sg_req,
>> +				       struct tegra_dma_sg_req, node)) {
>> +		result = residual;
>> +		ahbptr = 0xffffffff;
>> +		goto done;
>> +	}
>> +
>> +	/* loop until we have a reliable result for residual */
>> +	do {
>> +		ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
>> +		status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
>> +		tmp =  tdc_read(tdc, 0x08);	/* total count for debug */
> 
> The "tmp" variable isn't used anywhere in the code, please remove it.

must have been left over.

>> +
>> +		/* check status, if channel isn't busy then skip */
>> +		if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
>> +			result = residual;
>> +			break;
>> +		}
> 
> This doesn't look correct because TRM says "Busy bit gets set as soon as a channel is enabled and gets cleared after transfer completes", hence a cleared BUSY bit means that all transfers are completed and result=residual is incorrect here. Given that there is a check for EOC bit being set below, this hunk should be removed.

I'll check notes, but see below.

>> +
>> +		/* if we've got an interrupt pending on the channel, don't
>> +		 * try and deal with the residue as the hardware has likely
>> +		 * moved on to the next buffer. return all data moved.
>> +		 */
>> +		if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
>> +			result = residual - sg_req->req_len;
>> +			break;
>> +		}
>> +
>> +		if (tdc->tdma->chip_data->support_separate_wcount_reg)
>> +			wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
>> +		else
>> +			wcount = status;
>> +
>> +		/* If the request is at the full point, then there is a
>> +		 * chance that we have read the status register in the
>> +		 * middle of the hardware reloading the next buffer.
>> +		 *
>> +		 * The sequence seems to be at the end of the buffer, to
>> +		 * load the new word count before raising the EOC flag (or
>> +		 * changing the ping-pong flag which could have also been
>> +		 * used to determine a new buffer). This  means there is a
>> +		 * small window where we cannot determine zero-done for the
>> +		 * current buffer, or moved to next buffer.
>> +		 *
>> +		 * If done shows 0, then retry the load, as it may hit the
>> +		 * above hardware race. We will either get a new value which
>> +		 * is from the first buffer, or we get an EOC (new buffer)
>> +		 * or both a new value and an EOC...
>> +		 */
>> +		done = get_current_xferred_count(tdc, sg_req, wcount);
>> +		if (done != 0) {
>> +			result = residual - done;
>> +			break;
>> +		}
>> +
>> +		ndelay(100);
> 
> Please use udelay(1) because there is no ndelay on arm32 and ndelay(100) is getting rounded up to 1usec. AFAIK, arm64 doesn't have reliable ndelay on Tegra either because timer rate changes with the CPU frequency scaling.

I'll check, but last time it was implemented. This seems a backwards step.

> Secondly, done=0 isn't an error case; technically this could be the case when tegra_dma_update_residual() is invoked just after starting the transfer. Hence I think this do-while loop and timeout checking aren't needed at all, since done=0 is a perfectly valid case.

This is not checking for an error; it's checking for a possibly
inaccurate reading.

> 
> Altogether seems the tegra_dma_update_residual() could be reduced to:
> 
> static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
> 					      struct tegra_dma_sg_req *sg_req,
> 					      struct tegra_dma_desc *dma_desc,
> 					      unsigned int residual)
> {
> 	unsigned long status, wcount;
> 
> 	if (list_is_first(&sg_req->node, &tdc->pending_sg_req))
> 		return residual;
> 
> 	if (tdc->tdma->chip_data->support_separate_wcount_reg)
> 		wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
> 
> 	status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
> 
> 	if (!tdc->tdma->chip_data->support_separate_wcount_reg)
> 		wcount = status;
> 
> 	if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
> 		return residual - sg_req->req_len;
> 
> 	return residual - get_current_xferred_count(tdc, sg_req, wcount);
> }

I'm not sure if that will work all the time. It took days of testing to
get reliable error data for the cases we're looking for here.
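
The retry in the posted patch can be pictured with a small user-space
sketch. This is not driver code: struct scripted_chan, scripted_read()
and residual_with_retry() are hypothetical names, the hardware is
replaced by a scripted list of samples, and the delay between retries is
left out. It only illustrates the decision order used in the patch
(BUSY, then EOC, then a non-zero count), with a zero count treated as
ambiguous and re-read a bounded number of times.

```c
#include <assert.h>
#include <stdbool.h>

/* One simulated look at the channel (hypothetical, not a real register). */
struct scripted_sample {
	bool busy;		/* stand-in for the BUSY status bit */
	bool eoc;		/* stand-in for the EOC status bit */
	unsigned int done;	/* bytes moved in the current segment */
};

/* Hypothetical channel that replays a fixed script of samples. */
struct scripted_chan {
	const struct scripted_sample *samples;
	unsigned int nsamples;
	unsigned int pos;
};

/* Return the next scripted sample; the last one repeats forever. */
static struct scripted_sample scripted_read(struct scripted_chan *c)
{
	struct scripted_sample s = c->samples[c->pos];

	if (c->pos + 1 < c->nsamples)
		c->pos++;
	return s;
}

/*
 * Mirror the decision order of the posted patch. Assumes the residual
 * still includes the whole current segment (residual >= req_len).
 */
static unsigned int residual_with_retry(struct scripted_chan *c,
					unsigned int req_len,
					unsigned int residual,
					int retries)
{
	assert(residual >= req_len);

	while (retries-- > 0) {
		struct scripted_sample s = scripted_read(c);

		if (!s.busy)		/* as posted; disputed in review */
			return residual;
		if (s.eoc)		/* segment finished, IRQ pending */
			return residual - req_len;
		if (s.done != 0)	/* unambiguous mid-segment value */
			return residual - s.done;
		/* done == 0 is ambiguous: the driver would delay and re-read */
	}

	return residual;		/* gave up: leave the residual alone */
}
```

With a script of two zero readings followed by a reading of 16 bytes,
a 64-byte segment and a residual of 128, the function returns 112; if
the readings never leave zero it falls back to the unmodified residual
once the retries run out.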

^ permalink raw reply	[flat|nested] 13+ messages in thread

* dma: tegra: add accurate reporting of dma state
@ 2019-05-01 13:13   ` Vinod Koul
  2019-05-01 13:13     ` [PATCH] " Vinod Koul
  0 siblings, 1 reply; 13+ messages in thread
From: Vinod Koul @ 2019-05-01 13:13 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Ben Dooks, linux-kernel, Dmitry Osipenko, Laxman Dewangan,
	Dan Williams, Thierry Reding, dmaengine, linux-tegra,
	linux-kernel

On 01-05-19, 09:33, Jon Hunter wrote:

> > @@ -1444,12 +1529,7 @@ static int tegra_dma_probe(struct platform_device *pdev)
> >  		BIT(DMA_SLAVE_BUSWIDTH_4_BYTES) |
> >  		BIT(DMA_SLAVE_BUSWIDTH_8_BYTES);
> >  	tdma->dma_dev.directions = BIT(DMA_DEV_TO_MEM) | BIT(DMA_MEM_TO_DEV);
> > -	/*
> > -	 * XXX The hardware appears to support
> > -	 * DMA_RESIDUE_GRANULARITY_BURST-level reporting, but it's
> > -	 * only used by this driver during tegra_dma_terminate_all()
> > -	 */
> > -	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
> > +	tdma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
> >  	tdma->dma_dev.device_config = tegra_dma_slave_config;
> >  	tdma->dma_dev.device_terminate_all = tegra_dma_terminate_all;
> >  	tdma->dma_dev.device_tx_status = tegra_dma_tx_status;
> 
> In addition to Dmitry's comments, can you please make sure you run this
> through checkpatch.pl?

And use the correct subsystem name!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma: tegra: add accurate reporting of dma state
  2019-05-01  8:58   ` Ben Dooks
  2019-05-01  8:58     ` [PATCH] " Ben Dooks
@ 2019-05-04 16:06     ` Dmitry Osipenko
  2019-05-05 13:39       ` Dmitry Osipenko
  1 sibling, 1 reply; 13+ messages in thread
From: Dmitry Osipenko @ 2019-05-04 16:06 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

01.05.2019 11:58, Ben Dooks wrote:
> On 24/04/2019 19:17, Dmitry Osipenko wrote:
>> 24.04.2019 19:23, Ben Dooks wrote:
>>> The tx_status callback does not report the state of the transfer
>>> beyond complete segments. This causes problems with users such as
>>> ALSA when applications want to know accurately how much data has
>>> been moved.
>>>
>>> This patch adds a function tegra_dma_update_residual() to query
>>> the hardware and modify the residual information accordingly. It
>>> takes into account any hardware issues when trying to read the
>>> state, such as delays between finishing a buffer and signalling
>>> the interrupt.
>>>
>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>> Hello Ben,
>>
>> Thank you very much for keeping it up. I have a couple of comments, please
>> see them below.
>>
>>> Cc: Dmitry Osipenko <digetx@gmail.com>
>>> Cc: Laxman Dewangan <ldewangan@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>>> Cc: Jon Hunter <jonathanh@nvidia.com> (supporter:TEGRA DMA DRIVERS)
>>> Cc: Vinod Koul <vkoul@kernel.org> (maintainer:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>>> Cc: Dan Williams <dan.j.williams@intel.com> (reviewer:ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API)
>>> Cc: Thierry Reding <thierry.reding@gmail.com> (supporter:TEGRA ARCHITECTURE SUPPORT)
>>> Cc: dmaengine@vger.kernel.org (open list:DMA GENERIC OFFLOAD ENGINE SUBSYSTEM)
>>> Cc: linux-tegra@vger.kernel.org (open list:TEGRA ARCHITECTURE SUPPORT)
>>> Cc: linux-kernel@vger.kernel.org (open list)
>>> ---
>>>   drivers/dma/tegra20-apb-dma.c | 92 ++++++++++++++++++++++++++++++++---
>>>   1 file changed, 86 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
>>> index cf462b1abc0b..544e7273e741 100644
>>> --- a/drivers/dma/tegra20-apb-dma.c
>>> +++ b/drivers/dma/tegra20-apb-dma.c
>>> @@ -808,6 +808,90 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
>>>       return 0;
>>>   }
>>>   
>>> +static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
>>> +                          struct tegra_dma_sg_req *sg_req,
>>> +                          struct tegra_dma_desc *dma_desc,
>>> +                          unsigned int residual)
>>> +{
>>> +    unsigned long status = 0x0;
>>> +    unsigned long wcount;
>>> +    unsigned long ahbptr;
>>> +    unsigned long tmp = 0x0;
>>> +    unsigned int result;
>>
>> You could pre-assign ahbptr=0xffffffff and result=residual here, then
>> you could remove all the duplicated assigns below.
> 
> ok, ta.
> 
>>> +    int retries = TEGRA_APBDMA_BURST_COMPLETE_TIME * 10;
>>> +    int done;
>>> +
>>> +    /* if we're not the current request, then don't alter the residual */
>>> +    if (sg_req != list_first_entry(&tdc->pending_sg_req,
>>> +                       struct tegra_dma_sg_req, node)) {
>>> +        result = residual;
>>> +        ahbptr = 0xffffffff;
>>> +        goto done;
>>> +    }
>>> +
>>> +    /* loop until we have a reliable result for residual */
>>> +    do {
>>> +        ahbptr = tdc_read(tdc, TEGRA_APBDMA_CHAN_AHBPTR);
>>> +        status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
>>> +        tmp =  tdc_read(tdc, 0x08);    /* total count for debug */
>>
>> The "tmp" variable isn't used anywhere in the code, please remove it.
> 
> must have been left over.
> 
>>> +
>>> +        /* check status, if channel isn't busy then skip */
>>> +        if (!(status & TEGRA_APBDMA_STATUS_BUSY)) {
>>> +            result = residual;
>>> +            break;
>>> +        }
>>
>> This doesn't look correct because TRM says "Busy bit gets set as soon
>> as a channel is enabled and gets cleared after transfer completes",
>> hence a cleared BUSY bit means that all transfers are completed and
>> result=residual is incorrect here. Given that there is a check for EOC
>> bit being set below, this hunk should be removed.
> 
> I'll check notes, but see below.
> 
>>> +
>>> +        /* if we've got an interrupt pending on the channel, don't
>>> +         * try and deal with the residue as the hardware has likely
>>> +         * moved on to the next buffer. return all data moved.
>>> +         */
>>> +        if (status & TEGRA_APBDMA_STATUS_ISE_EOC) {
>>> +            result = residual - sg_req->req_len;
>>> +            break;
>>> +        }
>>> +
>>> +        if (tdc->tdma->chip_data->support_separate_wcount_reg)
>>> +            wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
>>> +        else
>>> +            wcount = status;
>>> +
>>> +        /* If the request is at the full point, then there is a
>>> +         * chance that we have read the status register in the
>>> +         * middle of the hardware reloading the next buffer.
>>> +         *
>>> +         * The sequence seems to be at the end of the buffer, to
>>> +         * load the new word count before raising the EOC flag (or
>>> +         * changing the ping-pong flag which could have also been
>>> +         * used to determine a new buffer). This  means there is a
>>> +         * small window where we cannot determine zero-done for the
>>> +         * current buffer, or moved to next buffer.
>>> +         *
>>> +         * If done shows 0, then retry the load, as it may hit the
>>> +         * above hardware race. We will either get a new value which
>>> +         * is from the first buffer, or we get an EOC (new buffer)
>>> +         * or both a new value and an EOC...
>>> +         */
>>> +        done = get_current_xferred_count(tdc, sg_req, wcount);
>>> +        if (done != 0) {
>>> +            result = residual - done;
>>> +            break;
>>> +        }
>>> +
>>> +        ndelay(100);
>>
>> Please use udelay(1) because there is no ndelay on arm32 and
>> ndelay(100) is getting rounded up to 1usec. AFAIK, arm64 doesn't have
>> reliable ndelay on Tegra either because timer rate changes with the
>> CPU frequency scaling.
> 
> I'll check, but last time it was implemented. This seems a backwards step.
> 
>> Secondly, done=0 isn't an error case; technically this could be the case
>> when tegra_dma_update_residual() is invoked just after starting the
>> transfer. Hence I think this do-while loop and timeout checking aren't
>> needed at all since done=0 is a perfectly valid case.
> 
> this is not checking for an error, it's checking for a possible
> inaccurate reading.

If you change the reading order of the status / word-count registers as
I suggested, then there won't be a case for the inaccuracy.

The EOC bit should be set atomically once the transfer is finished; you
can't get a wrapped-around word count without the EOC bit being set.

For a oneshot transfer that runs with the interrupt disabled, the word
counter will stop at 0 and the cleared BUSY bit will indicate that the
transfer is completed.
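
The ordering argument can be tried out in a small user-space model. It
is only a sketch under the assumption stated above, namely that the
hardware sets EOC at the same instant the word count wraps; struct
fake_chan and all fake_* helpers are hypothetical and do not exist in
the driver. Every simulated register read lets the transfer advance by
one step, so a wrap can land between two reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one channel working through one segment. */
struct fake_chan {
	unsigned int req_len;	/* segment length in bytes */
	unsigned int done;	/* bytes moved in the current segment */
	unsigned int step;	/* bytes moved between two register reads */
	bool eoc;		/* latched end-of-segment flag */
};

/* Advance the transfer; wrap and EOC happen together (the assumption). */
static void fake_tick(struct fake_chan *c)
{
	assert(c->step > 0 && c->step <= c->req_len);

	c->done += c->step;
	if (c->done >= c->req_len) {
		c->done -= c->req_len;
		c->eoc = true;
	}
}

/* Each simulated register read costs one tick of transfer progress. */
static unsigned int fake_read_done(struct fake_chan *c)
{
	fake_tick(c);
	return c->done;
}

static bool fake_read_eoc(struct fake_chan *c)
{
	fake_tick(c);
	return c->eoc;
}

/* Suggested order: sample the count first, then the status. */
static unsigned int xferred_count_first(struct fake_chan *c)
{
	unsigned int done = fake_read_done(c);
	bool eoc = fake_read_eoc(c);

	return eoc ? c->req_len : done;
}

/* Order in the posted patch: status first, then the count. */
static unsigned int xferred_status_first(struct fake_chan *c)
{
	bool eoc = fake_read_eoc(c);
	unsigned int done = fake_read_done(c);

	return eoc ? c->req_len : done;
}
```

Starting 8 bytes before the end of a 64-byte segment with a 4-byte step,
the count-first order sees EOC and reports the whole segment, while the
status-first order samples EOC too early, then reads the wrapped count
and reports 0. Whether real hardware honours the atomicity assumption is
exactly what is being debated in this thread. On chips without a
separate word-count register both values come from a single status read,
so the question of ordering does not arise there.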

>>
>> Altogether seems the tegra_dma_update_residual() could be reduced to:
>>
>> static unsigned int tegra_dma_update_residual(struct tegra_dma_channel *tdc,
>>                           struct tegra_dma_sg_req *sg_req,
>>                           struct tegra_dma_desc *dma_desc,
>>                           unsigned int residual)
>> {
>>     unsigned long status, wcount;
>>
>>     if (list_is_first(&sg_req->node, &tdc->pending_sg_req))
>>         return residual;
>>
>>     if (tdc->tdma->chip_data->support_separate_wcount_reg)
>>         wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
>>
>>     status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
>>
>>     if (!tdc->tdma->chip_data->support_separate_wcount_reg)
>>         wcount = status;
>>
>>     if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
>>         return residual - sg_req->req_len;
>>
>>     return residual - get_current_xferred_count(tdc, sg_req, wcount);
>> }
> 
> I'm not sure if that will work all the time. It took days of testing to
> get reliable error data for the cases we're looking for here.

Could you please tell me exactly what those cases are? I don't see when
the simplified variant could fail, but maybe I have already forgotten
some details about how APB DMA works.

I tested the variant I'm suggesting (with the typos fixed and a check
for the BUSY bit added) and it works absolutely fine: the audio
stuttering issue is fixed and everything else works too. Please consider
using it for the next version of the patch if there are no objections.
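
The tested variant itself is not included in this mail, so the following
is only a guess at its shape: the simplified function quoted above with
the list check inverted (the posted patch leaves the residual alone when
sg_req is *not* the current request) and a BUSY check added along the
lines described earlier. It is written as a pure function over
already-sampled values so that it compiles outside the kernel.
FAKE_STATUS_BUSY, FAKE_STATUS_EOC and sketch_update_residual() are
hypothetical names, and the bit positions are placeholders rather than
the real register layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder status bits; not taken from the hardware documentation. */
#define FAKE_STATUS_BUSY	(1u << 31)
#define FAKE_STATUS_EOC		(1u << 30)

/*
 * is_current: sg_req is the first entry of the pending list
 * status:     sampled status word (read after the word count)
 * done:       bytes already moved in the current segment
 * req_len:    length of the current segment
 * residual:   segment-granular residual computed by tx_status
 */
static unsigned int sketch_update_residual(bool is_current,
					   unsigned int status,
					   unsigned int done,
					   unsigned int req_len,
					   unsigned int residual)
{
	/* Only the request the hardware is working on can be refined. */
	if (!is_current)
		return residual;

	assert(done <= req_len && req_len <= residual);

	/*
	 * Assumption: a cleared BUSY bit or a pending EOC both mean the
	 * current segment has been fully transferred.
	 */
	if (!(status & FAKE_STATUS_BUSY) || (status & FAKE_STATUS_EOC))
		return residual - req_len;

	return residual - done;
}
```

For a busy channel that has moved 16 bytes of a 64-byte segment with a
residual of 128, this yields 112; with EOC pending or BUSY cleared it
yields 64, and for a request that is not at the head of the list it
returns 128 unchanged.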

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] dma: tegra: add accurate reporting of dma state
  2019-05-04 16:06     ` Dmitry Osipenko
@ 2019-05-05 13:39       ` Dmitry Osipenko
  2019-06-12 18:57         ` Dmitry Osipenko
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Osipenko @ 2019-05-05 13:39 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

04.05.2019 19:06, Dmitry Osipenko wrote:
> I tested the variant I'm suggesting (with the fixed typos and added
> check for the BUSY bit) and it works absolutely fine, audio stuttering
> issue is fixed, everything else works too. Please consider to use it for
> the next version of the patch if there are no objections.
> 

Actually, the BUSY bit check shouldn't be needed. I think it's a driver
bug that the EOC interrupt may not be enabled, and I will send a patch
to fix it.

* Re: [PATCH] dma: tegra: add accurate reporting of dma state
  2019-05-05 13:39       ` Dmitry Osipenko
@ 2019-06-12 18:57         ` Dmitry Osipenko
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Osipenko @ 2019-06-12 18:57 UTC (permalink / raw)
  To: Ben Dooks, linux-kernel
  Cc: Laxman Dewangan, Jon Hunter, Vinod Koul, Dan Williams,
	Thierry Reding, dmaengine, linux-tegra, linux-kernel

05.05.2019 16:39, Dmitry Osipenko wrote:
> 
> Actually the BUSY bit checking shouldn't be needed. I think it's a bug
> in the driver that it may not enable EOC interrupt and will send a patch
> to fix it.
> 

Hello Ben,

I'm going to post a reduced version of the patch, the one I was
suggesting here, since it fixes a longstanding problem that I'm
experiencing. Any other changes can be made on top of it later if
needed. Please let me know if you have any objections; I can wait a bit
longer if you're going to send an updated version of the patch that
addresses all of the comments anytime soon.

Thread overview: 13+ messages
2019-04-24 16:23 dma: tegra: add accurate reporting of dma state Ben Dooks
2019-04-24 16:23 ` [PATCH] " Ben Dooks
2019-04-24 18:17 ` Dmitry Osipenko
2019-04-24 18:17   ` [PATCH] " Dmitry Osipenko
2019-05-01  8:58   ` Ben Dooks
2019-05-01  8:58     ` [PATCH] " Ben Dooks
2019-05-04 16:06     ` Dmitry Osipenko
2019-05-05 13:39       ` Dmitry Osipenko
2019-06-12 18:57         ` Dmitry Osipenko
2019-05-01  8:33 ` Jon Hunter
2019-05-01  8:33   ` [PATCH] " Jon Hunter
2019-05-01 13:13   ` Vinod Koul
2019-05-01 13:13     ` [PATCH] " Vinod Koul