linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
@ 2019-03-27 12:21 Arnaud Pouliquen
  2019-04-23 15:18 ` Arnaud Pouliquen
  2019-04-26 12:17 ` Vinod Koul
  0 siblings, 2 replies; 10+ messages in thread
From: Arnaud Pouliquen @ 2019-03-27 12:21 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, arnaud.pouliquen, Pierre-Yves MORDRET, linux-stm32,
	linux-kernel, dmaengine

During residue calculation. the DMA can switch to the next sg. When
this race condition occurs, the residue returned value is not valid.
Indeed the position in the sg returned by the hardware is the position
of the next sg, not the current sg.
Solution is to check the sg after the calculation to verify it.
If a transition is detected we consider that the DMA has switched to
the beginning of next sg.

Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
---
 drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 57 insertions(+), 13 deletions(-)

diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
index 4903a40..30309d2 100644
--- a/drivers/dma/stm32-dma.c
+++ b/drivers/dma/stm32-dma.c
@@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
 	return ndtr << width;
 }
 
+static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
+{
+	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
+	struct stm32_dma_sg_req *sg_req;
+	u32 dma_scr, dma_smar, id;
+
+	id = chan->id;
+	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
+
+	if (!(dma_scr & STM32_DMA_SCR_DBM))
+		return true;
+
+	sg_req = &chan->desc->sg_req[chan->next_sg];
+
+	if (dma_scr & STM32_DMA_SCR_CT) {
+		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
+		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
+	}
+
+	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
+
+	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
+}
+
 static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
 				     struct stm32_dma_desc *desc,
 				     u32 next_sg)
 {
 	u32 modulo, burst_size;
-	u32 residue = 0;
+	u32 residue;
+	u32 n_sg = next_sg;
+	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
 	int i;
 
+	residue = stm32_dma_get_remaining_bytes(chan);
+
 	/*
-	 * In cyclic mode, for the last period, residue = remaining bytes from
-	 * NDTR
+	 * Calculate the residue means compute the descriptors
+	 * information:
+	 * - the sg currently transferred
+	 * - the remaining position in this sg (NDTR).
+	 *
+	 * The issue is that a race condition can occur if DMA is
+	 * running. DMA can have started to transfer the next sg before
+	 * the position in sg is read. In this case the remaing position
+	 * can correspond to the new sg position.
+	 * The strategy implemented in the stm32 driver is to check the
+	 * sg transition. If detected we can not trust the SxNDTR register
+	 * value, this register can not be up to date during the transition.
+	 * In this case we can assume that the dma is at the beginning of next
+	 * sg so we calculate the residue in consequence.
 	 */
-	if (chan->desc->cyclic && next_sg == 0) {
-		residue = stm32_dma_get_remaining_bytes(chan);
-		goto end;
+
+	if (!stm32_dma_is_current_sg(chan)) {
+		n_sg++;
+		if (n_sg == chan->desc->num_sgs)
+			n_sg = 0;
+		residue = sg_req->len;
 	}
 
 	/*
-	 * For all other periods in cyclic mode, and in sg mode,
-	 * residue = remaining bytes from NDTR + remaining periods/sg to be
-	 * transferred
+	 * In cyclic mode, for the last period, residue = remaining bytes
+	 * from NDTR,
+	 * else for all other periods in cyclic mode, and in sg mode,
+	 * residue = remaining bytes from NDTR + remaining
+	 * periods/sg to be transferred
 	 */
-	for (i = next_sg; i < desc->num_sgs; i++)
-		residue += desc->sg_req[i].len;
-	residue += stm32_dma_get_remaining_bytes(chan);
+	if (!chan->desc->cyclic || n_sg != 0)
+		for (i = n_sg; i < desc->num_sgs; i++)
+			residue += desc->sg_req[i].len;
 
-end:
 	if (!chan->mem_burst)
 		return residue;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-03-27 12:21 [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma Arnaud Pouliquen
@ 2019-04-23 15:18 ` Arnaud Pouliquen
  2019-04-24  9:26   ` Pierre Yves MORDRET
  2019-04-26 12:17 ` Vinod Koul
  1 sibling, 1 reply; 10+ messages in thread
From: Arnaud Pouliquen @ 2019-04-23 15:18 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine

Hello Vinod,

Just a gentle reminder, if you could take a moment to review this patch.
FYI, the patch has already been internally reviewed by Pierre Yves
mordret...
His sign-off is in the commit message.

Thanks,

Arnaud


On 3/27/19 1:21 PM, Arnaud Pouliquen wrote:
> During residue calculation. the DMA can switch to the next sg. When
> this race condition occurs, the residue returned value is not valid.
> Indeed the position in the sg returned by the hardware is the position
> of the next sg, not the current sg.
> Solution is to check the sg after the calculation to verify it.
> If a transition is detected we consider that the DMA has switched to
> the beginning of next sg.
> 
> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>  drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 57 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
> index 4903a40..30309d2 100644
> --- a/drivers/dma/stm32-dma.c
> +++ b/drivers/dma/stm32-dma.c
> @@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>  	return ndtr << width;
>  }
>  
> +static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
> +{
> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
> +	struct stm32_dma_sg_req *sg_req;
> +	u32 dma_scr, dma_smar, id;
> +
> +	id = chan->id;
> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
> +
> +	if (!(dma_scr & STM32_DMA_SCR_DBM))
> +		return true;
> +
> +	sg_req = &chan->desc->sg_req[chan->next_sg];
> +
> +	if (dma_scr & STM32_DMA_SCR_CT) {
> +		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
> +		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
> +	}
> +
> +	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
> +
> +	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
> +}
> +
>  static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
>  				     struct stm32_dma_desc *desc,
>  				     u32 next_sg)
>  {
>  	u32 modulo, burst_size;
> -	u32 residue = 0;
> +	u32 residue;
> +	u32 n_sg = next_sg;
> +	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
>  	int i;
>  
> +	residue = stm32_dma_get_remaining_bytes(chan);
> +
>  	/*
> -	 * In cyclic mode, for the last period, residue = remaining bytes from
> -	 * NDTR
> +	 * Calculate the residue means compute the descriptors
> +	 * information:
> +	 * - the sg currently transferred
> +	 * - the remaining position in this sg (NDTR).
> +	 *
> +	 * The issue is that a race condition can occur if DMA is
> +	 * running. DMA can have started to transfer the next sg before
> +	 * the position in sg is read. In this case the remaing position
> +	 * can correspond to the new sg position.
> +	 * The strategy implemented in the stm32 driver is to check the
> +	 * sg transition. If detected we can not trust the SxNDTR register
> +	 * value, this register can not be up to date during the transition.
> +	 * In this case we can assume that the dma is at the beginning of next
> +	 * sg so we calculate the residue in consequence.
>  	 */
> -	if (chan->desc->cyclic && next_sg == 0) {
> -		residue = stm32_dma_get_remaining_bytes(chan);
> -		goto end;
> +
> +	if (!stm32_dma_is_current_sg(chan)) {
> +		n_sg++;
> +		if (n_sg == chan->desc->num_sgs)
> +			n_sg = 0;
> +		residue = sg_req->len;
>  	}
>  
>  	/*
> -	 * For all other periods in cyclic mode, and in sg mode,
> -	 * residue = remaining bytes from NDTR + remaining periods/sg to be
> -	 * transferred
> +	 * In cyclic mode, for the last period, residue = remaining bytes
> +	 * from NDTR,
> +	 * else for all other periods in cyclic mode, and in sg mode,
> +	 * residue = remaining bytes from NDTR + remaining
> +	 * periods/sg to be transferred
>  	 */
> -	for (i = next_sg; i < desc->num_sgs; i++)
> -		residue += desc->sg_req[i].len;
> -	residue += stm32_dma_get_remaining_bytes(chan);
> +	if (!chan->desc->cyclic || n_sg != 0)
> +		for (i = n_sg; i < desc->num_sgs; i++)
> +			residue += desc->sg_req[i].len;
>  
> -end:
>  	if (!chan->mem_burst)
>  		return residue;
>  
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-23 15:18 ` Arnaud Pouliquen
@ 2019-04-24  9:26   ` Pierre Yves MORDRET
  0 siblings, 0 replies; 10+ messages in thread
From: Pierre Yves MORDRET @ 2019-04-24  9:26 UTC (permalink / raw)
  To: Arnaud Pouliquen, Vinod Koul
  Cc: Dan Williams, linux-stm32, linux-kernel, dmaengine

Hello

Reviewed-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>

Thanks

On 4/23/19 5:18 PM, Arnaud Pouliquen wrote:
> Hello Vinod,
> 
> Just a gentle reminder, if you could take a moment to review this patch.
> FYI, the patch has already been internally reviewed by Pierre Yves
> mordret...
> His sign-off is in the commit message.
> 
> Thanks,
> 
> Arnaud
> 
> 
> On 3/27/19 1:21 PM, Arnaud Pouliquen wrote:
>> During residue calculation. the DMA can switch to the next sg. When
>> this race condition occurs, the residue returned value is not valid.
>> Indeed the position in the sg returned by the hardware is the position
>> of the next sg, not the current sg.
>> Solution is to check the sg after the calculation to verify it.
>> If a transition is detected we consider that the DMA has switched to
>> the beginning of next sg.
>>
>> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>  drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 57 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
>> index 4903a40..30309d2 100644
>> --- a/drivers/dma/stm32-dma.c
>> +++ b/drivers/dma/stm32-dma.c
>> @@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>>  	return ndtr << width;
>>  }
>>  
>> +static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
>> +{
>> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
>> +	struct stm32_dma_sg_req *sg_req;
>> +	u32 dma_scr, dma_smar, id;
>> +
>> +	id = chan->id;
>> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
>> +
>> +	if (!(dma_scr & STM32_DMA_SCR_DBM))
>> +		return true;
>> +
>> +	sg_req = &chan->desc->sg_req[chan->next_sg];
>> +
>> +	if (dma_scr & STM32_DMA_SCR_CT) {
>> +		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
>> +		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
>> +	}
>> +
>> +	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
>> +
>> +	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
>> +}
>> +
>>  static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
>>  				     struct stm32_dma_desc *desc,
>>  				     u32 next_sg)
>>  {
>>  	u32 modulo, burst_size;
>> -	u32 residue = 0;
>> +	u32 residue;
>> +	u32 n_sg = next_sg;
>> +	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
>>  	int i;
>>  
>> +	residue = stm32_dma_get_remaining_bytes(chan);
>> +
>>  	/*
>> -	 * In cyclic mode, for the last period, residue = remaining bytes from
>> -	 * NDTR
>> +	 * Calculate the residue means compute the descriptors
>> +	 * information:
>> +	 * - the sg currently transferred
>> +	 * - the remaining position in this sg (NDTR).
>> +	 *
>> +	 * The issue is that a race condition can occur if DMA is
>> +	 * running. DMA can have started to transfer the next sg before
>> +	 * the position in sg is read. In this case the remaing position
>> +	 * can correspond to the new sg position.
>> +	 * The strategy implemented in the stm32 driver is to check the
>> +	 * sg transition. If detected we can not trust the SxNDTR register
>> +	 * value, this register can not be up to date during the transition.
>> +	 * In this case we can assume that the dma is at the beginning of next
>> +	 * sg so we calculate the residue in consequence.
>>  	 */
>> -	if (chan->desc->cyclic && next_sg == 0) {
>> -		residue = stm32_dma_get_remaining_bytes(chan);
>> -		goto end;
>> +
>> +	if (!stm32_dma_is_current_sg(chan)) {
>> +		n_sg++;
>> +		if (n_sg == chan->desc->num_sgs)
>> +			n_sg = 0;
>> +		residue = sg_req->len;
>>  	}
>>  
>>  	/*
>> -	 * For all other periods in cyclic mode, and in sg mode,
>> -	 * residue = remaining bytes from NDTR + remaining periods/sg to be
>> -	 * transferred
>> +	 * In cyclic mode, for the last period, residue = remaining bytes
>> +	 * from NDTR,
>> +	 * else for all other periods in cyclic mode, and in sg mode,
>> +	 * residue = remaining bytes from NDTR + remaining
>> +	 * periods/sg to be transferred
>>  	 */
>> -	for (i = next_sg; i < desc->num_sgs; i++)
>> -		residue += desc->sg_req[i].len;
>> -	residue += stm32_dma_get_remaining_bytes(chan);
>> +	if (!chan->desc->cyclic || n_sg != 0)
>> +		for (i = n_sg; i < desc->num_sgs; i++)
>> +			residue += desc->sg_req[i].len;
>>  
>> -end:
>>  	if (!chan->mem_burst)
>>  		return residue;
>>  
>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-03-27 12:21 [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma Arnaud Pouliquen
  2019-04-23 15:18 ` Arnaud Pouliquen
@ 2019-04-26 12:17 ` Vinod Koul
  2019-04-26 13:41   ` Arnaud Pouliquen
  1 sibling, 1 reply; 10+ messages in thread
From: Vinod Koul @ 2019-04-26 12:17 UTC (permalink / raw)
  To: Arnaud Pouliquen
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine

Hi Arnaud,

Sorry for delay in review, the conference travel/vacation plan delayed
this.

On 27-03-19, 13:21, Arnaud Pouliquen wrote:
> During residue calculation. the DMA can switch to the next sg. When
> this race condition occurs, the residue returned value is not valid.
> Indeed the position in the sg returned by the hardware is the position
> of the next sg, not the current sg.
> Solution is to check the sg after the calculation to verify it.
> If a transition is detected we consider that the DMA has switched to
> the beginning of next sg.

Now, that sounds like duct tape. Why should we bother doing that.

Also looking back at the stm32_dma_desc_residue() and calls to it from
stm32_dma_tx_status() am not sure we are doing the right thing

why are we looking at next_sg here, can you explain me that please

> 
> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>  drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 57 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
> index 4903a40..30309d2 100644
> --- a/drivers/dma/stm32-dma.c
> +++ b/drivers/dma/stm32-dma.c
> @@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>  	return ndtr << width;
>  }
>  
> +static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
> +{
> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
> +	struct stm32_dma_sg_req *sg_req;
> +	u32 dma_scr, dma_smar, id;
> +
> +	id = chan->id;
> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
> +
> +	if (!(dma_scr & STM32_DMA_SCR_DBM))
> +		return true;
> +
> +	sg_req = &chan->desc->sg_req[chan->next_sg];
> +
> +	if (dma_scr & STM32_DMA_SCR_CT) {
> +		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
> +		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
> +	}
> +
> +	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
> +
> +	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
> +}
> +
>  static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
>  				     struct stm32_dma_desc *desc,
>  				     u32 next_sg)
>  {
>  	u32 modulo, burst_size;
> -	u32 residue = 0;
> +	u32 residue;
> +	u32 n_sg = next_sg;
> +	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
>  	int i;
>  
> +	residue = stm32_dma_get_remaining_bytes(chan);
> +
>  	/*
> -	 * In cyclic mode, for the last period, residue = remaining bytes from
> -	 * NDTR
> +	 * Calculate the residue means compute the descriptors
> +	 * information:
> +	 * - the sg currently transferred
> +	 * - the remaining position in this sg (NDTR).
> +	 *
> +	 * The issue is that a race condition can occur if DMA is
> +	 * running. DMA can have started to transfer the next sg before
> +	 * the position in sg is read. In this case the remaing position
> +	 * can correspond to the new sg position.
> +	 * The strategy implemented in the stm32 driver is to check the
> +	 * sg transition. If detected we can not trust the SxNDTR register
> +	 * value, this register can not be up to date during the transition.
> +	 * In this case we can assume that the dma is at the beginning of next
> +	 * sg so we calculate the residue in consequence.
>  	 */
> -	if (chan->desc->cyclic && next_sg == 0) {
> -		residue = stm32_dma_get_remaining_bytes(chan);
> -		goto end;
> +
> +	if (!stm32_dma_is_current_sg(chan)) {
> +		n_sg++;
> +		if (n_sg == chan->desc->num_sgs)
> +			n_sg = 0;
> +		residue = sg_req->len;
>  	}
>  
>  	/*
> -	 * For all other periods in cyclic mode, and in sg mode,
> -	 * residue = remaining bytes from NDTR + remaining periods/sg to be
> -	 * transferred
> +	 * In cyclic mode, for the last period, residue = remaining bytes
> +	 * from NDTR,
> +	 * else for all other periods in cyclic mode, and in sg mode,
> +	 * residue = remaining bytes from NDTR + remaining
> +	 * periods/sg to be transferred
>  	 */
> -	for (i = next_sg; i < desc->num_sgs; i++)
> -		residue += desc->sg_req[i].len;
> -	residue += stm32_dma_get_remaining_bytes(chan);
> +	if (!chan->desc->cyclic || n_sg != 0)
> +		for (i = n_sg; i < desc->num_sgs; i++)
> +			residue += desc->sg_req[i].len;
>  
> -end:
>  	if (!chan->mem_burst)
>  		return residue;
>  
> -- 
> 2.7.4

-- 
~Vinod

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-26 12:17 ` Vinod Koul
@ 2019-04-26 13:41   ` Arnaud Pouliquen
  2019-04-29  5:13     ` Vinod Koul
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaud Pouliquen @ 2019-04-26 13:41 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine

Hi Vinod

On 4/26/19 2:17 PM, Vinod Koul wrote:
> Hi Arnaud,
> 
> Sorry for delay in review, the conference travel/vacation plan delayed
> this.
no problem, just a rememder to be sure that you not missed it in the
patch stream.

> 
> On 27-03-19, 13:21, Arnaud Pouliquen wrote:
>> During residue calculation. the DMA can switch to the next sg. When
>> this race condition occurs, the residue returned value is not valid.
>> Indeed the position in the sg returned by the hardware is the position
>> of the next sg, not the current sg.
>> Solution is to check the sg after the calculation to verify it.
>> If a transition is detected we consider that the DMA has switched to
>> the beginning of next sg.
> 
> Now, that sounds like duct tape. Why should we bother doing that.
> 
> Also looking back at the stm32_dma_desc_residue() and calls to it from
> stm32_dma_tx_status() am not sure we are doing the right thing
Please, could you explain what you have in mind here?

> 
> why are we looking at next_sg here, can you explain me that please

This solution is similar to one implemented in the at_hdmac.c driver
(atc_get_bytes_left function).

Yes could be consider as a workaround for a hardware issue...

In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
mode is mainly use for audio transfer initiated by an ALSA driver.

From hardware point of view the DMA transfers first block based on sg1,
then it updates registers to prepare sg2 transfer, and then generates an
IRQ to inform that it issues the next transfer (sg2).

Then driver can update sg1 to prepare the third transfer...

In parallel the client driver can requests status to get the residue to
update internal pointer.
The issue is in the race condition between the call of the
device_tx_status ops and the update of the DMA register on sg switch.

During a short time the hardware updated the registers containing the
sg ID but not the transfer counter(SxNDTR). In this case there is a
mismatch between the Sg ID and the associated transfer counter.
So residue calculation is wrong.
Idea of this patch is to perform the calculation and then to crosscheck
that the hardware has not switched to the next sg during the
calculation. The way to crosscheck is to compare the the sg ID before
and after the calculation.

I tested the solution to force a new recalculation but no real solution
to trust the registers during this phase. In this case an approximation
is to consider that the DMA is transferring the first bytes of the next sg.
So we return the residue corresponding to the beginning of the next buffer.

Don't hesitate if it is still not clear

Thanks
Arnaud

> 
>>
>> Signed-off-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>  drivers/dma/stm32-dma.c | 70 ++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 57 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
>> index 4903a40..30309d2 100644
>> --- a/drivers/dma/stm32-dma.c
>> +++ b/drivers/dma/stm32-dma.c
>> @@ -1038,33 +1038,77 @@ static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>>  	return ndtr << width;
>>  }
>>  
>> +static bool stm32_dma_is_current_sg(struct stm32_dma_chan *chan)
>> +{
>> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
>> +	struct stm32_dma_sg_req *sg_req;
>> +	u32 dma_scr, dma_smar, id;
>> +
>> +	id = chan->id;
>> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(id));
>> +
>> +	if (!(dma_scr & STM32_DMA_SCR_DBM))
>> +		return true;
>> +
>> +	sg_req = &chan->desc->sg_req[chan->next_sg];
>> +
>> +	if (dma_scr & STM32_DMA_SCR_CT) {
>> +		dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM0AR(id));
>> +		return (dma_smar == sg_req->chan_reg.dma_sm0ar);
>> +	}
>> +
>> +	dma_smar = stm32_dma_read(dmadev, STM32_DMA_SM1AR(id));
>> +
>> +	return (dma_smar == sg_req->chan_reg.dma_sm1ar);
>> +}
>> +
>>  static size_t stm32_dma_desc_residue(struct stm32_dma_chan *chan,
>>  				     struct stm32_dma_desc *desc,
>>  				     u32 next_sg)
>>  {
>>  	u32 modulo, burst_size;
>> -	u32 residue = 0;
>> +	u32 residue;
>> +	u32 n_sg = next_sg;
>> +	struct stm32_dma_sg_req *sg_req = &chan->desc->sg_req[chan->next_sg];
>>  	int i;
>>  
>> +	residue = stm32_dma_get_remaining_bytes(chan);
>> +
>>  	/*
>> -	 * In cyclic mode, for the last period, residue = remaining bytes from
>> -	 * NDTR
>> +	 * Calculate the residue means compute the descriptors
>> +	 * information:
>> +	 * - the sg currently transferred
>> +	 * - the remaining position in this sg (NDTR).
>> +	 *
>> +	 * The issue is that a race condition can occur if DMA is
>> +	 * running. DMA can have started to transfer the next sg before
>> +	 * the position in sg is read. In this case the remaing position
>> +	 * can correspond to the new sg position.
>> +	 * The strategy implemented in the stm32 driver is to check the
>> +	 * sg transition. If detected we can not trust the SxNDTR register
>> +	 * value, this register can not be up to date during the transition.
>> +	 * In this case we can assume that the dma is at the beginning of next
>> +	 * sg so we calculate the residue in consequence.
>>  	 */
>> -	if (chan->desc->cyclic && next_sg == 0) {
>> -		residue = stm32_dma_get_remaining_bytes(chan);
>> -		goto end;
>> +
>> +	if (!stm32_dma_is_current_sg(chan)) {
>> +		n_sg++;
>> +		if (n_sg == chan->desc->num_sgs)
>> +			n_sg = 0;
>> +		residue = sg_req->len;
>>  	}
>>  
>>  	/*
>> -	 * For all other periods in cyclic mode, and in sg mode,
>> -	 * residue = remaining bytes from NDTR + remaining periods/sg to be
>> -	 * transferred
>> +	 * In cyclic mode, for the last period, residue = remaining bytes
>> +	 * from NDTR,
>> +	 * else for all other periods in cyclic mode, and in sg mode,
>> +	 * residue = remaining bytes from NDTR + remaining
>> +	 * periods/sg to be transferred
>>  	 */
>> -	for (i = next_sg; i < desc->num_sgs; i++)
>> -		residue += desc->sg_req[i].len;
>> -	residue += stm32_dma_get_remaining_bytes(chan);
>> +	if (!chan->desc->cyclic || n_sg != 0)
>> +		for (i = n_sg; i < desc->num_sgs; i++)
>> +			residue += desc->sg_req[i].len;
>>  
>> -end:
>>  	if (!chan->mem_burst)
>>  		return residue;
>>  
>> -- 
>> 2.7.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-26 13:41   ` Arnaud Pouliquen
@ 2019-04-29  5:13     ` Vinod Koul
  2019-04-29 14:52       ` Arnaud Pouliquen
  0 siblings, 1 reply; 10+ messages in thread
From: Vinod Koul @ 2019-04-29  5:13 UTC (permalink / raw)
  To: Arnaud Pouliquen
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine

On 26-04-19, 15:41, Arnaud Pouliquen wrote:
> >> During residue calculation. the DMA can switch to the next sg. When
> >> this race condition occurs, the residue returned value is not valid.
> >> Indeed the position in the sg returned by the hardware is the position
> >> of the next sg, not the current sg.
> >> Solution is to check the sg after the calculation to verify it.
> >> If a transition is detected we consider that the DMA has switched to
> >> the beginning of next sg.
> > 
> > Now, that sounds like duct tape. Why should we bother doing that.
> > 
> > Also looking back at the stm32_dma_desc_residue() and calls to it from
> > stm32_dma_tx_status() am not sure we are doing the right thing
> Please, could you explain what you have in mind here?

So when we call vchan_find_desc() that tells us if the descriptor is in
the issued queue or not..  Ideally it should not matter if we have one
or N descriptors issued to hardware.

So why should you bother checking for next_sg.

> > why are we looking at next_sg here, can you explain me that please
> 
> This solution is similar to one implemented in the at_hdmac.c driver
> (atc_get_bytes_left function).
> 
> Yes could be consider as a workaround for a hardware issue...
> 
> In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
> sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
> mode is mainly use for audio transfer initiated by an ALSA driver.
> 
> >From hardware point of view the DMA transfers first block based on sg1,
> then it updates registers to prepare sg2 transfer, and then generates an
> IRQ to inform that it issues the next transfer (sg2).
> 
> Then driver can update sg1 to prepare the third transfer...
> 
> In parallel the client driver can requests status to get the residue to
> update internal pointer.
> The issue is in the race condition between the call of the
> device_tx_status ops and the update of the DMA register on sg switch.

Sorry I do not agree! You are in stm32_dma_tx_status() hold the lock and
irqs are disabled, so even if sg2 was loaded, you will not get an
interrupt and wont know. By looking at sg1 register you will see that
sg1 is telling you that it has finished and residue can be zero. That is
fine and correct to report.

Most important thing here is that reside is for _requested_ descriptor
and not _current_ descriptor, so looking into sg2 doesnt not fit.

> During a short time the hardware updated the registers containing the
> sg ID but not the transfer counter(SxNDTR). In this case there is a
> mismatch between the Sg ID and the associated transfer counter.
> So residue calculation is wrong.
> Idea of this patch is to perform the calculation and then to crosscheck
> that the hardware has not switched to the next sg during the
> calculation. The way to crosscheck is to compare the the sg ID before
> and after the calculation.
> 
> I tested the solution to force a new recalculation but no real solution
> to trust the registers during this phase. In this case an approximation
> is to consider that the DMA is transferring the first bytes of the next sg.
> So we return the residue corresponding to the beginning of the next buffer.

And that is wrong!. The argument is 'cookie' and you return residue for
that cookie.

For example, if you have dma txn with cookie 1, 2, 3, 4 submitted, then currently HW
is processing cookie 2, then for tx_status on:
cookie 1: return DMA_COMPLETE, residue 0
cookie 2: return DMA_IN_PROGRESS, residue (read from HW)
cookie 3: return DMA_IN_PROGRESS, residue txn length
cookie 4: return DMA_IN_PROGRESS, residue txn length

Thanks
-- 
~Vinod

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-29  5:13     ` Vinod Koul
@ 2019-04-29 14:52       ` Arnaud Pouliquen
  2019-04-30  8:22         ` Vinod Koul
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaud Pouliquen @ 2019-04-29 14:52 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine



On 4/29/19 7:13 AM, Vinod Koul wrote:
> On 26-04-19, 15:41, Arnaud Pouliquen wrote:
>>>> During residue calculation. the DMA can switch to the next sg. When
>>>> this race condition occurs, the residue returned value is not valid.
>>>> Indeed the position in the sg returned by the hardware is the position
>>>> of the next sg, not the current sg.
>>>> Solution is to check the sg after the calculation to verify it.
>>>> If a transition is detected we consider that the DMA has switched to
>>>> the beginning of next sg.
>>>
>>> Now, that sounds like duct tape. Why should we bother doing that.
>>>
>>> Also looking back at the stm32_dma_desc_residue() and calls to it from
>>> stm32_dma_tx_status() am not sure we are doing the right thing
>> Please, could you explain what you have in mind here?
> 
> So when we call vchan_find_desc() that tells us if the descriptor is in
> the issued queue or not..  Ideally it should not matter if we have one
> or N descriptors issued to hardware.
> 
> So why should you bother checking for next_sg.
> 
>>> why are we looking at next_sg here, can you explain me that please
>>
>> This solution is similar to one implemented in the at_hdmac.c driver
>> (atc_get_bytes_left function).
>>
>> Yes could be consider as a workaround for a hardware issue...
>>
>> In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
>> sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
>> mode is mainly use for audio transfer initiated by an ALSA driver.
>>
>> >From hardware point of view the DMA transfers first block based on sg1,
>> then it updates registers to prepare sg2 transfer, and then generates an
>> IRQ to inform that it issues the next transfer (sg2).
>>
>> Then driver can update sg1 to prepare the third transfer...
>>
>> In parallel the client driver can requests status to get the residue to
>> update internal pointer.
>> The issue is in the race condition between the call of the
>> device_tx_status ops and the update of the DMA register on sg switch.
> 
> Sorry I do not agree! You are in stm32_dma_tx_status() hold the lock and
> IRQs are disabled, so even if sg2 was loaded, you will not get an
> interrupt and wont know. By looking at sg1 register you will see that
> sg1 is telling you that it has finished and residue can be zero. That is
> fine and correct to report.
> 
> Most important thing here is that reside is for _requested_ descriptor
> and not _current_ descriptor, so looking into sg2 doesnt not fit.
> 
>> During a short time the hardware updated the registers containing the
>> sg ID but not the transfer counter(SxNDTR). In this case there is a
>> mismatch between the Sg ID and the associated transfer counter.
>> So residue calculation is wrong.
>> Idea of this patch is to perform the calculation and then to crosscheck
>> that the hardware has not switched to the next sg during the
>> calculation. The way to crosscheck is to compare the the sg ID before
>> and after the calculation.
>>
>> I tested the solution to force a new recalculation but no real solution
>> to trust the registers during this phase. In this case an approximation
>> is to consider that the DMA is transferring the first bytes of the next sg.
>> So we return the residue corresponding to the beginning of the next buffer.
> 
> And that is wrong!. The argument is 'cookie' and you return residue for
> that cookie.
> 
> For example, if you have dma txn with cookie 1, 2, 3, 4 submitted, then currently HW
> is processing cookie 2, then for tx_status on:
> cookie 1: return DMA_COMPLETE, residue 0
> cookie 2: return DMA_IN_PROGRESS, residue (read from HW)
> cookie 3: return DMA_IN_PROGRESS, residue txn length
> cookie 4: return DMA_IN_PROGRESS, residue txn length
> 
> Thanks
> 
I think i miss something in my explanation, as from my humble POV (not
enough expert in DMA framework...) we only one cookie here as only one
cyclic transfer...

Regarding your answers it looks like my sg explanation are not clear and
introduce confusions... Sorry for this, i was used sg for internal STM32
DMA driver, not for the framework API itself.

Let try retry to re-explain you the stm32 DMA cyclic mode management.

STM32 STM32 hardware:
-------------------
(ref manual:
https://www.st.com/content/ccc/resource/technical/document/reference_manual/group0/51/ba/9e/5e/78/5b/4b/dd/DM00327659/files/DM00327659.pdf/jcr:content/translations/en.DM00327659.pdf)

The stm32 DMA supports cyclic mode using a hardware double
buffer mode.
In this double buffer, we can program up to 2 transfers. When one is
completed, the DMA automatically switch on the other. This could be see
as a hardware LLI with only 2 transfer descriptors.
A hardware bit CT (current target) is used to determine the
current transfer (CT = 0 or 1).
A hardware NDT (num of data to transfer) counter can be read to
determine DMA position in current transfer.
An IRQ is generated when this CT bit is updated to allows driver to
update the double buffer for the next transfer.

On client side (a.e audio):
-------------------------
The client requests a cyclic transfer by calling
stm32_dma_prep_dma_cyclic. For instance it can request the transfer of a
buffer divided in 10 periods. In this case only one cookie submitted
(right?).

At stm32dma driver level these 10 periods are registered in an internal
software table (desc->sg_req[]).As cyclic, the last sg_req point to the
first one.

So to be able to transfer the whole software table, we have to update
the STM32 DMA double buffer at each end of transfer period.
The filed chan->next_sg points to the next sg_req in the software table.
that should be write in the STM32 DMA double buffer.

Residue calculation:
-------------------
During a transfer we can get the position in a period thanks to the
NDT(num of data to transfer) bit-field.

So the calculation is :
1) Get the NDT field value
3) add the periods remaining in the desc->sg_req[] table.

In parallel the STM32 DMA hardware updates the transfer buffer in 3 steps:
1) update CT register field.
2) Update NDT register field.
3) generate the IRQ (As you mention the IRQ is not treated during the
device_tx_status as protected from interrupts).

We are facing issue when computing the residue during the update of the
CT and the NDT. The CT and NDT can as been updated ( both or only CT...)
without driver context update (IRQ disabled).
In this case we can point to the beginning of the current transfer(
completed) instead of the next_transfer. This generates a residue error
and for audio a time-stamp regression (so video freeze or audio plop).

So the patch proposed consists in:
1) getting the current NDT value
2) reading CT and check that the hardware does not point to the next_sg.
	if yes:
	- CT has been updated by hardware but IRQ still not treated.
	- By default we consider the current_sg as completed, so we
	  point to the beginning of the next_sg buffer.

Hope that will help to clarify.
Thanks
arnaud




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-29 14:52       ` Arnaud Pouliquen
@ 2019-04-30  8:22         ` Vinod Koul
  2019-04-30 14:58           ` Arnaud Pouliquen
  0 siblings, 1 reply; 10+ messages in thread
From: Vinod Koul @ 2019-04-30  8:22 UTC (permalink / raw)
  To: Arnaud Pouliquen
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine

On 29-04-19, 16:52, Arnaud Pouliquen wrote:
> 
> 
> On 4/29/19 7:13 AM, Vinod Koul wrote:
> > On 26-04-19, 15:41, Arnaud Pouliquen wrote:
> >>>> During residue calculation. the DMA can switch to the next sg. When
> >>>> this race condition occurs, the residue returned value is not valid.
> >>>> Indeed the position in the sg returned by the hardware is the position
> >>>> of the next sg, not the current sg.
> >>>> Solution is to check the sg after the calculation to verify it.
> >>>> If a transition is detected we consider that the DMA has switched to
> >>>> the beginning of next sg.
> >>>
> >>> Now, that sounds like duct tape. Why should we bother doing that.
> >>>
> >>> Also looking back at the stm32_dma_desc_residue() and calls to it from
> >>> stm32_dma_tx_status() am not sure we are doing the right thing
> >> Please, could you explain what you have in mind here?
> > 
> > So when we call vchan_find_desc() that tells us if the descriptor is in
> > the issued queue or not..  Ideally it should not matter if we have one
> > or N descriptors issued to hardware.
> > 
> > So why should you bother checking for next_sg.
> > 
> >>> why are we looking at next_sg here, can you explain me that please
> >>
> >> This solution is similar to one implemented in the at_hdmac.c driver
> >> (atc_get_bytes_left function).
> >>
> >> Yes could be consider as a workaround for a hardware issue...
> >>
> >> In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
> >> sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
> >> mode is mainly use for audio transfer initiated by an ALSA driver.
> >>
> >> >From hardware point of view the DMA transfers first block based on sg1,
> >> then it updates registers to prepare sg2 transfer, and then generates an
> >> IRQ to inform that it issues the next transfer (sg2).
> >>
> >> Then driver can update sg1 to prepare the third transfer...
> >>
> >> In parallel the client driver can requests status to get the residue to
> >> update internal pointer.
> >> The issue is in the race condition between the call of the
> >> device_tx_status ops and the update of the DMA register on sg switch.
> > 
> > Sorry I do not agree! You are in stm32_dma_tx_status() hold the lock and
> > IRQs are disabled, so even if sg2 was loaded, you will not get an
> > interrupt and wont know. By looking at sg1 register you will see that
> > sg1 is telling you that it has finished and residue can be zero. That is
> > fine and correct to report.
> > 
> > Most important thing here is that reside is for _requested_ descriptor
> > and not _current_ descriptor, so looking into sg2 doesnt not fit.
> > 
> >> During a short time the hardware updated the registers containing the
> >> sg ID but not the transfer counter(SxNDTR). In this case there is a
> >> mismatch between the Sg ID and the associated transfer counter.
> >> So residue calculation is wrong.
> >> Idea of this patch is to perform the calculation and then to crosscheck
> >> that the hardware has not switched to the next sg during the
> >> calculation. The way to crosscheck is to compare the the sg ID before
> >> and after the calculation.
> >>
> >> I tested the solution to force a new recalculation but no real solution
> >> to trust the registers during this phase. In this case an approximation
> >> is to consider that the DMA is transferring the first bytes of the next sg.
> >> So we return the residue corresponding to the beginning of the next buffer.
> > 
> > And that is wrong!. The argument is 'cookie' and you return residue for
> > that cookie.
> > 
> > For example, if you have dma txn with cookie 1, 2, 3, 4 submitted, then currently HW
> > is processing cookie 2, then for tx_status on:
> > cookie 1: return DMA_COMPLETE, residue 0
> > cookie 2: return DMA_IN_PROGRESS, residue (read from HW)
> > cookie 3: return DMA_IN_PROGRESS, residue txn length
> > cookie 4: return DMA_IN_PROGRESS, residue txn length
> > 
> > Thanks
> > 
> I think i miss something in my explanation, as from my humble POV (not
> enough expert in DMA framework...) we only one cookie here as only one
> cyclic transfer...

> Regarding your answers it looks like my sg explanation are not clear and
> introduce confusions... Sorry for this, i was used sg for internal STM32
> DMA driver, not for the framework API itself.
> 
> Let try retry to re-explain you the stm32 DMA cyclic mode management.
> 
> STM32 STM32 hardware:
> -------------------
> (ref manual:
> https://www.st.com/content/ccc/resource/technical/document/reference_manual/group0/51/ba/9e/5e/78/5b/4b/dd/DM00327659/files/DM00327659.pdf/jcr:content/translations/en.DM00327659.pdf)
> 
> The stm32 DMA supports cyclic mode using a hardware double
> buffer mode.
> In this double buffer, we can program up to 2 transfers. When one is
> completed, the DMA automatically switch on the other. This could be see
> as a hardware LLI with only 2 transfer descriptors.
> A hardware bit CT (current target) is used to determine the
> current transfer (CT = 0 or 1).
> A hardware NDT (num of data to transfer) counter can be read to
> determine DMA position in current transfer.
> An IRQ is generated when this CT bit is updated to allows driver to
> update the double buffer for the next transfer.
> 
> On client side (a.e audio):
> -------------------------
> The client requests a cyclic transfer by calling
> stm32_dma_prep_dma_cyclic. For instance it can request the transfer of a
> buffer divided in 10 periods. In this case only one cookie submitted
> (right?).
> 
> At stm32dma driver level these 10 periods are registered in an internal
> software table (desc->sg_req[]).As cyclic, the last sg_req point to the
> first one.
> 
> So to be able to transfer the whole software table, we have to update
> the STM32 DMA double buffer at each end of transfer period.
> The filed chan->next_sg points to the next sg_req in the software table.
> that should be write in the STM32 DMA double buffer.
> 
> Residue calculation:
> -------------------
> During a transfer we can get the position in a period thanks to the
> NDT(num of data to transfer) bit-field.
> 
> So the calculation is :
> 1) Get the NDT field value
> 3) add the periods remaining in the desc->sg_req[] table.
> 
> In parallel the STM32 DMA hardware updates the transfer buffer in 3 steps:
> 1) update CT register field.
> 2) Update NDT register field.
> 3) generate the IRQ (As you mention the IRQ is not treated during the
> device_tx_status as protected from interrupts).
> 
> We are facing issue when computing the residue during the update of the
> CT and the NDT. The CT and NDT can as been updated ( both or only CT...)
> without driver context update (IRQ disabled).
> In this case we can point to the beginning of the current transfer(
> completed) instead of the next_transfer. This generates a residue error
> and for audio a time-stamp regression (so video freeze or audio plop).
> 
> So the patch proposed consists in:
> 1) getting the current NDT value
> 2) reading CT and check that the hardware does not point to the next_sg.
> 	if yes:
> 	- CT has been updated by hardware but IRQ still not treated.
> 	- By default we consider the current_sg as completed, so we
> 	  point to the beginning of the next_sg buffer.
> 
> Hope that will help to clarify.

Yes that helps, maybe we should add these bits in code and changelog..
:)

And how does this impact non cyclic case where N descriptors maybe
issued. The driver seems to support non cyclic too...

Thanks
-- 
~Vinod

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-30  8:22         ` Vinod Koul
@ 2019-04-30 14:58           ` Arnaud Pouliquen
  2019-04-30 17:18             ` Vinod Koul
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaud Pouliquen @ 2019-04-30 14:58 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine



On 4/30/19 10:22 AM, Vinod Koul wrote:
> On 29-04-19, 16:52, Arnaud Pouliquen wrote:
>>
>>
>> On 4/29/19 7:13 AM, Vinod Koul wrote:
>>> On 26-04-19, 15:41, Arnaud Pouliquen wrote:
>>>>>> During residue calculation. the DMA can switch to the next sg. When
>>>>>> this race condition occurs, the residue returned value is not valid.
>>>>>> Indeed the position in the sg returned by the hardware is the position
>>>>>> of the next sg, not the current sg.
>>>>>> Solution is to check the sg after the calculation to verify it.
>>>>>> If a transition is detected we consider that the DMA has switched to
>>>>>> the beginning of next sg.
>>>>>
>>>>> Now, that sounds like duct tape. Why should we bother doing that.
>>>>>
>>>>> Also looking back at the stm32_dma_desc_residue() and calls to it from
>>>>> stm32_dma_tx_status() am not sure we are doing the right thing
>>>> Please, could you explain what you have in mind here?
>>>
>>> So when we call vchan_find_desc() that tells us if the descriptor is in
>>> the issued queue or not..  Ideally it should not matter if we have one
>>> or N descriptors issued to hardware.
>>>
>>> So why should you bother checking for next_sg.
>>>
>>>>> why are we looking at next_sg here, can you explain me that please
>>>>
>>>> This solution is similar to one implemented in the at_hdmac.c driver
>>>> (atc_get_bytes_left function).
>>>>
>>>> Yes could be consider as a workaround for a hardware issue...
>>>>
>>>> In stm32 DMA Peripheral, we can register up to 2 sg descriptors (sg1 &
>>>> sg2)in DMA registers, and use it in a cyclic mode (auto reload). This
>>>> mode is mainly use for audio transfer initiated by an ALSA driver.
>>>>
>>>> >From hardware point of view the DMA transfers first block based on sg1,
>>>> then it updates registers to prepare sg2 transfer, and then generates an
>>>> IRQ to inform that it issues the next transfer (sg2).
>>>>
>>>> Then driver can update sg1 to prepare the third transfer...
>>>>
>>>> In parallel the client driver can requests status to get the residue to
>>>> update internal pointer.
>>>> The issue is in the race condition between the call of the
>>>> device_tx_status ops and the update of the DMA register on sg switch.
>>>
>>> Sorry I do not agree! You are in stm32_dma_tx_status() hold the lock and
>>> IRQs are disabled, so even if sg2 was loaded, you will not get an
>>> interrupt and wont know. By looking at sg1 register you will see that
>>> sg1 is telling you that it has finished and residue can be zero. That is
>>> fine and correct to report.
>>>
>>> Most important thing here is that reside is for _requested_ descriptor
>>> and not _current_ descriptor, so looking into sg2 doesnt not fit.
>>>
>>>> During a short time the hardware updated the registers containing the
>>>> sg ID but not the transfer counter(SxNDTR). In this case there is a
>>>> mismatch between the Sg ID and the associated transfer counter.
>>>> So residue calculation is wrong.
>>>> Idea of this patch is to perform the calculation and then to crosscheck
>>>> that the hardware has not switched to the next sg during the
>>>> calculation. The way to crosscheck is to compare the the sg ID before
>>>> and after the calculation.
>>>>
>>>> I tested the solution to force a new recalculation but no real solution
>>>> to trust the registers during this phase. In this case an approximation
>>>> is to consider that the DMA is transferring the first bytes of the next sg.
>>>> So we return the residue corresponding to the beginning of the next buffer.
>>>
>>> And that is wrong!. The argument is 'cookie' and you return residue for
>>> that cookie.
>>>
>>> For example, if you have dma txn with cookie 1, 2, 3, 4 submitted, then currently HW
>>> is processing cookie 2, then for tx_status on:
>>> cookie 1: return DMA_COMPLETE, residue 0
>>> cookie 2: return DMA_IN_PROGRESS, residue (read from HW)
>>> cookie 3: return DMA_IN_PROGRESS, residue txn length
>>> cookie 4: return DMA_IN_PROGRESS, residue txn length
>>>
>>> Thanks
>>>
>> I think i miss something in my explanation, as from my humble POV (not
>> enough expert in DMA framework...) we only one cookie here as only one
>> cyclic transfer...
> 
>> Regarding your answers it looks like my sg explanation are not clear and
>> introduce confusions... Sorry for this, i was used sg for internal STM32
>> DMA driver, not for the framework API itself.
>>
>> Let try retry to re-explain you the stm32 DMA cyclic mode management.
>>
>> STM32 STM32 hardware:
>> -------------------
>> (ref manual:
>> https://www.st.com/content/ccc/resource/technical/document/reference_manual/group0/51/ba/9e/5e/78/5b/4b/dd/DM00327659/files/DM00327659.pdf/jcr:content/translations/en.DM00327659.pdf)
>>
>> The stm32 DMA supports cyclic mode using a hardware double
>> buffer mode.
>> In this double buffer, we can program up to 2 transfers. When one is
>> completed, the DMA automatically switch on the other. This could be see
>> as a hardware LLI with only 2 transfer descriptors.
>> A hardware bit CT (current target) is used to determine the
>> current transfer (CT = 0 or 1).
>> A hardware NDT (num of data to transfer) counter can be read to
>> determine DMA position in current transfer.
>> An IRQ is generated when this CT bit is updated to allows driver to
>> update the double buffer for the next transfer.
>>
>> On client side (a.e audio):
>> -------------------------
>> The client requests a cyclic transfer by calling
>> stm32_dma_prep_dma_cyclic. For instance it can request the transfer of a
>> buffer divided in 10 periods. In this case only one cookie submitted
>> (right?).
>>
>> At stm32dma driver level these 10 periods are registered in an internal
>> software table (desc->sg_req[]).As cyclic, the last sg_req point to the
>> first one.
>>
>> So to be able to transfer the whole software table, we have to update
>> the STM32 DMA double buffer at each end of transfer period.
>> The filed chan->next_sg points to the next sg_req in the software table.
>> that should be write in the STM32 DMA double buffer.
>>
>> Residue calculation:
>> -------------------
>> During a transfer we can get the position in a period thanks to the
>> NDT(num of data to transfer) bit-field.
>>
>> So the calculation is :
>> 1) Get the NDT field value
>> 3) add the periods remaining in the desc->sg_req[] table.
>>
>> In parallel the STM32 DMA hardware updates the transfer buffer in 3 steps:
>> 1) update CT register field.
>> 2) Update NDT register field.
>> 3) generate the IRQ (As you mention the IRQ is not treated during the
>> device_tx_status as protected from interrupts).
>>
>> We are facing issue when computing the residue during the update of the
>> CT and the NDT. The CT and NDT can as been updated ( both or only CT...)
>> without driver context update (IRQ disabled).
>> In this case we can point to the beginning of the current transfer(
>> completed) instead of the next_transfer. This generates a residue error
>> and for audio a time-stamp regression (so video freeze or audio plop).
>>
>> So the patch proposed consists in:
>> 1) getting the current NDT value
>> 2) reading CT and check that the hardware does not point to the next_sg.
>> 	if yes:
>> 	- CT has been updated by hardware but IRQ still not treated.
>> 	- By default we consider the current_sg as completed, so we
>> 	  point to the beginning of the next_sg buffer.
>>
>> Hope that will help to clarify.
> 
> Yes that helps, maybe we should add these bits in code and changelog..
> :)I will update the comments and commit message in a V2 in this way
> 
> And how does this impact non cyclic case where N descriptors maybe
> issued. The driver seems to support non cyclic too...

Correct it supports SG as well, but double buffer mode is not used in
such case. Hw is programmed under IT for every descriptors : no
automatic register reloaded as in cyclic mode. We won't end up in the
situation depicted below.

Thanks
-- 
~Arnaud

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma
  2019-04-30 14:58           ` Arnaud Pouliquen
@ 2019-04-30 17:18             ` Vinod Koul
  0 siblings, 0 replies; 10+ messages in thread
From: Vinod Koul @ 2019-04-30 17:18 UTC (permalink / raw)
  To: Arnaud Pouliquen
  Cc: Dan Williams, Pierre-Yves MORDRET, linux-stm32, linux-kernel, dmaengine

On 30-04-19, 16:58, Arnaud Pouliquen wrote:

> >> Hope that will help to clarify.
> > 
> > Yes that helps, maybe we should add these bits in code and changelog..
> > :)I will update the comments and commit message in a V2 in this way
> > 
> > And how does this impact non cyclic case where N descriptors maybe
> > issued. The driver seems to support non cyclic too...
> 
> Correct it supports SG as well, but double buffer mode is not used in
> such case. Hw is programmed under IT for every descriptors : no
> automatic register reloaded as in cyclic mode. We won't end up in the
> situation depicted below.

Okay sounds good then. Can you add a bit more of this in the code (this
was very helpful) not as a fix but documentation so that people (or you
down the line) will remember why this was done like this

Thanks

-- 
~Vinod

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-04-30 17:19 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-27 12:21 [PATCH] dmaengine: stm32-dma: fix residue calculation in stm32-dma Arnaud Pouliquen
2019-04-23 15:18 ` Arnaud Pouliquen
2019-04-24  9:26   ` Pierre Yves MORDRET
2019-04-26 12:17 ` Vinod Koul
2019-04-26 13:41   ` Arnaud Pouliquen
2019-04-29  5:13     ` Vinod Koul
2019-04-29 14:52       ` Arnaud Pouliquen
2019-04-30  8:22         ` Vinod Koul
2019-04-30 14:58           ` Arnaud Pouliquen
2019-04-30 17:18             ` Vinod Koul

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).