linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sdhci: fix the fake timeout bug
@ 2018-11-30  7:00 Du, Alek
  2018-11-30  9:19 ` Adrian Hunter
  0 siblings, 1 reply; 14+ messages in thread
From: Du, Alek @ 2018-11-30  7:00 UTC (permalink / raw)
  To: linux-mmc, adrian.hunter, ulf.hansson; +Cc: linux-kernel

From b893df3a1a937bd5fe22d39ceae93454a2e5e0e4 Mon Sep 17 00:00:00 2001
From: Alek Du <alek.du@intel.com>
Date: Fri, 30 Nov 2018 14:02:28 +0800
Subject: [PATCH] sdhci: fix the fake timeout bug

We observed some fake timeouts on some devices, the log is like this:

[ 7586.290201] mmc1: Timeout waiting for hardware cmd interrupt.
[ 7586.290420] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
...
[ 7586.291774] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00000203

From the clock control register dump, we are pretty sure the clock was
stabilized.

In other cases, we also observed:

[ 7596.530171] mmc1: Timeout waiting for hardware cmd interrupt.

and

[ 1956.534634] mmc1: Reset 0x2 never completed.

But we are pretty sure the mmc controller is working perfectly under low
system load.

After checking the sdhci code, we found the timeout check actually has a
little window that the CPU can be scheduled out and when it comes back,
the original time set or check is not valid.

Signed-off-by: Alek Du <alek.du@intel.com>
---
 drivers/mmc/host/sdhci.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 99bdae53fa2e..f88c49fc574e 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
 	/* hw clears the bit when it's done */
 	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
 		if (ktime_after(ktime_get(), timeout)) {
+			/* check it again, since there is a window between
+			   bit check and time check */
+			if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
+				break;
 			pr_err("%s: Reset 0x%x never completed.\n",
 				mmc_hostname(host->mmc), (int)mask);
 			sdhci_dumpregs(host);
 			return;
+		} else {
+			udelay(10);
 		}
-		udelay(10);
 	}
 }
 EXPORT_SYMBOL_GPL(sdhci_reset);
@@ -1395,9 +1400,10 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
 		timeout += DIV_ROUND_UP(cmd->busy_timeout, 1000) * HZ + HZ;
 	else
 		timeout += 10 * HZ;
-	sdhci_mod_timer(host, cmd->mrq, timeout);
 
 	sdhci_writew(host, SDHCI_MAKE_CMD(cmd->opcode, flags), SDHCI_COMMAND);
+	/* setup timer after command to avoid fake timeout */
+	sdhci_mod_timer(host, cmd->mrq, timeout);
 }
 EXPORT_SYMBOL_GPL(sdhci_send_command);
 
@@ -1611,12 +1617,19 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
 	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
 		& SDHCI_CLOCK_INT_STABLE)) {
 		if (ktime_after(ktime_get(), timeout)) {
+			/* check it again since there is a window between
+			   status check and time check */
+			if ((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
+					& SDHCI_CLOCK_INT_STABLE)
+				break;
 			pr_err("%s: Internal clock never stabilised.\n",
 			       mmc_hostname(host->mmc));
 			sdhci_dumpregs(host);
 			return;
 		}
-		udelay(10);
+		else {
+			udelay(10);
+		}
 	}
 
 	clk |= SDHCI_CLOCK_CARD_EN;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] sdhci: fix the fake timeout bug
  2018-11-30  7:00 [PATCH] sdhci: fix the fake timeout bug Du, Alek
@ 2018-11-30  9:19 ` Adrian Hunter
  2018-11-30 14:13   ` Du, Alek
  0 siblings, 1 reply; 14+ messages in thread
From: Adrian Hunter @ 2018-11-30  9:19 UTC (permalink / raw)
  To: Du, Alek, linux-mmc, ulf.hansson; +Cc: linux-kernel

On 30/11/18 9:00 AM, Du, Alek wrote:
>>From b893df3a1a937bd5fe22d39ceae93454a2e5e0e4 Mon Sep 17 00:00:00 2001
> From: Alek Du <alek.du@intel.com>
> Date: Fri, 30 Nov 2018 14:02:28 +0800
> Subject: [PATCH] sdhci: fix the fake timeout bug
> 
> We observed some fake timeouts on some devices, the log is like this:
> 
> [ 7586.290201] mmc1: Timeout waiting for hardware cmd interrupt.
> [ 7586.290420] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> ...
> [ 7586.291774] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00000203
> 
>>From the clock control register dump, we are pretty sure the clock was
> stabilized.
> 
> In other cases, we also observed:
> 
> [ 7596.530171] mmc1: Timeout waiting for hardware cmd interrupt.
> 
> and
> 
> [ 1956.534634] mmc1: Reset 0x2 never completed.
> 
> But we are pretty sure the mmc controller is working perfectly under low
> system load.
> 
> After checking the sdhci code, we found the timeout check actually has a
> little window that the CPU can be scheduled out and when it comes back,
> the original time set or check is not valid.

Commands and resets are under spin lock, so no possibility for preemption,
and certainly a few microseconds isn't going to make any difference to these
timeouts, so I don't see how this patch could help.

I recently sent a patch for GLK laptops that had BIOS issues:

	https://marc.info/?l=linux-mmc&m=154270005901609

And also some improvements to error handling:

	https://marc.info/?l=linux-mmc&m=154229013900437

If those don't help, please share more details of the actual problem.

> 
> Signed-off-by: Alek Du <alek.du@intel.com>
> ---
>  drivers/mmc/host/sdhci.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 99bdae53fa2e..f88c49fc574e 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
>  	/* hw clears the bit when it's done */
>  	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
>  		if (ktime_after(ktime_get(), timeout)) {
> +			/* check it again, since there is a window between
> +			   bit check and time check */
> +			if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
> +				break;
>  			pr_err("%s: Reset 0x%x never completed.\n",
>  				mmc_hostname(host->mmc), (int)mask);
>  			sdhci_dumpregs(host);
>  			return;
> +		} else {
> +			udelay(10);
>  		}
> -		udelay(10);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(sdhci_reset);
> @@ -1395,9 +1400,10 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
>  		timeout += DIV_ROUND_UP(cmd->busy_timeout, 1000) * HZ + HZ;
>  	else
>  		timeout += 10 * HZ;
> -	sdhci_mod_timer(host, cmd->mrq, timeout);
>  
>  	sdhci_writew(host, SDHCI_MAKE_CMD(cmd->opcode, flags), SDHCI_COMMAND);
> +	/* setup timer after command to avoid fake timeout */
> +	sdhci_mod_timer(host, cmd->mrq, timeout);
>  }
>  EXPORT_SYMBOL_GPL(sdhci_send_command);
>  
> @@ -1611,12 +1617,19 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
>  	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
>  		& SDHCI_CLOCK_INT_STABLE)) {
>  		if (ktime_after(ktime_get(), timeout)) {
> +			/* check it again since there is a window between
> +			   status check and time check */
> +			if ((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
> +					& SDHCI_CLOCK_INT_STABLE)
> +				break;
>  			pr_err("%s: Internal clock never stabilised.\n",
>  			       mmc_hostname(host->mmc));
>  			sdhci_dumpregs(host);
>  			return;
>  		}
> -		udelay(10);
> +		else {
> +			udelay(10);
> +		}
>  	}
>  
>  	clk |= SDHCI_CLOCK_CARD_EN;
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sdhci: fix the fake timeout bug
  2018-11-30  9:19 ` Adrian Hunter
@ 2018-11-30 14:13   ` Du, Alek
  2018-11-30 14:40     ` Adrian Hunter
  0 siblings, 1 reply; 14+ messages in thread
From: Du, Alek @ 2018-11-30 14:13 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, ulf.hansson, linux-kernel

On Fri, 30 Nov 2018 11:19:03 +0200
Adrian Hunter <adrian.hunter@intel.com> wrote:

> Commands and resets are under spin lock, so no possibility for
> preemption, and certainly a few microseconds isn't going to make any
> difference to these timeouts, so I don't see how this patch could
> help.

Thanks for your review.

I believe my case the scheduling out delay sometime can reach 10~20
milliseconds... This may be due to the hypervisor...

Please look at the sdhci_enable_clk() below, there is a window in clock
stabilization check. The first check is to check status register, the
second check is to check if time passed. That's why I can capture a
case that after time passed, the actually clock control register
indicated that clock is stable. So the error handling is wrong...

Also the sdhci_send_command commands is not in spin lock There is a
windows between mod_timer and real command send...

Thanks,
Alek

> 
> I recently sent a patch for GLK laptops that had BIOS issues:
> 
> 	https://marc.info/?l=linux-mmc&m=154270005901609
> 
> And also some improvements to error handling:
> 
> 	https://marc.info/?l=linux-mmc&m=154229013900437
> 
> If those don't help, please share more details of the actual problem.

Sorry, I can't see how this patches could fix my problem.



> 
> > 
> > Signed-off-by: Alek Du <alek.du@intel.com>
> > ---
> >  drivers/mmc/host/sdhci.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> > index 99bdae53fa2e..f88c49fc574e 100644
> > --- a/drivers/mmc/host/sdhci.c
> > +++ b/drivers/mmc/host/sdhci.c
> > @@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8
> > mask) /* hw clears the bit when it's done */
> >  	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
> >  		if (ktime_after(ktime_get(), timeout)) {
> > +			/* check it again, since there is a window
> > between
> > +			   bit check and time check */
> > +			if (!(sdhci_readb(host,
> > SDHCI_SOFTWARE_RESET) & mask))
> > +				break;
> >  			pr_err("%s: Reset 0x%x never completed.\n",
> >  				mmc_hostname(host->mmc),
> > (int)mask); sdhci_dumpregs(host);
> >  			return;
> > +		} else {
> > +			udelay(10);
> >  		}
> > -		udelay(10);
> >  	}
> >  }
> >  EXPORT_SYMBOL_GPL(sdhci_reset);
> > @@ -1395,9 +1400,10 @@ void sdhci_send_command(struct sdhci_host
> > *host, struct mmc_command *cmd) timeout +=
> > DIV_ROUND_UP(cmd->busy_timeout, 1000) * HZ + HZ; else
> >  		timeout += 10 * HZ;
> > -	sdhci_mod_timer(host, cmd->mrq, timeout);
> >  
> >  	sdhci_writew(host, SDHCI_MAKE_CMD(cmd->opcode, flags),
> > SDHCI_COMMAND);
> > +	/* setup timer after command to avoid fake timeout */
> > +	sdhci_mod_timer(host, cmd->mrq, timeout);
> >  }
> >  EXPORT_SYMBOL_GPL(sdhci_send_command);
> >  
> > @@ -1611,12 +1617,19 @@ void sdhci_enable_clk(struct sdhci_host
> > *host, u16 clk) while (!((clk = sdhci_readw(host,
> > SDHCI_CLOCK_CONTROL)) & SDHCI_CLOCK_INT_STABLE)) {
> >  		if (ktime_after(ktime_get(), timeout)) {
> > +			/* check it again since there is a window
> > between
> > +			   status check and time check */
> > +			if ((clk = sdhci_readw(host,
> > SDHCI_CLOCK_CONTROL))
> > +					& SDHCI_CLOCK_INT_STABLE)
> > +				break;
> >  			pr_err("%s: Internal clock never
> > stabilised.\n", mmc_hostname(host->mmc));
> >  			sdhci_dumpregs(host);
> >  			return;
> >  		}
> > -		udelay(10);
> > +		else {
> > +			udelay(10);
> > +		}
> >  	}
> >  
> >  	clk |= SDHCI_CLOCK_CARD_EN;
> >   
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sdhci: fix the fake timeout bug
  2018-11-30 14:13   ` Du, Alek
@ 2018-11-30 14:40     ` Adrian Hunter
  2018-12-01  5:42       ` Du, Alek
  0 siblings, 1 reply; 14+ messages in thread
From: Adrian Hunter @ 2018-11-30 14:40 UTC (permalink / raw)
  To: Du, Alek; +Cc: linux-mmc, ulf.hansson, linux-kernel

On 30/11/18 4:13 PM, Du, Alek wrote:
> On Fri, 30 Nov 2018 11:19:03 +0200
> Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> Commands and resets are under spin lock, so no possibility for
>> preemption, and certainly a few microseconds isn't going to make any
>> difference to these timeouts, so I don't see how this patch could
>> help.
> 
> Thanks for your review.
> 
> I believe my case the scheduling out delay sometime can reach 10~20
> milliseconds... This may be due to the hypervisor...

So you are saying this only happens under virtualization?

> 
> Please look at the sdhci_enable_clk() below, there is a window in clock
> stabilization check. The first check is to check status register, the
> second check is to check if time passed. That's why I can capture a
> case that after time passed, the actually clock control register
> indicated that clock is stable. So the error handling is wrong...

Sure, but "Internal clock never stabilised." is not one of the  errors you
listed.

> 
> Also the sdhci_send_command commands is not in spin lock There is a
> windows between mod_timer and real command send...

What code path does not have a spin lock?

> 
> Thanks,
> Alek
> 
>>
>> I recently sent a patch for GLK laptops that had BIOS issues:
>>
>> 	https://marc.info/?l=linux-mmc&m=154270005901609
>>
>> And also some improvements to error handling:
>>
>> 	https://marc.info/?l=linux-mmc&m=154229013900437
>>
>> If those don't help, please share more details of the actual problem.
> 
> Sorry, I can't see how this patches could fix my problem.
> 
> 
> 
>>
>>>
>>> Signed-off-by: Alek Du <alek.du@intel.com>
>>> ---
>>>  drivers/mmc/host/sdhci.c | 19 ++++++++++++++++---
>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
>>> index 99bdae53fa2e..f88c49fc574e 100644
>>> --- a/drivers/mmc/host/sdhci.c
>>> +++ b/drivers/mmc/host/sdhci.c
>>> @@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8
>>> mask) /* hw clears the bit when it's done */
>>>  	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
>>>  		if (ktime_after(ktime_get(), timeout)) {
>>> +			/* check it again, since there is a window
>>> between
>>> +			   bit check and time check */
>>> +			if (!(sdhci_readb(host,
>>> SDHCI_SOFTWARE_RESET) & mask))
>>> +				break;
>>>  			pr_err("%s: Reset 0x%x never completed.\n",
>>>  				mmc_hostname(host->mmc),
>>> (int)mask); sdhci_dumpregs(host);
>>>  			return;
>>> +		} else {
>>> +			udelay(10);
>>>  		}
>>> -		udelay(10);
>>>  	}
>>>  }
>>>  EXPORT_SYMBOL_GPL(sdhci_reset);
>>> @@ -1395,9 +1400,10 @@ void sdhci_send_command(struct sdhci_host
>>> *host, struct mmc_command *cmd) timeout +=
>>> DIV_ROUND_UP(cmd->busy_timeout, 1000) * HZ + HZ; else
>>>  		timeout += 10 * HZ;
>>> -	sdhci_mod_timer(host, cmd->mrq, timeout);
>>>  
>>>  	sdhci_writew(host, SDHCI_MAKE_CMD(cmd->opcode, flags),
>>> SDHCI_COMMAND);
>>> +	/* setup timer after command to avoid fake timeout */
>>> +	sdhci_mod_timer(host, cmd->mrq, timeout);
>>>  }
>>>  EXPORT_SYMBOL_GPL(sdhci_send_command);
>>>  
>>> @@ -1611,12 +1617,19 @@ void sdhci_enable_clk(struct sdhci_host
>>> *host, u16 clk) while (!((clk = sdhci_readw(host,
>>> SDHCI_CLOCK_CONTROL)) & SDHCI_CLOCK_INT_STABLE)) {
>>>  		if (ktime_after(ktime_get(), timeout)) {
>>> +			/* check it again since there is a window
>>> between
>>> +			   status check and time check */
>>> +			if ((clk = sdhci_readw(host,
>>> SDHCI_CLOCK_CONTROL))
>>> +					& SDHCI_CLOCK_INT_STABLE)
>>> +				break;
>>>  			pr_err("%s: Internal clock never
>>> stabilised.\n", mmc_hostname(host->mmc));
>>>  			sdhci_dumpregs(host);
>>>  			return;
>>>  		}
>>> -		udelay(10);
>>> +		else {
>>> +			udelay(10);
>>> +		}
>>>  	}
>>>  
>>>  	clk |= SDHCI_CLOCK_CARD_EN;
>>>   
>>
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sdhci: fix the fake timeout bug
  2018-11-30 14:40     ` Adrian Hunter
@ 2018-12-01  5:42       ` Du, Alek
  2018-12-04  1:01         ` [PATCH V2] sdhci: fix the timeout check window for clock and reset Du, Alek
  2018-12-04 12:47         ` [PATCH] sdhci: fix the fake timeout bug Adrian Hunter
  0 siblings, 2 replies; 14+ messages in thread
From: Du, Alek @ 2018-12-01  5:42 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, ulf.hansson, linux-kernel

On Fri, 30 Nov 2018 16:40:04 +0200
Adrian Hunter <adrian.hunter@intel.com> wrote:

> So you are saying this only happens under virtualization?
> 

Yes, I saw the issue under ACRN virtualization Service OS (4.19 kernel).
But theoretically it can happen in other case when scheduling is not
that good. (due to bad driver or other high priority task)


> > 
> > Please look at the sdhci_enable_clk() below, there is a window in
> > clock stabilization check. The first check is to check status
> > register, the second check is to check if time passed. That's why I
> > can capture a case that after time passed, the actually clock
> > control register indicated that clock is stable. So the error
> > handling is wrong...  
> 
> Sure, but "Internal clock never stabilised." is not one of the
> errors you listed.

Sorry my bad not listing all the error log:

Case 1. clock stabilization timeout: (the below clock control dump shows clock is good)
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
[159525.256277] mmc1: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[159525.256523] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
[159525.256752] mmc1: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000000
[159525.256979] mmc1: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03

Case 2. Reset timeout: (the same check window in sdhci_reset())
[ 7639.968613] mmc1: Reset 0x4 never completed.

Case 3. Hardware interrupt timeout
[ 1049.561728] mmc1: Timeout waiting for hardware interrupt.

> 
> > 
> > Also the sdhci_send_command commands is not in spin lock There is a
> > windows between mod_timer and real command send...  
> 
> What code path does not have a spin lock?
Ouch my bad, the sdhci_send_command are called either from spinlock or IRQ handler,
so this part is good ...

I'll send a new patch to cover case 1 and case 2 if you agree.

Thanks,
Alek

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH V2] sdhci: fix the timeout check window for clock and reset
  2018-12-01  5:42       ` Du, Alek
@ 2018-12-04  1:01         ` Du, Alek
  2018-12-04 12:24           ` Adrian Hunter
  2018-12-04 12:47         ` [PATCH] sdhci: fix the fake timeout bug Adrian Hunter
  1 sibling, 1 reply; 14+ messages in thread
From: Du, Alek @ 2018-12-04  1:01 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, ulf.hansson, linux-kernel

From 87692fc090978bde8fe872f02d0023a57af6b492 Mon Sep 17 00:00:00 2001
From: Alek Du <alek.du@intel.com>
Date: Fri, 30 Nov 2018 14:02:28 +0800
Subject: [PATCH] sdhci: fix the timeout check window for clock and reset

We observed some fake timeouts on some devices, the log is like this:

case 1:
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
[159525.256277] mmc1: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
[159525.256523] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
[159525.256752] mmc1: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000000
[159525.256979] mmc1: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
[159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
From the clock control register dump, we are pretty sure the clock was
stablized.

case 2:
[  914.550127] mmc1: Reset 0x2 never completed.
[  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002

After checking the sdhci code, we found the timeout check actually has a
little window that the CPU can be scheduled out and when it comes back,
the original time set or check is not valid.

Signed-off-by: Alek Du <alek.du@intel.com>
---
 drivers/mmc/host/sdhci.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 99bdae53fa2e..af01f7d16eae 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
 	/* hw clears the bit when it's done */
 	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
 		if (ktime_after(ktime_get(), timeout)) {
+			/* check it again, since there is a window between
+			   bit check and time check */
+			if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
+				break;
 			pr_err("%s: Reset 0x%x never completed.\n",
 				mmc_hostname(host->mmc), (int)mask);
 			sdhci_dumpregs(host);
 			return;
+		} else {
+			udelay(10);
 		}
-		udelay(10);
 	}
 }
 EXPORT_SYMBOL_GPL(sdhci_reset);
@@ -1611,12 +1616,19 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
 	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
 		& SDHCI_CLOCK_INT_STABLE)) {
 		if (ktime_after(ktime_get(), timeout)) {
+			/* check it again since there is a window between
+			   status check and time check */
+			if ((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
+					& SDHCI_CLOCK_INT_STABLE)
+				break;
 			pr_err("%s: Internal clock never stabilised.\n",
 			       mmc_hostname(host->mmc));
 			sdhci_dumpregs(host);
 			return;
 		}
-		udelay(10);
+		else {
+			udelay(10);
+		}
 	}
 
 	clk |= SDHCI_CLOCK_CARD_EN;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V2] sdhci: fix the timeout check window for clock and reset
  2018-12-04  1:01         ` [PATCH V2] sdhci: fix the timeout check window for clock and reset Du, Alek
@ 2018-12-04 12:24           ` Adrian Hunter
  2018-12-05  3:14             ` [PATCH V3] " Du, Alek
  0 siblings, 1 reply; 14+ messages in thread
From: Adrian Hunter @ 2018-12-04 12:24 UTC (permalink / raw)
  To: Du, Alek; +Cc: linux-mmc, ulf.hansson, linux-kernel

On 4/12/18 3:01 AM, Du, Alek wrote:
>>From 87692fc090978bde8fe872f02d0023a57af6b492 Mon Sep 17 00:00:00 2001
> From: Alek Du <alek.du@intel.com>
> Date: Fri, 30 Nov 2018 14:02:28 +0800
> Subject: [PATCH] sdhci: fix the timeout check window for clock and reset
> 
> We observed some fake timeouts on some devices, the log is like this:

Please change "fake timeouts" to "premature timeouts".  Also I think it is
worth mentioning that this was with a virtualized environment.

> 
> case 1:
> [159525.255629] mmc1: Internal clock never stabilised.
> [159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
> [159525.256277] mmc1: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
> [159525.256523] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
> [159525.256752] mmc1: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000000
> [159525.256979] mmc1: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> [159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
>>From the clock control register dump, we are pretty sure the clock was
> stablized.
> 
> case 2:
> [  914.550127] mmc1: Reset 0x2 never completed.
> [  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002
> 
> After checking the sdhci code, we found the timeout check actually has a
> little window that the CPU can be scheduled out and when it comes back,
> the original time set or check is not valid.
> 
> Signed-off-by: Alek Du <alek.du@intel.com>

Please add fixes tag:

Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")

And also a stable tag:

Cc: stable@vger.kernel.org      # v4.12+

> ---
>  drivers/mmc/host/sdhci.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 99bdae53fa2e..af01f7d16eae 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
>  	/* hw clears the bit when it's done */
>  	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
>  		if (ktime_after(ktime_get(), timeout)) {
> +			/* check it again, since there is a window between
> +			   bit check and time check */
> +			if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
> +				break;

It would be neater like:

	while (1) {
		bool timedout = ktime_after(ktime_get(), timeout);

		if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
			break;
		if (timedout) {


>  			pr_err("%s: Reset 0x%x never completed.\n",
>  				mmc_hostname(host->mmc), (int)mask);
>  			sdhci_dumpregs(host);
>  			return;
> +		} else {

The if-clause does a 'return', so the 'else' is not needed

> +			udelay(10);
>  		}
> -		udelay(10);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(sdhci_reset);
> @@ -1611,12 +1616,19 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
>  	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
>  		& SDHCI_CLOCK_INT_STABLE)) {
>  		if (ktime_after(ktime_get(), timeout)) {
> +			/* check it again since there is a window between
> +			   status check and time check */
> +			if ((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
> +					& SDHCI_CLOCK_INT_STABLE)
> +				break;

Similarly:

	while (1) {
		bool timedout = ktime_after(ktime_get(), timeout);

		clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL);
		if (clk & SDHCI_CLOCK_INT_STABLE)
			break;
		if (timedout) {


>  			pr_err("%s: Internal clock never stabilised.\n",
>  			       mmc_hostname(host->mmc));
>  			sdhci_dumpregs(host);
>  			return;
>  		}
> -		udelay(10);
> +		else {

The if-clause does a 'return', so the 'else' is not needed

> +			udelay(10);
> +		}
>  	}
>  
>  	clk |= SDHCI_CLOCK_CARD_EN;
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] sdhci: fix the fake timeout bug
  2018-12-01  5:42       ` Du, Alek
  2018-12-04  1:01         ` [PATCH V2] sdhci: fix the timeout check window for clock and reset Du, Alek
@ 2018-12-04 12:47         ` Adrian Hunter
  1 sibling, 0 replies; 14+ messages in thread
From: Adrian Hunter @ 2018-12-04 12:47 UTC (permalink / raw)
  To: Du, Alek; +Cc: linux-mmc, ulf.hansson, linux-kernel

On 1/12/18 7:42 AM, Du, Alek wrote:
> On Fri, 30 Nov 2018 16:40:04 +0200
> Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> So you are saying this only happens under virtualization?
>>
> 
> Yes, I saw the issue under ACRN virtualization Service OS (4.19 kernel).
> But theoretically it can happen in other case when scheduling is not
> that good. (due to bad driver or other high priority task)
> 
> 
>>>
>>> Please look at the sdhci_enable_clk() below, there is a window in
>>> clock stabilization check. The first check is to check status
>>> register, the second check is to check if time passed. That's why I
>>> can capture a case that after time passed, the actually clock
>>> control register indicated that clock is stable. So the error
>>> handling is wrong...  
>>
>> Sure, but "Internal clock never stabilised." is not one of the
>> errors you listed.
> 
> Sorry my bad not listing all the error log:
> 
> Case 1. clock stabilization timeout: (the below clock control dump shows clock is good)
> [159525.255629] mmc1: Internal clock never stabilised.
> [159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
> [159525.256277] mmc1: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
> [159525.256523] mmc1: sdhci: Argument:  0x00000000 | Trn mode: 0x00000000
> [159525.256752] mmc1: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000000
> [159525.256979] mmc1: sdhci: Power:     0x0000000b | Blk gap:  0x00000080
> [159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
> 
> Case 2. Reset timeout: (the same check window in sdhci_reset())
> [ 7639.968613] mmc1: Reset 0x4 never completed.
> 
> Case 3. Hardware interrupt timeout
> [ 1049.561728] mmc1: Timeout waiting for hardware interrupt.
> 
>>
>>>
>>> Also the sdhci_send_command commands is not in spin lock There is a
>>> windows between mod_timer and real command send...  
>>
>> What code path does not have a spin lock?
> Ouch my bad, the sdhci_send_command are called either from spinlock or IRQ handler,
> so this part is good ...
> 
> I'll send a new patch to cover case 1 and case 2 if you agree.

Please do the mod_timer case also, but please make it a separate patch.
Prior to v4.18 it was essentially a 10-second timer, without a
preemptible gap afterwards, so extremely unlikely to timeout
prematurely, hence:

Fixes: fc1fa1b7db275 ("mmc: sdhci: Program a relatively accurate SW timeout value")
Cc: stable@vger.kernel.org      # v4.18+

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH V3] sdhci: fix the timeout check window for clock and reset
  2018-12-04 12:24           ` Adrian Hunter
@ 2018-12-05  3:14             ` Du, Alek
  2018-12-05 11:16               ` Adrian Hunter
  2018-12-05 14:20               ` Ulf Hansson
  0 siblings, 2 replies; 14+ messages in thread
From: Du, Alek @ 2018-12-05  3:14 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-mmc, ulf.hansson, linux-kernel

From bcb38577961823b4f8f2cc0aec7dd450a81b6ddc Mon Sep 17 00:00:00 2001
From: Alek Du <alek.du@intel.com>
Date: Fri, 30 Nov 2018 14:02:28 +0800
Subject: [PATCH] sdhci: fix the timeout check window for clock and reset

We observed some premature timeouts on a virtualization platform, the log
is like this:

case 1:
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
...
[159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
From the clock control register dump, we are pretty sure the clock was
stablized.

case 2:
[  914.550127] mmc1: Reset 0x2 never completed.
[  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002

After checking the sdhci code, we found the timeout check actually has a
little window that the CPU can be scheduled out and when it comes back,
the original time set or check is not valid.

Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
Signed-off-by: Alek Du <alek.du@intel.com>
Cc: stable@vger.kernel.org      # v4.12+
---
 drivers/mmc/host/sdhci.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 99bdae53fa2e..451b08a818a9 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -216,8 +216,12 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
 	timeout = ktime_add_ms(ktime_get(), 100);
 
 	/* hw clears the bit when it's done */
-	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
-		if (ktime_after(ktime_get(), timeout)) {
+	while (1) {
+		bool timedout = ktime_after(ktime_get(), timeout);
+
+		if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
+			break;
+		if (timedout) {
 			pr_err("%s: Reset 0x%x never completed.\n",
 				mmc_hostname(host->mmc), (int)mask);
 			sdhci_dumpregs(host);
@@ -1608,9 +1612,13 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
 
 	/* Wait max 20 ms */
 	timeout = ktime_add_ms(ktime_get(), 20);
-	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
-		& SDHCI_CLOCK_INT_STABLE)) {
-		if (ktime_after(ktime_get(), timeout)) {
+	while (1) {
+		bool timedout = ktime_after(ktime_get(), timeout);
+
+		clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL);
+		if (clk & SDHCI_CLOCK_INT_STABLE)
+			break;
+		if (timedout) {
 			pr_err("%s: Internal clock never stabilised.\n",
 			       mmc_hostname(host->mmc));
 			sdhci_dumpregs(host);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V3] sdhci: fix the timeout check window for clock and reset
  2018-12-05  3:14             ` [PATCH V3] " Du, Alek
@ 2018-12-05 11:16               ` Adrian Hunter
  2018-12-05 14:20               ` Ulf Hansson
  1 sibling, 0 replies; 14+ messages in thread
From: Adrian Hunter @ 2018-12-05 11:16 UTC (permalink / raw)
  To: Du, Alek; +Cc: linux-mmc, ulf.hansson, linux-kernel

On 5/12/18 5:14 AM, Du, Alek wrote:
>>From bcb38577961823b4f8f2cc0aec7dd450a81b6ddc Mon Sep 17 00:00:00 2001
> From: Alek Du <alek.du@intel.com>
> Date: Fri, 30 Nov 2018 14:02:28 +0800
> Subject: [PATCH] sdhci: fix the timeout check window for clock and reset
> 
> We observed some premature timeouts on a virtualization platform, the log
> is like this:
> 
> case 1:
> [159525.255629] mmc1: Internal clock never stabilised.
> [159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
> ...
> [159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
>>From the clock control register dump, we are pretty sure the clock was
> stablized.
> 
> case 2:
> [  914.550127] mmc1: Reset 0x2 never completed.
> [  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002
> 
> After checking the sdhci code, we found the timeout check actually has a
> little window that the CPU can be scheduled out and when it comes back,
> the original time set or check is not valid.
> 
> Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
> Signed-off-by: Alek Du <alek.du@intel.com>
> Cc: stable@vger.kernel.org      # v4.12+

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  drivers/mmc/host/sdhci.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 99bdae53fa2e..451b08a818a9 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -216,8 +216,12 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
>  	timeout = ktime_add_ms(ktime_get(), 100);
>  
>  	/* hw clears the bit when it's done */
> -	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
> -		if (ktime_after(ktime_get(), timeout)) {
> +	while (1) {
> +		bool timedout = ktime_after(ktime_get(), timeout);
> +
> +		if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
> +			break;
> +		if (timedout) {
>  			pr_err("%s: Reset 0x%x never completed.\n",
>  				mmc_hostname(host->mmc), (int)mask);
>  			sdhci_dumpregs(host);
> @@ -1608,9 +1612,13 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
>  
>  	/* Wait max 20 ms */
>  	timeout = ktime_add_ms(ktime_get(), 20);
> -	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
> -		& SDHCI_CLOCK_INT_STABLE)) {
> -		if (ktime_after(ktime_get(), timeout)) {
> +	while (1) {
> +		bool timedout = ktime_after(ktime_get(), timeout);
> +
> +		clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL);
> +		if (clk & SDHCI_CLOCK_INT_STABLE)
> +			break;
> +		if (timedout) {
>  			pr_err("%s: Internal clock never stabilised.\n",
>  			       mmc_hostname(host->mmc));
>  			sdhci_dumpregs(host);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V3] sdhci: fix the timeout check window for clock and reset
  2018-12-05  3:14             ` [PATCH V3] " Du, Alek
  2018-12-05 11:16               ` Adrian Hunter
@ 2018-12-05 14:20               ` Ulf Hansson
  2018-12-05 23:33                 ` [PATCH V3 rebase] mmc: " Du, Alek
  1 sibling, 1 reply; 14+ messages in thread
From: Ulf Hansson @ 2018-12-05 14:20 UTC (permalink / raw)
  To: alek.du; +Cc: Adrian Hunter, linux-mmc, Linux Kernel Mailing List

On Wed, 5 Dec 2018 at 04:14, Du, Alek <alek.du@intel.com> wrote:
>
> From bcb38577961823b4f8f2cc0aec7dd450a81b6ddc Mon Sep 17 00:00:00 2001
> From: Alek Du <alek.du@intel.com>
> Date: Fri, 30 Nov 2018 14:02:28 +0800
> Subject: [PATCH] sdhci: fix the timeout check window for clock and reset

Please change the prefix to: "mmc: sdhci".

>
> We observed some premature timeouts on a virtualization platform, the log
> is like this:
>
> case 1:
> [159525.255629] mmc1: Internal clock never stabilised.
> [159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
> ...
> [159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
> From the clock control register dump, we are pretty sure the clock was
> stablized.
>
> case 2:
> [  914.550127] mmc1: Reset 0x2 never completed.
> [  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002
>
> After checking the sdhci code, we found the timeout check actually has a
> little window that the CPU can be scheduled out and when it comes back,
> the original time set or check is not valid.
>
> Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
> Signed-off-by: Alek Du <alek.du@intel.com>
> Cc: stable@vger.kernel.org      # v4.12+

I intended to queue this up as fix, but couldn't apply it. Can you
please re-base this on top of v4.20-rc5 or later?

Kind regards
Uffe

> ---
>  drivers/mmc/host/sdhci.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 99bdae53fa2e..451b08a818a9 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -216,8 +216,12 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
>         timeout = ktime_add_ms(ktime_get(), 100);
>
>         /* hw clears the bit when it's done */
> -       while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
> -               if (ktime_after(ktime_get(), timeout)) {
> +       while (1) {
> +               bool timedout = ktime_after(ktime_get(), timeout);
> +
> +               if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
> +                       break;
> +               if (timedout) {
>                         pr_err("%s: Reset 0x%x never completed.\n",
>                                 mmc_hostname(host->mmc), (int)mask);
>                         sdhci_dumpregs(host);
> @@ -1608,9 +1612,13 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
>
>         /* Wait max 20 ms */
>         timeout = ktime_add_ms(ktime_get(), 20);
> -       while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
> -               & SDHCI_CLOCK_INT_STABLE)) {
> -               if (ktime_after(ktime_get(), timeout)) {
> +       while (1) {
> +               bool timedout = ktime_after(ktime_get(), timeout);
> +
> +               clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL);
> +               if (clk & SDHCI_CLOCK_INT_STABLE)
> +                       break;
> +               if (timedout) {
>                         pr_err("%s: Internal clock never stabilised.\n",
>                                mmc_hostname(host->mmc));
>                         sdhci_dumpregs(host);
> --
> 2.17.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH V3 rebase] mmc: sdhci: fix the timeout check window for clock and reset
  2018-12-05 14:20               ` Ulf Hansson
@ 2018-12-05 23:33                 ` Du, Alek
  2018-12-06  7:55                   ` Ulf Hansson
  0 siblings, 1 reply; 14+ messages in thread
From: Du, Alek @ 2018-12-05 23:33 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: Adrian Hunter, linux-mmc, Linux Kernel Mailing List

From a081e783383adf1179c71bc37b4e199d087af643 Mon Sep 17 00:00:00 2001
From: Alek Du <alek.du@intel.com>
Date: Fri, 30 Nov 2018 14:02:28 +0800
Subject: [PATCH] mmc: sdhci: fix the timeout check window for clock and reset

We observed some premature timeouts on a virtualization platform, the log
is like this:

case 1:
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
...
[159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
From the clock control register dump, we are pretty sure the clock was
stablized.

case 2:
[  914.550127] mmc1: Reset 0x2 never completed.
[  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002

After checking the sdhci code, we found the timeout check actually has a
little window that the CPU can be scheduled out and when it comes back,
the original time set or check is not valid.

Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
Cc: stable@vger.kernel.org      # v4.12+
Signed-off-by: Alek Du <alek.du@intel.com>
---
 drivers/mmc/host/sdhci.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 99bdae53fa2e..451b08a818a9 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -216,8 +216,12 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
 	timeout = ktime_add_ms(ktime_get(), 100);
 
 	/* hw clears the bit when it's done */
-	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
-		if (ktime_after(ktime_get(), timeout)) {
+	while (1) {
+		bool timedout = ktime_after(ktime_get(), timeout);
+
+		if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
+			break;
+		if (timedout) {
 			pr_err("%s: Reset 0x%x never completed.\n",
 				mmc_hostname(host->mmc), (int)mask);
 			sdhci_dumpregs(host);
@@ -1608,9 +1612,13 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
 
 	/* Wait max 20 ms */
 	timeout = ktime_add_ms(ktime_get(), 20);
-	while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
-		& SDHCI_CLOCK_INT_STABLE)) {
-		if (ktime_after(ktime_get(), timeout)) {
+	while (1) {
+		bool timedout = ktime_after(ktime_get(), timeout);
+
+		clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL);
+		if (clk & SDHCI_CLOCK_INT_STABLE)
+			break;
+		if (timedout) {
 			pr_err("%s: Internal clock never stabilised.\n",
 			       mmc_hostname(host->mmc));
 			sdhci_dumpregs(host);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH V3 rebase] mmc: sdhci: fix the timeout check window for clock and reset
  2018-12-05 23:33                 ` [PATCH V3 rebase] mmc: " Du, Alek
@ 2018-12-06  7:55                   ` Ulf Hansson
  2018-12-06  9:28                     ` Du, Alek
  0 siblings, 1 reply; 14+ messages in thread
From: Ulf Hansson @ 2018-12-06  7:55 UTC (permalink / raw)
  To: alek.du; +Cc: Adrian Hunter, linux-mmc, Linux Kernel Mailing List

On Thu, 6 Dec 2018 at 00:33, Du, Alek <alek.du@intel.com> wrote:
>
> From a081e783383adf1179c71bc37b4e199d087af643 Mon Sep 17 00:00:00 2001
> From: Alek Du <alek.du@intel.com>
> Date: Fri, 30 Nov 2018 14:02:28 +0800
> Subject: [PATCH] mmc: sdhci: fix the timeout check window for clock and reset
>
> We observed some premature timeouts on a virtualization platform, the log
> is like this:
>
> case 1:
> [159525.255629] mmc1: Internal clock never stabilised.
> [159525.255818] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [159525.256049] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
> ...
> [159525.257205] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x0000fa03
> From the clock control register dump, we are pretty sure the clock was
> stablized.
>
> case 2:
> [  914.550127] mmc1: Reset 0x2 never completed.
> [  914.550321] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [  914.550608] mmc1: sdhci: Sys addr:  0x00000010 | Version:  0x00001002
>
> After checking the sdhci code, we found the timeout check actually has a
> little window that the CPU can be scheduled out and when it comes back,
> the original time set or check is not valid.
>
> Fixes: 5a436cc0af62 ("mmc: sdhci: Optimize delay loops")
> Cc: stable@vger.kernel.org      # v4.12+
> Signed-off-by: Alek Du <alek.du@intel.com>

I am still not able to apply this, however the reason is not that a
rebase is needed.

Instead it seems like the patch is malformed, exactly why I don't
know. I am using patchwork to fetch the patch, but I even tried
fetching it directly via my gmail client, no luck. In both case it
can't be applied and checkpatch is also complaining.

Could you check at your side and see what can be wrong? If there is
too much hazzle I can manually fixup the patch next week, as one time
thing.

Kind regards
Uffe

> ---
>  drivers/mmc/host/sdhci.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> index 99bdae53fa2e..451b08a818a9 100644
> --- a/drivers/mmc/host/sdhci.c
> +++ b/drivers/mmc/host/sdhci.c
> @@ -216,8 +216,12 @@ void sdhci_reset(struct sdhci_host *host, u8 mask)
>         timeout = ktime_add_ms(ktime_get(), 100);
>
>         /* hw clears the bit when it's done */
> -       while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
> -               if (ktime_after(ktime_get(), timeout)) {
> +       while (1) {
> +               bool timedout = ktime_after(ktime_get(), timeout);
> +
> +               if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask))
> +                       break;
> +               if (timedout) {
>                         pr_err("%s: Reset 0x%x never completed.\n",
>                                 mmc_hostname(host->mmc), (int)mask);
>                         sdhci_dumpregs(host);
> @@ -1608,9 +1612,13 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk)
>
>         /* Wait max 20 ms */
>         timeout = ktime_add_ms(ktime_get(), 20);
> -       while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL))
> -               & SDHCI_CLOCK_INT_STABLE)) {
> -               if (ktime_after(ktime_get(), timeout)) {
> +       while (1) {
> +               bool timedout = ktime_after(ktime_get(), timeout);
> +
> +               clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL);
> +               if (clk & SDHCI_CLOCK_INT_STABLE)
> +                       break;
> +               if (timedout) {
>                         pr_err("%s: Internal clock never stabilised.\n",
>                                mmc_hostname(host->mmc));
>                         sdhci_dumpregs(host);
> --
> 2.17.1

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH V3 rebase] mmc: sdhci: fix the timeout check window for clock and reset
  2018-12-06  7:55                   ` Ulf Hansson
@ 2018-12-06  9:28                     ` Du, Alek
  0 siblings, 0 replies; 14+ messages in thread
From: Du, Alek @ 2018-12-06  9:28 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: Adrian Hunter, linux-mmc, Linux Kernel Mailing List

On Thu, 6 Dec 2018 08:55:04 +0100
Ulf Hansson <ulf.hansson@linaro.org> wrote:

> Could you check at your side and see what can be wrong? If there is
> too much hazzle I can manually fixup the patch next week, as one time
> thing.

Sorry for the confusion, I just resent with git send-email to avoid any
thing unexpected. Please try again.



> 
> Kind regards
> Uffe
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-12-06  9:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-30  7:00 [PATCH] sdhci: fix the fake timeout bug Du, Alek
2018-11-30  9:19 ` Adrian Hunter
2018-11-30 14:13   ` Du, Alek
2018-11-30 14:40     ` Adrian Hunter
2018-12-01  5:42       ` Du, Alek
2018-12-04  1:01         ` [PATCH V2] sdhci: fix the timeout check window for clock and reset Du, Alek
2018-12-04 12:24           ` Adrian Hunter
2018-12-05  3:14             ` [PATCH V3] " Du, Alek
2018-12-05 11:16               ` Adrian Hunter
2018-12-05 14:20               ` Ulf Hansson
2018-12-05 23:33                 ` [PATCH V3 rebase] mmc: " Du, Alek
2018-12-06  7:55                   ` Ulf Hansson
2018-12-06  9:28                     ` Du, Alek
2018-12-04 12:47         ` [PATCH] sdhci: fix the fake timeout bug Adrian Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).